
Crafting AI Agents: Real-World Lessons from the Front Lines

As many of us start diving into the world of AI agents, it’s increasingly clear that the path to creating effective agents may not be so simple. It’s a far cry from the launch of ChatGPT, when so many of us thought that AI would soon be able to do everything and that we were fast approaching AGI. At Bito, our foray into developing these intelligent agent systems has been a real journey of discovery and learning – seriously, a lot of learning! This isn’t just about pioneering cool new technology; it’s about building tools that work reliably under the stringent demands of enterprises.

Let’s talk about what we’ve learned along the way, from ensuring consistency in AI responses to navigating the pitfalls of complex reasoning, and how a blend of traditional coding and AI might just be the secret to success.

The Quest for Consistency

Consistency is the cornerstone of trust in any AI implementation. Imagine an AI that reviews code — one day it flags a security issue, and the next, it overlooks the same flaw. This kind of inconsistency is a deal-breaker in a business context, where reliability isn’t just preferred; it’s imperative. Our firsthand experience underscored this when we launched our first code review agent. The probabilistic nature of AI, while a marvel in many respects, poses significant challenges here, demanding innovative approaches to ensure predictable, reliable outputs. We found, for example, that to reduce variability we needed to lower the temperature (the setting that controls the LLM’s variability/creativity) and provide more code context. This gave the LLM more data to predict from and improved the consistency of its recommendations.
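To make the two levers concrete, here is a minimal sketch of assembling a request tuned for consistency. The helper name, model string, and values are illustrative, not our actual settings; the parameter names follow the common chat-completions convention.

```python
# Illustrative sketch: request parameters tuned for consistent output.
# The helper, model name, and temperature value are hypothetical examples.

def build_review_request(diff: str, surrounding_code: str) -> dict:
    """Assemble an LLM request that favors reproducible review output."""
    prompt = (
        "Review the following change for bugs and security issues.\n\n"
        f"Full surrounding code for context:\n{surrounding_code}\n\n"
        f"Changed lines:\n{diff}"
    )
    return {
        "model": "gpt-4",       # placeholder model name
        "temperature": 0.1,     # low temperature -> less run-to-run variability
        "messages": [{"role": "user", "content": prompt}],
    }

request = build_review_request("+ if user.is_admin:", "def check(user): ...")
```

The key idea is that both levers live in the request itself: a low temperature narrows the sampling distribution, and the extra `surrounding_code` in the prompt gives the model more to predict from.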

Hallucinations – Imagination Run Amok

In an enterprise context, business users are not excited to see AI outputs that are completely made up or based on entirely wrong assumptions the AI made. For example, in the developer context, asking the AI what a function does and getting a completely fictional response is simply not acceptable. It both erodes trust in every output the tool provides (now you don’t know which outputs you can trust and which you can’t), and if any downstream decisions are made from the output, it can really set off a bad chain reaction. At Bito we found that the two best ways to combat this were 1) prompt engineering, and 2) additional context. Prompt engineering was used to discourage the LLM from answering questions it was not entirely confident about, and to make sure every answer was firmly grounded in the input. This approach isn’t foolproof, but like a stern warning to a child, it helps. Additional context was probably more useful. So in our code review agent, if a diff of a file has one line, by providing the LLM the entire function including the changed line, we were able to eliminate the LLM guessing what might be happening in various parts of the function.
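The “entire function” idea can be sketched in a few lines. This is a hypothetical simplification, not our production code: it walks upward from a changed line to the nearest `def` and then forward through the indented body, using indentation alone as the function boundary.

```python
# Hypothetical sketch of "additional context": instead of sending a one-line
# diff on its own, locate the enclosing function and send all of it.
# The indentation-based scan is a simplification for illustration.

def enclosing_function(lines: list[str], changed_line: int) -> str:
    """Return the source of the function containing `changed_line` (0-based)."""
    start = changed_line
    while start > 0 and not lines[start].lstrip().startswith("def "):
        start -= 1  # walk up to the function header
    end = changed_line + 1
    while end < len(lines) and (
        lines[end].startswith((" ", "\t")) or not lines[end].strip()
    ):
        end += 1    # extend through the indented body
    return "\n".join(lines[start:end])

source = [
    "def add(a, b):",
    "    return a + b",
    "",
    "def scale(x, factor):",
    "    result = x * factor",  # <- imagine this line changed in the diff
    "    return result",
]
context = enclosing_function(source, 4)
```

Sending `context` rather than the single changed line means the model no longer has to guess what `result` or `factor` are; a real implementation would use a proper parser rather than this indentation heuristic.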

Tackling Complex Reasoning

The ability of an AI to navigate complex reasoning tasks is both its greatest strength and its most notable weakness. In creating AI agents, we’ve encountered scenarios where the AI excels in certain steps only to falter at others, revealing a gap in its ability to maintain a coherent chain of reasoning throughout a task. This limitation becomes particularly evident in structured output generation (such as JSON), where a single flaw can derail the entire output for some other agent to handle. Addressing this can be difficult, and it leads us to our next big learning.

A Hybrid Solution

The evolution of our approach at Bito has led us to a hybrid model, merging traditional coding techniques with AI’s dynamic capabilities. Initially, the industry buzz suggested leaning heavily on AI, expecting it to carry the brunt of the workload. However, reality taught us differently. By anchoring our agents in code—allocating 70% to 80% of the task to traditional programming, supplemented by AI for specific reasoning tasks—we’ve achieved a significant leap in both reliability and output quality. This balance has not only mitigated the risks associated with AI’s probabilistic nature but has also enhanced the manageability and predictability of our solutions.
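A toy pipeline can illustrate the shape of this split. This is not Bito’s actual architecture; it simply shows plain code handling the deterministic steps (parsing the diff, filtering trivial lines, formatting output) while the LLM, stubbed here as `ask_llm`, is invoked only for the one step that needs judgment:

```python
# Illustrative hybrid pipeline (not Bito's actual architecture): deterministic
# code does most of the work; the AI is called only for the reasoning step.

def parse_diff(diff: str) -> list[str]:
    """Deterministic: extract added lines from a unified diff."""
    return [
        line[1:]
        for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]

def is_trivial(line: str) -> bool:
    """Deterministic: skip blank lines and comments."""
    stripped = line.strip()
    return not stripped or stripped.startswith("#")

def review(diff: str, ask_llm) -> list[str]:
    comments = []
    for line in parse_diff(diff):        # code: parsing
        if is_trivial(line):             # code: filtering
            continue
        comments.append(f"{line.strip()}: {ask_llm(line)}")  # AI: judgment
    return comments

diff = "+++ b/app.py\n+# refactor\n+password = input()\n"
out = review(diff, lambda line: "consider masking secret input")
```

Because parsing and filtering are ordinary code, they behave identically on every run; only the one judgment call is probabilistic, which is what makes the system as a whole more predictable and easier to debug.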

The Journey Continues

As we chart the future of AI agents, our experiences may offer valuable lessons. The blend of code and AI, the emphasis on consistency, the elimination of hallucinations, and the strategies to overcome reasoning challenges are just a few of our findings as we build agents. But we believe these are critical capabilities that shape the viability and success of AI agents. The road ahead is as promising as it is challenging, inviting us to continually adapt and refine our approaches. By sharing our journey, we hope to contribute to the broader conversation on AI, pushing the boundaries of what’s possible and paving the way for solutions that are not only innovative but truly effective in meeting the needs we have for AI agents. Please let us know what your learnings have been with AI agents; we are excited to hear about your experiences.

Amar Goel

Bito’s Co-founder and CEO. Dedicated to helping developers innovate to lead the future. A serial entrepreneur, Amar previously founded PubMatic, a leading infrastructure provider for the digital advertising industry, in 2006, serving as the company’s first CEO. PubMatic went public in 2020 (NASDAQ: PUBM). He holds a master’s degree in Computer Science and a bachelor’s degree in Economics from Harvard University.

Written by developers for developers

This article was handcrafted by the Bito team.
