AI coding tools break down in large, complex codebases, especially in real production systems.
In a recent interview with Business Insider, Boris Cherny, one of the engineers behind Claude Code, was direct about the current limits of AI coding tools. He said that these tools are still “not great at coding” once you move into real production systems, and that for high-stakes work he still prefers to write code by hand.
As he put it, “You can absolutely vibe code with AI models, but it’s not the thing you want to do all the time.” That statement makes sense because many engineering teams see the same pattern.
AI coding tools feel impressive at first. They help with boilerplate, simple refactors, and greenfield work. They speed up obvious tasks and reduce friction when the scope is small.
The problem shows up when these tools are introduced into large, complex codebases.
Most production systems are not clean or isolated. They span multiple repositories, services, and teams. They carry years of history, evolving conventions, and implicit contracts that are rarely written down. This is where today’s AI coding tools start to break down.
The issue is not that AI cannot write code. The issue is that it does not understand how a system works as a whole. It lacks codebase intelligence.
Scale changes everything for AI coding tools
Large codebases behave differently from small projects. They have shared libraries, versioned APIs, hidden dependencies, and logic that spans services. Changes rarely stay local. A small code change in one place can affect behaviour several layers away.
AI coding tools struggle here because they lack system understanding.
Once a task crosses file or service boundaries, failure modes show up quickly. The AI suggests internal APIs that look right but no longer exist. It follows patterns that used to be correct but have since changed. It generates logic that works in isolation but breaks upstream or downstream behaviour after deployment.
The overall output often looks fine locally, but it is wrong at the system level.
A recent JetBrains study by Olga Bedrina backs this up. More than 600 developers cited issues such as “lack of context” and “limited understanding of complex code” as the top failure points in AI tooling, ahead even of hallucinations.
Without codebase intelligence or system context, an AI coding tool:
- Misuses internal APIs
- Suggests outdated or incorrect patterns
- Misses upstream changes or data shape mismatches
- Fails to reason across service boundaries
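To make the data shape mismatch concrete, here is a minimal sketch. Every name in it (`format_invoice_total`, the payload fields, the billing service) is hypothetical and chosen only for illustration. The consumer passes its own test because the fixture still uses the old shape, yet it fails on what the upstream service now sends.

```python
# Hypothetical payload shapes; none of these names come from a real system.

def format_invoice_total(payload: dict) -> str:
    """Consumer code written against the OLD upstream payload shape."""
    cents = payload["amount_cents"]
    return f"${cents / 100:.2f}"


# Hand-written fixture that still uses the old shape, so the local test passes.
OLD_SHAPE_FIXTURE = {"invoice_id": "inv-1", "amount_cents": 1999}

# What the (hypothetical) upstream billing service emits after its change.
NEW_UPSTREAM_PAYLOAD = {
    "invoice_id": "inv-1",
    "amount": {"value_cents": 1999, "currency": "USD"},
}


def breaks_on(payload: dict) -> bool:
    """Return True if the consumer raises KeyError on the given payload."""
    try:
        format_invoice_total(payload)
    except KeyError:
        return True
    return False


print(format_invoice_total(OLD_SHAPE_FIXTURE))  # locally correct
print(breaks_on(NEW_UPSTREAM_PAYLOAD))          # wrong at the system level
```

Nothing in the consumer's own file reveals the problem. The information that matters lives in another service.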
For teams working across repos, these failures become a daily drag. We wrote about that in more depth here: 📎 Engineering teams with cross-service repos need Bito
You spend time double-checking suggestions, tracing dependencies manually, and undoing changes that looked safe at first but broke something downstream. Over time, the tool stops being trusted for anything that touches real system behaviour.
This is not a model problem. It is a scale problem, and scale changes everything.
In the next section, we will look at why more prompts, bigger context windows, and better search do not fix this, and why the gap only widens as systems grow.
Why more prompts and context do not solve system understanding
When AI coding tools struggle in large codebases, the usual response is to add more context: longer prompts, bigger context windows, more files, or better instructions. These changes help a bit, but they do not fix the underlying problem.
AI tools do not fail in large codebases because they lack enough text. They fail because understanding a system is not a text problem but a structural one.
Context windows and code search flatten systems
A context window, no matter how large, is still a flat slice of code. It does not capture relationships between services, APIs, and dependencies. It cannot explain which components depend on each other, which contracts are stable, or how changes propagate across the system. You can fit more files into the prompt, but you still lose the shape of the system.
Code search has similar limits. It is useful for locating files and symbols, but it cannot explain behaviour. It shows where something lives, not how it flows through the system or what breaks when it changes. In large codebases, understanding behaviour matters more than finding files.
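The difference between "where something lives" and "what breaks when it changes" can be sketched with a small call graph. This is an illustrative sketch only: the service names and the `impacted_by` helper are assumptions, not part of any particular tool. A text search for `auth` would list files; walking the graph in reverse lists every service that depends on it, directly or transitively.

```python
from collections import defaultdict, deque

# Hypothetical call graph: each key calls the services in its list.
CALLS = {
    "checkout": ["payments", "inventory"],
    "payments": ["auth"],
    "inventory": ["auth"],
    "notifications": ["checkout"],
    "auth": [],
}


def impacted_by(calls: dict, changed: str) -> set:
    """Return every service that directly or transitively calls `changed`."""
    # Invert the edges so we can walk from a callee back to its callers.
    callers = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)

    # Breadth-first walk over the reversed edges.
    seen = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for caller in callers[current]:
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen


print(sorted(impacted_by(CALLS, "auth")))
# ['checkout', 'inventory', 'notifications', 'payments']
```

A flat list of files cannot answer this question, however many of them fit in the prompt, because the answer is a property of the edges rather than of the text.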
Retrieval improves recall, not reasoning
Embeddings and retrieval help pull in related code, but relevance in large systems is rarely about textual similarity. The most important context is often structural. Authentication layers, shared middleware, data contracts, and side effects usually sit far from the code being edited.
Without understanding these relationships, AI tools retrieve fragments and guess how they fit together. As systems grow, interactions multiply faster than line count.
Small changes start to have global impact.
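One rough way to see why interactions outpace size is to count potential pairwise interactions between components. This is an upper bound under the simplifying assumption that any two components could interact; real systems are sparser, and the `potential_pairs` helper is introduced here only for illustration.

```python
import math


def potential_pairs(components: int) -> int:
    """Number of distinct component pairs: n * (n - 1) / 2."""
    return math.comb(components, 2)


for n in (10, 100, 1000):
    print(n, potential_pairs(n))
# 10 45
# 100 4950
# 1000 499500
```

Under that assumption, ten times as many components means roughly a hundred times as many pairs to reason about.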
Tools that rely on prompts, context windows, or search operate on pieces of code, not on the system itself.
To work reliably in large codebases, AI needs more than better retrieval.
It needs codebase intelligence.
The missing layer is codebase intelligence
The consistent failures across AI coding tools all point to the same gap: they do not understand systems.
Codebase intelligence means knowing how a codebase behaves as a whole. It is not about reading more files or retrieving more snippets. It is about understanding relationships:
- Which services call each other
- Which APIs form stable contracts
- Where data flows
- What breaks when something changes
In large software systems, this understanding does not live in any single file. It emerges from how components interact over time. Human engineers build this mental model slowly, through debugging, incidents, reviews, and repeated exposure to change.
AI coding tools such as Cursor and Codex do not have access to that model.
Without codebase intelligence, AI can only operate locally. It can generate code that looks correct in isolation but fails once it interacts with the rest of the system. That gap widens as the codebase grows and more teams work in parallel.
If AI is going to work reliably in large codebases, it needs this missing layer.
Codebase intelligence is not documentation or tribal knowledge
Many teams assume this problem can be solved with better documentation or stricter conventions. In practice, that rarely works.
Documentation goes stale. Architecture diagrams lag reality. READMEs explain intent, not behaviour. Tribal knowledge lives in people’s heads and disappears when teams change. None of these keep up with fast-moving systems.
Codebase intelligence is different. It reflects the current state of the codebase. It captures how services, APIs, schemas, and dependencies actually connect today, not how they were designed months ago.
This is why the problem keeps resurfacing. Teams rely on static artifacts to explain dynamic systems. AI tools then consume those same incomplete signals and make confident but incorrect decisions.
To move forward, AI needs access to a living representation of the system. One that stays in sync with the code and encodes how the system really works.
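As a rough illustration of what "stays in sync with the code" can mean, the sketch below derives a dependency map directly from source text using Python's standard `ast` module, so the map is rebuilt from the code itself rather than from a diagram. The module names and sources are hypothetical, and this is a deliberately small stand-in, not a description of how any specific product works. It only looks at absolute imports of top-level modules.

```python
import ast

# Hypothetical module sources, standing in for files read from a repository.
SOURCES = {
    "billing": "import auth\nfrom ledger import post_entry\n",
    "auth": "import hashlib\n",
    "ledger": "from auth import verify\n",
}


def imported_modules(source: str) -> set:
    """Collect top-level module names from absolute imports in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 means an absolute import; relative imports are skipped.
            found.add(node.module.split(".")[0])
    return found


def internal_import_graph(sources: dict) -> dict:
    """Map each module to the other modules in `sources` that it imports."""
    known = set(sources)
    return {name: imported_modules(src) & known for name, src in sources.items()}


graph = internal_import_graph(SOURCES)
print(sorted(graph["billing"]))  # ['auth', 'ledger']
```

Because the map is computed from the current sources, rerunning it after a change reflects the new state without anyone updating a document by hand. Imports are only one signal; service calls, schemas, and API contracts would need their own extraction.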
In the next section, we will look at what changes when AI tools finally operate with codebase intelligence instead of guesswork.
What changes when AI dev tools understand the system
When AI tools operate with codebase intelligence, their behaviour changes in very concrete ways. They stop guessing. They stop treating code as isolated files. They start reasoning about impact.
Instead of suggesting code that only looks correct, the AI understands how changes propagate across services. It knows which APIs are stable contracts and which are internal details. It can see call flows, shared dependencies, and downstream effects before generating or modifying code.
This is what makes grounded code generation possible in large codebases. The output aligns with existing patterns, respects system boundaries, and avoids breaking behaviour elsewhere. Engineers spend less time validating suggestions and more time building.
The same shift applies beyond code generation.
Debugging becomes faster because the AI can trace how failures move through the system. Onboarding improves because new engineers can ask system-level questions instead of piecing things together manually. Reviews get easier because impact is visible, not inferred.
None of this comes from bigger models or longer prompts. It comes from giving AI access to how the system actually works.
Large codebases expose the limits of today’s AI tools
AI coding tools do not fail because they are poorly built. They fail because large software systems are complex, interconnected, and constantly changing.
Tools that only see fragments of code cannot reason about systems at scale.
As AI generates more code, the cost of not understanding system behaviour increases. Local correctness is no longer enough. What matters is system-level correctness.
The next generation of AI developer tools will not be defined by how much code they can produce, but by how well they understand the systems that code runs in.
In 2026, codebase intelligence is not a nice-to-have. It is the missing foundation.
That is the direction tools like AI Architect are built for.