Bito’s AI Architect cuts Claude Code’s token cost by 47% on SWE-Bench Pro. It gives the coding agent codebase context: a continuously updated, structured map of every repository, served over MCP.
That matters because exploration is what makes coding agents expensive. Without codebase context, the agent greps, lists, and follows imports for dozens of steps before writing a single line. With it, the agent goes straight to the files that matter.
Across substantial multi-file tasks, token cost drops 47% in aggregate and by as much as 68% on individual tasks, with 60% fewer reasoning steps and 49% fewer tool calls. The same SWE-Bench Pro evaluation showed AI Architect lifting task success from 51.9% to 70.1% with Claude Opus 4.6.
Why AI coding agents are expensive
When an agent picks up a task in an unfamiliar codebase, it has to build a mental model first. Without a map, the only way to do that is brute force: listing directories, grepping for symbols, opening files, and following imports.
Each exploration step does two costly things.
- It triggers another round trip to the model, which generates more reasoning and more output tokens.
- It dumps more content into the conversation. The agent re-reads the entire conversation on every later step, so a 50 KB file opened on step 8 is still being re-processed on step 80.
The second effect compounds. The cost of a long agent run grows faster than linearly with its length, because the context the model re-reads keeps getting bigger.
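The compounding can be sketched with a toy model. It assumes every step appends a fixed number of tokens to the transcript and that the model processes the whole transcript again on each step. The function name and the 2,000-token step size are illustrative assumptions, not measurements from the evaluation.

```python
def total_tokens_reread(steps: int, tokens_added_per_step: int) -> int:
    """Sum the context processed over a run, assuming each step
    re-reads everything accumulated so far (a simplifying assumption)."""
    context = 0
    total = 0
    for _ in range(steps):
        context += tokens_added_per_step  # new content lands in the transcript
        total += context                  # the whole transcript is processed again
    return total


short_run = total_tokens_reread(30, 2_000)
long_run = total_tokens_reread(75, 2_000)
print(short_run, long_run, round(long_run / short_run, 2))
```

Under these assumptions, a run that is 2.5 times longer processes roughly 6 times as many tokens, which is the faster-than-linear growth described above.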
On the harder tasks there is a third trap: the search spiral.
The agent grinds through 40 to 90 steps of dead-end searches, re-reads the same large files multiple times, and starts producing busywork like summary docs and verification scripts. This is pure token burn that never lands the fix.
These are exactly the tasks AI Architect rescues.
How Bito’s AI Architect works
AI Architect continuously indexes every repository and exposes that index to any MCP-compatible coding agent. When the agent starts a task, it consults the index and receives a compact, structured briefing on the repository that covers:
- The architecture and major frameworks in use
- The component and module breakdown, and where each one lives
- The file and directory layout
- Dependency relationships between modules
During the run, the agent calls back with targeted queries: references to a symbol across the codebase, the exact code at a specific location, or the conventions a new file must follow.
Armed with that, the agent skips the discovery phase. It knows where to look, opens the few files that actually matter, and starts working. The search spiral never starts, and the context never balloons.
The compact map costs a small fixed amount of context up front, a fraction of the repeated ad hoc exploration it replaces.
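For readers unfamiliar with MCP, the sketch below builds the kind of JSON-RPC 2.0 `tools/call` request an agent sends to an MCP server for a targeted query. The `tools/call` method and the `name` and `arguments` parameters follow the MCP specification. The tool name `find_symbol_references` and its arguments are hypothetical placeholders, not AI Architect's actual tool names.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(
    1,
    "find_symbol_references",
    {"symbol": "ExampleStruct", "repository": "example-repo"},
)
print(request)
```

In practice an MCP client library builds and sends these messages for the agent; the point here is only that a targeted query is one small request rather than a chain of greps and file reads.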
The evaluation
We ran the test the obvious way: the same coding agent, on the same engineering tasks drawn from real open-source projects, with and without AI Architect.
| Setup | Detail |
| Tasks | Real engineering tasks: features, bug fixes, and refactors. This evaluation focuses on substantial multi-file changes in large codebases, the kind that dominate day-to-day engineering work. |
| Codebases | Production open-source projects: Flipt, Teleport, and Tutanota web clients |
| Languages | Go, TypeScript, JavaScript |
| Agent | Anthropic Claude (Claude Code), identical version and settings in both arms |
| Variable | Whether AI Architect’s index was available over MCP |
| Measurements | Token usage, tool call counts, reasoning steps, from the agents’ own run logs |
Same model, same tasks, same harness.
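One way such measurements could be aggregated from run logs is sketched below. It assumes each log record is one agent step carrying a `usage` object with the field names the Anthropic Messages API reports (`cache_read_input_tokens`, `cache_creation_input_tokens`, `input_tokens`, `output_tokens`) plus a `tool_calls` list. The record layout and the sample values are assumptions for illustration, not a description of the harness used here.

```python
from collections import Counter

USAGE_FIELDS = (
    "cache_read_input_tokens",      # context re-read from cache
    "cache_creation_input_tokens",  # new context written to cache
    "input_tokens",                 # uncached input
    "output_tokens",                # tokens generated by the model
)


def summarize_run(records: list[dict]) -> dict:
    """Total token usage, tool calls, and steps for one run (assumed log layout)."""
    totals = Counter()
    for record in records:
        usage = record.get("usage", {})
        for field in USAGE_FIELDS:
            totals[field] += usage.get(field, 0)
        totals["tool_calls"] += len(record.get("tool_calls", []))
    totals["steps"] = len(records)
    return dict(totals)


# Made-up sample records, not data from the evaluation.
sample_run = [
    {"usage": {"cache_read_input_tokens": 1000, "output_tokens": 50}, "tool_calls": ["Read"]},
    {"usage": {"cache_read_input_tokens": 1500, "cache_creation_input_tokens": 400,
               "output_tokens": 80}, "tool_calls": ["Grep", "Read"]},
]
print(summarize_run(sample_run))
```

Running the same summary over both arms gives per-task totals that can be compared directly.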
Results: where the cost actually drops
AI Architect cuts coding agent token cost per task by roughly 47% in aggregate, with peaks of 68% on individual tasks. The agent runs faster, calls fewer tools, and makes fewer redundant edits, while shipping the same fix.
Token cost efficiency
Token usage drops across every category that grows with exploration.
| Token category | Without AI Architect | With AI Architect | Change |
| Context re-read across the run | 58.2M | 30.2M | 48% lower |
| New context written into the run | 1.78M | 1.03M | 42% lower |
| Tokens generated by the agent | 0.18M | 0.09M | 48% lower |
| Content pulled in by file or search exploration | 3.45M chars | 1.70M chars | 51% lower |
Decomposing where the saved cost comes from, two-thirds of the win (66%) is the compounding effect of a shorter run carrying a leaner transcript that the model re-processes far fewer times. The rest splits between less new content written into context (22%) and fewer output tokens generated (11%).
The map the agent consults up front is a small fixed cost. The exploration it replaces, dozens of file dumps and search outputs that ride along in context and get re-processed on every later step, is a far larger and growing one.
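The decomposition can be reproduced approximately from the token table above once each category is weighted by a price. The per-million-token rates below are assumptions chosen for illustration (cheap re-read context, pricier new context, most expensive output); they are not stated in this evaluation, and different rates would shift the split.

```python
# Token totals in millions, taken from the token category table above.
WITHOUT = {"reread": 58.2, "written": 1.78, "generated": 0.18}
WITH = {"reread": 30.2, "written": 1.03, "generated": 0.09}

# ASSUMED prices in dollars per million tokens; illustrative only.
ASSUMED_PRICE = {"reread": 0.50, "written": 6.25, "generated": 25.00}


def cost(tokens: dict) -> float:
    """Price-weighted cost of one arm under the assumed rates."""
    return sum(tokens[k] * ASSUMED_PRICE[k] for k in tokens)


def savings_shares(before: dict, after: dict) -> dict:
    """Fraction of the total saving contributed by each token category."""
    saved = {k: (before[k] - after[k]) * ASSUMED_PRICE[k] for k in before}
    total = sum(saved.values())
    return {k: v / total for k, v in saved.items()}


reduction = 1 - cost(WITH) / cost(WITHOUT)
shares = savings_shares(WITHOUT, WITH)
print(f"overall reduction: {reduction:.1%}")
for category, share in shares.items():
    print(f"{category}: {share:.1%} of the saving")
```

With these assumed rates the overall reduction comes out near 47%, and the three shares land near 67%, 22%, and 11%, close to the figures reported here. The re-read category dominates even at the lowest unit price because its volume is so much larger.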
Reasoning efficiency
Fewer round trips through the model mean fewer reasoning steps to think through. On these tasks the agent went from an average of roughly 75 reasoning steps to roughly 30, a 60% reduction. Runs that would sprawl to 60, 100, or even 150 steps without a map complete in 15 to 35 steps with one.
Generated token volume falls in step (48% lower). Fewer reasoning turns, fewer tokens.
Tool call efficiency
The agent takes 49% fewer actions per task. The breakdown of which actions disappear tells the whole story: they are overwhelmingly navigation, plus the busywork a stuck agent generates.
| Agent action | Without AI Architect | With AI Architect | Change |
| File reads | 28.3 per task | 10.7 per task | 62% fewer |
| Shell commands (grep, find, git) | 36.2 per task | 17.5 per task | 52% fewer |
| Code search (Glob) | 4.0 per task | 2.4 per task | 40% fewer |
| Text search (Grep) | 8.9 per task | 4.7 per task | 47% fewer |
| Code edits and file writes | 13.6 per task | 7.1 per task | 48% fewer |
The fix the agent ships is unchanged. The extra writes that vanish are baseline overhead: redundant re-edits and the throwaway artifacts a lost agent generates while casting around for a path.
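As a quick consistency check, the Change column of the action table above can be recomputed from the per-task averages. The helper name is an illustrative choice; the numbers are copied from the table.

```python
# (without AI Architect, with AI Architect), average actions per task
ACTIONS = {
    "File reads": (28.3, 10.7),
    "Shell commands (grep, find, git)": (36.2, 17.5),
    "Code search (Glob)": (4.0, 2.4),
    "Text search (Grep)": (8.9, 4.7),
    "Code edits and file writes": (13.6, 7.1),
}


def percent_fewer(before: float, after: float) -> int:
    """Relative reduction from before to after, rounded to a whole percent."""
    return round(100 * (before - after) / before)


for name, (before, after) in ACTIONS.items():
    print(f"{name}: {percent_fewer(before, after)}% fewer")
```

The rounded results match the Change column row by row.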
Standout example: a Flipt task at 68% lower cost

Task: add audit configuration reporting to a service’s anonymous telemetry, a real change in the Flipt codebase. Token cost dropped 68% with AI Architect.
Without AI Architect: The agent spawned exploration sub-agents and started globbing and grepping. Across the next 79 steps, it ran roughly 25 file reads and roughly 40 grep, find, and git commands. It re-read the same files multiple times.
It hunted for a struct that the task itself was meant to create. The first code edit landed on step 80, followed by a handful of edits and test runs.
With AI Architect: The agent consulted the repository map and ran 3 targeted searches. It read the 3 files that actually mattered. The first code edit landed on step 14, followed by the same edits and the same test runs.
Same agent. Same fix. Same tests passing. The only difference: one configuration had a map, and the other had to draw it from scratch and burned 60-plus steps doing it.
What this means for engineering teams
Coding agents running at any scale, in your IDE, in CI, in an agent platform, or as a product, pay the discovery overhead on every task. On the substantial tasks, it is the dominant cost.
AI Architect removes most of it.
- Roughly 47% lower spend at any scale. The bigger the deployment, the larger the absolute saving.
- Far leaner runs. 60% fewer reasoning steps. The long, navigation-heavy runs that blow up the bill get cut hardest.
- Same output. The agent writes the same code. Only the wasted exploration and busywork disappear.
- Trivial to adopt. AI Architect runs as an MCP server. Point your agent at it. No model changes, no prompt surgery, no workflow changes.
- Compounds with codebase size. The bigger and more complex the repository, the more navigation overhead there is to eliminate.
On SWE-Bench Pro, AI Architect resolves more tasks at roughly 47% lower token cost per task. More work shipped, less spent doing it.