We ran a controlled experiment. Same coding agent, same codebase, same task. One variable: Bito’s AI Architect providing context, on versus off. The task was implementing deterministic terms aggregation using the TPUT algorithm inside Elasticsearch, 3.85 million lines of Java across 29,000+ files.
The results were not a matter of degree.
Without AI Architect, the agent built a brute-force workaround instead of solving the actual problem. 6 files changed, severe memory risk on high-cardinality fields, no multi-shard tests.
With AI Architect, the agent implemented genuine multi-phase TPUT the way a senior Elasticsearch engineer would. 27 files changed, proper test coverage across all layers.
## The experiment setup
| Parameter | Detail |
| --- | --- |
| Coding agent | Claude Code with Claude Opus 4.6 |
| Repository | Elasticsearch (github.com/elastic/elasticsearch) |
| Codebase scale | 3.85M lines of Java, 29,000+ files |
| Task | Implement deterministic terms aggregation using the TPUT algorithm |
| Variable | Bito AI Architect providing codebase context, on versus off |
## The task: implementing TPUT in Elasticsearch
Elasticsearch’s terms aggregation, one of its most frequently used features, returns results that are approximate and non-deterministic. Run the same query twice and you can get different answers.
This happens because Elasticsearch shards data across multiple nodes. Each shard independently computes its own local top-K, and the coordinating node merges those local results. A term that ranks just outside the top-K on every shard could be the true global leader, but it never gets reported. This is the classic distributed Top-K problem.
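The failure mode is easy to reproduce in miniature. The sketch below (made-up terms and counts, not Elasticsearch code) simulates three shards that each report only their local top-k: a term ranked just outside the top-k on every shard is the true global leader, yet the naive merge never sees it.

```java
import java.util.*;

// Illustration with made-up counts: "banana" is ranked #3 on every
// shard, so it never appears in any local top-2 list, even though its
// true global count (110) is higher than any other term's.
public class TopKMergeLoss {

    static final List<Map<String, Integer>> SHARDS = List.of(
        Map.of("apple", 50, "cherry", 40, "banana", 35),  // banana is #3
        Map.of("date",  55, "elder",  45, "banana", 38),  // banana is #3
        Map.of("fig",   60, "grape",  42, "banana", 37)); // banana is #3

    // Each shard reports only its local top-k; the coordinator sums
    // whatever it receives. Terms outside every local top-k vanish.
    static Set<String> naiveMerge(List<Map<String, Integer>> shards, int k) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> shard : shards) {
            shard.entrySet().stream()
                 .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                 .limit(k)
                 .forEach(e -> merged.merge(e.getKey(), e.getValue(), Integer::sum));
        }
        return merged.keySet();
    }

    public static void main(String[] args) {
        // banana is absent from the merged candidates, despite being
        // the true global #1.
        System.out.println(naiveMerge(SHARDS, 2));
    }
}
```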
### What TPUT requires
The TPUT algorithm (Three-Phase Uniform Threshold) solves this through multiple coordinated rounds: an initial scatter-gather, a threshold computation step, a refinement round where shards report exact counts for candidate terms, and a gap resolution phase. Implementing this correctly in Elasticsearch requires changes across the full search pipeline.
| Layer | Required change |
| --- | --- |
| `SearchPhaseController` | Orchestrate additional round-trips after the initial reduce |
| `InternalTerms` / `StringTerms` | Track which terms need exact counts in the refinement round |
| `TermsAggregator` | Support a targeted “fetch exact count for these terms” query mode |
| Transport layer | New request and response types for refinement phases |
| Shard-level execution | Respond to targeted term-count lookups without re-running the full aggregation |
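The rounds described above can be sketched as a single-process simulation. This is illustrative only: names, data, and structure are ours, not the PR's, and the real implementation runs each phase over the transport layer between coordinator and shards.

```java
import java.util.*;
import java.util.stream.*;

// Simplified TPUT sketch over in-memory "shards" (same toy data as
// the earlier example, where naive merging misses "banana").
public class TputSketch {

    static final List<Map<String, Long>> SHARDS = List.of(
        Map.of("apple", 50L, "cherry", 40L, "banana", 35L),
        Map.of("date",  55L, "elder",  45L, "banana", 38L),
        Map.of("fig",   60L, "grape",  42L, "banana", 37L));

    static List<Map.Entry<String, Long>> tputTopK(List<Map<String, Long>> shards, int k) {
        int n = shards.size();

        // Phase 1: each shard reports its local top-k; the coordinator
        // sums partial counts and takes the k-th highest sum as tau1.
        Map<String, Long> partial = new HashMap<>();
        for (Map<String, Long> shard : shards) {
            shard.entrySet().stream()
                 .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                 .limit(k)
                 .forEach(e -> partial.merge(e.getKey(), e.getValue(), Long::sum));
        }
        long tau1 = kthHighest(partial.values(), k);

        // Phase 2: broadcast threshold T = tau1 / n; any term with true
        // total >= tau1 must exceed T on at least one shard, so shards
        // return every term with a local count >= T. Prune terms whose
        // upper bound (known sum + T per non-reporting shard) misses tau2.
        long t = tau1 / n;
        Map<String, Long> refined = new HashMap<>();
        Map<String, Integer> reporting = new HashMap<>();
        for (Map<String, Long> shard : shards) {
            shard.forEach((term, count) -> {
                if (count >= t) {
                    refined.merge(term, count, Long::sum);
                    reporting.merge(term, 1, Integer::sum);
                }
            });
        }
        long tau2 = kthHighest(refined.values(), k);
        Set<String> candidates = refined.entrySet().stream()
            .filter(e -> e.getValue() + t * (n - reporting.get(e.getKey())) >= tau2)
            .map(Map.Entry::getKey)
            .collect(Collectors.toSet());

        // Phase 3: fetch exact counts for surviving candidates only,
        // then take the final, deterministic top-k.
        Map<String, Long> exact = new HashMap<>();
        for (String term : candidates) {
            long sum = 0;
            for (Map<String, Long> shard : shards) sum += shard.getOrDefault(term, 0L);
            exact.put(term, sum);
        }
        return exact.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
            .limit(k)
            .collect(Collectors.toList());
    }

    static long kthHighest(Collection<Long> values, int k) {
        return values.stream().sorted(Comparator.reverseOrder())
                     .skip(Math.min(k - 1, Math.max(values.size() - 1, 0)))
                     .findFirst().orElse(0L);
    }

    public static void main(String[] args) {
        // Unlike the naive merge, TPUT surfaces banana with its exact count.
        System.out.println(tputTopK(SHARDS, 2));
    }
}
```

Note how Phase 2 is what keeps memory bounded: only terms above the threshold cross the wire, and only surviving candidates get exact lookups in Phase 3.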
For a developer unfamiliar with this codebase, understanding the existing scatter-gather flow alone could take days. Implementing the change correctly across all layers, without breaking existing functionality, would take significantly longer.
## What happened without AI Architect
1. Wrong assumption about the framework: Claude Code concluded that Elasticsearch’s aggregation framework does not natively support multi-round shard communication. It designed a workaround rather than solving the actual problem.
2. Brute-force workaround instead of TPUT: It created a new aggregation type called `deterministic_terms` that forces `shard_size` to `Integer.MAX_VALUE`, causing every shard to return all unique terms in a single pass. The agent’s own documentation acknowledges this is equivalent to running Phase 1 with `shard_size = infinity`, eliminating the need for Phase 2 entirely.
3. Severe memory risk with zero coordination logic: The approach buys correctness at the cost of severe memory and network overhead, and it sidesteps the actual engineering challenge rather than solving it.
Result: 6 files changed. Severe memory risk on high-cardinality fields. All 8 unit tests single-shard only.
What the agent missed: Pipeline extension points, multi-round shard coordination, threshold computation, and gap resolution.
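To make the API difference concrete: based on the description above, a request against the workaround's aggregation would use a separate type rather than an option on the existing `terms` aggregation. The field name and size below are hypothetical; the real code is in the linked PR.

```json
{
  "aggs": {
    "all_terms": {
      "deterministic_terms": {
        "field": "product_id",
        "size": 10
      }
    }
  }
}
```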
## What happened with AI Architect
1. Planned before writing a single line of code:
With Bito’s AI Architect providing deep codebase context through its knowledge graph, Claude Code generated a codebase context summary, an architecture design, and a file-level implementation plan before touching any production code.
2. Understood exactly where to extend the pipeline:
The agent identified that a new phase could be inserted between query-reduce and fetch, that `FetchSearchPhase.innerRun()` was the correct integration point, and that the transport layer already had established patterns for new shard-to-coordinator actions.
3. Built genuine multi-phase TPUT:
It executed the implementation across 12 incremental tasks, compiling after each major change. The result extended the existing terms aggregation with a new `"mode": "exact"` parameter, added a full `AggregationRefinementPhase` to the search pipeline, implemented transport actions for the refinement round, and included Phase 3 gap resolution for complete TPUT correctness.
Result: 27 files changed. 5 test files covering coordinator, transport, service, and phase layers, with proper multi-shard coordination coverage.
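For contrast with the workaround's parallel aggregation type, a request using the extended aggregation would look roughly like this, based on the `"mode": "exact"` parameter described above (field name and size are illustrative):

```json
{
  "aggs": {
    "top_terms": {
      "terms": {
        "field": "product_id",
        "size": 10,
        "mode": "exact"
      }
    }
  }
}
```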
## Side-by-side comparison
| Dimension | Without AI Architect | With AI Architect |
| --- | --- | --- |
| Approach | Created a new, separate `deterministic_terms` aggregation type | Extended the existing terms aggregation with `"mode": "exact"` |
| TPUT algorithm | Brute-force `shard_size = MAX_INT` in a single pass | Genuine multi-phase TPUT with threshold computation and gap resolution |
| Pipeline integration | Minimal; no pipeline changes | Deep; new search phase, transport actions, shard-side execution |
| Files changed | 6 files (4 new, 2 modified) | 27 files (12 new, 15 modified) |
| Lines of code | 1,058 lines added | 1,938 lines added |
| API design | Separate aggregation type, parallel to the existing terms API | Parameter on the existing aggregation, consistent with Elasticsearch conventions |
| Memory safety | No safeguards; `shard_size = MAX_INT` by default | Threshold-based; only candidate terms cross the wire in Phase 2 |
| Test coverage | 8 unit tests, all single-shard | 5 test files covering coordinator, transport, service, and phase layers |
| Multi-shard tests | None | Mock-based phase tests covering shard coordination |
| Planning artifacts | None; jumped straight to coding | Context doc, architecture design, and implementation plan generated first |
## What made the difference
Three specific behaviors changed when the agent had AI Architect’s codebase context.
### 1. The agent knew the pipeline was extensible
With AI Architect’s knowledge graph, the agent understood that a new search phase could be inserted between query-reduce and fetch, that `FetchSearchPhase.innerRun()` was the correct integration point, and that the transport layer already had established patterns for new shard-to-coordinator actions. It chose to extend the pipeline because it knew it could.
### 2. The agent followed codebase conventions
Because AI Architect exposed the codebase’s conventions and extension patterns, the agent extended the existing terms aggregation with a mode parameter rather than creating a parallel aggregation type. This is the same pattern a senior Elasticsearch contributor would follow, producing no code duplication, a backward-compatible API surface, and a lower maintenance burden.
### 3. The agent planned before coding
With access to deep codebase understanding from AI Architect, the agent generated a codebase context document, architecture design, and step-by-step implementation plan before writing any production code. It then executed 12 incremental tasks with compilation checks after each, formulating a real plan rather than probing through trial and error.
## Conclusion
AI Architect gave Claude Code the architectural understanding to extend Elasticsearch’s pipeline correctly, follow codebase conventions, and plan before writing a single line of code. Without it, the same agent took the only path available to it: a brute-force workaround that technically works but fundamentally fails to scale.
See the code from both runs at:
- With AI Architect — 27 files, genuine multi-phase TPUT: github.com/bito-vansh/elasticsearch/pull/1
- Without AI Architect — 6 files, brute-force workaround: github.com/bito-vansh/elasticsearch/pull/2
If your team is running Claude Code, Cursor, or any other coding agent on a large codebase, the architectural decisions your agent makes today depend entirely on what it understands about your system.
Your codebase deserves that level of architectural understanding. Connect AI Architect to your coding agent and see the difference: