Blank-Page Paralysis: Why Coding Agents Stalled on a 710-File Codebase and AI Architect Delivered


Summary

This case study is drawn from the SWE-Bench Pro Evaluation, an independent benchmark conducted by The Context Lab that tests AI coding agents on real-world codebases. It examines a greenfield feature task inside Open Library’s 710+ file Python codebase and shows how the baseline coding agent (Claude Sonnet 4.5) produced zero lines of code, while the same agent, augmented with deep codebase context from Bito’s AI Architect, delivered a complete backend feature of roughly 500 lines of production-quality code, using 42% fewer tool calls.

The challenge

Open Library is a large, mature Python codebase with over 710 files, spanning multiple modules, a well-established set of database conventions, internal API patterns, and a network of cross-module integration hooks. It’s the kind of codebase where even experienced engineers spend hours just getting oriented before touching a single line. 

The task was to build a complete “Best Book Awards” backend from scratch inside this codebase: a classic greenfield feature, but in an unfamiliar system. The full scope included designing a data model, writing a database schema with indexes and constraints, implementing CRUD APIs, and adding “Already Read” validation logic, account anonymization hooks, work redirect integration, and analytics endpoints.
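To make the scope concrete, here is a minimal sketch of the kind of schema such a feature needs. The table, columns, and constraints below are assumptions made for this article, not Open Library’s actual “Best Book Awards” schema; SQLite stands in for the real database.

```python
import sqlite3

# Illustrative sketch only: names and rules below are assumptions for this
# article, not Open Library's actual schema.
SCHEMA = """
CREATE TABLE bestbooks (
    username TEXT NOT NULL,           -- user granting the award
    work_id  TEXT NOT NULL,           -- work receiving the award
    topic    TEXT NOT NULL,           -- award category, e.g. 'sci-fi'
    comment  TEXT,
    created  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (username, work_id),  -- at most one award per user per work
    UNIQUE (username, topic)          -- at most one award per user per topic
);
CREATE INDEX bestbooks_work_idx ON bestbooks (work_id);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute(
    "INSERT INTO bestbooks (username, work_id, topic) VALUES (?, ?, ?)",
    ("alice", "OL123W", "sci-fi"),
)

# The UNIQUE constraint enforces one-award-per-topic at the database level,
# so validation logic in application code has a backstop.
try:
    conn.execute(
        "INSERT INTO bestbooks (username, work_id, topic) VALUES (?, ?, ?)",
        ("alice", "OL456W", "sci-fi"),
    )
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(duplicate_allowed)  # False: the constraint rejected the duplicate award
```

The point isn’t that this SQL is hard to write; it’s that the indexes and constraints have to match conventions already established elsewhere in the codebase.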

The task demanded understanding Open Library’s module organization, database conventions, API patterns, validation workflows, and integration hooks, none of which are obvious from reading individual files. 

Why the baseline agent failed 

Claude Sonnet 4.5, running without deep codebase context, produced zero implementation. The execution log ends at line 325 with only wrapper initialization messages. No files were explored. No code was written. No tests were run. The agent appears to have been paralyzed by the task’s complexity: a “blank page problem” where it couldn’t determine where to start in a 710+ file codebase.

Without architectural guidance, the agent had no way to find reference patterns (ratings.py, bookshelves.py), discover API conventions, or identify database schema patterns. 

The agent also burned roughly 96 tool calls in this state, about 70% more than the 56 Bito’s AI Architect would ultimately use, without producing a single artifact. This is a meaningful signal: without system understanding, tool calls become exploratory thrashing rather than targeted action.

⚠ ROOT CAUSE: The coding agent faced blank-page paralysis in a 710+ file codebase. Without indexed knowledge to identify reference implementations, database conventions, or API patterns, it couldn’t formulate a starting point for a complex greenfield feature.

How Bito’s AI Architect solved it 

Where the baseline agent saw a blank page, the same baseline agent augmented with deep codebase context from Bito’s AI Architect saw a map. 

Before writing a single line of code, Bito’s AI Architect builds a knowledge graph of the entire codebase, a structured index of modules, their relationships, shared utilities, database schema conventions, API handler patterns, and cross-module dependencies. This graph isn’t a flat file listing; it’s a semantic representation of how the system is organized and how its parts interact. When a new task arrives, the agent queries this graph to surface the specific modules, patterns, and reference implementations most relevant to the work at hand. 
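As a toy illustration of the idea (not Bito’s actual implementation), a knowledge graph can be modeled as concept-labeled edges from modules, queried to rank reference implementations for a new task. The modules and concept labels here are taken from the article’s example; everything else is assumed.

```python
# Toy sketch: modules mapped to the concepts they implement. This models
# the *idea* of a codebase knowledge graph, not Bito's internal format.
GRAPH = {
    "ratings.py": {"user-to-work relationship", "database conventions"},
    "bookshelves.py": {"user-to-work relationship", "CRUD API"},
    "schema.sql": {"database conventions"},
    "i18n.py": {"translations"},
}

def reference_modules(task_concepts):
    """Rank modules by how many of the task's concepts they touch."""
    scored = [
        (len(concepts & task_concepts), module)
        for module, concepts in GRAPH.items()
        if concepts & task_concepts
    ]
    return [module for _, module in sorted(scored, reverse=True)]

# A "Best Book Awards" task needs a user-to-work model with CRUD endpoints.
hits = reference_modules({"user-to-work relationship", "CRUD API"})
print(hits)  # ['bookshelves.py', 'ratings.py']
```

A directory listing can’t answer this query; only a semantic index that knows what each module *does* can.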

For this task, the knowledge graph immediately identified the critical reference modules: ratings.py, which demonstrates Open Library’s conventions for user-to-work relationship models; bookshelves.py, for CRUD patterns; and schema.sql, for database conventions. These weren’t obvious files to find; locating them required understanding the system’s conceptual structure, not just its directory tree. A developer new to the codebase might spend hours discovering them manually. Bito’s AI Architect surfaced them in parallel, instantly.

Armed with these patterns, the agent delivered a complete Bestbook class with all CRUD operations, database schema with indexes and constraints, API endpoints following Open Library conventions, workflow integrations, and a comprehensive test suite, roughly 500 lines of production-quality code.
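A hypothetical sketch of what such a class might look like, mirroring the user-to-work pattern the article attributes to ratings.py. The method and column names are illustrative, not Open Library’s actual Bestbook implementation, and SQLite again stands in for the real database.

```python
import sqlite3

# Illustrative only: a Bestbook-style class following a user-to-work
# pattern. Names are assumptions, not Open Library's actual code.
class Bestbook:
    def __init__(self, db):
        self.db = db
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS bestbooks ("
            "username TEXT NOT NULL, work_id TEXT NOT NULL, "
            "topic TEXT NOT NULL, "
            "PRIMARY KEY (username, work_id))"
        )

    def add(self, username, work_id, topic):
        """Create: grant an award."""
        self.db.execute(
            "INSERT INTO bestbooks (username, work_id, topic) VALUES (?, ?, ?)",
            (username, work_id, topic),
        )

    def get_awards_by_user(self, username):
        """Read: list a user's awards."""
        rows = self.db.execute(
            "SELECT work_id, topic FROM bestbooks WHERE username = ?",
            (username,),
        )
        return [{"work_id": w, "topic": t} for w, t in rows]

    def remove(self, username, work_id):
        """Delete: withdraw an award (used by anonymization hooks)."""
        self.db.execute(
            "DELETE FROM bestbooks WHERE username = ? AND work_id = ?",
            (username, work_id),
        )

awards = Bestbook(sqlite3.connect(":memory:"))
awards.add("alice", "OL123W", "sci-fi")
result = awards.get_awards_by_user("alice")
print(result)  # [{'work_id': 'OL123W', 'topic': 'sci-fi'}]
```

With a reference module like ratings.py in hand, each method becomes a matter of adapting an existing pattern rather than inventing one.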

KEY ARCHITECTURAL INSIGHT 

In large codebases, the fastest path to a new feature is finding existing features that solve similar problems. The indexed knowledge in Bito’s AI Architect instantly identified ratings.py and bookshelves.py as reference patterns, a discovery that would otherwise take hours of manual exploration.

Head-to-head comparison

| | Claude Sonnet 4.5 (baseline agent) | Bito’s AI Architect |
| --- | --- | --- |
| Code exploration | Paralyzed: no exploration attempted in the 710+ file codebase | Parallel exploration: Bito + local tools identified reference patterns instantly |
| Reference patterns | None found; couldn’t discover ratings.py or bookshelves.py | Both identified immediately; used as templates for implementation |
| Implementation | 0 files, 0 lines of code | 5+ files, ~500 lines; complete feature with tests |
| Efficiency | ~96 tool calls (unproductive) | ~56 tool calls (targeted; 42% fewer) |
| Task outcome | FAILED: zero output | PASSED: full backend feature delivered |

Conclusion 

The code in this task (a data model, a schema, some CRUD handlers) wasn’t hard to write. What was hard was knowing where to start in a 710-file system. The baseline agent never figured that out. Blank-page paralysis isn’t a failure of capability; it’s a failure of system context. 

Bito’s AI Architect solves this at the source. Its knowledge graph gives the agent a precise map of the codebase before work begins: conventions, reference patterns, integration hooks. It starts with a plan instead of a blank page. 

As codebases grow, the bottleneck in AI-assisted development isn’t model quality. It’s architectural reasoning. Teams that close that gap ship faster, with fewer wrong turns, and code that actually fits the system it lives in. 


Anand Das

Anand is Co-founder and CTO of Bito. He leads technical strategy and engineering, and is our biggest user! Formerly, Anand was CTO of Eyeota, a data company acquired by Dun & Bradstreet. He is also a co-founder of PubMatic, where he led the building of an ad exchange system that handles over 1 trillion bids per day.


Amar Goel

Amar is the Co-founder and CEO of Bito. With a background in software engineering and economics, Amar is a serial entrepreneur and has founded multiple companies including the publicly traded PubMatic and Komli Media.

Written by developers for developers

This article is brought to you by the Bito team.
