Summary
This case study is part of the SWE-Bench Pro Evaluation, an independent benchmark conducted by The Context Lab that tests AI coding agents on real-world production codebases. The evaluation measures where deep, repository-level context changes outcomes, and this task is one of its clearest examples.
Restructuring 90 TypeScript files and updating 216 cross-package imports across a multi-application monorepo is not a task where a baseline coding agent (Claude Sonnet 4.5) can afford to guess. Without complete dependency mapping, any partial attempt breaks the build across every application simultaneously. Bito’s AI Architect completed it end-to-end: safely, systematically, and with zero broken references.
The challenge
ProtonMail’s calendar module had its utility functions, recurrence rules, alarms, encryption, and mail integrations scattered across generic directories. The result was mixed responsibilities, unclear naming (the ICS directory was named icsSurgery/ rather than the straightforward ics/), maintenance friction, and a harder onboarding path for new developers.
The refactoring scope was massive: 90 TypeScript files requiring reorganization, 13,638 files scanned for import updates, and 216 import statements needing modification across 126 files in a monorepo with multiple applications (calendar, mail, account) and shared packages.
Moving a file without first knowing every location that imports it means introducing broken references across applications that may not share a common build boundary. The risk is not localized; it is systemic.
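The failure mode described above can be illustrated with a minimal sketch. It assumes a deliberately simplified model in which each file is reduced to the list of module paths it imports; the file names, module paths, and the `find_broken_imports` helper are all hypothetical and are not taken from the ProtonMail repository or from the evaluation itself.

```python
# Simplified, hypothetical model: each file maps to the module paths it imports.
def find_broken_imports(imports_by_file, existing_modules):
    """Return {importing_file: [missing targets]} for imports that no longer resolve."""
    broken = {}
    for source_file, targets in imports_by_file.items():
        missing = [t for t in targets if t not in existing_modules]
        if missing:
            broken[source_file] = missing
    return broken


# Illustrative import index spanning three applications (paths are made up).
imports_by_file = {
    "apps/calendar/src/event.ts": ["shared/calendar/icsSurgery/valarm"],
    "apps/mail/src/invite.ts": ["shared/calendar/icsSurgery/valarm"],
    "apps/account/src/settings.ts": ["shared/calendar/alarms"],
}

# Naive move: the module now lives under ics/, but no importer was updated.
modules_after_move = {"shared/calendar/ics/valarm", "shared/calendar/alarms"}

broken = find_broken_imports(imports_by_file, modules_after_move)
for source_file, missing in sorted(broken.items()):
    print(source_file, "->", missing)
```

In this toy example a single moved module leaves dangling imports in two separate applications, which is the systemic breakage the paragraph warns about.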
Why no baseline agent was run
No baseline agent was run for this task. The 90-file refactoring was judged too large and too risky for a standalone coding agent to attempt safely: an incomplete refactoring would break the entire monorepo build, affecting multiple applications simultaneously.
Without repository-level context to map all import dependencies across packages, any agent would risk creating orphaned code paths and broken builds across the calendar, mail, and account applications.
⚠ ROOT CAUSE: Large-scale refactoring across monorepo boundaries requires complete dependency mapping across multiple applications and packages. Without this holistic view, partial refactoring breaks more than it fixes.
How Bito’s AI Architect solved it
Where a baseline agent operates with a local, incremental view of the codebase, Bito’s AI Architect begins with a holistic one. Before any file is moved or any import is rewritten, AI Architect constructs a knowledge graph of the repository: a structured, queryable map of the codebase that encodes modules, their exported symbols, their dependency relationships, and the cross-package import chains that connect them.
This knowledge graph is not a simple directory listing or a grep-based search. It represents the semantic structure of the codebase: which modules exist, what they expose, and exactly which other modules depend on them, across every application and package boundary in the monorepo. When Bito’s AI Architect needs to understand the impact of moving a set of files, it queries this graph rather than exploring files reactively. The answer is complete and immediate, rather than partial and emergent.
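To make the idea of querying a graph for impact concrete, here is a minimal sketch of a reverse dependency index. It is an assumption-laden illustration, not Bito's implementation: it models only import edges (no exported symbols or semantic information), and the paths, the two-segment package convention, and the function names are hypothetical.

```python
from collections import defaultdict


def build_reverse_graph(imports_by_file):
    """Invert file -> imported modules into module -> set of importing files."""
    importers = defaultdict(set)
    for source_file, targets in imports_by_file.items():
        for target in targets:
            importers[target].add(source_file)
    return importers


def impact_of_move(importers, moved_modules):
    """Group every file affected by the move under its top-level package."""
    affected = defaultdict(set)
    for module in moved_modules:
        for source_file in importers.get(module, ()):
            # Assumption: the first two path segments name the package, e.g. "apps/mail".
            package = "/".join(source_file.split("/")[:2])
            affected[package].add(source_file)
    return {pkg: sorted(files) for pkg, files in affected.items()}


# Illustrative import index (paths are made up).
imports_by_file = {
    "apps/calendar/src/event.ts": [
        "shared/calendar/icsSurgery/valarm",
        "shared/calendar/recurrence",
    ],
    "apps/mail/src/invite.ts": ["shared/calendar/icsSurgery/valarm"],
    "apps/account/src/settings.ts": ["shared/calendar/alarms"],
}

importers = build_reverse_graph(imports_by_file)
impact = impact_of_move(importers, ["shared/calendar/icsSurgery/valarm"])
for package, files in sorted(impact.items()):
    print(package, files)
```

Because the index is built once, the question "which packages break if this module moves?" becomes a lookup rather than a search, which is the property the text attributes to the knowledge graph.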
For the ProtonMail calendar restructuring, Bito’s AI Architect provided a holistic repository view that made this refactoring possible. By mapping dependencies across all applications and packages, the agent created a complete migration plan: domain-driven directory design (13 directories), automated migration scripts, and systematic import updates.
The agent created Python scripts for bulk import updates and bash scripts for file reorganization, processing 13,638 files systematically. The result: 90 files reorganized, 216 imports updated, zero broken references, and comprehensive documentation for future maintainers.
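The agent's actual scripts are not reproduced in this case study. As a rough sketch of what a bulk import update might look like, the following rewrites module specifiers in TypeScript source text using a rename map. The `@pkg/shared` alias, the `PATH_RENAMES` map, and `rewrite_imports` are hypothetical, and the approach is regex-based rather than parser-based, so it would also match import-like text inside comments and does not handle `require()` or dynamic `import()` calls.

```python
import re

# Hypothetical rename map: old import-path segment -> new segment.
PATH_RENAMES = {"calendar/icsSurgery/": "calendar/ics/"}

# Matches the quoted module specifier in `from '...'` and bare `import '...'` statements.
IMPORT_SPECIFIER = re.compile(r"""(\bfrom\s+|\bimport\s+)(['"])([^'"]+)\2""")


def rewrite_imports(source, renames=PATH_RENAMES):
    """Return (new_source, number_of_specifiers_changed)."""
    changed = 0

    def replace(match):
        nonlocal changed
        prefix, quote, specifier = match.groups()
        new_specifier = specifier
        for old, new in renames.items():
            new_specifier = new_specifier.replace(old, new)
        if new_specifier != specifier:
            changed += 1
        return f"{prefix}{quote}{new_specifier}{quote}"

    return IMPORT_SPECIFIER.sub(replace, source), changed


# Illustrative input; the module paths are made up.
example = (
    "import { getValarm } from '@pkg/shared/lib/calendar/icsSurgery/valarm';\n"
    "import { helper } from './local';\n"
)
updated, count = rewrite_imports(example)
print(updated)
print("specifiers changed:", count)
```

Returning the change count alongside the rewritten text allows a dry run over the whole repository first, so the number of affected imports and files can be checked against the dependency map before anything is written to disk.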
KEY ARCHITECTURAL INSIGHT
Monorepo refactoring at scale requires repository-level dependency mapping that individual file exploration cannot provide. The holistic view built by Bito’s AI Architect identified all 5 packages importing calendar modules, enabling complete coverage that prevented broken builds.
Head-to-head comparison
| | Claude Sonnet 4.5 (baseline agent) | Bito’s AI Architect |
| Code Exploration | Not attempted — scope too risky without dependency mapping | Complete repo analysis — Bito mapped all 5 importing packages |
| Scope Awareness | Unable to determine full impact of file moves | 13,638 files scanned, 126 requiring changes identified upfront |
| Methodology | N/A — too risky to attempt | 10-step systematic plan with automated migration scripts |
| Files Reorganized | 0 | 90 files into 13 domain-specific directories |
| Task Outcome | Not attempted | PASSED — 216 imports updated, zero broken references |
Conclusion
Some tasks are impossible without architectural context: not difficult, but truly impossible.
A 90-file monorepo refactoring with 216 cross-package import dependencies requires holistic repository mapping that no amount of file-by-file exploration can provide. Without a complete dependency map built before the first file moves, the refactoring fails in ways that are hard to diagnose and time-consuming to reverse. Bito’s AI Architect makes the impossible tractable.
For engineering teams, the implication is direct: as codebases grow, the bottleneck is not code generation; it is architectural reasoning. Tasks like refactoring, cross-package migrations, and dependency cleanup are precisely where system-level context determines whether an AI agent is an asset or a liability. Bito’s AI Architect was built for that gap.