Summary
As part of Bito’s SWE‑bench Pro evaluation, AI coding agents were tested on real production engineering tasks from leading open-source repositories. One such task focused on ProtonMail’s calendar subscription modal, where missing URL input validation created both security and reliability risks.
The baseline coding agent (Claude Sonnet 4.5) identified the gap but responded with unnecessary abstractions that couldn’t be safely shipped. Instead of extending the existing validation logic, it built a parallel system. In contrast, the same baseline agent augmented with deep codebase context from Bito’s AI Architect delivered a minimal, pattern-consistent fix that passed all validation scenarios.
The challenge
ProtonMail’s calendar subscription modal had validation gaps: no URL length limit (DoS risk from extremely long URLs), inconsistent warnings, and unclear warning priority when multiple issues apply. The fix required enforcing a 2000-character limit, establishing clear warning priority, and centralizing a scattered ResizeObserver mock.
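The required behavior can be sketched as follows. This is an illustrative TypeScript fragment, not ProtonMail's actual code: the constant name, warning strings, and check order are all hypothetical; only the 2000-character limit comes from the task.

```typescript
// Hypothetical limit mirroring the 2000-character requirement from the task.
const MAX_CALENDAR_URL_LENGTH = 2000;

// Ordered checks: the first failing check wins, giving a deterministic
// warning priority when several issues apply to the same URL.
const checks: Array<{ warning: string; fails: (url: string) => boolean }> = [
  {
    warning: 'URL exceeds 2000 characters',
    fails: (u) => u.length > MAX_CALENDAR_URL_LENGTH,
  },
  {
    warning: 'URL must use http or https',
    fails: (u) => !/^https?:\/\//.test(u),
  },
];

// Returns the single highest-priority warning for a URL, or null if valid.
function getUrlWarning(url: string): string | null {
  const failed = checks.find((c) => c.fails(url));
  return failed ? failed.warning : null;
}
```

Encoding the priority as an ordered list means the UI only ever shows one warning at a time, which resolves the "unclear warning priority" gap directly.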
The task tested whether agents add complexity or leverage existing patterns to solve the problem.
Why the baseline agent failed
Claude Sonnet 4.5 over-engineered the solution: it created 3 new helper functions, a new enum (URL_WARNING_TYPE), a useMemo hook, an Alert component import, and ~50 lines of new code with ~10 edit operations. It never ran tests to verify the implementation.
The irony: the component already had a getError() function that handled multiple error cases. Adding one more case would have been a single line. Instead, the agent built a parallel warning system alongside the existing error system.
⚠ ROOT CAUSE: The coding agent over-engineered the solution by creating parallel systems (new enum, new functions, new hooks) instead of extending the existing getError() pattern. More code meant more potential bugs, and without test verification, those bugs remained hidden.
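To make the contrast concrete, here is a minimal sketch of what "one more case in the existing validator" looks like. The function shape and error strings are hypothetical stand-ins; the real getError() in the ProtonMail codebase handles its own set of cases.

```typescript
// Hypothetical stand-in for the component's existing validator.
const MAX_URL_LENGTH = 2000; // limit from the task description

function getError(url: string): string | null {
  if (!url) {
    return 'URL is required'; // pre-existing case (illustrative)
  }
  if (url.length > MAX_URL_LENGTH) {
    return 'URL is too long'; // the single added case
  }
  return null; // no error: submission allowed
}
```

Because the component already renders whatever getError() returns and disables submission when it is non-null, the new case inherits the existing display and disable logic for free, with no new enum, hook, or warning component.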
How Bito’s AI Architect solved it
Bito’s AI Architect took a fundamentally different approach, thanks to a codebase knowledge graph that provides system-wide context to the coding agent.
Using its knowledge graph, Bito’s AI Architect mapped the relationships between:
- The Calendar subscription modal component
- The existing validation flow, including the getError() function
- The UI state logic controlling warnings and disabled submission
- The test suite and ResizeObserver mocks
- And other components in the codebase using the same validation patterns
This knowledge graph is not just a file index. It captures how logic flows through the system, which functions act as validation authorities, which modules consume their output, and where changes would have the safest and most consistent effect.
Bito’s AI Architect’s methodology favored extending existing patterns over creating new systems. The treatment agent made 6 focused edits: added one constant (CALENDAR_URL: 2000), one inline boolean (isTooLong), one line to the existing getError() function, one condition to the disabled check, and centralized the ResizeObserver mock. Total: ~15 lines.
The resulting ~15-line change was simpler, safer, and consistent with existing patterns in the codebase.
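The ResizeObserver centralization mentioned above typically looks like the following. This is a generic Jest-style sketch assuming a shared test setup file; the file name and mock shape are assumptions, not ProtonMail's actual test infrastructure.

```typescript
// testSetup.ts -- one shared mock instead of copies scattered across tests.
// jsdom does not implement ResizeObserver, so any test that renders a
// component observing element size needs a stub like this.
class ResizeObserverMock {
  observe(_target: unknown): void {}
  unobserve(_target: unknown): void {}
  disconnect(): void {}
}

// Register globally so every test file picks the mock up automatically.
(globalThis as any).ResizeObserver = ResizeObserverMock;
```

With the mock registered once in the test runner's setup file (for Jest, via the `setupFiles` config option), individual test files no longer need to define their own copies.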
KEY ARCHITECTURAL INSIGHT
The best code change is the smallest one that solves the problem. Bito’s AI Architect’s pattern-reuse philosophy recognized that getError() already handled multiple cases; extending it by one line was simpler than building a parallel warning system.
Head-to-head comparison
| | Claude Sonnet 4.5 (baseline agent) | Bito’s AI Architect |
| --- | --- | --- |
| Code Exploration | Identified validation gap but missed existing getError() pattern | Recognized getError() as the right extension point immediately |
| Approach | Over-engineered: 3 helpers, 1 enum, useMemo, Alert component (~50 lines) | Minimal: extended existing getError() with 1 case (~15 lines) |
| New Abstractions | URL_WARNING_TYPE enum, hasValidExtension(), isGooglePublicLink(), getURLWarning() | Zero: reused existing patterns entirely |
| Verification | None: tests never run | All scenarios verified: submission blocked for invalid URLs |
| Task Outcome | FAILED: unverified, over-complex solution | PASSED: minimal, pattern-consistent fix |
Conclusion
This ProtonMail case highlights why system-wide context is essential for AI-driven development. Without it, agents reinvent existing logic, producing bloated and error-prone code.
The baseline agent ignored ProtonMail’s existing getError() validation and added a separate system, making the solution longer, harder to maintain, and risky to ship.
Bito’s AI Architect maps the full codebase, giving the agent system-level context and full awareness of upstream and downstream dependencies across components. The result is minimal, safe, and verifiable changes.
This matters because most engineering work happens in existing systems, where the goal is not to build new features from scratch, but to fix or improve code that already exists.