AI Architect Uncovered Domain-Specific Error Handling That a Standalone Coding Agent Couldn’t

Summary 

In a Teleport P0 diagnostic task from the SWE-Bench Pro evaluation, a baseline coding agent (Claude Sonnet 4.5) failed to correctly classify SQL Server connection errors because it relied on simple string matching. 

Bito’s AI Architect succeeded by surfacing SQL Server’s typed error codes from the driver layer, implementing code-based classification, and delivering a production-grade, locale-independent solution, 18% faster. 

Modern AI coding benchmarks often test whether an agent can reproduce patterns. This task tested something deeper: whether an AI system understands how production infrastructure software actually works. 

The challenge 

Gravitational Teleport is an infrastructure access platform used by enterprises to secure access to databases, Kubernetes clusters, and SSH servers. When a user cannot connect to a SQL Server database, they need precise, actionable diagnostics, not generic “connection failed” messages. 

The task required implementing SQL Server connection testing in Teleport’s Discovery diagnostic flow: a SQLServerPinger that correctly categorizes connection failures into three distinct buckets (connection refused, authentication failure, and invalid database name), each with targeted remediation guidance. 
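To make the three buckets concrete, here is a minimal Go sketch of how they might be modeled. The type name FailureKind, the constant names, and the remediation strings are all hypothetical and invented for illustration; they are not taken from Teleport’s codebase.

```go
package main

import "fmt"

// FailureKind is a hypothetical name for the three diagnostic buckets.
type FailureKind int

const (
	ConnectionRefused FailureKind = iota
	AuthenticationFailure
	InvalidDatabaseName
)

// String returns a human-readable label for the bucket.
func (k FailureKind) String() string {
	switch k {
	case ConnectionRefused:
		return "connection refused"
	case AuthenticationFailure:
		return "authentication failure"
	case InvalidDatabaseName:
		return "invalid database name"
	default:
		return "unknown"
	}
}

// remediation maps each bucket to guidance text.
// The wording is illustrative only, not Teleport's actual messages.
var remediation = map[FailureKind]string{
	ConnectionRefused:     "Check that the SQL Server host and port are reachable.",
	AuthenticationFailure: "Check that the database user exists and is allowed to log in.",
	InvalidDatabaseName:   "Check that the database name exists on the server.",
}

func main() {
	for _, k := range []FailureKind{ConnectionRefused, AuthenticationFailure, InvalidDatabaseName} {
		fmt.Printf("%s: %s\n", k, remediation[k])
	}
}
```

Keeping the bucket and its guidance in one place means the classifier only has to answer one question: which bucket does this error belong to?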

This isn’t just a coding challenge. It demands deep knowledge of SQL Server’s error reporting system, the go-mssqldb driver’s internal type hierarchy, and Teleport’s established patterns for database connectivity diagnostics. 

Why the baseline agent failed 

Claude Sonnet 4.5 followed a logical approach: it studied the existing PostgreSQL and MySQL pinger implementations, then replicated the pattern for SQL Server. But it made a critical architectural mistake: it relied entirely on string pattern matching against error messages. 

SQL Server error messages vary by driver version, localization, and wrapping context. A message that reads “login failed” in English might read differently on a German-localized server. Without knowledge of SQL Server’s structured error codes (18456 for auth failure, 4060 for invalid database), the coding agent produced an implementation that would pass basic unit tests but fail against real-world SQL Server instances. 

⚠ ROOT CAUSE: The coding agent lacked domain-specific knowledge of SQL Server’s typed error system (mssql.Error with numeric .Number codes). It could only pattern-match on error strings — an approach that breaks with message variations, localization, and driver wrapping.

How Bito AI Architect solved it 

Bito’s AI Architect didn’t start by copying patterns. It first built a contextual understanding of the problem using its knowledge graph, which semantically links code definitions, driver APIs, error taxonomies, and test requirements. 

This allowed the system to recognize that SQL Server errors are richly typed objects with numeric codes, and that these codes should be the primary means of classification. With that insight, it generated a two-layer error handling strategy, using numeric error codes (such as 18456, 18452, 4060, 911) as the authoritative classifier and string matching only as a fallback. 

The result was a production-grade implementation with 13 test cases across 2 test files, handling typed errors, wrapped errors, localized messages, and edge cases that the coding agent’s approach would have missed entirely. 

KEY ARCHITECTURAL INSIGHT 

SQL Server errors are structured objects with definitive numeric codes, not just text strings. Bito’s AI Architect surfaced this domain knowledge from the driver’s type hierarchy, enabling code-based error classification that is locale-independent and production-hardened.

Head-to-head comparison 

| | Claude Sonnet 4.5 (baseline agent) | Bito’s AI Architect |
|---|---|---|
| Code Exploration | Manual Glob/Grep/Read; copied PostgreSQL/MySQL string-matching pattern | Indexed repo + driver analysis; discovered mssql.Error typed structure upfront |
| Error Strategy | String matching only; fragile across locales and driver versions | Typed error codes (18456, 4060, 911) as primary, string fallback as safety net |
| Domain Knowledge | None; no awareness of SQL Server error code semantics | Full; error taxonomy surfaced before coding began |
| Test Coverage | ~5 cases in 1 file using mock strings | 13 cases in 2 files including typed errors, wrapped errors, edge cases |
| Task Outcome | FAILED | PASSED (18% faster execution) |
| Production Resilience | Breaks with localized or wrapped error messages | Locale-independent, driver-version resilient |

Conclusion 

This case study shows that solving real engineering problems requires more than pattern replication. It requires understanding system context, domain knowledge, and error semantics. A generic coding agent can replicate the patterns it sees in existing code, but when a task demands domain-specific knowledge, like SQL Server’s error code taxonomy, pattern matching alone produces fragile solutions.

Bito’s AI Architect bridges this gap by surfacing architectural and domain knowledge that transforms a string-matching hack into production-grade error handling.

Anand Das

Anand is Co-founder and CTO of Bito. He leads technical strategy and engineering, and is our biggest user! Formerly, Anand was CTO of Eyeota, a data company acquired by Dun & Bradstreet. He is co-founder of PubMatic, where he led the building of an ad exchange system that handles over 1 Trillion bids per day.

Amar Goel

Amar is the Co-founder and CEO of Bito. With a background in software engineering and economics, Amar is a serial entrepreneur and has founded multiple companies including the publicly traded PubMatic and Komli Media.

This article is brought to you by the Bito team.
