Summary
In a Teleport P0 diagnostic task from the SWE-Bench Pro evaluation, a baseline coding agent (Claude Sonnet 4.5) failed to correctly classify SQL Server connection errors because it relied on simple string matching.
Bito’s AI Architect succeeded by surfacing SQL Server’s typed error codes from the driver layer, implementing code-based classification, and delivering a production-grade, locale-independent solution 18% faster than the baseline.
Modern AI coding benchmarks often test whether an agent can reproduce patterns. This task tested something deeper: whether an AI system understands how production infrastructure software actually works.
The challenge
Gravitational Teleport is an infrastructure access platform used by enterprises to secure access to databases, Kubernetes clusters, and SSH servers. When a user cannot connect to a SQL Server database, they need precise, actionable diagnostics, not generic “connection failed” messages.
The task required implementing SQL Server connection testing in Teleport’s Discovery diagnostic flow: a SQLServerPinger that correctly categorizes connection failures into three distinct buckets (connection refused, authentication failure, and invalid database name), each with targeted remediation guidance.
This isn’t just a coding challenge. It demands deep knowledge of SQL Server’s error reporting system, the go-mssqldb driver’s internal type hierarchy, and Teleport’s established patterns for database connectivity diagnostics.
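The three diagnostic buckets can be sketched as a small Go enumeration with per-bucket remediation guidance. This is an illustrative sketch only; the type and method names here are hypothetical, not Teleport's actual identifiers.

```go
package main

import "fmt"

// ConnectionErrorType enumerates the three diagnostic buckets the task
// requires. All names below are illustrative stand-ins.
type ConnectionErrorType int

const (
	ConnectionRefused ConnectionErrorType = iota
	AuthenticationFailed
	InvalidDatabaseName
)

// Remediation returns targeted guidance for each failure bucket,
// rather than a generic "connection failed" message.
func (t ConnectionErrorType) Remediation() string {
	switch t {
	case ConnectionRefused:
		return "check that the SQL Server endpoint and port are reachable"
	case AuthenticationFailed:
		return "verify the configured credentials and login permissions"
	case InvalidDatabaseName:
		return "confirm the database name exists on the target server"
	}
	return "unknown failure; inspect the raw driver error"
}

func main() {
	fmt.Println(AuthenticationFailed.Remediation())
}
```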
Why the baseline agent failed
Claude Sonnet 4.5 followed a logical approach: it studied the existing PostgreSQL and MySQL pinger implementations, then replicated the pattern for SQL Server. But it made a critical architectural mistake: it relied entirely on string pattern matching against error messages.
SQL Server error messages vary by driver version, localization, and wrapping context. A message that reads “login failed” in English might read differently on a German-localized server. Without knowledge of SQL Server’s structured error codes (18456 for auth failure, 4060 for invalid database), the coding agent produced an implementation that would pass basic unit tests but fail against real-world SQL Server instances.
⚠ ROOT CAUSE: The coding agent lacked domain-specific knowledge of SQL Server’s typed error system (mssql.Error with numeric .Number codes). It could only pattern-match on error strings — an approach that breaks with message variations, localization, and driver wrapping.
How Bito AI Architect solved it
Bito’s AI Architect didn’t start by copying patterns; it first built contextual understanding of the problem using its knowledge graph, which semantically links code definitions, driver APIs, error taxonomies, and test requirements.
This allowed the system to recognize that SQL Server errors are richly typed objects with numeric codes, and that these codes should be the primary means of classification. With that insight, it generated a two-layer error handling strategy, using numeric error codes (such as 18456, 18452, 4060, 911) as the authoritative classifier and string matching only as a fallback.
The result was a production-grade implementation with 13 test cases across 2 test files, handling typed errors, wrapped errors, localized messages, and edge cases that the coding agent’s approach would have missed entirely.
KEY ARCHITECTURAL INSIGHT
SQL Server errors are structured objects with definitive numeric codes, not just text strings. Bito’s AI Architect surfaced this domain knowledge from the driver’s type hierarchy, enabling code-based error classification that is locale-independent and production-hardened.
Head-to-head comparison
| | Claude Sonnet 4.5 (baseline agent) | Bito’s AI Architect |
| --- | --- | --- |
| Code Exploration | Manual Glob/Grep/Read; copied the PostgreSQL/MySQL string-matching pattern | Indexed repo plus driver analysis; discovered the mssql.Error typed structure upfront |
| Error Strategy | String matching only; fragile across locales and driver versions | Typed error codes (18456, 4060, 911) as primary, string fallback as safety net |
| Domain Knowledge | None; no awareness of SQL Server error code semantics | Full; error taxonomy surfaced before coding began |
| Test Coverage | ~5 cases in 1 file using mock strings | 13 cases in 2 files, including typed errors, wrapped errors, and edge cases |
| Task Outcome | FAILED | PASSED, with 18% faster execution |
| Production Resilience | Breaks with localized or wrapped error messages | Locale-independent, driver-version resilient |
Conclusion
This case study shows that solving real engineering problems requires more than pattern replication: it requires understanding system context, domain knowledge, and error semantics. A generic coding agent can replicate patterns it sees in existing code, but when a task demands domain-specific knowledge, such as SQL Server’s error code taxonomy, pattern matching alone produces fragile solutions.
Bito’s AI Architect bridges this gap by surfacing architectural and domain knowledge that transforms a string-matching hack into production-grade error handling.