AI Architect tops SWE-Bench Pro with 39% higher task success.

The Plaintext Token Leak That Coding Agents Missed and AI Architect Fixed

Summary 

This case study is part of the SWE-Bench Pro Evaluation, an independent benchmark conducted by The Context Lab that tests AI coding agents on real-world codebases. It examines a security-critical task in Teleport, an open-source infrastructure access platform: masking provisioning tokens in log output to prevent plaintext secret exposure. 

A baseline coding agent (Claude Sonnet 4.5) added about 30 lines of error-wrapping code that left tokens visible in plaintext. The same agent, augmented with deep codebase context from Bito’s AI Architect, traced the actual data flow, identified the real exposure point, and resolved the vulnerability with a 6-line function that passed the task.

The challenge 

Provisioning tokens were being logged in plaintext in Teleport, exposing sensitive secrets to anyone with access to the log files. The fix required masking token values in all log output while preserving enough visibility for debugging (e.g., abc1******** instead of abc123456789). 

The key question wasn’t how to mask a token; it was where tokens actually enter the log system. Fixing the wrong interception point means the vulnerability persists. 

Why the baseline agent failed 

Claude Sonnet 4.5 took an indirect approach: it built a complex error-wrapping function (~30 lines) that attempted to sanitize tokens after they appeared in error messages. This included type preservation logic for trace.NotFound, trace.AccessDenied, and other error types. 

But the actual exposure came from log.Debugf() calls that directly output validateRequest.Token. Error wrapping never intercepts debug log statements. The coding agent’s entire approach targeted the wrong layer, like installing a water filter on the wrong pipe. 

⚠ ROOT CAUSE: The coding agent targeted error messages instead of log statements. The actual token exposure came from log.Debugf() calls that directly print token values, an interception point that error wrapping cannot reach.

How Bito’s AI Architect solved it 

Bito’s AI Architect does not explore a codebase file by file. Before generating any code, it constructs a knowledge graph of the repository: a structured representation of how modules, functions, and data flows relate to one another across the entire system. This graph captures not just what each file contains, but how values move between them, where they originate, and where they are consumed. 
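As a toy illustration of why data-flow information matters (this is not Bito’s implementation, whose internals the case study does not describe), the sketch below models a few hypothetical nodes as a directed graph of "flows into" edges and walks it breadth-first from the token to every sink of a given kind. Looking only at error paths surfaces the error-message sink; following the edges to log sinks surfaces both debug log call sites. All node names are invented for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// kind classifies nodes in this toy graph. A real repository graph would
// carry much richer information (modules, call edges, types).
type kind int

const (
	value kind = iota // a plain value; also the default for unknown nodes
	errorSink
	logSink
)

type graph struct {
	kinds map[string]kind
	edges map[string][]string // from -> nodes the value flows into
}

func (g *graph) addEdge(from, to string) {
	g.edges[from] = append(g.edges[from], to)
}

// sinksReachedFrom walks breadth-first from src and returns, sorted, every
// node of the requested kind that the value can flow into.
func (g *graph) sinksReachedFrom(src string, want kind) []string {
	seen := map[string]bool{src: true}
	queue := []string{src}
	var out []string
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		if n != src && g.kinds[n] == want {
			out = append(out, n)
		}
		for _, next := range g.edges[n] {
			if !seen[next] {
				seen[next] = true
				queue = append(queue, next)
			}
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical nodes loosely modeled on the case study.
	g := &graph{
		kinds: map[string]kind{
			"validateRequest.Token": value,
			"token (local var)":     value,
			"debug log call site 1": logSink,
			"debug log call site 2": logSink,
			"error message":         errorSink,
		},
		edges: map[string][]string{},
	}
	g.addEdge("validateRequest.Token", "token (local var)")
	g.addEdge("token (local var)", "debug log call site 1")
	g.addEdge("token (local var)", "debug log call site 2")
	g.addEdge("validateRequest.Token", "error message")

	fmt.Println("log sinks:  ", g.sinksReachedFrom("validateRequest.Token", logSink))
	fmt.Println("error sinks:", g.sinksReachedFrom("validateRequest.Token", errorSink))
}
```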

This architectural view revealed that the log.Debugf() calls were the actual exposure points. The treatment agent (Claude Sonnet 4.5 augmented with Bito’s AI Architect) implemented a simple, public MaskToken() function (~6 lines) and applied it directly at the two log call sites in trustedcluster.go. 
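A minimal Go sketch of this kind of call-site masking follows. The case study does not reproduce Teleport’s MaskToken(), so the body below is an assumption: it keeps the first four characters (visiblePrefix is a hypothetical constant) and replaces each remaining character with *, which yields the abc1******** form reported for the fix. A standard-library *log.Logger stands in for the log.Debugf() calls in trustedcluster.go.

```go
package main

import (
	"log"
	"os"
	"strings"
)

// visiblePrefix is a hypothetical setting: how many leading characters stay
// readable so that log lines remain useful for debugging.
const visiblePrefix = 4

// MaskToken returns a log-safe form of token: the first visiblePrefix bytes
// followed by one '*' per remaining byte. Tokens no longer than the prefix
// are masked entirely. Slicing is byte-based, which assumes ASCII tokens.
func MaskToken(token string) string {
	if len(token) <= visiblePrefix {
		return strings.Repeat("*", len(token))
	}
	return token[:visiblePrefix] + strings.Repeat("*", len(token)-visiblePrefix)
}

func main() {
	// Stand-in for a debug logger; flags are 0 so there is no timestamp.
	logger := log.New(os.Stdout, "DEBUG: ", 0)

	token := "abc123456789"
	// Mask at the call site, before the value ever reaches the logger.
	logger.Printf("validating token %v", MaskToken(token))
	// Prints: DEBUG: validating token abc1********
}
```

Because the masking happens in the argument to the log call, every consumer of the log output only ever receives the masked string; the full-mask branch simply ensures a very short value is never printed in full.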

Prevention at the source (masking before logging) replaced remediation after the fact (cleaning up error messages). The result was simpler code, complete coverage, and a fix that actually addresses the vulnerability. 

KEY ARCHITECTURAL INSIGHT 

Prevention beats remediation. Masking tokens at the log.Debugf() call site is both simpler (~6 lines vs. ~30) and more complete than trying to sanitize error messages downstream. Bito’s AI Architect traced the actual data flow to find the real exposure point.

Head-to-head comparison 

| | Claude Sonnet 4.5 (baseline agent) | Bito’s AI Architect |
|---|---|---|
| Code exploration | Focused on error handling; missed log.Debugf() as the exposure point | Traced the token data flow; identified the exact log statements |
| Approach | Indirect: error wrapping after tokens reach error messages (~30 lines) | Direct: MaskToken() at the call site, before tokens reach the logs (~6 lines) |
| Coverage | Incomplete: error wrapping doesn’t intercept debug log statements | Complete: both log.Debugf() calls masked at the source |
| Complexity | High: error type preservation, incomplete switch cases | Low: a single public function, applied at 2 call sites |
| Task outcome | FAILED: tokens still logged in plaintext | PASSED: tokens masked as abc1******** |

Conclusion 

The right fix at the wrong layer is no fix at all. This task comes down to a single question: not how to mask a token, but where in the system that masking needs to happen. Claude Sonnet 4.5 answered a plausible version of that question and got it wrong, producing a complex 30-line error-wrapping approach that never touched the actual exposure point. Bito’s AI Architect traced the token’s data flow through the full system context, identified the log.Debugf() call sites as the real vulnerability, and resolved it with a 6-line function that actually closes the gap. 

In a security context, fixing the wrong layer is indistinguishable from doing nothing. For engineering teams working on complex codebases, the bottleneck in catching subtle vulnerabilities is rarely the ability to write a fix. It is the system understanding to know exactly where that fix belongs. That is the gap Bito’s AI Architect closes. 

Anand Das

Anand is Co-founder and CTO of Bito. He leads technical strategy and engineering, and is our biggest user! Formerly, Anand was CTO of Eyeota, a data company acquired by Dun & Bradstreet. He is also a co-founder of PubMatic, where he led the building of an ad exchange system that handles over 1 trillion bids per day.

Amar Goel

Amar is the Co-founder and CEO of Bito. With a background in software engineering and economics, Amar is a serial entrepreneur and has founded multiple companies including the publicly traded PubMatic and Komli Media.

Written by developers for developers

This article is brought to you by the Bito team.
