
I Cut My AI Agent's Token Usage by 88%. The Fix Was Stupidly Simple.

My AI agent was burning 37,800 tokens just loading context. I got it down to 4,300 by changing what I fed it. No fine-tuning, no prompt engineering, no tricks.

I was watching my AI agent's token usage and noticed something ridiculous: 80% of the tokens in every session were spent just loading context. The agent was reading source files to understand my project before it could do any actual work.

On my mid-size TypeScript project (16 core files, 3,400 lines), the agent consumed 37,800 tokens every session just to orient itself. Then it would do 500 tokens of actual work. Next session — same 37,800 tokens again.

I fixed it by changing what the agent receives. Instead of raw source code, I feed it pre-processed knowledge. The result: 4,300 tokens for the same (actually better) understanding. Here's exactly what changed.

What 37,800 tokens of raw scanning looks like

This is the typical flow when an AI agent meets your codebase for the first time — or, more accurately, for the hundredth time because it forgot everything:

Step 1: Read package.json (what is this project?)     850 tokens
Step 2: Read tsconfig.json (what's the setup?)         340 tokens
Step 3: Read entry point (where does execution start?) 5,432 tokens
  → agent now reads worker.ts (531 lines, the biggest file)
Step 4: Follow imports from worker.ts                  3,766 tokens
  → reads routes.ts
Step 5: Follow more imports                            4,142 tokens
  → reads compute.ts
Step 6: Read config files to understand env setup      841 tokens
Step 7: Read database layer                            1,572 tokens
Step 8: Read MCP server                                3,264 tokens
Step 9: Read SDK types and client                      5,826 tokens
Step 10: Read indexers                                 7,613 tokens
Step 11: Re-read parts of worker.ts it forgot          ~2,700 tokens
Step 12: Grep for a specific function                  ~1,432 tokens
─────────────────────────────────────────────────────────────────
Total: ~37,800 tokens
Tool calls: 25-35
Time: 5-10 minutes of the agent fumbling around

And the brutal part? The agent still doesn't know:

  • Which files are risky (change frequently, have bug history)
  • Which files are secretly coupled (change together despite no imports)
  • What architectural decisions were made and why
  • What coding patterns the team follows

It read 37,800 tokens of raw code and missed the most important stuff.
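You can estimate the raw-scan cost for your own repo before doing anything else. A minimal sketch, assuming the common rough heuristic of ~4 characters per token (this is an approximation, not an exact tokenizer count):

```typescript
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join, extname } from "node:path";

// Rough heuristic: ~4 characters per token for source code.
const CHARS_PER_TOKEN = 4;

// Walk a directory and estimate how many tokens an agent would consume
// if it read every matching source file in full.
function estimateTokens(dir: string, exts = [".ts", ".json"]): number {
  let total = 0;
  for (const name of readdirSync(dir)) {
    if (name === "node_modules" || name.startsWith(".")) continue;
    const full = join(dir, name);
    if (statSync(full).isDirectory()) {
      total += estimateTokens(full, exts);
    } else if (exts.includes(extname(name))) {
      total += Math.ceil(readFileSync(full, "utf8").length / CHARS_PER_TOKEN);
    }
  }
  return total;
}
```

Run it against your source root and compare the number to what a structured knowledge tier would cost.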

What 4,300 tokens of structured knowledge looks like

Same codebase. Different format. Instead of feeding the agent source code, I feed it extracted knowledge:

cortex.yaml (project manifest):           180 tokens

constitution.md (project understanding): 850 tokens → "This is a TypeScript API server on Cloudflare Workers. 7 modules: config, db, api, scoring, mcp, indexer, sdk. Dual entry points: worker.ts (production) and routes.ts (dev). WARNING: worker.ts and routes.ts must stay in sync."

overview.md (architecture + data flow): 620 tokens → how requests flow through the system, what each module owns

graph.json (dependency graph): 1,200 tokens → every import edge, every call edge, module boundaries

symbols.json (symbol index): 980 tokens → every function, type, and class with file:line and signature

temporal.json (git behavioral data): 470 tokens → hotspots, coupling pairs, bug history, stability ratings

─────────────────────────────────────────────────────────────────
Total: 4,300 tokens
Tool calls: 1 (load the knowledge tier)
Time: instant
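To make the file contents concrete, here is a sketch of what entries in symbols.json and temporal.json might look like. The field names are illustrative, not CodeCortex's actual schema:

```typescript
// Illustrative shapes only — not the actual CodeCortex schema.
interface SymbolEntry {
  name: string;                 // e.g. "computeScore"
  kind: "function" | "type" | "class";
  file: string;                 // e.g. "src/scoring/compute.ts"
  lines: [number, number];      // start and end line
  signature: string;
  exported: boolean;
  callers: string[];            // files that call this symbol
}

interface TemporalEntry {
  file: string;
  changesLast90Days: number;    // hotspot signal
  coupledWith: string[];        // files that change together despite no imports
  bugFixCommits: number;        // bug-history signal
  stability: "stable" | "active" | "volatile";
}

const example: SymbolEntry = {
  name: "computeScore",
  kind: "function",
  file: "src/scoring/compute.ts",
  lines: [42, 89],
  signature: "(wallet: string, txns: Transaction[]) => Promise<Score>",
  exported: true,
  callers: ["worker.ts", "routes.ts", "mcp/server.ts"],
};
```

Each entry costs tens of tokens, which is how a full symbol index fits in under 1,000.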

88.6% fewer tokens. And this version is strictly better — it includes historical data and behavioral signals that raw code scanning can never provide.

Why the difference is so large

Source code is written for compilers and humans who read files sequentially. It contains:

  • Implementation details (for-loops, variable assignments, error handling branches)
  • Formatting (indentation, blank lines, comments)
  • Boilerplate (imports, type annotations, export statements)

For an AI agent trying to understand your architecture, 80% of every source file is noise. The agent doesn't need to read your for-loop to know that computeScore() takes a wallet address and returns a trust score. It needs the signature, the location, and the relationships.

Structured knowledge strips the noise and keeps the signal:

Raw source (500 lines, 5,432 tokens):
  export async function computeScore(
    wallet: string,
    transactions: Transaction[],
  ): Promise<Score> {
    ...hundreds of lines of implementation...
    // the agent reads all of this

Structured knowledge (one index entry, ~50 tokens):
  computeScore | function | src/scoring/compute.ts:42-89
  signature: (wallet: string, txns: Transaction[]) => Promise<Score>
  exported: true | callers: worker.ts, routes.ts, mcp/server.ts

50 tokens versus 5,432. The agent gets the same understanding of what this function is and how it's used. If it needs the implementation details for editing, it can read the specific file:line range — but only when it actually needs to.
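That lookup can be done in code rather than by reading files. A sketch of answering "where is computeScore and who calls it" from the index alone (the entry shape is illustrative):

```typescript
// A tiny lookup over the symbol index: answer "where is X defined and
// who calls it" without reading any source file. Shape is illustrative.
type Sym = {
  name: string;
  file: string;
  start: number;
  end: number;
  signature: string;
  callers: string[];
};

const index: Sym[] = [
  {
    name: "computeScore",
    file: "src/scoring/compute.ts",
    start: 42,
    end: 89,
    signature: "(wallet: string, txns: Transaction[]) => Promise<Score>",
    callers: ["worker.ts", "routes.ts", "mcp/server.ts"],
  },
];

// Returns a file:line range the agent can open only if it needs the body.
function locate(name: string): string | undefined {
  const s = index.find((e) => e.name === name);
  return s && `${s.file}:${s.start}-${s.end}`;
}
```

The agent resolves the location in one step, then opens the exact range only when it actually has to edit.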

The three-tier approach

Not all knowledge is needed at the same time. I organize it into three tiers:

HOT (~4,300 tokens) — loaded every session, always available: Architecture understanding, symbol index, dependency graph, temporal analysis. This is the "mental model" tier — everything the agent needs to navigate the codebase intelligently.

WARM (~700 tokens per module) — loaded when working in a specific area: Deep module documentation including purpose, data flow, gotchas, and edge cases. The agent loads this when it zooms into a module. Adding one module brings the total to ~5,000 tokens.

COLD (on-demand) — searched when a specific question arises: Architectural decisions ("why did we choose Hono over Express?"), coding patterns ("how do we handle errors here?"), session logs ("what was changed last time?"). The agent queries these as needed.

Session start (HOT):               4,300 tokens  ← full codebase understanding
Working on scoring module (+WARM):  5,000 tokens  ← deep module knowledge added

vs. raw scan every session:        37,800 tokens ← and still missing temporal data
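A tier loader can be very small. A sketch, assuming the file names from this post (the layout and the modules/ directory are my assumptions, not a fixed convention):

```typescript
import { readFileSync } from "node:fs";
import { join } from "node:path";

// HOT tier: always loaded at session start.
const HOT_FILES = [
  "cortex.yaml",
  "constitution.md",
  "overview.md",
  "graph.json",
  "symbols.json",
  "temporal.json",
];

// Load the HOT tier, plus one WARM module doc when the agent zooms in.
// COLD knowledge (decisions, patterns, session logs) is searched on
// demand rather than loaded here.
function loadContext(root: string, module?: string): string {
  const parts = HOT_FILES.map((f) => readFileSync(join(root, f), "utf8"));
  if (module) {
    parts.push(readFileSync(join(root, "modules", `${module}.md`), "utf8"));
  }
  return parts.join("\n---\n");
}
```

One tool call, one string, full mental model.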

The weekly math

For a team running 5 AI sessions per week on this codebase:

Raw scanning:   5 × 37,800 = 189,000 tokens/week for context loading
Structured:     5 × 4,300  = 21,500 tokens/week for context loading
Saved:          167,500 tokens/week

The dollar savings are modest (a couple bucks at current API pricing). The real savings are cognitive: every session starts with understanding instead of exploration. The agent doesn't spend its first 25 tool calls figuring out your project. It starts working immediately.

How to build this for your own codebase

The approach isn't complicated:

  1. **Extract symbols** from source using a parser (tree-sitter is the standard). Get every function, type, and class with its location and signature.
  2. **Build the graph** from import statements. Map every dependency edge.
  3. **Analyze git history** for co-change coupling, hotspots, and bug patterns.
  4. **Generate summaries** per module — what it does, not how it's implemented.
  5. **Store as flat files** alongside your project. Commit to git so everyone benefits.
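The five steps above fit into one pipeline. A skeleton sketch — the function bodies are stubs; a real version would call tree-sitter for parsing and `git log` for history:

```typescript
// Skeleton of the five-step pipeline. The steps are stubs here; a real
// implementation would parse with tree-sitter and read git history.
type Knowledge = Record<string, unknown>;

function extractSymbols(sources: string[]): Knowledge {  // step 1
  return { symbols: [], sourceCount: sources.length };
}
function buildGraph(sources: string[]): Knowledge {      // step 2
  return { edges: [], sourceCount: sources.length };
}
function analyzeHistory(repoPath: string): Knowledge {   // step 3
  return { hotspots: [], coupling: [], repo: repoPath };
}
function summarizeModules(k: Knowledge): Knowledge {     // step 4
  return { modules: [], basedOn: Object.keys(k) };
}

// Step 5: the caller writes these out as flat files and commits them.
function generateKnowledge(repoPath: string, sources: string[]) {
  const symbols = extractSymbols(sources);
  const graph = buildGraph(sources);
  const temporal = analyzeHistory(repoPath);
  const overview = summarizeModules({ ...symbols, ...graph });
  return {
    "symbols.json": symbols,
    "graph.json": graph,
    "temporal.json": temporal,
    "overview.md": overview,
  };
}
```

Regenerate on a schedule (or a git hook) so the knowledge files never drift far from the code.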

CodeCortex automates all five steps (npm install -g codecortex-ai && codecortex init), but the principle works regardless of tooling. The key insight is that AI agents should receive structured knowledge, not raw source code.

Stop feeding your agent files. Start feeding it understanding.