In the high-stakes world of AI, 'Context Engineering' has emerged as the latest frontier for squeezing efficiency out of LLMs. Industry leaders have touted AGENTS.md (and its cousins like CLAUDE.md) as the definitive configuration layer for coding agents: a repository-level 'North Star' injected into every conversation to guide the AI through complex codebases.
But a recent study from researchers at ETH Zurich just dropped a major reality check. The findings are fairly clear: if you aren't deliberate with your context files, you are likely sabotaging your agent's performance while paying a 20% premium for the privilege.


The Data: More Tokens, Less Success
The ETH Zurich research team analyzed coding agents such as Sonnet-4.5, GPT-5.2, and Qwen3-30B across established benchmarks and a novel set of real-world tasks called AGENTBENCH. The results were surprisingly lopsided:
- The Auto-Generated Tax: Automatically generated context files actually decreased success rates by roughly 3%.
- The Cost of 'Help': These files increased inference costs by over 20% and required more reasoning steps to solve the same tasks.
- The Human Margin: Even human-written files only provided a marginal 4% performance gain.
- The Intelligence Cap: Interestingly, using stronger models (like GPT-5.2) to generate these files didn't yield better results. Stronger models often have enough 'parametric knowledge' of common libraries that the extra context becomes redundant noise.
Why 'Good' Context Fails
The research team highlights a behavioral trap: AI agents are too obedient. Coding agents tend to respect the instructions found in context files, but when those requirements are unnecessary, they make the task harder.
For instance, the researchers found that codebase overviews and directory listings, a staple of most AGENTS.md files, didn't help agents navigate any faster. Agents are surprisingly good at discovering file structures on their own; reading a manual listing just consumes reasoning tokens and adds 'mental' overhead. Furthermore, LLM-generated files are often redundant if you already have decent documentation elsewhere in the repo.


The New Rules of Context Engineering
To make context files truly useful, you need to shift from 'comprehensive documentation' to 'surgical intervention.'
1. What to Include (The 'Essential Few')
- The Technical Stack & Intent: Explain the 'What' and the 'Why.' Help the agent understand the purpose of the project and its architecture (e.g., a monorepo structure).
- Non-Obvious Tooling: This is where `AGENTS.md` shines. Specify how to build, test, and verify changes using specific tools like `uv` instead of `pip`, or `bun` instead of `npm`.
- The Multiplier Effect: The data shows that instructions are followed; tools mentioned in a context file are used significantly more often. For example, the tool `uv` was used 160x more frequently (1.6 times per instance vs. 0.01) when explicitly mentioned.
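As a rough sketch of the 'Essential Few' (the project description, layout, and commands below are invented for illustration, not taken from the study), such a section of an AGENTS.md might look like:

```markdown
# AGENTS.md — illustrative sketch; project details are hypothetical

## Project
Monorepo for a billing service: a Python API in `api/`
and a TypeScript dashboard in `web/`.

## Tooling (non-obvious)
- Use `uv` for Python dependencies, not `pip`: `uv sync`, `uv run pytest`.
- Use `bun` for the dashboard, not `npm`: `bun install`, `bun test`.
- Before finishing, verify changes with `uv run pytest api/tests`.
```

Note that the file states intent and non-obvious commands only; it contains no directory tree, no style rules, and no copied code.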
2. What to Exclude (The 'Noise')
- Detailed Directory Trees: Skip them. Agents can find the files they need without a map.
- Style Guides: Don't waste tokens telling an agent to "use camelCase." Use deterministic linters and formatters instead; they are cheaper, faster, and more reliable.
- Task-Specific Instructions: Avoid rules that only apply to a fraction of your issues.
- Unvetted Auto-Content: Don't let an agent write its own context file without human review. The study shows that 'stronger' models don't necessarily make better guides.
3. How to Structure It
- Keep it Lean: The general consensus for high-performance context files is under 300 lines. Experienced teams often keep theirs even tighter, under 60 lines. Every line counts, because every line is injected into every session.
- Progressive Disclosure: Don't put everything in the root file. Use the main file to point the agent to separate, task-specific documentation (e.g., `agent_docs/testing.md`) only when relevant.
- Pointers Over Copies: Instead of embedding code snippets that will eventually go stale, use pointers (e.g., `file:line`) to show the agent where to find design patterns or specific interfaces.
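Putting these structural rules together (the file names and `file:line` targets below are hypothetical examples, not part of the study), the tail of a lean root file using progressive disclosure and pointers might read:

```markdown
## Further docs (read only when relevant)
- Testing conventions: `agent_docs/testing.md`
- Release process: `agent_docs/release.md`

## Where to look (pointers, not copies)
- Retry/backoff pattern: `api/http/client.py:88`
- Plugin interface definition: `core/plugin.py:15`
```

Pointers like these stay short and cheap to inject, and when the referenced code changes, the location usually drifts less than an embedded copy would.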
Key Takeaways
- Negative Impact of Auto-Generation: LLM-generated context files tend to reduce task success rates by roughly 3% on average compared to providing no repository context at all.
- Significant Cost Increases: Including context files increases inference costs by over 20% and leads to a higher number of steps required for agents to complete tasks.
- Minimal Human Benefit: While human-written (developer-provided) context files perform better than auto-generated ones, they only offer a marginal improvement of about 4% over using no context files.
- Redundancy and Navigation: Detailed codebase overviews in context files are largely redundant with existing documentation and don't help agents find relevant files any faster.
- Strict Instruction Following: Agents generally respect the instructions in these files, but unnecessary or overly restrictive requirements often make solving real-world tasks harder for the model.
Check out the Paper.









