
Amid a push toward AI agents, with both Anthropic and OpenAI shipping multi-agent tools this week, Anthropic is more than ready to show off some of its more daring AI coding experiments. But as usual with claims of AI-related achievement, you’ll find some key caveats ahead.
On Thursday, Anthropic researcher Nicholas Carlini published a blog post describing how he set 16 instances of the company’s Claude Opus 4.6 AI model loose on a shared codebase with minimal supervision, tasking them with building a C compiler from scratch.
Over two weeks and nearly 2,000 Claude Code sessions costing about $20,000 in API fees, the AI model agents reportedly produced a 100,000-line Rust-based compiler capable of building a bootable Linux 6.9 kernel on x86, ARM, and RISC-V architectures.
Carlini, a research scientist on Anthropic’s Safeguards team who previously spent seven years at Google Brain and DeepMind, used a new feature launched with Claude Opus 4.6 called “agent teams.” In practice, each Claude instance ran inside its own Docker container, cloning a shared Git repository, claiming tasks by writing lock files, then pushing completed code back upstream. No orchestration agent directed traffic. Each instance independently identified whatever problem seemed most obvious to work on next and started solving it. When merge conflicts arose, the AI model instances resolved them on their own.
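The lock-file step described above can be illustrated with a minimal Python sketch. It is not Carlini's implementation: the function names, the `.lock` suffix, and the directory layout are all hypothetical, and it coordinates through a local directory, whereas the agents in the experiment shared their lock files through a Git repository. The idea it shows is only that a claim succeeds for exactly one worker, because creating the file fails if it already exists.

```python
import os
import tempfile
from pathlib import Path


def try_claim(lock_dir: Path, task: str, agent_id: str) -> bool:
    """Try to claim `task` by creating <task>.lock (hypothetical naming).

    Returns True only for the caller that actually created the file.
    """
    lock_dir.mkdir(parents=True, exist_ok=True)
    lock_path = lock_dir / f"{task}.lock"
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so two workers cannot both believe they own the same task.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(agent_id)  # record the owner for anyone inspecting the lock
    return True


def release(lock_dir: Path, task: str) -> None:
    """Drop the claim once the work has been pushed (or abandoned)."""
    (lock_dir / f"{task}.lock").unlink(missing_ok=True)


if __name__ == "__main__":
    locks = Path(tempfile.mkdtemp())
    print(try_claim(locks, "parse_structs", "agent-1"))  # first claim wins
    print(try_claim(locks, "parse_structs", "agent-2"))  # second is refused
```

A worker that is refused would simply move on to another task, which matches the article's description of instances choosing their own work without a central orchestrator.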
The resulting compiler, which Anthropic has released on GitHub, can compile a range of major open source projects, including PostgreSQL, SQLite, Redis, FFmpeg, and QEMU. It achieved a 99 percent pass rate on the GCC torture test suite and, in what Carlini called “the developer’s ultimate litmus test,” compiled and ran Doom.
It’s worth noting that a C compiler is a near-ideal task for semi-autonomous AI model coding: The specification is decades old and well-defined, comprehensive test suites already exist, and there’s a known-good reference compiler to check against. Most real-world software projects have none of these advantages. The hard part of most development isn’t writing code that passes tests; it’s figuring out what the tests should be in the first place.
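To see why a known-good reference compiler matters so much, consider a rough sketch of differential testing: run each test program through both compilers and flag any program whose observable behavior differs. The article does not describe the project's actual harness, so everything here is an assumption, including the `RunResult` type and the `find_mismatches` helper. The "runners" are plain callables so the sketch stays self-contained; in a real setup each would compile the source with one compiler, run the binary, and capture its exit code and output.

```python
from typing import Callable, Iterable, List, NamedTuple


class RunResult(NamedTuple):
    """Observable behavior of one compiled-and-executed test program."""
    exit_code: int
    stdout: str


# A runner takes program source and returns what happened when it was
# compiled and run (hypothetical interface for this sketch).
Runner = Callable[[str], RunResult]


def find_mismatches(
    programs: Iterable[str], candidate: Runner, reference: Runner
) -> List[str]:
    """Return the programs on which the candidate disagrees with the reference."""
    mismatches = []
    for source in programs:
        if candidate(source) != reference(source):
            mismatches.append(source)
    return mismatches


if __name__ == "__main__":
    # Stand-in runners: the "reference" echoes the source in upper case,
    # and the "candidate" has a deliberate bug on inputs containing "x".
    def reference(src: str) -> RunResult:
        return RunResult(0, src.upper())

    def candidate(src: str) -> RunResult:
        return RunResult(0, "" if "x" in src else src.upper())

    print(find_mismatches(["a", "x"], candidate, reference))
```

The point is that the reference supplies the expected answer for free. A project without one has to decide by hand what correct output looks like, which is the harder problem the paragraph above describes.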









