Moonshot AI has released Kimi K2.5 as an open-source visual agentic intelligence model. It combines a large Mixture of Experts language backbone, a native vision encoder, and a parallel multi-agent system called Agent Swarm. The model targets coding, multimodal reasoning, and deep web research, with strong benchmark results on agentic, vision, and coding suites.
Model Architecture and Training
Kimi K2.5 is a Mixture of Experts model with 1T total parameters and about 32B activated parameters per token. The network has 61 layers. It uses 384 experts, with 8 experts selected per token plus 1 shared expert. The attention hidden dimension is 7168 and there are 64 attention heads.
The model uses MLA attention and the SwiGLU activation function. The tokenizer vocabulary size is 160K. The maximum context length during training and inference is 256K tokens. This supports long tool traces, long documents, and multi-step research workflows.
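The reported figures can be collected into a small sketch for quick sanity checks. The numbers below come directly from the article; the derived ratios (fraction of experts and of total weights active per token) are simple arithmetic, not official specifications.

```python
# Architecture figures for Kimi K2.5 as reported in the article.
KIMI_K25_CONFIG = {
    "total_params": 1_000_000_000_000,  # ~1T total parameters
    "active_params": 32_000_000_000,    # ~32B activated per token
    "layers": 61,
    "num_experts": 384,
    "experts_per_token": 8,             # routed experts per token
    "shared_experts": 1,
    "attention_hidden_dim": 7168,
    "attention_heads": 64,
    "vocab_size": 160_000,
    "max_context": 256_000,
}

def activation_summary(cfg: dict) -> dict:
    """Derive simple sparsity ratios from the reported figures."""
    active_experts = cfg["experts_per_token"] + cfg["shared_experts"]
    return {
        # fraction of experts consulted for each token (9 of 384)
        "experts_active_frac": active_experts / cfg["num_experts"],
        # fraction of total weights active per token (~3.2%)
        "params_active_frac": cfg["active_params"] / cfg["total_params"],
    }

summary = activation_summary(KIMI_K25_CONFIG)
```

The ratios make the MoE trade-off concrete: each token touches roughly 3% of the total weights, which is what keeps per-token compute far below what a dense 1T model would require.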
Vision is handled by a MoonViT encoder with about 400M parameters. Visual tokens are trained together with text tokens in a single multimodal backbone. Kimi K2.5 is obtained by continual pretraining on about 15T tokens of mixed vision and text data on top of Kimi K2 Base. This native multimodal training matters because the model learns joint structure over images, documents, and language from the start.
The released checkpoints support standard inference stacks such as vLLM, SGLang, and KTransformers, with transformers version 4.57.1 or newer. Quantized INT4 variants are available, reusing the method from Kimi K2 Thinking. This allows deployment on commodity GPUs with lower memory budgets.
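A back-of-envelope calculation shows why INT4 matters at this scale. The sketch below only counts weight storage for the reported ~1T parameters; it ignores activations, the KV cache, and any layers the quantization scheme may keep at higher precision, so treat it as a rough bound rather than a deployment figure.

```python
def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GiB at a given quantization width."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 2**30

TOTAL_PARAMS = 1e12  # ~1T total parameters

bf16_gib = weight_memory_gib(TOTAL_PARAMS, 16)  # ~1.8 TiB of weights
int4_gib = weight_memory_gib(TOTAL_PARAMS, 4)   # ~0.47 TiB of weights
```

INT4 cuts the weight footprint by 4x versus BF16, which is the difference between needing a large multi-node cluster and fitting the weights on a single high-memory GPU server.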
Coding and Multimodal Capabilities
Kimi K2.5 is positioned as a strong open-source coding model, especially when code generation depends on visual context. The model can read UI mockups, design screenshots, and even videos, then emit structured frontend code with layout, styling, and interaction logic.
Moonshot shows examples where the model reads a puzzle image, reasons about the shortest path, and then writes code that produces a visualized solution. This demonstrates cross-modal reasoning, where the model combines image understanding, algorithmic planning, and code synthesis in a single flow.
Because K2.5 has a 256K context window, it can keep long specification histories in context. A practical workflow for developers is to combine design assets, product docs, and existing code in a single prompt. The model can then refactor or extend the codebase while keeping visual constraints aligned with the original design.
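This workflow can be sketched as assembling one mixed-content prompt. The message schema below follows the common OpenAI-style chat format purely for illustration; the exact field names depend on the serving stack you deploy behind, and the file contents are placeholders.

```python
import base64

def image_part(png_bytes: bytes) -> dict:
    """Wrap raw image bytes as an inline image content part
    (OpenAI-style schema, used here only as an illustration)."""
    b64 = base64.b64encode(png_bytes).decode()
    return {"type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"}}

def build_frontend_prompt(mockups: list, spec: str, code: str) -> list:
    """Combine design assets, product docs, and existing code in a
    single prompt, relying on the long context window to hold all three."""
    content = [image_part(png) for png in mockups]
    content.append({
        "type": "text",
        "text": (f"Product spec:\n{spec}\n\nCurrent code:\n{code}\n\n"
                 "Extend the frontend to match the mockups."),
    })
    return [{"role": "user", "content": content}]

# Placeholder inputs standing in for real design assets and source files.
messages = build_frontend_prompt([b"\x89PNG..."], "Dark mode toggle", "App.tsx ...")
```

The point of the single-prompt approach is that the model sees the visual ground truth and the code it must modify side by side, rather than receiving them across separate turns.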


Agent Swarm and Parallel Agent Reinforcement Studying
A key characteristic of Kimi K2.5 is Agent Swarm. It is a multi agent system educated with Parallel Agent Reinforcement Studying, PARL. On this setup an orchestrator agent decomposes a posh purpose into many subtasks. It then spins up area particular sub brokers to work in parallel.
Kimi staff experiences that K2.5 can handle as much as 100 sub brokers inside a activity. It helps as much as 1,500 coordinated steps or device calls in a single run. This parallelism provides about 4.5 instances sooner completion in contrast with a single agent pipeline on large search duties.
PARL introduces a metric known as Crucial Steps. The system rewards insurance policies that cut back the variety of serial steps wanted to unravel the duty. This discourages naive sequential planning and pushes the agent to separate work into parallel branches whereas nonetheless sustaining consistency.
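The idea behind the metric can be illustrated with a toy dependency graph. This is not Moonshot's implementation, just a minimal sketch: serial steps correspond to the longest dependency chain in the plan, so a policy that fans subtasks out in parallel scores better than one that chains them, even when total work is identical.

```python
from functools import lru_cache

def critical_steps(deps: dict) -> int:
    """Length of the longest dependency chain: the number of steps that
    must run serially even with unlimited parallel sub-agents.
    'deps' maps each subtask name to the subtasks it depends on."""
    @lru_cache(maxsize=None)
    def depth(task: str) -> int:
        return 1 + max((depth(d) for d in deps[task]), default=0)
    return max(depth(t) for t in deps)

# A naive sequential plan: each of 8 steps waits for the previous one.
sequential = {f"s{i}": ([f"s{i-1}"] if i else []) for i in range(8)}

# The same work split into 7 parallel branches behind one merge step.
parallel = {f"s{i}": [] for i in range(7)}
parallel["merge"] = [f"s{i}" for i in range(7)]
```

Both plans contain 8 subtasks, but the sequential plan needs 8 serial steps while the parallel one needs only 2, which is exactly the behavior a Critical Steps reward would favor.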
One instance by the Kimi staff is a analysis workflow the place the system wants to find many area of interest creators. The orchestrator makes use of Agent Swarm to spawn a lot of researcher brokers. Every agent explores totally different areas of the online, and the system merges outcomes right into a structured desk.
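The fan-out-and-merge pattern can be sketched with a thread pool standing in for the swarm. The `researcher` stub below is a hypothetical placeholder; a real sub-agent would issue search and browse tool calls instead of returning canned rows.

```python
from concurrent.futures import ThreadPoolExecutor

def researcher(region: str) -> list:
    """Stand-in for a sub-agent exploring one slice of the web.
    Returns rows for the final structured table."""
    return [{"creator": f"creator-from-{region}", "region": region}]

def orchestrate(regions: list) -> list:
    """Fan out one researcher per region in parallel, then merge
    every agent's rows into a single sorted table."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        results = pool.map(researcher, regions)
    table = [row for rows in results for row in rows]
    return sorted(table, key=lambda r: r["region"])

table = orchestrate(["asia", "europe", "americas"])
```

The orchestrator's job reduces to two phases: dispatch independent subtasks concurrently, then deduplicate and merge the partial results into one consistent table.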


Benchmark Performance
On agentic benchmarks, Kimi K2.5 reports strong numbers. On HLE Full with tools, the score is 50.2. On BrowseComp with context management, the score is 74.9. In Agent Swarm mode, the BrowseComp score increases further to 78.4, and WideSearch metrics also improve. The Kimi team compares these values with GPT 5.2, Claude 4.5, Gemini 3 Pro, and DeepSeek V3, and K2.5 shows the highest scores among the listed models on these specific agentic suites.
On vision and video benchmarks, K2.5 also reports high scores. MMMU Pro is 78.5 and VideoMMMU is 86.6. The model performs well on OmniDocBench, OCRBench, WorldVQA, and other document and scene understanding tasks. These results indicate that the MoonViT encoder and long context training are effective for real-world multimodal problems, such as reading complex documents and reasoning over videos.


For coding benchmarks, the model lists SWE-Bench Verified at 76.8, SWE-Bench Pro at 50.7, SWE-Bench Multilingual at 73.0, Terminal-Bench 2.0 at 50.8, and LiveCodeBench v6 at 85.0. These numbers place K2.5 among the strongest open-source coding models currently reported on these tasks.
On long context language benchmarks, K2.5 reaches 61.0 on LongBench V2 and 70.0 on AA-LCR under standard evaluation settings. For reasoning benchmarks, it achieves high scores on AIME 2025, HMMT 2025 February, GPQA Diamond, and MMLU Pro when used in thinking mode.
Key Takeaways
- Mixture of Experts at trillion scale: Kimi K2.5 uses a Mixture of Experts architecture with 1T total parameters and about 32B active parameters per token, 61 layers, 384 experts, and 256K context length, optimized for long multimodal and tool-heavy workflows.
- Native multimodal training with MoonViT: The model integrates a MoonViT vision encoder of about 400M parameters and is trained on about 15T mixed vision and text tokens, so images, documents, and language are handled in a single unified backbone.
- Parallel Agent Swarm with PARL: Agent Swarm, trained with Parallel Agent Reinforcement Learning, can coordinate up to 100 sub-agents and about 1,500 tool calls per task, giving around 4.5 times faster execution versus a single agent on wide research tasks.
- Strong benchmark results in coding, vision, and agents: K2.5 reports 76.8 on SWE-Bench Verified, 78.5 on MMMU Pro, 86.6 on VideoMMMU, 50.2 on HLE Full with tools, and 74.9 on BrowseComp, matching or exceeding listed closed models on several agentic and multimodal suites.









