On the edge, our E2B and E4B models redefine on-device utility, prioritizing multimodal capabilities, low-latency processing, and seamless ecosystem integration over raw parameter count.
Powerful, accessible, open
To power the next generation of pioneering research and products, we've sized the Gemma 4 models specifically to run and fine-tune efficiently on your hardware — from billions of Android devices worldwide, to laptop GPUs, all the way up to developer workstations and accelerators.
Using these highly optimized models, you can fine-tune Gemma 4 to achieve state-of-the-art performance on your specific tasks. We've already seen incredible success with this approach; for example, INSAIT created a pioneering Bulgarian-first language model (BgGPT), and we worked with Yale University on Cell2Sentence-Scale to discover new pathways for cancer treatment, among many others.
Here’s what makes Gemma 4 our most capable open model family yet:
- Advanced reasoning: Capable of multi-step planning and deep logic, Gemma 4 demonstrates significant improvements on math and instruction-following benchmarks.
- Agentic workflows: Native support for function calling, structured JSON output, and native system instructions lets you build autonomous agents that can interact with different tools and APIs and execute workflows reliably.
- Code generation: Gemma 4 supports high-quality offline code generation, turning your workstation into a local-first AI code assistant.
- Vision and audio: All models natively process video and images, supporting variable resolutions and excelling at visual tasks like OCR and chart understanding. Additionally, the E2B and E4B models feature native audio input for speech recognition and understanding.
- Longer context: Process long-form content seamlessly. The edge models feature a 128K context window, while the larger models offer up to 256K, allowing you to pass repositories or long documents in a single prompt.
- 140+ languages: Natively trained on over 140 languages, Gemma 4 helps developers build inclusive, high-performance applications for a global audience.
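To make the agentic pattern above concrete, here is a minimal sketch of the receiving side of a function call: parsing and validating a model's structured JSON output before dispatching it to a tool. The `get_weather` tool, the response string, and the `name`/`arguments` field layout are all hypothetical placeholders, not the actual Gemma function-calling schema — check the model documentation for the real format.

```python
import json

# Hypothetical tool registry; real tool schemas depend on your application.
TOOLS = {
    "get_weather": lambda city: f"Sunny in {city}",
}

def dispatch_tool_call(model_output: str) -> str:
    """Parse a structured JSON function call emitted by the model and run
    the matching tool. Raises ValueError on malformed output so the agent
    loop can re-prompt the model instead of executing garbage."""
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError as e:
        raise ValueError(f"model did not emit valid JSON: {e}")
    name = call.get("name")
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    return TOOLS[name](**call.get("arguments", {}))

# Example: a well-formed (hypothetical) model response.
response = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
print(dispatch_tool_call(response))  # → Sunny in Zurich
```

Validating before dispatching matters in practice: even with native structured-output support, an agent loop should treat model output as untrusted input and recover gracefully from malformed calls.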
Flexible models for diverse hardware
We’re releasing the Gemma 4 model weights in sizes tailored for specific hardware and use cases, ensuring you get frontier-class reasoning wherever you need it:
26B and 31B models: Frontier intelligence, offline on your personal computers
Optimized to provide researchers and developers with state-of-the-art reasoning on accessible hardware, our unquantized bfloat16 weights fit comfortably on a single 80GB NVIDIA H100 GPU. For local setups, quantized versions run natively on consumer GPUs to power your IDEs, coding assistants, and agentic workflows. Our 26B Mixture of Experts (MoE) model targets latency, activating only 3.8 billion of its total parameters during inference to deliver exceptionally fast tokens per second, while our 31B dense model maximizes raw quality and provides a powerful foundation for fine-tuning.
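The latency advantage of the MoE design comes down to how few parameters are touched per token. A back-of-the-envelope sketch, using the parameter counts stated above and the common ~2 FLOPs per active parameter per token approximation (an assumption, not an official throughput figure):

```python
# Rough per-token compute comparison between the two model sizes.
moe_total_params = 26e9    # 26B Mixture of Experts (total parameters)
moe_active_params = 3.8e9  # parameters activated per token at inference
dense_params = 31e9        # 31B dense model (all parameters active)

active_fraction = moe_active_params / moe_total_params

# Forward-pass FLOPs per token scale with *active* parameters (~2 FLOPs each).
moe_flops_per_token = 2 * moe_active_params
dense_flops_per_token = 2 * dense_params

print(f"MoE activates {active_fraction:.1%} of its weights per token")
print(f"Dense does ~{dense_flops_per_token / moe_flops_per_token:.1f}x "
      f"more compute per token")
```

Under this approximation the MoE model touches under 15% of its weights per token, which is where the tokens-per-second advantage comes from; the dense model spends roughly 8x the compute per token in exchange for maximum quality.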