Google Introduces Faster Gemma 4 AI Models With Multi-Token Prediction

 


Google has announced a major performance upgrade for its Gemma 4 family of open AI models: new Multi-Token Prediction (MTP) drafters, designed to significantly speed up inference without reducing output quality.

According to Google, the new MTP technology can deliver up to three times faster generation by using speculative decoding, which drafts several tokens ahead and checks them together instead of generating text strictly one token at a time.

The company said the upgrade addresses one of the biggest challenges in AI inference: latency caused by memory bandwidth limitations. Traditional large language models often spend more time moving parameters between memory and processors than performing actual computations.
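A rough back-of-the-envelope estimate shows why bandwidth, not arithmetic, tends to set the pace. The sketch below assumes every parameter is read from memory once per generated token and that a forward pass costs about 2 floating-point operations per parameter, a common approximation for dense models. All hardware and model figures are hypothetical, chosen only for illustration, and do not describe any Gemma 4 model or specific device.

```python
def decode_step_times(n_params, bytes_per_param, mem_bandwidth, peak_flops):
    """Estimate per-token time spent on memory traffic vs. arithmetic.

    Assumes all weights are read once per token and ~2 FLOPs per parameter.
    Returns (memory_seconds, compute_seconds).
    """
    memory_s = n_params * bytes_per_param / mem_bandwidth
    compute_s = 2 * n_params / peak_flops
    return memory_s, compute_s


# Hypothetical figures: an 8-billion-parameter model stored in 16-bit weights,
# on hardware with 1 TB/s of memory bandwidth and 100 TFLOP/s of peak compute.
memory_s, compute_s = decode_step_times(
    n_params=8e9,
    bytes_per_param=2,
    mem_bandwidth=1e12,
    peak_flops=100e12,
)
print(f"memory-bound time per token:  {memory_s * 1e3:.2f} ms")
print(f"compute-bound time per token: {compute_s * 1e3:.2f} ms")
```

Under these assumed numbers, moving the weights takes far longer than the arithmetic itself, which is the gap that verifying several tokens per pass is meant to exploit.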

With MTP drafters, a lightweight “drafter” model predicts several future tokens in advance while the larger Gemma 4 target model verifies them in parallel. Because the target model keeps only the drafted tokens it agrees with, this approach allows AI systems to generate longer sequences much faster while maintaining the same reasoning and accuracy.
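The draft-and-verify loop can be sketched in a few lines. This is a minimal illustration of greedy speculative decoding in general, not Google's implementation: the two "models" are toy functions over integer tokens, and every name here (`speculative_decode`, `toy_target`, `toy_drafter`, the draft length `k`) is hypothetical. In a real system the verification step would be a single batched forward pass of the target model rather than a Python loop.

```python
from typing import Callable, List, Tuple

NextTokenFn = Callable[[List[int]], int]


def greedy_decode(target_next: NextTokenFn, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Baseline: one target-model call per generated token."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        out.append(target_next(out))
    return out


def speculative_decode(
    target_next: NextTokenFn,
    draft_next: NextTokenFn,
    prompt: List[int],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, number of verification passes)."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new_tokens:
        # 1. The cheap drafter proposes k tokens, one after another.
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_next(out + draft))

        # 2. The target checks every drafted position (one batched pass in practice).
        passes += 1
        accepted: List[int] = []
        for i in range(k):
            expected = target_next(out + draft[:i])
            if draft[i] == expected:
                accepted.append(draft[i])
            else:
                # First mismatch: keep the target's own token and discard the rest.
                accepted.append(expected)
                break
        else:
            # Every draft was accepted, so the same pass yields one extra token.
            accepted.append(target_next(out + draft))
        out.extend(accepted)
    return out[: len(prompt) + max_new_tokens], passes


def toy_target(ctx: List[int]) -> int:
    """Toy 'large' model: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_drafter(ctx: List[int]) -> int:
    """Toy 'small' model: agrees with the target except right after a 5."""
    return 0 if ctx[-1] == 5 else (ctx[-1] + 1) % 10


tokens, passes = speculative_decode(toy_target, toy_drafter, [0], max_new_tokens=12)
print(tokens, "verification passes:", passes)
```

Since every kept token is one the target would have chosen anyway, the output matches plain greedy decoding with the target alone; the saving comes from needing fewer target passes than generated tokens whenever the drafter guesses well.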

Google stated that the technology is especially useful for developers building real-time AI applications such as coding assistants, autonomous AI agents, voice systems, and mobile AI experiences running directly on devices.

The company highlighted several benefits of the new system, including faster local AI development on consumer GPUs, improved responsiveness for chat and voice applications, enhanced battery efficiency on edge devices, and zero degradation in AI reasoning quality.

Google also revealed that its Gemma 4 models have already surpassed 60 million downloads just weeks after launch, reflecting growing adoption among developers and researchers worldwide.

The MTP drafters are now available under the same open-source Apache 2.0 license as Gemma 4, allowing developers to access the technology through platforms including Hugging Face, Kaggle, Ollama, MLX, vLLM, and Google AI Edge Gallery for Android and iOS devices.