On Tuesday, Google announced Gemini 2.5, a new family of AI reasoning models that pause to “think” before answering a question.
To kick off the new family of models, Google is introducing Gemini 2.5 Pro Experimental, a multimodal reasoning AI model that the company claims is its most intelligent yet. The model will be available on Tuesday in the company’s developer platform, Google AI Studio, as well as in the Gemini app for subscribers to the company’s $20-a-month AI plan, Gemini Advanced.
Going forward, Google said all of its new AI models will have reasoning capabilities baked in.
Since OpenAI introduced the first AI reasoning model, o1, in September 2024, the tech industry has raced to match or surpass that model’s capabilities. Today, Anthropic, DeepSeek, Google, and xAI all have AI reasoning models, which use more computing power and time to fact-check and reason through problems before delivering a solution.
Reasoning techniques have helped AI models reach new heights in math and coding tasks. Many in the tech world believe reasoning models will be a key component of AI agents, autonomous systems that can perform tasks largely without human intervention. However, these models are also more expensive to run.
Google has experimented with AI reasoning models before, previously releasing a “thinking” version of Gemini in December. But Gemini 2.5 represents the company’s most serious attempt yet at besting OpenAI’s “o” series of models.
Google claims that Gemini 2.5 Pro outperforms its previous frontier AI models, as well as some of the leading competing AI models, on several benchmarks. Specifically, Google said it designed Gemini 2.5 to excel at creating visually compelling web apps and agentic coding applications.
On an evaluation measuring code editing, called Aider Polyglot, Google said Gemini 2.5 Pro scores 68.6%, outperforming top AI models from OpenAI, Anthropic, and Chinese AI lab DeepSeek.
However, on another test measuring software development abilities, SWE-bench Verified, Gemini 2.5 Pro scores 63.8%, outperforming OpenAI’s o3-mini and DeepSeek’s R1, but underperforming Anthropic’s Claude 3.7 Sonnet, which scored 70.3%.
On Humanity’s Last Exam, a multimodal test consisting of thousands of crowdsourced questions related to mathematics, the humanities, and the natural sciences, Google says Gemini 2.5 Pro scores 18.8%, performing better than most rival flagship models.
At launch, Google said Gemini 2.5 Pro ships with a 1-million-token context window, meaning the AI model can take in roughly 750,000 words in a single go. That’s longer than the entire “Lord of the Rings” book series. And soon, Gemini 2.5 Pro will support double that input length (2 million tokens).
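The 750,000-word figure follows from a common rule of thumb for English text of roughly 0.75 words per token; that ratio is an assumption here (actual ratios vary by tokenizer and content), not a number Google published. A minimal sketch of the arithmetic:

```python
def estimated_words(tokens: int, words_per_token: float = 0.75) -> int:
    """Rough word-count estimate for a given token budget.

    words_per_token defaults to 0.75, a common rule of thumb for
    English text; real ratios depend on the tokenizer and content.
    """
    return int(tokens * words_per_token)

# 1M-token context window -> roughly 750,000 words
print(estimated_words(1_000_000))  # 750000
# Planned 2M-token window -> roughly 1.5 million words
print(estimated_words(2_000_000))  # 1500000
```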
Google didn’t publish API pricing for Gemini 2.5 Pro. The company said it’ll share more in the coming weeks.