Google has introduced Gemini 2.5, a new line of AI reasoning models that deliberates before responding to a query, on Tuesday.
The company is launching Gemini 2.5 Pro Experimental, a cutting-edge multimodal reasoning AI model, to initiate the new model series. This model, which Google claims is its most advanced to date, will be accessible on Tuesday via the Google AI Studio developer platform and the Gemini app for subscribers to the company’s $20-per-month AI plan, Gemini Advanced.
Going forward, Google has announced that all of its upcoming AI models will incorporate reasoning capabilities.
Since OpenAI released the first AI reasoning model, o1, in September 2024, the tech industry has been working to match or surpass its capabilities with their own models. Currently, Anthropic, DeepSeek, Google, and xAI all possess AI reasoning models that utilize extra computing power and time to fact-check and reason through problems before providing an answer, similar to the initial AI reasoning model.
The integration of reasoning techniques has enabled AI models to achieve significant breakthroughs in math and coding tasks. Many in the tech world believe that reasoning models will be essential for AI agents, which are autonomous systems capable of performing tasks with minimal human intervention. However, these models are also more expensive to develop and maintain.
Google has previously experimented with AI reasoning models, including a “thinking” version of Gemini released in December. Nevertheless, Gemini 2.5 represents the company’s most concerted effort yet to outperform OpenAI’s “o” series of models.
According to Google, Gemini 2.5 Pro outperforms its previous frontier AI models, as well as some leading competing models, in several benchmark tests. Specifically, the company designed Gemini 2.5 to excel in creating visually appealing web apps and agentic coding applications.
In the Aider Polyglot evaluation, which measures code editing capabilities, Google claims that Gemini 2.5 Pro achieves a score of 68.6%, surpassing top AI models from OpenAI, Anthropic, and DeepSeek.
However, in the SWE-bench Verified test, which assesses software development abilities, Gemini 2.5 Pro scores 63.8%, outperforming OpenAI’s o3-mini and DeepSeek’s R1 but underperforming Anthropic’s Claude 3.7 Sonnet, which achieved a score of 70.3%.
On the Humanity’s Last Exam, a multimodal test comprising thousands of crowdsourced questions related to mathematics, humanities, and natural sciences, Google states that Gemini 2.5 Pro scores 18.8%, performing better than most rival flagship models.
Initially, Gemini 2.5 Pro will have a 1 million token context window, enabling the AI model to process approximately 750,000 words in a single instance, which is longer than the entire “Lord of The Rings” book series. Furthermore, Gemini 2.5 Pro will soon support double the input length, with a 2 million token capacity.
Google has not disclosed the API pricing for Gemini 2.5 Pro, but the company plans to share more information in the coming weeks.
Source Link