Google’s recently launched Gemini 2.5 Pro has firmly established itself by taking the top spot in the WebDev Arena, a platform for benchmarking the performance of AI models in coding. This remarkable result comes as Google continues its ambitious quest to position this AI model as a leader in both coding and reasoning tasks.
Since its release earlier this year, Gemini 2.5 Pro has not only held the number one spot but has done so across a variety of categories, including coding, style control, and creative writing. The model boasts a context window of 1 million tokens, soon to be expanded to 2 million. This allows it to tackle complex codebases and challenging projects that would overwhelm even its strongest competitors. In comparison, robust models like ChatGPT and Claude 3.7 Sonnet can only handle up to 200K tokens.
What really sets Gemini apart is that it has the highest ‘IQ’ of any AI model tested. TrackingAI subjected the model to formal MENSA testing using standardized questions. Gemini 2.5 Pro outperformed its competitors, even on custom questions that do not appear in publicly available training data. With an IQ of 115 in offline tests, it is among the ‘smartest of the smart’, as average human intelligence ranges between 85 and 114. However, this should be taken with a grain of salt, as AI systems do not have the same kind of intelligence as humans.
When it comes to benchmarks designed specifically for AI, Gemini 2.5 Pro scored an impressive 86.7% on the AIME 2025 math test and 84.0% on the GPQA science assessment. On Humanity's Last Exam (HLE), the model scored 18.8%, putting it well ahead of OpenAI's o3-mini (14%) and Claude 3.7 Sonnet (8.9%).
Gemini 2.5 Pro is now available for free (with limitations) to all users, and Google has described this release as an “experimental version” in a family of “thinking models” designed to reason through answers rather than simply generate text. While it doesn't win every benchmark, the model is attracting attention from developers for its versatility. For example, it generated as many as 1,000 lines of code to fix a broken piece of HTML5 code, showing better quality and understanding than Claude 3.7 Sonnet.
For working developers, Gemini 2.5 Pro costs $2.50 per million input tokens and $15.00 per million output tokens, making it a cost-effective choice compared to its competitors while offering impressive capabilities. The model can handle up to 30,000 lines of code in its advanced plan, making it suitable for enterprise-level projects. Additionally, its multimodal capabilities (working with text, code, audio, images, and video) are a rock-solid asset that other models can’t match.
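To make the per-token pricing concrete, here is a minimal Python sketch of the arithmetic. The two prices are the ones quoted above. The function name `estimate_cost` and the example token counts are hypothetical, chosen only for illustration, and actual billing may differ (for instance, if prices vary by prompt size or plan).

```python
from decimal import Decimal

# Prices quoted in the article, in US dollars per one million tokens.
INPUT_PRICE_PER_MILLION = Decimal("2.50")
OUTPUT_PRICE_PER_MILLION = Decimal("15.00")
TOKENS_PER_MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated dollar cost of a single request.

    Hypothetical helper: it applies the two flat rates above and
    ignores any tiering, caching, or free-tier allowances.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = Decimal(input_tokens) / TOKENS_PER_MILLION * INPUT_PRICE_PER_MILLION
    output_cost = Decimal(output_tokens) / TOKENS_PER_MILLION * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical request: a 400,000-token codebase in, a 20,000-token patch out.
cost = estimate_cost(400_000, 20_000)
print(f"${cost:.2f}")  # prints "$1.30"
```

In this example the input side costs $1.00 and the output side $0.30. Because output tokens are priced six times higher than input tokens, long generated answers can dominate the bill even when the prompt is much larger.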
With a dash of humor: who would have thought that AI could not only code, but also get creative with a dozen other media?
What makes Gemini 2.5 Pro so special compared to other AI models?
With its massive context window of up to two million tokens, it can handle larger and more complex projects than its competitors, which sets it apart in the coding arena.
How does Gemini's IQ score compare to human intelligence?
Gemini has an IQ score of 115, which means it scores above the average human level, but this is more of a metaphor for achievement than a direct comparison with human intelligence.
Is Gemini 2.5 Pro available for free for all users?
Yes, the model is available for free with certain usage restrictions, making it accessible to a wide range of developers.