In a rather unexpected move, Anthropic used the popular game Pokémon to test its latest AI model, demonstrating the model's capabilities in an unusual way.

According to a blog post released on Monday, Anthropic’s newest model, Claude 3.7 Sonnet, was put to the test on the classic Game Boy game Pokémon Red. The model was equipped with basic memory, screen pixel input, and function calls to press buttons and navigate the screen, enabling it to play Pokémon continuously.
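To make that setup concrete, here is a minimal sketch of such a loop: observe the screen, let the model pick a button through a function call, apply it, and keep a short memory. It is an illustration only, not Anthropic's harness. Every name in it (FakeEmulator, Agent, press_button, run) is hypothetical, the "screen" is a tiny dictionary rather than pixels, and a hard-coded rule stands in for the model.

```python
from collections import deque
from dataclasses import dataclass, field

# The eight Game Boy buttons the agent is allowed to press.
BUTTONS = ("up", "down", "left", "right", "a", "b", "start", "select")


class FakeEmulator:
    """Hypothetical stand-in for a Game Boy emulator (assumption, not a real API)."""

    def __init__(self):
        self.x, self.y = 0, 0

    def screen(self):
        # A real harness would return screen pixels; this returns a toy state.
        return {"x": self.x, "y": self.y}

    def press(self, button):
        if button not in BUTTONS:
            raise ValueError(f"unknown button: {button}")
        moves = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
        dx, dy = moves.get(button, (0, 0))  # non-directional buttons do not move
        self.x += dx
        self.y += dy


@dataclass
class Agent:
    """Placeholder for the model: basic memory plus a function-call style output."""

    memory: deque = field(default_factory=lambda: deque(maxlen=5))

    def choose_action(self, observation):
        # Stand-in policy: walk right to x == 3, then press A.
        button = "right" if observation["x"] < 3 else "a"
        self.memory.append((observation, button))  # remember recent steps only
        return {"name": "press_button", "arguments": {"button": button}}


def run(agent, emulator, max_steps):
    """Observe, decide, act; return the list of buttons pressed."""
    log = []
    for _ in range(max_steps):
        call = agent.choose_action(emulator.screen())
        if call["name"] == "press_button":
            button = call["arguments"]["button"]
            emulator.press(button)
            log.append(button)
    return log
```

Calling run(Agent(), FakeEmulator(), 5) steps the toy emulator five times. In a real harness the rule inside choose_action would be replaced by a call to the model, and each returned function call would count as one of the actions the blog post tallies.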

A distinctive feature of Claude 3.7 Sonnet is “extended thinking,” which lets it tackle complex problems by applying more computing power and taking more time, much like OpenAI’s o3-mini and DeepSeek’s R1.

That capability proved particularly useful in Pokémon Red.

Whereas an earlier version, Claude 3.0 Sonnet, could not get beyond the starting house in Pallet Town, Claude 3.7 Sonnet defeated three Pokémon gym leaders and earned their badges.

Anthropic Pokemon Red
Image Credits: Anthropic

It is unclear how much computing Claude 3.7 Sonnet needed to reach these milestones, or how long each one took. Anthropic did say that the model performed 35,000 actions to reach the last of the three gym leaders, Lt. Surge.

A developer will likely work out those figures before long.

While Pokémon Red serves as more of a toy benchmark, there is a long history of using games for AI benchmarking purposes, with several new apps and platforms emerging in recent months to test models’ game-playing abilities on various titles, including Street Fighter and Pictionary.

