Google DeepMind recently announced a new line of Gemini models designed specifically for robotics. The first, Gemini Robotics, is a vision-language-action (VLA) model: it takes natural language and images as input and generates actions, enabling robots to carry out physical movements and tasks. The second, Gemini Robotics-ER, is a reasoning model with improved spatial understanding, such as recognizing objects and their parts in 3D space.
See what robots can do with these Gemini models, from folding origami and preparing lunches to spelling out words with Scrabble tiles.