## Introduction to Google’s AI Plans
On a recent episode of Possible, a podcast co-hosted by LinkedIn co-founder Reid Hoffman, Demis Hassabis, CEO of Google DeepMind, said that Google plans to combine its Gemini AI models with its Veo video-generating models. The goal of the integration is to improve Gemini’s understanding of the physical world.
Hassabis noted that “We’ve always built Gemini, our foundation model, to be multimodal from the beginning,” an approach driven by the vision of a “universal digital assistant” that can help in the real world.
## The Evolution of AI Models
The AI industry is steadily moving toward “omni” models, which can understand and synthesize many forms of media. Google’s latest Gemini models can generate audio, images, and text. Similarly, OpenAI’s default model in ChatGPT can natively create images, including Studio Ghibli-style art, and Amazon has announced plans to launch an “any-to-any” model later this year.
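For context, Google already exposes this kind of multimodal capability through its public Gemini API. The snippet below is a minimal sketch, not an official recipe: it assumes the `google-generativeai` Python SDK, a valid `GOOGLE_API_KEY` environment variable, an illustrative model name, and a hypothetical local image file. It passes an image and a text prompt in a single request and reads back a text answer, which is the sort of cross-media understanding an “omni” model implies.

```python
# Minimal sketch: sending mixed media (text + image) to a Gemini model.
# Assumes the google-generativeai SDK is installed and GOOGLE_API_KEY is set;
# the model name and file path below are illustrative, not prescriptive.
import os

import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

# A multimodal request mixes media types in one content list.
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(
    [
        "Describe what is physically happening in this frame.",
        Image.open("video_frame.jpg"),  # hypothetical local file
    ]
)

print(response.text)  # text answer grounded in the supplied image
```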
## Training Data for Omni Models
The development of omni models requires a vast amount of training data, including images, videos, audio, text, and more. Hassabis suggested that the video data for Veo is primarily sourced from YouTube, a platform owned by Google.
Hassabis explained, “By watching YouTube videos, [Veo 2] can figure out the physics of the world.”
Google previously told TechCrunch that its models “may be” trained on “some” YouTube content, in accordance with its agreement with YouTube creators. Google also reportedly broadened its terms of service last year, in part to let the company tap more data to train its AI models.