Generative AI has so far centered on text-based interfaces for generating text, images, and other content. The next significant trend appears to be voice-based AI, and it is gaining momentum rapidly. Google has announced that it will integrate Chirp 3, its advanced speech-to-text and HD text-to-speech models, into its Vertex AI development platform, starting next week.

Previously, Google had quietly announced that Chirp 3 would introduce eight new voices across 31 languages. Potential applications include building voice assistants, creating audiobooks, developing support agents, and generating voice-overs for videos. This announcement was made at an event held at Google’s DeepMind offices in London.

Google’s efforts in voice AI coincide with similar advances by other companies. Recently, Sesame, the startup behind the highly realistic AI voices “Maya” and “Miles,” released its model so that developers can build customized apps and services on its technology.

Notably, Google will impose usage restrictions on Chirp 3 to prevent misuse. “We’re just working through some of these things with our safety team,” Thomas Kurian, CEO of Google Cloud, said at a news event.

ElevenLabs, another major startup in the AI voice services sector, has raised hundreds of millions of dollars in funding to expand its operations, which is likely to fuel further development of AI voice services across the industry.

Integrating Chirp 3 into Vertex AI will place it alongside Google’s other cutting-edge AI models, including newer versions of its flagship LLM, Gemini, as well as its image-generation model, Imagen, and its video-generation tool, Veo 2. The latter is a pricey service, costing $0.50 per second.

It remains to be seen whether Chirp 3 can produce voices as realistic as those generated by other AI models, such as Sesame’s. Demis Hassabis, CEO of DeepMind, emphasized that this is a long-term effort. “In the near term… this idea that [AI is] a silver bullet to everything in the next couple of years, I don’t see that happening just yet. We’re still quite a few years away from something like AGI happening,” he said.

Hassabis further noted that the impact of AI will be felt over the next decade, making this an interesting moment in time. As he said, “It’s going to change things… over the next decade, so the medium to longer term.”

Google launched Vertex AI in 2021 as a platform for developers to build machine learning services in the cloud. This was before the surge in interest in AI and generative AI, which followed the launch of OpenAI’s GPT services. Since then, the company has been focusing on Vertex AI as it attempts to catch up with other companies, such as Microsoft and Amazon, which are also developing generative AI tooling for developers.

In addition to building generative AI models using Gemini, developers can use Vertex AI to classify data, train models, and set up models for production. It will be interesting to see whether Google expands its walled garden to include models beyond those created by the company itself.

Google has been developing its “Chirp” voice services for years, dating back to its early efforts to compete with Amazon’s Alexa service. The company has been using the name “Chirp” as a code name for its voice-based initiatives since 2016.

