
Podcastle, the podcast recording and editing platform, has entered the AI-powered text-to-speech market with the launch of its own AI model, dubbed Asyncflow v1.0. The company will also release an API so that developers can integrate the text-to-speech model into their own applications.

With the new model, Podcastle now offers more than 450 AI voices that can narrate text. The company says it built the technology and model to minimize training and inference costs, which lets it price the service below competitors.

By entering this market, Podcastle joins a growing list of startups, including ElevenLabs, Speechify, and WellSaid, that have developed technologies and AI models to convert text into voice clips narrated by AI. These innovations have far-reaching applications across various industries, including marketing, advertising, content creation, education, and corporate training.

According to Podcastle’s founder, Arto Yeritsyan, the company had long wanted to build a text-to-speech model but was held back by the high development costs and the volume of data required. Recent advances in large language models, he said, removed both obstacles.

“We had always envisioned building a robust text-to-speech model since our inception. Unfortunately, the development costs were prohibitively high. Fortunately, the recent breakthroughs in large language models have enabled us to develop a high-quality voice model without needing vast amounts of data,” Yeritsyan explained.

The effort was also supported by the $13.5 million Series A round that Podcastle raised last year, which gave the company the resources to further develop its AI-powered offerings.

Podcastle is also undercutting rivals on price: it charges around $40 for 500 minutes of text-to-speech conversion, whereas ElevenLabs charges $99 for the same amount. That gap is likely to make Podcastle a more attractive option for businesses and individuals seeking affordable text-to-speech.
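To put the price gap in per-minute terms, here is a minimal sketch of the arithmetic. It assumes both quoted prices buy the same 500 minutes, as stated above, and treats Podcastle's "around $40" as exactly $40. The `PLANS` dictionary and the `cost_per_minute` helper are illustrative names, not part of either company's API.

```python
# Prices quoted in the article; Podcastle's figure is approximate ("around $40").
PLANS = {
    "Podcastle": {"price_usd": 40.0, "minutes": 500},
    "ElevenLabs": {"price_usd": 99.0, "minutes": 500},
}


def cost_per_minute(price_usd: float, minutes: int) -> float:
    """Return the effective price of one minute of generated speech."""
    return price_usd / minutes


# Effective per-minute rate for each provider.
rates = {name: cost_per_minute(**plan) for name, plan in PLANS.items()}

# Fraction by which Podcastle's rate is lower than ElevenLabs' rate.
savings = 1 - rates["Podcastle"] / rates["ElevenLabs"]

for name, rate in rates.items():
    print(f"{name}: ${rate:.3f} per minute")
print(f"Podcastle is about {savings:.0%} cheaper per minute")
```

Under these assumptions the rates work out to roughly $0.08 versus $0.20 per minute, so Podcastle comes in about 60% lower.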

Alongside the new text-to-speech model, Podcastle is upgrading its voice cloning feature. Training has been simplified: instead of recording approximately 70 sentences, users now need to provide only a few seconds of speech. The improvement is made possible by the integration of Podcastle’s Magic Dust AI, which enhances audio recording quality.


In our testing of the new voice cloning feature, the generated voice sounded somewhat robotic, although it successfully mimicked our tone. Podcastle has acknowledged this limitation and plans to improve the feature over time. Users can also train the feature on multiple samples of their voice to get different results.

Podcastle believes that its suite of audio, video, podcast, and AI-powered narration tools, all accessible through a redesigned website, will give it an advantage over competitors. Yeritsyan noted that while most users currently use Podcastle to create audio content, video is rapidly gaining popularity as well.

