Indian AI start-up Sarvam has launched a new version of its text-to-speech AI model with improvements in natural speech generation across Indian regions, scripts, and accents.
The new model, called Bulbul V3, offers more than 35 high-quality voices sourced from professional voice artists and supports 11 Indian languages, Sarvam said in a blog post on Thursday, February 5. The company plans to extend support to all 22 scheduled Indian languages in the near future.
Bulbul V3 is built on top of a large language model (LLM) that analyses text and converts it into AI-generated speech with prosodic elements such as pauses, emphasis, pacing, and tone modulation, making the output sound more natural. In low-latency streaming output mode, the AI model lets users generate and play back audio in real time.
“This is critical for conversational applications, live interactions, and any experience where responsiveness directly impacts user engagement,” Sarvam said. “Indian speech is complex by default. People switch languages mid-sentence. Accents vary by region. Names, abbreviations, and emotions matter as much as words. To work in India, voice has to handle all of this without breaking,” the start-up added.
The AI model also lets users clone and create custom AI-generated voices. The consent-based voice-cloning feature comes with built-in safeguards and is designed for high-volume enterprise use cases, according to Sarvam.
Bulbul V3 is Sarvam’s latest AI model launched as part of a planned 14-day rollout of AI tools, with one new release each day, in the run-up to the widely anticipated India-AI Impact Summit 2026 to be held in New Delhi later this month. Sarvam is also one of 12 start-ups and entities that have been selected by the Indian government to develop sovereign LLMs under the Rs 10,300-crore India AI Mission. These indigenously developed AI models are expected to be unveiled at the Summit, which will be held from February 16 to February 20, 2026.
For those looking to experiment with the new model, Bulbul V3 can be accessed via the Sarvam Dashboard. The company is also offering developers unlimited API access to the new AI voice-generation model until February 28, 2026.
As part of its testing, Sarvam said that Bulbul V3 was evaluated by an independent third party in a blind A/B human listening study across 11 languages. The test involved comparing paired audio samples generated by Bulbul V3 and competitors’ speech models using identical input text.
While ElevenLabs v3 alpha topped the list for audio quality, Bulbul V3 outperformed Cartesia Sonic-3 and other rival models in general (full-band) evaluations, Sarvam said. The company further claimed that its new AI model beat all other models in 8 kHz (telephony) evaluations.
Bulbul V3 also showed “the lowest rates of word skips and mispronunciations, while maintaining comparable performance on extra-content errors,” Sarvam said.
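For readers unfamiliar with how a blind A/B listening study is usually scored, here is a minimal sketch in Python. It is not Sarvam's or the evaluator's actual methodology, which the company has not detailed here. The function name `win_rates`, the model labels, and the sample judgements are all hypothetical placeholders, not results from the study. It assumes each listener judgement records the two models compared and which one was preferred, with `None` marking a tie.

```python
from collections import Counter


def win_rates(judgements):
    """Return each model's share of comparisons in which it was preferred.

    `judgements` is an iterable of (model_a, model_b, preferred) tuples,
    where `preferred` is model_a, model_b, or None for a tie.
    """
    wins = Counter()
    appearances = Counter()
    for model_a, model_b, preferred in judgements:
        appearances[model_a] += 1
        appearances[model_b] += 1
        if preferred is None:
            continue  # a tie counts as an appearance but not as a win
        if preferred not in (model_a, model_b):
            raise ValueError(f"{preferred!r} was not part of this comparison")
        wins[preferred] += 1
    return {model: wins[model] / appearances[model] for model in appearances}


# Hypothetical placeholder judgements (NOT data from the Bulbul V3 study).
sample = [
    ("model_x", "model_y", "model_x"),
    ("model_x", "model_y", "model_y"),
    ("model_x", "model_z", "model_x"),
    ("model_y", "model_z", None),
]

rates = win_rates(sample)
for model, rate in sorted(rates.items()):
    print(f"{model}: preferred in {rate:.0%} of its comparisons")
```

A real study would typically also report the number of listeners and confidence intervals, and would tally results separately per language and per audio condition, such as full-band versus 8 kHz telephony.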
Here’s a list of AI models and tools released by Sarvam in recent days:
Sarvam Vision: It is a 3-billion-parameter vision-language model capable of a range of visual understanding tasks, including image captioning, scene text recognition, chart interpretation, and complex table parsing.
Sarvam Samvaad: Conversational AI agents that can be integrated with customers’ enterprise tools in order to take action and deliver insights based on proprietary data.
Sarvam Audio: It is an audio extension of Sarvam 3B, a 3-billion-parameter language model pre-trained on English and 22 Indian languages.
Sarvam Dub: It is an AI dubbing model with zero-shot voice cloning and precise timing control. Powered by cross-lingual speech models, it allows creators to dub podcasts, educational courses, and other content in multiple Indian languages.