Voice AI has become the new battleground in the race to build the future of human-machine interaction, as evidenced by Meta’s recent acquisition of PlayAI and by surging investment: equity funding has reached $371M so far this year, already on par with the full-year 2024 total.
Investors and big tech alike are betting that voice will be the dominant interface for interacting with AI, enabling a move away from traditional browser and mobile interfaces toward natural conversational interaction.
Recent technological advancements have made this vision increasingly viable, with voice capabilities now delivering near-instantaneous responses with sub-300ms latency that matches human conversational flow. This speed breakthrough is critical to unlocking voice AI’s full potential, as Chris McCann at Race Capital, a backer of PlayAI, explains:
“Voice is how people naturally communicate – but most voice AI systems still sound robotic or have high latency in their responses. We believed fast, expressive voice tech would be critical to making AI feel human and useful in the enterprise, especially for IVR, customer support, and sales.”
With voice becoming an increasingly fundamental modality for the AI-powered future and big tech competing to win the AI device race, owning the building blocks that shape human-AI communication is becoming mission-critical. Expect a wave of acquisitions as companies scramble to secure voice AI capabilities.
Using CB Insights’ Mosaic score, which measures company health, we identified the top M&A targets in the voice AI space and what makes them compelling (see the graphic below).
- Voice synthesis platform ElevenLabs tops the market with a Mosaic score of 955, making it an attractive acquisition target. Proprietary voice generation technology is becoming as valuable as foundational AI models, positioning the highest-quality voice synthesis as core infrastructure rather than a feature add-on.
- Enterprise-focused Cresta delivers immediate ROI, with some customers reporting 50% cost reductions in contact centers. That positions it well for acquirers looking to use voice AI to improve enterprise productivity right away.
- Ultra-low-latency startups like Cartesia have an edge, as their ability to deliver sub-100ms latency positions them as essential for truly conversational AI experiences that match human conversation patterns.
Investors also see companies that own the full stack as having a key technological advantage over those relying on third-party components. This was part of the rationale for investing in PlayAI, according to Chris McCann of Race Capital:
“Most voice AI startups rely on open source or other third-party components. PlayAI built the full stack in-house—their own TTS engine, real-time streaming, and sub-100ms latency. That gave them full control and a clear technical edge, which let them power real-time agents for support, sales, and IVR across several Fortune 500s.”
As the AI arms race continues, acquisitions will remain focused on talent, tech, and infrastructure rather than existing revenues. Companies that secure advanced voice AI capabilities now will dominate the next phase of AI adoption, whether they integrate those capabilities into their existing offerings or cash in by selling the tooling to others.