
Healthcare · Voice AI · Women's SRH · India · 2025 – 2026 · 2-3 months

Myna Voice AI

Myna Voice is a low-latency conversational voice AI that delivers phone-based sexual and reproductive health (SRH) care to low-literacy rural women.


Total Calls Completed: 5,269

Unique Callers: 588

Monthly Call Volume (May '26): 2,100+

Myna Voice AI product screens

The Problem

Rural women in India rely heavily on voice notes in their WhatsApp conversations with Myna Saheli, which reveals a strong preference for speaking over typing, especially among low-literacy users who struggle with text-based interfaces. The WhatsApp bot handles text and voice messages asynchronously, but many users needed real-time voice support for urgent health questions, follow-up calls, and appointment-related conversations. Traditional IVR systems were too rigid, and early voice AI experiments built on Exotel appflows took 3-4 seconds to respond even to simple greetings like 'hi' or 'hello', which broke the natural flow of conversation and reduced trust. Third-party platforms (Bolna AI, Millis AI, ElevenLabs) either missed the latency requirements or were prohibitively expensive at scale.

Our Approach

  1. Architected a custom voice AI pipeline to replace the high-latency Exotel appflows system: incoming audio → Sarvam AI STT (speech-to-text) → jargon filtering → OpenAI LLM with a RAG-based medical knowledge base → Sarvam AI TTS (text-to-speech) → Exotel voice delivery.

  2. Tried to cut latency by caching responses to generic greetings ('hi', 'hello'), but caching proved unreliable in natural conversation and could not cover the variability of real user queries.

  3. Evaluated and ultimately rejected Bolna AI, Millis AI, and ElevenLabs: each was too costly, too inflexible, or unable to meet the sub-second response requirement for real-time healthcare conversations.

  4. Rebuilt the entire voice infrastructure on Pipecat, chosen over LiveKit for its gentler learning curve and faster implementation. The rebuild achieved near-instant response times while preserving full conversation context and storing every call transcript in the database.

  5. Integrated WhatsApp as the trigger mechanism: a user sends a specific command in their Myna Saheli chat and the system places an outbound voice call to them, bridging text and voice channels within the same care journey.

  6. Implemented intent-based orchestration to route conversations dynamically across question answering, appointment scheduling, follow-up reminders, and crisis detection, with prompt guardrails that keep all responses focused on SRH topics.

  7. Built a unified analytics layer that stores every voice interaction in PostgreSQL and feeds Mixpanel tracking of call completion rates, average call duration, repeat engagement patterns, and per-language breakdowns (Hindi, Marathi, English).

  8. Deployed the voice AI on AWS with auto-scaling worker pools to handle peak call volumes (2,100+ calls/month by May 2026), maintaining sub-second latency during high-traffic appointment reminder campaigns.
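The pipeline and intent routing described in the steps above can be sketched roughly as follows. This is an illustrative sketch only, not the production code: the class and function names, the filler-word list, and the keyword-based intent matching are all assumptions made for the example, and the STT, LLM, and TTS stages are plain stub callables rather than real Sarvam AI, OpenAI, or Pipecat calls. The case study does not describe the actual filtering rules or intent classifier.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical filler words; the real "jargon filtering" rules are not documented.
FILLER_WORDS = {"um", "uh", "hmm"}

# Hypothetical keyword lists. A production system would more likely use the LLM
# or a trained classifier. Crisis comes first so it takes priority.
INTENT_KEYWORDS = {
    "crisis": ["bleeding", "emergency", "severe pain"],
    "appointment": ["appointment", "book", "schedule"],
    "follow_up": ["reminder", "follow up"],
}


def filter_transcript(text: str) -> str:
    """Lowercase the transcript and drop filler words."""
    return " ".join(w for w in text.lower().split() if w not in FILLER_WORDS)


def detect_intent(text: str) -> str:
    """Return the first intent whose keyword appears, else 'question'."""
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in text for k in keywords):
            return intent
    return "question"


@dataclass
class VoiceTurn:
    """One caller turn: audio in -> STT -> filter -> intent -> LLM -> TTS -> audio out."""
    stt: Callable[[bytes], str]
    llm: Callable[[str, str, List[str]], str]
    tts: Callable[[str], bytes]
    history: List[str] = field(default_factory=list)
    timings: Dict[str, float] = field(default_factory=dict)

    def _timed(self, name, fn, *args):
        # Record per-stage wall-clock time so slow stages are easy to spot.
        start = time.perf_counter()
        result = fn(*args)
        self.timings[name] = time.perf_counter() - start
        return result

    def handle(self, audio: bytes) -> bytes:
        text = filter_transcript(self._timed("stt", self.stt, audio))
        intent = detect_intent(text)
        reply = self._timed("llm", self.llm, intent, text, self.history)
        self.history += [text, reply]  # keep conversation context across turns
        return self._timed("tts", self.tts, reply)


# Stub stages standing in for the real speech and language services.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(intent: str, text: str, history: List[str]) -> str:
    return f"[{intent}] reply to: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


if __name__ == "__main__":
    turn = VoiceTurn(stt=fake_stt, llm=fake_llm, tts=fake_tts)
    print(turn.handle(b"um I want to book an appointment").decode())
    print(turn.timings)
```

Timing each stage separately is one way to find where a multi-second delay comes from, which is the kind of measurement a sub-second latency target depends on.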

Outcome

Myna Voice AI bridged the literacy and engagement gap, enabling 588 unique low-literacy women to access SRH care through 5,269 phone-based conversations. With about 9 calls per user on average and an average call duration of 104 seconds, the platform showed that voice-first healthcare can drive sustained engagement where text-based apps fail. Monthly call volume grew from pilot usage to over 2,100 calls by May 2026, reducing counselor workload for routine questions, improving follow-up completion rates, and enabling healthcare access for users without smartphones. This directly supports the foundation's mission to reach underserved women through the channel they trust most: their voice.
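The headline figures above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the per-user average is total calls divided by unique callers, and that the 104-second figure is a mean over all completed calls; the case study does not state how either number was computed, and the variable names are illustrative.

```python
total_calls = 5269       # total calls completed
unique_callers = 588     # unique callers
avg_call_seconds = 104   # reported average call duration

# Assumption: the per-user average is simply calls divided by callers.
calls_per_caller = total_calls / unique_callers

# Assumption: 104 s is the mean over all completed calls, so the product
# approximates cumulative talk time.
total_talk_hours = total_calls * avg_call_seconds / 3600

print(f"{calls_per_caller:.2f} calls per caller")            # 8.96 calls per caller
print(f"{total_talk_hours:.0f} hours of conversation")       # 152 hours of conversation
```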
