Monarch Voice API
A voice API for African AI systems that can speak and listen.
Today we’re releasing the Monarch Voice API.
Voice is becoming a core part of software. Products increasingly need to generate speech, transcribe audio, support voice notes, create spoken content, handle multilingual communication, and make interfaces more usable across a wider range of contexts. For many teams, this is no longer an edge case. It is part of the normal product surface.
Monarch Voice API is our voice layer for that shift.
It gives developers, startups, and businesses a way to build speech generation and transcription directly into their products and workflows. That includes customer support systems, education products, internal tools, media workflows, assistants, field operations, automation systems, and any product where voice input or output improves the experience.
This release expands what can be built through the API and makes it possible to treat voice as a normal application primitive.
Workflows
Monarch Voice API launches with support for three core voice workflows:
Speech generation from text
Speech-to-text transcription
Voice discovery and selection
These are foundational building blocks for a wide range of applications.
A product can generate spoken onboarding, explanations, responses, alerts, lessons, or narration from text. It can accept audio from users, customers, teams, or recorded systems and turn that into structured text for search, automation, review, analytics, or downstream processing. It can also select from a growing voice catalog depending on tone, use case, language, and audience.
This is useful across both direct product experiences and internal business workflows.
Samples
Here are a few voice samples to show how different voices and speech styles can be used across real products and workflows.
Sample 1
Sample 2
Sample 3
Voice is not only about whether speech can be generated or transcribed. It is also about tone, clarity, pacing, pronunciation, and fit for the actual use case. Listening to the voices directly makes it easier to choose what works for a given product, workflow, audience, or market.
We’ll continue improving language support, voice quality, pronunciation handling, and the overall usefulness of the API as more teams build with it.
Languages
Monarch Voice API launches with broad multilingual support across both speech generation and transcription.
That support covers a growing set of African languages. Thirteen are available at launch:
Swahili, Afrikaans, Chichewa, Hausa, Igbo, Somali, Luo, Wolof, Xhosa, Zulu, Northern Sotho, Ganda, and Amharic.
For speech generation, the current stack also supports 32 global languages:
English, Japanese, Chinese, German, Hindi, French, Korean, Portuguese, Italian, Spanish, Indonesian, Dutch, Turkish, Filipino, Polish, Swedish, Bulgarian, Romanian, Arabic, Czech, Greek, Finnish, Croatian, Malay, Slovak, Danish, Tamil, Ukrainian, Russian, Hungarian, Norwegian, and Vietnamese.
For transcription, the current stack supports 90+ languages, making it useful for multilingual audio workflows, uploaded recordings, voice-first interfaces, call analysis, internal operations, and products serving users across multiple language environments.
That matters because voice products are not only about raw language count. They are about whether the system can be useful in the actual linguistic environments people work in. Many businesses, products, and teams operate across a mix of major global languages and regionally important local ones. In practice, that means a useful voice API needs to support both broad international usage and more specific language realities.
Applications
Voice changes what kinds of software can be built and how software can be used.
For startups, it lowers the barrier to building products that speak, listen, transcribe, narrate, guide, and respond. That opens up new kinds of product experiences without requiring a team to build a full voice stack internally.
For businesses, it creates practical workflows that can save time, improve usability, and expand how information moves through the organization. Spoken customer input can become structured text. Internal recordings can become searchable. Text-heavy material can become audio. Support systems can work across both typed and spoken interactions. Products can become easier to use in environments where audio is more natural than typing.
For media and content workflows, voice makes it easier to turn scripts, articles, lessons, guides, or written materials into spoken assets. For operations and field workflows, it creates a way to collect and process spoken updates more efficiently. For education and training, it makes it easier to deliver content through both text and audio channels.
This is not a niche capability anymore. It is becoming part of the basic infrastructure of modern software.
The API
Monarch Voice API is now available through the following public endpoints:
POST /v1/voice/speech
Generate speech from text.
POST /v1/voice/transcribe
Transcribe audio into text.
GET /v1/voice/voices
List available voices.
These endpoints cover the main interaction patterns required to add voice output, voice input, and voice selection into real products and workflows.
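As a rough sketch of what calling these endpoints might look like from Python, the snippet below builds (but does not send) one request per endpoint using only the standard library. The paths and HTTP methods come from the list above. Everything else is an assumption made for illustration: the placeholder host https://api.example.com, bearer-token authentication, the JSON field names text and voice, and sending raw audio bytes with an audio/wav content type. The API docs are the authority on the actual request format.

```python
import json
import urllib.request

# Assumption: placeholder host; the real base URL is in the API docs.
BASE_URL = "https://api.example.com"


def _headers(api_key, content_type=None):
    # Assumption: bearer-token auth; the actual scheme may differ.
    headers = {"Authorization": f"Bearer {api_key}"}
    if content_type:
        headers["Content-Type"] = content_type
    return headers


def speech_request(api_key, text, voice):
    """Build a POST /v1/voice/speech request (body field names are hypothetical)."""
    body = json.dumps({"text": text, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/v1/voice/speech",
        data=body,
        headers=_headers(api_key, "application/json"),
        method="POST",
    )


def transcribe_request(api_key, audio_bytes, content_type="audio/wav"):
    """Build a POST /v1/voice/transcribe request (raw-bytes upload is an assumption)."""
    return urllib.request.Request(
        BASE_URL + "/v1/voice/transcribe",
        data=audio_bytes,
        headers=_headers(api_key, content_type),
        method="POST",
    )


def voices_request(api_key):
    """Build a GET /v1/voice/voices request."""
    return urllib.request.Request(
        BASE_URL + "/v1/voice/voices",
        headers=_headers(api_key),
        method="GET",
    )
```

Passing any of these to urllib.request.urlopen would send it. Keeping request construction separate from sending makes the shape of each call easy to inspect and adjust once the real parameters are known.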
Here’s how it works:
A developer sends text to the speech endpoint and receives generated audio.
A developer sends audio to the transcription endpoint and receives text.
A developer queries the voices endpoint to discover available voices and choose one.
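The voice selection step can be sketched as a simple filter over the catalog. This is a minimal illustration, assuming the voices endpoint returns a list of objects with id, language, and style fields. That response shape, the field names, and the sample voice ids are all hypothetical.

```python
def pick_voice(voices, language, style=None):
    """Return the id of the first voice matching the language (and style, if given)."""
    for voice in voices:
        # Compare languages case-insensitively; skip voices in other languages.
        if voice.get("language", "").lower() != language.lower():
            continue
        # Only filter on style when the caller asked for one.
        if style is not None and voice.get("style") != style:
            continue
        return voice["id"]
    return None


# Hypothetical catalog, shaped like an assumed /v1/voice/voices response.
catalog = [
    {"id": "voice-a", "language": "Swahili", "style": "narration"},
    {"id": "voice-b", "language": "Hausa", "style": "conversational"},
    {"id": "voice-c", "language": "Swahili", "style": "conversational"},
]

default_swahili = pick_voice(catalog, "swahili")
chatty_swahili = pick_voice(catalog, "Swahili", style="conversational")
```

The chosen id would then be passed along with the text when requesting speech.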
That is the core interaction model, but the practical range of what can be built on top of it is much broader.
A support product can accept voice notes, transcribe them, and route them through customer systems.
A startup can add spoken output to assistants, onboarding flows, or product experiences.
A learning product can convert written content into listenable lessons.
A content or media team can generate narration.
An internal workflow tool can turn recorded meetings, updates, or field reports into searchable text.
A business operating across different markets can support multilingual speech workflows.
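To make the first of these concrete, here is a minimal sketch of a support flow that transcribes a voice note and routes it by keyword. The transcription call is injected as a function so the example runs without network access. The routing table, queue names, and the stand-in transcriber are hypothetical and are not part of the API.

```python
from typing import Callable, Dict

# Hypothetical routing table: keyword -> support queue.
ROUTES: Dict[str, str] = {
    "refund": "billing",
    "invoice": "billing",
    "password": "account",
    "login": "account",
}


def route_voice_note(audio: bytes, transcribe: Callable[[bytes], str]) -> dict:
    """Transcribe a voice note and pick a queue from the first matching keyword."""
    text = transcribe(audio)
    queue = "general"  # fallback when no keyword matches
    for word in text.lower().split():
        keyword = word.strip(".,!?")
        if keyword in ROUTES:
            queue = ROUTES[keyword]
            break
    return {"text": text, "queue": queue}


def fake_transcribe(audio: bytes) -> str:
    # Stand-in for a real call to POST /v1/voice/transcribe.
    return "I need a refund for my last order."


ticket = route_voice_note(b"", fake_transcribe)
```

In a real integration, fake_transcribe would be replaced by a function that sends the audio to the transcription endpoint and returns the text from its response.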
Availability
Monarch Voice API is now live.
You can start using it now and explore the API docs to integrate speech generation, transcription, and voice workflows into your product.