unrealspeech.com
API Docs
Pricing
Studio
Blog
High Volume Inquiry
Sign In
Fast & Affordable Text-to-Speech API
11x cheaper than ElevenLabs
Stream audio in 300ms
Request up to 10-hour audio
Includes per-word timestamps
Get a Free API Key →
Live Demo
Sample texts: Non-Fiction, Fiction, News, Blog, Conversation
Amid the intricate labyrinth of human neurons lies a molecule that has confounded and fascinated scientists for ages: the neurotransmitter known as dopamine. Often heralded as the pleasure molecule, dopamine's role is far more nuanced than just mediating euphoria.
Voice: Sierra
Available voices: Autumn, Melody, Hannah, Emily, Ivy, Kaitlyn, Luna, Willow, Lauren, Sierra, Noah, Jasper, Caleb, Ronan, Ethan, Daniel, Zane, Rowan
Language: 🇺🇸 US English
Available languages: 🇺🇸 US English, 🇬🇧 UK English, 🇨🇳 Mandarin Chinese, 🇮🇳 Hindi, 🇪🇸 Spanish, 🇧🇷 Portuguese, 🇯🇵 Japanese, 🇫🇷 French, 🇮🇹 Italian
Format: Standard
Available formats: Standard (MP3), Phone Call (PCM µ-law)
Normal
Speed
1.0
Pitch
Bitrate: 192k
Available bitrates: 192k, 128k, 96k, 64k, 48k, 32k, 16k
Per-word Timestamps
Highlight words in sync with the speech.
Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU.
How To Get Timestamps
NEW: Use a websocket connection to stream both audio and timestamps using /streamWithTimestamps
Use /speech or /synthesisTasks with TimestampType set to word or sentence (see docs).
The response will have a TimestampsUri containing JSON like this:
[
  {
    "word": "Our",
    "start": 0.5,
    "end": 0.6666666666666667,
    "text_offset": 0
  },
  {
    "word": "model",
    "start": 0.6666666666666667,
    "end": 1.075,
    "text_offset": 4
  },
  {
    "word": "achieves",
    "start": 1.075,
    "end": 1.5166666666666666,
    "text_offset": 10
  },
  ...
]
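To highlight words in sync with playback, a player needs to map the current playback position to one of these entries. The following is a minimal sketch of that lookup, not the site's own demo code. It assumes that `start` and `end` are in seconds and that `text_offset` is a character index into the submitted text, which is consistent with the sample above but is not stated on this page. The helper names `word_at` and `highlight_span` are hypothetical.

```python
from bisect import bisect_right

# Sample entries copied from the JSON shown above.
TIMESTAMPS = [
    {"word": "Our", "start": 0.5, "end": 0.6666666666666667, "text_offset": 0},
    {"word": "model", "start": 0.6666666666666667, "end": 1.075, "text_offset": 4},
    {"word": "achieves", "start": 1.075, "end": 1.5166666666666666, "text_offset": 10},
]


def word_at(timestamps, playback_seconds):
    """Return the entry being spoken at playback_seconds, or None during silence.

    Assumes the list is sorted by "start", as in the sample.
    """
    starts = [entry["start"] for entry in timestamps]
    index = bisect_right(starts, playback_seconds) - 1
    if index < 0:
        return None  # playback has not reached the first word yet
    entry = timestamps[index]
    return entry if playback_seconds < entry["end"] else None


def highlight_span(entry):
    """Character range (begin, end) of the word inside the original text.

    Assumes text_offset is a character index (an inference from the sample).
    """
    begin = entry["text_offset"]
    return begin, begin + len(entry["word"])
```

In a UI, `word_at` would be called on each playback tick and the returned span passed to whatever component renders the highlighted text.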
API Docs →
Demo Code →
The more you use it, the cheaper it gets
Start for free. Stay for discounts.
Number of Characters
Per month
625M
Audio Duration
Estimated
~14K hours
Enterprise Plan
Includes 625M characters
$4999
a month
High Volume Inquiry
Additional Usage
$8 per 1M characters
—
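Comparison: Unreal Speech vs Amazon
Providers available for comparison: Amazon, Microsoft, Google, ElevenLabs, Play.ht
Price: Unreal Speech $4999 a month; Amazon $10.0K a month
Unreal Speech scores (voice: Scarlett; other option: Dan): Fiction 4.72, Non-Fiction 4.37, Conversation 3.91
Amazon scores (voice type: Neural; other option: Standard): Fiction 3.00, Non-Fiction 2.51, Conversation 2.63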
Price Comparison (chart not reproduced): Unreal, ElevenLabs, Play.ht, Amazon, Microsoft, Google
Calculated using public prices. Custom plans may be cheaper.
* 1 minute of audio = roughly 750 characters (~150 WPM)
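The audio duration estimates on this page follow from that footnote. As a rough sketch of the arithmetic (the function name is hypothetical, and 750 characters per minute is the page's own approximation, not a guarantee):

```python
CHARS_PER_MINUTE = 750  # from the footnote above (~150 words per minute)


def estimated_audio_hours(characters):
    """Approximate hours of audio produced from a number of characters."""
    minutes = characters / CHARS_PER_MINUTE
    return minutes / 60


# 625M characters is the Enterprise allowance shown above.
print(round(estimated_audio_hours(625_000_000)))  # 13889, displayed as ~14K hours
```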
Get Started for Free →
Need a custom solution? Contact us
"Unreal Speech saved us 75% on our text-to-speech cost. It sounds better than Amazon Polly, and is much
cheaper. We switched over at high volumes, and often processing 10,000+ pages per hour. Unreal was able to
handle the volume, while delivering a high quality listening experience."
Derek Pankaew — CEO,
Listening.com
7B
Characters per month
0.3s
Latency
99.9%
Uptime
Code Samples
Get started quickly with our simple text-to-speech API.
/stream
/speech
/synthesisTasks
/streamWithTimestamps
SDK: Python (also available: Node.js, React Native, Bash)
# Endpoint: /stream
# - Convert up to 1,000 characters ASAP
# - Synchronous, instant response (0.3s)
# - Streams back raw audio data (no timestamps)
import requests

response = requests.post(
    'https://api.v8.unrealspeech.com/stream',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY'
    },
    json={
        'Text': '''''',  # Up to 1,000 characters
        'VoiceId': '',  # af, af_bella, af_sarah, am_adam, am_michael, bf_emma, bf_isabella, bm_george, bm_lewis, af_nicole, af_sky
        'Bitrate': '192k',  # 320k, 256k, 192k, ...
        'Speed': '0',  # -1.0 to 1.0
        'Pitch': '1',  # 0.5 to 1.5
        'Codec': 'libmp3lame',  # libmp3lame or pcm_mulaw
    }
)

with open('audio.mp3', 'wb') as f:
    f.write(response.content)
# Endpoint: /speech
# - Up to 3,000 characters
# - Synchronous, takes ~1s per 700 chars
# - Returns MP3 and JSON timestamp URLs
import requests

response = requests.post(
    'https://api.v8.unrealspeech.com/speech',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY'
    },
    json={
        'Text': '''''',  # Up to 3,000 characters
        'VoiceId': '',  # Scarlett, Dan, Liv, Will, Amy
        'Bitrate': '192k',  # 320k, 256k, 192k, ...
        'Speed': '0',  # -1.0 to 1.0
        'Pitch': '1',  # 0.5 to 1.5
        'TimestampType': 'sentence'  # word or sentence
    }
)

print(response.json())
# Endpoint: /synthesisTasks
# - Up to 500,000 characters
# - Asynchronous, takes ~1s per 800 chars
# - Returns a TaskId (use it to check status)
import requests

response = requests.post(
    'https://api.v8.unrealspeech.com/synthesisTasks',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY'
    },
    json={
        'Text': '''''',  # Up to 500,000 characters
        'VoiceId': '',  # Scarlett, Dan, Liv, Will, Amy
        'Bitrate': '192k',  # 320k, 256k, 192k, ...
        'Speed': '0',  # -1.0 to 1.0
        'Pitch': '1',  # 0.5 to 1.5
        'TimestampType': 'sentence',  # word or sentence
        # 'CallbackUrl': '',  # pinged when ready
    }
)

print(response.json())
# Endpoint: /streamWithTimestamps
# - Convert text with real-time word-level timestamps
# - Streams audio with precise word timing information
# - Perfect for word-by-word highlighting
import websocket
import json

audio_chunks = []
timestamps = []


def on_message(ws, message):
    try:
        data = json.loads(message)
        if data.get("type") == "progress" and "message" in data:
            msg_val = data["message"]
            if isinstance(msg_val, list):
                timestamps.extend(msg_val)
        elif data.get("type") == "complete":
            ws.close()
    except Exception:
        # Assume binary audio data
        audio_chunks.append(message)


def on_open(ws):
    config = {
        "Text": "",
        "VoiceId": "",
        "Model": "kokoro",
        "Codec": "libmp3lame",
        "SampleRate": 24000,
        "Speed": 0,
        "Bitrate": "192k",
        "Pitch": 1.0,
        "TimestampType": "word"
    }
    ws.send(json.dumps(config))


ws = websocket.WebSocketApp(
    "wss://api.v8.unrealspeech.com/streamWithTimestamps",
    header={"Authorization": "Bearer YOUR_API_KEY"},  # API key, as for the other endpoints
    on_open=on_open,
    on_message=on_message
)
ws.run_forever()
# Endpoint: /streamWithTimestamps (WebSocket), full example that saves its output
# - Convert text with real-time word-level timestamps
# - Streams audio with precise word timing information
# - Perfect for word-by-word highlighting
import websocket
import json

# Files to store output
audio_file = 'output.mp3'
timestamps_file = 'timestamps.json'
audio_chunks = []
timestamps = []


def on_message(ws, message):
    if isinstance(message, str):
        # Handle JSON messages (timestamps or status)
        data = json.loads(message)
        if data.get('type') == 'progress' and 'message' in data:
            # Add timestamp information
            msg_val = data['message']
            if isinstance(msg_val, list):
                timestamps.extend(msg_val)
        elif data.get('type') == 'complete':
            # Save the audio and timestamps when complete
            with open(audio_file, 'wb') as f:
                f.write(b''.join(audio_chunks))
            # Save timestamps (filter out empty ones)
            filtered = [ts for ts in timestamps if ts and 'word' in ts]
            with open(timestamps_file, 'w') as f:
                json.dump(filtered, f, indent=2)
            print("Audio saved to:", audio_file)
            print("Timestamps saved to:", timestamps_file)
            ws.close()
    else:
        # Handle binary audio data
        audio_chunks.append(message)


def on_open(ws):
    # Send configuration when connection opens
    config = {
        'Text': 'Welcome to Unreal Speech.',
        'VoiceId': '',
        'Model': 'kokoro',
        'Codec': 'libmp3lame',
        'SampleRate': 24000,
        'Bitrate': '192k',
        'Pitch': 1.0,
        'TimestampType': 'word'
    }
    ws.send(json.dumps(config))


# Connect to WebSocket endpoint
websocket.enableTrace(False)
ws = websocket.WebSocketApp(
    "wss://api.v8.unrealspeech.com/streamWithTimestamps",
    header={'Authorization': 'Bearer YOUR_API_KEY'},
    on_open=on_open,
    on_message=on_message
)

# Run WebSocket connection
ws.run_forever()
// Short endpoint: /stream
// - Up to 1,000 characters
// - Synchronous, instant response (0.3s+)
// - Streams back raw audio data
const axios = require('axios');
const fs = require('fs');

const headers = {
  'Authorization': 'Bearer YOUR_API_KEY',
};

const data = {
  'Text': '', // Up to 1,000 characters
  'VoiceId': '', // Scarlett, Dan, Liv, Will, Amy
  'Bitrate': '192k', // 320k, 256k, 192k, ...
  'Speed': '0', // -1.0 to 1.0
  'Pitch': '1', // 0.5 to 1.5
  'Codec': 'libmp3lame', // libmp3lame or pcm_mulaw
};

axios({
  method: 'post',
  url: 'https://api.v8.unrealspeech.com/stream',
  headers: headers,
  data: data,
  responseType: 'stream'
}).then(function (response) {
  response.data.pipe(fs.createWriteStream('audio.mp3'))
});
// Medium endpoint: /speech
// - Up to 3,000 characters
// - Synchronous, takes ~1s per 700 chars
// - Returns MP3 and JSON timestamp URLs
const axios = require('axios');

const headers = {
  'Authorization': 'Bearer YOUR_API_KEY',
};

const data = {
  'Text': '', // Up to 3,000 characters
  'VoiceId': '', // Scarlett, Dan, Liv, Will, Amy
  'Bitrate': '192k', // 320k, 256k, 192k, ...
  'Speed': '0', // -1.0 to 1.0
  'Pitch': '1', // 0.5 to 1.5
  'TimestampType': 'sentence', // word or sentence
};

axios({
  method: 'post',
  url: 'https://api.v8.unrealspeech.com/speech',
  headers: headers,
  data: data,
}).then(function (response) {
  console.log(JSON.stringify(response.data));
});
// Long endpoint: /synthesisTasks
// - Up to 500,000 characters
// - Asynchronous, takes ~1s per 800 chars
// - Returns a TaskId (use to check status)
const axios = require('axios');

const headers = {
  'Authorization': 'Bearer YOUR_API_KEY',
};

const data = {
  'Text': '', // Up to 500,000 characters
  'VoiceId': '', // Scarlett, Dan, Liv, Will, Amy
  'Bitrate': '192k', // 320k, 256k, 192k, ...
  'Speed': '0', // -1.0 to 1.0
  'Pitch': '1', // 0.5 to 1.5
  'TimestampType': 'sentence', // word or sentence
  //'CallbackUrl': '', // pinged when ready
};

axios({
  method: 'post',
  url: 'https://api.v8.unrealspeech.com/synthesisTasks',
  headers: headers,
  data: data,
}).then(function (response) {
  console.log(JSON.stringify(response.data));
});
Sample Response
{
  'SynthesisTask': {
    'CreationTime': '2023-09-01T15:05:22.15Z',
    'OutputUri': 'https://unreal-tts-live-demo.s3-us-west-1.amazonaws.com/d8ef514d.mp3',
    'RequestCharacters': 14,
    'TaskId': 'd8ef514d',
    'TaskStatus': 'scheduled',
    'TimestampsUri': 'https://unreal-tts-live-demo.s3-us-west-1.amazonaws.com/d8ef514d.json',
    'VoiceId': 'Scarlett'
  }
}
Check Task Status
Make a GET request with TaskId in the URL to check the status.
const axios = require('axios');

const headers = {
  'Authorization': 'Bearer YOUR_API_KEY',
};

axios({
  method: 'get',
  url: 'https://api.v8.unrealspeech.com/synthesisTasks/d8ef514d',
  headers: headers,
}).then(function (response) {
  console.log(JSON.stringify(response.data));
});
// Short endpoint: /stream
// - Up to 1,000 characters
// - Synchronous, instant response (0.3s+)
// - Streams back raw audio data
import React from "react";
import { View, Button, ActivityIndicator } from "react-native";
import { useUnrealSpeech, blobToDataURI } from "react-native-unrealspeech";
import { Audio } from "expo-av";

export default function App() {
  const { stream, requestState } = useUnrealSpeech("YOUR_API_KEY");

  const play = async (response: any) => {
    try {
      if (!response.ok) {
        throw new Error("Network response was not ok");
      }
      const blobData = await response.blob();
      const blob = new Blob([blobData], { type: "audio/mp3" });
      const uri = await blobToDataURI(blob);
      const { sound } = await Audio.Sound.createAsync({ uri }, {});
      console.log(uri);
      await sound.playAsync();
    } catch (error) {
      console.error("Error occurred while playing sound:", error);
    }
  };

  const handleSpeak = async () => {
    try {
      const response = await stream(
        "", // Up to 1,000 characters
        "", // Scarlett, Dan, Liv, Will, Amy
        "192k", // 320k, 256k, 192k, ...
        0, // Speed: -1.0 to 1.0
        1, // Pitch: 0.5 to 1.5
        "libmp3lame" // Codec: libmp3lame or pcm_mulaw
      );
      play(response);
    } catch (error) {
      console.error("Error occurred while handling speak:", error);
    }
  };

  // The JSX tags were lost in extraction; this markup is reconstructed from
  // the imports, and the button title is a placeholder.
  return (
    <View>
      {requestState === "loading" ? (
        <ActivityIndicator />
      ) : (
        <Button title="Speak" onPress={handleSpeak} />
      )}
    </View>
  );
}
// Medium endpoint: /speech
// - Up to 3,000 characters
// - Synchronous, takes ~1s per 700 chars
// - Returns MP3 and JSON timestamp URLs
import React from "react";
import { StatusBar } from "expo-status-bar";
import { StyleSheet, Button, View, ActivityIndicator } from "react-native";
import { useUnrealSpeech } from "react-native-unrealspeech";

export default function App() {
  const { speech, requestState } = useUnrealSpeech("YOUR_API_KEY");

  const handleSpeak = async () => {
    const mySpeech = await speech(
      "", // Up to 3,000 characters
      "", // Scarlett, Dan, Liv, Will, Amy
      "192k", // 320k, 256k, 192k, ...
      "sentence" // word or sentence
    );
    console.log(mySpeech);
  };

  // The JSX tags were lost in extraction; this markup is reconstructed from
  // the imports and styles, and the button title is a placeholder.
  return (
    <View style={styles.container}>
      {requestState === "loading" ? (
        <ActivityIndicator />
      ) : (
        <Button title="Speak" onPress={handleSpeak} />
      )}
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: "#fff",
    alignItems: "center",
    justifyContent: "center"
  }
});
// Long endpoint: /synthesisTasks
// - Up to 500,000 characters
// - Asynchronous, takes ~1s per 800 chars
// - Returns a TaskId (use to check status)
import React from "react";
import { StatusBar } from "expo-status-bar";
import { StyleSheet, Button, View, ActivityIndicator } from "react-native";
import { useUnrealSpeech } from "react-native-unrealspeech";

export default function App() {
  const {
    createSynthesisTask,
    getSynthesisTaskStatus,
    speech,
    stream,
    status,
    requestState
  } = useUnrealSpeech("YOUR_API_KEY");

  const handleSpeak = async () => {
    const task = await createSynthesisTask(
      "", // Up to 500,000 characters
      "", // Scarlett, Dan, Liv, Will, Amy
      "192k", // 320k, 256k, 192k, ...
      "sentence" // word or sentence
    );
    if (status) {
      const response = await getSynthesisTaskStatus(status);
      console.log(response);
    }
  };

  // The JSX tags were lost in extraction; this markup is reconstructed from
  // the imports and styles, and the button title is a placeholder.
  return (
    <View style={styles.container}>
      {requestState === "loading" ? (
        <ActivityIndicator />
      ) : (
        <Button title="Speak" onPress={handleSpeak} />
      )}
    </View>
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    backgroundColor: "#fff",
    alignItems: "center",
    justifyContent: "center"
  }
});
# Short endpoint: /stream
# - Up to 1,000 characters
# - Synchronous, instant response (0.3s+)
# - Streams back raw audio data
curl -X POST "https://api.v8.unrealspeech.com/stream" -H "Content-Type: application/json" -H "Authorization: Bearer YOUR_API_KEY" --data '{"Text": "", "VoiceId": "", "Bitrate": "128k", "Speed": "0", "Pitch": "1", "Codec": "libmp3lame"}' --output audio.mp3
# Medium endpoint: /speech
# - Up to 3,000 characters
# - Synchronous, takes ~1s per 700 chars
# - Returns MP3 and JSON timestamp URLs
curl -X POST "https://api.v8.unrealspeech.com/speech" -H "Content-Type: application/json" -H "Authorization: Bearer YOUR_API_KEY" --data '{"Text": "", "VoiceId": "", "Bitrate": "192k", "Speed": "0", "Pitch": "1", "TimestampType": "sentence"}'
# Long endpoint: /synthesisTasks
# - Up to 500,000 characters
# - Asynchronous, takes ~1s per 800 chars
# - Returns a TaskId (use to check status)
curl -X POST "https://api.v8.unrealspeech.com/synthesisTasks" -H "Content-Type: application/json" -H "Authorization: Bearer YOUR_API_KEY" --data '{"Text": "", "VoiceId": "", "Bitrate": "128k", "Speed": "0", "Pitch": "1", "TimestampType": "sentence"}'
Sample Response
{
  'SynthesisTask': {
    'CreationTime': '2023-09-01T15:05:22.15Z',
    'OutputUri': 'https://unreal-tts-live-demo.s3-us-west-1.amazonaws.com/d8ef514d.mp3',
    'RequestCharacters': 14,
    'TaskId': 'd8ef514d',
    'TaskStatus': 'scheduled',
    'TimestampsUri': 'https://unreal-tts-live-demo.s3-us-west-1.amazonaws.com/d8ef514d.json',
    'VoiceId': 'Scarlett'
  }
}
Check Task Status
Make a GET request with TaskId in the URL to check the status.
curl -X GET "https://api.v8.unrealspeech.com/synthesisTasks/d8ef514d" -H "Content-Type: application/json" -H "Authorization: Bearer YOUR_API_KEY"
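Because /synthesisTasks is asynchronous, a client that does not use CallbackUrl has to repeat the status check until the task finishes. Below is a minimal Python sketch of such a polling loop. The endpoint URL and the response shape come from the samples above. The terminal status names in `TERMINAL_STATUSES`, the polling interval, and the helper names are assumptions, since this page only shows the `scheduled` status.

```python
import time

# Assumption: this page only shows 'scheduled'; adjust to the values in the API docs.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_task(fetch_status, task_id, interval_seconds=2.0,
                  max_attempts=30, sleep=time.sleep):
    """Poll until the task reaches a terminal status or attempts run out.

    fetch_status(task_id) must return the parsed JSON body of
    GET /synthesisTasks/<TaskId>, shaped like the sample response above.
    """
    for attempt in range(max_attempts):
        task = fetch_status(task_id)["SynthesisTask"]
        if task["TaskStatus"] in TERMINAL_STATUSES:
            return task
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"task {task_id} still pending after {max_attempts} checks")


def fetch_status_http(task_id, api_key="YOUR_API_KEY"):
    """Real fetcher: the same GET request as the curl command above."""
    import requests  # imported lazily so the polling logic has no hard dependency

    response = requests.get(
        f"https://api.v8.unrealspeech.com/synthesisTasks/{task_id}",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    response.raise_for_status()
    return response.json()
```

A typical call would be `task = wait_for_task(fetch_status_http, "d8ef514d")`, after which `task["OutputUri"]` and `task["TimestampsUri"]` point at the results. Passing the fetcher in as an argument keeps the loop testable without network access.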
API Docs →
Ready to dive in?
Get a Free API Key →
Super Fast Text-to-Speech API
Build real-time apps. Generate long-form audio fast.
/stream
Play audio in
0.3s
/synthesisTasks
Create 10-hour audio in
15 min
Get Started for Free →
FAQ
Do you offer voices in other languages?
Yes, we provide 48 voices across 8 different languages, including US English, UK English, Mandarin Chinese, Hindi, Spanish, Portuguese, Japanese, French and Italian.
Can I create custom voices (voice cloning)?
Not right now, but we're working on it!
What happens if I use all of my monthly characters?
Additional usage over the monthly allowance will be charged daily at the rate of your current plan:
Basic – $16 per 1M characters
Plus – $12 per 1M characters
Pro – $10 per 1M characters
Enterprise – $8 per 1M characters
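As a worked example of these rates, the sketch below computes the overage charge for a month. The rates are taken from the list above. The function name is hypothetical, and the calculation assumes a simple linear charge per character over the allowance, ignoring rollover and the daily billing schedule.

```python
# Overage rates from the list above, in dollars per 1M characters.
OVERAGE_RATE_PER_MILLION = {"Basic": 16, "Plus": 12, "Pro": 10, "Enterprise": 8}


def overage_charge(plan, characters_used, characters_included):
    """Dollars owed for usage beyond the monthly allowance."""
    extra = max(0, characters_used - characters_included)
    return extra / 1_000_000 * OVERAGE_RATE_PER_MILLION[plan]


# Enterprise includes 625M characters; 700M used leaves 75M over the allowance.
print(overage_charge("Enterprise", 700_000_000, 625_000_000))  # 600.0
```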
What happens to unused characters at the end of the month?
Free plan – Characters are reset on the 1st of every month.
Paid plan – Unused characters roll over to the next billing cycle.
Can I use generated audio commercially?
Yes, audio generated with Unreal Speech can be used commercially. The following terms apply, based on your subscription plan:
Free plan – You must attribute Unreal Speech when publishing audio by including a link to "unrealspeech.com" in the description.
Paid plan – You do not need to include any attribution.
How do I update my payment method?
Go to your Dashboard and choose "Manage Subscription".
How do I cancel my subscription?
You can cancel your subscription at any time. Go to your Dashboard and choose "Manage Subscription".
Do you have an affiliate program?
Yes! You can earn 15% recurring on all paid referrals. Click here to sign up.
Made with ❤️ in SF Powered by Kokoro TTS
API Docs
Pricing
Studio
Contact
Blog
Compare
Browse AI Apps
Affiliate
Recent Posts
Integrating FastSpeech 2 for Text-to-Speech Synthesis with Fairseq and Hugging Face
Exploring the Potential of GPT-SoVITS-Fork for Text-to-Speech Applications
Exploring the GPT-SoVITS Kancolle Zuikaku TTS Model: A Comprehensive Guide
Exploring Voice Synthesis with ESPnet: A Deep Dive into the kan-bayashi_csmsc_fastspeech Model
Introducing OpenVoice: Revolutionizing Text-to-Speech with Instant Voice Cloning and Multilingual Capabilities
How to Leverage Twelve Labs API for Effortless YouTube Video Summaries, Chapters, and Highlights
---
Pricing that Scales with You
Start for free. Stay for volume discounts.
Plan | Price | Characters per month | Audio (approx.)
Free | $0 | 250K | 6 hours
Basic | $4.99/mo for the first 6 months (regularly $49/mo) | 3M | 67 hours
Plus | $499/mo | 42M | 933 hours
Pro | $1499/mo | 150M | 3K hours
Enterprise | $4999/mo | 625M | 14K hours
Custom | High Volume Inquiry | 1B+ | Volume discounts
* Audio duration is approximate
* 1 minute of audio is roughly 750 characters
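To see how the volume discount works out, the sketch below computes the effective price per 1M included characters for each paid plan. Prices and allowances are the ones listed on this page. Using $49 for Basic assumes that is the regular price before the introductory discount. The names `PLANS` and `price_per_million` are illustrative.

```python
# (monthly price in dollars, included characters) for each paid plan listed above.
PLANS = {
    "Basic": (49, 3_000_000),  # assumed regular price, before the 6-month discount
    "Plus": (499, 42_000_000),
    "Pro": (1499, 150_000_000),
    "Enterprise": (4999, 625_000_000),
}


def price_per_million(plan):
    """Effective dollars per 1M included characters."""
    dollars, characters = PLANS[plan]
    return dollars / (characters / 1_000_000)


for name in PLANS:
    print(f"{name}: ${price_per_million(name):.2f} per 1M characters")
# Basic: $16.33 per 1M characters
# Plus: $11.88 per 1M characters
# Pro: $9.99 per 1M characters
# Enterprise: $8.00 per 1M characters
```

These figures land close to the per-plan overage rates quoted in the FAQ ($16, $12, $10 and $8 per 1M characters).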
---
Unreal Speech Studio
Language: 🇺🇸 US English
Available languages: 🇺🇸 US English, 🇬🇧 UK English, 🇨🇳 Mandarin, 🇮🇳 Hindi, 🇪🇸 Spanish, 🇧🇷 Portuguese, 🇯🇵 Japanese, 🇫🇷 French, 🇮🇹 Italian
Voice: Chloe (Female)
Welcome to Kokoro TTS Studio! This free, efficient text-to-speech application uses advanced AI to transform text into natural-sounding audio. You can type, paste, or edit text here.
It offers 48 natural-sounding AI voices across 8 languages: English (American and British), Chinese, Hindi, Spanish, Portuguese, Japanese, French and Italian.
For commercial use cases, please check out our API documentation.
Kokoro TTS Studio: Free Online Text-to-Speech Demo
Welcome to Kokoro TTS Studio powered by Unreal Speech - the ultimate playground for the revolutionary 82M parameter open-source text-to-speech engine! Simply type your text, choose from our extensive library of 48 natural-sounding voices across 8 languages, and instantly generate high-quality speech that rivals premium commercial services. Convert text to speech in your browser, right now and download your audio files for free.
What is Kokoro TTS: The Free Open-Source Speech Generator
Kokoro TTS is a groundbreaking open-source text-to-speech model that's revolutionizing the AI voice landscape. With a remarkably tiny footprint of just 82 million parameters (a fraction of what other models use), Kokoro delivers astonishingly natural speech synthesis that outperforms models 5-15× its size in both quality and speed.
Why Kokoro TTS Is Making Waves in the AI Community
Incredibly Compact Yet Powerful: At just 82M parameters, Kokoro is dramatically smaller than competing models like XTTS v2 (467M) and MetaVoice (1.2B), yet produces equal or better voice quality
Lightning-Fast Performance: Generates speech up to 210× real-time on GPU and 3-11× real-time on CPU—making it perfect for real-time applications and batch processing
Resource-Efficient Design: Runs smoothly on consumer hardware without requiring expensive cloud infrastructure or specialized equipment
Truly Open Source & Free: Licensed under Apache 2.0, allowing both commercial and non-commercial use with no restrictions
Award-Winning Quality: Achieved 1st place in the HuggingFace TTS Spaces Arena for single-speaker speech quality, beating models many times its size
Multilingual Support: Speaks multiple languages fluently, including English (US/UK), French, Hindi, Spanish, Japanese, Chinese, Italian, and Portuguese
Try the live demo above to experience the remarkable quality yourself! Simply type your text above, select a voice, and click "Generate" to hear Kokoro TTS in action.
Explore 48 Voices Across 8 Languages
Kokoro TTS Studio supports English (US/UK), French, Hindi, Spanish, Japanese, Chinese, Italian, and Portuguese. Browse our extensive library of realistic voices spanning multiple languages and accents. Each voice has been carefully trained to deliver natural intonation and clarity:
Display Name | Voice ID | Language | Gender | Description
Hannah | af_bella | English (US) | Female | American female with reserved personality
Kaitlyn | af_nicole | English (US) | Female | American female with whisper-like voice, looks casual
Lauren | af_sarah | English (US) | Female | American female, probably an educator; confident
Sierra | af_sky | English (US) | Female | American female with high level of composure
Noah | am_adam | English (US) | Male | American male with confident personality
Daniel | am_michael | English (US) | Male | American male with confident personality
Chloe | bf_emma | English (UK) | Female | British female
Amelia | bf_isabella | English (UK) | Female | British female with calm personality
Edward | bm_george | English (UK) | Male | British male, mature voice
Oliver | bm_lewis | English (UK) | Male | British male with confident personality
Élodie | ff_siwis | French | Female | Young French female voice
Ananya | hf_alpha | Hindi | Female | Young Hindi female voice
Priya | hf_beta | Hindi | Female | Young Hindi female voice
Arjun | hm_omega | Hindi | Male | Young Hindi male voice
Sakura | jf_alpha | Japanese | Female | Young Japanese female voice
Hana | jf_gongitsune | Japanese | Female | Young Japanese female voice
Haruto | jm_kumo | Japanese | Male | Young Japanese male voice
Lucía | ef_dora | Spanish | Female | Young Spanish female voice
Mateo | em_alex | Spanish | Male | Young Spanish male voice
Giulia | if_sara | Italian | Female | Young Italian female voice
Luca | im_nicola | Italian | Male | Young Italian male voice
Mei | zf_xiaobei | Chinese | Female | Young Chinese female voice
Lian | zf_xiaoni | Chinese | Female | Young Chinese female voice
Wei | zm_yunjian | Chinese | Male | Young Chinese male voice
Camila | pf_dora | Portuguese | Female | Young Portuguese female voice
Thiago | pm_alex | Portuguese | Male | Young Portuguese male voice
And many more voices available! This demo features our most popular voices, with additional options continuously being added.
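The voice IDs in the table appear to follow a two-letter prefix pattern: the first letter tracks the language and the second the gender. The sketch below decodes an ID on that basis. The mapping is inferred from the rows above rather than taken from any documentation, so treat it as an assumption. The function name is hypothetical.

```python
# Prefix letters as they appear in the table above (inferred, not documented here).
LANGUAGE_BY_PREFIX = {
    "a": "English (US)", "b": "English (UK)", "f": "French",
    "h": "Hindi", "j": "Japanese", "e": "Spanish",
    "i": "Italian", "z": "Chinese", "p": "Portuguese",
}
GENDER_BY_PREFIX = {"f": "Female", "m": "Male"}


def describe_voice_id(voice_id):
    """Split an ID such as 'bm_george' into language, gender and voice name."""
    prefix, _, name = voice_id.partition("_")
    if len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    return {
        "language": LANGUAGE_BY_PREFIX[prefix[0]],
        "gender": GENDER_BY_PREFIX[prefix[1]],
        "name": name,
    }


print(describe_voice_id("bm_george"))
```

IDs that do not match the pattern, such as the bare `af` listed in the /stream sample, raise `ValueError` instead of being guessed at.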
How Kokoro TTS Works: The Technical Breakthrough
Kokoro achieves its remarkable efficiency through a revolutionary architectural design that combines the best elements of StyleTTS 2 and iSTFTNet in a decoder-only approach:
Innovative Architecture
Hybrid Design: Merges StyleTTS 2's transformer-based decoder with iSTFTNet's efficient vocoder for optimal quality-to-size ratio
Decoder-Only Architecture: Eliminates the computationally expensive diffusion-based style modeling and separate text encoders that other models require
Streamlined Waveform Generation: Uses iSTFTNet for fast and efficient audio synthesis without quality compromise
High-Quality Training Data: Trained exclusively on carefully curated, permissive/non-copyrighted audio data focused on long-form narration
This innovative approach enables Kokoro to generate 24kHz high-fidelity audio with minimal computational resources, redefining what's possible in open-source text-to-speech technology.
Comparison with Traditional TTS Architectures
Unlike classical TTS models such as Tacotron 2 (which uses slow attention-based mel generation and requires a separate vocoder) or FastSpeech 2 (which relies on a two-stage pipeline and teacher-forced alignments), Kokoro's streamlined architecture generates speech in one efficient pass.
By removing diffusion processes and autoregressive bottlenecks, Kokoro achieves superior speed without sacrificing quality. This makes it uniquely positioned for both real-time applications and batch processing of long-form content.
Why Choose Kokoro TTS?
⚡ Unmatched Performance and Efficiency
Kokoro TTS delivers remarkable speed that makes it perfect for real-time applications and large-scale content production:
Outstanding GPU Performance: ~210× real-time on high-end GPUs (RTX 4090), ~90× real-time on consumer GPUs like the 3090 Ti
Impressive CPU Performance: 3-11× real-time on modern CPUs, making it viable even without dedicated graphics hardware
Ultra-Low Latency: Synthesizes typical sentences in just 40-70ms on GPU, enabling truly interactive applications
Exceptional Throughput: Handles 500+ simultaneous requests with response times around 2 seconds, ideal for high-traffic services
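A real-time factor of N× means one second of compute produces N seconds of audio. The sketch below shows the arithmetic in both directions. The function names are illustrative, and the inputs are figures quoted on this page rather than independent measurements.

```python
def synthesis_seconds(audio_seconds, realtime_factor):
    """Compute time needed to generate audio at a given real-time factor."""
    return audio_seconds / realtime_factor


def realtime_factor(audio_seconds, wall_clock_seconds):
    """Real-time factor implied by an observed generation time."""
    return audio_seconds / wall_clock_seconds


ten_hours = 10 * 3600
# At ~210x real-time, ten hours of audio takes about 171 seconds of GPU time.
print(round(synthesis_seconds(ten_hours, 210)))

# A six-hour audiobook generated in four minutes (a figure from the user
# quotes on this page) corresponds to 90x real-time.
print(realtime_factor(6 * 3600, 4 * 60))
```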
💸 Cost-Effective Alternative to Premium Services
Free and Open Alternative to ElevenLabs: Achieve professional-grade voice synthesis without expensive subscription fees
No Per-Character Pricing: Generate unlimited audio without worrying about usage-based pricing models
Local Processing Option: Run entirely on your own hardware without relying on internet connectivity or cloud services
Fully Commercial-Ready: Apache 2.0 license permits unrestricted use in commercial products and services
🎯 Perfect For a Wide Range of Applications
Content Creation: Generate professional voiceovers for videos, podcasts, YouTube content, and social media
Audiobook Production: Convert ebooks, articles, and long-form content to engaging audio in minutes instead of hours
Gaming & VR: Add dynamic voice lines to games and virtual reality experiences with minimal latency
Accessibility Tools: Build screen readers and assistive technology that sounds natural and engaging
Voice Assistants & Chatbots: Create responsive AI interfaces with human-like speech capabilities
E-Learning & Education: Develop engaging learning materials with clear, natural audio narration
IVR & Telephony Systems: Improve customer experience with natural-sounding automated phone systems
Localization & Dubbing: Translate and voice content across multiple languages efficiently
What Users Are Saying About Kokoro TTS
Kokoro TTS has garnered enthusiastic praise from developers, content creators, and AI enthusiasts across the community:
"This thing is crazy for 82M! I generated a six-hour audiobook from a full book in just four minutes. The consistency is incredible across long texts."
"Kokoro is the best open-source TTS I've used... really tiny, so with the right hardware it's really fast. Voice quality rivals commercial services I've paid for."
"I've tried Fastspeech, VoiceCraft, Coqui before... these required chunking input into short pieces and post-processing to remove pauses. Kokoro just works on long texts without too many issues."
"Voice is pleasant and for long texts it reads in a very stable manner without odd pauses or glitches. It's become my go-to for all my content creation needs."
"As someone building accessibility tools, Kokoro has been a game-changer. The voices sound natural enough that my users actually enjoy listening to them, unlike the robotic alternatives."
How to Use Kokoro TTS in Your Projects
Want to integrate Kokoro TTS into your own applications? You have several flexible options:
1. Unreal Speech API (Fastest & Easiest)
For production-ready implementation with minimal setup, use the Unreal Speech API powered by Kokoro TTS.
Note: You'll need to sign in to get a free API key, which you'll then find on your dashboard. See the API docs for more info.
# Endpoint: /stream
# - Convert up to 1,000 characters ASAP
# - Synchronous, instant response (0.3s)
# - Streams back raw audio data (no timestamps)
import requests

response = requests.post(
    'https://api.v8.unrealspeech.com/stream',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY'
    },
    json={
        'Text': '''Your text goes here''',  # Up to 1,000 characters
        'VoiceId': 'af_bella',  # Choose from available voice IDs
        'Bitrate': '192k',  # 320k, 256k, 192k, ...
        'Speed': '0',  # -1.0 to 1.0
        'Pitch': '1',  # 0.5 to 1.5
        'Codec': 'libmp3lame',  # libmp3lame or pcm_mulaw
    }
)

with open('audio.mp3', 'wb') as f:
    f.write(response.content)
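Since /stream accepts up to 1,000 characters per request, longer text has to be split before it is sent (or routed to /speech or /synthesisTasks instead). The following is one possible splitter, a sketch rather than anything the API provides. It assumes the limit counts Python string characters and that breaking on sentence-ending punctuation is acceptable. The function name is hypothetical.

```python
import re


def split_for_stream(text, limit=1000):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(split_for_stream("One. Two.", limit=5))  # ['One.', 'Two.']
```

Each chunk can then be posted to /stream in order and the resulting audio played back to back.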
Why Choose Unreal Speech API:
11× cheaper than ElevenLabs
Stream audio in just 300ms
Request up to 10-hour audio files
Includes per-word timestamps
Production-ready infrastructure
No need to manage your own hardware
2. Python Implementation (Self-Hosted)
For those who prefer to run Kokoro locally or in their own infrastructure:
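# Based on the KPipeline interface of the open-source kokoro package;
# check the package README for the version you have installed.
from kokoro import KPipeline
import soundfile as sf

# Create the TTS pipeline ('a' selects American English)
pipeline = KPipeline(lang_code="a")
audio_generator = pipeline(
    "This is a demonstration of the Kokoro TTS system, which produces remarkably natural speech from a compact 82 million parameter model.",
    voice="af_bella",
    speed=1.0
)

# Write each generated segment to its own 24 kHz WAV file
for i, (_, _, audio) in enumerate(audio_generator):
    sf.write(f"kokoro_demo_{i}.wav", audio, 24000)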
3. Command Line Usage
For quick generation from the terminal:
kokoro-tts -v af_bella "Hello, this is Kokoro speaking. I'm a compact but powerful text-to-speech system." -o output.wav
System Requirements
Kokoro TTS is remarkably efficient, making it accessible on a wide range of hardware:
CPU: Modern multi-core CPU for real-time speeds (8+ cores recommended for optimal performance)
GPU: Even mid-range cards like GTX 1060 (6GB) can handle Kokoro efficiently, with high-end cards achieving 100-200× real-time speeds
Memory: ~2GB RAM for model and audio processing (more for handling very long texts)
Disk Space: ~350MB for model plus a few MB for voice files
Supported Platforms: Windows, macOS, Linux, and cloud environments
Kokoro TTS vs. Other Text-to-Speech Models
See how Kokoro compares to other popular TTS solutions:
Feature | Kokoro TTS | XTTS v2 | MetaVoice | ElevenLabs
Model Size | 82M parameters | 467M parameters | 1.2B parameters | Proprietary
Speed | Up to 210× real-time | ~30× real-time | ~20× real-time | Cloud-based
Local Deployment | ✅ Yes | ✅ Yes | ✅ Yes | ❌ No
Quality Ranking | 1st in TTS Spaces Arena | Lower ranked | Lower ranked | High quality
Commercial License | ✅ Apache 2.0 | ❌ Restricted | ❌ Restricted | ❌ Paid service
Voice Cloning | ❌ No (without fine-tuning) | ✅ Yes | ✅ Yes | ✅ Yes
Cost | Free (open-source) | Free | Free | Subscription-based
Long-form Handling | ✅ Excellent | ⚠️ Requires chunking | ⚠️ Variable quality | ✅ Good
Resource Usage | ✅ Very low | ⚠️ Moderate | ❌ High | N/A (cloud)
Multilingual | ✅ 8+ languages | ✅ Multiple | ✅ Multiple | ✅ 29+ languages
Technical Deep Dive: What Makes Kokoro Special
For those interested in the technical details, Kokoro's success stems from several key innovations:
Efficient Architecture
Kokoro's hybrid model combines the strengths of StyleTTS 2 and iSTFTNet while eliminating their inefficiencies. By removing the diffusion-based style modeling of StyleTTS 2 and using iSTFTNet for efficient waveform generation, Kokoro dramatically reduces complexity while preserving quality.
Unlike traditional two-stage TTS pipelines (text-to-spec followed by vocoding), Kokoro streamlines the process with a transformer-based decoder that can directly produce audio features with integrated vocoding. This avoids Tacotron's alignment issues and slow iterative output.
Benchmarks & Performance
Kokoro has proven its merit in head-to-head evaluations, achieving 1st place in the Hugging Face TTS Spaces Arena for single-speaker speech quality. Listeners consistently ranked Kokoro's output above much larger models in blind tests.
In Elo-style comparisons of naturalness, Kokoro-82M emerged as a top model, even beating systems trained on vastly more data. For example, "Fish Speech" (trained on ~1 million hours) failed to match Kokoro's naturalness, despite Kokoro being trained on <100 hours of curated data.
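For readers unfamiliar with Elo-style ranking, the sketch below shows how a single blind A/B preference vote would shift two models' ratings under the standard Elo formula. The arena's actual scoring details are not described here, so the K-factor of 32, the 400-point scale, and the function names are all assumptions chosen for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, a_wins: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one pairwise comparison."""
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_wins else 0.0
    # The winner gains exactly what the loser gives up, so the total is conserved.
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta


# Two models start level; a listener prefers the first one.
print(update_elo(1500.0, 1500.0, a_wins=True))  # (1516.0, 1484.0)
```

Because the update depends on the expected score, a small model that repeatedly beats higher-rated systems climbs quickly, which is how a compact model can end up at the top of such a leaderboard.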
Training Efficiency
Kokoro's training process was remarkably cost-effective, requiring only ~500 GPU hours on A100 hardware (approximately $400). This efficiency demonstrates that with the right architecture and high-quality data, smaller models can achieve state-of-the-art results.
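As a quick sanity check on those figures, the implied hourly rate can be worked out directly. This is plain arithmetic on the two numbers quoted above; the function name is hypothetical, and the result says nothing about what any particular cloud provider charges.

```python
def implied_hourly_rate(total_cost_usd: float, gpu_hours: float) -> float:
    """Average cost per GPU hour implied by a total cost and a GPU-hour count."""
    if gpu_hours <= 0:
        raise ValueError("gpu_hours must be positive")
    return total_cost_usd / gpu_hours


# Figures quoted in the text: ~500 A100 GPU hours for approximately $400.
rate = implied_hourly_rate(400, 500)
print(f"${rate:.2f} per A100 GPU hour")  # $0.80 per A100 GPU hour
```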
Limitations and Future Improvements
While Kokoro TTS is impressive, we believe in transparency about its current limitations:
Limited Expressiveness: Speech can sound somewhat neutral in emotional range compared to professional voice actors
No Built-in Voice Cloning: Cannot mimic new voices without fine-tuning (unlike some commercial options)
Multilingual Quality Variations: While supporting multiple languages, quality may vary across non-English languages
Short Input Quirks: Performs best with longer texts rather than single words or very short phrases
The Kokoro community is actively working on addressing these limitations in future updates, with plans for more expressive models and improved voice variety.
Get Started with Kokoro TTS Today
Try our live demo above and experience the future of open-source text-to-speech technology. With Kokoro TTS, you can generate professional-quality voiceovers, create accessible content, and build voice-enabled applications without breaking the bank.
Ready for Production Use?
For production-ready API access with enterprise reliability, ultra-fast response times, and cost-effective pricing, check out Unreal Speech - the premium Kokoro-powered TTS API that's:
11× cheaper than ElevenLabs
Streams audio in just 300ms
Supports requests up to 10 hours long
Includes precise per-word timestamps
Backed by enterprise-grade infrastructure
Get Started for Free →
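The per-word timestamps listed above are described elsewhere on this page as a JSON array of objects with word, start, end, and text_offset fields. A minimal sketch of using that data to highlight the word being spoken at a given playback time follows. The sample data copies the first three entries shown on this page; the word_at helper is hypothetical and is not part of the Unreal Speech API.

```python
import json
from bisect import bisect_right

# First three entries of the example timestamps JSON shown on this page.
SAMPLE = """[
  {"word": "Our", "start": 0.5, "end": 0.6666666666666667, "text_offset": 0},
  {"word": "model", "start": 0.6666666666666667, "end": 1.075, "text_offset": 4},
  {"word": "achieves", "start": 1.075, "end": 1.5166666666666666, "text_offset": 10}
]"""


def word_at(timestamps, t):
    """Return the timestamp entry being spoken at time t (seconds), or None."""
    starts = [entry["start"] for entry in timestamps]
    # Index of the last word whose start time is <= t.
    i = bisect_right(starts, t) - 1
    if i >= 0 and t < timestamps[i]["end"]:
        return timestamps[i]
    return None


timestamps = json.loads(SAMPLE)
current = word_at(timestamps, 0.8)
print(current["word"] if current else None)  # model
```

The text_offset field gives the character position of the word in the input text, so a player can highlight the matching span without re-tokenizing the text.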
Frequently Asked Questions About Kokoro TTS
What makes Kokoro TTS different from other text-to-speech services?
Kokoro TTS stands out for its remarkable efficiency—achieving professional-quality speech with just 82 million parameters (compared to models 5-15× larger). This lightweight design enables fast processing through our API while still outperforming much larger models in quality benchmarks. Our online demo lets you experience Kokoro's capabilities instantly and download the generated MP3s. Unlike most commercial services, the underlying Kokoro model is open-source under the Apache 2.0 license, while our Unreal Speech API provides a production-ready implementation with affordable pricing.
Which languages and voices does Kokoro TTS support?
Kokoro TTS currently offers 48 voices across 8 languages. You can generate speech in American English, British English, French, Hindi, Spanish, Japanese, Chinese, and Portuguese. Each language includes multiple male and female voices with different characteristics and speaking styles. The voice selection is constantly expanding, with regular updates adding new options and improving existing ones.
Can I download and use the generated speech files for my projects?
Yes, all audio generated by Kokoro TTS Studio can be freely downloaded as MP3 files and used in both personal and commercial projects. You can use these audio files for YouTube videos, podcasts, e-learning content, audiobooks, or any other application. The following terms apply, based on your subscription plan:
Free plan – You must attribute Unreal Speech by including a link to "unrealspeech.com" in the description.
Paid plan – You do not need to include any attribution.
How do I get the best quality results from Kokoro TTS?
For optimal results with Kokoro TTS, use longer sentences or paragraphs rather than single words (the model performs better with context). Include proper punctuation to help with natural pausing and intonation. Experiment with different voices—some may pronounce certain words or phrases more naturally than others depending on your text. For professional applications requiring even higher quality or custom voices, consider Unreal Speech's API which builds upon Kokoro's technology with enterprise-grade reliability.
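One way to apply the first two tips is to merge very short fragments into longer, punctuated passages before sending them for synthesis. The sketch below is a hypothetical preprocessing helper, not part of Kokoro or the Unreal Speech API, and the 80-character threshold is an arbitrary assumption to tune by ear.

```python
def merge_short_fragments(fragments, min_chars=80):
    """Join consecutive fragments until each passage has at least min_chars."""
    passages = []
    buffer = ""
    for fragment in fragments:
        fragment = fragment.strip()
        if not fragment:
            continue
        # Add a full stop when a fragment ends without punctuation,
        # so the model gets a cue for pausing.
        if fragment[-1].isalnum():
            fragment += "."
        buffer = f"{buffer} {fragment}".strip()
        if len(buffer) >= min_chars:
            passages.append(buffer)
            buffer = ""
    if buffer:
        # Attach a short leftover to the previous passage instead of
        # synthesizing it on its own.
        if passages:
            passages[-1] = f"{passages[-1]} {buffer}"
        else:
            passages.append(buffer)
    return passages


print(merge_short_fragments(["Welcome", "Chapter one", "It was a quiet morning"], min_chars=20))
# ['Welcome. Chapter one.', 'It was a quiet morning.']
```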
Can I run Kokoro TTS offline on my own computer?
Yes, Kokoro TTS can be installed and run locally on your computer without an internet connection. The model is small enough (about 350MB) to run efficiently on most modern computers, even without a dedicated GPU. For local installation, you can use the Python implementation (pip install kokoro) or command-line tools. This makes Kokoro ideal for privacy-conscious users, offline applications, or scenarios where consistent generation without reliance on external services is important.