assemblyai.com - AI tool

AssemblyAI

assemblyai.com
Pricing plans

Detailed pricing plans are not available yet for this tool.

Detailed overview

AssemblyAI Documentation: Real-time Transcription
LiveKit SDK: Building Voice Agents
Voice Agent Best Practices Guide

Introducing Medical Mode: Purpose-built accuracy for medical terminology

The best way to build Voice AI apps

Today’s top Voice AI companies rely on AssemblyAI’s speech-to-text and speech understanding models to launch groundbreaking products fast and scale with ease.

Streaming Speech-to-Text | Speech-to-Text | Voice Agent

Try stating information like names, dates, and addresses, along with technical data like codes, commands, formulas, and special formatting, to see how our model performs.

Universal-3 Pro Streaming

Demos: Context-aware, Audio tags, Verbatim, Keyterms, Speaker roles, Code switching

Context-aware (source: clinical evaluation history)
"prompt": "Produce a transcript for a clinical history evaluation. It's important to capture medication and dosage accurately. Every disfluency is meaningful data. Include: fillers (um, uh, er, erm, ah, hmm, mhm, like, you know, I mean), repetitions (I I I, the the), restarts (I was- I went), stutters (th-that, b-but, no-not), and informal speech (gonna, wanna, gotta)"
Without prompting: "I just want to move you along a bit further. Do you take any prescribed medicines? I know you've got diabetes and high blood pressure. I do. I take Ramipril. Okay. And I take Metformin, and there's another one that begins with G for the diabetes. Glicoside."
With context-aware prompting: "I just wanna move you along a bit further. Do you take any prescribed medicines? I know you've got diabetes and high blood pressure. I, I do. I take, um, I take Ramipril. Okay, mhm. And I take Metformin, and there's another one that begins with G for the diabetes. So glycosi — glycosi— glycoside."

Audio tags (source: non-speech audio event)
"prompt": "Produce a transcript suitable for conversational analysis. Every disfluency is meaningful data. Include: Tag sounds: [beep]"
Without audio tagging: "Your call has been forwarded to an automatic voice message system. At the tone, please record your message. When you have finished recording, you may hang up or press 1 for more options."
With audio tagging: "Your call has been forwarded to an automatic voice message system. At the tone, please record your message. When you have finished recording, you may hang up or press 1 for more options. [beep]"

Verbatim (source: speech with disfluencies)
"prompt": "Produce a transcript suitable for conversational analysis. Every disfluency is meaningful data. Include: fillers (um, uh, er, ah, hmm, mhm, like, you know, I mean), repetitions (I I, the the), restarts (I was- I went), stutters (th-that, b-but, no-not), and informal speech (gonna, wanna, gotta)"
Without disfluency prompting: "Do you and Quentin still socialize when you come to Los Angeles, or is it like he's so used to having you here? No, no, no, we're friends. What do you do with him?"
With disfluency prompting: "Do you and Quentin still socialize, uh, when you come to Los Angeles, or is it like he's so used to having you here? No, no, no, we, we, we're friends. What do you do with him?"

Keyterms (source: proper noun spelling)
"keyterms_prompt": ["Kelly Byrne-Donoghue"]
Without keyterms prompting: "Hi, this is Kelly Byrne Donahue"
With keyterms prompting: "Hi, this is Kelly Byrne-Donahue"

Speaker roles (source: capturing speaker roles)
"prompt": "Produce a transcript with every disfluency data. Additionally, label speakers with their respective roles. 1. Place [Speaker:role] at the start of each speaker turn. Example format: [Speaker:NURSE] Hello there. How can I help you today? [Speaker:PATIENT] I'm feeling unwell. I have a headache."
With traditional speaker labels:
Speaker A: 5Mg. And do you take it regularly?
Speaker B: Oh yeah, yeah.
Speaker A: Good.
Speaker B: Every evening.
Speaker A: And no side effects with it?
With speaker labels prompting:
Speaker [Nurse]: 5Mg. And do you take it regularly?
Speaker [Patient]: Oh yeah, yeah.
Speaker [Nurse]: Good.
Speaker [Patient]: Every evening.
Speaker [Nurse]: And no side effects with it?

Code switching (source: Spanish and English audio)
"language_detection": True, "prompt": "Preserve natural code-switching between English and Spanish. Retain spoken language as-is (correct "I was hablando con mi manager")."
Without code switching: "Would definitely think I spoke Spanish if you heard me speak Spanish. But I still make mistakes. Soy wines. Paltro Soy. La fundadora de goop. Thank you. Thank you for doing that."
With code switching: "You would definitely think I spoke Spanish if you heard me speak Spanish, but I still make mistakes. Soy Gwyneth Paltrow, soy la fundadora de Goop. Thank you. Thank you for doing that."

The industry’s best products need the industry’s best models

We build the most accurate, fully featured models on the market, so you can ship with confidence knowing that you’re building on the best.

Speech-to-Text
Unlock the value of prerecorded voice data, and power workflows with unmatched accuracy.

Streaming Speech-to-Text
Build intuitive voice agent workflows with ultra-low latency, high accuracy, precise end-of-turn controls, and more.

Speech Understanding
Enable deep analysis and high-value insights with sophisticated audio-intelligence models.

Product overview

Everything you need to build voice apps that outpace the competition

The accuracy and capabilities required to build products that stand out, and the flexibility to scale to millions of users without blinking an eye.

Industry-leading accuracy: Avoid garbage in, garbage out
Your product experience is only as good as the inputs it’s built on.
AssemblyAI’s models lead the industry in accuracy and reliability.
Industry’s lowest Word Error Rate (WER)
Up to 30% fewer hallucinations than other providers
Preferred by 73% of end users in unbiased evaluations

Capabilities: Go beyond transcription
Access a full suite of speech understanding capabilities to uncover insights, identify speakers, and build powerful product experiences.
Correctly identify speakers with advanced diarization capabilities
Automatically format text and alphanumerics for clearer outputs
Accurately capture multilingual speech with automatic language detection

Build-ready: Easy to start, even easier to scale
We built AssemblyAI to be the easiest platform on the market for developers to build, ship, and scale on.
Serving 600M+ inference calls and over 840M API calls per month
Over 40 terabytes of audio processed daily
Pay only for what you use and scale to millions of hours without contracts or throttles

We’re not playing around, but you can
Put our AI models to the test in our no-code playground.

The most loved AI apps are built on AssemblyAI
Learn why today’s most innovative companies choose us.
3x increase in closed enterprise deals after launching Conversation Intelligence with AssemblyAI
15% higher customer win rates after implementing AssemblyAI
2X free-to-paid conversion rate after implementing AssemblyAI
23% improvement in call transcription accuracy and 2X increase in customer conversion rate
90% reduction in customer complaints and support tickets

Unlock the value of voice data
Build what’s next on the platform powering thousands of the industry’s leading Voice AI apps.
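The demo settings shown earlier on this page are written as JSON fields (`prompt`, `keyterms_prompt`, `language_detection`). As a minimal sketch, the Python below assembles two of those option sets and serializes them. The helper name `build_options` is hypothetical, and the page does not say which endpoint or which other fields (audio source, model selection, authentication) a real request needs, so this only illustrates the shape of the options.

```python
import json
from typing import Optional


def build_options(prompt: Optional[str] = None,
                  keyterms_prompt: Optional[list[str]] = None,
                  language_detection: Optional[bool] = None) -> dict:
    """Collect only the options that were actually set (hypothetical helper)."""
    candidates = {
        "prompt": prompt,
        "keyterms_prompt": keyterms_prompt,
        "language_detection": language_detection,
    }
    return {key: value for key, value in candidates.items() if value is not None}


# Mirrors the "Keyterms" demo: one proper noun to bias spelling toward.
keyterms_options = build_options(keyterms_prompt=["Kelly Byrne-Donoghue"])

# Mirrors the "Code switching" demo: language detection plus a plain-language prompt.
code_switching_options = build_options(
    prompt="Preserve natural code-switching between English and Spanish.",
    language_detection=True,
)

print(json.dumps(keyterms_options))
print(json.dumps(code_switching_options, indent=2))
```

Keeping unset options out of the dictionary avoids sending explicit nulls, which is a design choice of this sketch rather than a documented requirement.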
Pricing built for innovation, not contract negotiation

Start for free, scale seamlessly, and only pay for what you use.
AssemblyAI's usage-based pricing has no up-front commitments or contracts, and rates decrease as you scale.

Free
Access to industry-leading Speech-to-Text and Audio Intelligence models
Transcribe up to 185 hours of pre-recorded audio for free
Transcribe up to 333 hours of streaming audio for free
Up to 5 new streams per minute
Developer docs, community support, and resources to help you build

Pay as you go (start building for as low as $0.15/hr)
Unlimited access to Speech-to-Text, Speech Understanding, and LLM Gateway
Unlimited concurrent streams and pre-recorded concurrency starting at 200 files
Customize rate limits - scale to any workload
Dedicated technical support and customized SLAs and SLOs
BAA for HIPAA and compliance with EU Data Residency standards
Self-hosted deployments (On-prem, EU, VPC)

Looking for tiered pricing?
Whether you're an enterprise processing millions of requests looking for tiered pricing options, need dedicated infrastructure, or require custom model configurations, our team will work with you to create a solution that fits your specific needs.

Pricing sections: Speech-to-Text, Speech Understanding, Guardrails, LLM Gateway

Pre-recorded Speech-to-Text
Build Voice AI on the most accurate Speech-to-Text with language detection, formatting, filler words, keyterms prompting, custom spelling, word-level timestamps, and more.

Models (pay as you go)

Universal-3 Pro (Default)
Production-quality speech model that adapts its behavior based on the instructions you provide. Available in English, Spanish, French, German, Italian, and Portuguese.
$0.21/hr

Prompting (Add-on)
Control transcription behavior with plain language instructions: provide context, tag audio events, and more.
$0.05/hr

Keyterms Prompting (Add-on)
Provide up to 1,000 words or phrases (maximum 6 words per phrase) to improve transcription accuracy.
$0.05/hr

Medical Mode (Add-on)
Optimize transcription for medical terminology and healthcare conversations with significantly improved accuracy.
$0.15/hr

Custom: Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us.

Universal-2
Fast, accurate transcription across 99 languages, with exceptional accuracy straight out of the box.
$0.15/hr

Speaker Diarization (Add-on)
Detect multiple speakers in audio files and segment the transcript into utterances, showing what each speaker said.
$0.02/hr

Streaming Speech-to-Text
Transcribe live audio and video files in real time with ultra-low latency and high accuracy.
Leverage auto punctuation and casing, next-gen end-of-turn detection, and ITM/formatting.

Models (pay as you go)

Universal-3 Pro Streaming (New)
The most accurate model for voice agents that demand the highest quality. Best-in-class accuracy with advanced prompting capabilities. Supports English, Spanish, German, French, Portuguese, and Italian.
$0.45/hr

Keyterms Prompting
Provide up to 100 words or phrases (maximum 6 words per phrase) to improve transcription accuracy.
Included

Prompting (Beta)
Control transcription behavior with plain language instructions: provide context, tag audio events, and more.
$0.05/hr

Medical Mode (Add-on)
Optimize transcription for medical terminology and healthcare conversations with significantly improved accuracy.
$0.15/hr

Custom: Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us.

Universal-Streaming
The fastest model for real-time English transcription. Optimized for speed and cost-effectiveness for English-only applications.
$0.15/hr

Universal-Streaming Multilingual
Multilingual transcription at the speed and cost of Universal-Streaming. Supports English, Spanish, German, French, Portuguese, and Italian.
$0.15/hr

Whisper-Streaming
Open-source Whisper model enhanced with AssemblyAI's reliable infrastructure and unlimited scale. Supports 99+ languages at an accessible price point.
$0.30/hr

Keyterms Prompting (Add-on)
Improve recognition accuracy for specific words and phrases that are important to your use case.
$0.04/hr

Speaker Diarization (Add-on)
Detect multiple speakers in audio files and segment the transcript into utterances, showing what each speaker said.
$0.12/hr

Speech Understanding
AI models to identify speaker names, translate and format outputs, identify spoken topics, and more.

Models (pay as you go)

Speaker Identification
Speaker Identification allows you to identify speakers by their actual names or roles, transforming generic labels like “Speaker A” or “Speaker B” into meaningful identifiers that you provide.
$0.02/hr

Custom: Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us.

Translation
The Translation feature automatically converts your transcribed audio content from one language to another, enabling you to reach global audiences without manual translation work.
$0.06/hr

Custom Formatting
The Custom Formatting feature automatically standardizes and formats specific types of information in your transcripts, ensuring consistency across dates, phone numbers, emails, and other data types.
$0.03/hr

Entity Detection
Identify a wide range of entities that are spoken in your audio files, such as person and company names, email addresses, dates, and locations.
$0.08/hr

Sentiment Analysis
With Sentiment Analysis, AssemblyAI can detect the sentiment of each sentence of speech spoken in your audio files.
$0.02/hr

Auto Chapters
Automatically generate a summary over time for audio and video files.
$0.08/hr

Key Phrases
Accurately identify significant words and phrases in your transcription, enabling you to extract the most pertinent concepts or highlights from your audio/video file.
$0.01/hr

Topic Detection
Label the topics that are spoken in your audio and video files. The predicted topic labels follow the standardized IAB Taxonomy, which makes them suitable for contextual targeting.
$0.15/hr

Summarization
Leverage our AI-powered Summarization models to automatically summarize audio/video data in your products at scale. Customize the summary types to best fit your use case.
$0.03/hr

Guardrails
Guardrails ensures only high-quality, safe, and compliant content flows through your applications.

Models (pay as you go)

Profanity Filtering
Automatically filter out profanity from your transcripts.
$0.01/hr

Custom: Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us.

PII Audio Redaction
Identify and remove Personally Identifiable Information, such as phone numbers and social security numbers, from the audio file before it is returned to you.
$0.05/hr

PII Redaction
Identify and remove Personally Identifiable Information, such as phone numbers and social security numbers, from the transcription text before it is returned to you.
$0.08/hr

Content Moderation
Detect sensitive content in your audio and video files, such as hate speech, violence, sensitive social issues, alcohol, drugs, and more.
$0.15/hr

LLM Gateway
The LLM Gateway unifies your voice-to-intelligence workflow into one API.

Models (pay as you go, prices per 1M tokens)

GPT-5.2: $1.75 input, $14.00 output. Model with superior performance on complex reasoning tasks, advanced creative work, and sophisticated problem-solving.
GPT-5.1: $1.25 input, $10.00 output
GPT-5: $1.25 input, $10.00 output
GPT-5-Mini: $0.25 input, $2.00 output
GPT-5 Nano: $0.05 input, $0.40 output
GPT 4.1: $2.00 input, $8.00 output
gpt-oss-20b: $0.07 input, $0.30 output
gpt-oss-120b: $0.15 input, $0.60 output
ChatGPT-4o: $5.00 input, $15.00 output
Gemini 3 Pro: $2.00 input, $12.00 output
Gemini 3 Flash: $0.50 input, $3.00 output
Gemini 2.5 Flash: $0.30 input, $2.50 output
Gemini 2.5 Flash Lite: $0.10 input, $0.40 output
Gemini 2.5 Pro: $1.25 input, $10.00 output
Claude 4.6 Sonnet: $3.00 input, $15.00 output
Claude 4.5 Sonnet: $3.00 input, $15.00 output
Claude 4.5 Haiku: $1.00 input, $5.00 output. A legacy model with a balanced combination of performance and speed for efficient, high-throughput tasks.
Claude 4 Sonnet: $3.00 input, $15.00 output
Claude 4.6 Opus: $5.00 input, $25.00 output
Claude 4.5 Opus: $5.00 input, $25.00 output
Claude 4 Opus: $15.00 input, $75.00 output
Claude 3.5 Haiku: $0.80 input, $4.00 output
Claude 3 Haiku: $0.25 input, $1.25 output. The fastest model in the family, optimized for quick responses while maintaining good reasoning.
Qwen3 Next 80B A3B: $0.15 input, $1.20 output
Qwen3 32B: $0.15 input, $0.60 output
Kimi K2.5: $0.60 input, $3.00 output

Custom: Get custom rate limits, enhanced concurrency, and enterprise-grade flexibility tailored to your AI workloads. Contact us.

Security and Privacy
AssemblyAI uses enterprise-grade security practices to keep your data safe. We approach security by design and default, and continuously ensure AssemblyAI is secure for you and your team.
GDPR, PCI DSS, SOC 2 Type 2, EU Data Residency, ISO 27001, HIPAA Compliance

Frequently Asked Questions

What are the differences between Speech-to-Text models?
Universal-3 Pro is our most advanced speech language model, designed specifically for speech tasks. It uses a prompt-based architecture for deeper contextual understanding and allows domain-specific customization, with no retraining needed. Universal-2 is a high-accuracy model supporting 99 languages, built for general-purpose use cases. It offers strong out-of-the-box performance and supports features like speaker diarization and real-time streaming. Perfect for legal, medical, and other specialized use cases. Universal-Streaming is an ultra-fast, ultra-accurate streaming speech-to-text model designed for voice agents.

Can I sign up for free?
Yes! With the free offer, you get $50 in credits to use towards AssemblyAI’s Speech-to-Text APIs. To add more credits, simply add a credit card to your account.

Do you offer volume discounts?
Absolutely!
If you plan to send large volumes of audio and video content through our API, please reach out to us to see if you qualify for a volume discount.

How does Universal-Streaming concurrency work?
We don't limit how many streams you can run simultaneously, only how quickly you can start new ones, giving you unlimited scale while ensuring reliable performance.
Free users can start 5 new streams per minute, while pay-as-you-go accounts start with 100 new streams per minute. Any time you are using 70% or more of your current limit, your new-session rate limit automatically increases by 10% every 60 seconds. This means within 5 minutes of sustained usage, you can scale from 100 to 146 new streams per minute (for a total of 610 concurrent streams), with no ceiling as your usage grows.
These limits are designed to never interfere with legitimate applications: normal scaling patterns automatically get more capacity before hitting any walls, while the limits protect against runaway scripts or abuse. Your baseline limit is guaranteed and never decreases, so you can scale smoothly from dozens to thousands of simultaneous streams without artificial barriers or surprise fees. Need higher limits? Contact our sales team for custom limits that match your deployment timeline.

How does Universal-Streaming session-based pricing work?
We charge based on total session duration: the entire time your connection stays open, whether audio is flowing or not. This gives you complete transparency and control: you pay for exactly what you're using, with no hidden costs for idle streams. You can choose to keep streams open continuously for instant response or open them strategically as needed to minimize costs, scaling up and down without prepaid commitments based on how your voice application actually works.

How long does it take for audio and video files to process?
Most audio files sent to AssemblyAI's API can be processed in less than 60 seconds. For example, you can process a 30-minute pre-recorded audio file in 23 seconds with the Universal speech-to-text model.

How does billing work?
Great question. Once you add a credit card and deposit funds into your account, your account's funds will be drawn down as you use the API.

How is multichannel billed?
When multichannel is enabled, each channel will be transcribed and billed separately. To calculate your total cost, multiply your recording's duration by the hourly rate (billed per second), then multiply that result by the number of channels.
For example, if you sent a 5-minute recording with three channels, you would be billed for the 5 minutes of audio multiplied by the standard rate, with that total multiplied by three channels. This is equivalent to being billed for 15 minutes of audio.

Can I purchase or use AssemblyAI through the AWS Marketplace?
You can also get started with AssemblyAI on the AWS Marketplace, or ask your AWS account team about how to leverage AssemblyAI to revolutionize the way your company understands its customers.

How can I talk to someone?
Feel free to email us at support@assemblyai.com, or click the chat button in the bottom right corner of your browser to chat live with our API Support team!

What languages do you support?
We support over 99 languages and counting, including Global English (English and all of its accents).

What is a token?
In the context of a Large Language Model (LLM), a “token” is the smallest unit of text processed by the model. 100 tokens map to roughly 75 words.
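The scaling figures in the streaming concurrency answer can be reproduced with a little arithmetic. The sketch below assumes the limit grows by exactly 10% once per minute under sustained usage, starting from the pay-as-you-go baseline of 100 new streams per minute, and that every started stream stays open. The function name `ramp_limits` and the rounding-down are assumptions for illustration, not documented behavior.

```python
import math


def ramp_limits(baseline: int, growth: float, minutes: int) -> list[int]:
    """New-stream limit for each minute, compounding by `growth` every 60 seconds."""
    return [math.floor(baseline * (1 + growth) ** minute) for minute in range(minutes)]


limits = ramp_limits(baseline=100, growth=0.10, minutes=5)
print(limits)       # [100, 110, 121, 133, 146]
print(sum(limits))  # 610 streams started in total if every minute is fully used
```

Under these assumptions the fifth minute allows 146 new streams, and the five minutes together add up to the 610 concurrent streams quoted in the FAQ.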
They may be set by us or by third party providers whose services we have added to our pages, or by us directly. If you do not allow these cookies then some or all of the pages or Services may not function properly.Performance Cookies Performance Cookies InactivePerformance cookies allow us to do things such as count visits and traffic sources so we can measure and improve the performance of the Websites. They help us to know which pages are the most and least popular and see how visitors move around the site. All information these cookies collect is aggregated and anonymous. If you do not allow these cookies, it will be difficult for us to monitor site or page performance.Targeting Cookies Targeting Cookies InactiveTargeting cookies may be set through our Websites by our advertising partners. They may be used by those companies to build a profile of your interests and show you relevant advertisements on other sites. They do not store directly personal information, but are based on uniquely identifying your browser and internet device. 
If you do not allow these cookies, you will experience less targeted advertising.Cookie List Clear checkbox label labelApply CancelConsent Leg.Interest checkbox label label checkbox label label checkbox label labelReject All Non-Essential Confirm Selections --- Introducing Medical Mode: Purpose-built accuracy for medical terminology   Learn more Advancing and democratizing Speech AI technology for the worldOur vision is to create new, superhuman Speech AI models that will unlock entirely new classes of applications and products to be built leveraging voice data.Latest News12.04.23Announcing our $50M Series C to build superhuman Speech AI modelsWe're excited to share that we’ve raised $50M in Series C funding led by Accel, our partners that also led our Series A, with participation from Keith Block and Smith Point Capital, Insight Partners, Daniel Gross and Nat Friedman, and Y Combinator.Read moreBacked ByOur ValuesOur values drive everything we do at AssemblyAI and help cultivate a culture that is open and honestPlay to winWe’re here to build a market-leading, top 1% company. While we know it doesn’t happen overnight, we hold ourselves and our peers to this bar, and we seek to get better every day as we continue to build a winning product and service for our customers.Deliver with craftsmanshipWe strive to be excellent in everything that we do: the emails we write, the products we create, the artifacts we produce – everything. While we do operate with urgency and move quickly, we’re also proud of what we put out in the world, deeply care about the quality of our work, and sweat the details.Be deeply curiousWe are truth-seeking, always ready to dive into the raw materials at all levels, and are not afraid to ask the tough questions as we scratch a layer deeper in our search for clarity and to reduce ambiguity.Operate with integrityWe have high expectations, but we also care deeply about each other and our customers. 
We strive to always assume good intentions, ask questions rather than make assumptions, treat others with respect, be deeply thoughtful in our actions, and lift each other up.

Think long term
We are not afraid to quickly experiment, iterate, and learn as we pursue our long-term goals – but we do so intentionally and are not reactive. We aim to make thoughtful, long-term decisions over convenient ones. We operate with conviction and with resilience, and we embrace the ever-dynamic process of building a new company, especially in a new, evolving industry.

Solve for company first
We’re all on the same team – AssemblyAI. Regardless of the department or team we work in, we recognize that as a startup, individual and team success comes as a byproduct of AssemblyAI’s success. We are collaborative with our colleagues, transparent with our work, and always strive to do what’s best for AssemblyAI over ourselves or our team.

A RESEARCH-ORIENTED ORGANIZATION
We're a team of interdisciplinary research leaders, scientists, and engineers focused on building and scaling new state-of-the-art Speech AI models that are accurate, capable, easy to use, and safe. Today, our technology is being widely deployed to recognize, understand, and process human speech for thousands of customers, dozens of leading enterprises, and hundreds of thousands of developers around the world.

Careers
Leaders in Speech AI research
We believe the best way to continue to innovate is to bring together some of the best minds in AI across different fields, expertise, and backgrounds. Join our team of interdisciplinary research leaders, scientists, and engineers working to advance the state-of-the-art in AI models for voice data.

---

Our partners
Discover our trusted network of industry-leading partners. Interested in joining our ecosystem? Get in touch to explore collaboration opportunities.

Integrate with your whole stack

Strategic partners
Key collaborators that help expand the reach and impact of AssemblyAI solutions.

Communication infrastructure partners
Leading contact center and communication infrastructure platforms that integrate with AssemblyAI’s speech-to-text products.

Integrations
A broad ecosystem of tools and platforms that seamlessly integrate with AssemblyAI to power diverse voice and language AI use cases.

Interested in joining the AssemblyAI network?
Reach out to partnerships@assemblyai.com