ai-facebook-com
Website: https://ai.meta.com/blog/voicebox-generative-ai-model-speech/
Detailed pricing plans are not available yet for this tool.
Introducing Voicebox: The first generative AI model for speech to generalize across tasks with state-of-the-art performance
June 16, 2023 • 5 minute read

Meta AI researchers have achieved a breakthrough in generative AI for speech. We’ve developed Voicebox, the first model that can generalize, with state-of-the-art performance, to speech-generation tasks it was not specifically trained to accomplish.

Like generative systems for images and text, Voicebox creates outputs in a vast variety of styles, and it can create outputs from scratch as well as modify a sample it’s given. But instead of creating a picture or a passage of text, Voicebox produces high-quality audio clips. The model can synthesize speech across six languages, as well as perform noise removal, content editing, style conversion, and diverse sample generation.

Prior to Voicebox, generative AI for speech required specific training for each task using carefully prepared training data. Voicebox uses a new approach to learn just from raw audio and an accompanying transcription. Unlike autoregressive models for audio generation, Voicebox can modify any part of a given sample, not just the end of an audio clip it is given.

Voicebox is based on a method called Flow Matching, which has been shown to improve upon diffusion models. Voicebox outperforms the current state-of-the-art English model VALL-E on zero-shot text-to-speech in terms of both intelligibility (a word error rate of 1.9 percent vs. 5.9 percent for VALL-E) and audio similarity (0.681 vs. 0.580 for VALL-E), while being as much as 20 times faster.
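The intelligibility figures quoted here are word error rates (WER). As a rough illustration, WER is conventionally computed as the word-level edit distance between a reference transcript and the transcript recognized from the generated audio, divided by the number of reference words. The sketch below assumes plain whitespace tokenization and no text normalization; the function name `word_error_rate` is hypothetical, and the evaluation pipeline actually used for Voicebox is not described in this post.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "the cat sat on the mat" with the hypothesis "the cat sat on mat" counts one deleted word out of six reference words.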
For cross-lingual style transfer, Voicebox outperforms YourTTS, reducing average word error rate from 10.9 percent to 5.2 percent and improving audio similarity from 0.335 to 0.481.

Voicebox achieves new state-of-the-art results, outperforming VALL-E and YourTTS on word error rate.

Voicebox also achieves new state-of-the-art results on audio style similarity metrics on English and multilingual benchmarks, respectively.

There are many exciting use cases for generative speech models, but because of the potential risks of misuse, we are not making the Voicebox model or code publicly available at this time. While we believe it is important to be open with the AI community and to share our research to advance the state of the art in AI, it’s also necessary to strike the right balance between openness and responsibility. With these considerations, today we are sharing audio samples and a research paper detailing the approach and results we have achieved. In the paper, we also detail how we built a highly effective classifier that can distinguish between authentic speech and audio generated with Voicebox.

A new approach to speech generation

One of the main limitations of existing speech synthesizers is that they can only be trained on data that has been prepared expressly for that task. These inputs, known as monotonic, clean data, are difficult to produce, so they exist only in limited quantities, and they result in outputs that sound monotone.

We built Voicebox upon the Flow Matching model, which is Meta’s latest advancement on non-autoregressive generative models that can learn highly non-deterministic mapping between text and speech. Non-deterministic mapping is useful because it enables Voicebox to learn from varied speech data without those variations having to be carefully labeled.
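To make the Flow Matching idea more concrete, here is a minimal sketch of one common formulation from the Flow Matching literature: the optimal-transport conditional path, in which a noise sample `x0` is moved along a straight line toward a data sample `x1` and a model is trained to regress the velocity of that path. The post does not say which variant Voicebox uses, so the path, the `sigma_min` constant, and the function names are all assumptions, and short Python lists stand in for audio features.

```python
import random


def ot_path_sample(x0, x1, t, sigma_min=1e-5):
    """Point on the assumed optimal-transport path and its target velocity.

    x0: noise sample, x1: data sample (equal-length lists), t in [0, 1].
    """
    keep = 1.0 - (1.0 - sigma_min) * t  # weight left on the noise sample
    x_t = [keep * a + t * b for a, b in zip(x0, x1)]
    velocity = [b - (1.0 - sigma_min) * a for a, b in zip(x0, x1)]
    return x_t, velocity


def flow_matching_loss(predicted, target):
    """Mean squared error between predicted and target velocities."""
    return sum((p - u) ** 2 for p, u in zip(predicted, target)) / len(target)


# One illustrative training sample: draw noise and a random time step.
rng = random.Random(0)
x1 = [0.5, -1.0, 2.0]                    # stand-in for speech features
x0 = [rng.gauss(0.0, 1.0) for _ in x1]   # Gaussian noise
t = rng.random()
x_t, target = ot_path_sample(x0, x1, t)

# A real model would predict the velocity from (x_t, t) plus conditioning;
# a zero prediction is used here only to show how the loss is evaluated.
loss = flow_matching_loss([0.0] * len(x1), target)
```

Under this assumed path, the sample is pure noise at t = 0 and almost exactly the data sample at t = 1, so generation amounts to integrating the learned velocity from noise toward speech.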
Without that labeling requirement, Voicebox can train on more diverse data and at a much larger scale.

We trained Voicebox with more than 50,000 hours of recorded speech and transcripts from public domain audiobooks in English, French, Spanish, German, Polish, and Portuguese. Voicebox is trained to predict a speech segment when given the surrounding speech and the transcript of the segment. Having learned to infill speech from context, the model can then apply this across speech generation tasks, including generating portions in the middle of an audio recording without having to re-create the entire input.

This versatility enables Voicebox to perform well across a variety of tasks, including:

In-context text-to-speech synthesis: Using an input audio sample just two seconds in length, Voicebox can match the sample’s audio style and use it for text-to-speech generation. Future projects could build on this capability by bringing speech to people who are unable to speak, or by allowing people to customize the voices used by nonplayer characters and virtual assistants.

Cross-lingual style transfer: Given a sample of speech and a passage of text in English, French, German, Spanish, Polish, or Portuguese, Voicebox can produce a reading of the text in that language. This capability is exciting because in the future it could be used to help people communicate in a natural, authentic way, even if they don’t speak the same languages.

Speech denoising and editing: Voicebox’s in-context learning makes it good at generating speech to seamlessly edit segments within audio recordings. It can resynthesize the portion of speech corrupted by short-duration noise, or replace misspoken words without having to rerecord the entire speech. A person could identify which raw segment of the speech is corrupted by noise (like a dog barking), crop it, and instruct the model to regenerate that segment.
This capability could one day be used to make cleaning up and editing audio as easy as popular image-editing tools have made adjusting photos.

Diverse speech sampling: Having learned from diverse in-the-wild data, Voicebox can generate speech that is more representative of how people talk in the real world and across the six languages listed above. In the future, this capability could be used to generate synthetic data to help better train a speech assistant model. Our results show that speech recognition models trained on Voicebox-generated synthetic speech perform almost as well as models trained on real speech, with 1 percent error rate degradation as opposed to 45 to 70 percent degradation with synthetic speech from previous text-to-speech models.

Sharing generative AI research responsibly

We believe Voicebox, as the first versatile, efficient model that successfully performs task generalization, could usher in a new era of generative AI for speech. As with other powerful new AI innovations, we recognize that this technology brings the potential for misuse and unintended harm. In our paper, we detail how we built a highly effective classifier that can distinguish between authentic speech and audio generated with Voicebox to mitigate these possible future risks. We believe it is important to be open about our work so the research community can build on it and to continue the important conversations we’re having about how to build AI responsibly, which is why we are sharing our approach and results in a research paper.

Voicebox represents an important step forward in generative AI research. Other scalable generative AI models with task generalization capabilities have sparked excitement about potential applications across tasks when it comes to text, image, and video generation. We hope to see a similar impact for speech in the future.
We look forward to continuing our exploration in the audio domain and seeing how other researchers build on our work.

This blog post was made possible by the work of Matt Le, Apoorv Vyas, Bowen Shi, Brian Karrer, Leda Sari, Rashel Moritz, Mary Williamson, Vimal Manohar, Yossi Adi, Jay Mahadeokar, and Wei-Ning Hsu.

Meta © 2026

---

ABOUT AI AT META

Advancing AI for a connected world

Pushing the boundaries of AI through research, infrastructure and product innovation

We’re driven to build incredible things that connect people in inspiring ways.
Because we can’t truly advance groundbreaking AI alone, we share our research and engage with the AI community to advance the science, together.

Whether it be in AI Infrastructure, Generative AI, NLP, Computer Vision or other core areas of AI, our focus is to connect people in inspiring ways through collaborative, responsible AI innovation.

THE WORK WE DO

Turning ideas into innovation, together

We’re actively building remarkable things in key areas of AI that are shaping an AI-driven future, a future we aim to enhance through our many, diverse fundamental and applied research projects.

Segment Anything 2: SAM 2 is a segmentation model that enables fast, precise selection of any object in any video or image.
DINOv3: DINOv3 scales self-supervised learning (SSL) for images to produce our strongest universal vision backbones.
V-JEPA 2: V-JEPA 2 is the first world model trained on video that achieves state-of-the-art visual understanding and prediction.
Movie Gen: Movie Gen is the most advanced family of media foundation AI models empowering immersive storytelling.

INNOVATING RESPONSIBLY

Our goal is to build AI responsibly, for everyone

Our commitment to responsible AI is driven by the belief that everyone should have equitable access to information, services, and opportunities.

The responsible approach we’re taking to generative AI

We’re exploring and discovering the frontier of AI along with everyone else. So we too need guidance as we maneuver through its many possibilities. By listening to subject matter experts, policymakers and people with lived experiences, we aim to be proactive in the promotion and advancement of responsible design and operation of AI systems.
As we do so, we adhere to our organization’s core principles: Privacy & Security, Fairness & Inclusion, Robustness & Safety, Transparency & Control, Accountability & Governance.

PEOPLE AND CULTURE

Freedom to explore and apply AI at scale

KEY PRINCIPLES

Openness: We believe the latest advancements in AI should be published and shared for the community to learn from and build upon.
Collaboration: We collaborate openly with internal and external partners to share knowledge and cultivate diverse perspectives.
Excellence: There’s no shortage of new possibilities to explore in AI. While our researchers focus on what they believe will have the most positive impact on people and society, our engineers are working to deliver meaningful advancements in products and experiences for all people.
Scale: To bring the benefits of AI to more people and improve accessibility, our research must account for both large scale data and computational needs.

---

The latest AI news from Meta

FEATURED

Four MTIA Chips in Two Years: Scaling AI Experiences for Billions (March 11, 2026)
Serving a wide range of AI models on a global scale, while maintaining the lowest possible costs, is one of the most demanding infrastructure challenges in the industry.

Latest News

Mapping the World's Forests with Greater Precision: Introducing Canopy Height Maps v2 (Computer Vision, Mar 10, 2026)
Reducing Government Costs and Increasing Access to Greenspaces in the United Kingdom with DINO (Computer Vision, Feb 9, 2026)
How DINO and SAM are Helping Modernize Essential Medical Triage Practices (Computer Vision, Dec 18, 2025)
The Universities Space Research Association Applies Segment Anything Model for Responding to Flood Emergencies (Computer Vision, Dec 18, 2025)

---

CREATE, REMIX AND SHARE

Bring your ideas to life with Vibes

Create and connect in new ways
Easily make and share vibes, expressive AI-generated videos, with industry-leading media creation models.

IMAGINE ANYTHING: Just describe what you want to create in a few words, or animate your own images.
ADD YOURSELF & FRIENDS: Turn yourself and your friends into characters in your vibes, with complete control over when your likeness is used.
BRING IT TO LIFE: Complete your vibe with lip synced dialogue and music, or add voiceover and text to express yourself in new ways.
EDIT AND RESTYLE: Get more creative control with presets and custom editing and animation prompts to transform characters, reimagine scenes and more.

Discover and remix vibes from creators
Get inspired and remix your favorite vibes to try new ideas or put your own twist on them.

Experiment and play
Try new ideas and experiment with voices, music and styles in an all-in-one creation flow.

Share with friends
Create vibes starring you and your friends, and easily share them on your favorite apps.

Expand your imagination
Get even more advanced tools and more space to create with our desktop web experience.

Advanced creation tools coming soon.
Get more done with your personal AI

Seamlessly access Meta AI through voice and text for tailored assistance throughout your day.

Get answers on-the-go
Learn about the world around you, translate in real time, and stay connected to the moment with Meta AI glasses.

Access Meta AI anywhere
Use Meta AI wherever you need it: on your phone, from your laptop, and on Instagram, WhatsApp, and Facebook.

Explore and get more done
Meta AI is seamlessly integrated into your favorite apps, helping you learn about trending stories, catch up on group chats, and more.

CONVERSATIONAL VOICE & TEXT: Talk it out
A conversational voice and text experience makes Meta AI more natural and easy to talk to. Get a glimpse into what’s coming with our full-duplex voice demo.

TAILORED EXPERIENCE: Get more personalized responses
Meta AI learns and remembers your preferences and interests to give you more relevant, helpful answers, recommendations, and more.

PRODUCTIVITY: Level up your writing
With a single prompt, Meta AI can generate full documents with rich text and images to help you write, edit, and create faster.

SEAMLESS ASSISTANCE: Pick up where you left off
Keep using Meta AI as you move between the Meta AI app, web, Ray-Ban Meta glasses, and your favorite apps including Instagram and Facebook.

See what else is happening at AI at Meta

The Latest

Bringing More Real-Time News and Content to Meta AI
Athletic Intelligence Is Here, Meet Oakley Meta Vanguard (Sept 17, 2025)
Meta Ray-Ban Display: AI Glasses With an EMG Wristband (Sept 17, 2025)
Introducing the Meta AI App: A New Way to Access Your AI Assistant (Apr 29, 2025)




