Nebius AI Studio
Website: https://nebius.com/services/token-factory/inference-service
Detailed pricing plans are not available yet for this tool.
Join our webinar 'Aether 3.5: Q1 platform release showcase'. Register now!

Unlimited scalability guarantee
Run models through our dedicated endpoints with autoscaling throughput and consistent performance. Scale seamlessly from prototype to production, with no rate throttling and no GPU wrangling.

Up to 3× cost efficiency
Experience transparent $/token pricing and right-sized serving for RAG, contextual and agentic use cases. Pay only for what you use, with volume discounts and optimized serving pipelines that deliver up to 3× better cost-to-performance (independently verified by Artificial Analysis).

Ultra-low latency, verified
Our serving pipeline delivers sub-second time-to-first-token, validated by internal and third-party benchmarks. Multi-region routing and speculative decoding keep response times stable under load.

Benchmark-backed model quality
All hosted models undergo internal validation for accuracy, consistency and multilingual robustness, ensuring production-grade results across diverse workloads.

Choose speed or economy
Select between Fast and Base flavors: Fast is optimized for the lowest latency and interactive workloads, while Base is cost-efficient for high-volume inference or background processing. Switch instantly, no redeploys required.

No MLOps required
Token Factory gives you enterprise-ready infrastructure out of the box. Provision, deploy and scale without managing GPUs or clusters; our endpoints are already optimized for performance and reliability.

Zülküf Genç, Director of AI
“Prosus, the power behind some of the world’s leading lifestyle and e-commerce brands, has achieved up to 26x cost reductions compared to proprietary models. We move fast, test and iterate quickly, and the flexibility, products and quick responses from Nebius Token Factory allowed us to keep this pace all the way through production. By leveraging Nebius Token Factory’s dedicated endpoints, Prosus was able to secure guaranteed performance and isolation.
The addition of autoscaling was the game-changer, allowing us to handle massive workloads of up to 200 billion tokens per day without manual intervention.”

Alex Mashrabov, Founder and CEO
“Running inference at scale with healthy economics requires efficient on-demand and autoscaling capabilities. Nebius was the only provider that met our requirements, reducing overhead, simplifying management, and enabling us to deliver faster, more cost-efficient AI in production.”

Julien Chaumont, CTO
“Hugging Face and Nebius Token Factory share the same mission of making open AI accessible and scalable. By partnering with Nebius, we’ve been able to provide faster and more reliable inference for developers working with large open-source models.”

Raghav Kohli, General Counsel, Partnerships
“Mithril and Nebius share a vision for making open AI production-ready. By leveraging Nebius Token Factory’s scalable infrastructure, from real-time to batch inference, we’re able to run and optimize large workloads efficiently while keeping the flexibility and transparency of open models.”

Benchmark-backed performance and cost efficiency
- Up to 4.5× faster: time-to-first-token up to 4.5 times faster in Europe than other inference providers. Consistent sub-second latency validated by independent tests.
- More than 2.5× cheaper than GPT-4o, with comparable quality on Llama-405B.
- Top-2 throughput worldwide: verified on DeepSeek R1 0528 (248 output tokens/sec, outperforming major hyperscalers by up to 3×).

Top open-source models available

OpenAI / gpt-oss-120B
A 120B-parameter open model delivering near-GPT-4-class performance on complex reasoning and code generation tasks, with transparent weights and fast inference throughput.
131k context, open license

Moonshot AI / Kimi-K2-Instruct
High-accuracy generalist model optimized for reasoning, dialogue, and structured generation.
Strong performance on multilingual and long-context tasks.
131k context, proprietary open license

NousResearch / Hermes-4-405B
A 405B-parameter instruction model built for nuanced conversation, long-form reasoning, and alignment fidelity. A community-driven alternative to closed-weight instruction models.
128k context, custom open license

ZhipuAI / GLM-4.5
Compact and efficient 128k-context model that delivers exceptional reasoning and code performance per token. A balanced choice for enterprises prioritizing cost-to-quality efficiency.
128k context, Apache 2.0 License

Qwen / Qwen3-Coder-480B-A35B-Instruct
Massive 480B-parameter code-specialized model for high-precision programming, reasoning and math. Features 262k context and fine-grained JSON control for structured output.
262k context, Apache 2.0 License

Qwen / Qwen3-235B-A22B-Thinking-2507
The latest large reasoning model in the Qwen family, delivering top performance on chain-of-thought and math reasoning. Designed for long-context enterprise workloads.
262k context, Apache 2.0 License

DeepSeek / DeepSeek-R1-0528
State-of-the-art reasoning model achieving GPT-4o-level performance on math, code, and logic. Independently verified by Artificial Analysis for leading throughput and inference speed.
164k context, MIT License

And much more...
Take a look at our Playground to see the models available today.
We're continuously adding new and diverse models to expand our offerings.

Join our community
Follow the Nebius Token Factory X account for instant updates, LinkedIn for more detailed news, and Discord for technical inquiries and meaningful community discussions.

A simple and friendly UI for a smooth user experience
Sign up and start testing, comparing and running AI models in your applications.

Familiar API at your fingertips

    import os

    import openai

    # The API key is read from the NEBIUS_API_KEY environment variable.
    client = openai.OpenAI(
        api_key=os.environ.get("NEBIUS_API_KEY"),
        base_url='https://api.tokenfactory.nebius.com/'
    )

    # Send a single-turn chat request to the fast flavor of Llama 3.1 8B.
    completion = client.chat.completions.create(
        messages=[{
            'role': 'user',
            'content': 'What is the answer to all questions?'
        }],
        model='meta-llama/Meta-Llama-3.1-8B-Instruct-fast'
    )

    # Print the text of the first returned choice.
    print(completion.choices[0].message.content)

Optimize costs with our flexible pricing

Start free
Get started in minutes with free credits to explore 60+ open-source models directly in the Playground or through the API. No setup, no infrastructure to manage: just plug in your key and start generating.

Flexible performance tiers
Choose between two optimized configurations to match your workload:
- Fast: sub-second responses for interactive agents, chat, or real-time inference.
- Base: cost-efficient throughput for large-scale or background processing.
Switch tiers instantly: same API, same endpoints.

Enterprise-ready deployment
Scale securely from prototype to enterprise, with predictable performance and transparent $/token pricing, including:
- Guaranteed throughput and autoscaling.
- 99.9% SLA and regional routing.
- RBAC, unified billing, and SOC 2 Type II, HIPAA and ISO 27001 compliance.

Nebius Token Factory prices
Scale from shared access to dedicated endpoints with a 99.9% SLA, transparent $/token pricing and volume discounts for production.

Q&A about Inference Service

Can I use your service for large production workloads?
Yes.
Nebius Token Factory is built for large-scale, production-grade AI workloads. Dedicated endpoints deliver sub-second inference, 99.9% uptime, and autoscaling throughput, ensuring consistent performance for workloads exceeding hundreds of millions of tokens per minute. Scale seamlessly from experimentation to global deployment, with no rate throttles and no GPU management.

I’d like to use another open-source model, what do I do?
We regularly onboard new open-source releases, including Llama, GPT OSS, Qwen, DeepSeek, Mistral and Flux, based on customer demand and benchmarking. Enterprise users can also request model optimization or custom deployment support through our Solutions team.

Can I get a dedicated instance?
Yes. Dedicated endpoints provide guaranteed isolation, predictable latency, and reserved compute capacity. They include a 99.9% SLA, custom autoscaling, and optional regional deployment in EU or US data centers. Contact our team to size your instance for your workload and compliance needs.

How do I deploy my custom fine-tuned models?
You can upload and host your LoRA or fully fine-tuned models directly through the Token Factory dashboard or API. Each deployment runs with transparent $/token pricing and inherits the same security and latency guarantees as standard endpoints. Full post-training and distillation workflows will soon be available to simplify training, optimization, and deployment in a single pipeline.

How secure is your service and where does my data go?
We provide zero-retention mode, where requests and outputs are never stored or reused for training. All data is processed in facilities with SOC 2 Type II, HIPAA and ISO 27001 certifications. Data centers are located in Finland, France and the US, and meet EU and US data-residency requirements.

What are your rate limits and can they be increased?
Starter includes high defaults; enterprise removes caps. We size endpoints to your traffic profile.
Enterprise customers can lift all limits, with unlimited throughput, autoscaling to demand, and per-project tuning based on traffic profiles. Need more? Reach out to our team and we’ll size your endpoint for your real-world workload.

How can I build RAG applications on your platform?
Nebius Token Factory includes all the building blocks for retrieval-augmented generation (RAG), such as state-of-the-art embedding models and seamless integration with our chat and inference APIs. For more information, please check our cookbook.

What models do you support and can I request new ones?
You can request new models directly through your dashboard.

How does your pricing compare to other providers?
Pricing is transparent $/token, with clear input/output separation and volume discounts as you scale. There are no hidden infrastructure or idle GPU costs: pay only for what you serve.

Do you offer enterprise SLAs, support, and compliance options?
Yes. Enterprise customers benefit from:
- 99.9% SLA with reserved capacity.
- Dedicated Slack / support channel.
- Custom DPAs and compliance packages (SOC 2 Type II, HIPAA, ISO 27001).
- SSO, RBAC, and unified billing for secure team governance.
These options are designed for regulated industries and organizations running mission-critical AI in production.

---

AI is rapidly becoming a general-purpose technology, moving from research into large-scale production systems and reshaping cloud infrastructure requirements.
Nebius is the AI cloud company, delivering a unified platform that spans the complete AI journey, from data and model training and tuning to production runtime and deployment. Founded around deep in-house technological expertise, Nebius brings a strong engineering culture rooted in designing and operating large-scale platforms that run reliably at global scale. The company serves AI builders and enterprises worldwide across industries including healthcare and life sciences, robotics and physical AI, financial services, media & entertainment, retail and many others. Nebius is listed on Nasdaq (NBIS) and headquartered in Amsterdam.

Meet our leadership team
- Arkady Volozh: Founder, Chief Executive Officer, Board member
- Ophir Nave: Chief Operating Officer, Board member
- Roman Chernin: Chief Business Officer
- Andrey Korolenko: Chief Infrastructure and Product Officer
- Dado Alonso: Chief Financial Officer
- Marc Boroditsky: Chief Revenue Officer
- Danila Shtan: Chief Technology Officer
- Boaz Tal: General Counsel
- Daniel Bounds: Chief Marketing Officer
- Tom Blackwell: Chief Communications Officer
- Elena Bunina: Head of Nebius Academy, Board member

What’s in our name?
Combining nebula (Latin for “cloud”) with a reference to the infinite loop of the Möbius strip, Nebius embodies our belief in the limitless possibilities created by our AI cloud.

Our other businesses
As well as our core AI infrastructure business, Nebius Group includes other companies growing under distinctive individual brands.

Avride
Avride develops autonomous cars and delivery robots for sectors such as ride-hailing, logistics, e-commerce, and food and grocery delivery, with use cases including passenger rides, hub-to-warehouse deliveries and package deliveries to end customers.
Avride’s highly experienced team has deployed and operated its autonomous driving solutions in many different contexts, ensuring both safety and efficiency across a wide range of real-world applications.

TripleTen
TripleTen is a leading edtech platform, specializing in reskilling and upskilling individuals for successful careers in tech. Through its proprietary learning platform, it offers affordable, high-quality training in a blend of bootcamp and MOOC formats, with course content developed in-house. It also provides career services to its graduates, with more than 40 partners offering job opportunities. TripleTen operates in the US and Latin America, and was named Best Software Bootcamp in the US by Fortune magazine.

Other assets
Nebius is constantly innovating. We own equity stakes in other companies, including some that began as internal initiatives before growing and spinning off as successful standalone ventures.

Toloka
Toloka is a data partner for all stages of AI development from training to evaluation.

ClickHouse
ClickHouse is a fast, open-source database management system built for real-time data processing and analytics at scale.

Nebius Academy
Nebius Academy offers advanced, open online and in-person courses, as well as corporate programs in machine learning and generative AI. These expert-led programs are designed to help engineers, developers, and tech leaders build their expertise and apply it to real-world challenges. Nebius Academy also offers cloud grants, giving partners access to extra computational resources, and collaborates with top institutions around the world to support research and education.

---

Our speakers
- Marouane Khoukh, Developer Advocate
- Narek Tatevosyan, Director of Product Management at Nebius

Try Nebius AI Cloud console today
Get immediate access to NVIDIA® GPUs, along with CPU resources, storage and additional services through our user-friendly self-service console.

---

Start for free
Begin with $1 in free credits to explore our models through the Playground or API.
Start building in minutes.

Playground
The Nebius Token Factory provides a model playground: a web interface to try out and compare different AI models available in Nebius Token Factory without writing any code.

Two flavors
Choose between fast and base flavors to suit your project needs. The fast flavor delivers quicker results for time-sensitive tasks, while the base flavor offers economical processing for larger workloads.

Pricing categories: Text to text, Vision, Image, Embeddings, Guardrails, Post-training, Enterprise-grade inference

Text to text
Prices shown are per 1 million tokens. Batch inference is automatically billed at 50% of the base real-time model price, rounded up to the nearest cent. Example: if a model’s base price is $0.13 input and $0.40 output, batch inference is $0.07 input and $0.20 output respectively.

Model                                Flavor  Input   Output
gpt-oss-120b                         fast    –       –
                                     base    $0.15   $0.60
gpt-oss-20b                          fast    –       –
                                     base    $0.05   $0.20
Kimi-K2-Instruct                     fast    –       –
                                     base    $0.50   $2.40
Qwen/Qwen3-Coder-480B-A35B-Instruct  fast    –       –
                                     base    $0.40   $1.80
Qwen3-235B-A22B-Thinking-2507        fast    –       –
                                     base    $0.20   $0.80
Qwen3-235B-A22B-Instruct-2507        fast    –       –
                                     base    $0.20   $0.60
Qwen3-30B-A3B-Thinking-2507          fast    –       –
                                     base    $0.10   $0.30
Qwen3-30B-A3B-Instruct-2507          fast    –       –
                                     base    $0.10   $0.30
Qwen3-Coder-30B-A3B-Instruct         fast    –       –
                                     base    $0.10   $0.30
Qwen3-30B-A3B                        fast    –       –
                                     base    $0.10   $0.30
Qwen3-32B                            fast    $0.20   $0.60
                                     base    $0.10   $0.30
Qwen3-14B                            fast    –       –
                                     base    $0.08   $0.24
Qwen2.5-Coder-7B                     fast    –       –
                                     base    $0.03   $0.09
Qwen2.5-72B-Instruct                 fast    –       –
                                     base    $0.13   $0.40
QwQ-32B                              fast    $0.50   $1.50
                                     base    $0.15   $0.45
GLM-4.5                              fast    –       –
                                     base    $0.60   $2.20
GLM-4.5-Air                          fast    –       –
                                     base    $0.20   $1.20
DeepSeek-R1-0528                     fast    $2.00   $6.00
                                     base    $0.80   $2.40
DeepSeek-V3-0324                     fast    $0.75   $2.25
                                     base    $0.50   $1.50
DeepSeek-V3                          fast    –       –
                                     base    $0.50   $1.50
Meta/Llama-3.3-70B-Instruct          fast    $0.25   $0.75
                                     base    $0.13   $0.40
Meta/Llama-3.1-8B-Instruct           fast    $0.03   $0.09
                                     base    $0.02   $0.06
Meta/Llama-3.1-405B-Instruct         fast    –       –
                                     base    $1.00   $3.00
Llama-3_1-Nemotron-Ultra-253B-v1     fast    –       –
                                     base    $0.60   $1.80
Gemma-2-2b-it                        fast    –       –
                                     base    $0.02   $0.06
Gemma-2-9b-it                        fast    –       –
                                     base    $0.03   $0.09
Devstral-Small-2505                  fast    –       –
                                     base    $0.08   $0.24
Hermes-4-405B                        fast    –       –
                                     base    $1.00   $3.00
Hermes-4-70B                         fast    –       –
                                     base    $0.13   $0.40
Hermes-3-Llama-3.1-405B              fast    –       –
                                     base    $1.00   $3.00
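
The batch-billing rule stated with the price list (50% of the base price, rounded up to the nearest cent) and the per-1-million-token pricing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `batch_price` and `estimate_cost` are hypothetical and not part of any Nebius SDK, and it assumes the rounding is applied to each listed price separately, as the page's own $0.13 / $0.40 example suggests. Actual invoices may be computed differently.

```python
from decimal import Decimal, ROUND_CEILING

# Hypothetical helper: not a Nebius API, just the rule quoted from the price list.
def batch_price(base_price: str) -> Decimal:
    """Return 50% of a base price (per 1M tokens), rounded up to the nearest cent."""
    half = Decimal(base_price) / 2
    return half.quantize(Decimal("0.01"), rounding=ROUND_CEILING)


# Hypothetical helper: prices are dollars per 1 million tokens, as in the table.
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price: Decimal, output_price: Decimal) -> Decimal:
    """Estimate the dollar cost of a workload from its token counts."""
    per_million = Decimal(1_000_000)
    total = Decimal(input_tokens) * input_price + Decimal(output_tokens) * output_price
    return total / per_million


if __name__ == "__main__":
    # The page's own example: $0.13 input and $0.40 output at base real-time prices.
    print(batch_price("0.13"))  # 0.07
    print(batch_price("0.40"))  # 0.20

    # Assumed workload for illustration: 2M input tokens and 500k output tokens.
    realtime = estimate_cost(2_000_000, 500_000, Decimal("0.13"), Decimal("0.40"))
    batch = estimate_cost(2_000_000, 500_000, batch_price("0.13"), batch_price("0.40"))
    print(realtime, batch)
```

`Decimal` is used instead of floats so that half of $0.13 is exactly 0.065 and rounds up to $0.07 rather than depending on binary floating-point representation.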

