There are no detailed pricing plans for this tool yet.
Deploy High-Performance AI Models with Ease

Experience lightning-fast AI inference with Release.ai. Our platform offers sub-100ms latency, enterprise-grade security, and seamless scalability for all your AI deployment needs.

Start with 5 free GPU hours in our Sandbox account.

Why Choose Release.ai for Inference

High-Performance Inference
Deploy models with sub-100ms latency. Our optimized infrastructure ensures rapid response times for your AI applications.

Seamless Scalability
Automatically scale from zero to thousands of concurrent requests. Our platform grows with your needs, ensuring consistent performance.

Enterprise-Grade Security
Benefit from SOC 2 Type II compliance, private networking, and end-to-end encryption. Your models and data remain secure and compliant.

Optimized Infrastructure
Leverage our fine-tuned infrastructure for various model types. From LLMs to computer vision, we've optimized for peak performance.

Easy Integration
Integrate with your existing stack using our comprehensive SDKs and APIs. Deploy models with just a few lines of code.

Reliable Monitoring
Keep track of your model's performance with real-time monitoring and detailed analytics. Identify and resolve issues quickly.

Cost-Effective Pricing
Pay only for what you use. Our pricing scales with your usage, ensuring you get the best value for your inference needs.

Expert Support
Get assistance from our team of ML experts. We're here to help you optimize your models and resolve any issues you encounter.

Explore and Deploy State-of-the-Art AI Models

Showing 25 of 152 models. Sizes marked (Upgraded Plans) are available only in Upgraded Plans.

deepseek-r1
DeepSeek's first generation of reasoning models with comparable performance to OpenAI-o1, including six dense models distilled from DeepSeek-R1 based on Llama and Qwen.
Sizes: 1.5b, 7b, 8b, 14b, 32b (Upgraded Plans), 70b (Upgraded Plans), 671b (Upgraded Plans). 1.4M pulls. Updated 5 days ago.

olmo2
OLMo 2 is a new family of 7B and 13B models trained on up to 5T tokens. These models are on par with or better than equivalently sized fully open models, and competitive with open-weight models such as Llama 3.1 on English academic benchmarks.
Sizes: 7b, 13b. 8,003 pulls. Updated 2 weeks ago.

command-r7b
The smallest model in Cohere's R series delivers top-tier speed, efficiency, and quality to build powerful AI applications on commodity GPUs and edge devices.
Tags: tools. Sizes: 7b. 8,877 pulls. Updated 10 days ago.

deepseek-v3
A strong Mixture-of-Experts (MoE) language model with 671B total parameters, of which 37B are activated for each token.
Sizes: 671b (Upgraded Plans). 96.7K pulls. Updated 2 weeks ago.

phi4
Phi-4 is a 14B parameter, state-of-the-art open model from Microsoft.
Sizes: 14b. 194.9K pulls. Updated 2 weeks ago.

dolphin3
Dolphin 3.0 Llama 3.1 8B 🐬 is the next generation of the Dolphin series of instruct-tuned models designed to be the ultimate general purpose local model, enabling coding, math, agentic, function calling, and general use cases.
Sizes: 8b. 36.9K pulls. Updated 3 weeks ago.

smallthinker
A new small reasoning model fine-tuned from the Qwen 2.5 3B Instruct model.
Sizes: 3b. 34.7K pulls. Updated 3 weeks ago.

granite3.1-dense
The IBM Granite 2B and 8B models are text-only dense LLMs trained on over 12 trillion tokens of data, and demonstrated significant improvements over their predecessors in performance and speed in IBM's initial testing.
Tags: tools. Sizes: 2b, 8b. 40.7K pulls. Updated 9 days ago.

granite3.1-moe
The IBM Granite 1B and 3B models are long-context mixture of experts (MoE) Granite models from IBM designed for low latency usage.
Tags: tools. Sizes: 1b, 3b. 17.1K pulls. Updated 9 days ago.

falcon3
A family of efficient AI models under 10B parameters, performant in science, math, and coding through innovative training techniques.
Sizes: 1b, 3b, 7b, 10b. 21.3K pulls. Updated 5 weeks ago.

granite-embedding
The IBM Granite Embedding 30M and 278M models are text-only dense biencoder embedding models, with 30M available in English only and 278M serving multilingual use cases.
Tags: embedding. Sizes: 30m, 278m. 9,424 pulls. Updated 5 weeks ago.

exaone3.5
EXAONE 3.5 is a collection of instruction-tuned bilingual (English and Korean) generative models ranging from 2.4B to 32B parameters, developed and released by LG AI Research.
Sizes: 2.4b, 7.8b, 32b (Upgraded Plans). 11.7K pulls. Updated 6 weeks ago.

llama3.3
New state of the art 70B model. Llama 3.3 70B offers similar performance compared to the Llama 3.1 405B model.
Tags: tools. Sizes: 70b (Upgraded Plans). 881.8K pulls. Updated 7 weeks ago.

snowflake-arctic-embed2
Snowflake's frontier embedding model. Arctic Embed 2.0 adds multilingual support without sacrificing English performance or scalability.
Tags: embedding. Sizes: 568m. 16.8K pulls. Updated 7 weeks ago.

sailor2
Sailor2 are multilingual language models made for South-East Asia. Available in 1B, 8B, and 20B parameter sizes.
Sizes: 1b, 8b, 20b (Upgraded Plans). 4,604 pulls. Updated 7 weeks ago.

qwq
QwQ is an experimental research model focused on advancing AI reasoning capabilities.
Tags: tools. Sizes: 32b (Upgraded Plans). 150.4K pulls. Updated 8 weeks ago.

marco-o1
An open large reasoning model for real-world solutions by the Alibaba International Digital Commerce Group (AIDC-AI).
Sizes: 7b. 27K pulls. Updated 7 weeks ago.

tulu3
Tülu 3 is a leading instruction following model family, offering fully open-source data, code, and recipes by the Allen Institute for AI.
Sizes: 8b, 70b (Upgraded Plans). 11.5K pulls. Updated 5 weeks ago.

athene-v2
Athene-V2 is a 72B parameter model which excels at code completion, mathematics, and log extraction tasks.
Tags: tools. Sizes: 72b (Upgraded Plans). 66.3K pulls. Updated 2 months ago.

opencoder
OpenCoder is an open and reproducible code LLM family which includes 1.5B and 8B models, supporting chat in English and Chinese.
Sizes: 1.5b, 8b. 18.4K pulls. Updated 2 months ago.

llama3.2-vision
Llama 3.2 Vision is a collection of instruction-tuned image reasoning generative models in 11B and 90B sizes.
Tags: vision. Sizes: 11b, 90b (Upgraded Plans). 944K pulls. Updated 2 months ago.

smollm2
SmolLM2 is a family of compact language models available in three sizes: 135M, 360M, and 1.7B parameters.
Tags: tools. Sizes: 135m, 360m, 1.7b. 129.1K pulls. Updated 2 months ago.

granite3-guardian
The IBM Granite Guardian 3.0 2B and 8B models are designed to detect risks in prompts and/or responses.
Sizes: 2b, 8b. 5,594 pulls. Updated 2 months ago.

aya-expanse
Cohere For AI's language models trained to perform well across 23 different languages.
Tags: tools. Sizes: 8b, 32b (Upgraded Plans). 29.8K pulls. Updated 3 months ago.

granite3-dense
The IBM Granite 2B and 8B models are designed to support tool-based use cases and retrieval augmented generation (RAG), streamlining code generation, translation, and bug fixing.
Tags: tools. Sizes: 2b, 8b. 42.4K pulls. Updated 2 months ago.

How Release.ai Compares

See why leading companies choose Release.ai for their AI deployment needs.

Model Deployment Time
Release.ai: Under 5 minutes. Instant deployment with pre-configured environments.
Baseten.co: 15-30 minutes. Manual configuration required.

Infrastructure Management
Release.ai: Fully automated. Zero-config infrastructure with automatic scaling.
Baseten.co: Partially automated. Some manual configuration needed.

Performance Optimization
Release.ai: Sub-100ms latency. Optimized for high-performance inference.
Baseten.co: Variable latency. Performance varies by configuration.

Security Features
Release.ai: Enterprise-grade. SOC 2 Type II compliant with end-to-end encryption.
Baseten.co: Standard. Basic security features.

Scaling Capabilities
Release.ai: Automatic. Zero to thousands of concurrent requests.
Baseten.co: Manual configuration. Requires manual scaling setup.

Experience the Release.ai difference with our enterprise-grade AI deployment platform.

Ready to Deploy Your AI Model?

Experience the power of high-performance, secure, and scalable AI inference with our optimized deployment platform.
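
A latency claim such as "sub-100ms" is easiest to evaluate against your own measurements. The following is a minimal sketch, not part of any Release.ai SDK: the function names, the choice of the 95th percentile, and the sample values are all assumptions made for illustration. It takes a list of request latencies in milliseconds that you have collected yourself and reports whether a chosen percentile falls under a target.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Return the nearest-rank percentile of samples, for pct in (0, 100]."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank method: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_latency_target(
    samples_ms: Sequence[float], target_ms: float = 100.0, pct: float = 95.0
) -> bool:
    """True if the chosen percentile of the measured latencies is below target_ms."""
    return percentile(samples_ms, pct) < target_ms


# Hypothetical measurements in milliseconds (illustrative values, not real results).
samples = [42.0, 55.5, 61.2, 48.9, 97.3, 120.4, 51.0, 66.7, 58.1, 73.4]

print("p50:", percentile(samples, 50))
print("p95:", percentile(samples, 95))
print("p95 under 100 ms:", meets_latency_target(samples))
```

Checking a high percentile rather than the mean is a deliberate choice here: a single slow request (120.4 ms in the illustrative data) is enough to push the 95th percentile over the target even though the median is well under it.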