Detailed pricing plans are not available yet for this tool.
Start building withopen models irm https://ollama.com/install.ps1 | iex paste this in PowerShell, or download Ollama Explore → Coding OpenClaw Code with open models Launch Claude Code, Codex, OpenCode, and more — powered by the best open models. ollama launch claude More integrations → $ ollama launch claude ✦ Claude Code v2.1.37 Welcome back! qwen3 Tips for getting started Run /init to create a CLAUDE.md Recent activity No recent activity ❯ ~ | Your open source AI assistant OpenClaw automates your work, answers questions, and handles tasks — powered by open models. ollama launch openclaw Learn more about OpenClaw → $ ollama launch openclaw Launching OpenClaw... OpenClaw has been configured with Ollama. Run any app or agent with open models Connect the latest open models to your favorite applications or agents, making it easy to switch between them ollama View documentation → $ ollama Ollama 0.18.3 ▸ Run a model Launch Claude Code Launch Codex (not installed) Launch OpenClaw More... ↑/↓ navigate • enter launch • → change model • esc quit Over 40,000 integrations Coding Codex Claude Code OpenCode Documents & RAG LangChain LlamaIndex AnythingLLM Automation OpenClaw n8n Dify Chat Open WebUI Onyx Msty View all → Sign up for an account Receive updates when new models are released Access cloud hardware to run faster, larger models Customize & share models with others Create account --- Pricing Free Get started with Ollama $0 Download Automate coding, document analysis, and other tasks with open models Keep your data private Run models on your hardware Access cloud models CLI, API, and desktop apps 40,000+ community integrations Unlimited public models Pro Solve harder tasks, faster $20 / mo or $200/yr billed annually Get Pro Everything in Free, plus: Run 3 cloud models at a time 50x more cloud usage than Free Upload and share private models Max For your most demanding work $100 / mo Get Max Everything in Pro, plus: Run 10 cloud models at a time 5x more usage than Pro Frequently asked questions Models Which models are available? See the full list of cloud-enabled models here. Do models support tool calling? Yes. Cloud models that are trained to support tools are tested for tool calling and with real agent workflows before they go live. If something isn't working, let us know at hello@ollama.com. What quantization or data format do cloud models use? Native weights, as released by the model provider. On modern NVIDIA hardware, models may use accelerated data formats supported by Blackwell and Vera Rubin architectures (e.g. NVFP4). How fast is Ollama? Speed depends on model size, architecture, and hardware optimization. We target and monitor for low time-to-first-token and high throughput across all cloud models. Priority tiers with faster performance may be available in the future. Usage What are the usage limits for each plan? Running models on your own hardware is always unlimited. Cloud usage varies by plan: Plan Usage Example use cases Free Light usage Chatting with models, evaluating larger models, coding and AI assistants with smaller models Pro Day-to-day work Larger models, coding automation, deep research Max Heavy, sustained usage Continuous agent tasks, multiple concurrent agents, large models over extended sessions Each plan has session limits that reset every 5 hours and weekly limits that reset every 7 days. How is usage measured? Usage reflects actual utilization of Ollama's cloud infrastructure - primarily GPU time, which depends on model size and request duration. Shorter requests and prompts that share cached context use less. This is different from fixed token or request-based plans. Ollama doesn't cap you at a set number of tokens. As hardware and model architectures get more efficient, you'll get more out of your plan over time. Can I purchase additional usage? Soon. Additional usage at competitive per-token rates, including cache-aware pricing, is coming. How much more usage does Pro include? 50x more than Free. How much more usage does Max include? 5x more than Pro. How do I know when I've hit my limit? Check your usage here anytime. At 90% of your plan's limit, Ollama sends an email reminder. You can turn this off in settings. How many cloud models can I run at once? Concurrency limits ensure dedicated capacity for workflows that need multiple models running simultaneously: Plan Concurrent models Free 1 Pro 3 Max 10 Requests beyond your plan's concurrency limit are queued and processed as soon as a slot is available. Queued requests are held up to a fixed limit - if the queue is full, the request will be rejected until one of your concurrency slots opens. Privacy Where are models hosted? Ollama hosts models and compute resources primarily in the United States. To serve global demand, we may route to Europe and Singapore for additional capacity. Is my prompt or response data trained on? Prompt or response data is never logged or trained on. Who does Ollama partner with to host models? Ollama collaborates with NVIDIA Cloud Providers (NCPs) to host open models. When Ollama partners with providers, we require no logging, no training, and zero data retention policies in place. --- Popular Newest ⇅ Ollama Search for models on Ollama. Cloud Embedding Vision Tools Thinking Popular Newest nemotron-cascade-2 An open 30B MoE model from NVIDIA with 3B activated parameters that delivers strong reasoning and agentic capabilities. tools thinking 30b 23.8K Pulls 3 Tags Updated 6 days ago minimax-m2.7 MiniMax's M2-series model for coding, agentic workflows, and professional productivity. cloud 30.6K Pulls 1 Tag Updated 1 week ago qwen3.5 Qwen 3.5 is a family of open-source multimodal models that delivers exceptional utility and performance. vision tools thinking cloud 0.8b 2b 4b 9b 27b 35b 122b 3.4M Pulls 30 Tags Updated 3 weeks ago lfm2 LFM2 is a family of hybrid models designed for on-device deployment. LFM2-24B-A2B is the largest model in the family, scaling the architecture to 24 billion parameters while keeping inference efficient. tools 24b 994.5K Pulls 6 Tags Updated 1 month ago qwen3-coder-next Qwen3-Coder-Next is a coding-focused language model from Alibaba's Qwen team, optimized for agentic coding workflows and local development. tools cloud 913K Pulls 4 Tags Updated 1 month ago lfm2.5-thinking LFM2.5 is a new family of hybrid models designed for on-device deployment. tools 1.2b 1M Pulls 5 Tags Updated 2 months ago glm-4.7-flash As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency. tools thinking 894.6K Pulls 4 Tags Updated 2 months ago translategemma A new collection of open translation models built on Gemma 3, helping people communicate across 55 languages. vision 4b 12b 27b 801.4K Pulls 13 Tags Updated 2 months ago qwen3-vl The most powerful vision-language model in the Qwen model family to date. vision tools thinking cloud 2b 4b 8b 30b 32b 235b 2.5M Pulls 59 Tags Updated 4 months ago devstral-small-2 24B model that excels at using tools to explore codebases, editing multiple files and power software engineering agents. vision tools cloud 24b 691.4K Pulls 6 Tags Updated 3 months ago ministral-3 The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. vision tools cloud 3b 8b 14b 746.9K Pulls 16 Tags Updated 3 months ago granite4 Granite 4 features improved instruction following (IF) and tool-calling capabilities, making them more effective in enterprise applications. tools 350m 1b 3b 974.9K Pulls 17 Tags Updated 4 months ago minimax-m2.5 MiniMax-M2.5 is a state-of-the-art large language model designed for real-world productivity and coding tasks. cloud 145.3K Pulls 1 Tag Updated 1 month ago qwen3-next The first installment in the Qwen3-Next series with strong performance in terms of both parameter efficiency and inference speed. tools thinking cloud 80b 440K Pulls 10 Tags Updated 3 months ago glm-5 A strong reasoning and agentic model from Z.ai with 744B total parameters (40B active), built for complex systems engineering and long-horizon tasks. cloud 137.2K Pulls 1 Tag Updated 1 month ago nemotron-3-super NVIDIA Nemotron 3 Super is a 120B open MoE model activating just 12B parameters to deliver maximum compute efficiency and accuracy for complex multi-agent applications. tools thinking cloud 120b 99.6K Pulls 7 Tags Updated 2 weeks ago kimi-k2.5 Kimi K2.5 is an open-source, native multimodal agentic model that seamlessly integrates vision and language understanding with advanced agentic capabilities, instant and thinking modes, as well as conversational and agentic paradigms. cloud 187.1K Pulls 1 Tag Updated 1 month ago rnj-1 Rnj-1 is a family of 8B parameter open-weight, dense models trained from scratch by Essential AI, optimized for code and STEM with capabilities on par with SOTA open-weight models. tools cloud 8b 394.6K Pulls 6 Tags Updated 3 months ago glm-ocr GLM-OCR is a multimodal OCR model for complex document understanding, built on the GLM-V encoder–decoder architecture. vision tools 157.7K Pulls 3 Tags Updated 1 month ago nemotron-3-nano Nemotron-3-Nano is a new Standard for Efficient, Open, and Intelligent Agentic Models, now updated with a 4B parameter count model. tools thinking cloud 4b 30b 293K Pulls 9 Tags Updated 1 week ago --- Skip to main contentOllama home pageSearch...⌘KGet startedWelcomeQuickstartCloudCapabilitiesStreamingThinkingStructured OutputsVisionEmbeddingsTool callingWeb searchIntegrationsOverviewAssistantsCodingIDEs & EditorsChat & RAGAutomationNotebooksMore informationCLI ReferenceAssistant SandboxingModelfile ReferenceContext lengthLinuxmacOSWindowsDockerImporting a ModelFAQHardware supportTroubleshootingSign inDownloadOllama home pageSearch...⌘KSearch...NavigationGet startedOllama's documentationDocumentationAPI ReferenceDocumentationAPI Reference Ollama is the easiest way to get up and running with large language models such as gpt-oss, Gemma 3, DeepSeek-R1, Qwen3 and more. QuickstartGet up and running with your first model or integrate Ollama with your favorite toolsDownload OllamaDownload Ollama on macOS, Windows or LinuxCloudOllama’s cloud models offer larger models with better performance.API referenceView Ollama’s API reference Libraries Ollama's Python LibraryThe official library for using Ollama with PythonOllama's JavaScript libraryThe official library for using Ollama with JavaScript or TypeScript.Community librariesView a list of 20+ community-supported libraries for Ollama Community DiscordJoin our Discord communityRedditJoin our Reddit communityQuickstartNext⌘IOn this pageLibrariesCommunity


