Chat models
In the table below, models marked “Turbo” are quantized to FP8, and those marked “Lite” are quantized to INT4. All other models run at full precision (FP16). If you’re not sure which chat model to use, we currently recommend Llama 3.3 70B Turbo (meta-llama/Llama-3.3-70B-Instruct-Turbo) to get started.
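As a rough sketch of what a request to the recommended model could look like: the builder below assembles a chat-completion request body in the OpenAI-compatible shape. The field names (`model`, `messages`, `max_tokens`) follow that common convention; treat the exact schema as an assumption and consult the API reference for specifics.

```python
# Sketch of an OpenAI-compatible chat completion request body.
# The field names are assumptions based on the common convention,
# not taken from this page; check the API reference for the exact schema.

def build_chat_request(model, user_message, max_tokens=512):
    """Assemble the JSON body for a chat-completions style request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }

payload = build_chat_request(
    "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "Explain FP8 quantization in one sentence.",
)
```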
Organization | Model Name | API Model String | Context length | Quantization |
---|---|---|---|---|
Moonshot | Kimi K2 Instruct 0905 | moonshotai/Kimi-K2-Instruct-0905 | 262144 | FP8 |
DeepSeek | DeepSeek-V3.1 | deepseek-ai/DeepSeek-V3.1 | 128000 | FP8 |
OpenAI | GPT-OSS 120B | openai/gpt-oss-120b | 128000 | MXFP4 |
OpenAI | GPT-OSS 20B | openai/gpt-oss-20b | 128000 | MXFP4 |
Moonshot | Kimi K2 Instruct | moonshotai/Kimi-K2-Instruct | 128000 | FP8 |
Z.ai | GLM 4.5 Air | zai-org/GLM-4.5-Air-FP8 | 131072 | FP8 |
Qwen | Qwen3 235B-A22B Thinking 2507 | Qwen/Qwen3-235B-A22B-Thinking-2507 | 262144 | FP8 |
Qwen | Qwen3-Coder 480B-A35B Instruct | Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 | 256000 | FP8 |
Qwen | Qwen3 235B-A22B Instruct 2507 | Qwen/Qwen3-235B-A22B-Instruct-2507-tput | 262144 | FP8 |
DeepSeek | DeepSeek-R1-0528 | deepseek-ai/DeepSeek-R1 | 163839 | FP8 |
DeepSeek | DeepSeek-R1-0528 Throughput | deepseek-ai/DeepSeek-R1-0528-tput | 163839 | FP8 |
DeepSeek | DeepSeek-V3-0324 | deepseek-ai/DeepSeek-V3 | 163839 | FP8 |
Meta | Llama 4 Maverick (17Bx128E) | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 1048576 | FP8 |
Meta | Llama 4 Scout (17Bx16E) | meta-llama/Llama-4-Scout-17B-16E-Instruct | 1048576 | FP16 |
Meta | Llama 3.3 70B Instruct Turbo | meta-llama/Llama-3.3-70B-Instruct-Turbo | 131072 | FP8 |
Deep Cogito | Cogito v2 Preview 70B | deepcogito/cogito-v2-preview-llama-70B | 32768 | BF16 |
Deep Cogito | Cogito v2 Preview 109B MoE | deepcogito/cogito-v2-preview-llama-109B-MoE | 32768 | BF16 |
Deep Cogito | Cogito v2 Preview 405B | deepcogito/cogito-v2-preview-llama-405B | 32768 | BF16 |
Deep Cogito | Cogito v2 Preview 671B MoE | deepcogito/cogito-v2-preview-deepseek-671b | 32768 | FP8 |
Mistral AI | Magistral Small 2506 API | mistralai/Magistral-Small-2506 | 40960 | BF16 |
DeepSeek | DeepSeek R1 Distill Llama 70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 131072 | FP16 |
DeepSeek | DeepSeek R1 Distill Qwen 14B | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 131072 | FP16 |
Marin Community | Marin 8B Instruct | marin-community/marin-8b-instruct | 4096 | FP16 |
Mistral AI | Mistral Small 3 Instruct (24B) | mistralai/Mistral-Small-24B-Instruct-2501 | 32768 | FP16 |
Meta | Llama 3.1 8B Instruct Turbo | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 131072 | FP8 |
Meta | Llama 3.3 70B Instruct Turbo (Free)** | meta-llama/Llama-3.3-70B-Instruct-Turbo-Free | 131072 | FP8 |
Qwen | Qwen 2.5 7B Instruct Turbo | Qwen/Qwen2.5-7B-Instruct-Turbo | 32768 | FP8 |
Qwen | Qwen 2.5 72B Instruct Turbo | Qwen/Qwen2.5-72B-Instruct-Turbo | 32768 | FP8 |
Qwen | Qwen2.5 Vision Language 72B Instruct | Qwen/Qwen2.5-VL-72B-Instruct | 32768 | FP8 |
Qwen | Qwen 2.5 Coder 32B Instruct | Qwen/Qwen2.5-Coder-32B-Instruct | 32768 | FP16 |
Qwen | QwQ-32B | Qwen/QwQ-32B | 32768 | FP16 |
Qwen | Qwen3 235B A22B Throughput | Qwen/Qwen3-235B-A22B-fp8-tput | 40960 | FP8 |
Arcee | Arcee AI Virtuoso Medium | arcee-ai/virtuoso-medium-v2 | 128000 | - |
Arcee | Arcee AI Coder-Large | arcee-ai/coder-large | 32768 | - |
Arcee | Arcee AI Virtuoso-Large | arcee-ai/virtuoso-large | 128000 | - |
Arcee | Arcee AI Maestro | arcee-ai/maestro-reasoning | 128000 | - |
Arcee | Arcee AI Caller | arcee-ai/caller | 32768 | - |
Arcee | Arcee AI Blitz | arcee-ai/arcee-blitz | 32768 | - |
Meta | Llama 3.1 405B Instruct Turbo | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 130815 | FP8 |
Meta | Llama 3.2 3B Instruct Turbo | meta-llama/Llama-3.2-3B-Instruct-Turbo | 131072 | FP16 |
Meta | Llama 3 8B Instruct Lite | meta-llama/Meta-Llama-3-8B-Instruct-Lite | 8192 | INT4 |
Meta | Llama 3 70B Instruct Reference | meta-llama/Llama-3-70b-chat-hf | 8192 | FP16 |
Google | Gemma Instruct (2B) | google/gemma-2b-it* | 8192 | FP16 |
Google | Gemma 3N E4B Instruct | google/gemma-3n-E4B-it | 32768 | FP8 |
Gryphe | MythoMax-L2 (13B) | Gryphe/MythoMax-L2-13b* | 4096 | FP16 |
Mistral AI | Mistral (7B) Instruct | mistralai/Mistral-7B-Instruct-v0.1 | 8192 | FP16 |
Mistral AI | Mistral (7B) Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | FP16 |
Mistral AI | Mistral (7B) Instruct v0.3 | mistralai/Mistral-7B-Instruct-v0.3 | 32768 | FP16 |
- PDF to Chat App - Chat with your PDFs (blogs, textbooks, papers)
- Open Deep Research Notebook - Generate long form reports using a single prompt
- RAG with Reasoning Models Notebook - RAG with DeepSeek-R1
- Fine-tuning Chat Models Notebook - Tune language models for conversation
- Building Agents - Agent workflows with language models
Image models
Use our Images endpoint for Image Models.
Organization | Model Name | Model String for API | Default steps |
---|---|---|---|
Black Forest Labs | Flux.1 [schnell] (free)* | black-forest-labs/FLUX.1-schnell-Free | N/A |
Black Forest Labs | Flux.1 [schnell] (Turbo) | black-forest-labs/FLUX.1-schnell | 4 |
Black Forest Labs | Flux.1 Dev | black-forest-labs/FLUX.1-dev | 28 |
Black Forest Labs | Flux1.1 [pro] | black-forest-labs/FLUX.1.1-pro | - |
Black Forest Labs | Flux.1 [pro] | black-forest-labs/FLUX.1-pro | 28 |
Black Forest Labs | Flux.1 Kontext [pro] | black-forest-labs/FLUX.1-kontext-pro | 28 |
Black Forest Labs | Flux.1 Kontext [max] | black-forest-labs/FLUX.1-kontext-max | 28 |
Black Forest Labs | Flux.1 Kontext [dev] | black-forest-labs/FLUX.1-kontext-dev | 28 |
Black Forest Labs | FLUX.1 Krea [dev] | black-forest-labs/FLUX.1-krea-dev | 28 |
If you’re not sure which image model to use, we currently recommend Flux.1 [schnell] (black-forest-labs/FLUX.1-schnell) to get started.
Image Model Examples
- Blinkshot.io - A realtime AI image playground built with Flux Schnell
- Logo Creator - A logo generator that creates professional logos in seconds using Flux Pro 1.1
- PicMenu - A menu visualizer that takes a restaurant menu and generates nice images for each dish.
- Flux LoRA Inference Notebook - Using LoRA fine-tuned image generations models
- Default pricing: The listed per megapixel prices are for the default number of steps.
- Using more or fewer steps: Costs are adjusted only when you go above the default step count. If you use more steps, the cost increases proportionally per the formula below; if you use fewer steps, the cost does not decrease and is billed at the default rate.
- Cost = MP × Price per MP × (Steps ÷ Default steps)
- MP = Width × Height ÷ 1,000,000
- Price per MP = Cost for generating one megapixel at the default steps
- Steps = The number of steps used; only factored in when above the default
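The proportional-above-default rule can be written out as a small calculation. The per-MP price below is a hypothetical placeholder (this page does not list actual prices); the logic follows the rules stated above.

```python
def image_cost(width, height, steps, default_steps, price_per_mp):
    """Cost of one generation under the rules above: cost scales up
    proportionally past the default step count, but never drops
    below the default rate when using fewer steps."""
    mp = (width * height) / 1_000_000
    step_factor = max(steps, default_steps) / default_steps
    return mp * price_per_mp * step_factor

# 1024x1024 image at the default 28 steps, hypothetical $0.0025 per MP:
base = image_cost(1024, 1024, 28, 28, 0.0025)
# Doubling the steps doubles the cost; halving them changes nothing:
double = image_cost(1024, 1024, 56, 28, 0.0025)
fewer = image_cost(1024, 1024, 14, 28, 0.0025)
```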
Vision models
If you’re not sure which vision model to use, we currently recommend Llama 4 Scout (meta-llama/Llama-4-Scout-17B-16E-Instruct) to get started. For model-specific rate limits, navigate here.
Organization | Model Name | Model String for API | Context length |
---|---|---|---|
Meta | Llama 4 Maverick (17Bx128E) | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 524288 |
Meta | Llama 4 Scout (17Bx16E) | meta-llama/Llama-4-Scout-17B-16E-Instruct | 327680 |
Qwen | Qwen2.5 Vision Language 72B Instruct | Qwen/Qwen2.5-VL-72B-Instruct | 32768 |
Arcee | Arcee AI Spotlight | arcee_ai/arcee-spotlight | 128000 |
- LlamaOCR - A tool that takes documents (like receipts) and outputs markdown
- Wireframe to Code - A wireframe-to-app tool that takes in a UI mockup of a site and gives you React code.
- Extracting Structured Data from Images - Extract information from images as JSON
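Vision models take images through the same chat interface, with the user message carrying a list of content parts instead of a plain string. The content-parts shape below (`type: "text"` and `type: "image_url"`) is assumed from the OpenAI-compatible format; verify against the API reference.

```python
# Sketch of a multimodal chat request body; the content-parts shape
# is an assumption based on the OpenAI-compatible format.

def build_vision_request(model, prompt, image_url):
    """Chat payload with a mixed text + image user message."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

payload = build_vision_request(
    "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "Describe this receipt as markdown.",
    "https://example.com/receipt.png",  # hypothetical image URL
)
```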
Audio models
Use our Audio endpoint for text-to-speech models. For speech-to-text models, see Transcription and Translations.
Organization | Modality | Model Name | Model String for API |
---|---|---|---|
Cartesia | Text-to-Speech | Cartesia Sonic 2 | cartesia/sonic-2 |
Cartesia | Text-to-Speech | Cartesia Sonic | cartesia/sonic |
OpenAI | Speech-to-Text | Whisper Large v3 | openai/whisper-large-v3 |
- PDF to Podcast Notebook - Generate a NotebookLM style podcast given a PDF
- Audio Podcast Agent Workflow - Agent workflow to generate audio files given input content
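A text-to-speech request pairs a model string from the table with the text to synthesize. The field names below (`model`, `input`, `voice`) follow the common audio-speech convention and are assumptions; consult the Audio endpoint reference for the exact schema and available voices.

```python
# Sketch of a text-to-speech request body; field names are assumptions
# based on the common /audio/speech convention, not this page.

def build_speech_request(model, text, voice):
    """Assemble the JSON body for a text-to-speech style request."""
    return {"model": model, "input": text, "voice": voice}

payload = build_speech_request(
    "cartesia/sonic-2",
    "Hello from the docs.",
    "default",  # hypothetical voice name
)
```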
Code models
Use our Completions endpoint for Code Models.
Organization | Model Name | Model String for API | Context length |
---|---|---|---|
Qwen | Qwen 2.5 Coder 32B Instruct | Qwen/Qwen2.5-Coder-32B-Instruct | 32768 |
- LlamaCoder - An open source app to generate small apps with one prompt. Powered by Llama 3 405B.
- Code Generation Agent - An agent workflow to generate and iteratively improve code.
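Unlike chat requests, a completions request takes a raw text prompt rather than a message list. A sketch of the body, with a stop sequence to cut generation off cleanly (field names assume the common completions convention):

```python
# Sketch of a completions-style request body; field names are
# assumptions based on the common convention.

def build_completion_request(model, prompt, max_tokens=256, stop=None):
    """Assemble the JSON body for a raw-prompt completion request."""
    body = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    if stop:
        # e.g. stop at a blank line so generation ends after one function
        body["stop"] = stop
    return body

payload = build_completion_request(
    "Qwen/Qwen2.5-Coder-32B-Instruct",
    "# Python function that reverses a string\ndef reverse(s):",
    stop=["\n\n"],
)
```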
Embedding models
Model Name | Model String for API | Model Size | Embedding Dimension | Context Window |
---|---|---|---|---|
M2-BERT-80M-32K-Retrieval | togethercomputer/m2-bert-80M-32k-retrieval | 80M | 768 | 32768 |
BGE-Large-EN-v1.5 | BAAI/bge-large-en-v1.5 | 326M | 1024 | 512 |
BGE-Base-EN-v1.5 | BAAI/bge-base-en-v1.5 | 102M | 768 | 512 |
GTE-Modernbert-base | Alibaba-NLP/gte-modernbert-base | 149M | 768 | 8192 |
Multilingual-e5-large-instruct | intfloat/multilingual-e5-large-instruct | 560M | 1024 | 514 |
- Contextual RAG - An open source implementation of contextual RAG by Anthropic
- Code Generation Agent - An agent workflow to generate and iteratively improve code
- Multimodal Search and Image Generation - Search for images and generate more similar ones
- Visualizing Embeddings - Visualizing and clustering vector embeddings
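An embeddings request is just a model string plus one or more input texts. Note the short context windows in the table above (512 tokens for the BGE models): long documents need chunking or truncation first. The sketch below uses a crude character budget as a stand-in for real token counting, and assumes the common embeddings request shape:

```python
# Sketch of an embeddings-style request body. Character-based truncation
# is a crude stand-in for token counting against the context window;
# the field names are assumptions based on the common convention.

def build_embedding_request(model, texts, max_chars=2000):
    """Truncate each input to a character budget, then assemble the body."""
    return {"model": model, "input": [t[:max_chars] for t in texts]}

payload = build_embedding_request(
    "BAAI/bge-large-en-v1.5",
    ["first passage", "second passage " * 500],
    max_chars=100,
)
```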
Rerank models
Our Rerank API has built-in support for the following models, which we host via our serverless endpoints.
Organization | Model Name | Model Size | Model String for API | Max Doc Size (tokens) | Max Docs |
---|---|---|---|---|---|
Salesforce | LlamaRank | 8B | Salesforce/Llama-Rank-v1 | 8192 | 1024 |
MixedBread | Rerank Large | 1.6B | mixedbread-ai/Mxbai-Rerank-Large-V2 | 32768 | - |
- Search and Reranking - Simple semantic search pipeline improved using a reranker
- Implementing Hybrid Search Notebook - Implementing semantic + lexical search along with reranking
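A rerank call pairs one query with a batch of candidate documents and asks for the top matches back. The sketch below enforces the Max Docs limit from the table; the `query`/`documents`/`top_n` field names are assumptions based on common rerank APIs.

```python
# Sketch of a rerank-style request body; field names are assumptions
# based on common rerank API conventions, not this page.

def build_rerank_request(model, query, documents, top_n=3, max_docs=1024):
    """Assemble a rerank request, enforcing the Max Docs limit above."""
    if len(documents) > max_docs:
        raise ValueError(f"too many documents: {len(documents)} > {max_docs}")
    return {
        "model": model,
        "query": query,
        "documents": documents,
        "top_n": min(top_n, len(documents)),
    }

payload = build_rerank_request(
    "Salesforce/Llama-Rank-v1",
    "refund policy for damaged items",
    ["doc about shipping", "doc about refunds", "doc about sizing"],
)
```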
Language models
Use our Completions endpoint for Language Models.
Organization | Model Name | Model String for API | Context length |
---|---|---|---|
Meta | LLaMA-2 (70B) | meta-llama/Llama-2-70b-hf | 4096 |
mistralai | Mixtral-8x7B (46.7B) | mistralai/Mixtral-8x7B-v0.1 | 32768 |
Moderation models
Use our Completions endpoint to run a moderation model as a standalone classifier, or use it alongside any of the other models above as a filter to safeguard responses from 100+ models by specifying the parameter `"safety_model": "MODEL_API_STRING"`.
Organization | Model Name | Model String for API | Context length |
---|---|---|---|
Meta | Llama Guard (8B) | meta-llama/Meta-Llama-Guard-3-8B | 8192 |
Meta | Llama Guard 4 (12B) | meta-llama/Llama-Guard-4-12B | 1048576 |
Virtue AI | Virtue Guard | VirtueAI/VirtueGuard-Text-Lite | 32768 |
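The `safety_model` parameter described above rides along on an ordinary chat request. A sketch of the combined body, assuming the OpenAI-compatible chat shape for the rest of the fields:

```python
# Sketch of a chat request body with the "safety_model" filter parameter
# attached, as described above; the surrounding chat shape is assumed
# from the OpenAI-compatible convention.

def build_guarded_chat_request(model, user_message, safety_model):
    """Chat payload whose responses are filtered by a moderation model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "safety_model": safety_model,
    }

payload = build_guarded_chat_request(
    "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "Tell me about your refund policy.",
    "meta-llama/Meta-Llama-Guard-3-8B",
)
```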