Chat models
If you’re not sure which chat model to use, check out our recommended models doc to see which models fit which use cases.
Cached input token pricing is now available for MiniMax M2.5. Cached input tokens are billed at $0.06 per 1M tokens, an 80% discount from the standard input price, and the discount is applied automatically to cached tokens.
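As a sketch of how the discount plays out, the blended cost of a request can be computed from the per-1M-token prices in the table below (the token counts here are made up for illustration):

```python
def request_cost(uncached_in, cached_in, out,
                 in_price=0.30, cached_price=0.06, out_price=1.20):
    """Blended dollar cost for MiniMax M2.5, given per-1M-token prices."""
    return (uncached_in * in_price
            + cached_in * cached_price
            + out * out_price) / 1_000_000

# 200k fresh input tokens, 800k cached input tokens, 50k output tokens:
cost = request_cost(200_000, 800_000, 50_000)
```

With an 80% cache hit rate on input, the input portion of the bill drops accordingly, which is where the savings come from on long, repeated prompts.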
| Organization | Model Name | API Model String | Context length | Input pricing (per 1M tokens) | Cached input pricing (per 1M tokens) | Output pricing (per 1M tokens) | Quantization | Function Calling | Structured Outputs |
|---|---|---|---|---|---|---|---|---|---|
| Minimax | Minimax M2.5 | MiniMaxAI/MiniMax-M2.5 | 228700 | $0.30 | $0.06 | $1.20 | FP4 | Yes | Yes |
| Qwen | Qwen3.5 397B A17B | Qwen/Qwen3.5-397B-A17B | 262144 | $0.60 | - | $3.60 | BF16 | Yes | Yes |
| Qwen | Qwen3.5 9B | Qwen/Qwen3.5-9B | 262144 | $0.10 | - | $0.15 | FP8 | Yes | Yes |
| Moonshot | Kimi K2.5 | moonshotai/Kimi-K2.5 | 262144 | $0.50 | - | $2.80 | INT4 | Yes | Yes |
| Z.ai | GLM-5 | zai-org/GLM-5 | 202752 | $1.00 | - | $3.20 | FP4 | Yes | Yes |
| OpenAI | GPT-OSS 120B | openai/gpt-oss-120b | 128000 | $0.15 | - | $0.60 | MXFP4 | Yes | Yes |
| OpenAI | GPT-OSS 20B | openai/gpt-oss-20b | 128000 | $0.05 | - | $0.20 | MXFP4 | Yes | Yes |
| DeepSeek | DeepSeek-V3.1 | deepseek-ai/DeepSeek-V3.1 | 128000 | $0.60 | - | $1.70 | FP8 | Yes | Yes |
| Z.ai | GLM 4.7 | zai-org/GLM-4.7 | 202752 | $0.45 | - | $2.00 | FP8 | Yes | Yes |
| Z.ai | GLM 4.5 Air | zai-org/GLM-4.5-Air-FP8 | 131072 | $0.20 | - | $1.10 | FP8 | Yes | Yes |
| Qwen | Qwen3-Coder-Next | Qwen/Qwen3-Coder-Next-FP8 | 262144 | $0.50 | - | $1.20 | FP8 | Yes | Yes |
| Qwen | Qwen3-Next-80B-A3B-Instruct | Qwen/Qwen3-Next-80B-A3B-Instruct | 262144 | $0.15 | - | $1.50 | BF16 | Yes | Yes |
| Qwen | Qwen3-Coder 480B-A35B Instruct | Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 | 256000 | $2.00 | - | $2.00 | FP8 | Yes | Yes |
| Qwen | Qwen3 235B-A22B Instruct 2507 | Qwen/Qwen3-235B-A22B-Instruct-2507-tput | 262144 | $0.20 | - | $0.60 | FP8 | Yes | Yes |
| DeepSeek | DeepSeek-R1-0528 | deepseek-ai/DeepSeek-R1 | 163839 | $3.00 | - | $7.00 | FP8 | Yes | Yes |
| Meta | Llama 4 Maverick (17Bx128E) | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 1048576 | $0.27 | - | $0.85 | FP8 | Yes | Yes |
| Meta | Llama 3.3 70B Instruct Turbo | meta-llama/Llama-3.3-70B-Instruct-Turbo | 131072 | $0.88 | - | $0.88 | FP8 | Yes | Yes |
| Deep Cogito | Cogito v2.1 671B | deepcogito/cogito-v2-1-671b | 32768 | $1.25 | - | $1.25 | FP8 | - | Yes |
| Essential AI | Rnj-1 Instruct | essentialai/rnj-1-instruct | 32768 | $0.15 | - | $0.15 | BF16 | Yes | Yes |
| Mistral AI | Mistral Small 3 Instruct (24B) | mistralai/Mistral-Small-24B-Instruct-2501 | 32768 | $0.10 | - | $0.30 | FP16 | Yes | Yes |
| Qwen | Qwen 2.5 7B Instruct Turbo | Qwen/Qwen2.5-7B-Instruct-Turbo | 32768 | $0.30 | - | $0.30 | FP8 | Yes | Yes |
| Google | Gemma 3N E4B Instruct | google/gemma-3n-E4B-it | 32768 | $0.02 | - | $0.04 | FP8 | - | Yes |
| Togethercomputer | LFM2-24B-A2B | LiquidAI/LFM2-24B-A2B | 32768 | $0.03 | - | $0.12 | - | - | - |
| Mistral AI | Mixtral-8x7B Instruct v0.1 | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32768 | $0.60 | - | $0.60 | - | - | - |
| Servicenow AI | Apriel 1.5 15B Thinker | ServiceNow-AI/Apriel-1.5-15b-Thinker | 131072 | $0.00 | - | $0.00 | - | - | - |
| Servicenow AI | Apriel 1.6 15B Thinker | ServiceNow-AI/Apriel-1.6-15b-Thinker | 131072 | $0.00 | - | $0.00 | - | - | - |
| Meta | Meta Llama 3 8B Instruct Lite | meta-llama/Meta-Llama-3-8B-Instruct-Lite | 8192 | $0.10 | - | $0.10 | - | - | - |
| Qwen | Qwen3 235B A22B Thinking 2507 FP8 | Qwen/Qwen3-235B-A22B-Thinking-2507 | 262144 | $0.65 | - | $3.00 | - | - | - |
*Deprecated model; see Deprecations for more details.
Chat Model Examples
Image models
Use our Images endpoint for Image Models.
| Organization | Model Name | Model String for API | Price per MP | Default steps |
|---|---|---|---|---|
| Google | Imagen 4.0 Preview | google/imagen-4.0-preview | $0.04 | - |
| Google | Imagen 4.0 Fast | google/imagen-4.0-fast | $0.02 | - |
| Google | Imagen 4.0 Ultra | google/imagen-4.0-ultra | $0.06 | - |
| Google | Flash Image 2.5 (Nano Banana) | google/flash-image-2.5 | $0.039 | - |
| Google | Gemini 3 Pro Image (Nano Banana Pro) | google/gemini-3-pro-image | - | - |
| Black Forest Labs | Flux.1 [schnell] (Turbo) | black-forest-labs/FLUX.1-schnell | $0.0027 | 4 |
| Black Forest Labs | Flux1.1 [pro] | black-forest-labs/FLUX.1.1-pro | $0.04 | - |
| Black Forest Labs | Flux.1 Kontext [pro] | black-forest-labs/FLUX.1-kontext-pro | $0.04 | 28 |
| Black Forest Labs | Flux.1 Kontext [max] | black-forest-labs/FLUX.1-kontext-max | $0.08 | 28 |
| Black Forest Labs | FLUX.1 Krea [dev] | black-forest-labs/FLUX.1-krea-dev | $0.025 | 28 |
| Black Forest Labs | FLUX.2 [pro] | black-forest-labs/FLUX.2-pro | - | - |
| Black Forest Labs | FLUX.2 [dev] | black-forest-labs/FLUX.2-dev | - | - |
| Black Forest Labs | FLUX.2 [flex] | black-forest-labs/FLUX.2-flex | - | - |
| ByteDance | Seedream 3.0 | ByteDance-Seed/Seedream-3.0 | $0.018 | - |
| ByteDance | Seedream 4.0 | ByteDance-Seed/Seedream-4.0 | $0.03 | - |
| Qwen | Qwen Image | Qwen/Qwen-Image | $0.0058 | - |
| RunDiffusion | Juggernaut Pro Flux | RunDiffusion/Juggernaut-pro-flux | $0.0049 | - |
| RunDiffusion | Juggernaut Lightning Flux | Rundiffusion/Juggernaut-Lightning-Flux | $0.0017 | - |
| HiDream | HiDream-I1-Full | HiDream-ai/HiDream-I1-Full | $0.009 | - |
| HiDream | HiDream-I1-Dev | HiDream-ai/HiDream-I1-Dev | $0.0045 | - |
| HiDream | HiDream-I1-Fast | HiDream-ai/HiDream-I1-Fast | $0.0032 | - |
| Ideogram | Ideogram 3.0 | ideogram/ideogram-3.0 | $0.06 | - |
| Lykon | Dreamshaper | Lykon/DreamShaper | $0.0006 | - |
| Stability AI | Stable Diffusion 3 | stabilityai/stable-diffusion-3-medium | $0.0019 | - |
| Stability AI | SD XL | stabilityai/stable-diffusion-xl-base-1.0 | $0.0019 | - |
| Black Forest Labs | FLUX.1 [pro] | black-forest-labs/FLUX.1-pro | - | - |
| Black Forest Labs | FLUX.2 [max] | black-forest-labs/FLUX.2-max | - | - |
| Google | Gemini 3.1 Flash Image (Nano Banana 2) | google/flash-image-3.1 | - | - |
| OpenAI | GPT Image 1.5 | openai/gpt-image-1.5 | - | - |
| Qwen | Qwen Image 2.0 | Qwen/Qwen-Image-2.0 | - | - |
| Qwen | Qwen Image 2.0 Pro | Qwen/Qwen-Image-2.0-Pro | - | - |
| Wan-AI | Wan 2.6 Image | Wan-AI/Wan2.6-image | - | - |
Note: Image models can only be used with credits. Users are unable to call Image models with a zero or negative balance.
Image Model Examples
- Blinkshot.io - A realtime AI image playground built with Flux Schnell
- Logo Creator - A logo generator that creates professional logos in seconds using Flux Pro 1.1
- PicMenu - A menu visualizer that takes a restaurant menu and generates nice images for each dish.
- Flux LoRA Inference Notebook - Using LoRA fine-tuned image generations models
How FLUX pricing works
For FLUX models (except the pro variants), pricing is based on the size of the generated image (in megapixels) and the number of steps used, when that number exceeds the default steps.
- Default pricing: The listed per megapixel prices are for the default number of steps.
- Using more or fewer steps: Step count affects cost only above the default. If you use more steps, the cost increases proportionally per the formula below; if you use fewer steps, the cost does not decrease and is billed at the default rate.
Here’s a formula to calculate cost:
Cost = MP × Price per MP × (Steps ÷ Default Steps)
Where:
- MP = (Width × Height ÷ 1,000,000)
- Price per MP = Cost for generating one megapixel at the default steps
- Steps = The number of steps used for the image generation; this is only factored in when above the default steps.
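The formula and rules above can be sketched as a small helper (the function name and the 1024×1024 FLUX.1 [schnell] example values are illustrative; the $0.0027/MP rate and 4 default steps come from the table above):

```python
def flux_cost(width, height, steps, price_per_mp, default_steps):
    """Dollar cost: extra steps scale the price; fewer steps do not discount it."""
    mp = width * height / 1_000_000
    step_factor = max(steps / default_steps, 1.0)  # never below 1.0
    return mp * price_per_mp * step_factor

# FLUX.1 [schnell]: $0.0027/MP, 4 default steps, at 1024x1024
base = flux_cost(1024, 1024, 4, 0.0027, 4)    # default steps -> base rate
double = flux_cost(1024, 1024, 8, 0.0027, 4)  # 2x the steps -> 2x the cost
fewer = flux_cost(1024, 1024, 2, 0.0027, 4)   # fewer steps -> still base rate
```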
How pricing works for Gemini 3 Pro Image
Gemini 3 Pro Image is priced per image, based on output resolution.
- 1K and 2K: $0.134/image
- 4K: $0.24/image
Supported dimensions:
1K: 1024×1024 (1:1), 1264×848 (3:2), 848×1264 (2:3), 1200×896 (4:3), 896×1200 (3:4), 928×1152 (4:5), 1152×928 (5:4), 768×1376 (9:16), 1376×768 (16:9), 1548×672 or 1584×672 (21:9).
2K: 2048×2048 (1:1), 2528×1696 (3:2), 1696×2528 (2:3), 2400×1792 (4:3), 1792×2400 (3:4), 1856×2304 (4:5), 2304×1856 (5:4), 1536×2752 (9:16), 2752×1536 (16:9), 3168×1344 (21:9).
4K: 4096×4096 (1:1), 5096×3392 or 5056×3392 (3:2), 3392×5096 or 3392×5056 (2:3), 4800×3584 (4:3), 3584×4800 (3:4), 3712×4608 (4:5), 4608×3712 (5:4), 3072×5504 (9:16), 5504×3072 (16:9), 6336×2688 (21:9).
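A minimal per-image price lookup for the tiers above (the function name and dictionary are illustrative; only the 1K/2K and 4K rates come from the pricing list):

```python
# Per-image prices by resolution tier for Gemini 3 Pro Image
GEMINI_3_PRO_IMAGE_PRICE = {"1K": 0.134, "2K": 0.134, "4K": 0.24}

def gemini_image_cost(tier: str, n_images: int = 1) -> float:
    """Total dollar cost for n images at the given resolution tier."""
    return GEMINI_3_PRO_IMAGE_PRICE[tier] * n_images
```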
Vision models
If you’re not sure which vision model to use, we currently recommend Llama 4 Maverick (meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8) to get started. For model-specific rate limits, navigate here.
| Organization | Model Name | API Model String | Context length | Input pricing (per 1M tokens) | Output pricing (per 1M tokens) |
|---|---|---|---|---|---|
| Meta | Llama 4 Maverick (17Bx128E) | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 1048576 | $0.27 | $0.85 |
| Qwen | Qwen3.5 397B A17B | Qwen/Qwen3.5-397B-A17B | 262144 | $0.60 | $3.60 |
| Qwen | Qwen3-VL-8B-Instruct | Qwen/Qwen3-VL-8B-Instruct | 262144 | $0.18 | $0.68 |
| Moonshot | Kimi K2.5 | moonshotai/Kimi-K2.5 | 262144 | $0.50 | $2.80 |
Vision Model Examples
Video models
| Organization | Model Name | Model String for API | Price per video | Resolution / Duration |
|---|---|---|---|---|
| MiniMax | MiniMax 01 Director | minimax/video-01-director | $0.28 | 720p / 5s |
| MiniMax | MiniMax Hailuo 02 | minimax/hailuo-02 | $0.49 | 768p / 10s |
| Google | Veo 2.0 | google/veo-2.0 | $2.50 | 720p / 5s |
| Google | Veo 3.0 | google/veo-3.0 | $1.60 | 720p / 8s |
| Google | Veo 3.0 + Audio | google/veo-3.0-audio | $3.20 | 720p / 8s |
| Google | Veo 3.0 Fast | google/veo-3.0-fast | $0.80 | 1080p / 8s |
| Google | Veo 3.0 Fast + Audio | google/veo-3.0-fast-audio | $1.20 | 1080p / 8s |
| ByteDance | Seedance 1.0 Lite | ByteDance/Seedance-1.0-lite | $0.14 | 720p / 5s |
| ByteDance | Seedance 1.0 Pro | ByteDance/Seedance-1.0-pro | $0.57 | 1080p / 5s |
| PixVerse | PixVerse v5 | pixverse/pixverse-v5 | $0.30 | 1080p / 5s |
| Kuaishou | Kling 2.1 Master | kwaivgI/kling-2.1-master | $0.92 | 1080p / 5s |
| Kuaishou | Kling 2.1 Standard | kwaivgI/kling-2.1-standard | $0.18 | 720p / 5s |
| Kuaishou | Kling 2.1 Pro | kwaivgI/kling-2.1-pro | $0.32 | 1080p / 5s |
| Kuaishou | Kling 2.0 Master | kwaivgI/kling-2.0-master | $0.92 | 1080p / 5s |
| Kuaishou | Kling 1.6 Standard | kwaivgI/kling-1.6-standard | $0.19 | 720p / 5s |
| Kuaishou | Kling 1.6 Pro | kwaivgI/kling-1.6-pro | $0.32 | 1080p / 5s |
| Wan-AI | Wan 2.2 I2V | Wan-AI/Wan2.2-I2V-A14B | $0.31 | - |
| Wan-AI | Wan 2.2 T2V | Wan-AI/Wan2.2-T2V-A14B | $0.66 | - |
| Vidu | Vidu 2.0 | vidu/vidu-2.0 | $0.28 | 720p / 8s |
| Vidu | Vidu Q1 | vidu/vidu-q1 | $0.22 | 1080p / 5s |
| OpenAI | Sora 2 | openai/sora-2 | $0.80 | 720p / 8s |
| OpenAI | Sora 2 Pro | openai/sora-2-pro | $2.40 | 1080p / 8s |
| PixVerse | PixVerse v5.6 | pixverse/pixverse-v5.6 | - | - |
Audio models
Use our Audio endpoint for text-to-speech models. For speech-to-text models, see Transcription and Translations.
| Organization | Modality | Model Name | Model String for API | Pricing |
|---|---|---|---|---|
| Canopy Labs | Text-to-Speech | Orpheus 3B | canopylabs/orpheus-3b-0.1-ft | $15.00 per 1M chars |
| Kokoro | Text-to-Speech | Kokoro | hexgrad/Kokoro-82M | $4.00 per 1M chars |
| Cartesia | Text-to-Speech | Cartesia Sonic 3 | cartesia/sonic-3 | $65.00 per 1M chars |
| Cartesia | Text-to-Speech | Cartesia Sonic 2 | cartesia/sonic-2 | $65.00 per 1M chars |
| Cartesia | Text-to-Speech | Cartesia Sonic | cartesia/sonic | $65.00 per 1M chars |
| OpenAI | Speech-to-Text | Whisper Large v3 | openai/whisper-large-v3 | $0.0015 per audio min |
| NVIDIA | Speech-to-Text | Parakeet TDT 0.6B v3 | nvidia/parakeet-tdt-0.6b-v3 | $0.0015 per audio min |
Audio Model Examples
Embedding models
| Model Name | Model String for API | Model Size | Embedding Dimension | Context Window | Pricing (per 1M tokens) |
|---|---|---|---|---|---|
| Multilingual-e5-large-instruct | intfloat/multilingual-e5-large-instruct | 560M | 1024 | 514 | $0.02 |
Embedding Model Examples
Rerank models
Our Rerank API has built-in support for reranker models.
There are currently no rerank models offered via serverless. Rerank models like mixedbread-ai/mxbai-rerank-large-v2 are only available as Dedicated Endpoints. You can bring up a dedicated endpoint to use reranking in your applications.
Rerank Model Examples
Moderation models
Use our Completions endpoint to run a moderation model as a standalone classifier, or use one alongside any of the models above as a filter to safeguard responses from 100+ models, by specifying the parameter "safety_model": "MODEL_API_STRING"
| Organization | Model Name | Model String for API | Context length | Pricing (per 1M tokens) |
|---|---|---|---|---|
| Meta | Llama Guard 4 (12B) | meta-llama/Llama-Guard-4-12B | 1048576 | $0.20 |
| Virtue AI | Virtueguard Text Lite | Virtue-AI/VirtueGuard-Text-Lite | 32768 | $0.20 |
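A sketch of what a request body with the `safety_model` parameter might look like (the serving model chosen and the surrounding client/endpoint code are omitted and illustrative; only the parameter name and the Llama Guard model string come from this page):

```python
import json

payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # any chat model from the tables above
    "messages": [{"role": "user", "content": "Hello!"}],
    # Moderation filter applied to responses, as described above:
    "safety_model": "meta-llama/Llama-Guard-4-12B",
}
body = json.dumps(payload)  # send as the JSON body of a Completions request
```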