
Chat models

If you’re not sure which chat model to use, check out our recommended models doc for guidance on which model fits each use case.
| Organization | Model Name | API Model String | Context length | Input pricing (per 1M tokens) | Output pricing (per 1M tokens) | Quantization | Function Calling | Structured Outputs |
|---|---|---|---|---|---|---|---|---|
| Minimax | Minimax M2.5 | MiniMaxAI/MiniMax-M2.5 | 228700 | $0.30 | $1.20 | FP4 | Yes | Yes |
| Qwen | Qwen3.5 397B A17B | Qwen/Qwen3.5-397B-A17B | 262144 | $0.60 | $3.60 | BF16 | Yes | Yes |
| Moonshot | Kimi K2.5 | moonshotai/Kimi-K2.5 | 262144 | $0.50 | $2.80 | INT4 | Yes | Yes |
| Z.ai | GLM-5 | zai-org/GLM-5 | 202752 | $1.00 | $3.20 | FP4 | Yes | Yes |
| OpenAI | GPT-OSS 120B | openai/gpt-oss-120b | 128000 | $0.15 | $0.60 | MXFP4 | Yes | Yes |
| OpenAI | GPT-OSS 20B | openai/gpt-oss-20b | 128000 | $0.05 | $0.20 | MXFP4 | Yes | Yes |
| DeepSeek | DeepSeek-V3.1 | deepseek-ai/DeepSeek-V3.1 | 128000 | $0.60 | $1.70 | FP8 | Yes | Yes |
| Moonshot | Kimi K2 Instruct 0905 | moonshotai/Kimi-K2-Instruct-0905 | 262144 | $1.00 | $3.00 | FP8 | Yes | Yes |
| Moonshot | Kimi K2 Thinking | moonshotai/Kimi-K2-Thinking | 262144 | $1.20 | $4.00 | INT4 | Yes | Yes |
| Z.ai | GLM 4.7 | zai-org/GLM-4.7 | 202752 | $0.45 | $2.00 | FP8 | Yes | Yes |
| Z.ai | GLM 4.5 Air | zai-org/GLM-4.5-Air-FP8 | 131072 | $0.20 | $1.10 | FP8 | Yes | Yes |
| Qwen | Qwen3-Coder-Next | Qwen/Qwen3-Coder-Next-FP8 | 262144 | $0.50 | $1.20 | FP8 | Yes | Yes |
| Qwen | Qwen3-Next-80B-A3B-Instruct | Qwen/Qwen3-Next-80B-A3B-Instruct | 262144 | $0.15 | $1.50 | BF16 | Yes | Yes |
| Qwen | Qwen3 235B-A22B Thinking 2507 | Qwen/Qwen3-235B-A22B-Thinking-2507 | 262144 | $0.65 | $3.00 | FP8 | Yes | Yes |
| Qwen | Qwen3-Coder 480B-A35B Instruct | Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 | 256000 | $2.00 | $2.00 | FP8 | Yes | Yes |
| Qwen | Qwen3 235B-A22B Instruct 2507 | Qwen/Qwen3-235B-A22B-Instruct-2507-tput | 262144 | $0.20 | $0.60 | FP8 | Yes | Yes |
| DeepSeek | DeepSeek-R1-0528 | deepseek-ai/DeepSeek-R1 | 163839 | $3.00 | $7.00 | FP8 | Yes | Yes |
| Meta | Llama 4 Maverick (17Bx128E) | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 1048576 | $0.27 | $0.85 | FP8 | Yes | Yes |
| Meta | Llama 3.3 70B Instruct Turbo | meta-llama/Llama-3.3-70B-Instruct-Turbo | 131072 | $0.88 | $0.88 | FP8 | Yes | Yes |
| Deep Cogito | Cogito v2.1 671B | deepcogito/cogito-v2-1-671b | 32768 | $1.25 | $1.25 | FP8 | - | Yes |
| Essential AI | Rnj-1 Instruct | essentialai/rnj-1-instruct | 32768 | $0.15 | $0.15 | BF16 | Yes | Yes |
| Mistral AI | Mistral Small 3 Instruct (24B) | mistralai/Mistral-Small-24B-Instruct-2501 | 32768 | $0.10 | $0.30 | FP16 | Yes | Yes |
| Meta | Llama 3.1 8B Instruct Turbo | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 131072 | $0.18 | $0.18 | FP8 | Yes | Yes |
| Qwen | Qwen 2.5 7B Instruct Turbo | Qwen/Qwen2.5-7B-Instruct-Turbo | 32768 | $0.30 | $0.30 | FP8 | Yes | Yes |
| Arcee | Arcee AI Trinity Mini | arcee-ai/trinity-mini | 32768 | $0.045 | $0.15 | - | Yes | Yes |
| Meta | Llama 3.2 3B Instruct Turbo | meta-llama/Llama-3.2-3B-Instruct-Turbo | 131072 | $0.06 | $0.06 | FP16 | Yes | Yes |
| Meta | Llama 3 8B Instruct Lite | meta-llama/Meta-Llama-3-8B-Instruct-Lite | 8192 | $0.10 | $0.10 | INT4 | - | Yes |
| Google | Gemma Instruct (2B) | google/gemma-2b-it* | 8192 | $0.00 | $0.00 | FP16 | - | - |
| Google | Gemma 3N E4B Instruct | google/gemma-3n-E4B-it | 32768 | $0.02 | $0.04 | FP8 | - | Yes |
| Mistral AI | Mistral (7B) Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | $0.20 | $0.20 | FP16 | Yes | Yes |
*Deprecated model; see Deprecations for more details.

Chat Model Examples
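A minimal request sketch, assuming the platform exposes an OpenAI-compatible `/v1/chat/completions` endpoint at `api.together.xyz` and a `TOGETHER_API_KEY` environment variable (adjust the base URL and key name for your account):

```python
import json
import os
import urllib.request

# Assumed base URL; check your dashboard if yours differs.
CHAT_URL = "https://api.together.xyz/v1/chat/completions"


def build_chat_request(model: str, user_message: str, max_tokens: int = 256) -> dict:
    """Assemble a chat completions payload for any API model string in the table."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }


def chat(payload: dict) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (requires a valid key and credits):
# reply = chat(build_chat_request("openai/gpt-oss-20b", "Say hello in one word."))
```

Any model string from the table above can be dropped into `build_chat_request` unchanged.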

Image models

Use our Images endpoint for the image models below.
| Organization | Model Name | Model String for API | Default steps |
|---|---|---|---|
| Google | Imagen 4.0 Preview | google/imagen-4.0-preview | - |
| Google | Imagen 4.0 Fast | google/imagen-4.0-fast | - |
| Google | Imagen 4.0 Ultra | google/imagen-4.0-ultra | - |
| Google | Flash Image 2.5 (Nano Banana) | google/flash-image-2.5 | - |
| Google | Gemini 3 Pro Image (Nano Banana 2) | google/gemini-3-pro-image | - |
| Black Forest Labs | Flux.1 [schnell] (Turbo) | black-forest-labs/FLUX.1-schnell | 4 |
| Black Forest Labs | Flux1.1 [pro] | black-forest-labs/FLUX.1.1-pro | - |
| Black Forest Labs | Flux.1 Kontext [pro] | black-forest-labs/FLUX.1-kontext-pro | 28 |
| Black Forest Labs | Flux.1 Kontext [max] | black-forest-labs/FLUX.1-kontext-max | 28 |
| Black Forest Labs | FLUX.1 Krea [dev] | black-forest-labs/FLUX.1-krea-dev | 28 |
| Black Forest Labs | FLUX.2 [pro] | black-forest-labs/FLUX.2-pro | - |
| Black Forest Labs | FLUX.2 [dev] | black-forest-labs/FLUX.2-dev | - |
| Black Forest Labs | FLUX.2 [flex] | black-forest-labs/FLUX.2-flex | - |
| ByteDance | Seedream 3.0 | ByteDance-Seed/Seedream-3.0 | - |
| ByteDance | Seedream 4.0 | ByteDance-Seed/Seedream-4.0 | - |
| Qwen | Qwen Image | Qwen/Qwen-Image | - |
| RunDiffusion | Juggernaut Pro Flux | RunDiffusion/Juggernaut-pro-flux | - |
| RunDiffusion | Juggernaut Lightning Flux | Rundiffusion/Juggernaut-Lightning-Flux | - |
| HiDream | HiDream-I1-Full | HiDream-ai/HiDream-I1-Full | - |
| HiDream | HiDream-I1-Dev | HiDream-ai/HiDream-I1-Dev | - |
| HiDream | HiDream-I1-Fast | HiDream-ai/HiDream-I1-Fast | - |
| Ideogram | Ideogram 3.0 | ideogram/ideogram-3.0 | - |
| Lykon | Dreamshaper | Lykon/DreamShaper | - |
| Stability AI | Stable Diffusion 3 | stabilityai/stable-diffusion-3-medium | - |
| Stability AI | SD XL | stabilityai/stable-diffusion-xl-base-1.0 | - |
Note: Image models can only be used with credits; they cannot be called with a zero or negative balance.

Image Model Examples
  • Blinkshot.io - A realtime AI image playground built with Flux Schnell
  • Logo Creator - A logo generator that creates professional logos in seconds using Flux Pro 1.1
  • PicMenu - A menu visualizer that takes a restaurant menu and generates appealing images for each dish
  • Flux LoRA Inference Notebook - Using LoRA fine-tuned image generation models
How FLUX pricing works

For FLUX models (except for pro), pricing is based on the size of the generated images (in megapixels) and the number of steps used, if that number exceeds the default steps.
  • Default pricing: The listed per megapixel prices are for the default number of steps.
  • Using more or fewer steps: If you use more steps than the default, the cost increases proportionally, using the formula below. If you use fewer steps, the cost does not decrease; you are billed at the default rate.
Here’s a formula to calculate cost:

Cost = MP × Price per MP × (Steps ÷ Default Steps)

Where:
  • MP = Width × Height ÷ 1,000,000
  • Price per MP = the cost of generating one megapixel at the default steps
  • Steps = the number of steps used; factored in only when it exceeds the default
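The pricing rule above can be written as a small helper (illustrative only; the `price_per_mp` value below is a placeholder, so substitute your model's listed per-megapixel price):

```python
def flux_cost(width: int, height: int, price_per_mp: float,
              steps: int, default_steps: int) -> float:
    """Cost = MP x price-per-MP x (steps / default steps).

    Extra steps raise the cost proportionally; fewer steps still
    bill at the default rate, so the step factor never drops below 1.
    """
    mp = width * height / 1_000_000
    billed_steps = max(steps, default_steps)  # fewer steps do not reduce cost
    return mp * price_per_mp * (billed_steps / default_steps)


# $0.0025/MP is a placeholder price. Doubling the steps doubles the cost:
base = flux_cost(1024, 1024, price_per_mp=0.0025, steps=28, default_steps=28)
doubled = flux_cost(1024, 1024, price_per_mp=0.0025, steps=56, default_steps=28)
```

Running at 14 steps with a default of 28 would return the same value as `base`, per the fewer-steps rule.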
How pricing works for Gemini 3 Pro Image

Gemini 3 Pro Image is priced by output resolution:
  • 1080p and 2K: $0.134/image
  • 4K: $0.24/image
Supported dimensions:
  • 1K: 1024×1024 (1:1), 1248×832 (3:2), 832×1248 (2:3), 1184×864 (4:3), 864×1184 (3:4), 896×1152 (4:5), 1152×896 (5:4), 768×1344 (9:16), 1344×768 (16:9), 1536×672 (21:9)
  • 2K: 2048×2048 (1:1), 2496×1664 (3:2), 1664×2496 (2:3), 2368×1728 (4:3), 1728×2368 (3:4), 1792×2304 (4:5), 2304×1792 (5:4), 1536×2688 (9:16), 2688×1536 (16:9), 3072×1344 (21:9)
  • 4K: 4096×4096 (1:1), 4992×3328 (3:2), 3328×4992 (2:3), 4736×3456 (4:3), 3456×4736 (3:4), 3584×4608 (4:5), 4608×3584 (5:4), 3072×5376 (9:16), 5376×3072 (16:9), 6144×2688 (21:9)

Vision models

If you’re not sure which vision model to use, we recommend starting with Llama 4 Maverick (meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8). For model-specific rate limits, navigate here.
| Organization | Model Name | API Model String | Context length |
|---|---|---|---|
| Meta | Llama 4 Maverick (17Bx128E) | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 524288 |
| Qwen | Qwen3-VL-8B-Instruct | Qwen/Qwen3-VL-8B-Instruct | 262100 |
Vision Model Examples
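Vision models use the same chat format with an image part in the user message. A sketch, assuming the OpenAI-style multimodal content schema (the image URL below is a placeholder):

```python
def build_vision_request(model: str, question: str, image_url: str) -> dict:
    """Chat payload whose user turn carries both text and an image URL."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
        "max_tokens": 256,
    }


payload = build_vision_request(
    "meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8",
    "What is in this photo?",
    "https://example.com/photo.jpg",  # placeholder URL
)
```

POST the payload to the same chat completions endpoint used for text-only models.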

Video models

| Organization | Model Name | Model String for API | Resolution / Duration |
|---|---|---|---|
| MiniMax | MiniMax 01 Director | minimax/video-01-director | 720p / 5s |
| MiniMax | MiniMax Hailuo 02 | minimax/hailuo-02 | 768p / 10s |
| Google | Veo 2.0 | google/veo-2.0 | 720p / 5s |
| Google | Veo 3.0 | google/veo-3.0 | 720p / 8s |
| Google | Veo 3.0 + Audio | google/veo-3.0-audio | 720p / 8s |
| Google | Veo 3.0 Fast | google/veo-3.0-fast | 1080p / 8s |
| Google | Veo 3.0 Fast + Audio | google/veo-3.0-fast-audio | 1080p / 8s |
| ByteDance | Seedance 1.0 Lite | ByteDance/Seedance-1.0-lite | 720p / 5s |
| ByteDance | Seedance 1.0 Pro | ByteDance/Seedance-1.0-pro | 1080p / 5s |
| PixVerse | PixVerse v5 | pixverse/pixverse-v5 | 1080p / 5s |
| Kuaishou | Kling 2.1 Master | kwaivgI/kling-2.1-master | 1080p / 5s |
| Kuaishou | Kling 2.1 Standard | kwaivgI/kling-2.1-standard | 720p / 5s |
| Kuaishou | Kling 2.1 Pro | kwaivgI/kling-2.1-pro | 1080p / 5s |
| Kuaishou | Kling 2.0 Master | kwaivgI/kling-2.0-master | 1080p / 5s |
| Kuaishou | Kling 1.6 Standard | kwaivgI/kling-1.6-standard | 720p / 5s |
| Kuaishou | Kling 1.6 Pro | kwaivgI/kling-1.6-pro | 1080p / 5s |
| Wan-AI | Wan 2.2 I2V | Wan-AI/Wan2.2-I2V-A14B | - |
| Wan-AI | Wan 2.2 T2V | Wan-AI/Wan2.2-T2V-A14B | - |
| Vidu | Vidu 2.0 | vidu/vidu-2.0 | 720p / 8s |
| Vidu | Vidu Q1 | vidu/vidu-q1 | 1080p / 5s |
| OpenAI | Sora 2 | openai/sora-2 | 720p / 8s |
| OpenAI | Sora 2 Pro | openai/sora-2-pro | 1080p / 8s |

Audio models

Use our Audio endpoint for text-to-speech models. For speech-to-text models, see Transcription and Translations.
| Organization | Modality | Model Name | Model String for API |
|---|---|---|---|
| Canopy Labs | Text-to-Speech | Orpheus 3B | canopylabs/orpheus-3b-0.1-ft |
| Kokoro | Text-to-Speech | Kokoro | hexgrad/Kokoro-82M |
| Cartesia | Text-to-Speech | Cartesia Sonic 2 | cartesia/sonic-2 |
| Cartesia | Text-to-Speech | Cartesia Sonic | cartesia/sonic |
| OpenAI | Speech-to-Text | Whisper Large v3 | openai/whisper-large-v3 |
| Mistral AI | Speech-to-Text | Voxtral Mini 3B | mistralai/Voxtral-Mini-3B-2507 |
Audio Model Examples
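A text-to-speech request sketch, assuming an OpenAI-style `/v1/audio/speech` payload shape (the `voice` value is a placeholder; voice names vary per model, so check the Audio endpoint reference):

```python
def build_tts_request(model: str, text: str, voice: str = "default") -> dict:
    """Text-to-speech payload: the model reads `input` aloud in the given voice."""
    return {
        "model": model,
        "input": text,
        "voice": voice,  # placeholder; consult the model's voice list
    }


payload = build_tts_request("hexgrad/Kokoro-82M", "Hello from the audio endpoint.")
# The response body is binary audio; write it to a file rather than decoding JSON.
```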

Embedding models

| Model Name | Model String for API | Model Size | Embedding Dimension | Context Window |
|---|---|---|---|---|
| Multilingual-e5-large-instruct | intfloat/multilingual-e5-large-instruct | 560M | 1024 | 514 |
Embedding Model Examples
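An embeddings sketch, assuming an OpenAI-compatible `/v1/embeddings` payload shape, plus a cosine-similarity helper for comparing the returned vectors:

```python
import math


def build_embedding_request(model: str, texts: list[str]) -> dict:
    """Embeddings payload; the response carries one vector per input text."""
    return {"model": model, "input": texts}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compare two embedding vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


payload = build_embedding_request(
    "intfloat/multilingual-e5-large-instruct",
    ["best pizza in town", "Mario's Pizzeria serves wood-fired pies"],
)
```

Each returned vector from this model has 1024 dimensions, per the table above.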

Rerank models

Our Rerank API has built-in support for the following models, which we host via our serverless endpoints.
| Organization | Model Name | Model Size | Model String for API | Max Doc Size (tokens) | Max Docs |
|---|---|---|---|---|---|
| MixedBread | Rerank Large | 1.6B | mixedbread-ai/Mxbai-Rerank-Large-V2 | 32768 | - |
Rerank Model Examples

Moderation models

Use our Completions endpoint to run a moderation model as a standalone classifier, or pair one with any of the 100+ models above as a filter to safeguard responses, by specifying the parameter "safety_model": "MODEL_API_STRING".
| Organization | Model Name | Model String for API | Context length |
|---|---|---|---|
| Meta | Llama Guard 4 (12B) | meta-llama/Llama-Guard-4-12B | 1048576 |
| Virtue AI | Virtue Guard | VirtueAI/VirtueGuard-Text-Lite | 32768 |
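Attaching a moderation model as a filter is a one-field change to a normal request: add the "safety_model" parameter described above. A sketch (the chat model choice here is just an example):

```python
def build_guarded_request(model: str, user_message: str, safety_model: str) -> dict:
    """Chat payload with a moderation model attached as a response filter
    via the "safety_model" parameter."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "safety_model": safety_model,
    }


payload = build_guarded_request(
    "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "Summarize this support ticket.",
    safety_model="meta-llama/Llama-Guard-4-12B",
)
```

The guarded request is sent to the same Completions endpoint as an unguarded one.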