Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.together.ai/llms.txt

Use this file to discover all available pages before exploring further.

Serverless models are the fastest way to run inference on Together. You call any supported model through a shared per-token API, with no provisioning, no replicas to size, and no minimum cost. Pay only for the tokens you process. Serverless models are rate-limited, so they work best when you’re prototyping or evaluating a model, or when your production traffic is variable, bursty, or low enough that per-token pricing is cost-effective. If your traffic is steady, you need higher rate limits, or you want reserved hardware, use a dedicated endpoint.
Serverless and dedicated endpoints support different sets of models. See the dedicated endpoint model catalog for details.

Pricing

Serverless models bill based on usage, with no minimums and no provisioning cost. Per-model rates are in the catalog tables below, and on together.ai/pricing. If you don’t need real-time responses, some models are discounted up to 50% when run with batch workloads.

Models

If you’re not sure which model to use, see Recommended models for our picks by use case.

Chat

Image

Vision

Video

Audio

Embedding

Rerank

Moderation

Chat models

Cached input token pricing available for select models: Cached input tokens are billed at a significant discount from the standard input price and apply automatically for cached tokens. Currently supported for:
  • MiniMax M2.7: $0.06 per 1M cached input tokens (80% discount).
  • Moonshot Kimi K2.6: $0.20 per 1M cached input tokens (~83% discount).
  • DeepSeek-V4-Pro: $0.20 per 1M cached input tokens (~90% discount).
OrganizationModel nameAPI model stringContext lengthInput pricing (per 1M tokens)Cached input pricing (per 1M tokens)Output pricing (per 1M tokens)QuantizationFunction callingStructured outputs
MinimaxMinimax M2.7MiniMaxAI/MiniMax-M2.7202752$0.30$0.06$1.20FP4YesYes
QwenQwen3.5 397B A17BQwen/Qwen3.5-397B-A17B262144$0.60-$3.60FP4YesYes
QwenQwen3.6 PlusQwen/Qwen3.6-Plus1000000$0.50-$3.00---
QwenQwen3.5 9BQwen/Qwen3.5-9B262144$0.10-$0.15FP8YesYes
MoonshotKimi K2.6moonshotai/Kimi-K2.6262144$1.20$0.20$4.50FP4YesYes
MoonshotKimi K2.5moonshotai/Kimi-K2.5262144$0.50-$2.80FP4YesYes
Z.aiGLM-5.1zai-org/GLM-5.1202752$1.40-$4.40FP4YesYes
Z.aiGLM-5zai-org/GLM-5202752$1.00-$3.20FP4YesYes
OpenAIGPT-OSS 120Bopenai/gpt-oss-120b128000$0.15-$0.60MXFP4YesYes
OpenAIGPT-OSS 20Bopenai/gpt-oss-20b128000$0.05-$0.20MXFP4YesYes
DeepSeekDeepSeek-V4-Prodeepseek-ai/DeepSeek-V4-Pro512000$2.10$0.20$4.40FP4YesYes
QwenQwen3-Coder 480B-A35B InstructQwen/Qwen3-Coder-480B-A35B-Instruct-FP8256000$2.00-$2.00FP8YesYes
QwenQwen3 235B-A22B Instruct 2507Qwen/Qwen3-235B-A22B-Instruct-2507-tput262144$0.20-$0.60FP8YesYes
MetaLlama 3.3 70B Instruct Turbometa-llama/Llama-3.3-70B-Instruct-Turbo131072$0.88-$0.88FP8YesYes
Essential AIRnj-1 Instructessentialai/rnj-1-instruct32768$0.15-$0.15BF16YesYes
QwenQwen 2.5 7B Instruct TurboQwen/Qwen2.5-7B-Instruct-Turbo32768$0.30-$0.30FP8YesYes
GoogleGemma 4 31B Instructgoogle/gemma-4-31B-it262144$0.20-$0.50FP8YesYes
GoogleGemma 3N E4B Instructgoogle/gemma-3n-E4B-it32768$0.06-$0.12FP8-Yes
TogethercomputerLFM2-24B-A2BLiquidAI/LFM2-24B-A2B32768$0.03-$0.12---
MetaMeta Llama 3 8B Instruct Litemeta-llama/Meta-Llama-3-8B-Instruct-Lite8192$0.10-$0.10---
DeepcogitoCogito v2.1 671Bdeepcogito/cogito-v2-1-671b163840$1.25-$1.25---
DeepSeekDeepSeek R1-0528deepseek-ai/DeepSeek-R1163840$3.00-$7.00---
DeepSeekDeepseek V3.1deepseek-ai/DeepSeek-V3.1131072$0.60-$1.70---
MetaMeta Llama 3.3 70B Instruct Turbometa-llama/Llama-3.3-70B-Instruct-Turbo-test131072$0.88-$0.88---
QwenQwen3 Coder Next Fp8Qwen/Qwen3-Coder-Next-FP8262144$0.50-$1.20---
Chat model examples

Image models

Use our Images endpoint for image models.
OrganizationModel nameModel string for APIPrice per MPDefault steps
GoogleImagen 4.0 Previewgoogle/imagen-4.0-preview$0.04-
GoogleImagen 4.0 Fastgoogle/imagen-4.0-fast$0.02-
GoogleImagen 4.0 Ultragoogle/imagen-4.0-ultra$0.06-
GoogleFlash Image 2.5 (Nano Banana)google/flash-image-2.5$0.039-
GoogleGemini 3 Pro Image (Nano Banana Pro)google/gemini-3-pro-image$0.134-
Black Forest LabsFlux.1 [schnell] (Turbo)black-forest-labs/FLUX.1-schnell$0.00274
Black Forest LabsFlux1.1 [pro]black-forest-labs/FLUX.1.1-pro$0.04-
Black Forest LabsFlux.1 Kontext [pro]black-forest-labs/FLUX.1-kontext-pro$0.0428
Black Forest LabsFlux.1 Kontext [max]black-forest-labs/FLUX.1-kontext-max$0.0828
Black Forest LabsFLUX.1 Krea [dev]black-forest-labs/FLUX.1-krea-dev$0.02528
Black Forest LabsFLUX.2 [pro]black-forest-labs/FLUX.2-pro$0.03-
Black Forest LabsFLUX.2 [dev]black-forest-labs/FLUX.2-dev$0.0154-
Black Forest LabsFLUX.2 [flex]black-forest-labs/FLUX.2-flex$0.03-
ByteDanceSeedream 3.0ByteDance-Seed/Seedream-3.0$0.018-
ByteDanceSeedream 4.0ByteDance-Seed/Seedream-4.0$0.03-
QwenQwen ImageQwen/Qwen-Image$0.0058-
RunDiffusionJuggernaut Pro FluxRunDiffusion/Juggernaut-pro-flux$0.0049-
RunDiffusionJuggernaut Lightning FluxRundiffusion/Juggernaut-Lightning-Flux$0.0017-
HiDreamHiDream-I1-FullHiDream-ai/HiDream-I1-Full$0.009-
HiDreamHiDream-I1-DevHiDream-ai/HiDream-I1-Dev$0.0045-
HiDreamHiDream-I1-FastHiDream-ai/HiDream-I1-Fast$0.0032-
IdeogramIdeogram 3.0ideogram/ideogram-3.0$0.06-
LykonDreamshaperLykon/DreamShaper$0.0006-
Stability AIStable Diffusion 3stabilityai/stable-diffusion-3-medium$0.0019-
Stability AISD XLstabilityai/stable-diffusion-xl-base-1.0$0.0019-
Black Forest LabsFLUX.2 [max]black-forest-labs/FLUX.2-max$0.0750
GoogleGemini 3.1 Flash Image (Nano Banana 2)google/flash-image-3.1$0.05-
OpenAIGPT Image 1.5openai/gpt-image-1.5$0.034-
QwenQwen Image 2.0Qwen/Qwen-Image-2.0$0.035-
QwenQwen Image 2.0 ProQwen/Qwen-Image-2.0-Pro$0.075-
Wan-AIWan 2.6 ImageWan-AI/Wan2.6-image$0.03-
xAIGrok Imagine Image Proxai/grok-imagine-image-pro$0.07-
Calling image models require a positive credit balance.

Image model examples

  • Blinkshot.io: A realtime AI image playground built with Flux Schnell.
  • Logo creator: A logo generator that creates professional logos in seconds using Flux Pro 1.1.
  • PicMenu: A menu visualizer that takes a restaurant menu and generates nice images for each dish.
  • Flux LoRA inference notebook: Using LoRA fine-tuned image generations models.
FLUX pricing For FLUX models (excluding pro models) pricing is based on the size of generated images in megapixels and the number of steps used (if the number of steps exceed the default steps).
  • Default pricing: The listed per megapixel prices are for the default number of steps.
  • Using more or fewer steps: Costs are adjusted based on the number of steps used only if you go above the default steps. If you use more steps, the cost increases proportionally using the formula below. If you use fewer steps, the cost does not decrease and is based on the default rate.
Here’s a formula to calculate cost: Cost = MP × Price per MP × (Steps ÷ Default Steps) Where:
  • MP = (Width × Height ÷ 1,000,000).
  • Price per MP = Cost for generating one megapixel at the default steps.
  • Steps = The number of steps used for the image generation. This is only factored in if going above default steps.

Gemini 3 Pro Image pricing

Gemini 3 Pro Image offers pricing based on the resolution of the image.
  • 1080p and 2K: $0.134/image.
  • 4K resolution: $0.24/image.
Supported dimensions: 1K: 1024×1024 (1:1), 1264×848 (3:2), 848×1264 (2:3), 1200×896 (4:3), 896×1200 (3:4), 928×1152 (4:5), 1152×928 (5:4), 768×1376 (9:16), 1376×768 (16:9), 1548×672 or 1584×672 (21:9). 2K: 2048×2048 (1:1), 2528×1696 (3:2), 1696×2528 (2:3), 2400×1792 (4:3), 1792×2400 (3:4), 1856×2304 (4:5), 2304×1856 (5:4), 1536×2752 (9:16), 2752×1536 (16:9), 3168×1344 (21:9). 4K: 4096×4096 (1:1), 5096×3392 or 5056×3392 (3:2), 3392×5096 or 3392×5056 (2:3), 4800×3584 (4:3), 3584×4800 (3:4), 3712×4608 (4:5), 4608×3712 (5:4), 3072×5504 (9:16), 5504×3072 (16:9), 6336×2688 (21:9).

Vision models

If you’re not sure which vision model to use, we currently recommend Qwen3.5 397B A17B (Qwen/Qwen3.5-397B-A17B) to get started. For model specific rate limits, navigate here.
OrganizationModel nameAPI model stringContext lengthInput pricing (per 1M tokens)Output pricing (per 1M tokens)
QwenQwen3.5 397B A17BQwen/Qwen3.5-397B-A17B262144$0.60$3.60
QwenQwen3.5 9BQwen/Qwen3.5-9B262144$0.10$0.15
GoogleGemma 4 31B ITgoogle/gemma-4-31B-it262144$0.20$0.50
MoonshotKimi K2.5moonshotai/Kimi-K2.5262144$0.50$2.80

Vision model examples

Video models

OrganizationModel nameModel string for APIPrice per videoResolution / duration
MiniMaxMiniMax 01 Directorminimax/video-01-director$0.28720p / 5s
MiniMaxMiniMax Hailuo 02minimax/hailuo-02$0.49768p / 10s
GoogleVeo 2.0google/veo-2.0$2.50720p / 5s
GoogleVeo 3.0google/veo-3.0$1.60720p / 8s
GoogleVeo 3.0 + Audiogoogle/veo-3.0-audio$3.20720p / 8s
GoogleVeo 3.0 Fastgoogle/veo-3.0-fast$0.801080p / 8s
GoogleVeo 3.0 Fast + Audiogoogle/veo-3.0-fast-audio$1.201080p / 8s
ByteDanceSeedance 1.0 LiteByteDance/Seedance-1.0-lite$0.14720p / 5s
ByteDanceSeedance 1.0 ProByteDance/Seedance-1.0-pro$0.571080p / 5s
PixVersePixVerse v5pixverse/pixverse-v5$0.301080p / 5s
KuaishouKling 2.1 MasterkwaivgI/kling-2.1-master$0.921080p / 5s
KuaishouKling 2.1 StandardkwaivgI/kling-2.1-standard$0.18720p / 5s
KuaishouKling 2.1 ProkwaivgI/kling-2.1-pro$0.321080p / 5s
KuaishouKling 2.0 MasterkwaivgI/kling-2.0-master$0.921080p / 5s
KuaishouKling 1.6 StandardkwaivgI/kling-1.6-standard$0.19720p / 5s
KuaishouKling 1.6 ProkwaivgI/kling-1.6-pro$0.321080p / 5s
Wan-AIWan 2.2 I2VWan-AI/Wan2.2-I2V-A14B$0.31-
Wan-AIWan 2.2 T2VWan-AI/Wan2.2-T2V-A14B$0.66-
ViduVidu 2.0vidu/vidu-2.0$0.80720p / 8s
ViduVidu Q1vidu/vidu-q1$0.221080p / 5s
OpenAISora 2openai/sora-2$0.80720p / 8s
OpenAISora 2 Proopenai/sora-2-pro$2.401080p / 8s
PixVersePixVerse v5.6pixverse/pixverse-v5.6$0.1326-
Wan-AIWan 2.7 T2VWan-AI/wan2.7-t2v$0.10-
GoogleVeo 3.1 Debug Testgoogle/veo-3.1-test-debug$0.08-
ViduVidu Q3vidu/vidu-q3$0.0975-
ViduVidu Q3 Turbovidu/vidu-q3-turbo$0.195-
Wan-AIWan 2.7 I2VWan-AI/wan2.7-i2v$0.10-
Wan-AIWan 2.7 R2VWan-AI/wan2.7-r2v$0.10-
PixVersePixVerse v6pixverse/pixverse-v6$0.09-
AlibabaHappyHorse 1.0 T2Valibaba/happyhorse-1.0-t2v$0.24-
ByteDanceByteDance Seedance 2.0ByteDance/Seedance-2.0$0.16-

Audio models

Use our Audio endpoint for text-to-speech models. For speech-to-text models see Transcription and Translations
OrganizationModalityModel nameModel string for APIPricing
Canopy LabsText-to-SpeechOrpheus 3Bcanopylabs/orpheus-3b-0.1-ft$15.00 per 1M chars
KokoroText-to-SpeechKokorohexgrad/Kokoro-82M$4.00 per 1M chars
CartesiaText-to-SpeechCartesia Sonic 3cartesia/sonic-3$65.00 per 1M chars
CartesiaText-to-SpeechCartesia Sonic 2cartesia/sonic-2$65.00 per 1M chars
CartesiaText-to-SpeechCartesia Soniccartesia/sonic$65.00 per 1M chars
OpenAISpeech-to-TextWhisper Large v3openai/whisper-large-v3$0.0015 per audio min
NVIDIASpeech-to-TextParakeet TDT 0.6B v3nvidia/parakeet-tdt-0.6b-v3$0.0015 per audio min
Audio model examples

Embedding models

Model nameModel string for APIModel sizeEmbedding dimensionContext windowPricing (per 1M tokens)
Multilingual-e5-large-instructintfloat/multilingual-e5-large-instruct560M1024514$0.02

Embedding model examples

Rerank models

There are currently no rerank models offered via serverless. Rerank models like mixedbread-ai/mxbai-rerank-large-v2 are only available as dedicated endpoints.

Rerank model examples

Moderation models

Use our Completions endpoint to run a moderation model as a standalone classifier, or use it alongside any of the other models above as a filter to safeguard responses from 100+ models, by specifying the parameter "safety_model": "MODEL_API_STRING"
OrganizationModel nameModel string for APIContext lengthPricing (per 1M tokens)
MetaLlama Guard 4 (12B)meta-llama/Llama-Guard-4-12B1048576$0.20