Available models

Serverless models are the fastest way to run inference on Together. You call any supported model through a shared per-token API, with no provisioning, no replicas to size, and no minimum cost. Pay only for the tokens you process.

Models

If you’re not sure which model to use, see Recommended models for our picks by use case.

Chat

Image

Vision

Video

Audio

Embedding

Rerank

Moderation

Serverless and dedicated model inference support different sets of models. See the dedicated model inference catalog for details.

For rate limits and pricing, see the Serverless overview.

Chat models

Organization	Model name	API model string	Context length	Input pricing (per 1M tokens)	Cached input pricing (per 1M tokens)	Output pricing (per 1M tokens)	Quantization	Function calling	Structured outputs
Thinking Machines	Inkling	thinkingmachines/Inkling	524288	$1.00	$0.17	$4.05	NVFP4	Yes	Yes
Minimax	Minimax M3	MiniMaxAI/MiniMax-M3	524288	$0.30	$0.06	$1.20	FP4	Yes	Yes
Qwen	Qwen3.7 Max	Qwen/Qwen3.7-Max	-	$1.25	-	$3.75	-	-	-
Qwen	Qwen3.6 Plus	Qwen/Qwen3.6-Plus	1000000	$0.50	-	$3.00	-	-	-
Qwen	Qwen3.5 9B	Qwen/Qwen3.5-9B	262144	$0.17	-	$0.25	FP8	Yes	Yes
Moonshot	Kimi K3	moonshotai/Kimi-K3	1000000	$3.00	$0.30	$15.00	-	Yes	Yes
Moonshot	Kimi K2.7 Code	moonshotai/Kimi-K2.7-Code	262144	$0.95	$0.19	$4.00	FP4	Yes	Yes
Moonshot	Kimi K2.6	moonshotai/Kimi-K2.6	262144	$1.20	$0.20	$4.50	FP4	Yes	Yes
Z.ai	GLM-5.2	zai-org/GLM-5.2	262144	$1.40	$0.26	$4.40	FP4	Yes	Yes
OpenAI	GPT-OSS 120B	openai/gpt-oss-120b	128000	$0.15	-	$0.60	MXFP4	Yes	Yes
OpenAI	GPT-OSS 20B	openai/gpt-oss-20b	128000	$0.05	-	$0.20	MXFP4	Yes	Yes
DeepSeek	DeepSeek-V4-Pro	deepseek-ai/DeepSeek-V4-Pro	512000	$1.74	$0.20	$3.48	FP4	Yes	Yes
NVIDIA	Nemotron 3 Ultra 550B A55B	nvidia/nemotron-3-ultra-550b-a55b	512300	$0.60	$0.20	$3.60	NVFP4	Yes	Yes
Meta	Llama 3.3 70B Instruct Turbo	meta-llama/Llama-3.3-70B-Instruct-Turbo	131072	$1.04	-	$1.04	FP8	Yes	Yes
Qwen	Qwen 2.5 7B Instruct Turbo	Qwen/Qwen2.5-7B-Instruct-Turbo	32768	$0.30	-	$0.30	FP8	Yes	Yes
Google	Gemma 4 31B Instruct	google/gemma-4-31B-it	262144	$0.39	-	$0.97	FP8	Yes	Yes
Pearl AI	Gemma 4 31B Instruct	pearl-ai/gemma-4-31b-it	32000	$0.28	-	$0.86	INT8	-	-
Deepcogito	Cogito v2.1 671B	deepcogito/cogito-v2-1-671b	163840	$1.25	-	$1.25	-	-	-
Qwen	Qwen3.7 Plus	Qwen/Qwen3.7-Plus	1000000	$0.32	-	$1.28	-	-	-
Google	Gemma 3N E4B Instruct	google/gemma-3n-E4B-it	32768	$0.06	-	$0.12	-	-	-
LiquidAI	LFM2.5-8B-A1B	LiquidAI/LFM2.5-8B-A1B	32768	$0.03	-	$0.12	-	-	-

Chat model examples

PDF to chat app: Chat with your PDFs (blogs, textbooks, papers).
Open deep research notebook: Generate long form reports using a single prompt.
RAG with reasoning models notebook: RAG with DeepSeek-R1.
Fine-tuning chat models notebook: Tune language models for conversation.
Building agents: Agent workflows with language models.

Image models

Use our Images endpoint for image models.

Organization	Model name	Model string for API	Price per MP	Default steps
Google	Imagen 4.0 Preview	google/imagen-4.0-preview	$0.04	-
Google	Imagen 4.0 Fast	google/imagen-4.0-fast	$0.02	-
Google	Imagen 4.0 Ultra	google/imagen-4.0-ultra	$0.06	-
Google	Flash Image 2.5 (Nano Banana)	google/flash-image-2.5	$0.039	-
Google	Gemini 3 Pro Image (Nano Banana Pro)	google/gemini-3-pro-image	$0.134	-
Black Forest Labs	Flux.1 [schnell] (Turbo)	black-forest-labs/FLUX.1-schnell	$0.0027	4
Black Forest Labs	Flux1.1 [pro]	black-forest-labs/FLUX.1.1-pro	$0.04	-
Black Forest Labs	Flux.1 Kontext [pro]	black-forest-labs/FLUX.1-kontext-pro	$0.04	28
Black Forest Labs	Flux.1 Kontext [max]	black-forest-labs/FLUX.1-kontext-max	$0.08	28
Black Forest Labs	FLUX.2 [pro]	black-forest-labs/FLUX.2-pro	$0.03	-
Black Forest Labs	FLUX.2 [dev]	black-forest-labs/FLUX.2-dev	$0.0154	-
Black Forest Labs	FLUX.2 [flex]	black-forest-labs/FLUX.2-flex	$0.03	-
ByteDance	Seedream 3.0	ByteDance-Seed/Seedream-3.0	$0.018	-
ByteDance	Seedream 4.0	ByteDance-Seed/Seedream-4.0	$0.03	-
ByteDance	Seedream 5.0 Lite	ByteDance/Seedream-5.0-lite	-	-
Qwen	Qwen Image	Qwen/Qwen-Image	$0.0058	-
RunDiffusion	Juggernaut Pro Flux	RunDiffusion/Juggernaut-pro-flux	$0.0049	-
RunDiffusion	Juggernaut Lightning Flux	Rundiffusion/Juggernaut-Lightning-Flux	$0.0017	-
Ideogram	Ideogram 3.0	ideogram/ideogram-3.0	$0.06	-
Stability AI	SD XL	stabilityai/stable-diffusion-xl-base-1.0	$0.0019	-
Black Forest Labs	FLUX.2 [max]	black-forest-labs/FLUX.2-max	$0.07	50
Google	Gemini 3.1 Flash Image (Nano Banana 2)	google/flash-image-3.1	$0.05	-
OpenAI	GPT Image 1.5	openai/gpt-image-1.5	$0.034	-
Qwen	Qwen Image 2.0	Qwen/Qwen-Image-2.0	$0.035	-
Qwen	Qwen Image 2.0 Pro	Qwen/Qwen-Image-2.0-Pro	$0.075	-
Wan-AI	Wan 2.6 Image	Wan-AI/Wan2.6-image	$0.03	-
ideogram	Ideogram 4.0	ideogram/ideogram-4.0	-	-
OpenAI	GPT Image 2	openai/gpt-image-2	-	-
Google	Gemini 3.1 Flash-Lite Image (Nano Banana 2 Lite)	google/flash-image-3.1-lite	-	-

Calling image models requires a positive credit balance.

Image model examples

Blinkshot.io: A realtime AI image playground built with Flux Schnell.
Logo creator: A logo generator that creates professional logos in seconds using Flux Pro 1.1.
PicMenu: A menu visualizer that takes a restaurant menu and generates nice images for each dish.
Flux LoRA inference notebook: Using LoRA fine-tuned image generations models.

FLUX pricing For FLUX models (excluding pro models) pricing is based on the size of generated images in megapixels and the number of steps used (if the number of steps exceeds the default steps).

Default pricing: The listed per megapixel prices are for the default number of steps.
Using more or fewer steps: Costs are adjusted based on the number of steps used only if you go above the default steps. If you use more steps, the cost increases proportionally using the formula below. If you use fewer steps, the cost does not decrease and is based on the default rate.

Here’s a formula to calculate cost: Cost = MP × Price per MP × (Steps ÷ Default Steps) Where:

MP = (Width × Height ÷ 1,000,000).
Price per MP = Cost for generating one megapixel at the default steps.
Steps = The number of steps used for the image generation. This is only factored in if going above default steps.

Gemini 3 Pro Image pricing

Gemini 3 Pro Image offers pricing based on the resolution of the image.

1080p and 2K: $0.134/image.
4K resolution: $0.24/image.

Supported dimensions: 1K: 1024×1024 (1:1), 1264×848 (3:2), 848×1264 (2:3), 1200×896 (4:3), 896×1200 (3:4), 928×1152 (4:5), 1152×928 (5:4), 768×1376 (9:16), 1376×768 (16:9), 1548×672 or 1584×672 (21:9). 2K: 2048×2048 (1:1), 2528×1696 (3:2), 1696×2528 (2:3), 2400×1792 (4:3), 1792×2400 (3:4), 1856×2304 (4:5), 2304×1856 (5:4), 1536×2752 (9:16), 2752×1536 (16:9), 3168×1344 (21:9). 4K: 4096×4096 (1:1), 5096×3392 or 5056×3392 (3:2), 3392×5096 or 3392×5056 (2:3), 4800×3584 (4:3), 3584×4800 (3:4), 3712×4608 (4:5), 4608×3712 (5:4), 3072×5504 (9:16), 5504×3072 (16:9), 6336×2688 (21:9).

Vision models

If you’re not sure which vision model to use, we currently recommend Qwen3.5 9B (Qwen/Qwen3.5-9B) to get started. For model specific rate limits, navigate here.

Organization	Model name	API model string	Context length	Input pricing (per 1M tokens)	Output pricing (per 1M tokens)
Qwen	Qwen3.5 9B	Qwen/Qwen3.5-9B	262144	$0.17	$0.25
Google	Gemma 4 31B IT	google/gemma-4-31B-it	262144	$0.39	$0.97
Minimax	Minimax M3	MiniMaxAI/MiniMax-M3	524288	$0.30	$1.20
Moonshot	Kimi K3	moonshotai/Kimi-K3	1000000	$3.00	$15.00
Moonshot	Kimi K2.7 Code	moonshotai/Kimi-K2.7-Code	262144	$0.95	$4.00
Moonshot	Kimi K2.6	moonshotai/Kimi-K2.6	262144	$1.20	$4.50

Vision model examples

LlamaOCR: A tool that takes documents (like receipts) and outputs markdown.
Wireframe to code: A wireframe to app tool that takes in a UI mockup of a site and gives you React code.
Extracting structured data from images: Extract information from images as JSON.

Video models

Organization	Model name	Model string for API	Price per video	Resolution / duration
MiniMax	MiniMax 01 Director	minimax/video-01-director	$0.28	720p / 5s
MiniMax	MiniMax Hailuo 02	minimax/hailuo-02	$0.49	768p / 10s
Google	Veo 2.0	google/veo-2.0	$2.50	720p / 5s
Google	Veo 3.0	google/veo-3.0	$1.60	720p / 8s
Google	Veo 3.0 + Audio	google/veo-3.0-audio	$3.20	720p / 8s
Google	Veo 3.0 Fast	google/veo-3.0-fast	$0.80	1080p / 8s
Google	Veo 3.0 Fast + Audio	google/veo-3.0-fast-audio	$1.20	1080p / 8s
ByteDance	Seedance 1.0 Lite	ByteDance/Seedance-1.0-lite	$0.14	720p / 5s
ByteDance	Seedance 1.0 Pro	ByteDance/Seedance-1.0-pro	$0.57	1080p / 5s
PixVerse	PixVerse v5	pixverse/pixverse-v5	$0.30	1080p / 5s
Kuaishou	Kling 2.1 Master	kwaivgI/kling-2.1-master	$0.92	1080p / 5s
Kuaishou	Kling 2.1 Standard	kwaivgI/kling-2.1-standard	$0.18	720p / 5s
Kuaishou	Kling 2.1 Pro	kwaivgI/kling-2.1-pro	$0.32	1080p / 5s
Kuaishou	Kling 1.6 Standard	kwaivgI/kling-1.6-standard	$0.19	720p / 5s
Wan-AI	Wan 2.2 I2V	Wan-AI/Wan2.2-I2V-A14B	$0.31	-
Wan-AI	Wan 2.2 T2V	Wan-AI/Wan2.2-T2V-A14B	$0.66	-
Vidu	Vidu 2.0	vidu/vidu-2.0	$0.80	720p / 8s
Vidu	Vidu Q1	vidu/vidu-q1	$0.22	1080p / 5s
OpenAI	Sora 2	openai/sora-2	$0.80	720p / 8s
OpenAI	Sora 2 Pro	openai/sora-2-pro	$2.40	1080p / 8s
PixVerse	PixVerse v5.6	pixverse/pixverse-v5.6	$0.1326	-
Wan-AI	Wan 2.7 T2V	Wan-AI/wan2.7-t2v	$0.10	-
Google	Veo 3.1 Debug Test	google/veo-3.1-test-debug	$0.08	-
Vidu	Vidu Q3	vidu/vidu-q3	$0.0975	-
Vidu	Vidu Q3 Turbo	vidu/vidu-q3-turbo	$0.195	-
Wan-AI	Wan 2.7 I2V	Wan-AI/wan2.7-i2v	$0.10	-
Wan-AI	Wan 2.7 R2V	Wan-AI/wan2.7-r2v	$0.10	-
PixVerse	PixVerse v6	pixverse/pixverse-v6	$0.09	-
Alibaba	HappyHorse 1.0 T2V	alibaba/happyhorse-1.0-t2v	$0.24	-
ByteDance	ByteDance Seedance 2.0	ByteDance/Seedance-2.0	$0.16	-
Alibaba	HappyHorse 1.0 I2V	alibaba/happyhorse-1.0-i2v	-	-
Alibaba	HappyHorse 1.0 R2V	alibaba/happyhorse-1.0-r2v	-	-
Google	Veo 3.1	google/veo-3.1	-	-
Google	Veo 3.1 Lite	google/veo-3.1-lite	-	-
Alibaba	HappyHorse 1.1 I2V	alibaba/happyhorse-1.1-i2v	-	-
Alibaba	HappyHorse 1.1 R2V	alibaba/happyhorse-1.1-r2v	-	-
Alibaba	HappyHorse 1.1 T2V	alibaba/happyhorse-1.1-t2v	-	-
HappyHorse		HappyHorse/HappyHorse-1.0-T2V	-	-

Audio models

Use our Audio endpoint for text-to-speech models. For speech-to-text models see Transcription and Translations.

Organization	Modality	Model name	Model string for API	Pricing
Canopy Labs	Text-to-Speech	Orpheus 3B	canopylabs/orpheus-3b-0.1-ft	$15.00 per 1M chars
Kokoro	Text-to-Speech	Kokoro	hexgrad/Kokoro-82M	$4.00 per 1M chars
Cartesia	Text-to-Speech	Cartesia Sonic 3	cartesia/sonic-3	$65.00 per 1M chars
Cartesia	Text-to-Speech	Cartesia Sonic 2	cartesia/sonic-2	$65.00 per 1M chars
Cartesia	Text-to-Speech	Cartesia Sonic	cartesia/sonic	$65.00 per 1M chars
OpenAI	Speech-to-Text	Whisper Large v3	openai/whisper-large-v3	$0.0015 per audio min
NVIDIA	Speech-to-Text	Parakeet TDT 0.6B v3	nvidia/parakeet-tdt-0.6b-v3	$0.0015 per audio min
NVIDIA	Speech-to-Text	NVIDIA Nemotron 3 ASR Streaming 0.6B	nvidia/nemotron-3-asr-streaming-0.6b	$0.0015 per audio min
NVIDIA	Speech-to-Text	NVIDIA Nemotron 3.5 ASR Streaming 0.6B	nvidia/nemotron-3.5-asr-streaming-0.6b	$0.0015 per audio min

Audio model examples

PDF to podcast notebook: Generate a NotebookLM style podcast given a PDF.
Audio podcast agent workflow: Agent workflow to generate audio files given input content.

Embedding models

Model name	Model string for API	Model size	Embedding dimension	Context window	Pricing (per 1M tokens)
Multilingual-e5-large-instruct	intfloat/multilingual-e5-large-instruct	560M	1024	514	$0.02

Embedding model examples

Contextual RAG: An open source implementation of contextual RAG by Anthropic.
Code generation agent: An agent workflow to generate and iteratively improve code.
Multimodal search and image generation: Search for images and generate more similar ones.
Visualizing embeddings: Visualizing and clustering vector embeddings.

Rerank models

There are currently no rerank models offered via serverless. Rerank models like mixedbread-ai/mxbai-rerank-large-v2 are only available with dedicated model inference.

Rerank model examples

Search and reranking: Simple semantic search pipeline improved using a reranker.
Implementing hybrid search notebook: Implementing semantic + lexical search along with reranking.

Moderation models

There are currently no moderation models offered via serverless.

GET STARTED

SERVERLESS

INFERENCE APIS

DEDICATED MODEL INFERENCE

DEDICATED CONTAINER INFERENCE

GPU CLUSTERS

FINE-TUNING

CODE EXECUTION

ADMINISTRATION

Models

Chat

Image

Vision

Video

Audio

Embedding

Rerank

Moderation

Chat models

Image models

Image model examples

Gemini 3 Pro Image pricing

Vision models

Vision model examples

Video models

Audio models

Embedding models

Embedding model examples

Rerank models

Rerank model examples

Moderation models

​Models

Chat

Image

Vision

Video

Audio

Embedding

Rerank

Moderation

​Chat models

​Image models

​Image model examples

​Gemini 3 Pro Image pricing

​Vision models

​Vision model examples

​Video models

​Audio models

​Embedding models

​Embedding model examples

​Rerank models

​Rerank model examples

​Moderation models

Models

Chat models

Image models

Image model examples

Gemini 3 Pro Image pricing

Vision models

Vision model examples

Video models

Audio models

Embedding models

Embedding model examples

Rerank models

Rerank model examples

Moderation models