Together hosts 100+ open-source models across text, image, video, and audio. Most of the models below are available for instant serverless inference or as reserved-hardware deployments on dedicated endpoints; both options use the same inference API.
Documentation Index
Fetch the complete documentation index at: https://docs.together.ai/llms.txt
Use this file to discover all available pages before exploring further.
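The index can be fetched and parsed with the standard library alone. A minimal sketch, assuming the llms.txt follows the usual convention of one markdown-style link per documented page:

```python
import re
import urllib.request

INDEX_URL = "https://docs.together.ai/llms.txt"

def extract_links(text: str) -> list[str]:
    """Pull the (url) targets out of markdown-style links in an llms.txt index."""
    return re.findall(r"\((https?://[^)\s]+)\)", text)

def fetch_index(url: str = INDEX_URL) -> list[str]:
    """Download the index and return the page URLs it lists."""
    with urllib.request.urlopen(url) as resp:
        return extract_links(resp.read().decode("utf-8"))
```

`fetch_index()` gives you the list of page URLs to explore before drilling into any one topic.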
Chat & text
| Use case | Recommended model | Model string | Alternatives | Learn more |
|---|---|---|---|---|
| Chat | Kimi K2.5 (instant mode) | moonshotai/Kimi-K2.5 | openai/gpt-oss-120b | Chat completions |
| Reasoning | Kimi K2.5 (reasoning mode) | moonshotai/Kimi-K2.5 | deepseek-ai/DeepSeek-R1, Qwen/Qwen3-235B-A22B-Instruct-2507-tput | Reasoning |
| Coding agents | GLM-5.1 | zai-org/GLM-5.1 | Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 | Build coding agents |
| Small and fast | Gemma 4 31B IT | google/gemma-4-31B-it | openai/gpt-oss-20b, Qwen/Qwen3.5-9B | - |
| Mid-size general purpose | GPT-OSS 120B | openai/gpt-oss-120b | MiniMaxAI/MiniMax-M2.7, meta-llama/Llama-3.3-70B-Instruct-Turbo | - |
| Function calling | GLM-5.1 | zai-org/GLM-5.1 | moonshotai/Kimi-K2.5 | Function calling |
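All of the chat and text models above are served through Together's OpenAI-compatible chat completions endpoint, so switching models is just a matter of changing the model string. A minimal sketch, assuming an API key in the `TOGETHER_API_KEY` environment variable:

```python
import json
import os
import urllib.request

CHAT_URL = "https://api.together.xyz/v1/chat/completions"

def build_chat_request(prompt: str, model: str = "moonshotai/Kimi-K2.5") -> dict:
    """OpenAI-style chat payload; swap `model` for any string in the table."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def send(payload: dict, url: str = CHAT_URL) -> dict:
    """POST the payload with a bearer token read from TOGETHER_API_KEY."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

For example, `send(build_chat_request("Hello", model="openai/gpt-oss-120b"))` targets the mid-size alternative instead of the default.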
Vision
| Use case | Recommended model | Model string | Alternatives | Learn more |
|---|---|---|---|---|
| Vision | Kimi K2.5 | moonshotai/Kimi-K2.5 | google/gemma-4-31B-it, Qwen/Qwen3.5-397B-A17B, Qwen/Qwen3.5-9B | Vision, OCR quickstart |
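Vision requests use the same chat completions endpoint with OpenAI-style multi-part message content, pairing a text part with an `image_url` part. A sketch of the request body (the content-part shape follows the OpenAI convention; check the Vision docs for model-specific limits):

```python
def build_vision_request(question: str, image_url: str,
                         model: str = "moonshotai/Kimi-K2.5") -> dict:
    """Chat payload with a text part and an image_url part in one user message."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }
```

The same payload shape covers OCR-style prompts ("transcribe the text in this image") against the alternatives in the table.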
Image generation
| Use case | Recommended model | Model string | Alternatives | Learn more |
|---|---|---|---|---|
| Text-to-image | Flash Image 2.5 | google/flash-image-2.5 | black-forest-labs/FLUX.2-pro, ByteDance-Seed/Seedream-4.0 | Text-to-image |
| Image-to-image | Flash Image 2.5 | google/flash-image-2.5 | black-forest-labs/FLUX.1-kontext-max, google/gemini-3-pro-image | Image-to-image |
Video generation
| Use case | Recommended model | Model string | Alternatives | Learn more |
|---|---|---|---|---|
| Text-to-video | Sora 2 Pro | openai/sora-2-pro | google/veo-3.0, ByteDance/Seedance-1.0-pro | Video generation |
| Image-to-video | Veo 3.0 | google/veo-3.0 | ByteDance/Seedance-1.0-pro, kwaivgI/kling-2.1-master | Video generation |
Audio
| Use case | Recommended model | Model string | Alternatives | Learn more |
|---|---|---|---|---|
| Text-to-speech | Cartesia Sonic 3 | cartesia/sonic-3 | canopylabs/orpheus-3b-0.1-ft, hexgrad/Kokoro-82M | Text-to-speech |
| Speech-to-text | Whisper Large v3 | openai/whisper-large-v3 | nvidia/parakeet-tdt-0.6b-v3, deepgram/nova-3-en, mistralai/Voxtral-Mini-3B-2507 | Speech-to-text |
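Text-to-speech pairs a model string with the input text. This sketch assumes an OpenAI-style `/v1/audio/speech` payload; the `voice` value is a placeholder rather than a documented voice ID, so see the Text-to-speech docs for real options:

```python
def build_speech_request(text: str, model: str = "cartesia/sonic-3",
                         voice: str = "default") -> dict:
    """Assumed OpenAI-compatible speech payload; `voice` is a placeholder."""
    return {"model": model, "input": text, "voice": voice}
```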
Embeddings, rerank, and moderation
| Use case | Recommended model | Model string | Notes | Learn more |
|---|---|---|---|---|
| Embeddings | Multilingual E5 Large | intfloat/multilingual-e5-large-instruct | - | Embeddings |
| Rerank | MixedBread Rerank Large | mixedbread-ai/Mxbai-Rerank-Large-V2 | Only on dedicated endpoints | Rerank, Improve search with rerankers |
| Moderation | Llama Guard 4 12B | meta-llama/Llama-Guard-4-12B | - | - |
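Embeddings use the OpenAI-compatible `input` field, while rerank takes a query plus candidate documents. A sketch of both request bodies (the `/v1/rerank` field names are assumptions; see the Rerank docs, and note the table's caveat that this model requires a dedicated endpoint):

```python
def build_embedding_request(
    texts: list[str],
    model: str = "intfloat/multilingual-e5-large-instruct",
) -> dict:
    """Payload for POST /v1/embeddings (OpenAI-compatible `input` field)."""
    return {"model": model, "input": texts}

def build_rerank_request(
    query: str,
    documents: list[str],
    model: str = "mixedbread-ai/Mxbai-Rerank-Large-V2",
) -> dict:
    """Assumed payload for POST /v1/rerank: score `documents` against `query`."""
    return {"model": model, "query": query, "documents": documents}
```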
Related resources
Serverless models: Full catalog with context windows, pricing, and capabilities.
Dedicated endpoint models: Models available on reserved hardware.
WhichLLM: Categorical benchmarks to compare models across use cases.
Pricing: Per-token and per-output pricing for all models.