Get started
Quickstart
Set up an API key and make your first call in Python, TypeScript, or cURL.
Recommended models
Our picks for common inference use cases.
Pricing
How Together AI bills for inference.
Shared inference API
Serverless models and dedicated endpoints use the same inference APIs for generating and retrieving model outputs. Apps work on either deployment mode without code changes; just swap themodel parameter:
Integrations
OpenAI compatibility
Drop-in replacement for OpenAI clients.
SDK integrations
Together SDKs and framework wiring.
Batch processing
Batch processing
If your workload doesn’t need a real-time response, submit it as a batch job for up to 50% off serverless rates.
Model capabilities
Chat & text
Chat completions, streaming, parameters.
Function calling
Tool use and agentic loops.
Vision
Pass images alongside text.
Image generation
FLUX, Kontext, and Google models.
Video generation
Text-to-video and image-to-video.
Speech-to-text
Batch and streaming transcription.
Text-to-speech
HTTP and WebSocket audio output.
Embeddings & rerank
Vectors, rerankers, and RAG.