Run AI modelsRun leading open-source AI models (chat, image, vision, etc.) with our OpenAI-compatible API.
Fine-tune modelsFine-tune models on your own data (or bring your own model) and run inference on them on Together.
Launch a GPU clusterInstantly spin up H100 and B200 clusters with attached storage for training or large batch jobs.
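Because the API is OpenAI-compatible, a standard chat-completions request works against Together's endpoint. The sketch below uses only the Python standard library; the model ID and `TOGETHER_API_KEY` environment variable are illustrative, so check the model list and your account settings before running it.

```python
# Minimal sketch: call Together's OpenAI-compatible chat-completions endpoint.
# The model name below is illustrative; see the model list for current IDs.
import json
import os
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str, model: str = "meta-llama/Llama-3.3-70B-Instruct-Turbo") -> str:
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        "https://api.together.xyz/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses put the reply text here:
    return body["choices"][0]["message"]["content"]
```

The same payload shape works with the official OpenAI SDK by pointing `base_url` at `https://api.together.xyz/v1`.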
Our models:Together hosts many popular models via serverless endpoints and dedicated endpoints. On serverless, you're charged for the tokens you use, priced by model size. On dedicated, you're charged by GPU hour.
Chat models:
View all models
Image models:
View all models
Vision models:
View all models
Audio models:
View all models
Embedding models:
Rerank models:
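Embedding models also sit behind the OpenAI-compatible `/v1/embeddings` route; the returned vectors can then be compared locally. The request-building helper and cosine-similarity function below are a minimal sketch, and the model field is a placeholder for an ID from the embedding model list.

```python
# Sketch: build an OpenAI-style embeddings request, then compare the returned
# vectors locally with cosine similarity. Model ID is a placeholder.
import math

def build_embedding_request(model: str, texts: list[str]) -> dict:
    """Payload for POST https://api.together.xyz/v1/embeddings."""
    return {"model": model, "input": texts}

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```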
Build AI apps and agents with Together:
Build an agentBuild agent workflows to solve real use cases with Together
Build a Next.js chatbotSpin up a production-ready chatbot using Together + Next.js.
Build RAG appsCombine retrieval and generation to build grounded RAG apps.
Build a real-time image appStream real-time image generations with Flux Schnell on Together.
Build a text → app workflowTurn natural language into interactive apps with Together + CodeSandbox.
Build an AI search engineShip a simplified Perplexity-style search using Together models.
Use structured outputs with LLMsGet reliable JSON by defining schemas and using structured outputs.
Working with reasoning modelsUse open reasoning models (e.g., DeepSeek-R1) for logic-heavy, multi-step tasks.
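For structured outputs, the request carries a JSON Schema in `response_format` and the reply is validated client-side. The `response_format` shape below follows the OpenAI-compatible convention but is illustrative; confirm the exact field names against the structured-outputs guide.

```python
# Sketch: ask for schema-constrained JSON, then validate the reply locally.
# The response_format shape is illustrative -- check the structured-outputs docs.
import json

USER_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

def build_structured_request(model: str, prompt: str) -> dict:
    """Chat payload that asks the model to emit JSON matching USER_SCHEMA."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "response_format": {"type": "json_object", "schema": USER_SCHEMA},
    }

def parse_user(raw: str) -> dict:
    """Parse the model's JSON reply and check the required keys are present."""
    data = json.loads(raw)
    missing = [k for k in USER_SCHEMA["required"] if k not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

Validating on the client, even with a schema-constrained request, guards against truncated or malformed replies.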
Explore our services:
Spin up a batch jobQueue async generations and fetch results later.
Run a dedicated instanceProvision single-tenant GPUs for predictable, isolated latency.
Use our evals APIAutomate scoring with LLM judges and reports.
Run code with Together Code SandboxRun Python safely alongside model calls.
Bring your own modelUpload weights and serve them via our API.
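Async services like batch jobs follow a submit-then-poll pattern: queue the work, then check status on a backoff schedule until it completes. The sketch below shows only that generic pattern; the status strings and the `fetch_status` callable are hypothetical placeholders, not the documented batch API, so see the batch-jobs guide for the real routes and fields.

```python
# Pattern sketch for an async batch job: poll for completion with capped
# exponential backoff. Status values here are hypothetical placeholders.
import time

def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 30.0, n: int = 6):
    """Yield n exponentially growing delays, capped at `cap` seconds."""
    delay = base
    for _ in range(n):
        yield min(delay, cap)
        delay *= factor

def poll(fetch_status, delays) -> str:
    """Call fetch_status between delays until a terminal state or timeout."""
    for d in delays:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        time.sleep(d)
    return "timeout"
```

Capped backoff keeps polling cheap for long-running jobs without hammering the API early on.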