Together’s API is compatible with the OpenAI REST API and SDKs across chat, completions, vision, image generation, text-to-speech, and embeddings. If you have an application that uses the OpenAI Python or TypeScript client (or cURL against api.openai.com), you can point it at models hosted on Together with just two changes: the API key and base URL.
This page is a configuration reference for the Together AI OpenAI compatibility layer. For end-to-end examples of each capability, follow the links to the dedicated capability pages.
Drop-in client setup
Set `api_key` to your Together API key (or pull it from an environment variable) and `base_url` to `https://api.together.ai/v1`:
Endpoint compatibility matrix
The following OpenAI SDK methods route to Together-native endpoints when the base URL is set to `https://api.together.ai/v1`.
| OpenAI SDK call | Together endpoint | Status | Capability page |
|---|---|---|---|
| `chat.completions.create` | POST /v1/chat/completions | Supported | Chat overview, Streaming, Parameters |
| `chat.completions.create` (vision input) | POST /v1/chat/completions | Supported | Vision |
| `chat.completions.create` (tools) | POST /v1/chat/completions | Supported | Function calling |
| `chat.completions.create` (`response_format`) | POST /v1/chat/completions | Supported | Structured outputs |
| `completions.create` | POST /v1/completions | Supported | Legacy text completions, see Parameters |
| `embeddings.create` | POST /v1/embeddings | Supported | Embeddings |
| `images.generate` | POST /v1/images/generations | Supported | Image generation |
| `audio.speech.create` | POST /v1/audio/speech | Supported | Text-to-speech |
| `audio.transcriptions.create` | POST /v1/audio/transcriptions | Supported | Speech-to-text |
| `audio.translations.create` | POST /v1/audio/translations | Supported | Speech-to-text |
| `models.list`, `models.retrieve` | GET /v1/models | Supported | Model list |
| `responses.create` (Responses API) | n/a | Not supported | Use `chat.completions.create` instead |
| `assistants.*`, `threads.*`, `runs.*` | n/a | Not supported | Build agent loops on top of chat completions and function calling |
| `fine_tuning.jobs.*` (OpenAI shape) | n/a | Not supported | Use the Together-native fine-tuning API |
| `files.*` (OpenAI shape) | n/a | Partial | Together has its own Files API for fine-tuning datasets and batch jobs |
| `batches.*` (OpenAI shape) | n/a | Not supported | Use the Together-native Batch API |
| `moderations.create` | n/a | Not supported | Use Llama Guard moderation models via chat completions |
The following capabilities are Together-only, with no OpenAI SDK method; call their endpoints directly (with requests, fetch, or the Together SDK):

- Video generation, see Video generation.
- Image edits and inpainting beyond `images.generate`, see Image generation.
- Reasoning controls and `reasoning_content`, see Reasoning.
- Logprobs surface, see Logprobs.
Drop-in compatibility
These capabilities work without code changes beyond the API key and base URL. Each row maps a Together capability to the OpenAI SDK method that drives it.

| Capability | OpenAI SDK method | Capability page |
|---|---|---|
| Chat completions (with streaming) | `chat.completions.create` | Chat overview |
| Vision (image inputs) | `chat.completions.create` with image content parts | Vision |
| Function calling | `chat.completions.create` with `tools` and `tool_choice` | Function calling |
| Structured outputs | `chat.completions.create` with `response_format` | Structured outputs |
| Embeddings | `embeddings.create` | Embeddings |
| Image generation | `images.generate` | Image generation |
| Text-to-speech | `audio.speech.create` | Text-to-speech |
| Speech-to-text and translation | `audio.transcriptions.create`, `audio.translations.create` | Speech-to-text |
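As one sketch of the streaming row above (assuming a client already pointed at `https://api.together.ai/v1`), accumulating deltas works exactly as with OpenAI:

```python
def collect_stream(stream) -> str:
    """Join streamed chat-completion deltas into the full reply text."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # role-only and final chunks carry no content
            parts.append(delta)
    return "".join(parts)


# Usage (requires a configured client and a valid Together model ID):
# stream = client.chat.completions.create(model=model_id, messages=messages, stream=True)
# print(collect_stream(stream))
```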
Known incompatibilities
Model identifiers
Together model IDs are namespaced (`openai/gpt-oss-20b`, `meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8`, `black-forest-labs/FLUX.2-dev`). OpenAI model strings like `gpt-4o` or `text-embedding-3-large` return a 404. Browse the full list at Available models.
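A quick way to discover valid IDs is the SDK's own `models.list()` call, which the matrix above maps to GET /v1/models; this helper is a sketch:

```python
def together_model_ids(client) -> list[str]:
    """Return the namespaced model IDs available to your account, sorted."""
    return sorted(model.id for model in client.models.list())


# ids = together_model_ids(client)
# print([m for m in ids if m.startswith("meta-llama/")])
```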
Endpoints not implemented
- The Responses API (`/v1/responses`) is not implemented. Use chat completions instead.
- Assistants, Threads, and Runs are not implemented. Build the agent loop yourself with function calling.
- The OpenAI-shaped Batch API and Files API are not exposed through `/v1`. Together has separate equivalents, see Batch processing and Files.
- `moderations.create` is not implemented. Use Llama Guard via chat completions, see Moderation models.
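In place of Assistants and Runs, a hand-rolled agent loop over chat completions and function calling looks roughly like this. This is a sketch, not a Together API: the tool wiring is an assumption, and it relies only on the standard OpenAI tools format (JSON-string arguments, `tool` role messages).

```python
import json


def run_agent_loop(client, model, messages, tools, tool_impls, max_turns=5):
    """Minimal agent loop: call the model, execute any requested tools,
    feed results back, and stop when the model answers in plain text."""
    for _ in range(max_turns):
        resp = client.chat.completions.create(model=model, messages=messages, tools=tools)
        msg = resp.choices[0].message
        if not msg.tool_calls:
            return msg.content  # final answer
        messages.append(msg)  # echo the assistant turn, including its tool calls
        for call in msg.tool_calls:
            fn = tool_impls[call.function.name]
            args = json.loads(call.function.arguments)
            result = fn(**args)
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": json.dumps(result),
            })
    raise RuntimeError("agent loop did not converge")
```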
Parameter quirks
- `logprobs` returns Together’s own shape, which is richer than OpenAI’s. See Logprobs.
- `seed` is best-effort. Determinism is not guaranteed across replicas, model versions, or load conditions.
- `n` (multiple completions per request) is supported on most chat models but not on every model. Loop client-side if a model rejects it.
- `logit_bias` is not supported on most models.
- `service_tier`, `store`, `metadata`, and `prediction` are accepted but ignored.
- `reasoning_effort` works on GPT-OSS models (`"low"`, `"medium"`, `"high"`). Other reasoning controls (Together’s `reasoning={"enabled": ...}` toggle, `chat_template_kwargs`) are not part of OpenAI’s API surface. See Reasoning.
- Vision models accept `image_url` with both remote URLs and base64 data URIs. The `detail` field is accepted but ignored.
Response shape differences
- `usage` includes `prompt_tokens`, `completion_tokens`, and `total_tokens`. Some Together-only fields (for example `cached_tokens`, `reasoning_tokens`) appear on models that support them and may not match OpenAI’s `prompt_tokens_details`/`completion_tokens_details` nesting exactly.
- Reasoning models return reasoning traces in a `reasoning` field on the assistant message rather than OpenAI’s `reasoning` object structure. See Reasoning.
- `id` and `system_fingerprint` are present but use Together’s formats. Don’t parse them as OpenAI IDs.
- `images.generate` returns `url` or `b64_json` per the `response_format` param, matching OpenAI. Some image models also return Together-specific metadata fields (for example `seed`).
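A defensive way to read `usage` that tolerates the Together-only fields (a sketch, using only the field names listed above):

```python
def usage_summary(response) -> dict:
    """Collect standard token counts plus any Together-only usage fields."""
    usage = response.usage
    summary = {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }
    # Present only on models that report them; absent fields are skipped.
    for extra in ("cached_tokens", "reasoning_tokens"):
        value = getattr(usage, extra, None)
        if value is not None:
            summary[extra] = value
    return summary
```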
Errors
Together returns OpenAI-shaped error objects (`{ "error": { "message", "type", "code" } }`), but `type` and `code` values are Together’s. Match on HTTP status (400, 401, 404, 429, 500, 503) for portable handling.