

Together’s API is compatible with the OpenAI REST API and SDKs across chat, completions, vision, image generation, text-to-speech, and embeddings. If you have an application that uses the OpenAI Python or TypeScript client (or cURL against api.openai.com), you can point it at models hosted on Together with just two changes: the API key and base URL. This page is a configuration reference for the Together AI OpenAI compatibility layer. For end-to-end examples of each capability, follow the links to the dedicated capability pages.

Drop-in client setup

Set api_key to your Together API key (or pull it from an environment variable) and base_url to https://api.together.ai/v1:
import os
import openai

client = openai.OpenAI(
    api_key=os.environ.get("TOGETHER_API_KEY"),
    base_url="https://api.together.ai/v1",
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)

You can find your API key on your settings page. If you don’t have an account, you can register for free.

Substitute the model field with any Together model ID. Model names follow the <provider>/<model_name> convention rather than OpenAI’s flat namespace.

Endpoint compatibility matrix

The following OpenAI SDK methods route to Together-native endpoints when the base URL is set to https://api.together.ai/v1.
OpenAI SDK call | Together endpoint | Status | Capability page
chat.completions.create | POST /v1/chat/completions | Supported | Chat overview, Streaming, Parameters
chat.completions.create (vision input) | POST /v1/chat/completions | Supported | Vision
chat.completions.create (tools) | POST /v1/chat/completions | Supported | Function calling
chat.completions.create (response_format) | POST /v1/chat/completions | Supported | Structured outputs
completions.create | POST /v1/completions | Supported | Legacy text completions, see Parameters
embeddings.create | POST /v1/embeddings | Supported | Embeddings
images.generate | POST /v1/images/generations | Supported | Image generation
audio.speech.create | POST /v1/audio/speech | Supported | Text-to-speech
audio.transcriptions.create | POST /v1/audio/transcriptions | Supported | Speech-to-text
audio.translations.create | POST /v1/audio/translations | Supported | Speech-to-text
models.list, models.retrieve | GET /v1/models | Supported | Model list
responses.create (Responses API) | n/a | Not supported | Use chat.completions.create instead
assistants.*, threads.*, runs.* | n/a | Not supported | Build agent loops on top of chat completions and function calling
fine_tuning.jobs.* (OpenAI shape) | n/a | Not supported | Use the Together-native fine-tuning API
files.* (OpenAI shape) | n/a | Partial | Together has its own Files API for fine-tuning datasets and batch jobs
batches.* (OpenAI shape) | n/a | Not supported | Use the Together-native Batch API
moderations.create | n/a | Not supported | See moderation models using Llama Guard via chat completions
Together-native endpoints that aren’t exposed by the OpenAI SDKs (for example fine-tuning, Files, Batch, and video generation) can be called with requests, fetch, or the Together SDK.

Drop-in compatibility

These capabilities work without code changes beyond the API key and base URL. Each row maps a Together capability to the OpenAI SDK method that drives it.
Capability | OpenAI SDK method | Capability page
Chat completions (with streaming) | chat.completions.create | Chat overview
Vision (image inputs) | chat.completions.create with image content parts | Vision
Function calling | chat.completions.create with tools and tool_choice | Function calling
Structured outputs | chat.completions.create with response_format | Structured outputs
Embeddings | embeddings.create | Embeddings
Image generation | images.generate | Image generation
Text-to-speech | audio.speech.create | Text-to-speech
Speech-to-text and translation | audio.transcriptions.create, audio.translations.create | Speech-to-text
Video generation is Together-native and isn’t exposed through the OpenAI SDK.

Known incompatibilities

Model identifiers

Together model IDs are namespaced (openai/gpt-oss-20b, meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8, black-forest-labs/FLUX.2-dev). OpenAI model strings like gpt-4o or text-embedding-3-large return a 404. Browse the full list at Available models.

Endpoints not implemented

  • The Responses API (/v1/responses) is not implemented. Use chat completions instead.
  • Assistants, Threads, and Runs are not implemented. Build the agent loop yourself with function calling.
  • The OpenAI-shaped Batch API and Files API are not exposed through /v1. Together has separate equivalents; see Batch processing and Files.
  • moderations.create is not implemented. Use Llama Guard via chat completions; see Moderation models.

Parameter quirks

  • logprobs returns Together’s own shape, which is richer than OpenAI’s. See Logprobs.
  • seed is best-effort. Determinism is not guaranteed across replicas, model versions, or load conditions.
  • n (multiple completions per request) is supported on most chat models but not on every model. Loop client-side if a model rejects it.
  • logit_bias is not supported on most models.
  • service_tier, store, metadata, and prediction are accepted but ignored.
  • reasoning_effort works on GPT-OSS models ("low", "medium", "high"). Other reasoning controls (Together’s reasoning={"enabled": ...} toggle, chat_template_kwargs) are not part of OpenAI’s API surface. See Reasoning.
  • Vision models accept image_url with both remote URLs and base64 data URIs. The detail field is accepted but ignored.

Response shape differences

  • usage includes prompt_tokens, completion_tokens, and total_tokens. Some Together-only fields (for example cached_tokens, reasoning_tokens) appear on models that support them and may not match OpenAI’s prompt_tokens_details / completion_tokens_details nesting exactly.
  • Reasoning models return reasoning traces in a reasoning field on the assistant message rather than OpenAI’s reasoning object structure. See Reasoning.
  • id and system_fingerprint are present but use Together’s formats. Don’t parse them as OpenAI IDs.
  • images.generate returns url or b64_json per the response_format param, matching OpenAI. Some image models also return Together-specific metadata fields (for example seed).
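Given these shape differences, it can help to read usage defensively so the same code works against both Together and api.openai.com. A small sketch; usage_summary is an illustrative helper, and the extra field names follow the list above:

```python
def usage_summary(usage) -> dict:
    # Standard OpenAI fields are always present; probe Together-only
    # extras with getattr so missing fields don't raise AttributeError.
    summary = {
        "prompt_tokens": usage.prompt_tokens,
        "completion_tokens": usage.completion_tokens,
        "total_tokens": usage.total_tokens,
    }
    for extra in ("cached_tokens", "reasoning_tokens"):
        value = getattr(usage, extra, None)
        if value is not None:
            summary[extra] = value
    return summary
```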

Errors

Together returns OpenAI-shaped error objects ({ "error": { "message", "type", "code" } }), but type and code values are Together’s. Match on HTTP status (400, 401, 404, 429, 500, 503) for portable handling.

Community libraries

The Together API is also supported by most OpenAI libraries built by the community. If you come across unexpected behavior, reach out to support.