
Overview

We’re excited to announce the release of Python v2, an upgrade to the Together AI Python SDK. This guide will help you migrate from the legacy (v1) SDK to the new version.

Why Migrate?

The new SDK offers several advantages:
  • Modern Architecture: Built with Stainless OpenAPI generator for consistency and reliability
  • Better Type Safety: Comprehensive typing for better IDE support and fewer runtime errors
  • Broader Python Support: Python 3.8+ (vs 3.10+ in legacy)
  • Modern HTTP Client: Uses httpx instead of requests
  • Faster Performance: ~20ms faster per request on internal benchmarks
  • uv Support: Compatible with uv, the fast Python package installer - uv add together --prerelease allow

Feature Parity Matrix

Use this table to quickly assess the migration effort for your specific use case.

Legend: ✅ No changes | ⚠️ Minor changes needed | 🆕 New capability

| Feature | Status | Migration Notes |
| --- | --- | --- |
| Chat Completions | ✅ | No changes required |
| Text Completions | ✅ | No changes required |
| Vision | ✅ | No changes required |
| Function Calling | ✅ | No changes required |
| Structured Decoding (JSON Mode) | ✅ | No changes required |
| Embeddings | ✅ | No changes required |
| Image Generation | ✅ | No changes required |
| Video Generation | ✅ | No changes required |
| Streaming | ✅ | No changes required |
| Async Support | ✅ | No changes required |
| Models List | ✅ | No changes required |
| Rerank | ✅ | No changes required |
| Audio Speech (TTS) | ⚠️ | Voice listing: dict access → attribute access |
| Audio Transcription | ⚠️ | File paths → file objects with context manager |
| Audio Translation | ⚠️ | File paths → file objects with context manager |
| Fine-tuning | ⚠️ | list_checkpoints response changed, download → content |
| File Upload/Download | ⚠️ | retrieve_content → content, no longer writes to disk |
| Batches | ⚠️ | Method names simplified, response shape changed |
| Endpoints | ⚠️ | get → retrieve, list_hardware removed, response shapes changed |
| Evaluations | ⚠️ | Namespace changed to evals, parameters restructured |
| Code Interpreter | ⚠️ | run → execute |
| Hardware API | 🆕 | New feature (replaces endpoints.list_hardware) |
| Jobs API | 🆕 | New feature |
| Raw Response Access | 🆕 | New feature |

Installation & Setup

1. Install the New SDK
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create a new project and enter it
uv init myproject
cd myproject

# Install the Together Python SDK (allowing prereleases)
uv add together --prerelease allow

# pip still works as well
pip install --pre together
2. Dependency Changes
The new SDK uses different dependencies. You can remove legacy dependencies if not used elsewhere.
Old dependencies (can remove):
requests>=2.31.0
typer>=0.9
aiohttp>=3.9.3
New dependencies (automatically installed):
httpx>=0.23.0
pydantic>=1.9.0
typing-extensions>=4.10
3. Client Initialization
Basic client setup remains the same:
from together import Together

# Using API key directly
client = Together(api_key="your-api-key")

# Using environment variable (recommended)
client = Together()  # Uses TOGETHER_API_KEY env var

# Async client
from together import AsyncTogether

async_client = AsyncTogether()
Some constructor parameters have changed. See Constructor Parameters for details.

Global Breaking Changes

Constructor Parameters

The client constructor has been updated with renamed and new parameters. The legacy constructor looked like this:
client = Together(
    api_key="...",
    base_url="...",
    timeout=30,
    max_retries=3,
    supplied_headers={"X-Custom-Header": "value"},
)
Key Changes:
  • supplied_headers → default_headers (renamed)
  • New optional parameters: default_query, http_client
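For comparison, here is the same setup against the new constructor. This is a minimal sketch assuming only the changes listed above; the header and query values are illustrative:
import httpx

from together import Together

client = Together(
    api_key="...",
    base_url="...",
    timeout=30,
    max_retries=3,
    default_headers={"X-Custom-Header": "value"},  # was supplied_headers
    default_query={"example": "value"},  # new: default query parameters (illustrative)
    http_client=httpx.Client(),  # new: supply your own httpx client
)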

Keyword-Only Arguments

All API method arguments must now be passed as keyword arguments. Positional arguments are no longer supported.
# ❌ Legacy SDK (positional arguments worked)
response = client.chat.completions.create(
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo", messages
)

# ✅ New SDK (keyword arguments required)
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo", messages=messages
)

Optional Parameters

The new SDK uses NOT_GIVEN instead of None for omitted optional parameters. In most cases, you can simply omit the parameter entirely:
# ❌ Legacy approach
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[...],
    max_tokens=None,  # Don't pass None
)

# ✅ New SDK approach - just omit the parameter
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[...],
    # max_tokens omitted entirely
)

Extra Parameters

The legacy **kwargs pattern has been replaced with explicit parameters for passing additional data:
# ❌ Legacy SDK (**kwargs)
response = client.chat.completions.create(
    model="...",
    messages=[...],
    custom_param="value",  # Passed via **kwargs
)

# ✅ New SDK (explicit extra_* parameters)
response = client.chat.completions.create(
    model="...",
    messages=[...],
    extra_body={"custom_param": "value"},
    extra_headers={"X-Custom-Header": "value"},
    extra_query={"query_param": "value"},
)

Response Type Names

Most API methods have renamed response type definitions. If you’re importing response types for type hints, you’ll need to update your imports:
# ❌ Legacy imports
from together.types import ChatCompletionResponse

# ✅ New imports
from together.types.chat.chat_completion import ChatCompletion

CLI Commands Removed

The following CLI commands have been removed in the new SDK:
  • together chat.completions
  • together completions
  • together images generate

APIs with No Changes Required

The following APIs work identically in both SDKs. No code changes are needed.
Chat Completions
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Hello!"},
    ],
    max_tokens=512,
    temperature=0.7,
)

print(response.choices[0].message.content)
Streaming
stream = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
Embeddings
response = client.embeddings.create(
    model="togethercomputer/m2-bert-80M-32k-retrieval",
    input=["Hello, world!", "How are you?"],
)

embeddings = [data.embedding for data in response.data]
Images
response = client.images.generate(
    prompt="a flying cat", model="black-forest-labs/FLUX.1-schnell", steps=4
)

print(response.data[0].url)
Videos
import time

# Create a video generation job
job = client.videos.create(
    prompt="A serene sunset over the ocean with gentle waves",
    model="minimax/video-01-director",
    width=1366,
    height=768,
)

print(f"Job ID: {job.id}")

# Poll until completion
while True:
    status = client.videos.retrieve(job.id)
    if status.status == "completed":
        print(f"Video URL: {status.outputs.video_url}")
        break
    elif status.status == "failed":
        print("Video generation failed")
        break
    time.sleep(5)
Rerank
response = client.rerank.create(
    model="Salesforce/Llama-Rank-V1",
    query="What is the capital of France?",
    documents=["Paris is the capital", "London is the capital"],
    top_n=1,
)
Fine-tuning (Basic Operations)
# Create fine-tune job
job = client.fine_tuning.create(
    training_file="file-abc123",
    model="meta-llama/Llama-3.2-3B-Instruct",
    n_epochs=3,
    learning_rate=1e-5,
)

# List jobs
jobs = client.fine_tuning.list()

# Get job details
job = client.fine_tuning.retrieve(id="ft-abc123")

# Cancel job
client.fine_tuning.cancel(id="ft-abc123")

APIs with Changes Required

Batches
Method names have been simplified, and the response structure has changed slightly. The snippet below shows the legacy calls; a new-SDK sketch follows the key changes.
# Create batch
batch_job = client.batches.create_batch(
    file_id="file-abc123", endpoint="/v1/chat/completions"
)

# Get batch
batch_job = client.batches.get_batch(batch_job.id)

# List batches
batches = client.batches.list_batches()

# Cancel batch
client.batches.cancel_batch("job_id")
Key Changes:
  • create_batch() → create()
  • get_batch() → retrieve()
  • list_batches() → list()
  • cancel_batch() → cancel()
  • file_id → input_file_id
  • create() returns full response; access .job for the job object
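The same operations in the new SDK, as a sketch based on the renames above (whether the ID is passed positionally to retrieve() and cancel() is an assumption):
# New SDK (sketch based on the renames above)
response = client.batches.create(
    input_file_id="file-abc123", endpoint="/v1/chat/completions"
)
batch_job = response.job  # create() returns the full response; the job lives under .job

# Get batch
batch_job = client.batches.retrieve(batch_job.id)

# List batches
batches = client.batches.list()

# Cancel batch
client.batches.cancel(batch_job.id)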
Endpoints
# List endpoints
endpoints = client.endpoints.list()
for ep in endpoints:  # Returned array directly
    print(ep.id)

# Create endpoint
endpoint = client.endpoints.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    hardware="80GB-H100",
    min_replicas=1,
    max_replicas=5,
    display_name="My Endpoint",
)

# Get endpoint
endpoint = client.endpoints.get(endpoint_id="ep-abc123")

# List available hardware
hardware = client.endpoints.list_hardware()

# Delete endpoint
client.endpoints.delete(endpoint_id="ep-abc123")
Key Changes:
  • get() → retrieve()
  • min_replicas and max_replicas are now nested inside the autoscaling parameter
  • list() response changed: previously returned an array directly, now returns an object with .data
  • list_hardware() removed; use client.hardware.list() instead
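Putting those changes together, a sketch of the new-SDK calls; the field names inside autoscaling are an assumption:
# New SDK (sketch; autoscaling field names are assumptions)
endpoint = client.endpoints.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    hardware="80GB-H100",
    autoscaling={"min_replicas": 1, "max_replicas": 5},
    display_name="My Endpoint",
)

# Get endpoint
endpoint = client.endpoints.retrieve(endpoint_id="ep-abc123")

# List endpoints: the array is now wrapped in .data
endpoints = client.endpoints.list()
for ep in endpoints.data:
    print(ep.id)

# Hardware listing moved to its own namespace
hardware = client.hardware.list()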
Files
# Upload file
response = client.files.upload(file="training_data.jsonl", purpose="fine-tune")

# Download file content to disk
client.files.retrieve_content(
    id="file-abc123", output="downloaded_file.jsonl"  # Writes directly to disk
)
Key Changes:
  • retrieve_content() → content()
  • No longer writes to disk automatically; returns binary data for you to handle
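A sketch of the new download flow. That content() returns a binary response object with a .read() method is an assumption (typical of Stainless-generated SDKs); adjust to however your SDK version exposes the bytes:
# New SDK (sketch; .read() on the binary response is an assumption)
file_content = client.files.content(id="file-abc123")
with open("downloaded_file.jsonl", "wb") as f:
    f.write(file_content.read())  # write the returned bytes yourself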
Fine-tuning Checkpoints
checkpoints = client.fine_tuning.list_checkpoints("ft-123")

for checkpoint in checkpoints:
    print(checkpoint.type)
    print(checkpoint.timestamp)
    print(checkpoint.name)
Key Changes:
  • Response is now an object with .data containing the list of checkpoints
  • Checkpoint properties renamed: type → checkpoint_type, timestamp → created_at
  • name no longer exists; construct from ft_id and step
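Applying those renames gives the following sketch; the name construction is illustrative:
# New SDK (sketch based on the renames above)
checkpoints = client.fine_tuning.list_checkpoints("ft-123")

for checkpoint in checkpoints.data:  # the list now lives under .data
    print(checkpoint.checkpoint_type)  # was .type
    print(checkpoint.created_at)  # was .timestamp
    # .name is gone; build an identifier from the job ID and step
    print(f"ft-123:{checkpoint.step}")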
Fine-tuning Download
# Download fine-tuned model
client.fine_tuning.download(
    id="ft-abc123", output="model_weights/"  # Writes directly to disk
)
Key Changes:
  • download() → content() with streaming response
  • No longer writes to disk automatically
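A sketch of streaming the weights to disk; the with_streaming_response helper and iter_bytes() call are typical of Stainless-generated SDKs, but their exact spelling here is an assumption:
# New SDK (sketch; helper names are assumptions)
with client.fine_tuning.with_streaming_response.content(id="ft-abc123") as response:
    with open("model_weights.bin", "wb") as f:  # output path is illustrative
        for chunk in response.iter_bytes():
            f.write(chunk)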
Code Interpreter
# Execute code
result = client.code_interpreter.run(
    code="print('Hello, World!')", language="python", session_id="session-123"
)

print(result.output)
Key Changes:
  • run() → execute()
  • Output access: result.output → result.data.outputs[0].data
  • New sessions.list() method for session management
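Rewriting the snippet above with those changes, as a sketch (the placement of sessions under code_interpreter is an assumption):
# New SDK (sketch based on the renames above)
result = client.code_interpreter.execute(
    code="print('Hello, World!')", language="python", session_id="session-123"
)

print(result.data.outputs[0].data)  # was result.output

# New: list sessions
sessions = client.code_interpreter.sessions.list()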
Audio Transcriptions & Translations
The new SDK requires file objects instead of file paths for audio operations. Use context managers for proper resource handling. The legacy calls passed paths directly:
# Transcription with file path
response = client.audio.transcriptions.create(
    file="audio.mp3",
    model="openai/whisper-large-v3",
    language="en",
)

# Translation with file path
response = client.audio.translations.create(
    file="french_audio.mp3",
    model="openai/whisper-large-v3",
)
Key Changes:
  • File paths (strings) → file objects opened with open(file, "rb")
  • Use context managers (with open(...) as f:) for proper resource cleanup
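The same calls in the new SDK, opening each file in a context manager:
# New SDK: pass open file objects
with open("audio.mp3", "rb") as f:
    response = client.audio.transcriptions.create(
        file=f,
        model="openai/whisper-large-v3",
        language="en",
    )

with open("french_audio.mp3", "rb") as f:
    response = client.audio.translations.create(
        file=f,
        model="openai/whisper-large-v3",
    )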
Audio Speech (TTS) - Voice Listing
When listing available voices, voice properties are now accessed as object attributes instead of dictionary keys. The legacy loop used dict access:
response = client.audio.voices.list()

for model_voices in response.data:
    print(f"Model: {model_voices.model}")
    for voice in model_voices.voices:
        print(f"  - Voice: {voice['name']}")  # Dict access
Key Changes:
  • Voice properties: voice['name'] → voice.name (dict access → attribute access)
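The updated loop with attribute access:
# New SDK: attribute access
response = client.audio.voices.list()

for model_voices in response.data:
    print(f"Model: {model_voices.model}")
    for voice in model_voices.voices:
        print(f"  - Voice: {voice.name}")  # was voice['name']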
Evaluations
The evaluations API has significant changes, including a namespace rename and restructured parameters. The legacy calls:
# Create evaluation
evaluation = client.evaluation.create(
    type="classify",
    judge_model_name="meta-llama/Llama-3.1-70B-Instruct-Turbo",
    judge_system_template="You are an expert evaluator...",
    input_data_file_path="file-abc123",
    labels=["good", "bad"],
    pass_labels=["good"],
    model_to_evaluate="meta-llama/Llama-3.1-8B-Instruct-Turbo",
)

# Get evaluation
eval_job = client.evaluation.retrieve(workflow_id=evaluation.workflow_id)

# Get status
status = client.evaluation.status(eval_job.workflow_id)

# List evaluations
evaluations = client.evaluation.list()
Key Changes:
  • Namespace: client.evaluation → client.evals
  • Parameters restructured with typed parameter objects
  • retrieve() and status() no longer use named arguments
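A minimal sketch of the renamed calls. The create() parameters are restructured into typed parameter objects, so they are elided here rather than guessed; see the evals reference for the exact shapes:
# New SDK (sketch; create() parameters elided, see the evals reference)
evaluation = client.evals.create(...)

eval_job = client.evals.retrieve(evaluation.workflow_id)  # positional ID
status = client.evals.status(eval_job.workflow_id)  # positional ID
evaluations = client.evals.list()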

New SDK-Only Features

Hardware API
Discover available hardware configurations:
# List all available hardware
hardware_list = client.hardware.list()

# Filter by model compatibility
hardware_list = client.hardware.list(
    model="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo"
)

for hw in hardware_list.data:
    price_per_hour = hw.pricing.cents_per_minute * 60 / 100
    print(f"Hardware: {hw.id} - Price: ${price_per_hour}/hour")
Jobs API
General job management capabilities:
# Retrieve job details
job = client.jobs.retrieve(job_id="job-abc123")

# List all jobs
jobs = client.jobs.list()

print(f"Job {job.id} status: {job.status}")
Raw Response Access
Access raw HTTP responses for debugging:
response = client.chat.completions.with_raw_response.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Hello"}],
)

print(f"Status: {response.status_code}")
print(f"Headers: {response.headers}")
completion = response.parse()  # Get parsed response
Streaming with Context Manager
Better resource management for streaming:
with client.chat.completions.with_streaming_response.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True,
) as response:
    for line in response.iter_lines():
        print(line)
# Response automatically closed

Error Handling Migration

The exception hierarchy has been completely restructured with a new, more granular set of HTTP status-specific exceptions. Update your error handling code accordingly:
| Legacy SDK Exception | New SDK Exception | Notes |
| --- | --- | --- |
| TogetherException | TogetherError | Base exception renamed |
| AuthenticationError | AuthenticationError | HTTP 401 |
| RateLimitError | RateLimitError | HTTP 429 |
| Timeout | APITimeoutError | Renamed |
| APIConnectionError | APIConnectionError | Unchanged |
| ResponseError | APIStatusError | Base class for HTTP errors |
| InvalidRequestError | BadRequestError | HTTP 400 |
| ServiceUnavailableError | InternalServerError | HTTP 500+ |
| JSONError | APIResponseValidationError | Response parsing errors |
| InstanceError | APIStatusError | Use base class or a specific status error |
| APIError | APIError | Base for all API errors |
| FileTypeError | FileTypeError | Still exists (different module) |
| DownloadError | DownloadError | Still exists (different module) |
New exceptions added:
  • PermissionDeniedError (403)
  • NotFoundError (404)
  • ConflictError (409)
  • UnprocessableEntityError (422)
Exception attributes have changed. For example, http_status is now status_code. Check your error handling code for attribute access.
Updated Error Handling Example
import together

client = together.Together()

try:
    response = client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
        messages=[{"role": "user", "content": "Hello"}],
    )
except together.APITimeoutError:
    # Catch the timeout subclass before its APIConnectionError base class
    print("Request timed out")
except together.APIConnectionError:
    print("Connection error - check your network")
except together.RateLimitError:
    print("Rate limit exceeded - slow down requests")
except together.AuthenticationError:
    print("Invalid API key")
except together.APIStatusError as e:
    print(f"API error: {e.status_code} - {e.message}")

Troubleshooting

Import Errors
Problem:
ImportError: cannot import name 'ChatCompletionResponse' from 'together.types'
Solution: Response type imports have changed:
# Old import
from together.types import ChatCompletionResponse

# New import
from together.types.chat.chat_completion import ChatCompletion
Method Not Found Errors
Problem:
AttributeError: 'BatchesResource' object has no attribute 'create_batch'
Solution: Method names have been simplified:
# Old → New
client.batches.create_batch(...)  →  client.batches.create(...)
client.batches.get_batch(...)     →  client.batches.retrieve(...)
client.batches.list_batches()     →  client.batches.list()
client.endpoints.get(...)         →  client.endpoints.retrieve(...)
client.code_interpreter.run(...)  →  client.code_interpreter.execute(...)
Parameter Type Errors
Problem:
TypeError: Expected NotGiven, got None
Solution: Don’t pass None for optional parameters; omit them instead:
# ❌ Wrong
client.chat.completions.create(model="...", messages=[...], max_tokens=None)

# ✅ Correct - just omit the parameter
client.chat.completions.create(model="...", messages=[...])
Namespace Errors
Problem:
AttributeError: 'Together' object has no attribute 'evaluation'
Solution: The namespace was renamed:
# Old
client.evaluation.create(...)

# New
client.evals.create(...)

Best Practices

Type Safety
Take advantage of improved typing:
from typing import List

from together import Together
from together.types.chat import completion_create_params
from together.types.chat.chat_completion import ChatCompletion

client = Together()


def create_chat_completion(
    messages: List[completion_create_params.Message],
) -> ChatCompletion:
    return client.chat.completions.create(
        model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo", messages=messages
    )
HTTP Client Configuration
The new SDK uses httpx. Configure it as needed:
import httpx

from together import Together

client = Together(
    timeout=httpx.Timeout(60.0, connect=10.0),
    http_client=httpx.Client(verify=True, headers={"User-Agent": "MyApp/1.0"}),
)

Getting Help

If you encounter issues during migration: