Step 1: Create an API key

  1. Register for an account if you don’t have one.
  2. Go to your project’s API keys page.
  3. Select Create key, give it a name, and copy the value. A new key is shown only once, so save it somewhere safe.
  4. Export the key as an environment variable in your terminal:
export TOGETHER_API_KEY="your_api_key"
The SDK reads TOGETHER_API_KEY automatically when you call Together(). Pass api_key= to the constructor to override it.
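
To fail early with a readable message when the variable is missing, a script can check it before constructing the client. The sketch below is one way to do that; `require_api_key` is a hypothetical helper, not part of the SDK, and it takes the environment as a mapping so it can be tried without touching the real one.

```python
import os
from collections.abc import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return TOGETHER_API_KEY from env, or raise with a readable hint.

    Hypothetical helper for illustration; not part of the Together SDK.
    """
    key = env.get("TOGETHER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "TOGETHER_API_KEY is not set. Export it in your shell first."
        )
    return key


# Demonstrated with a stand-in mapping rather than the real environment.
print(require_api_key({"TOGETHER_API_KEY": "your_api_key"}))
```

In a real script, the result can be passed explicitly, as in `Together(api_key=require_api_key())`.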

Step 2: Install the SDK

Together AI publishes official SDKs for Python and TypeScript. You can also point the OpenAI SDK at Together’s base URL, or call the REST API directly from any language. To install the Python SDK with uv:
uv init --no-workspace # optional
uv add together

Step 3: Run your first query

The example below sends a chat completion request to MiniMax M3 and prints the response:
from together import Together

client = Together()  # reads TOGETHER_API_KEY from environment

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M3",
    messages=[
        {
            "role": "user",
            "content": "What are the top 3 things to do in New York?",
        }
    ],
)

print(response.choices[0].message.content)
Save the snippet to a file named main.py, then run it:
uv run main.py
After a few seconds, you should see the response printed to your terminal.

Going further

Try some of these variations to see what else the model can do:

Stream the response

Streaming returns the response token by token as it’s generated, instead of making you wait for the full reply. This is especially helpful with a reasoning model like MiniMax M3, which works through a problem before answering and can produce a lot of output. A reasoning model’s response has two parts: the step-by-step thinking, in a reasoning field, and the final answer, in content. Set stream=True (Python) or stream: true (TypeScript/cURL) and read both fields off each chunk’s delta:
from together import Together

client = Together()  # reads TOGETHER_API_KEY from environment

stream = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M3",
    messages=[
        {
            "role": "user",
            "content": "What are the top 3 things to do in New York?",
        }
    ],
    stream=True,
)

printed_answer_header = False

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta

    # Reasoning models return their thinking in a separate `reasoning` field.
    if getattr(delta, "reasoning", None):
        print(delta.reasoning, end="", flush=True)

    # The final answer arrives in `content`.
    if getattr(delta, "content", None):
        if not printed_answer_header:
            print("\n\n--- Answer ---\n", flush=True)
            printed_answer_header = True
        print(delta.content, end="", flush=True)
With a non-reasoning model, reasoning stays empty and only content is returned, so the same loop works unchanged.
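
If you need the full text after streaming finishes (for logging or post-processing), the same loop can accumulate the two fields instead of printing them. The sketch below assumes chunks shaped like the ones in the loop above; `collect_stream` and `fake_chunk` are hypothetical names, and the demo stream is made-up stand-in data so the example runs without an API call.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Accumulate streamed deltas into (reasoning, answer) strings."""
    reasoning_parts = []
    answer_parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        reasoning = getattr(delta, "reasoning", None)
        content = getattr(delta, "content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        if content:
            answer_parts.append(content)
    return "".join(reasoning_parts), "".join(answer_parts)


def fake_chunk(**fields):
    """Build a stand-in chunk with one choice; made up for this demo."""
    delta = SimpleNamespace(**fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_stream = [
    SimpleNamespace(choices=[]),  # a chunk without choices is skipped
    fake_chunk(reasoning="Think. "),
    fake_chunk(reasoning="Decide.", content=None),
    fake_chunk(content="Visit "),
    fake_chunk(content="Central Park."),
]

reasoning, answer = collect_stream(demo_stream)
print(reasoning)  # Think. Decide.
print(answer)     # Visit Central Park.
```

With the real client, pass the `stream` object returned by `client.chat.completions.create(..., stream=True)` in place of `demo_stream`.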

Add a system prompt

Prepend a system message to set the model’s tone, role, or constraints:
from together import Together

client = Together()

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M3",
    messages=[
        {
            "role": "system",
            "content": "You are a concise travel guide. Answer in two sentences or fewer.",
        },
        {
            "role": "user",
            "content": "What are the top 3 things to do in New York?",
        },
    ],
)

print(response.choices[0].message.content)
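
When you want to try several system prompts against the same question, it can help to build the messages list in one place. This is a minimal sketch; `build_messages` is a hypothetical helper, not an SDK function, and it only produces the list-of-dicts shape used in the example above.

```python
def build_messages(user_prompt, system_prompt=None):
    """Return a chat messages list, with the system message first if given."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_prompt})
    return messages


question = "What are the top 3 things to do in New York?"
personas = [
    "You are a concise travel guide. Answer in two sentences or fewer.",
    "You are a food critic. Focus on places to eat.",
]

for persona in personas:
    messages = build_messages(question, system_prompt=persona)
    print(messages[0]["role"], "->", messages[0]["content"])
```

Each resulting list can be passed as `messages=` to `client.chat.completions.create`.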

Get structured JSON output

Pass a JSON schema via response_format to get parseable JSON back:
from pydantic import BaseModel
from together import Together

client = Together()


class Activity(BaseModel):
    name: str
    neighborhood: str
    why: str


class Itinerary(BaseModel):
    city: str
    activities: list[Activity]


response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M3",
    messages=[
        {"role": "user", "content": "Suggest 3 things to do in New York."},
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "Itinerary",
            "schema": Itinerary.model_json_schema(),
        },
    },
)

itinerary = Itinerary.model_validate_json(response.choices[0].message.content)
print(itinerary)
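
To see what the schema is enforcing, or to check a reply in an environment without pydantic, the same shape can be verified with the standard library. This is a rough sketch, not a replacement for `model_validate_json`; `parse_itinerary` is a hypothetical helper and `sample_reply` is made-up data standing in for `response.choices[0].message.content`.

```python
import json


def parse_itinerary(raw):
    """Parse a JSON reply and check it has the Itinerary shape."""
    data = json.loads(raw)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    if not isinstance(data.get("city"), str):
        raise ValueError("missing or non-string 'city'")
    activities = data.get("activities")
    if not isinstance(activities, list):
        raise ValueError("missing or non-list 'activities'")
    for index, item in enumerate(activities):
        if not isinstance(item, dict):
            raise ValueError(f"activity {index} is not an object")
        for field in ("name", "neighborhood", "why"):
            if not isinstance(item.get(field), str):
                raise ValueError(f"activity {index}: bad '{field}'")
    return data


# Made-up reply used only to exercise the parser.
sample_reply = json.dumps(
    {
        "city": "New York",
        "activities": [
            {
                "name": "Walk the High Line",
                "neighborhood": "Chelsea",
                "why": "Elevated park with city views.",
            }
        ],
    }
)

itinerary = parse_itinerary(sample_reply)
print(itinerary["city"], len(itinerary["activities"]))
```

Since `json.JSONDecodeError` is a subclass of `ValueError`, a single `except ValueError` around the call covers both malformed JSON and a wrong shape.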

Analyze an image

MiniMax M3 also accepts images. Add an image_url block to the user message to ask questions about a picture:
from together import Together

client = Together()

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M3",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in one sentence.",
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
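
For an image stored on disk rather than hosted at a URL, a common approach with OpenAI-style vision APIs is to send it inline as a base64 data URL. The sketch below assumes the endpoint accepts data URLs in the `url` field, which this page does not confirm; `image_to_data_url` is a hypothetical helper, and the demo file holds placeholder bytes rather than a real image.

```python
import base64
import mimetypes
import tempfile
from pathlib import Path


def image_to_data_url(path):
    """Encode a local image file as a data URL string."""
    path = Path(path)
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"not a recognized image type: {path.name}")
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Demo with placeholder bytes; a real call would point at an actual image.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo.png"
    demo_path.write_bytes(b"\x89PNG")
    data_url = image_to_data_url(demo_path)

print(data_url)  # data:image/png;base64,iVBORw==
```

Under that assumption, the returned string would replace the `https://...` value of `"url"` in the `image_url` block.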

Use the OpenAI SDK

If you’re already using the OpenAI SDK, you can point it at Together’s base URL (https://api.together.ai/v1) and keep the rest of your code the same:
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.ai/v1",
)

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M3",
    messages=[
        {
            "role": "user",
            "content": "What are the top 3 things to do in New York?",
        }
    ],
)

print(response.choices[0].message.content)
See OpenAI compatibility for the full list of supported endpoints and parameters.
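
The same base URL can also be called without any SDK. The sketch below builds the HTTP request with Python's standard library, assuming the chat endpoint sits at `/chat/completions` under the base URL (the path the OpenAI SDK appends) and that the key is sent as a Bearer token. `build_chat_request` is a hypothetical helper; the example only constructs the request and does not send it.

```python
import json
import urllib.request

BASE_URL = "https://api.together.ai/v1"


def build_chat_request(api_key, model, prompt):
    """Build (but do not send) a POST request for a chat completion."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    "your_api_key",
    "MiniMaxAI/MiniMax-M3",
    "What are the top 3 things to do in New York?",
)
print(request.get_method(), request.full_url)
```

To send it, pass the request to `urllib.request.urlopen` and decode the body with `json.load`; the reply is expected to follow the same `choices[0].message.content` layout the SDK examples read.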

Next steps

Choose a model

Browse the catalog of models for chat, coding, vision, and reasoning.

Dedicated endpoints

Reserve GPUs for steady traffic or fine-tuned models.

Fine-tune a model

Train a model on your own data with LoRA, DPO, or full fine-tuning.

GPU clusters

Run large-scale training and custom workloads on dedicated GPU clusters.