
Send a single query

Use chat.completions.create to send a single query to a chat model:
from together import Together

# Together() reads your API key from the TOGETHER_API_KEY environment variable.
client = Together()

response = client.chat.completions.create(
    model="Qwen/Qwen3.5-9B",
    reasoning={"enabled": False},
    messages=[
        {
            "role": "user",
            "content": "What are some fun things to do in New York?",
        }
    ],
)

print(response.choices[0].message.content)
The create method takes a model name and a messages array. Each message is an object with content and a role identifying its author. In the example above, the role is user, which tells the model that the message comes from the end user of your system, for example, a customer using your chatbot app. The other two roles, assistant and system, are covered below.
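
Beyond the message text, the response carries metadata worth logging. A small sketch that continues the example above (the finish_reason and usage fields follow the SDK's OpenAI-style response shape; confirm them against your SDK version):
# Continuing from the `response` returned above.
print(response.choices[0].finish_reason)  # why generation stopped, e.g. "stop"
print(response.usage.prompt_tokens)  # tokens consumed by the input
print(response.usage.completion_tokens)  # tokens generated by the model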

Multi-turn conversations

Every query to a chat model is self-contained, so models don’t automatically remember prior queries. The assistant role solves this by carrying the model’s responses to earlier queries as context, which makes it useful for chatbots and long-running conversations. To provide chat history with a new query, pass the previous messages in the messages array, tagging user-provided messages with the user role and the model’s responses with the assistant role:
from together import Together

client = Together()

response = client.chat.completions.create(
    model="Qwen/Qwen3.5-9B",
    reasoning={"enabled": False},
    messages=[
        {
            "role": "user",
            "content": "What are some fun things to do in New York?",
        },
        {
            "role": "assistant",
            "content": "You could go to the Empire State Building!",
        },
        {"role": "user", "content": "That sounds fun! Where is it?"},
    ],
)

print(response.choices[0].message.content)
How your app stores historical messages is up to you.
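
For example, a simple chatbot can keep the conversation in a plain Python list and append each turn; a minimal sketch (the loop structure and variable names here are illustrative, not part of the SDK):
from together import Together

client = Together()
history = []  # grows by two messages per turn: the user's and the model's

while True:
    user_input = input("You: ")
    history.append({"role": "user", "content": user_input})

    response = client.chat.completions.create(
        model="Qwen/Qwen3.5-9B",
        reasoning={"enabled": False},
        messages=history,
    )

    reply = response.choices[0].message.content
    # Append the reply so the next turn includes it as context.
    history.append({"role": "assistant", "content": reply})
    print("Assistant:", reply)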

Add a system prompt

You can query a model with just a user message, but you’ll typically want to give the model a system prompt with context for how to respond. For example, if you’re building a travel chatbot, you might tell the model to act like a helpful travel guide. To add a system prompt, provide an initial message with the system role:
from together import Together

client = Together()

response = client.chat.completions.create(
    model="Qwen/Qwen3.5-9B",
    reasoning={"enabled": False},
    messages=[
        {"role": "system", "content": "You are a helpful travel guide."},
        {
            "role": "user",
            "content": "What are some fun things to do in New York?",
        },
    ],
)

print(response.choices[0].message.content)

Stream responses

Models take time to generate a full response. Streaming returns chunks as they’re produced, so your app can display partial results while the model is still generating, instead of waiting for the entire request to finish. To return a stream, set the stream option to True:
from together import Together

client = Together()

stream = client.chat.completions.create(
    model="Qwen/Qwen3.5-9B",
    reasoning={"enabled": False},
    messages=[
        {
            "role": "user",
            "content": "What are some fun things to do in New York?",
        }
    ],
    stream=True,
)

for chunk in stream:
    # Each chunk carries an incremental delta of the response text.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
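
If you also need the complete text after streaming, for example to store it as chat history, accumulate the deltas as you print them. A variant of the loop above (illustrative):
parts = []

for chunk in stream:
    if chunk.choices:
        delta = chunk.choices[0].delta.content or ""
        parts.append(delta)
        print(delta, end="", flush=True)

# Join the deltas into the full response text.
full_text = "".join(parts)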

Run async requests in parallel from Python

By default, Python’s Together client runs requests synchronously, so multiple queries execute in sequence even when they’re independent. To run independent calls in parallel, use the AsyncTogether client from the Python library:
import asyncio
from together import AsyncTogether

# Create the client once and reuse it across requests.
async_client = AsyncTogether()

prompts = [
    "What are the top things to do in San Francisco?",
    "What country is Paris in?",
]


async def async_chat_completion(prompts):
    # Build one request task per prompt; the requests are independent.
    tasks = [
        async_client.chat.completions.create(
            model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        for prompt in prompts
    ]
    # gather runs the requests concurrently and returns results in order.
    responses = await asyncio.gather(*tasks)

    for response in responses:
        print(response.choices[0].message.content)


asyncio.run(async_chat_completion(prompts))
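
If you fan out many requests at once you can hit rate limits. One common pattern, sketched here with an assumed concurrency cap to tune for your account, is to bound in-flight requests with asyncio.Semaphore:
import asyncio
from together import AsyncTogether

async_client = AsyncTogether()
semaphore = asyncio.Semaphore(4)  # assumed cap: at most 4 requests in flight


async def bounded_completion(prompt):
    # Only semaphore holders may call the API; the rest wait here.
    async with semaphore:
        return await async_client.chat.completions.create(
            model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
            messages=[{"role": "user", "content": prompt}],
        )


async def main(prompts):
    responses = await asyncio.gather(*(bounded_completion(p) for p in prompts))
    for response in responses:
        print(response.choices[0].message.content)


asyncio.run(main(["Question one?", "Question two?"]))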