
Standard chat models return plain text, which is hard to parse if your app needs to read specific fields from the response. Supported models can return JSON that conforms to any schema you supply, so you can read the output directly in code without retries or fragile parsing. Pass the schema in the response_format key on the chat completions request.

Supported models

For the current list of models that support structured outputs, see the serverless and dedicated endpoint model catalogs.

Basic example

Pass a transcript of a voice note to a model and ask it to return a summary in this shape:
JSON
{
  "title": "A title for the voice note",
  "summary": "A short one-sentence summary of the voice note",
  "actionItems": ["Action item 1", "Action item 2"]
}
To enforce the structure, give the model a JSON Schema. Writing JSON Schema by hand is tedious, so use a helper library: Pydantic in Python, Zod in TypeScript. Include the schema in the system prompt and pass it via the response_format key:
import json
import together
from pydantic import BaseModel, Field

client = together.Together()


# Define the schema for the output
class VoiceNote(BaseModel):
    title: str = Field(description="A title for the voice note")
    summary: str = Field(
        description="A short one-sentence summary of the voice note."
    )
    actionItems: list[str] = Field(
        description="A list of action items from the voice note"
    )


def main():
    transcript = (
        "Good morning! It's 7:00 AM, and I'm just waking up. Today is going to be a busy day, "
        "so let's get started. First, I need to make a quick breakfast. I think I'll have some "
        "scrambled eggs and toast with a cup of coffee. While I'm cooking, I'll also check my "
        "emails to see if there's anything urgent."
    )

    # Call the LLM with the JSON schema
    extract = client.chat.completions.create(
        messages=[
            {
                "role": "system",
                "content": f"The following is a voice message transcript. Only answer in JSON and follow this schema {json.dumps(VoiceNote.model_json_schema())}.",
            },
            {
                "role": "user",
                "content": transcript,
            },
        ],
        model="Qwen/Qwen3.5-9B",
        reasoning={"enabled": False},
        response_format={
            "type": "json_schema",
            "json_schema": {
                "name": "voice_note",
                "schema": VoiceNote.model_json_schema(),
            },
        },
    )

    output = json.loads(extract.choices[0].message.content)
    print(json.dumps(output, indent=2))
    return output


main()
The model responds with output that matches the schema:
JSON
{
  "title": "Morning Routine",
  "summary": "Starting the day with a quick breakfast and checking emails",
  "actionItems": [
    "Cook scrambled eggs and toast",
    "Brew a cup of coffee",
    "Check emails for urgent messages"
  ]
}
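Because the response arrives as plain text, it's worth validating it back into the Pydantic model rather than trusting json.loads alone; model_validate_json raises a ValidationError if a field is missing or has the wrong type. A minimal sketch, repeating the VoiceNote model from the example so the snippet is self-contained:

```python
from pydantic import BaseModel, Field, ValidationError


# Same schema as in the example above
class VoiceNote(BaseModel):
    title: str = Field(description="A title for the voice note")
    summary: str = Field(description="A short one-sentence summary of the voice note.")
    actionItems: list[str] = Field(description="A list of action items from the voice note")


# Stand-in for extract.choices[0].message.content
raw = """
{
  "title": "Morning Routine",
  "summary": "Starting the day with a quick breakfast and checking emails",
  "actionItems": [
    "Cook scrambled eggs and toast",
    "Brew a cup of coffee",
    "Check emails for urgent messages"
  ]
}
"""

try:
    # Parses the JSON and checks it against the schema in one step
    note = VoiceNote.model_validate_json(raw)
    print(note.title)
except ValidationError as exc:
    print(f"Response did not match the schema: {exc}")
```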

Prompt the model

Always tell the model to respond only in JSON and include a plain-text copy of the schema in the prompt (as a system prompt or a user message). Send this instruction in addition to passing the schema via the response_format parameter. Combining an explicit “respond in JSON” direction, the schema text in the prompt, and the response_format setting gives the most reliable, valid JSON output.
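Assembling the instruction text and the schema copy can be factored into a small helper; build_system_prompt is a hypothetical name, and the wording mirrors the system message in the basic example above:

```python
import json

from pydantic import BaseModel


def build_system_prompt(model_cls: type[BaseModel], context: str) -> str:
    """Combine a 'respond only in JSON' instruction with a plain-text schema copy."""
    schema = json.dumps(model_cls.model_json_schema())
    return (
        f"The following is {context}. "
        f"Only answer in JSON and follow this schema {schema}."
    )


# Illustrative schema; any Pydantic model works
class Label(BaseModel):
    name: str


print(build_system_prompt(Label, "a voice message transcript"))
```

The returned string goes in the "content" of the system message, alongside the same schema passed in response_format.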

Regex example

Every model that supports JSON mode also supports regex mode. The example below uses regex to constrain a sentiment classification to one of three labels.
import together

client = together.Together()

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    messages=[
        {
            "role": "system",
            "content": "You are an AI-powered expert specializing in classifying sentiment. You will be provided with a text, and your task is to classify its sentiment as positive, neutral, or negative.",
        },
        {"role": "user", "content": "Wow. I loved the movie!"},
    ],
    response_format={
        "type": "regex",
        "pattern": "(positive|neutral|negative)",
    },
)

print(completion.choices[0].message.content)
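Since the pattern constrains decoding, the content should match the regex exactly; still, a quick fullmatch check makes that assumption explicit before the label flows into downstream code. A small sketch with Python's re module (the input values here are illustrative):

```python
import re

LABEL_PATTERN = re.compile(r"(positive|neutral|negative)")


def parse_label(content: str) -> str:
    """Return the sentiment label, or raise if the output escaped the pattern."""
    match = LABEL_PATTERN.fullmatch(content.strip())
    if match is None:
        raise ValueError(f"Unexpected model output: {content!r}")
    return match.group(0)


print(parse_label("positive"))  # → positive
```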
Structured outputs work with reasoning models too; see Structured outputs with reasoning models on the reasoning page. You can also combine structured outputs with vision models to extract typed data from images; see Structured extraction with vision models on the vision page.

Troubleshooting

If your generated JSON gets cut off, contains stray characters, or fails to parse, the cause is usually one of two things.

Token limits: the model can run out of output budget mid-structure. Check the max_tokens you’re sending against the model’s ceiling, and watch for a finish_reason of length in the response. If the model truncates, the JSON is incomplete (unterminated strings, missing closing brackets) regardless of how good your schema is. Either raise max_tokens or simplify the schema.

Malformed example JSON: if your prompt includes an example JSON object, the model follows the example exactly, syntax errors and all. Validate any JSON you embed in prompts before using it. Common symptoms of a bad example: unterminated strings, repeated newlines, repeated keys, or output that stops abruptly with finish_reason: stop.
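The truncation check can be sketched in code; the finish_reason string here is passed in directly as a stand-in for reading it off the first choice of a real completion object:

```python
import json


def check_json_response(content: str, finish_reason: str) -> dict:
    """Parse a structured-output response, surfacing truncation explicitly."""
    if finish_reason == "length":
        # The model ran out of tokens mid-structure; the JSON is almost
        # certainly incomplete, so fail loudly instead of parsing garbage.
        raise ValueError("Output truncated: raise max_tokens or simplify the schema")
    try:
        return json.loads(content)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model returned invalid JSON: {exc}") from exc


# A complete response parses cleanly:
print(check_json_response('{"title": "Morning Routine"}', "stop"))
```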

Test schemas in the Together playground

Test variations on your schema and prompts in the Together model playground: Open the Response format dropdown in the right sidebar, choose JSON, select Add schema, then paste in your schema.