
This tutorial walks through a complete batch inference job from start to finish. By the end you’ll have uploaded a JSONL file of chat completion requests, run them as a single job at up to 50% off serverless rates, and reconciled the responses with your original inputs.

Prerequisites

Before you begin, make sure you have:

- A Together AI account with an API key, exported as the TOGETHER_API_KEY environment variable
- The Python SDK installed: pip install together

Step 1: Prepare a JSONL input file

Each request lives on its own line in a JSONL file. A request has two fields: a custom_id you choose, and a body matching the schema of the endpoint you’re calling. The Batch API runs every line independently and stamps each output with the same custom_id, so this is how you’ll map results back to inputs at the end. Save the following as batch_input.jsonl:
{"custom_id": "request-1", "body": {"model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", "messages": [{"role": "user", "content": "Hello, world!"}], "max_tokens": 200}}
{"custom_id": "request-2", "body": {"model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", "messages": [{"role": "user", "content": "Explain quantum computing."}], "max_tokens": 200}}
Field      Type    Required  Description
custom_id  string  Yes       Unique identifier for tracking (max 64 chars).
body       object  Yes       Request body matching the endpoint's schema.
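
If your requests come from data rather than a hand-written file, you can generate the same JSONL programmatically. A minimal sketch that writes the file above, using the two placeholder prompts from the example:

import json

prompts = ["Hello, world!", "Explain quantum computing."]  # placeholder prompts from above

with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": f"request-{i}",  # must be unique, max 64 characters
            "body": {
                "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 200,
            },
        }
        f.write(json.dumps(request) + "\n")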

Step 2: Upload the file

Upload the JSONL file with purpose="batch-api". The upload returns a file object whose id you’ll pass to the batch job in the next step. Pass check=False to skip client-side validation. The server still validates the file during the VALIDATING phase, and skipping the client check is faster for large files without changing the error surface.
from together import Together

client = Together()  # reads the TOGETHER_API_KEY environment variable

file_resp = client.files.upload(
    file="batch_input.jsonl",
    purpose="batch-api",  # marks the file for the Batch API
    check=False,  # skip client-side validation; the server still validates the upload
)

print(file_resp.id)

Step 3: Create the batch

Now hand the uploaded file’s id to the batch endpoint, along with the API endpoint each request should run against. For chat completion requests, that’s /v1/chat/completions.
response = client.batches.create(
    input_file_id=file_resp.id,
    endpoint="/v1/chat/completions",
)

batch = response.job
print(batch.id)
batches.create() returns a wrapper; the batch object lives at .job. batches.retrieve() (used in the next step) returns the batch object directly.

Step 4: Poll for completion

The job moves through VALIDATING, then IN_PROGRESS, then a terminal status: COMPLETED, FAILED, EXPIRED, or CANCELLED. Poll every 30 to 60 seconds until you hit a terminal status. Tighter loops will hit rate limits without giving the server time to make progress.
import time

while True:
    batch = client.batches.retrieve(batch.id)  # returns the batch object directly
    print(f"{batch.status}: {batch.progress:.0f}%")

    if batch.status == "COMPLETED":
        break
    if batch.status in ("FAILED", "EXPIRED", "CANCELLED"):
        raise SystemExit(f"Batch ended: {batch.status}")

    time.sleep(30)  # 30-60 seconds between polls; tighter loops risk rate limits
Most batches under 1,000 requests finish in minutes. The 24-hour completion window is a maximum, not a typical wait.
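
The loop above waits indefinitely. If your script needs a cutoff, here is a sketch with a client-side deadline, reusing the client and batch from the previous steps. The two-hour limit is an arbitrary choice for illustration; reaching it does not stop the job, which keeps running server-side.

import time

DEADLINE_SECONDS = 2 * 60 * 60  # arbitrary cutoff for this sketch
start = time.monotonic()

while True:
    batch = client.batches.retrieve(batch.id)
    if batch.status == "COMPLETED":
        break
    if batch.status in ("FAILED", "EXPIRED", "CANCELLED"):
        raise SystemExit(f"Batch ended: {batch.status}")
    if time.monotonic() - start > DEADLINE_SECONDS:
        raise SystemExit("Deadline reached; the batch continues server-side.")
    time.sleep(60)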

Step 5: Retrieve the results

When the job reaches COMPLETED, the batch object carries an output_file_id. Download that file and you’ll get one JSON object per line, each keyed by the custom_id from your input. Output line order does not match input line order, so use custom_id to reconcile.
# Stream the output file to disk in chunks.
with client.files.with_streaming_response.content(
    id=batch.output_file_id,
) as response:
    with open("batch_output.jsonl", "wb") as f:
        for chunk in response.iter_bytes():
            f.write(chunk)
A successful output line looks like:
{
  "custom_id": "request-1",
  "response": {
    "status_code": 200,
    "body": {
      "choices": [
        {
          "index": 0,
          "message": { "role": "assistant", "content": "Hello!" },
          "finish_reason": "stop"
        }
      ],
      "usage": { "prompt_tokens": 12, "completion_tokens": 3, "total_tokens": 15 }
    }
  }
}
Per-request failures land in a separate file referenced by error_file_id. Always check it: a batch can be COMPLETED and still contain individual request failures. See retrieve results and error files on the manage page.
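
To reconcile results with inputs, a minimal sketch that indexes the original requests by custom_id and joins each output line back to them, assuming the files written in Steps 1 and 5 and the output shape shown above:

import json

# Index the original requests by custom_id.
inputs = {}
with open("batch_input.jsonl") as f:
    for line in f:
        request = json.loads(line)
        inputs[request["custom_id"]] = request["body"]

# Join each response to the request that produced it.
with open("batch_output.jsonl") as f:
    for line in f:
        result = json.loads(line)
        answer = result["response"]["body"]["choices"][0]["message"]["content"]
        prompt = inputs[result["custom_id"]]["messages"][-1]["content"]
        print(f'{result["custom_id"]}: {prompt!r} -> {answer!r}')

If batch.error_file_id is set, the error file can be downloaded with the same streaming pattern shown in Step 5.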

Next steps