This tutorial walks through a complete batch inference job from start to finish. By the end you’ll have uploaded a JSONL file of chat completion requests, run them as a single job at up to 50% off serverless rates, and reconciled the responses with your original inputs.
Prerequisites
Before you begin, make sure you have:
- Created an account and generated an API key.
- Set your API key as an environment variable in your terminal.
- Installed the Python or TypeScript SDK. Python examples require together>=2.0.0.
Step 1: Prepare a JSONL input file
Each request lives on its own line in a JSONL file. A request has two fields: a custom_id you choose, and a body matching the schema of the endpoint you’re calling. The Batch API runs every line independently and stamps each output with the same custom_id, so this is how you’ll map results back to inputs at the end.
Save the following as batch_input.jsonl:
batch_input.jsonl
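A minimal two-request example; the model name and prompts here are illustrative placeholders — substitute any model supported by the Batch API:

```jsonl
{"custom_id": "request-0", "body": {"model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", "messages": [{"role": "user", "content": "What is the capital of France?"}], "max_tokens": 200}}
{"custom_id": "request-1", "body": {"model": "meta-llama/Llama-3.3-70B-Instruct-Turbo", "messages": [{"role": "user", "content": "Summarize the plot of Hamlet in two sentences."}], "max_tokens": 200}}
```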
| Field | Type | Required | Description |
|---|---|---|---|
| custom_id | string | Yes | Unique identifier for tracking (max 64 chars). |
| body | object | Yes | Request body matching the endpoint’s schema. |
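At scale you’ll usually generate the input file programmatically rather than by hand. A minimal sketch using only the standard library — the model name is a placeholder:

```python
import json

# Prompts to batch; each one becomes a single line in the JSONL file.
prompts = [
    "What is the capital of France?",
    "Summarize the plot of Hamlet in two sentences.",
]

with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts):
        request = {
            # custom_id must be unique per line (max 64 chars).
            "custom_id": f"request-{i}",
            "body": {
                "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # placeholder
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 200,
            },
        }
        f.write(json.dumps(request) + "\n")
```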
Step 2: Upload the file
Upload the JSONL file with purpose="batch-api". The upload returns a file object whose id you’ll pass to the batch job in the next step.
Pass check=False to skip client-side validation. The server still validates the file during the VALIDATING phase, and skipping the client check is faster for large files without changing the error surface.
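A sketch of the upload, assuming a Together client constructed from the SDK (the exact method path is an assumption — check the SDK reference); purpose="batch-api" and check=False come straight from this step:

```python
def upload_batch_input(client, path: str = "batch_input.jsonl"):
    """Upload the JSONL input; the returned file object's .id feeds the batch job."""
    # purpose="batch-api" marks the file for batch processing.
    # check=False skips client-side validation; the server still validates
    # the file during the VALIDATING phase.
    return client.files.upload(file=path, purpose="batch-api", check=False)
```

Construct the client once (e.g. `client = Together()`, which reads your API key from the environment) and reuse it across steps.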
Step 3: Create the batch
Now hand the uploaded file’s id to the batch endpoint, along with the API endpoint each request should run against. For chat completion requests, that’s /v1/chat/completions.
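A sketch of creating the job. The method name batches.create() and the .job wrapper come from this page; the keyword argument names are assumptions — check the SDK reference:

```python
def create_batch(client, file_id: str):
    """Submit the uploaded file as a batch job and return the batch object."""
    # NOTE: keyword names are assumptions; consult the SDK reference.
    resp = client.batches.create(file_id=file_id, endpoint="/v1/chat/completions")
    # batches.create() returns a wrapper; the batch object itself lives at .job.
    return resp.job
```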
batches.create() returns a wrapper; the batch object lives at .job. batches.retrieve() (used in the next step) returns the batch object directly.
Step 4: Poll for completion
The job moves through VALIDATING, then IN_PROGRESS, then a terminal status: COMPLETED, FAILED, EXPIRED, or CANCELLED. Poll every 30 to 60 seconds until you hit a terminal status. Tighter loops will hit rate limits without giving the server time to make progress.
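The polling loop can be sketched as follows, using the statuses and batches.retrieve() described above (a minimal sketch; the batch id and interval are up to you):

```python
import time

# Terminal statuses from this step; anything else means "keep waiting".
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "EXPIRED", "CANCELLED"}

def wait_for_batch(client, batch_id: str, interval: float = 30.0):
    """Poll every `interval` seconds until the batch reaches a terminal status."""
    while True:
        batch = client.batches.retrieve(batch_id)  # returns the batch object directly
        if batch.status in TERMINAL_STATUSES:
            return batch
        time.sleep(interval)
```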
Step 5: Retrieve the results
When the job reaches COMPLETED, the batch object carries an output_file_id. Download that file and you’ll get one JSON object per line, each keyed by the custom_id from your input. Output line order does not match input line order, so use custom_id to reconcile.
The batch object may also carry an error_file_id. Always check it: a batch can be COMPLETED and still contain individual request failures. See retrieve results and error files on the manage page.
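Once the output file is downloaded to disk, reconciling it with your inputs is pure bookkeeping on custom_id. A minimal sketch, assuming the downloaded JSONL and a dict of your original inputs keyed by custom_id:

```python
import json

def reconcile(output_path: str, inputs_by_id: dict) -> dict:
    """Pair each output line with its original input via custom_id.

    Output line order does not match input order, so custom_id is the join key.
    """
    results = {}
    with open(output_path) as f:
        for line in f:
            record = json.loads(line)
            cid = record["custom_id"]
            results[cid] = {"input": inputs_by_id.get(cid), "output": record}
    return results
```

Run the same join against the error file, if one exists, to see which custom_ids failed.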
Next steps
- Manage batch jobs: cancel, list, and download error files.
- Batch processing overview: rate limits, discounted models, best practices, and FAQ.