Process jobs asynchronously with the Batch API.
together
version number is >1.5.13. Run pip install together --upgrade
to upgrade if needed.
.jsonl
file where each line contains the details of an individual request to the API. The available endpoint is /v1/chat/completions
(Chat Completions API). Each request must include a unique custom_id
value, which you can use to reference results after completion. Here’s an example of an input file with 2 requests:
Field | Type | Required | Description |
---|---|---|---|
custom_id | string | Yes | Unique identifier for tracking (max 64 chars) |
body | object | Yes | The request body matching the endpoint’s schema |
.jsonl
file using the Files API with purpose=batch-api
.
Upload files for Batch API
id
and other details:
24h
. For now, the completion window defaults to 24h
and cannot be changed. You can also provide custom metadata.
Create the Batch
Status | Description |
---|---|
VALIDATING | The input file is being validated before the batch can begin |
IN_PROGRESS | Batch is in progress |
COMPLETED | Batch processing completed successfully |
FAILED | Batch processing failed |
EXPIRED | Batch exceeded deadline |
CANCELLED | Batch was cancelled |
output_file_id
field from the Batch object.
Retrieving the batch results
.jsonl
file will have one response line for every successful request line in the input file. Any failed requests will have their error information in a separate error file accessible via error_file_id
.
Note that the output line order may not match the input line order. Use the custom_id
field to map requests to results.
Model ID | Size |
---|---|
deepseek-ai/DeepSeek-R1 | 685B |
deepseek-ai/DeepSeek-V3 | 671B |
meta-llama/Llama-3-70b-chat-hf | 70B |
meta-llama/Llama-3.3-70B-Instruct-Turbo | 70B |
meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 17B |
meta-llama/Llama-4-Scout-17B-16E-Instruct | 17B |
meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 405B |
meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | 70B |
meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 8B |
mistralai/Mistral-7B-Instruct-v0.1 | 7B |
mistralai/Mixtral-8x7B-Instruct-v0.1 | 8x7B |
Qwen/Qwen2.5-72B-Instruct-Turbo | 72B |
Qwen/Qwen2.5-7B-Instruct-Turbo | 7B |
Qwen/Qwen3-235B-A22B-fp8-tput | 235B |
Qwen/QwQ-32B | 32B |
error_file_id
field. Common error codes include:
Error Code | Description | Solution |
---|---|---|
400 | Invalid request format | Check JSONL syntax and required fields |
401 | Authentication failed | Verify API key |
404 | Batch not found | Check batch ID |
429 | Rate limit exceeded | Reduce request frequency |
500 | Server error | Retry with exponential backoff |
EXPIRED
state. Unfinished requests are cancelled, and completed requests are made available via the output file. You will only be charged for tokens consumed from completed requests. Batches are best effort completion within 24 hours.
error_file_id
for partial failurescustom_id
values for easy trackingcustom_id
to match requests with responses.
Q: Can I use the same file for multiple batches?