Batch Inference

Process jobs asynchronously with the Batch API.

Learn how to use the Batch API to send asynchronous groups of requests with 50% lower costs, higher rate limits, and flexible completion windows. The service is ideal for processing jobs that don't require immediate responses.

Overview

The Batch API enables you to process large volumes of requests asynchronously at 50% lower cost compared to real-time API calls. It's perfect for workloads that don't need immediate responses, such as:

  • Running evaluations and data analysis
  • Classifying large datasets
  • Offline summarization
  • Synthetic data generation
  • Content generation for marketing
  • Dataset processing and transformations

Compared to using standard endpoints directly, Batch API offers:

  • Better cost efficiency: 50% cost discount compared to synchronous APIs
  • Higher rate limits: Substantially more headroom with separate rate limit pools
  • Large-scale support: Process thousands of requests per batch
  • Flexible completion: Best-effort completion within 24 hours with progress tracking

Getting started

Note: Make sure your together package version is greater than 1.5.13. Run pip install together --upgrade to upgrade if needed.

1. Prepare your batch file

Batches start with a .jsonl file where each line contains the details of an individual request to the API. The available endpoint is /v1/chat/completions (Chat Completions API). Each request must include a unique custom_id value, which you can use to reference results after completion. Here's an example of an input file with 2 requests:

{"custom_id": "request-1", "body": {"model": "deepseek-ai/DeepSeek-V3", "messages": [{"role": "user", "content": "Hello, world!"}], "max_tokens": 200}}
{"custom_id": "request-2", "body": {"model": "deepseek-ai/DeepSeek-V3", "messages": [{"role": "user", "content": "Explain quantum computing"}], "max_tokens": 200}}

Each line in your batch file must follow this schema:

Field       Type    Required  Description
custom_id   string  Yes       Unique identifier for tracking (max 64 characters)
body        object  Yes       The request body matching the endpoint's schema
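
If you're generating requests programmatically, you can build the input file with a short script. The sketch below assumes a hypothetical list of prompts; it writes one request per line in the format described above:

import json

# Hypothetical prompts to turn into batch requests
prompts = ["Hello, world!", "Explain quantum computing"]

with open("batch_input.jsonl", "w") as f:
    for i, prompt in enumerate(prompts, start=1):
        request = {
            "custom_id": f"request-{i}",  # must be unique (max 64 characters)
            "body": {
                "model": "deepseek-ai/DeepSeek-V3",
                "messages": [{"role": "user", "content": prompt}],
                "max_tokens": 200,
            },
        }
        f.write(json.dumps(request) + "\n")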

2. Upload your batch input file

You must first upload your input file so that you can reference it correctly when creating batches. Upload your .jsonl file using the Files API with purpose=batch-api.

Upload files for Batch API

from together import Together

client = Together()

# Upload the batch input file
file_resp = client.files.upload(file="batch_input.jsonl", purpose="batch-api")

Or, using the CLI:

together files upload batch_input.jsonl

This will return a file object with id and other details:

FileResponse(
  id='file-fa37fdce-89cb-414b-923c-2add62250155',
  object=<ObjectType.File: 'file'>,
  ...
  filename='simpleqa_batch_requests.jsonl',
  bytes=1268723,
  line_count=0,
  processed=True,
  FileType='jsonl')

3. Create the batch

Once you've successfully uploaded your input file, you can use the File object's ID to create a batch. For now, the completion window defaults to 24h and cannot be changed. You can also provide custom metadata.

Create the Batch

file_id = file_resp.id

batch = client.batches.create_batch(file_id, endpoint="/v1/chat/completions")

print(batch.id)

This request will return a Batch object with metadata about your batch:

{
  "id": "batch-xyz789",
  "status": "VALIDATING",
  "endpoint": "/v1/chat/completions",
  "input_file_id": "file-abc123",
  "created_at": "2024-01-15T10:00:00Z",
  "request_count": 0,
  "model_id": null
}

4. Check the status of a batch

You can check the status of a batch at any time, which will return updated batch information.

Check the status of a batch

batch_stat = client.batches.get_batch(batch.id)

print(batch_stat.status)

The status of a given Batch object can be any of the following:

Status       Description
VALIDATING   The input file is being validated before the batch can begin
IN_PROGRESS  The batch is being processed
COMPLETED    Batch processing completed successfully
FAILED       Batch processing failed
EXPIRED      The batch exceeded the 24-hour deadline
CANCELLED    The batch was cancelled
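
If you're waiting on a batch in a script, a minimal polling sketch looks like the following. It assumes the client and batch objects from the previous steps, uses the status values from the table above, and follows the string comparison style shown in step 5; adjust the sleep interval to taste.

import time

# Poll until the batch reaches a terminal state
while True:
    batch_stat = client.batches.get_batch(batch.id)
    print(batch_stat.status)
    if batch_stat.status in ("COMPLETED", "FAILED", "EXPIRED", "CANCELLED"):
        break
    time.sleep(60)  # batches can take hours, so 30-60 seconds between polls is plenty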

5. Retrieve the results

Once the batch is complete, you can download the output by making a request to retrieve the output file using the output_file_id field from the Batch object.

Retrieving the batch results

from together import Together

client = Together()

# Get the batch status to find output_file_id
batch = client.batches.get_batch('batch-xyz789')

if batch.status == 'COMPLETED':
    # Download the output file
    client.files.retrieve_content(id=batch.output_file_id, output="batch_output.jsonl")

The output .jsonl file will have one response line for every successful request line in the input file. Any failed requests will have their error information in a separate error file accessible via error_file_id.

Note that the output line order may not match the input line order. Use the custom_id field to map requests to results.
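
Since ordering isn't guaranteed, a small post-processing sketch like the one below can index responses by custom_id. It assumes each output line is a JSON object that carries the custom_id of the request it answers:

import json

# Index output lines by custom_id so they can be matched back to the input requests
results = {}
with open("batch_output.jsonl") as f:
    for line in f:
        record = json.loads(line)
        results[record["custom_id"]] = record

print(results.get("request-1"))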

6. Get a list of all batches

At any time, you can see all your batches.

Getting a list of all batches

from together import Together

client = Together()

# List all batches
batches = client.batches.list_batches()

for batch in batches:
    print(batch)
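
For example, to see only the batches that are still running, you can filter on the status values from step 4 (this sketch uses the same string comparison style as the earlier examples):

# Print only batches that have not yet reached a terminal state
for b in client.batches.list_batches():
    if b.status in ("VALIDATING", "IN_PROGRESS"):
        print(b.id, b.status)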

Model availability

The following models are supported for batch processing:

Model ID                                           Size
deepseek-ai/DeepSeek-R1                            685B
deepseek-ai/DeepSeek-V3                            671B
meta-llama/Llama-3-70b-chat-hf                     70B
meta-llama/Llama-3.3-70B-Instruct-Turbo            70B
meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8  17B
meta-llama/Llama-4-Scout-17B-16E-Instruct          17B
meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo      405B
meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo       70B
meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo        8B
mistralai/Mistral-7B-Instruct-v0.1                 7B
mistralai/Mixtral-8x7B-Instruct-v0.1               8x7B
Qwen/Qwen2.5-72B-Instruct-Turbo                    72B
Qwen/Qwen2.5-7B-Instruct-Turbo                     7B
Qwen/Qwen3-235B-A22B-fp8-tput                      235B
Qwen/QwQ-32B                                       32B

Rate limits

Batch API rate limits are separate from existing per-model rate limits. The Batch API has specific rate limits:

  • Max Token limits: A maximum of 10M tokens can be enqueued per model
  • Per-batch limits: A single batch may include up to 50,000 requests
  • Batch file size: Maximum 100MB per batch input file
  • Separate pool: Batch API usage doesn't consume tokens from standard rate limits

Error handling

When errors occur during batch processing, they are recorded in a separate error file accessible via the error_file_id field. Common error codes include:

Error Code  Description             Solution
400         Invalid request format  Check JSONL syntax and required fields
401         Authentication failed   Verify your API key
404         Batch not found         Check the batch ID
429         Rate limit exceeded     Reduce request frequency
500         Server error            Retry with exponential backoff

Error File Format:

{"custom_id": "req-1", "error": {"message": "Invalid model specified", "code": "invalid_model"}}
{"custom_id": "req-5", "error": {"message": "Request timeout", "code": "timeout"}}

Batch expiration

Completion within the 24-hour window is best-effort. Batches that do not complete within that window move to an EXPIRED state: unfinished requests are cancelled, and results for completed requests are made available via the output file. You are only charged for tokens consumed by completed requests.

Best practices

Optimal Batch Size

  • Aim for 1,000-10,000 requests per batch for best performance
  • Maximum 50,000 requests per batch
  • Keep file size under 100MB

Error Handling

  • Always check the error_file_id for partial failures
  • Implement retry logic for failed requests
  • Use unique custom_id values for easy tracking

Model Selection

  • Choose models based on your quality/cost requirements
  • Smaller models (7B-17B) for simple tasks
  • Larger models (70B+) for complex reasoning

Request Formatting

  • Validate JSON before submission (a minimal local check is sketched after this list)
  • Use consistent schema across requests
  • Include all required fields
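
As a minimal local check before uploading (it does not replicate the service's own validation), the sketch below parses each line and verifies the fields required by the schema in step 1:

import json

# Verify every line is valid JSON with a unique custom_id and a body
seen_ids = set()
with open("batch_input.jsonl") as f:
    for line_number, line in enumerate(f, start=1):
        record = json.loads(line)  # raises ValueError if the line isn't valid JSON
        assert "custom_id" in record and "body" in record, f"missing field on line {line_number}"
        assert len(record["custom_id"]) <= 64, f"custom_id too long on line {line_number}"
        assert record["custom_id"] not in seen_ids, f"duplicate custom_id on line {line_number}"
        seen_ids.add(record["custom_id"])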

Monitoring

  • Poll the status endpoint every 30-60 seconds (see the polling sketch in step 4)
  • Set up notifications for completion (if available)

FAQ

Q: How long do batches take to complete?
A: Processing time depends on batch size and model complexity. Most batches complete within 1-12 hours, but a batch can take up to 24 hours, or only partially complete within that window, depending on inference capacity.

Q: Can I cancel a running batch?
A: Currently, batches cannot be cancelled once processing begins.

Q: What happens if my batch exceeds the deadline?
A: The batch will be marked as EXPIRED and partial results may be available.

Q: Are results returned in the same order as requests?
A: No, results may be in any order. Use custom_id to match requests with responses.

Q: Can I use the same file for multiple batches?
A: Yes, uploaded files can be reused for multiple batch jobs.