The Queue API provides asynchronous job processing for Dedicated Containers. Submit jobs to a managed queue, and workers automatically claim and process them. This model supports long-running inference, batch workloads, and explicit priority control.
New to Dedicated Containers? Start with the Overview to understand the platform, or jump to the Quickstart to deploy your first container.

Core Concepts

Jobs

A job is a single unit of work submitted to your deployment. Jobs can run for seconds or hours, making them ideal for:
  • Video generation
  • Batch image processing
  • Long-running inference tasks
  • Any workload that doesn’t fit the request-response pattern

Job Lifecycle

Status    | Description
pending   | Job is queued, waiting for a worker to claim it
running   | Job has been claimed and is being processed
done      | Job completed successfully
failed    | Job failed with an error
canceled  | Job was canceled before processing started
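The pending and running states are in flight; done, failed, and canceled are terminal, meaning the job will not change state again. A minimal sketch of a helper that encodes this distinction (the helper name is ours, not part of the SDK):
# Terminal states from the lifecycle table; pending and running jobs are still in flight.
TERMINAL_STATES = {"done", "failed", "canceled"}

def is_terminal(job_status: str) -> bool:
    # True once the job can no longer change state.
    return job_status in TERMINAL_STATES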

Priority

Jobs are processed in strict priority order, then by submission time: priority is an integer, and jobs with higher values are processed first; jobs with the same priority run in the order they were submitted.
from together import Together

client = Together()

# High priority job (processed first)
client.beta.queue.submit(model="my-model", payload={...}, priority=10)

# Normal priority job
client.beta.queue.submit(model="my-model", payload={...}, priority=1)

# Low priority job (processed last)
client.beta.queue.submit(model="my-model", payload={...}, priority=0)
By default, priority is not considered for autoscaling metrics—the autoscaler scales based on total queue depth regardless of priority. Contact [email protected] for advanced scaling policies that account for priority tiers.

Job State with info

The info field provides persistent state that survives across the job lifecycle. You can:
  1. Set initial state when submitting a job via the info parameter
  2. Update state during processing using emit() in your Sprocket worker
  3. Preserve state across retries—info accumulates rather than resets
This is useful for tracking progress, storing metadata, or passing context between retries.
job = client.beta.queue.submit(
    model="my-model",
    payload={"prompt": "A cat playing piano"},
    info={"user_id": "user_123", "tier": "premium"},
)
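Updates emitted by the worker are merged into the same info field, so a later status call returns the accumulated state. A minimal sketch, assuming the job above has started processing (the example values are illustrative):
# Retrieve the job; info now contains the submitted fields plus any values
# the worker has emitted so far (e.g. progress updates).
status = client.beta.queue.retrieve(request_id=job.request_id, model="my-model")
print(status.info)  # e.g. {"user_id": "user_123", "tier": "premium", "progress": 0.4}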
For full endpoint documentation — request parameters, response schemas, and error codes — see the Queue REST API Reference: submit, status, cancel, metrics.

Polling for Job Completion

For jobs that take time to complete, poll the status endpoint until the job reaches a terminal state (done, failed, or canceled).
import time
from together import Together

client = Together()

# Submit job
job = client.beta.queue.submit(
    model="my-deployment", payload={"prompt": "Generate a video of a sunset"}
)

print(f"Submitted job: {job.request_id}")

# Poll for completion
while True:
    status = client.beta.queue.retrieve(
        request_id=job.request_id, model="my-deployment"
    )

    if status.status == "done":
        print(f"Success! Result: {status.outputs}")
        break
    elif status.status == "failed":
        print(f"Failed: {status.error}")
        break
    elif status.status == "canceled":
        print("Job was canceled")
        break
    else:
        # Show progress if available
        if status.info and "progress" in status.info:
            print(f"Progress: {status.info['progress']:.0%}")
        time.sleep(2)  # Poll every 2 seconds
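If a queued result is no longer needed, you can stop polling and cancel the job before it starts processing. A minimal sketch, assuming the Python SDK exposes the REST cancel endpoint as client.beta.queue.cancel with the same request_id and model parameters as retrieve; the exact method name may differ, so check the Queue REST API Reference:
# Cancel a job that is still pending. Jobs that are already running or
# completed cannot be canceled; the API returns a 409 in that case.
client.beta.queue.cancel(request_id=job.request_id, model="my-deployment")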

Best Practices

Use Priority for Tiered Service

Implement different service tiers by assigning priority based on customer type:
def submit_job(user, payload):
    priority = 10 if user.tier == "premium" else 1
    return client.beta.queue.submit(
        model="my-deployment",
        payload=payload,
        priority=priority,
        info={"user_id": user.id, "tier": user.tier},
    )
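For example, with a hypothetical User object carrying the id and tier attributes the function expects (the class here is illustrative, not part of the SDK):
from dataclasses import dataclass

@dataclass
class User:
    # Illustrative stand-in for your own user model.
    id: str
    tier: str

# Premium users are enqueued with priority 10; everyone else with priority 1.
job = submit_job(User(id="user_123", tier="premium"), {"prompt": "A cat playing piano"})
print(job.request_id)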

Track Progress for Long-Running Jobs

For jobs that take more than a few seconds, emit progress updates so clients can show status:
# Sprocket, emit, and FileOutput are provided by the Sprocket worker environment.
class VideoGenerator(Sprocket):
    def predict(self, args: dict) -> dict:
        total_frames = args.get("num_frames", 60)

        for i, frame in enumerate(self.generate_frames(args)):
            emit(
                {
                    "progress": (i + 1) / total_frames,
                    "current_frame": i + 1,
                    "total_frames": total_frames,
                }
            )

        return {"video": FileOutput("output.mp4")}

Handle All Terminal States

Always check for done, failed, and canceled when polling:
terminal_states = {"done", "failed", "canceled"}

# `status` comes from an earlier submit/retrieve call; pass the same
# request_id and model arguments to retrieve() as in the polling example above.
while status.status not in terminal_states:
    time.sleep(2)
    status = client.beta.queue.retrieve(...)

Store Metadata in info

Use info to store job metadata that you’ll need when the job completes:
from datetime import datetime

job = client.beta.queue.submit(
    model="my-deployment",
    payload={"prompt": "..."},
    info={
        "user_id": "user_123",
        "callback_url": "https://myapp.com/webhook",
        "requested_at": datetime.now().isoformat(),
    },
)
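When the job reaches a terminal state, the submitted fields come back on the status response (see the polling example above), so you can route the result without a separate lookup. A minimal sketch, assuming the requests library is installed; the webhook payload shape is illustrative:
import requests

status = client.beta.queue.retrieve(request_id=job.request_id, model="my-deployment")
if status.status == "done":
    # Notify the stored callback URL with the metadata and the job result.
    requests.post(
        status.info["callback_url"],
        json={"user_id": status.info["user_id"], "outputs": status.outputs},
    )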

Error Codes

Code | Description
400  | Invalid request (missing required fields, malformed payload)
401  | Unauthorized (invalid or missing API key)
404  | Job or deployment not found
409  | Cannot cancel job (already running or completed)
500  | Internal server error