What You’ll Learn
- Deploying a custom model with Sprocket and Jig
- Returning base64-encoded images from your worker
- Submitting jobs via the Queue API and polling for results
- Configuring autoscaling for production workloads
Prerequisites
- Together API Key – Get one from together.ai
- Dedicated Containers access – Contact [email protected] to enable for your organization
- Docker – For building container images. Install Docker
- Together CLI – Install with `pip install together --upgrade` or `uv tool install together`
Overview
This example deploys a Flux2 text-to-image model as a Dedicated Container. The Sprocket worker handles job processing, and Together manages GPU provisioning, autoscaling, and observability.
What gets deployed:
- A Sprocket worker running on an H100 GPU
- Queue-based job processing for async image generation
- Automatic scaling based on queue depth
How It Works
- Build – Jig builds a Docker image from your pyproject.toml configuration
- Push – The image is pushed to Together’s private container registry
- Deploy – Together provisions an H100 GPU and starts your container
- Queue – Jobs are submitted to the managed queue and processed by your Sprocket worker
- Scale – The autoscaler adjusts replicas based on queue depth
Project Structure
Implementation
Sprocket Worker Code
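A minimal sketch of the job handler is shown below. It assumes a plain dict-in/dict-out handler; the Sprocket registration and entrypoint details, the MODEL_ID environment variable, and the model checkpoint id are assumptions rather than the SDK's actual interface, so see the Sprocket SDK reference for the real wiring.

```python
import base64
import io
import os

import torch
from diffusers import DiffusionPipeline

# The checkpoint id is an assumption -- substitute the Flux2 model you actually deploy.
MODEL_ID = os.environ.get("MODEL_ID", "black-forest-labs/FLUX.1-dev")

# Load the pipeline once at startup so every job reuses the warm model on the GPU.
pipe = DiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)
pipe.to("cuda")


def handle_job(payload: dict) -> dict:
    """Generate one image from a queue job payload and return it as base64 PNG."""
    prompt = payload.get("prompt", "a cat")
    steps = int(payload.get("num_inference_steps", 28))
    guidance = float(payload.get("guidance_scale", 4.0))

    image = pipe(
        prompt=prompt,
        num_inference_steps=steps,
        guidance_scale=guidance,
    ).images[0]

    # Encode the PIL image as PNG, then base64, so the result is JSON-serializable.
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return {
        "image": base64.b64encode(buf.getvalue()).decode("utf-8"),
        "format": "png",
        "encoding": "base64",
    }
```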
Configuration
Key Concepts
Base64 Image Encoding
Images are returned as base64-encoded strings for JSON compatibility.
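A minimal sketch of the encoding step, using only the standard library on a PIL image (the function name is illustrative):

```python
import base64
import io


def image_to_base64_png(image) -> str:
    """Encode a PIL image as a base64 PNG string for inclusion in a JSON response."""
    buf = io.BytesIO()
    image.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode("utf-8")
```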
Generation Parameters
Flux2 supports several parameters to control generation:

| Parameter | Default | Description |
|---|---|---|
| prompt | "a cat" | Text description of the image |
| num_inference_steps | 28 | Denoising steps (more = better quality, slower) |
| guidance_scale | 4.0 | How closely to follow the prompt (higher = more literal) |
Using the Deployment Name from Environment
The deployment name is read from the TOGETHER_DEPLOYMENT_NAME environment variable, with a fallback default.
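A sketch of that lookup (the fallback value here is a placeholder, not a required name):

```python
import os

# Fallback name is a placeholder; replace it with your actual deployment name.
DEPLOYMENT_NAME = os.environ.get("TOGETHER_DEPLOYMENT_NAME", "flux2-image-gen")
```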
Deployment
Deploy
Check Deployment Status
Wait until the deployment status is running and the replicas are ready before submitting jobs.
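If you prefer to check programmatically, a rough REST-style sketch follows; the endpoint path and response fields are assumptions, so consult the Deployments API Reference for the actual schema.

```python
import os

import requests

API_KEY = os.environ["TOGETHER_API_KEY"]
DEPLOYMENT_NAME = os.environ.get("TOGETHER_DEPLOYMENT_NAME", "flux2-image-gen")

# The endpoint path and response fields below are assumptions, not the documented API.
resp = requests.get(
    f"https://api.together.xyz/v1/deployments/{DEPLOYMENT_NAME}",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
info = resp.json()
print(info.get("status"), info.get("replicas"))
```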
Submit Jobs
Jobs are submitted to the managed queue and processed asynchronously. You’ll need to poll for the result.
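A rough sketch of the submit-and-poll loop is shown below. The queue endpoint paths, request body shape, and status values are assumptions; consult the Deployments API Reference for the actual Queue API.

```python
import os
import time

import requests

API_KEY = os.environ["TOGETHER_API_KEY"]
DEPLOYMENT_NAME = os.environ.get("TOGETHER_DEPLOYMENT_NAME", "flux2-image-gen")
BASE_URL = "https://api.together.xyz/v1"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Endpoint paths, body shape, and status values below are assumptions.
submit = requests.post(
    f"{BASE_URL}/deployments/{DEPLOYMENT_NAME}/jobs",
    headers=HEADERS,
    json={
        "input": {
            "prompt": "a watercolor painting of a lighthouse at dusk",
            "num_inference_steps": 28,
            "guidance_scale": 4.0,
        }
    },
    timeout=30,
)
submit.raise_for_status()
job_id = submit.json()["id"]

# Poll until the job reaches a terminal state.
while True:
    job = requests.get(
        f"{BASE_URL}/deployments/{DEPLOYMENT_NAME}/jobs/{job_id}",
        headers=HEADERS,
        timeout=30,
    ).json()
    if job.get("status") in ("completed", "failed"):
        break
    time.sleep(2)

result = job["output"]  # dict with image / format / encoding fields
```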
Input Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | string | "a cat" | Text description of the image to generate |
| num_inference_steps | int | 28 | Number of denoising steps |
| guidance_scale | float | 4.0 | Classifier-free guidance scale |
Output
- image: Base64-encoded PNG image data
- format: Image format (always "png")
- encoding: Encoding type (always "base64")
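To turn a response back into an image file, decode the base64 payload; this sketch assumes `result` is the output dict described above.

```python
import base64


def save_output_image(result: dict, path: str = "output.png") -> None:
    """Write the base64-encoded PNG from a job result dict to disk."""
    with open(path, "wb") as f:
        f.write(base64.b64decode(result["image"]))
```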
Batch Processing and Autoscaling
The configuration above can be updated to include autoscaling by increasing the max_replicas parameter. When the queue backlog grows, more replicas are added automatically; when workers are idle, replicas are removed (down to min_replicas).
To scale more aggressively for high-throughput workloads, raise max_replicas. Setting min_replicas = 0 saves costs when the queue is idle but adds cold-start latency. A sketch of both settings follows.
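The exact pyproject.toml table and key names used by Jig are assumptions here; check the Jig CLI Reference for the real schema.

```toml
# Sketch only: the table name and keys are assumptions, not Jig's documented schema.
[tool.jig.autoscaling]
min_replicas = 0   # scale to zero when the queue is idle (cold start on the next job)
max_replicas = 8   # allow more replicas as queue backlog grows
```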
Cleanup
When you’re done, delete the deployment.
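If you prefer doing this programmatically, a rough REST-style sketch follows; the endpoint path is an assumption, so see the Deployments API Reference for the actual call.

```python
import os

import requests

API_KEY = os.environ["TOGETHER_API_KEY"]
DEPLOYMENT_NAME = os.environ.get("TOGETHER_DEPLOYMENT_NAME", "flux2-image-gen")

# The endpoint path is an assumption, not the documented API.
requests.delete(
    f"https://api.together.xyz/v1/deployments/{DEPLOYMENT_NAME}",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
).raise_for_status()
```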
Next Steps
- Video Generation Example – Multi-GPU inference with torchrun
- Quickstart – Deploy your first container in 20 minutes
- Sprocket SDK – Full SDK reference for workers
- Jig CLI Reference – CLI commands and configuration options
- Deployments API Reference – REST API for deployments, secrets, storage, and queues