Prerequisites
- Together API Key – Required for all operations. Get one from together.ai.
- Dedicated Containers access – Contact your account representative or [email protected] to enable Dedicated Containers for your organization.
- Docker – For building and pushing container images. Install it from docker.com.
- uv (optional) – For Python/package management. Install from astral-sh/uv.
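Once you have an API key, export it as an environment variable. `TOGETHER_API_KEY` is the variable the Together SDK and CLI read by convention; a minimal sketch:

```shell
# Export your Together API key; the SDK/CLI pick it up from this variable.
export TOGETHER_API_KEY="your-key-here"  # replace with your real key
```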
Step 1: Install the Together CLI
Step 2: Clone the Sprocket Examples
The example in sprocket/examples/hello_world is a minimal Sprocket that returns a greeting.
Step 3: Build and Deploy
Deployments are configured with a pyproject.toml file.
The deployment name, set by the configuration, must be globally unique. The example worker uses this pyproject.toml configuration:
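The field names beyond the standard [project] table vary by tool, so the sketch below is a hypothetical illustration, not the actual schema used by the example:

```toml
# Hypothetical sketch -- only the [project] table is standard; treat any
# other keys the example repo uses as authoritative over this sketch.
[project]
name = "hello-world-yourname"  # deployment name; must be globally unique
version = "0.1.0"
requires-python = ">=3.10"
```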
Change the project name in pyproject.toml and use this name for the rest of the tutorial. Deploying does the following:

- Builds the Docker image from the example
- Pushes it to Together’s private registry
- Creates a deployment on Together’s GPU infrastructure
Step 4: Watch Deployment Status
Watch until the deployment status is running and all replicas are ready. Press Ctrl+C to stop watching. Note that watch is not installed by default on macOS; install it with brew install watch or your package manager of choice.
You can also view the status of your deployments from the Together AI web console.
Step 5: Test the Health Endpoint
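Your deployment's actual URL comes from the console or CLI output. As a sketch of what a health check does, the snippet below probes a /health endpoint; the endpoint path and JSON response shape are assumptions, and a local stand-in server is included so the sketch runs without a real deployment:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

def check_health(base_url: str, timeout: float = 5.0) -> bool:
    """GET {base_url}/health and report whether it returns HTTP 200."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return resp.status == 200

# Local stand-in for a deployment's health endpoint (assumption: a real
# deployment serves a similar /health route).
class _Health(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"status": "ok"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

server = HTTPServer(("127.0.0.1", 0), _Health)
threading.Thread(target=server.serve_forever, daemon=True).start()
base_url = f"http://127.0.0.1:{server.server_address[1]}"

print(check_health(base_url))  # → True
```

Against a real deployment you would point `check_health` (or plain curl) at the deployment URL instead of the local stand-in.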
Step 6: Submit a Job
Note the request_id in the response; you'll need it for the next step.
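Steps 6 and 7 together form a submit-then-poll pattern. The sketch below illustrates that pattern only: the paths, the `status` field and its values, and the in-memory stand-in transport are all assumptions (only `request_id` comes from this tutorial), so consult the API reference for the real schema:

```python
import time
import uuid

def submit_job(post, payload: dict) -> str:
    """Submit a job via the injected `post` callable; return its request_id."""
    resp = post("/jobs", payload)
    return resp["request_id"]

def wait_for_result(get, request_id: str,
                    poll_interval: float = 1.0, timeout: float = 60.0) -> dict:
    """Poll until the job reaches a terminal state, then return the response."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        resp = get(f"/jobs/{request_id}")
        if resp["status"] in ("completed", "failed"):
            return resp
        time.sleep(poll_interval)
    raise TimeoutError(f"job {request_id} did not finish within {timeout}s")

# --- In-memory stand-in for the queue API so the sketch runs anywhere ---
jobs = {}

def fake_post(path, payload):
    rid = str(uuid.uuid4())  # real IDs are UUIDv7; uuid4 is just a stand-in
    jobs[rid] = {"request_id": rid, "status": "completed",
                 "output": f"Hello, {payload['name']}!"}
    return {"request_id": rid}

def fake_get(path):
    return jobs[path.rsplit("/", 1)[1]]

rid = submit_job(fake_post, {"name": "world"})
result = wait_for_result(fake_get, rid, poll_interval=0.01)
print(result["output"])  # → Hello, world!
```

With a real deployment, `post` and `get` would be thin wrappers around authenticated HTTP calls to your deployment's queue endpoints; injecting them keeps the polling logic testable.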
Step 7: Get the Job Result
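Request IDs are UUIDv7 strings; Python's standard library can confirm the version bits of the example ID from this tutorial:

```python
import uuid

# Example request ID from the tutorial; the UUID version nibble is the
# first hex digit of the third group ("71e4" -> version 7).
rid = uuid.UUID("019ba379-92da-71e4-ac40-d98059fd67c7")
print(rid.version)  # → 7
```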
Real request IDs use UUIDv7 format (e.g., 019ba379-92da-71e4-ac40-d98059fd67c7). Replace req_abc123 with your actual request ID from the submit response.

Step 8: View Logs
Stream logs from your deployment:

Step 9: Clean Up
When you’re done, delete the deployment:

Next Steps
Now that you’ve deployed your first container, explore the full platform:

- Dedicated Containers Overview – Architecture and concepts
- Jig CLI – Build, push, deploy, secrets, and volumes
- Sprocket SDK – Build queue-integrated inference workers
- API Reference – REST API for deployments, secrets, and queues
Example Guides
- Image Generation with Flux2 – Single-GPU inference with 4-bit quantization
- Video Generation with Wan 2.1 – Multi-GPU inference with torchrun