## sprocket.Sprocket
Base class for inference workers.
| Method | Signature | Description |
|---|---|---|
| setup | setup(self) -> None | Called once at startup. Load models and resources. |
| predict | predict(self, args: dict) -> dict | Called for each job. Process input and return output. |
| shutdown | shutdown(self) -> None | Called on graceful shutdown. Clean up resources. Optional. |
| Attribute | Type | Default | Description |
|---|---|---|---|
| processor | Type[InputOutputProcessor] | InputOutputProcessor | Custom I/O processor class |
| warmup_inputs | list[dict] | [] | Inputs to run during cache warmup |
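A minimal worker sketch, assuming the interface above. EchoWorker and its trivial "model" are hypothetical; the `sprocket` import is stubbed so the sketch also runs where the package is not installed:

```python
try:
    from sprocket import Sprocket
except ImportError:  # stand-in base class so the sketch runs without the package
    class Sprocket:
        pass

class EchoWorker(Sprocket):
    # Inputs run once during cache warmup.
    warmup_inputs = [{"text": "warmup"}]

    def setup(self) -> None:
        # Called once at startup: load models and resources here.
        self.prefix = "echo: "

    def predict(self, args: dict) -> dict:
        # Called for each job: process the input dict, return an output dict.
        return {"text": self.prefix + args["text"]}

    def shutdown(self) -> None:
        # Optional: release resources on graceful shutdown.
        pass
```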
## sprocket.run
Entry point for starting a Sprocket worker.
| Parameter | Type | Description |
|---|---|---|
| sprocket | Sprocket | Your Sprocket instance |
| name | str | Deployment name (used for queue routing) |
| use_torchrun | bool | Enable multi-GPU mode via torchrun. Default: False |
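A typical entry point might look like the following sketch. MyWorker is hypothetical, and a stub stands in for `sprocket` where the package is unavailable; the `__main__` guard keeps the module importable without starting a worker:

```python
try:
    import sprocket
except ImportError:  # stand-in so the sketch runs without the package
    import types
    sprocket = types.SimpleNamespace(
        Sprocket=object,
        run=lambda worker, name, use_torchrun=False: (name, use_torchrun),
    )

class MyWorker(sprocket.Sprocket):
    def predict(self, args: dict) -> dict:
        return args

if __name__ == "__main__":
    # name selects the queue this deployment consumes from;
    # use_torchrun=True would enable multi-GPU mode via torchrun.
    sprocket.run(MyWorker(), name="my-deployment", use_torchrun=False)
```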
## sprocket.FileOutput
Wraps a local file path for automatic upload after predict() returns. Extends pathlib.PosixPath.
FileOutput is replaced with the public URL in the final job result.
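A sketch of returning a FileOutput from predict(). The filename and directory are arbitrary; a minimal PosixPath subclass stands in for FileOutput where the package is not installed:

```python
import pathlib
import tempfile

try:
    from sprocket import FileOutput
except ImportError:  # stand-in: FileOutput extends pathlib.PosixPath
    class FileOutput(pathlib.PosixPath):
        pass

def predict(args: dict) -> dict:
    # Write the result to a local file. Wrapping the path in FileOutput
    # marks it for automatic upload after predict() returns; the final
    # job result then contains the public URL instead of the local path.
    out = pathlib.Path(tempfile.mkdtemp()) / "result.txt"
    out.write_text(args["text"])
    return {"file": FileOutput(out)}
```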
## sprocket.emit_info
Report progress updates from inside predict(). Emitted data is available to clients via the info field on the job status endpoint.
| Parameter | Type | Description |
|---|---|---|
| info | dict | Progress data to emit. Must serialize to under 4096 bytes of JSON. |
When use_torchrun=True, call emit_info() only from rank 0 to avoid duplicate updates.
## sprocket.InputOutputProcessor
Override for custom file download/upload behavior. Attach to your Sprocket via the processor class attribute.
| Method | Signature | Description |
|---|---|---|
| process_input_file | process_input_file(self, resp: httpx.Response, dst: pathlib.Path) -> None | Called after downloading each input file. Write resp.content to dst. |
| finalize | async finalize(self, request_id: str, inputs: dict, outputs: dict) -> dict | Called after predict(), before FileOutput upload. Return modified outputs. |
Default behavior: process_input_file writes resp.content to dst; finalize returns outputs unchanged.
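A sketch of a custom processor matching the signatures above. The checksum side file and the request_id stamping are hypothetical examples of custom behavior; a stub base class stands in for the package where it is not installed:

```python
import hashlib
import pathlib

try:
    from sprocket import InputOutputProcessor
except ImportError:  # stand-in so the sketch runs without the package
    class InputOutputProcessor:
        pass

class ChecksummingProcessor(InputOutputProcessor):
    def process_input_file(self, resp, dst: pathlib.Path) -> None:
        # Mirror the default behavior (write resp.content to dst),
        # then additionally record a checksum alongside the file.
        dst.write_bytes(resp.content)
        checksum = hashlib.sha256(resp.content).hexdigest()
        pathlib.Path(str(dst) + ".sha256").write_text(checksum)

    async def finalize(self, request_id: str, inputs: dict, outputs: dict) -> dict:
        # Called after predict(), before FileOutput upload.
        outputs["request_id"] = request_id
        return outputs
```

Attach it to a worker via the processor class attribute, e.g. `processor = ChecksummingProcessor` on your Sprocket subclass.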
## HTTP Endpoints
| Endpoint | Method | Response |
|---|---|---|
| /health | GET | 200 {"status": "healthy"} or 503 {"status": "unhealthy"} |
| /metrics | GET | requests_inflight 0.0 or 1.0 (Prometheus format) |
| /generate | POST | Direct HTTP inference (non-queue mode) |
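A readiness check against a locally running worker might look like this sketch (the base URL assumes the default port 8000 from the CLI table below):

```python
import json
import urllib.error
import urllib.request

def is_healthy(base_url: str = "http://localhost:8000") -> bool:
    # /health returns 200 {"status": "healthy"} when ready,
    # 503 {"status": "unhealthy"} otherwise.
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
            return json.load(resp).get("status") == "healthy"
    except OSError:  # covers URLError, HTTPError (503), and connection errors
        return False
```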
## CLI Arguments
| Argument | Default | Description |
|---|---|---|
| --queue | false | Enable queue worker mode |
| --port | 8000 | HTTP server port |
## Environment Variables
| Variable | Default | Description |
|---|---|---|
| TOGETHER_API_KEY | Required | API key for queue authentication |
| TOGETHER_API_BASE_URL | https://api.together.ai | API base URL |
| TERMINATION_GRACE_PERIOD_SECONDS | 300 | Maximum time allowed for graceful shutdown; also used as the prediction timeout |
| WORLD_SIZE | 1 | Number of GPU processes (set automatically by torchrun) |
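For illustration, reading this configuration from the environment with the defaults from the table might look like the following sketch (the load_settings helper is hypothetical, not part of the library):

```python
import os

def load_settings() -> dict:
    # Defaults mirror the environment-variable table above.
    return {
        "api_key": os.environ["TOGETHER_API_KEY"],  # required, no default
        "base_url": os.environ.get("TOGETHER_API_BASE_URL", "https://api.together.ai"),
        "grace_seconds": int(os.environ.get("TERMINATION_GRACE_PERIOD_SECONDS", "300")),
        "world_size": int(os.environ.get("WORLD_SIZE", "1")),  # set by torchrun
    }
```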