Prerequisites
- A completed fine-tuning job. See the quickstart for the full lifecycle.
- The job’s
x_model_output_name(visible once status iscompleted). It follows the pattern<your_account>/<base_model>:<suffix>:<job_id>.
Deploy on a dedicated endpoint
- CLI / SDK
- UI
If endpoint creation fails immediately with “There was an issue starting your endpoint”, the cause is almost always an incompatible base model. Verify with
client.endpoints.list_hardware(model=...); a 404 means the base (often a -Reference model) can’t host a fine-tune. Pick a different base before retrying.Run locally
To run your model outside Together, download the checkpoint by job ID:Choose a checkpoint type
Thecheckpoint parameter selects what to download. It’s required for the v2 SDK’s content() method and the GET /v1/finetune/download endpoint, unless you pass checkpoint_step, which downloads a specific intermediate step and overrides checkpoint. Valid values depend on how the job was trained.
| Job type | Valid checkpoint values | What you get |
|---|---|---|
| LoRA fine-tune | merged, adapter, or model_output_path | merged combines the base model and adapter into self-contained weights, the usual choice for running the model locally or uploading it elsewhere. adapter returns only the LoRA adapter weights, so you can load them on top of the base model yourself (for example, with PEFT or vLLM). |
| Full fine-tune | model_output_path only | The full set of trained model weights. merged and adapter return an error for full fine-tunes. |
model_output_path returns the raw training output directory before any merging. It works for both job types but is mainly useful for advanced workflows that need the unmodified artifacts: for LoRA jobs, prefer merged or adapter; for full fine-tunes, it’s the only option.
The v1 SDK’s client.fine_tuning.download() selects the checkpoint automatically (merged for LoRA jobs, model_output_path for full fine-tunes), so you don’t pass a checkpoint argument there.
The output is a .tar.zst archive that uses ZStandard compression. On macOS, install zstd with Homebrew and decompress:
Python
--checkpoint-step <STEP_NUMBER> to tg fine-tuning download (or checkpoint_step=<STEP_NUMBER> to client.fine_tuning.content()). List checkpoints with tg fine-tuning list-checkpoints <JOB_ID>.
Troubleshooting
x_model_output_nameis empty: The job hasn’t reachedcompleted. Poll status withclient.fine_tuning.retrieve(id=...)until it’s done. See Monitor a fine-tuning job for the polling pattern.- Endpoint creation fails immediately: Run
client.endpoints.list_hardware(model=<base_model>). A 404 means the base can’t host a fine-tune.-Referencemodels fall into this bucket. - 404 on inference: Use
endpoint.nameas themodelparameter, not the raw output model name. The endpoint name includes a unique suffix that routes traffic to your deployment.
Next steps
Upload a custom model
Upload your own model weights from outside the Together catalog.
Upload a LoRA adapter
Load a LoRA adapter onto a shared base instead of deploying a full model.
Manage endpoints
Inspect, start, stop, update, and delete dedicated endpoints.
Endpoint settings
Tune autoscaling, decoding, and auto-shutdown.