When to BYOM
Use the BYOM flow when:
- You want to start from a community variant. A specialized model on Hugging Face (medical, legal, code) sometimes makes a better starting point than a generic base.
- You’re continuing your own previous work. Upload your last checkpoint to Hugging Face and resume training on Together.
- A new model isn’t in the catalog yet. As long as it has a supported architecture under 100B parameters, you can fine-tune it.
Prerequisites
Your model must meet these constraints:
- Architecture: CausalLM only (text generation).
- Size: Under 100 billion parameters.
- Weights: `.safetensors` format.
- No custom code: `trust_remote_code=True` is not allowed.
- Access: The Hugging Face repo is public, or you have an API token with read access.
- Framework compatibility: Transformers v4.55 or earlier.
Also confirm that `max_seq_length` is no larger than your checkpoint supports.
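Some of the constraints above can be checked locally before you launch a job. The sketch below inspects a parsed `config.json` and is only a heuristic: it assumes the repo follows the usual Hugging Face conventions (`architectures`, `auto_map`, and `transformers_version` keys), and `check_byom_config` is a hypothetical helper, not part of any SDK. The parameter count is passed in by the caller because `config.json` does not state it directly.

```python
from typing import Any

MAX_PARAMS = 100_000_000_000   # "under 100 billion parameters"
MAX_TRANSFORMERS = (4, 55)     # "Transformers v4.55 or earlier"


def check_byom_config(config: dict[str, Any], num_params: int) -> list[str]:
    """Return the problems found; an empty list means none were detected."""
    problems = []

    # Assumption: CausalLM classes are named "<Family>ForCausalLM".
    architectures = config.get("architectures") or []
    if not any(name.endswith("ForCausalLM") for name in architectures):
        problems.append("architecture is not a CausalLM class")

    if num_params >= MAX_PARAMS:
        problems.append("model has 100B parameters or more")

    # `auto_map` points Auto classes at Python files in the repo, which
    # only load with trust_remote_code=True.
    if "auto_map" in config:
        problems.append("repo relies on custom code (auto_map present)")

    # Heuristic: the version recorded when the config was saved.
    version = config.get("transformers_version")
    if version:
        major, minor = (int(part) for part in version.split(".")[:2])
        if (major, minor) > MAX_TRANSFORMERS:
            problems.append("saved with a Transformers release newer than v4.55")

    return problems
```

An empty result does not guarantee the job will pass validation; it only rules out the most common mismatches early.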
Launch the job
Launch the job by pairing the base model (template) with `from_hf_model` (your checkpoint):
| Parameter | Purpose |
|---|---|
| `model` | A base model from Together’s catalog. Its config provides the training template and inference setup. |
| `from_hf_model` | The Hugging Face repo with your custom weights. |
| `hf_api_token` | Only needed for private repos. Omit for public ones. Passing a dummy value can cause a 400 error. |
| `hf_model_revision` | Optional. Pin to a specific commit hash instead of main. |
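As a minimal sketch, the parameters in the table can be assembled into a single set of job arguments. `build_byom_job_args` is a hypothetical helper, and the `training_file` argument and its placeholder ID are assumptions that do not come from this page. The helper leaves `hf_api_token` out entirely for public repos, so no dummy value is ever sent.

```python
from typing import Any, Optional


def build_byom_job_args(
    training_file: str,
    base_model: str,
    hf_repo: str,
    hf_api_token: Optional[str] = None,
    hf_model_revision: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble job arguments, dropping optional fields that are unset."""
    args: dict[str, Any] = {
        "training_file": training_file,  # assumed parameter name
        "model": base_model,             # base template from the catalog
        "from_hf_model": hf_repo,        # repo holding the custom weights
    }
    if hf_api_token:                     # private repos only
        args["hf_api_token"] = hf_api_token
    if hf_model_revision:                # optional commit pin
        args["hf_model_revision"] = hf_model_revision
    return args


args = build_byom_job_args(
    training_file="file-xxxx",  # hypothetical placeholder ID
    base_model="togethercomputer/llama-2-7b-chat",
    hf_repo="HuggingFaceTB/SmolLM2-135M-Instruct",
)

# Assumption: with the Together Python SDK installed and an API key set,
# the job would be launched along these lines:
#   from together import Together
#   client = Together()
#   job = client.fine_tuning.create(**args)
```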
Pick the base template
Match these three variables to pick the base template:
- Architecture: Must match (treat Code Llama as Llama, etc.).
- Size: As close to your custom checkpoint as the catalog allows. If every option is larger, pick the smallest.
- Max sequence length: The base’s max must be at least as large as your checkpoint’s; ideally not much larger.
For example, `HuggingFaceTB/SmolLM2-135M-Instruct` has a Llama architecture, 135M parameters, and an 8k context. The closest Llama in the catalog by parameter count is `meta-llama/Llama-3.2-1B-Instruct`, but its max context is 131k, far more than the checkpoint supports. A better choice is `togethercomputer/llama-2-7b-chat`: it is larger than your checkpoint, but its max sequence length fits.
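One way to apply the three rules is sketched below. It treats architecture and the minimum sequence length as hard filters, then prefers the smallest gap in context length and breaks ties by parameter count. That ordering is an assumption drawn from the example above, not a documented algorithm. The catalog entries and their numbers are hypothetical; look up real values in the catalog.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelSpec:
    name: str
    architecture: str
    params_b: float    # parameters, in billions
    max_seq_len: int   # tokens


def pick_base_template(
    checkpoint: ModelSpec, catalog: list[ModelSpec]
) -> Optional[ModelSpec]:
    """Return the best-matching base, or None if nothing is eligible."""
    eligible = [
        m for m in catalog
        if m.architecture == checkpoint.architecture
        and m.max_seq_len >= checkpoint.max_seq_len
    ]
    if not eligible:
        return None
    # Prefer a context length close to the checkpoint's, then a close size.
    return min(
        eligible,
        key=lambda m: (
            m.max_seq_len - checkpoint.max_seq_len,
            abs(m.params_b - checkpoint.params_b),
        ),
    )


# Hypothetical names and numbers, for illustration only.
checkpoint = ModelSpec("my-org/small-llama", "llama", 0.135, 8_192)
catalog = [
    ModelSpec("example/llama-1b-long", "llama", 1.0, 131_072),
    ModelSpec("example/llama-7b", "llama", 7.0, 8_192),
    ModelSpec("example/other-0.5b", "qwen", 0.5, 8_192),
]
best = pick_base_template(checkpoint, catalog)
```

Here `best` is the 7B entry: the 1B entry is closer in size, but its context is far longer than the checkpoint supports, and the third entry has a different architecture.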
Watch and deploy
BYOM jobs use the same lifecycle as catalog jobs:
- Poll the job with the SDK or CLI.
- Deploy the result on a dedicated endpoint. Your fine-tuned model appears under My Models in the dashboard once training completes.
If `client.endpoints.list_hardware(model=<base>)` returns 404, the base can’t be deployed; pick a different one before training.
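Polling can be sketched as a small loop that is independent of any particular client. `wait_for_job` is a hypothetical helper: `get_status` stands in for whatever SDK or CLI call returns the job status as a string, and the terminal status names are assumptions to check against the statuses your jobs actually report.

```python
import time
from typing import Callable

# Assumed terminal status names; adjust to what the platform returns.
TERMINAL_STATUSES = {"completed", "error", "cancelled"}


def wait_for_job(
    get_status: Callable[[], str],
    interval_s: float = 30.0,
    timeout_s: float = 6 * 3600,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status until it returns a terminal status, then return it."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_s:
            raise TimeoutError(f"job still '{status}' after {waited:.0f}s")
        sleep(interval_s)
        waited += interval_s
```

Passing `sleep` as a parameter keeps the loop easy to test; in normal use the default `time.sleep` applies.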
Troubleshooting
- Training failed with CUDA OOM: Reduce `batch_size` or use a smaller base template.
- Training failed with a checkpoint validation error: The architecture doesn’t match the base template or a parameter is out of range. Confirm the checkpoint is CausalLM and verify its `config.json` against the base.
- Training failed with a runtime error: Likely a corrupted or incomplete checkpoint. Re-upload to Hugging Face.
- Model uses `trust_remote_code`: Not supported. Use a similar model that doesn’t, or contact support to add it to the catalog.
- Internal errors: The platform notifies our team automatically. If the issue persists, contact support with the job ID.
FAQ
Can I fine-tune a LoRA adapter?
Yes. The platform merges the adapter with the base during training, producing a full checkpoint rather than a separate adapter.
Can I train a model I uploaded for dedicated inference?
No. Models uploaded with custom-models are not visible to the fine-tuning API. Upload to Hugging Face instead and reference the repo as `from_hf_model`.
Will my fine-tuned model work for inference?
Yes, when the base you specified is supported, the architecture matches, and training completes successfully. Models built on unsupported architectures may not run reliably; contact support if you need that.