If you select your model in the models dashboard, you can click CREATE DEDICATED ENDPOINT to create a dedicated endpoint for the fine-tuned model. You can also create a dedicated endpoint using the CLI. First, list your recent fine-tuning jobs to get the model output name:
together fine-tuning list
Then use the “Model Output Name” from the list to create your endpoint:
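A minimal sketch of that step, assuming your version of the together CLI exposes an endpoints create subcommand (the model name below is a placeholder for your own Model Output Name; check together endpoints create --help for the exact flags your version supports):

together endpoints create \
  --model "[email protected]/Meta-Llama-3-8B-2024-07-11-22-57-17"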
Once it’s deployed, you can use the ID to query your new model using any of our APIs:
together chat.completions \
  --model "[email protected]/Meta-Llama-3-8B-2024-07-11-22-57-17" \
  --message "user" "What are some fun things to do in New York?"
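The same request through the Python SDK looks like this; a sketch assuming the together Python package is installed and TOGETHER_API_KEY is set in your environment (the model name is the placeholder from above):

from together import Together

# The client reads TOGETHER_API_KEY from the environment
client = Together()

response = client.chat.completions.create(
    model="[email protected]/Meta-Llama-3-8B-2024-07-11-22-57-17",
    messages=[{"role": "user", "content": "What are some fun things to do in New York?"}],
)
print(response.choices[0].message.content)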
Hosting your fine-tuned model is charged per minute hosted. You can see the hourly pricing for fine-tuned model inference in the pricing table. When you’re not using the model, be sure to stop the endpoint from the models dashboard. Read more about dedicated inference here.
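If you prefer the command line, a hedged sketch, assuming your CLI version also exposes a stop subcommand under endpoints (the endpoint ID below is a hypothetical placeholder):

together endpoints stop <endpoint-id>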
Your model will be downloaded to the location specified in output as a tar.zst file, which is an archive file format that uses the Zstandard algorithm. You’ll need to install Zstandard to decompress your model. On Macs, you can use Homebrew:
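brew install zstd

Once Zstandard is installed, you can decompress and unpack the archive; the filename below is a placeholder for whatever your download produced:

zstd -d your-model.tar.zst
tar -xf your-model.tar

This leaves the model weights and tokenizer files in a local directory.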
These can be used with various libraries and languages to run your model locally. Transformers is a popular Python library for working with pretrained models, and using it with your new model looks like this:
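A minimal sketch, assuming transformers and torch are installed and that the archive extracted into a local directory (the path below is a placeholder):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: the directory unpacked from your tar.zst archive
model_path = "./Meta-Llama-3-8B-2024-07-11-22-57-17"

# Load the fine-tuned weights and tokenizer from disk
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)

# Generate a short completion from a prompt
inputs = tokenizer("Space Robots are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Running this should print a completion along the lines of: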
Space Robots are a great way to get your kids interested in science. After all, they are the future!
If you see the output, your new model is working! You now have a custom fine-tuned model that you can run completely locally, either on your own machine or on networked hardware of your choice.