Python library
Reference this guide to start fine-tuning a model using the Python library.
Quick Links
- Install the Library
- Prepare your Data
- Check and Upload your Data
- Start Fine-Tuning
- Monitor Progress
- Using a Downloaded Model
- Deploy your Fine-Tuned Model
- Colab Notebook Finetuning Project Tutorial
Install the Library
To get started, install the together Python library:
pip install --upgrade together
Then, configure your API key by setting the TOGETHER_API_KEY environment variable:
export TOGETHER_API_KEY=xxxxx
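Because every example in this guide reads the key from the environment, a missing variable is an easy first failure to hit. The following is a minimal sketch of a pre-flight check that uses only the standard library. The helper name require_api_key is hypothetical and is not part of the together library.

```python
import os


def require_api_key(env=None, name="TOGETHER_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the client")
    return key
```

With this in place, the client could be created as Together(api_key=require_api_key()), so a missing key fails immediately with a readable message.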
Upload your Data
See the data preparation instructions for the format requirements and to confirm that your dataset is ready.
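Before uploading, it can help to confirm locally that the file is well-formed JSONL. The sketch below checks only that every line is a JSON object. It does not check which fields the data preparation instructions require, and the function name check_jsonl_lines is hypothetical.

```python
import json


def check_jsonl_lines(lines):
    """Return (line_number, problem) pairs for lines that are not JSON objects."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            problems.append((number, "invalid JSON"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "not a JSON object"))
    return problems
```

To check a file on disk, open it and pass the file object, for example: with open("joke_explanations.jsonl", encoding="utf-8") as f: print(check_jsonl_lines(f)). An empty list means every line parsed as a JSON object.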
To upload your data, run the following code:
import os
from together import Together
client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))
resp = client.files.upload(file="joke_explanations.jsonl") # uploads a file
print(resp.dict())
Here is the output:
{
"id": "file-f6d02dc8-c9f9-4e38-ae63-7899fa603a86",
"object": "file",
"created_at": 1713481731,
"type": null,
"purpose": "fine-tune",
"filename": "joke_explanations.jsonl",
"bytes": 0,
"line_count": 0,
"processed": false
}
The response includes the id of the file you just uploaded. If you lose it, you can list the ids of all the files you have uploaded with files.list(). You need one of these ids (for example, file-960be810-4d...) to start a fine-tuning job.
import os
from together import Together
client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))
files_uploaded = client.files.list() # lists all uploaded files
print(files_uploaded)
[{'filename': 'joke_explanations.jsonl',
'bytes': 40805,
'created_at': 1691710036,
'id': 'file-960be810-4d33-449a-885a-9f69bd8fd0e2',
'purpose': 'fine-tune',
'object': 'file',
'LineCount': 0,
'Processed': True},
{'filename': 'sample_jsonl.jsonl',
'bytes': 1235,
'created_at': 1692190883,
'id': 'file-d0d318cb-b7d9-493a-bd70-1cfe089d3815',
'purpose': 'fine-tune',
'object': 'file',
'LineCount': 0,
'Processed': True}]
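If you have uploaded many files, you can look up an id by filename instead of scanning the listing by eye. This is a minimal sketch that assumes the listing is a list of dictionaries shaped like the sample output above. The object returned by your installed version of the library may need converting first, and the helper name file_ids_by_name is hypothetical.

```python
def file_ids_by_name(files, filename):
    """Return the ids of uploaded files with the given filename, newest first."""
    matches = [f for f in files if f["filename"] == filename]
    matches.sort(key=lambda f: f["created_at"], reverse=True)
    return [f["id"] for f in matches]


# Records copied from the sample listing above (trimmed to the fields used).
sample_files = [
    {
        "filename": "joke_explanations.jsonl",
        "created_at": 1691710036,
        "id": "file-960be810-4d33-449a-885a-9f69bd8fd0e2",
    },
    {
        "filename": "sample_jsonl.jsonl",
        "created_at": 1692190883,
        "id": "file-d0d318cb-b7d9-493a-bd70-1cfe089d3815",
    },
]

print(file_ids_by_name(sample_files, "sample_jsonl.jsonl"))
```

Sorting newest first means the first id is the most recent upload when the same filename was uploaded more than once.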
Start Fine-Tuning
Once you've uploaded your dataset, copy your file id from the output above and select a base model to fine-tune. Check out the full models list available for fine-tuning.
Run the following code to start your fine-tuning job using fine_tuning.create:
import os
from together import Together
client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))
resp = client.fine_tuning.create(
training_file = 'file-d0d318cb-b7d9-493a-bd70-1cfe089d3815',
model = 'meta-llama/Meta-Llama-3-8B',
n_epochs = 3,
n_checkpoints = 1,
batch_size = 4,
learning_rate = 1e-5,
wandb_api_key = '1a2b3c4d5e.......',
)
fine_tune_id = resp.id # the response is an object, so read the job ID as an attribute
print(resp)
Here is part of an example resp response, highlighting some of the useful information about your fine-tuning job.
{
"id": "ft-3b883474-f39c-40d9-9d5a-7f97ba9eeb9f",
"training_file": "file-2490a204-16e2-481e-a3d5-5636a6f3a4ea",
"model": "meta-llama/Meta-Llama-3-8B",
"output_name": "[email protected]/Meta-Llama-3-8B-2024-04-18-19-37-52",
"n_epochs": 1,
"n_checkpoints": 1,
"batch_size": 32,
"learning_rate": 3e-05,
"created_at": "2024-04-18T19:37:52.611Z",
"updated_at": "2024-04-18T19:37:52.611Z",
"status": "pending",
"events": [
{
"object": "fine-tune-event",
"created_at": "2024-04-18T19:37:52.611Z",
"message": "Fine tune request created",
"type": "JOB_PENDING",
...
}
],
"training_file_size": 150047,
"model_output_path": "s3://together-dev/finetune/65987df6752090cead0c9056/[email protected]/Meta-Llama-3-8B-2024-04-18-19-37-52/ft-3b883474-f39c-40d9-9d5a-7f97ba9eeb9f",
"user_id": "65987df6752090cead0c9056",
"owner_address": "0xf42ea9df7377257571fb0aae8799b6a357ba1bfb",
"enable_checkpoints": false,
...
}
You can retrieve all this information again by calling the fine_tuning.retrieve() method with the job ID provided above. For example, in the sample output above, ft-3b883474-f39c-40d9-9d5a-7f97ba9eeb9f is your job ID.
You can also list all the events for a specific fine-tuning job to check the progress or cancel your job with the commands below.
print(client.fine_tuning.retrieve(id="ft-3b883474-f39c-40d9-9d5a-7f97ba9eeb9f")) # Retrieves information about a fine-tuning job
print(client.fine_tuning.list_events(id="ft-3b883474-f39c-40d9-9d5a-7f97ba9eeb9f")) # Lists events of a fine-tuning job
# Run the next line only if you want to stop the job:
print(client.fine_tuning.cancel(id="ft-3b883474-f39c-40d9-9d5a-7f97ba9eeb9f")) # Cancels a fine-tuning job
A fine-tuning job can take anywhere from a couple of minutes to several hours, depending on the base model, dataset size, number of epochs, and job queue.
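Since jobs can run for a long time, you may prefer to poll for completion rather than re-running retrieve by hand. The following is a minimal sketch of a polling loop. The set of terminal status names is an assumption (only "pending" appears in the sample response above), and wait_for_job is a hypothetical helper, not a library function.

```python
import time

# Assumed terminal status names; confirm them against your own job responses.
TERMINAL_STATUSES = {"completed", "error", "cancelled"}


def wait_for_job(get_status, poll_seconds=60, max_polls=None, sleep=time.sleep):
    """Call get_status() until it returns a terminal status, then return it.

    If max_polls is reached first, the last status seen is returned instead.
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    polls = 0
    while True:
        status = get_status()
        polls += 1
        if status in TERMINAL_STATUSES:
            return status
        if max_polls is not None and polls >= max_polls:
            return status
        sleep(poll_seconds)
```

A possible call, assuming the retrieved job exposes a status attribute that compares equal to plain strings such as "pending": wait_for_job(lambda: client.fine_tuning.retrieve(id=fine_tune_id).status). Passing the status lookup as a callable keeps the loop independent of the client.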
Monitor Progress
You can check the completion progress of your fine-tuning job in the Jobs tab of the playground.
If you provided your Weights & Biases API key, you can also track the learning progress of your fine-tuning job at wandb.ai. The project URL has the form https://wandb.ai/<username>/together?workspace=user-<username>, where <username> is your unique Weights & Biases username, such as mama-llama-88.
🎉 Congratulations! You've just fine-tuned a model with the Together API. Now it's time to deploy your model.
Deploy your Fine-Tuned Model
Host your Model
Once the fine-tuning job completes and you host your new model, it appears on the Playground Models page. You can deploy the model directly through the web UI by clicking on the model, selecting your hardware, and clicking play. Available hardware includes RTX6000, L40, L40S, A100 PCIe, A100 SXM, and H100. The hardware options displayed depend on model constraints and overall hardware availability. Once the model is deployed, you can use it through the playground or through our inference API. For the inference API, follow the instructions in the inference documentation.
For our model above, we can run inference on it with the following code:
import os
from together import Together
client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))
response = client.chat.completions.create(
model="[email protected]/Meta-Llama-3-8B-2024-04-18-19-37-52",
messages=[{"role": "user", "content": "tell me about new york"}],
)
print(response.choices[0].message.content)
Please note that hosting your fine-tuned model is billed per minute while it is hosted. See the hourly pricing for fine-tuned model inference in the pricing table. When you are not using the model, be sure to stop the endpoint through the web UI. However, frequently starting and stopping the endpoint may delay your deployment.
To directly download the weights, see the instructions here.
Pricing
Pricing for fine-tuning is based on model size, the number of tokens, and the number of epochs. You can estimate fine-tuning pricing with our calculator.
Tokenization is part of the fine-tuning process on our API, so the exact number of tokens and the price of your job are available only after the tokenization step finishes. You can find this information on the "JOBS" page or retrieve it by running together fine-tuning retrieve $JOB_ID in your CLI.
Q: Is there a minimum price? Yes. The minimum price for a fine-tuning job is $5. For example, fine-tuning Llama-3-8B on 1B tokens for 1 epoch costs $366. Fine-tuning the same model on 1M tokens for 1 epoch comes to $0.37 at that rate, so the final price is the $5 minimum.
Q: What happens if I cancel my job? The final price is determined based on the number of tokens used to train your model up to the point of cancellation. For example, if your fine-tuning job uses Llama-3-8B with a batch size of 8 and you cancel the job after 1000 training steps, the total number of tokens used for training is 8192 [context length] x 8 [batch size] x 1000 [steps] = 65,536,000. This results in $27.21, as you can check on the pricing page.
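The arithmetic in the two answers above can be sketched in a few lines. This is an illustration only: the $366 per 1B tokens rate and the $5 minimum are the figures quoted above, the function names are hypothetical, and the pricing page remains the authority for actual prices (the sketch does not attempt to reproduce the $27.21 figure).

```python
MINIMUM_PRICE_USD = 5.0        # minimum price quoted above
RATE_USD_PER_BILLION = 366.0   # Llama-3-8B, 1 epoch, as quoted above


def tokens_trained(context_length, batch_size, steps):
    """Tokens consumed after `steps` training steps."""
    return context_length * batch_size * steps


def billed_price(tokens, rate_per_billion=RATE_USD_PER_BILLION, minimum=MINIMUM_PRICE_USD):
    """Price at a flat per-token rate, raised to the minimum when below it."""
    return max(tokens * rate_per_billion / 1_000_000_000, minimum)


print(tokens_trained(8192, 8, 1000))   # 65536000
print(billed_price(1_000_000))         # 5.0 (the $0.37 rate price is below the minimum)
print(billed_price(1_000_000_000))     # 366.0
```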
Using a Downloaded Model
If you want to download your model locally, follow the steps below. The model downloads as a tar.zst file.
client.fine_tuning.download(
id="ft-eb167402-98ed-4ac5-b6f5-8140c4ba146e",
output = "my-model/model.tar.zst"
)
To uncompress this file type on a Mac, you need to install zstd.
brew install zstd
cd my-model
zstd -d model.tar.zst
tar -xvf model.tar
cd ..
Inside the folder where you uncompressed the file, you will find a set of files like this:
ls my-model
tokenizer_config.json
special_tokens_map.json
pytorch_model.bin
generation_config.json
tokenizer.json
config.json
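Before loading the model, you can confirm that the extraction produced the files you expect. The sketch below checks for a subset of the files in the listing above. Treat that list as an assumption, since the exact files (for example, how the weights are stored) may differ between models, and missing_model_files is a hypothetical helper name.

```python
from pathlib import Path

# Taken from the sample listing above; adjust for your own model.
EXPECTED_FILES = [
    "config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "pytorch_model.bin",
]


def missing_model_files(folder, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]
```

Calling missing_model_files("my-model") returns an empty list when all of the expected files are present.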
Use the path of the folder that contains these .bin and .json files to load your model:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tokenizer = AutoTokenizer.from_pretrained("./my-model")
model = AutoModelForCausalLM.from_pretrained(
"./my-model",
trust_remote_code=True,
).to(device)
input_context = "Space Robots are"
input_ids = tokenizer.encode(input_context, return_tensors="pt")
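# do_sample=True is needed for temperature to take effect; without it, generation is greedy
output = model.generate(input_ids.to(device), max_length=128, do_sample=True, temperature=0.7).cpu()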
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_text)
Space Robots are a great way to get your kids interested in science. After all, they are the future!