Python library

Use this guide to start fine-tuning a model with the Python library.

Install the Library

To get started, install the together Python library:

pip install --upgrade together

Then, configure your API key by setting the TOGETHER_API_KEY environment variable:

export TOGETHER_API_KEY=xxxxx

Upload your Data

See the data preparation instructions to understand the format requirements and to confirm that your file is ready for upload.
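Before uploading, a quick local check can catch malformed lines. The following is a minimal sketch, assuming only that each line of the .jsonl file should be a single JSON object. The helper name check_jsonl_lines and the "text" key in the sample are hypothetical; the fields actually required are defined in the data preparation instructions, not here.

```python
import json


def check_jsonl_lines(lines):
    """Return (line_number, problem) pairs; an empty list means no problems were found."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((line_number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as error:
            problems.append((line_number, f"invalid JSON: {error.msg}"))
            continue
        # Assumption: every record must be a JSON object (a dict in Python).
        if not isinstance(record, dict):
            problems.append((line_number, "not a JSON object"))
    return problems


# Hypothetical sample lines; the "text" key is only an illustration.
sample = ['{"text": "Why did the chicken cross the road?"}', "", "[1, 2]", "{bad"]
print(check_jsonl_lines(sample))
```

To check a real file, pass an open file handle instead of the sample list, since iterating over a file yields its lines.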

To upload your data, run the following code:

import os
from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

resp = client.files.upload(file="joke_explanations.jsonl") # uploads a file
print(resp.dict())

Here is the output:

{
  "id": "file-f6d02dc8-c9f9-4e38-ae63-7899fa603a86",
  "object": "file",
  "created_at": 1713481731,
  "type": null,
  "purpose": "fine-tune",
  "filename": "joke_explanations.jsonl",
  "bytes": 0,
  "line_count": 0,
  "processed": false
}

You will get back the file ID of the file you just uploaded. If you forget it, you can list the IDs of all the files you have uploaded using files.list(). You'll need one of these IDs (for example, file-960be810-4d....) to start a fine-tuning job.

import os
from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

filesUploaded = client.files.list() # lists all uploaded files
print(filesUploaded)

Here is the output:

[{'filename': 'joke_explanations.jsonl',
  'bytes': 40805,
  'created_at': 1691710036,
  'id': 'file-960be810-4d33-449a-885a-9f69bd8fd0e2',
  'purpose': 'fine-tune',
  'object': 'file',
  'LineCount': 0,
  'Processed': True},
 {'filename': 'sample_jsonl.jsonl',
  'bytes': 1235,
  'created_at': 1692190883,
  'id': 'file-d0d318cb-b7d9-493a-bd70-1cfe089d3815',
  'purpose': 'fine-tune',
  'object': 'file',
  'LineCount': 0,
  'Processed': True}]
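
If you remember the filename but not the ID, you can filter the listing. This is a small sketch that assumes the listing is available as a list of dictionaries shaped like the output above; if the library returns objects instead, convert them first. The helper name find_file_ids is hypothetical.

```python
def find_file_ids(files, filename):
    """Return the IDs of all listed files whose name matches `filename`."""
    return [entry["id"] for entry in files if entry["filename"] == filename]


# Entries copied from the listing above, trimmed to the two fields used here.
listing = [
    {"filename": "joke_explanations.jsonl", "id": "file-960be810-4d33-449a-885a-9f69bd8fd0e2"},
    {"filename": "sample_jsonl.jsonl", "id": "file-d0d318cb-b7d9-493a-bd70-1cfe089d3815"},
]
print(find_file_ids(listing, "sample_jsonl.jsonl"))
```

The function returns a list because the same filename can be uploaded more than once.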

Start Fine-Tuning

Once you've uploaded your dataset, copy your file ID from the output above and select a base model to fine-tune. See the full list of models available for fine-tuning.

Run the following code to start your fine-tuning job using fine_tuning.create:

import os
from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

resp = client.fine_tuning.create(
    training_file="file-d0d318cb-b7d9-493a-bd70-1cfe089d3815",
    model="meta-llama/Meta-Llama-3-8B",
    n_epochs=3,
    n_checkpoints=1,
    batch_size=4,
    learning_rate=1e-5,
    wandb_api_key="1a2b3c4d5e.......",  # placeholder; supply your own key to log to Weights & Biases
)

fine_tune_id = resp.id  # the response is an object, so read the job ID as an attribute
print(resp)

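Here is part of an example resp response, highlighting some of the useful information about your fine-tuning job. This sample was captured from a different request, so its values do not match the parameters used above.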

{
    "id": "ft-3b883474-f39c-40d9-9d5a-7f97ba9eeb9f",
    "training_file": "file-2490a204-16e2-481e-a3d5-5636a6f3a4ea",
    "model": "meta-llama/Meta-Llama-3-8B",
    "output_name": "[email protected]/Meta-Llama-3-8B-2024-04-18-19-37-52",
    "n_epochs": 1,
    "n_checkpoints": 1,
    "batch_size": 32,
    "learning_rate": 3e-05,
    "created_at": "2024-04-18T19:37:52.611Z",
    "updated_at": "2024-04-18T19:37:52.611Z",
    "status": "pending",
    "events": [
        {
            "object": "fine-tune-event",
            "created_at": "2024-04-18T19:37:52.611Z",
            "message": "Fine tune request created",
            "type": "JOB_PENDING",
            ...
        }
    ],
    "training_file_size": 150047,
    "model_output_path": "s3://together-dev/finetune/65987df6752090cead0c9056/[email protected]/Meta-Llama-3-8B-2024-04-18-19-37-52/ft-3b883474-f39c-40d9-9d5a-7f97ba9eeb9f",
    "user_id": "65987df6752090cead0c9056",
    "owner_address": "0xf42ea9df7377257571fb0aae8799b6a357ba1bfb",
    "enable_checkpoints": false,
    ...
}

You can retrieve all this information again by running the fine_tuning.retrieve() method using the job ID provided above. For example, from the sample output above, ft-3b883474-f39c-40d9-9d5a-7f97ba9eeb9f is your Job ID.

You can also list all the events for a specific fine-tuning job to check the progress or cancel your job with the commands below.

print(client.fine_tuning.retrieve(id="ft-3b883474-f39c-40d9-9d5a-7f97ba9eeb9f")) # retrieves information on the fine-tuning job
print(client.fine_tuning.list_events(id="ft-3b883474-f39c-40d9-9d5a-7f97ba9eeb9f")) # lists the events of the fine-tuning job
print(client.fine_tuning.cancel(id="ft-3b883474-f39c-40d9-9d5a-7f97ba9eeb9f")) # cancels the fine-tuning job; run this only if you want to stop it

A fine-tuning job can take anywhere from a couple of minutes to several hours, depending on the base model, dataset size, number of epochs, and job queue.
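
Because a job can run for a long time, you may prefer to poll its status from a script. The following is a minimal sketch, not the library's own mechanism. The helper wait_for_job and its get_status callable are hypothetical, and the terminal status names "completed", "error", and "cancelled" are assumptions; only "pending" appears in the sample response above.

```python
import time


def wait_for_job(get_status, poll_seconds=30.0, max_polls=1000,
                 terminal_statuses=("completed", "error", "cancelled")):
    """Call get_status() until it returns a terminal status or max_polls is reached."""
    status = None
    for attempt in range(max_polls):
        status = get_status()
        if status in terminal_statuses:
            return status
        # Sleep between polls, but not after the final attempt.
        if attempt < max_polls - 1:
            time.sleep(poll_seconds)
    raise TimeoutError(f"job still in status {status!r} after {max_polls} polls")


# Demonstration with a fake status source instead of a real API call.
fake_statuses = iter(["pending", "running", "completed"])
print(wait_for_job(lambda: next(fake_statuses), poll_seconds=0))
```

With a real client, get_status could wrap fine_tuning.retrieve() and return the status field of the response. How that field is accessed on the returned object is an assumption you should verify against the library.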

Monitor Progress

You can check the completion progress of your fine-tuning job in the Jobs tab of the playground.

If you provided your Weights & Biases API key, you can also check the learning progress of your fine-tuning job at wandb.ai. For example, go to https://wandb.ai/<username>/together?workspace=user-<username>, where <username> is your unique Weights & Biases username, such as mama-llama-88.
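
The URL pattern above can be filled in programmatically. This is a small sketch; the function name wandb_workspace_url is hypothetical, and it simply substitutes the username into the pattern shown above.

```python
def wandb_workspace_url(username):
    """Build the workspace URL by substituting the username into the documented pattern."""
    return f"https://wandb.ai/{username}/together?workspace=user-{username}"


print(wandb_workspace_url("mama-llama-88"))
# → https://wandb.ai/mama-llama-88/together?workspace=user-mama-llama-88
```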

🎉 Congratulations! You've just fine-tuned a model with the Together API. Now it's time to deploy your model.

Deploy your Fine-Tuned Model

Host your Model

Once the fine-tuning job completes and you host your new model, you will be able to see it on the Playground Models page. You can deploy the model directly through the web UI by clicking on the model, selecting your hardware, and clicking play. Available hardware includes RTX6000, L40, L40S, A100 PCIe, A100 SXM and H100. The hardware options displayed depend on model constraints and overall hardware availability. Once the model is deployed, you can use it through the playground or through our inference API. For the inference API, follow the instructions in the inference documentation.

To run inference on the model above, pass the output_name from the fine-tuning response as the model:

import os
from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

response = client.chat.completions.create(
    model="[email protected]/Meta-Llama-3-8B-2024-04-18-19-37-52",
    messages=[{"role": "user", "content": "tell me about new york"}],
)
print(response.choices[0].message.content)

Note that hosting your fine-tuned model is billed per minute hosted. See the hourly pricing for fine-tuned model inference in the pricing table. When you are not using the model, be sure to stop the endpoint through the web UI. Frequent starting and stopping, however, may delay your deployment.
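
To relate the hourly price to per-minute billing, you can prorate the hourly rate. This is only an arithmetic sketch. The function name hosting_cost and the rate in the example are hypothetical, and it assumes billing is a straight proration of the hourly rate with no rounding; take the real rate from the pricing table.

```python
def hosting_cost(minutes_hosted, hourly_rate_usd):
    """Prorate an hourly rate over the number of minutes the endpoint was running."""
    return minutes_hosted * hourly_rate_usd / 60


# Hypothetical rate of $2.00 per hour, hosted for 90 minutes.
print(hosting_cost(90, 2.00))  # → 3.0
```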

To download the weights directly, see the Using a Downloaded Model section below.

Pricing

Pricing for fine-tuning is based on model size, the number of tokens, and the number of epochs. You can estimate fine-tuning pricing with our calculator.

The tokenization step is part of the fine-tuning process on our API, and the exact number of tokens and the price of your job will be available after tokenization is done. You can find this information on the "JOBS" page or retrieve it by running together fine-tuning retrieve $JOB_ID in your CLI.

Q: Is there a minimum price? The minimum price for a fine-tuning job is $5. For example, fine-tuning Llama-3-8B with 1B tokens for 1 epoch costs $366. If you fine-tune the same model with 1M tokens for 1 epoch, the price based on the rate is $0.37, so the final price will be the $5 minimum.
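
The minimum-price rule can be sketched as follows. The per-token rate is derived from the example above ($366 for 1B tokens over 1 epoch on Llama-3-8B). The function name estimate_price is hypothetical, and linear scaling with tokens and epochs is an assumption, so treat the result as an estimate rather than a quote.

```python
MINIMUM_PRICE_USD = 5.00
# Rate implied by the example above: $366 for 1B tokens over 1 epoch on Llama-3-8B.
LLAMA_3_8B_USD_PER_TOKEN_EPOCH = 366 / 1_000_000_000


def estimate_price(tokens, epochs=1, usd_per_token_epoch=LLAMA_3_8B_USD_PER_TOKEN_EPOCH):
    """Return (rate_based_price, final_price), where final_price applies the minimum."""
    # Assumption: the price scales linearly with tokens and epochs.
    rate_based = tokens * epochs * usd_per_token_epoch
    return rate_based, max(rate_based, MINIMUM_PRICE_USD)


rate_based, final = estimate_price(1_000_000)
print(f"rate-based: ${rate_based:.2f}, final: ${final:.2f}")
# → rate-based: $0.37, final: $5.00
```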

Q: What happens if I cancel my job? The final price will be determined based on the number of tokens used to train your model up to the point of cancellation. For example, if your fine-tuning job uses Llama-3-8B with a batch size of 8, and you cancel the job after 1000 training steps, the total number of tokens used for training is 8192 [context length] x 8 [batch size] x 1000 [steps] = 65,536,000. This results in $27.21, as you can check on the pricing page.
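
The token count in this example is a straightforward product, which can be reproduced as a quick check. The function name tokens_trained is hypothetical, and it assumes every step processes a full batch of full-length sequences, as the example above does.

```python
def tokens_trained(context_length, batch_size, steps):
    """Tokens consumed so far: each step processes batch_size sequences of context_length tokens."""
    return context_length * batch_size * steps


print(f"{tokens_trained(8192, 8, 1000):,}")  # → 65,536,000
```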

Using a Downloaded Model

If you want to download your model locally, you can do so by following the steps below. The model will download as a tar.zst file.

client.fine_tuning.download(
    id="ft-eb167402-98ed-4ac5-b6f5-8140c4ba146e",
    output="my-model/model.tar.zst",
)

To decompress this file type on macOS, you need to install zstd.

brew install zstd
cd my-model
zstd -d model.tar.zst
tar -xvf model.tar
cd ..

In the folder where you decompressed the file, you will find a set of files like this:

ls my-model

tokenizer_config.json
special_tokens_map.json
pytorch_model.bin
generation_config.json
tokenizer.json
config.json
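
Before loading the model, you can confirm that extraction produced the files you expect. This is a minimal sketch; the helper name missing_model_files is hypothetical, and the expected file names are simply those in the listing above, so other models may ship a different set.

```python
from pathlib import Path

# File names taken from the listing above; other models may ship a different set.
EXPECTED_FILES = (
    "tokenizer_config.json",
    "special_tokens_map.json",
    "pytorch_model.bin",
    "generation_config.json",
    "tokenizer.json",
    "config.json",
)


def missing_model_files(folder, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in `folder`."""
    folder = Path(folder)
    return [name for name in expected if not (folder / name).is_file()]


missing = missing_model_files("my-model")
print("all files present" if not missing else f"missing: {missing}")
```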

Use the path of the folder that contains these .bin and .json files to load your model:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("./my-model")

model = AutoModelForCausalLM.from_pretrained(
  "./my-model", 
  trust_remote_code=True, 
).to(device)

input_context = "Space Robots are"
input_ids = tokenizer.encode(input_context, return_tensors="pt")
output = model.generate(
    input_ids.to(device),
    max_length=128,
    do_sample=True,  # temperature only takes effect when sampling is enabled
    temperature=0.7,
).cpu()
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_text)

Here is an example output:

Space Robots are a great way to get your kids interested in science. After all, they are the future!