CLI

Use this guide to start fine-tuning a model with the command-line interface (CLI).

Install the Library

To get started, install the together Python library:

pip install --upgrade together

Authenticate

Configure your API key by setting the TOGETHER_API_KEY environment variable:

export TOGETHER_API_KEY=xxxxx
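
If you drive the CLI from a script, it can help to fail early when the variable is missing. The following is a minimal sketch, not part of the together library; the helper name get_api_key is hypothetical.

```python
import os

def get_api_key(env=os.environ):
    """Return the API key from TOGETHER_API_KEY, or raise a clear error."""
    key = env.get("TOGETHER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "TOGETHER_API_KEY is not set; export it before running the CLI."
        )
    return key

# Usage (raises RuntimeError if the variable is unset or empty):
# api_key = get_api_key()
```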

Find your API key in your account settings.

Prepare your Data

Prepare your dataset as a .jsonl file with a text field.

{"text": "..."}
{"text": "..."}

For more details and examples, check out this page.
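
As an illustration, a list of strings can be turned into this format with the Python standard library. This is a sketch under the assumption that each training example is a single string; the function name to_jsonl and the file name in the comment are hypothetical.

```python
import json

def to_jsonl(texts):
    """Serialize strings as JSONL: one {"text": ...} object per line."""
    return "".join(
        json.dumps({"text": text}, ensure_ascii=False) + "\n" for text in texts
    )

# json.dumps escapes embedded newlines, so each example stays on one line.
content = to_jsonl(["First training example.", "Second example,\nwith a newline."])

# To save the result to disk:
# with open("my_dataset.jsonl", "w", encoding="utf-8") as f:
#     f.write(content)
```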

To confirm that your dataset has the right format, run the following command:

together files check PATH_TO_DATA_FILE
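
For a quick local sanity check before running the CLI, a sketch like the following looks for lines that are not JSON objects with a string text field. It only approximates the format described above and is an assumption, not the logic that together files check actually runs; the function name find_format_errors is hypothetical.

```python
import json

def find_format_errors(lines):
    """Return (line_number, problem) pairs for lines that are not {"text": <str>}."""
    errors = []
    for number, line in enumerate(lines, start=1):
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict) or not isinstance(record.get("text"), str):
            errors.append((number, 'missing string field "text"'))
    return errors

# Usage with a file on disk:
# with open("my_dataset.jsonl", encoding="utf-8") as f:
#     print(find_format_errors(f))
```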

Check and Upload your Data

To upload your dataset, run the following command, replacing PATH_TO_DATA_FILE with the path to your dataset:

together files upload PATH_TO_DATA_FILE

The following example uploads a sample dataset from Hugging Face. Here's what the output looks like:

together files upload unified_joke_explanations.jsonl
{
    "filename": "unified_joke_explanations.jsonl",
    "bytes": 150047,
    "created_at": 1687982638,
    "id": "file-d88343a5-3ba5-4b42-809a-9f1ee2b83861",
    "purpose": "fine-tune",
    "object": "file",
    "LineCount": 356
}
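
The "id" field is the file ID you pass to the fine-tuning command in the next step. If you capture the upload output in a script, a sketch like this can extract it. It assumes the captured text contains only the JSON object shown above; the function name parse_upload_output is hypothetical.

```python
import json

# Output copied from the upload example above.
UPLOAD_OUTPUT = """
{
    "filename": "unified_joke_explanations.jsonl",
    "bytes": 150047,
    "created_at": 1687982638,
    "id": "file-d88343a5-3ba5-4b42-809a-9f1ee2b83861",
    "purpose": "fine-tune",
    "object": "file",
    "LineCount": 356
}
"""

def parse_upload_output(output):
    """Return (file_id, line_count) from the JSON printed after an upload."""
    response = json.loads(output)
    return response["id"], response["LineCount"]

file_id, line_count = parse_upload_output(UPLOAD_OUTPUT)
print(file_id, line_count)
```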

Start Fine-Tuning

Submit your fine-tuning job using the CLI:

together finetune create --training-file $FILE_ID --model $MODEL_NAME --wandb-api-key $WANDB_API_KEY

Replace FILE_ID with the ID of the training file.
Replace MODEL_NAME with the API name of the base model you want to fine-tune (refer to the models list).
Replace WANDB_API_KEY with your own Weights & Biases API key (Optional).
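
If you launch jobs from a script, the command can be assembled as an argument list so that the optional Weights & Biases key is only included when you have one. This is a sketch that uses only the three flags shown above; the function name build_create_command is hypothetical.

```python
import shlex

def build_create_command(file_id, model_name, wandb_api_key=None):
    """Build the argument list for `together finetune create`."""
    args = [
        "together", "finetune", "create",
        "--training-file", file_id,
        "--model", model_name,
    ]
    if wandb_api_key:  # the W&B key is optional
        args += ["--wandb-api-key", wandb_api_key]
    return args

command = build_create_command(
    "file-d88343a5-3ba5-4b42-809a-9f1ee2b83861",
    "togethercomputer/RedPajama-INCITE-7B-Chat",
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) to run the CLI without going through a shell.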

Additional parameters you can set when starting your fine-tuning job:

--suffix, -s (string, optional): Up to 40 characters that will be added to your fine-tuned model name. Adding a suffix is recommended to differentiate fine-tuned models. Default: None.
--n-epochs, -ne (integer, optional): The number of epochs to fine-tune on the dataset. Default: 4, Min: 1, Max: 20.
--n-checkpoints, -c (integer, optional): The number of checkpoints to save during training. Default: 1. One checkpoint is always saved on the last epoch for the trained model. The number of checkpoints cannot exceed the number of epochs; if a larger number is given, the number of epochs is used instead.

  • For llama-2-70b & llama-2-70b-chat, the max n-checkpoints is 1.

--learning-rate, -lr (float, optional): The learning rate multiplier to use for training. Default: 0.00001, Min: 0.00000001, Max: 0.01.
--batch-size, -b (integer, optional): The batch size to use for each training iteration, that is, the number of training examples used in a batch. The valid range depends on the model:

  • CodeLlama-7b - Default: 16, Min: 4, Max: 16
  • CodeLlama-13b - Default: 8, Min: 4, Max: 8
  • llama-2-70b & llama-2-70b-chat - Default: 32, Min: 32, Max: 64
  • All other models - Default: 32, Min: 4, Max: 128
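
To see how these limits interact, the sketch below clamps requested values into the ranges listed above. It is an illustration only: matching models by the last part of their name is an assumption, the function name resolve_hyperparameters is hypothetical, and the values the CLI reports at submission time take precedence over this table.

```python
# Batch-size limits as (default, min, max), copied from the list above.
BATCH_SIZE_LIMITS = {
    "codellama-7b": (16, 4, 16),
    "codellama-13b": (8, 4, 8),
    "llama-2-70b": (32, 32, 64),
    "llama-2-70b-chat": (32, 32, 64),
}
ALL_OTHER_MODELS = (32, 4, 128)
SINGLE_CHECKPOINT_MODELS = {"llama-2-70b", "llama-2-70b-chat"}

def clamp(value, low, high):
    """Force value into the closed interval [low, high]."""
    return max(low, min(high, value))

def resolve_hyperparameters(model, n_epochs=4, n_checkpoints=1,
                            learning_rate=1e-5, batch_size=None):
    # Assumption: match on the last path component, case-insensitively.
    short_name = model.split("/")[-1].lower()
    default_bs, min_bs, max_bs = BATCH_SIZE_LIMITS.get(short_name, ALL_OTHER_MODELS)

    n_epochs = clamp(n_epochs, 1, 20)
    # Checkpoints are capped by the epoch count (or 1 for the 70b models).
    max_checkpoints = 1 if short_name in SINGLE_CHECKPOINT_MODELS else n_epochs
    return {
        "n_epochs": n_epochs,
        "n_checkpoints": clamp(n_checkpoints, 1, max_checkpoints),
        "learning_rate": clamp(learning_rate, 1e-8, 1e-2),
        "batch_size": default_bs if batch_size is None
        else clamp(batch_size, min_bs, max_bs),
    }

print(resolve_hyperparameters("togethercomputer/llama-2-70b-chat",
                              n_checkpoints=3, batch_size=128))
```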

Use the --suffix parameter to customize your model name. To see all input arguments and their details, visit this page.

Here's a sample output:

together finetune create --training-file file-d88343a5-3ba5-4b42-809a-9f1ee2b83861 --model togethercomputer/RedPajama-INCITE-7B-Chat
{
    "training_file": "file-d88343a5-3ba5-4b42-809a-9f1ee2b83861",
    "model_output_name": "username/togethercomputer/RedPajama-INCITE-7B-Chat",
    "model_output_path": "s3://together-dev/finetune/63e2b89da6382c4d75d5ef22/csris/togethercomputer/RedPajama-INCITE-7B-Chat",
    "Suffix": "",
    "model": "togethercomputer/RedPajama-INCITE-7B-Chat",
    "n_epochs": 4,
    "batch_size": 128,
    "learning_rate": 1e-06,
    "checkpoint_steps": 2,
    "created_at": 1687982945,
    "updated_at": 1687982945,
    "status": "pending",
    "id": "ft-5bf8990b-841d-4d63-a8a3-5248d73e045f",
    "job_id": "",
    "token_count": 0,
    "param_count": 0,
    "total_price": 0,
    "epochs_completed": 0,
    "events": [
        {
            "object": "fine-tune-event",
            "created_at": 1687982945,
            "level": "",
            "message": "Fine tune request created",
            "type": "JOB_PENDING",
            "param_count": 0,
            "token_count": 0,
            "checkpoint_path": "",
            "model_path": ""
        }
    ],
    "queue_depth": 0,
    "wandb_project_name": ""
}

Take note of the ID of the job ("id" from the output) as you'll need that to track progress and download model weights. For example, from the sample output above, ft-5bf8990b-841d-4d63-a8a3-5248d73e045f is your Job ID.

A fine-tune job can take anywhere from a couple of minutes to several hours, depending on the base model, dataset size, number of epochs, and job queue.

Unless you set --quiet in the CLI, a confirmation step shows you any defaults that were applied and any arguments that were adjusted from your original inputs for this fine-tune job. Type y and press Enter to submit the job, or enter anything else to abort.

10-02-2023 11:14:27 - together.finetune - WARNING - Batch size must be 144 for togethercomputer/llama-2-70b-chat model. Setting batch size to 144 (finetune.py:114)
Note: Some hyperparameters may have been adjusted with their minimum/maximum values for a given model.

Job creation details:
{   'batch_size': 144,
    'learning_rate': 1e-05,
    'model': 'togethercomputer/llama-2-70b-chat',
    'n_checkpoints': 1,
    'n_epochs': 4,
    'suffix': None,
    'training_file': 'file-33ecca00-17ea-4968-ada2-9f82ef2f4cb8',
    'wandb_key': 'xxxx'}

Do you want to submit the job? [y/N]

Monitor Progress

View progress by navigating to the Jobs tab in the Playground. You can also monitor progress using the CLI:

together finetune list-events FINETUNE_ID

Replace FINETUNE_ID with the ID of the fine-tuning job.

The output should be similar to:

together finetune list-events ft-5bf8990b-841d-4d63-a8a3-5248d73e045f
{
    "data": [
        {
            "object": "fine-tune-event",
            "created_at": 1687982945,
            "level": "",
            "message": "Fine tune request created",
            "type": "JOB_PENDING",
            "param_count": 0,
            "token_count": 0,
            "checkpoint_path": "",
            "model_path": ""
        },
        {
            "object": "fine-tune-event",
            "created_at": 1687982993,
            "level": "info",
            "message": "Training started at Wed Jun 28 13:09:51 PDT 2023",
            "type": "JOB_START",
            "param_count": 0,
            "token_count": 0,
            "checkpoint_path": "",
            "model_path": ""
        },
        {
            "object": "fine-tune-event",
            "created_at": 1687983122,
            "level": "info",
            "message": "Model data downloaded for togethercomputer/RedPajama-INCITE-7B-Chat at Wed Jun 28 13:12:01 PDT 2023",
            "type": "MODEL_DOWNLOAD_COMPLETE",
            "param_count": 0,
            "token_count": 0,
            "checkpoint_path": "",
            "model_path": ""
        },
        {
            "object": "fine-tune-event",
            "created_at": 1687983124,
            "level": "info",
            "message": "Training data downloaded for togethercomputer/RedPajama-INCITE-7B-Chat at Wed Jun 28 13:12:03 PDT 2023",
            "type": "TRAINING_DATA_DOWNLOAD_COMPLETE",
            "param_count": 0,
            "token_count": 0,
            "checkpoint_path": "",
            "model_path": ""
        }
    ],
    "object": "list"
}
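
When polling from a script, you usually only care about the most recent event. The sketch below picks it out of the list-events output by timestamp. The sample is trimmed to two events and a subset of their fields from the output above, and the function name latest_event is hypothetical.

```python
import json

# Trimmed copy of the list-events output shown above.
EVENTS_OUTPUT = """
{
    "data": [
        {"object": "fine-tune-event", "created_at": 1687982945,
         "message": "Fine tune request created", "type": "JOB_PENDING"},
        {"object": "fine-tune-event", "created_at": 1687982993,
         "message": "Training started at Wed Jun 28 13:09:51 PDT 2023",
         "type": "JOB_START"}
    ],
    "object": "list"
}
"""

def latest_event(output):
    """Return the event with the largest created_at, or None if there are none."""
    events = json.loads(output)["data"]
    if not events:
        return None
    return max(events, key=lambda event: event["created_at"])

event = latest_event(EVENTS_OUTPUT)
print(event["type"], "-", event["message"])
```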

🎉 Congratulations! You've just fine-tuned a model with the Together API. Now it's time to deploy your model.

Deploy your Fine-Tuned Model

Host your Model

Once the fine-tune job completes, you will be able to see your model in the Playground. To deploy this model, follow the instructions in the inference documentation.

To download the weights directly, see the instructions here.

Other commands

  1. List all of your current jobs

    together finetune list
    
  2. Cancel a job

    together finetune cancel FINETUNE_ID
    

Replace FINETUNE_ID with the ID of the fine-tuning job.

Commands

Here are all the commands available through the CLI:

# list commands
together --help

# list available models
together models list

# start a model
together models start togethercomputer/RedPajama-INCITE-7B-Base

# create completion
together complete "Space robots are" -m togethercomputer/RedPajama-INCITE-7B-Base

# check which models are running
together models instances

# stop a model
together models stop togethercomputer/RedPajama-INCITE-7B-Base

# check your jsonl file
together files check jokes.jsonl

# upload your jsonl file
together files upload jokes.jsonl

# upload your jsonl file and disable file checking
together files upload jokes.jsonl --no-check

# list your uploaded files
together files list

# start fine-tuning a model on your jsonl file (use the file id returned after upload or shown by together files list)
together finetune create -t file-9263d6b7-736f-43fc-8d14-b7f0efae9079 -m togethercomputer/RedPajama-INCITE-Chat-3B-v1

# check the status of the finetune job
together finetune status ft-dd93c727-f35e-41c2-a370-7d55b54128fa

# retrieve progress updates about the finetune job
together finetune retrieve ft-dd93c727-f35e-41c2-a370-7d55b54128fa

# download your finetuned model (with your fine_tune_id from the id key given during create or from together finetune list)
together finetune download ft-dd93c727-f35e-41c2-a370-7d55b54128fa 

# check if your newly started finetuned model is ready for inference
together models ready yourname/ft-dd93c727-f35e-41c2-a370-7d55b54128fa-2023-08-16-10-15-09

# inference using your new finetuned model (with new finetuned model name from together models list)
together complete "Space robots are" -m yourname/ft-dd93c727-f35e-41c2-a370-7d55b54128fa-2023-08-16-10-15-09

Resources

See the list of base models available to fine-tune with the Together API.

Estimate fine-tuning pricing with our calculator. Pricing is based on model size, dataset size, and the number of epochs.

Follow along in our Colab (Google Colaboratory) notebook tutorial: Example Finetuning Project.