The fine-tuning commands of the Together Python Library are used to create, manage, and monitor fine-tune jobs.

Help

See all commands with:

together fine-tuning --help

Create

To start a new fine-tune job:

together fine-tuning create --training-file <FILE-ID> -m <MODEL>

Other arguments:

  • --training-file, -t (string, required) -- Specifies the training file by the file-id of a previously uploaded file (see Files).
  • --model, -m (string, optional) -- Specifies the base model to fine-tune. Default: togethercomputer/RedPajama-INCITE-7B-Chat.
  • --suffix, -s (string, optional) -- Up to 40 characters appended to your fine-tuned model name. Adding a suffix is recommended to help differentiate fine-tuned models. Default: None.
  • --n-epochs, -ne (integer, optional) -- Number of epochs to fine-tune on the dataset. Default: 4, Min: 1, Max: 20.
  • --n-checkpoints, -c (integer, optional) -- The number of checkpoints to save during training. Default: 1. One checkpoint is always saved on the last epoch for the trained model. The number of checkpoints must be greater than 0 and at most the number of epochs (1 <= n-checkpoints <= n-epochs). If a larger number is given, the number of epochs is used instead.
  • --batch-size, -b (integer, optional) -- The batch size to use for each training iteration, i.e. the number of training samples/examples processed per batch. See the model page for the minimum and maximum batch sizes for each model.
  • --learning-rate, -lr (float, optional) -- The learning rate multiplier to use for training. Default: 0.00001, Min: 0.00000001, Max: 0.01.
  • --wandb-api-key (string, optional) -- Your own Weights & Biases API key. If you provide the key, you can monitor your job's progress on your Weights & Biases page. If not set, the WANDB_API_KEY environment variable is used.

(LoRA arguments are supported with together >= 1.2.3)

  • --lora (bool, optional) -- Whether to enable LoRA training. If not provided, full fine-tuning will be applied. Default: False.
  • --lora-r (integer, optional) -- Rank for LoRA adapter weights. Default: 8, Min: 1, Max: 64.
  • --lora-alpha (integer, optional) -- The alpha value for LoRA adapter training. Default: 8, Min: 1. If a value less than 1 is given, it defaults to the --lora-r value, following the recommended 1:1 scaling.
  • --lora-dropout (float, optional) -- The dropout probability for LoRA layers. Default: 0.0, Min: 0.0, Max: 1.0.
  • --lora-trainable-modules (string, optional) -- A comma-separated list of LoRA trainable modules. Default: all-linear (all trainable modules). Trainable modules for each model are:
    • Mixtral 8x7B model family: k_proj, w2, w1, gate, w3, o_proj, q_proj, v_proj
    • All other models: k_proj, up_proj, o_proj, q_proj, down_proj, v_proj, gate_proj

The id field in the JSON response contains the fine-tune job ID (ft-id), which can be used to check the job's status, retrieve logs, cancel the job, and download weights.
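
For example, a LoRA fine-tuning job (together >= 1.2.3) combining several of the arguments above could be started as follows; the file ID is a placeholder and all values are illustrative, so substitute your own file-id, model, and hyperparameters:

together fine-tuning create \
  --training-file file-1234-abcd \
  --model togethercomputer/RedPajama-INCITE-7B-Chat \
  --suffix demo-run \
  --n-epochs 5 \
  --n-checkpoints 2 \
  --learning-rate 0.00001 \
  --lora \
  --lora-r 8 \
  --lora-alpha 16

The returned id can then be used as <FT-ID> in the commands below.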

List

To list past and running fine-tune jobs:

together fine-tuning list

Jobs are listed oldest first, so the most recent jobs appear at the bottom of the list.

Retrieve

To retrieve metadata on a job:

together fine-tuning retrieve <FT-ID>

Monitor Events

To list events of a past or running job:

together fine-tuning list-events <FT-ID>

Cancel

To cancel a running job:

together fine-tuning cancel <FT-ID>

Status

To get the status of a job:

together fine-tuning status <FT-ID>

Checkpoints

To list saved-checkpoints of a job:

together fine-tuning checkpoints <FT-ID>

Download Model and Checkpoint Weights

To download the weights of a fine-tuned model, run:

together fine-tuning download <FT-ID>

This command downloads the model weights as a Zstandard (ZSTD) compressed archive. To extract the weights, run tar -xf <FILENAME> (see the example after the argument list below).

Other arguments:

  • --output, -o (filename, optional) -- Specify the output filename. Default: <MODEL-NAME>.tar.zst
  • --step, -s (integer, optional) -- Download the weights of a specific checkpoint step. Default: -1 (download the latest weights).
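
As a sketch, the commands below download the latest weights of a hypothetical job (ft-1234-abcd is a placeholder ID) to a custom filename and then extract them:

together fine-tuning download ft-1234-abcd --output my-model.tar.zst
tar -xf my-model.tar.zst

To download an earlier checkpoint instead, pass its step number with --step (see Checkpoints above).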