You must provide either --model (to start from a base model) or --from-checkpoint (to resume from a previous job). Before the job is submitted, the CLI prints an estimated price and asks for confirmation; pass --confirm (or -y) to skip the prompt in scripts and CI.
If --training-file (or --validation-file) is a local path, the CLI uploads the file to the Files API automatically before kicking off the job.
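The two rules above (exactly one of --model or --from-checkpoint, and --confirm for non-interactive runs) can be sketched as a small helper that assembles the option list. This is only an illustration: build_create_args is a hypothetical name, and the command that precedes these options is deliberately left out because it is not named in this section.

```python
from typing import List, Optional


def build_create_args(
    training_file: str,
    model: Optional[str] = None,
    from_checkpoint: Optional[str] = None,
    confirm: bool = False,
) -> List[str]:
    """Return the option list for a job-creation call (command name not included)."""
    # Exactly one of --model / --from-checkpoint must be given.
    if (model is None) == (from_checkpoint is None):
        raise ValueError("pass exactly one of --model or --from-checkpoint")

    args = ["--training-file", training_file]
    if model is not None:
        args += ["--model", model]
    else:
        args += ["--from-checkpoint", from_checkpoint]
    if confirm:
        # Skips the interactive price prompt, e.g. in scripts and CI.
        args.append("--confirm")
    return args
```

The returned list could be appended to the CLI invocation and passed to subprocess.run; checking mutual exclusivity locally only saves a round trip, since the CLI enforces the same rule.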
--training-file [string]
Required. Training file ID from the Files API or a local path to upload. The maximum allowed file size is 25 GB.
--model [string]
Base model to fine-tune. See the model page. Required unless --from-checkpoint is set.
--from-checkpoint [string]
Continue training from a previous fine-tuning job. Format: JOB_ID/OUTPUT_MODEL_NAME:STEP. The step is optional; the final checkpoint is used when omitted. Mutually exclusive with --model.
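As a rough sketch of how the optional step suffix might be separated from a checkpoint reference, the helper below splits on the last colon only. It treats everything before ":STEP" as an opaque reference and makes no assumption about its internal structure; split_checkpoint_ref and the sample value ft-example are hypothetical.

```python
from typing import Optional, Tuple


def split_checkpoint_ref(ref: str) -> Tuple[str, Optional[int]]:
    """Split 'REFERENCE:STEP' into (reference, step); step is None when omitted."""
    head, sep, tail = ref.rpartition(":")
    # Treat the suffix as a step only when it is a plain ASCII integer.
    if sep and head and tail.isascii() and tail.isdigit():
        return head, int(tail)
    # No step given: the final checkpoint is used.
    return ref, None
```

For example, split_checkpoint_ref("ft-example:10") yields the reference and step 10, while a reference without a suffix yields None for the step.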
--validation-file/-v [string]
Validation file ID from the Files API or a local path to upload. Required when --n-evals > 0. The maximum allowed file size is 25 GB.
--suffix [string]
Up to 40 characters appended to the fine-tuned model name. Recommended to differentiate fine-tuned models.
--packing/--no-packing
Whether to use sequence packing for training. Default: enabled.
--max-seq-length [integer]
Maximum sequence length to use for training. Required when --no-packing is set. Defaults to the maximum allowed for the model and training type.
--n-epochs/-ne [integer]
Number of epochs to fine-tune on the dataset. Default: 1. Min: 1. Max: 20.
--n-evals [integer]
Number of evaluation loops to run on the validation set. Default: 0. Min: 0. Max: 100.
--n-checkpoints/-c [integer]
The number of checkpoints to save during training. Default: 1. One checkpoint is always saved on the last epoch. Must be 1 ≤ n-checkpoints ≤ n-epochs.
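The limits documented for --n-epochs, --n-evals, and --n-checkpoints, together with the rule that a validation file is required when --n-evals > 0, could be checked before submission along these lines. The function name check_schedule_counts is hypothetical; the bounds are the ones listed above.

```python
from typing import Optional


def check_schedule_counts(
    n_epochs: int = 1,
    n_evals: int = 0,
    n_checkpoints: int = 1,
    validation_file: Optional[str] = None,
) -> None:
    """Raise ValueError if the values fall outside the documented limits."""
    if not 1 <= n_epochs <= 20:
        raise ValueError("--n-epochs must be between 1 and 20")
    if not 0 <= n_evals <= 100:
        raise ValueError("--n-evals must be between 0 and 100")
    if n_evals > 0 and validation_file is None:
        raise ValueError("--validation-file is required when --n-evals > 0")
    if not 1 <= n_checkpoints <= n_epochs:
        raise ValueError("--n-checkpoints must be between 1 and --n-epochs")
```

Note that with the default of one epoch, any --n-checkpoints value above 1 is rejected, so raising the checkpoint count also means raising --n-epochs.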
--batch-size/-b [integer | max]
Batch size for each training iteration. See the model page for min and max batch sizes per model. Default: max.
Ratio of the final learning rate to the peak learning rate. Default: 0.0. Min: 0.0. Max: 1.0.
--scheduler-num-cycles [float]
Number or fraction of cycles for the cosine learning rate scheduler. Must be non-negative. Default: 0.5.
--warmup-ratio [float]
Fraction of steps at the start of training to linearly warm up the learning rate. Default: 0.0. Min: 0.0. Max: 1.0.
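To see how the warmup ratio, the number of cosine cycles, and the final-to-peak ratio interact, here is a minimal sketch of a linear-warmup plus cosine-decay multiplier. The exact formula used by the service is not given in this section, so this follows the common convention for such schedulers and should be read as an assumption; lr_factor and the parameter name min_lr_ratio are hypothetical.

```python
import math


def lr_factor(
    step: int,
    total_steps: int,
    warmup_ratio: float = 0.0,
    num_cycles: float = 0.5,
    min_lr_ratio: float = 0.0,
) -> float:
    """Multiplier applied to the peak learning rate at a given step (assumed form)."""
    warmup_steps = round(warmup_ratio * total_steps)
    if step < warmup_steps:
        # Linear warmup from 0 up to the peak learning rate.
        return step / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = (step - warmup_steps) / decay_steps
    # num_cycles = 0.5 is half a cosine wave: a single decay from peak to floor.
    cosine = max(0.0, 0.5 * (1.0 + math.cos(2.0 * math.pi * num_cycles * progress)))
    # Rescale so the curve bottoms out at min_lr_ratio instead of zero.
    return min_lr_ratio + (1.0 - min_lr_ratio) * cosine
```

Under these assumptions, the defaults (0.5 cycles, ratio 0.0, no warmup) give a single smooth decay from the peak learning rate to zero over the whole run.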
--max-grad-norm [float]
Max gradient norm for gradient clipping. Set to 0 to disable. Default: 1.0. Min: 0.0.
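For readers unfamiliar with gradient clipping, the sketch below shows the usual global L2-norm form on a flat list of gradient values, including the "0 disables clipping" rule. Whether the service clips by global norm in exactly this way is an assumption; clip_by_global_norm is a hypothetical name.

```python
import math
from typing import List


def clip_by_global_norm(grads: List[float], max_grad_norm: float = 1.0) -> List[float]:
    """Scale gradients so their L2 norm does not exceed max_grad_norm."""
    if max_grad_norm == 0:
        # 0 disables clipping, matching the option description.
        return list(grads)
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_grad_norm:
        return list(grads)
    scale = max_grad_norm / norm
    return [g * scale for g in grads]
```

Clipping rescales every component by the same factor, so the gradient direction is preserved and only its magnitude is capped.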
--weight-decay [float]
Weight decay for the optimizer. Default: 0.0. Min: 0.0.
--random-seed [integer]
Random seed for reproducible training. Uses the server default if unset.
--confirm/-y
Skip the price-confirmation prompt. Useful in scripts and CI.
--train-on-inputs [true | false | auto]
Whether to train on user messages in conversational data or prompts in instruction data. When false, these inputs are masked and do not contribute to the loss.
auto infers from the data format:
Datasets with the "text" field (general format): inputs are not masked.
Datasets with the "messages" field (conversational format) or "prompt" and "completion" fields (instruction format): inputs are masked.
Default: auto.
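The auto rule above can be expressed as a short lookup on the fields of one dataset record. This is a sketch only: infer_train_on_inputs is a hypothetical helper, and the real inference happens on the service side.

```python
def infer_train_on_inputs(example: dict) -> bool:
    """Mirror the documented 'auto' rule for a single dataset record."""
    if "messages" in example or ("prompt" in example and "completion" in example):
        # Conversational or instruction format: inputs are masked.
        return False
    if "text" in example:
        # General format: inputs are not masked.
        return True
    raise ValueError("record does not match a documented dataset format")
```

A return value of False corresponds to masked inputs, so only the assistant messages or completions contribute to the loss.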
--train-vision/--no-train-vision
Update the vision encoder parameters. Default: false. Only available for vision-language models.
--from-hf-model [string]
Hugging Face Hub repository to start training from. Should match the base model’s architecture and size. When --lora is set with --lora-trainable-modules all-linear, the modules k_proj, o_proj, q_proj, v_proj are targeted for adapter training.
--hf-model-revision [string]
Revision (branch name or commit hash) of the Hugging Face Hub model.
--hf-api-token [string]
Hugging Face API token for downloading from a private repo or uploading the output model.
--hf-output-repo-name [string]
Hugging Face repo to upload the fine-tuned model to.
--lora/--no-lora
Force LoRA fine-tuning (--lora) or full fine-tuning (--no-lora). When omitted, the API auto-detects: it defaults to LoRA on most base models, and inherits the parent job’s training type when --from-checkpoint is set.
--training-method [sft | dpo]
Training method. sft is supervised fine-tuning; dpo is Direct Preference Optimization. Default: sft. The DPO method also accepts the RPO and SimPO loss modifiers below.
--dpo-beta [float]
Beta parameter for DPO training. Only used when --training-method dpo.
--dpo-normalize-logratios-by-length
Normalize logratios by sample length. Only used when --training-method dpo. Default: false.
--rpo-alpha [float]
RPO alpha parameter (adds NLL term to the DPO loss). Only used when --training-method dpo.
--simpo-gamma [float]
SimPO gamma parameter. Only used when --training-method dpo.
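To make the roles of beta, the RPO alpha, and the SimPO gamma more concrete, the sketch below computes a per-example preference loss from sequence log-probabilities. The beta-scaled log-ratio margin is the standard DPO objective; adding alpha times the chosen-response NLL follows the description above; treating gamma as a margin subtracted inside the sigmoid is an assumption about how the modifier is combined, not something this section confirms. All function and parameter names are hypothetical.

```python
import math
from typing import Optional


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def preference_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float,
    rpo_alpha: Optional[float] = None,
    simpo_gamma: Optional[float] = None,
    chosen_nll: float = 0.0,
) -> float:
    """Per-example DPO loss from sequence log-probabilities (illustrative only)."""
    # Log-ratio of policy to reference for each response, then their difference.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    logit = beta * margin
    if simpo_gamma is not None:
        # Assumed: gamma acts as a target margin inside the sigmoid.
        logit -= simpo_gamma
    loss = softplus(-logit)  # equals -log(sigmoid(logit))
    if rpo_alpha is not None:
        # RPO adds a weighted NLL term on the chosen response.
        loss += rpo_alpha * chosen_nll
    return loss
```

When the policy and reference agree exactly and no modifiers are set, the margin is zero and the loss is log 2, which is the value a freshly initialized DPO run typically starts near.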
The id field in the JSON response contains the fine-tune job ID (ft-…) that you use to retrieve status, list events, cancel the job, and download weights.
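A script that captures the JSON response might pull out the job ID as follows. The response body shown in the usage note is a made-up minimal example; real responses contain more fields, and job_id_from_response is a hypothetical helper.

```python
import json


def job_id_from_response(body: str) -> str:
    """Extract the fine-tune job ID from the JSON response text."""
    job_id = json.loads(body)["id"]
    # Job IDs are documented as starting with "ft-".
    if not isinstance(job_id, str) or not job_id.startswith("ft-"):
        raise ValueError(f"unexpected job ID: {job_id!r}")
    return job_id
```

For instance, job_id_from_response('{"id": "ft-example"}') returns the string to pass to the status, events, cancel, and download commands.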
Download a specific checkpoint’s weights. Defaults to the latest checkpoint.
--checkpoint-type/-c [merged | adapter | default]
Checkpoint type. merged and adapter apply to LoRA jobs only; default resolves to merged for LoRA jobs and to the full model for non-LoRA jobs. Default: merged.
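The resolution rule for the checkpoint type can be summarized in a few lines. This sketch uses the label "full" for the non-LoRA result purely for illustration; that label and the function name resolve_checkpoint_type are assumptions.

```python
def resolve_checkpoint_type(requested: str, is_lora: bool) -> str:
    """Return the effective checkpoint type for a job (illustrative only)."""
    if requested == "default":
        # default resolves to merged for LoRA jobs, the full model otherwise.
        return "merged" if is_lora else "full"
    if requested in ("merged", "adapter"):
        if not is_lora:
            raise ValueError(f"{requested!r} applies to LoRA jobs only")
        return requested
    raise ValueError(f"unknown checkpoint type: {requested!r}")
```

Requesting adapter for a LoRA job returns only the adapter weights, while merged returns the base model with the adapter folded in.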