Pricing

Together AI bills fine-tuning by the total number of tokens processed across training and validation. The per-token rate depends on three factors: the model size bracket, the training method (supervised or DPO), and the training type (LoRA or full fine-tuning). For current rates, see together.ai/pricing. After training, hosting on a dedicated endpoint is billed separately by the minute.

How tokens are counted

The total number of tokens processed in a job is:

total_tokens = (n_epochs × tokens_per_training_dataset) + (n_evals × tokens_per_validation_dataset)

Tokenization occurs shortly after the job starts. Once it completes, your final token count and price are calculated and recorded. They then appear on the fine-tuning jobs dashboard and in the response from client.fine_tuning.retrieve(id=<JOB_ID>). If you disable packing, training tokens are computed as dataset_length × max_seq_length instead.
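As a rough illustration of the formula above, the following sketch computes a job's token total for made-up dataset sizes. The function names and all numbers are hypothetical. It also assumes that, with packing disabled, dataset_length × max_seq_length stands in for tokens_per_training_dataset in the same formula. That is one reading of the text, not a confirmed billing rule.

```python
def total_tokens(n_epochs, tokens_per_training_dataset,
                 n_evals=0, tokens_per_validation_dataset=0):
    """Tokens processed by a job, following the formula in this section."""
    return (n_epochs * tokens_per_training_dataset
            + n_evals * tokens_per_validation_dataset)


def unpacked_training_tokens(dataset_length, max_seq_length):
    """Training tokens per pass when packing is disabled.

    Assumption: this value replaces tokens_per_training_dataset.
    """
    return dataset_length * max_seq_length


# Hypothetical job: 3 epochs over 1,000,000 training tokens,
# 3 evaluations over 50,000 validation tokens.
packed = total_tokens(3, 1_000_000, 3, 50_000)
print(packed)  # 3150000

# Same job with packing disabled: 2,000 examples at max_seq_length 4,096.
unpacked = total_tokens(3, unpacked_training_tokens(2_000, 4_096), 3, 50_000)
print(unpacked)  # 24726000
```

In this example, disabling packing raises the billed total because every example counts as a full max_seq_length sequence regardless of its actual length.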

Estimate job cost

There are three ways to estimate the cost of a fine-tuning job before launching it:

CLI: When you submit a job with tg fine-tuning create, the CLI prints the estimated price and asks for confirmation before the job is submitted.
Web interface: On the new fine-tuning job page, the estimate appears once you select a model and dataset.
API/SDK: Call the estimate price endpoint with the same parameters you plan to submit to the create job endpoint. The response includes the estimated total price, the estimated training and evaluation token counts, your credit limit, and whether you are allowed to proceed:

import os

from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

# Pass the same parameters you plan to use when creating the job.
estimate = client.fine_tuning.estimate_price(
    training_file="file-abc123",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
    n_epochs=3,
    training_method={"method": "sft"},
    training_type={"type": "Lora", "lora_r": 8},
)

print(estimate)

import Together from "together-ai";

const client = new Together({ apiKey: process.env.TOGETHER_API_KEY });

// Pass the same parameters you plan to use when creating the job.
const estimate = await client.fineTuning.estimatePrice({
  training_file: "file-abc123",
  model: "meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
  n_epochs: 3,
  training_method: { method: "sft" },
  training_type: { type: "Lora", lora_r: 8 },
});

console.log(estimate);

The cost estimate is only available after your input datasets pass server-side validation.

Cancelled and early-stopped jobs

When a running job is cancelled or stopped early, you pay for completed steps only. To check how many steps a job completed, retrieve it and read steps_completed:

tg fine-tuning retrieve <job-id> --json | jq '.steps_completed'

Failed jobs

If a job fails, for example because of invalid input or an internal error on Together AI's side, all charges are fully refunded, including charges for any completed steps.

Minimum spend

Fine-tuning jobs have a $4.00 minimum charge. Some models are exempt. See fine-tuning pricing for the current rates and exceptions.
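To see how the minimum interacts with a token-based price, here is a minimal sketch. It assumes the per-token rate is quoted in dollars per million tokens and that the minimum works as a simple floor on the job total. The rate used is a made-up placeholder, not a real Together AI price, and the function and its exempt_from_minimum flag are hypothetical.

```python
MINIMUM_CHARGE_USD = 4.00  # minimum charge stated in this section


def job_charge(total_tokens, usd_per_million_tokens, exempt_from_minimum=False):
    """Estimated charge for a completed job.

    Assumption: the minimum is applied as a floor on the usage-based cost.
    """
    usage_cost = total_tokens / 1_000_000 * usd_per_million_tokens
    if exempt_from_minimum:
        return usage_cost
    return max(usage_cost, MINIMUM_CHARGE_USD)


HYPOTHETICAL_RATE = 0.50  # placeholder rate in USD per 1M tokens

print(job_charge(3_150_000, HYPOTHETICAL_RATE))   # 4.0 (usage cost 1.575 is below the minimum)
print(job_charge(20_000_000, HYPOTHETICAL_RATE))  # 10.0
```

For real numbers, use the estimate price endpoint rather than a local calculation, since actual rates vary by model size, training method, and training type.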

Hosting charges

After training, your fine-tuned model can be served on a dedicated endpoint that bills per minute based on the hardware attached. These charges are separate from your fine-tuning job cost and continue until you stop or delete the endpoint. See deployment for the full setup and teardown flow.
