Reasoning fine-tuning

Reasoning fine-tuning adapts a model that supports chain-of-thought reasoning. By providing a reasoning (or reasoning_content) field alongside the final assistant response in your training data, you shape how the model thinks through problems before producing an answer. This page covers supported models, the reasoning data shape, and launch parameters.

Reasoning models should always be fine-tuned with reasoning data. Training a reasoning model without it can degrade its reasoning ability. If your dataset doesn’t include reasoning, use an instruct model instead.

Supported models

The following models support reasoning fine-tuning. See supported models for context lengths and batch limits.

Organization	Model	API ID
NVIDIA	NVIDIA Nemotron 3 Nano Omni 30B A3B Reasoning BF16	`nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-BF16`
Qwen	Qwen3.5 397B A17B	`Qwen/Qwen3.5-397B-A17B`
Qwen	Qwen3.5 122B A10B	`Qwen/Qwen3.5-122B-A10B`
Qwen	Qwen3.5 35B A3B	`Qwen/Qwen3.5-35B-A3B`
Qwen	Qwen3.5 35B A3B Base	`Qwen/Qwen3.5-35B-A3B-Base`
Qwen	Qwen3.5 27B	`Qwen/Qwen3.5-27B`
Qwen	Qwen3.5 9B	`Qwen/Qwen3.5-9B`
Qwen	Qwen3.5 4B	`Qwen/Qwen3.5-4B`
Qwen	Qwen3.5 2B	`Qwen/Qwen3.5-2B`
Qwen	Qwen3.5 0.8B	`Qwen/Qwen3.5-0.8B`
Qwen	Qwen3.6 35B A3B	`Qwen/Qwen3.6-35B-A3B`
Qwen	Qwen3 Next 80B A3B Thinking	`Qwen/Qwen3-Next-80B-A3B-Thinking`
Qwen	Qwen3 0.6B	`Qwen/Qwen3-0.6B`
Qwen	Qwen3 0.6B Base	`Qwen/Qwen3-0.6B-Base`
Qwen	Qwen3 1.7B	`Qwen/Qwen3-1.7B`
Qwen	Qwen3 1.7B Base	`Qwen/Qwen3-1.7B-Base`
Qwen	Qwen3 4B	`Qwen/Qwen3-4B`
Qwen	Qwen3 4B Base	`Qwen/Qwen3-4B-Base`
Qwen	Qwen3 8B	`Qwen/Qwen3-8B`
Qwen	Qwen3 8B Base	`Qwen/Qwen3-8B-Base`
Qwen	Qwen3 14B	`Qwen/Qwen3-14B`
Qwen	Qwen3 14B Base	`Qwen/Qwen3-14B-Base`
Qwen	Qwen3 32B	`Qwen/Qwen3-32B`
Qwen	Qwen3 30B A3B Base	`Qwen/Qwen3-30B-A3B-Base`
Qwen	Qwen3 30B A3B	`Qwen/Qwen3-30B-A3B`
Qwen	Qwen3 235B A22B	`Qwen/Qwen3-235B-A22B`
Z.ai	GLM 5.1	`zai-org/GLM-5.1`
Z.ai	GLM 5	`zai-org/GLM-5`
Z.ai	GLM 4.7	`zai-org/GLM-4.7`
Z.ai	GLM 4.6	`zai-org/GLM-4.6`
OpenAI	GPT-OSS 20B	`openai/gpt-oss-20b`
OpenAI	GPT-OSS 120B	`openai/gpt-oss-120b`
Google	Gemma 4 31B IT	`google/gemma-4-31B-it`
Google	Gemma 4 31B IT VLM	`google/gemma-4-31B-it-VLM`
Google	Gemma 4 26B A4B IT	`google/gemma-4-26B-A4B-it`

Prepare your data

Prepare data in a JSONL file. Each assistant message should carry the chain of thought in a reasoning (or reasoning_content) field and the final answer in content.

Conversational format

{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"},
    {
      "role": "assistant",
      "reasoning": "The user is asking about the capital of France. France is a country in Western Europe. Its capital city is Paris, which has been the capital since the 10th century.",
      "content": "The capital of France is Paris."
    }
  ]
}
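
The record above is pretty-printed for readability; in a JSONL file each record occupies a single line. A minimal sketch of building such records in Python follows. The helper names `make_example` and `to_jsonl` are hypothetical and not part of any SDK; only the `messages`, `role`, `reasoning`, and `content` keys come from the format shown above.

```python
import json


def make_example(question: str, reasoning: str, answer: str) -> dict:
    """Build one conversational example (hypothetical helper, not an SDK call)."""
    return {
        "messages": [
            {"role": "user", "content": question},
            # The chain of thought goes in "reasoning", the final answer in "content".
            {"role": "assistant", "reasoning": reasoning, "content": answer},
        ]
    }


def to_jsonl(examples: list[dict]) -> str:
    """Serialize examples as JSONL: one compact JSON object per line."""
    return "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples) + "\n"


examples = [
    make_example(
        "What is the capital of France?",
        "France is in Western Europe. Its capital is Paris.",
        "The capital of France is Paris.",
    )
]
jsonl_text = to_jsonl(examples)
```

Writing `jsonl_text` to a file such as reasoning_dataset.jsonl with UTF-8 encoding gives a file in the shape this page describes.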

When fine-tuning reasoning models on conversational data, only the last assistant message is trained on by default. For multi-turn reasoning, split the conversation so each assistant message is the final message in its own example.
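
One way to do that split is sketched below. `split_multi_turn` is a hypothetical helper: it emits one example per assistant turn, each containing the conversation up to and including that turn. Earlier assistant messages are left unchanged in the context; whether their reasoning fields should be kept or removed there is not stated on this page, so treat that choice as an assumption.

```python
def split_multi_turn(messages: list[dict]) -> list[dict]:
    """Return one example per assistant turn, each ending on that turn.

    Hypothetical helper. Prior turns are copied as-is (an assumption).
    """
    examples = []
    for i, msg in enumerate(messages):
        if msg.get("role") == "assistant":
            # Slice up to and including this assistant message.
            examples.append({"messages": messages[: i + 1]})
    return examples


conversation = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "reasoning": "2 plus 2 is 4.", "content": "4"},
    {"role": "user", "content": "And times 3?"},
    {"role": "assistant", "reasoning": "4 times 3 is 12.", "content": "12"},
]
split_examples = split_multi_turn(conversation)
```

Here a two-turn conversation becomes two training examples, so both assistant replies end up as a final message that is trained on.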

Preference format

For preference fine-tuning, both outputs carry reasoning. See preference tuning for the broader DPO workflow.

{
  "input": {
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  },
  "preferred_output": [
    {
      "role": "assistant",
      "reasoning": "France is in Western Europe. Its capital is Paris.",
      "content": "The capital of France is Paris."
    }
  ],
  "non_preferred_output": [
    {
      "role": "assistant",
      "reasoning": "Let me think about European capitals.",
      "content": "The capital of France is Berlin."
    }
  ]
}
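
Since both outputs are expected to carry reasoning, a quick client-side check can catch records where one side lacks it. The sketch below is an illustration only: `check_preference_record` is a hypothetical helper, not the platform's validator, and it checks nothing beyond the keys shown in the example above.

```python
def _has_reasoning(msg: dict) -> bool:
    """True if the message has a non-empty reasoning or reasoning_content field."""
    return bool(msg.get("reasoning") or msg.get("reasoning_content"))


def check_preference_record(record: dict) -> list[str]:
    """Return a list of problems found in one preference record (empty if none)."""
    problems = []
    if not record.get("input", {}).get("messages"):
        problems.append("input.messages is missing or empty")
    for key in ("preferred_output", "non_preferred_output"):
        outputs = record.get(key)
        if not isinstance(outputs, list) or not outputs:
            problems.append(f"{key}: expected a non-empty list")
            continue
        for msg in outputs:
            if msg.get("role") != "assistant":
                problems.append(f"{key}: message role is not 'assistant'")
            elif not _has_reasoning(msg):
                problems.append(f"{key}: assistant message has no reasoning")
    return problems
```

An empty return value means the record has the shape shown above; it says nothing about the quality of the reasoning itself.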

Validate and upload

Upload your data using the Together Python/TypeScript SDK or the Together CLI:

from together import Together

client = Together()

train_file = client.files.upload(
    file="reasoning_dataset.jsonl",
    purpose="fine-tune",
    check=True,
)
print(train_file.id)

import Together from "together-ai";
import fs from "node:fs";

const client = new Together();

const trainFile = await client.files.upload({
  file: fs.createReadStream("reasoning_dataset.jsonl"),
  purpose: "fine-tune",
});
console.log(trainFile.id);

tg files check "reasoning_dataset.jsonl"
tg files upload "reasoning_dataset.jsonl"
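
Before uploading, it can also help to scan a conversational dataset locally for records whose final assistant message has no reasoning, since training a reasoning model on such data can degrade it. The sketch below is a hypothetical pre-check that complements the built-in checks rather than replacing them; what `check=True` and `tg files check` verify is not described on this page, so no overlap is assumed.

```python
import json
from typing import Iterable


def find_missing_reasoning(lines: Iterable[str]) -> list[tuple[int, str]]:
    """Return (line_number, problem) pairs for a conversational JSONL stream."""
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((lineno, f"invalid JSON: {exc.msg}"))
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append((lineno, "no messages list"))
            continue
        last = messages[-1]
        if not isinstance(last, dict) or last.get("role") != "assistant":
            problems.append((lineno, "last message is not from the assistant"))
        elif not (last.get("reasoning") or last.get("reasoning_content")):
            problems.append((lineno, "final assistant message has no reasoning"))
    return problems
```

The function accepts any iterable of lines, so an open file handle for reasoning_dataset.jsonl can be passed in directly.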

Launch the job

LoRA is the default. Pass lora=False for full fine-tuning.

job = client.fine_tuning.create(
    training_file=train_file.id,
    model="Qwen/Qwen3-8B",
    lora=True,
)
print(job.id)

const job = await client.fineTuning.create({
  training_file: trainFile.id,
  model: "Qwen/Qwen3-8B",
  lora: true,
});
console.log(job.id);

tg fine-tuning create \
  --training-file "<FILE_ID>" \
  --model "Qwen/Qwen3-8B" \
  --lora

For details on every available parameter, see the API reference.

Watch and deploy

Reasoning jobs use the same lifecycle as text jobs:

1. Poll the job with the SDK or CLI. Expect 10 to 30 minutes for a LoRA job on an 8B model with a few thousand examples.
2. Deploy the result on a dedicated endpoint.
3. Call the endpoint with the same chat-completions shape. The model emits reasoning_content alongside content for clients that surface it. See Inference → Reasoning for details.
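
The polling step can be sketched as a small loop. This is a generic helper under stated assumptions, not the SDK's own API: `wait_for_job` and `TERMINAL_STATUSES` are hypothetical names, and the status strings are assumed values that should be checked against the API reference. The status lookup is passed in as a callable so the loop stays independent of any particular client.

```python
import time
from typing import Callable

# Assumed terminal status strings; confirm the real values in the API reference.
TERMINAL_STATUSES = {"completed", "error", "cancelled"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a terminal status or the timeout passes."""
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still '{status}' after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

With the Python SDK, `get_status` would wrap whatever call returns the job's current state for `job.id`; the exact method name and the form of the returned status are assumptions to verify before relying on this.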
