Introduction

Reasoning fine-tuning allows you to adapt models that support chain-of-thought reasoning. By providing reasoning or reasoning_content fields alongside assistant responses, you can shape how a model thinks through problems before producing an answer. This guide covers the specific steps for reasoning fine-tuning. For general fine-tuning concepts, environment setup, and hyperparameter details, refer to the Fine-tuning Guide.

Reasoning Dataset

Dataset Requirements:
  • Format: .jsonl file
  • Supported types: Conversational and Preferential — their purposes are covered in the Fine-tuning Guide
  • Assistant messages support a reasoning or reasoning_content field containing the model’s chain of thought
  • The content field contains the final response shown to the user
Reasoning models should always be fine-tuned with reasoning data. Training without it can degrade the model’s reasoning ability. If your dataset doesn’t include reasoning, use an instruct model instead.
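Before training, it is worth verifying that every assistant message in your dataset actually carries a reasoning field. A minimal sketch of such a check (the helper name is ours, not part of any SDK), assuming one conversational row as a Python dict:

```python
def has_reasoning(row: dict) -> bool:
    """Return True if every assistant message in a conversational
    row carries a reasoning or reasoning_content field."""
    return all(
        "reasoning" in msg or "reasoning_content" in msg
        for msg in row["messages"]
        if msg["role"] == "assistant"
    )

good = {"messages": [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "reasoning": "A greeting.", "content": "Hello!"},
]}
bad = {"messages": [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
]}

print(has_reasoning(good), has_reasoning(bad))  # True False
```

Rows that fail this check are candidates for an instruct model instead.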

Conversation Reasoning Format

This is what one row/example from the reasoning dataset looks like in conversation format:
{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"},
    {
      "role": "assistant",
      "reasoning": "The user is asking about the capital of France. France is a country in Western Europe. Its capital city is Paris, which has been the capital since the 10th century.",
      "content": "The capital of France is Paris."
    }
  ]
}
When fine-tuning reasoning models on conversational data, only the last assistant message is trained on by default. For multi-turn reasoning, split the conversation so each assistant message is the final message in its own conversation.
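Since only the last assistant message is trained on, the split described above can be automated. A small sketch (the function name is ours) that emits one training example per assistant turn:

```python
def split_multiturn(messages):
    """Emit one training example per assistant message, each ending
    the conversation at that assistant turn."""
    return [
        {"messages": messages[: i + 1]}
        for i, msg in enumerate(messages)
        if msg["role"] == "assistant"
    ]

conversation = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "reasoning": "2 + 2 equals 4.", "content": "4"},
    {"role": "user", "content": "And doubled?"},
    {"role": "assistant", "reasoning": "4 doubled is 8.", "content": "8"},
]

examples = split_multiturn(conversation)
print(len(examples))  # 2
```

Each emitted example keeps the full preceding context, so no earlier turns are lost.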

Preference Reasoning Format

{
  "input": {
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  },
  "preferred_output": [
    {
      "role": "assistant",
      "reasoning": "The user is asking about the capital of France. France is a country in Western Europe. Its capital city is Paris.",
      "content": "The capital of France is Paris."
    }
  ],
  "non_preferred_output": [
    {
      "role": "assistant",
      "reasoning": "Hmm, let me think about European capitals.",
      "content": "The capital of France is Berlin."
    }
  ]
}
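A lightweight structural check for preference rows can catch malformed data before upload. This is our own helper, not an official validator, and it only verifies the shape shown above:

```python
def check_preference_row(row: dict) -> bool:
    """Verify the basic shape of one preference-format row."""
    if "messages" not in row.get("input", {}):
        return False
    for key in ("preferred_output", "non_preferred_output"):
        outputs = row.get(key)
        if not outputs:
            return False
        for msg in outputs:
            if msg.get("role") != "assistant" or "content" not in msg:
                return False
    return True

row = {
    "input": {"messages": [{"role": "user", "content": "What is the capital of France?"}]},
    "preferred_output": [{"role": "assistant", "reasoning": "Paris is the capital.",
                          "content": "The capital of France is Paris."}],
    "non_preferred_output": [{"role": "assistant", "reasoning": "Hmm, European capitals.",
                              "content": "The capital of France is Berlin."}],
}
print(check_preference_row(row))  # True
```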

Supported Models

The following models support reasoning fine-tuning:
| Organization | Model Name | Model String for API |
|---|---|---|
| Qwen | Qwen 3 0.6B Base | Qwen/Qwen3-0.6B-Base |
| Qwen | Qwen 3 0.6B | Qwen/Qwen3-0.6B |
| Qwen | Qwen 3 1.7B Base | Qwen/Qwen3-1.7B-Base |
| Qwen | Qwen 3 1.7B | Qwen/Qwen3-1.7B |
| Qwen | Qwen 3 4B Base | Qwen/Qwen3-4B-Base |
| Qwen | Qwen 3 4B | Qwen/Qwen3-4B |
| Qwen | Qwen 3 8B Base | Qwen/Qwen3-8B-Base |
| Qwen | Qwen 3 8B | Qwen/Qwen3-8B |
| Qwen | Qwen 3 14B Base | Qwen/Qwen3-14B-Base |
| Qwen | Qwen 3 14B | Qwen/Qwen3-14B |
| Qwen | Qwen 3 32B | Qwen/Qwen3-32B |
| Qwen | Qwen 3 32B 16k | Qwen/Qwen3-32B-16k |
| Qwen | Qwen 3 30B A3B Base | Qwen/Qwen3-30B-A3B-Base |
| Qwen | Qwen 3 30B A3B | Qwen/Qwen3-30B-A3B |
| Qwen | Qwen 3 235B A22B | Qwen/Qwen3-235B-A22B |
| Qwen | Qwen 3 Next 80B A3B Thinking | Qwen/Qwen3-Next-80B-A3B-Thinking |
| Z.ai | GLM 4.6 | zai-org/GLM-4.6 |
| Z.ai | GLM 4.7 | zai-org/GLM-4.7 |

Check and Upload Dataset

To upload your data, use the CLI or our Python library:
together files check "reasoning_dataset.jsonl"

together files upload "reasoning_dataset.jsonl"
You’ll see the following output once the upload finishes:
{
  "id": "file-629e58b4-ff73-438c-b2cc-f69542b27980",
  "object": "file",
  "created_at": 1732573871,
  "type": null,
  "purpose": "fine-tune",
  "filename": "reasoning_dataset.jsonl",
  "bytes": 0,
  "line_count": 0,
  "processed": false,
  "FileType": "jsonl"
}
You’ll be using your file’s ID (the string that begins with file-) to start your fine-tuning job, so store it somewhere before moving on.
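If you capture the upload response as JSON, the ID can be pulled out programmatically. A small sketch; the fields below mirror the sample output above:

```python
import json

upload_response = """{
  "id": "file-629e58b4-ff73-438c-b2cc-f69542b27980",
  "object": "file",
  "purpose": "fine-tune",
  "filename": "reasoning_dataset.jsonl"
}"""

file_id = json.loads(upload_response)["id"]
assert file_id.startswith("file-")
print(file_id)  # file-629e58b4-ff73-438c-b2cc-f69542b27980
```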

Starting a Fine-tuning Job

We support both LoRA and full fine-tuning for reasoning models. For an exhaustive list of the available fine-tuning parameters, refer to the Together AI Fine-tuning API Reference.

LoRA Fine-tuning

together fine-tuning create \
  --training-file "file-629e58b4-ff73-438c-b2cc-f69542b27980" \
  --model "Qwen/Qwen3-8B" \
  --lora

Full Fine-tuning

together fine-tuning create \
  --training-file "file-629e58b4-ff73-438c-b2cc-f69542b27980" \
  --model "Qwen/Qwen3-8B" \
  --no-lora
You can specify many more fine-tuning parameters to customize your job. See the full list of hyperparameters and their definitions here.
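The same job can be launched from Python. A hedged sketch: the dict below simply mirrors the CLI flags, and the commented-out call assumes the SDK's fine_tuning.create method accepts these keyword arguments:

```python
def job_params(training_file: str, model: str, use_lora: bool = True) -> dict:
    """Mirror the CLI flags as keyword arguments for the Python SDK."""
    return {"training_file": training_file, "model": model, "lora": use_lora}

params = job_params("file-629e58b4-ff73-438c-b2cc-f69542b27980", "Qwen/Qwen3-8B")

# Assuming the SDK exposes fine_tuning.create (requires TOGETHER_API_KEY):
# from together import Together
# client = Together()
# job = client.fine_tuning.create(**params)
print(params["model"])  # Qwen/Qwen3-8B
```

Passing use_lora=False corresponds to the --no-lora flag above.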

Monitoring Your Fine-tuning Job

Fine-tuning can take time depending on the model size, dataset size, and hyperparameters. Your job will progress through several states: Pending, Queued, Running, Uploading, and Completed.

Dashboard Monitoring

You can monitor your job on the Together AI jobs dashboard.

Check Status via API
together fine-tuning retrieve "your-job-id"

together fine-tuning list-events "your-job-id"
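In a script, you would typically poll the job status until it reaches the final state. A sketch built from the states listed above; the SDK call is left as a commented assumption:

```python
STATES = ["Pending", "Queued", "Running", "Uploading", "Completed"]

def is_finished(status: str) -> bool:
    """A job is done once it reaches the final listed state."""
    return status == "Completed"

# Assuming the SDK mirrors the CLI retrieve command (needs TOGETHER_API_KEY):
# import time
# from together import Together
# client = Together()
# while not is_finished(client.fine_tuning.retrieve("your-job-id").status):
#     time.sleep(60)

for state in STATES:
    print(state, is_finished(state))
```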

Using Your Fine-tuned Model

Once your fine-tuning job completes, your model will be available for use. You can view your fine-tuned models in your models dashboard.

Dedicated Endpoint Deployment

You can now deploy your fine-tuned model on a dedicated endpoint for production use:
  1. Visit your models dashboard
  2. Find your fine-tuned model and click “+ CREATE DEDICATED ENDPOINT”
  3. Select your hardware configuration and scaling options
  4. Click “DEPLOY”
You can also deploy programmatically:
import os
from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

response = client.endpoints.create(
    display_name="Fine-tuned Qwen3-8B Reasoning",
    model="your-username/Qwen3-8B-your-suffix",
    hardware="4x_nvidia_h100_80gb_sxm",
    autoscaling={"min_replicas": 1, "max_replicas": 1},
)

print(response)
Running this code will deploy a dedicated endpoint, which incurs charges. For detailed documentation on how to deploy, delete, and modify endpoints, see the Endpoints API Reference. For a more detailed walkthrough, read How-to: Fine-tuning.