Introduction

Reasoning fine-tuning allows you to adapt models that support chain-of-thought reasoning. By providing reasoning or reasoning_content fields alongside assistant responses, you can shape how a model thinks through problems before producing an answer. This guide covers the specific steps for reasoning fine-tuning. For general fine-tuning concepts, environment setup, and hyperparameter details, refer to the Fine-tuning Guide.

Reasoning Dataset

Dataset Requirements:
  • Format: .jsonl file
  • Supported types: Conversational, Preferential — see the Fine-tuning Guide for details on when to use each
  • Assistant messages support a reasoning or reasoning_content field containing the model’s chain of thought
  • The content field contains the final response shown to the user
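To make these requirements concrete, here is a small stdlib-only sketch of a row validator. The field names match the spec above; the helper itself is illustrative, and it enforces that every assistant turn carries reasoning, which you would only want when your whole dataset is meant to include chain-of-thought:

```python
import json

def validate_reasoning_row(line: str) -> bool:
    """Check one JSONL line against the conversational reasoning format."""
    row = json.loads(line)
    messages = row.get("messages")
    if not isinstance(messages, list) or not messages:
        return False
    for msg in messages:
        if msg.get("role") == "assistant":
            # Assistant turns need a final response in "content", and carry
            # chain-of-thought in "reasoning" or "reasoning_content".
            if "content" not in msg:
                return False
            if "reasoning" not in msg and "reasoning_content" not in msg:
                return False
    return True

row = '{"messages": [{"role": "user", "content": "Hi"}, {"role": "assistant", "reasoning": "Greeting.", "content": "Hello!"}]}'
print(validate_reasoning_row(row))  # → True
```

Running a check like this over every line before upload catches format errors early, before the platform-side validation does.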

Conversation Reasoning Format

Here is what a single example (one row) from a reasoning dataset looks like in conversational format:
{
  "messages": [
    {"role": "user", "content": "What is the capital of France?"},
    {
      "role": "assistant",
      "reasoning": "The user is asking about the capital of France. France is a country in Western Europe. Its capital city is Paris, which has been the capital since the 10th century.",
      "content": "The capital of France is Paris."
    }
  ]
}
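Rows like the one above can also be generated programmatically. A minimal sketch that serializes conversational reasoning examples into a .jsonl file (the file name and helper function are illustrative):

```python
import json

def make_row(question: str, reasoning: str, answer: str) -> dict:
    """Build one conversational reasoning example."""
    return {
        "messages": [
            {"role": "user", "content": question},
            {"role": "assistant", "reasoning": reasoning, "content": answer},
        ]
    }

rows = [
    make_row(
        "What is the capital of France?",
        "France is a country in Western Europe; its capital is Paris.",
        "The capital of France is Paris.",
    ),
]

# One JSON object per line, as required by the .jsonl format.
with open("reasoning_dataset.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```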

Preference Reasoning Format

In preference format, each example pairs a preferred assistant response with a non-preferred one for the same input; both outputs can include reasoning:
{
  "input": {
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  },
  "preferred_output": [
    {
      "role": "assistant",
      "reasoning": "The user is asking about the capital of France. France is a country in Western Europe. Its capital city is Paris.",
      "content": "The capital of France is Paris."
    }
  ],
  "non_preferred_output": [
    {
      "role": "assistant",
      "reasoning": "Hmm, let me think about European capitals.",
      "content": "The capital of France is Berlin."
    }
  ]
}
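Since both dataset types use the same .jsonl container, it can help to detect which format a given row uses before processing it. A small illustrative helper:

```python
def detect_format(row: dict) -> str:
    """Classify a dataset row as conversational or preference format."""
    if "preferred_output" in row and "non_preferred_output" in row:
        return "preference"
    if "messages" in row:
        return "conversational"
    return "unknown"

pref = {
    "input": {"messages": [{"role": "user", "content": "Capital of France?"}]},
    "preferred_output": [{"role": "assistant", "reasoning": "Paris.", "content": "Paris."}],
    "non_preferred_output": [{"role": "assistant", "reasoning": "Hmm.", "content": "Berlin."}],
}
conv = {"messages": [{"role": "user", "content": "Hi"}]}

print(detect_format(pref))  # → preference
print(detect_format(conv))  # → conversational
```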

Supported Models

The following models support reasoning fine-tuning:
| Organization | Model Name | Model String for API |
|---|---|---|
| Qwen | Qwen 3 0.6B Base | Qwen/Qwen3-0.6B-Base |
| Qwen | Qwen 3 0.6B | Qwen/Qwen3-0.6B |
| Qwen | Qwen 3 1.7B Base | Qwen/Qwen3-1.7B-Base |
| Qwen | Qwen 3 1.7B | Qwen/Qwen3-1.7B |
| Qwen | Qwen 3 4B Base | Qwen/Qwen3-4B-Base |
| Qwen | Qwen 3 4B | Qwen/Qwen3-4B |
| Qwen | Qwen 3 8B Base | Qwen/Qwen3-8B-Base |
| Qwen | Qwen 3 8B | Qwen/Qwen3-8B |
| Qwen | Qwen 3 14B Base | Qwen/Qwen3-14B-Base |
| Qwen | Qwen 3 14B | Qwen/Qwen3-14B |
| Qwen | Qwen 3 32B | Qwen/Qwen3-32B |
| Qwen | Qwen 3 32B 16k | Qwen/Qwen3-32B-16k |
| Qwen | Qwen 3 30B A3B Base | Qwen/Qwen3-30B-A3B-Base |
| Qwen | Qwen 3 30B A3B | Qwen/Qwen3-30B-A3B |
| Qwen | Qwen 3 235B A22B | Qwen/Qwen3-235B-A22B |
| Qwen | Qwen 3 Next 80B A3B Thinking | Qwen/Qwen3-Next-80B-A3B-Thinking |
| Z.ai | GLM 4.6 | zai-org/GLM-4.6 |
| Z.ai | GLM 4.7 | zai-org/GLM-4.7 |

Check and Upload Dataset

To upload your data, use the CLI or our Python library:
together files check "reasoning_dataset.jsonl"

together files upload "reasoning_dataset.jsonl"
You’ll see the following output once the upload finishes:
{
  "id": "file-629e58b4-ff73-438c-b2cc-f69542b27980",
  "object": "file",
  "created_at": 1732573871,
  "type": null,
  "purpose": "fine-tune",
  "filename": "reasoning_dataset.jsonl",
  "bytes": 0,
  "line_count": 0,
  "processed": false,
  "FileType": "jsonl"
}
You’ll be using your file’s ID (the string that begins with file-) to start your fine-tuning job, so store it somewhere before moving on.
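If you capture that JSON output in a script, the piece you need later is the id field. A quick stdlib sketch of extracting it (the response text below is the example shown above):

```python
import json

# Example upload response, as printed by the CLI above.
upload_output = """
{
  "id": "file-629e58b4-ff73-438c-b2cc-f69542b27980",
  "object": "file",
  "purpose": "fine-tune",
  "filename": "reasoning_dataset.jsonl"
}
"""

file_id = json.loads(upload_output)["id"]
assert file_id.startswith("file-")  # fine-tuning jobs reference this ID
print(file_id)
```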

Starting a Fine-tuning Job

We support both LoRA and full fine-tuning for reasoning models. For an exhaustive list of all the available fine-tuning parameters, refer to the Together AI Fine-tuning API Reference.

LoRA Fine-tuning

together fine-tuning create \
  --training-file "file-629e58b4-ff73-438c-b2cc-f69542b27980" \
  --model "Qwen/Qwen3-8B" \
  --lora

Full Fine-tuning

together fine-tuning create \
  --training-file "file-629e58b4-ff73-438c-b2cc-f69542b27980" \
  --model "Qwen/Qwen3-8B" \
  --no-lora
You can specify many more fine-tuning parameters to customize your job; see the Together AI Fine-tuning API Reference for the full list of hyperparameters and their definitions.
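The CLI flags above map onto a job configuration. Here is a hedged sketch of that configuration as a Python dict; the training_file and model values come from this guide, while the hyperparameter names and values below are illustrative placeholders, not recommendations:

```python
# Illustrative job configuration mirroring the CLI flags shown above.
job_config = {
    "training_file": "file-629e58b4-ff73-438c-b2cc-f69542b27980",
    "model": "Qwen/Qwen3-8B",
    "lora": True,           # False corresponds to --no-lora (full fine-tuning)
    "n_epochs": 3,          # assumed hyperparameter name and placeholder value
    "learning_rate": 1e-5,  # assumed hyperparameter name and placeholder value
}

print(job_config["model"])  # → Qwen/Qwen3-8B
```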

Monitoring Your Fine-tuning Job

Fine-tuning can take time depending on the model size, dataset size, and hyperparameters. Your job will progress through several states: Pending, Queued, Running, Uploading, and Completed.

Dashboard Monitoring

You can monitor your job on the Together AI jobs dashboard.

Check Status via API
together fine-tuning retrieve "your-job-id"

together fine-tuning list-events "your-job-id"
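The job states listed above form a simple lifecycle, which suggests a straightforward polling rule. A sketch (state names are taken from this guide; the helper itself is illustrative):

```python
# Job states, per this guide.
IN_PROGRESS_STATES = {"Pending", "Queued", "Running", "Uploading"}
TERMINAL_STATES = {"Completed"}

def should_keep_polling(status: str) -> bool:
    """Return True while the job is still in progress."""
    return status in IN_PROGRESS_STATES

print(should_keep_polling("Running"))    # → True
print(should_keep_polling("Completed"))  # → False
```

In practice you would call this on the status returned by the retrieve command above, sleeping between checks.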

Using Your Fine-tuned Model

Once your fine-tuning job completes, your model will be available for use. You can view your fine-tuned models in your models dashboard.

Dedicated Endpoint Deployment

You can now deploy your fine-tuned model on a dedicated endpoint for production use:
  1. Visit your models dashboard
  2. Find your fine-tuned model and click “+ CREATE DEDICATED ENDPOINT”
  3. Select your hardware configuration and scaling options
  4. Click “DEPLOY”
You can also deploy programmatically:
import os
from together import Together

client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

response = client.endpoints.create(
    display_name="Fine-tuned Qwen3-8B Reasoning",
    model="your-username/Qwen3-8B-your-suffix",
    hardware="4x_nvidia_h100_80gb_sxm",
    autoscaling={"min_replicas": 1, "max_replicas": 1},
)

print(response)
Running this code will deploy a dedicated endpoint, which incurs charges. For detailed documentation on deploying, deleting, and modifying endpoints, see the Endpoints API Reference. For a complete walkthrough, read How-to: Fine-tuning.