Introduction
Reasoning fine-tuning allows you to adapt models that support chain-of-thought reasoning. By providing `reasoning` or `reasoning_content` fields alongside assistant responses, you can shape how a model thinks through problems before producing an answer.
This guide covers the specific steps for reasoning fine-tuning. For general fine-tuning concepts, environment setup, and hyperparameter details, refer to the Fine-tuning Guide.
Quick Links
- Dataset Requirements
- Supported Models
- Check and Upload Dataset
- Start a Fine-tuning Job
- Monitor Progress
- Deploy Your Model
Reasoning Dataset
Dataset Requirements:
- Format: `.jsonl` file
- Supported types: Conversational, Preferential — more details on their purpose here
- Assistant messages support a `reasoning` or `reasoning_content` field containing the model’s chain of thought
- The `content` field contains the final response shown to the user
Conversation Reasoning Format
This is what one row/example from the reasoning dataset looks like in conversation format:
Preference Reasoning Format
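The exact row schemas should be confirmed against the dataset documentation; the `reasoning`/`reasoning_content` and `content` field names come from this guide, while the surrounding message and preference structure below is an assumption. With that caveat, a sketch of one conversational row and one preferential row, serialized as JSONL:

```python
import json

# Conversational row: the assistant message carries its chain of thought
# in `reasoning` and the user-visible answer in `content`.
conversational_row = {
    "messages": [
        {"role": "user", "content": "What is 17 * 3?"},
        {
            "role": "assistant",
            "reasoning": "17 * 3 = 17 * 2 + 17 = 34 + 17 = 51.",
            "content": "17 * 3 = 51.",
        },
    ]
}

# Preferential row (hypothetical key names): a preferred and a
# non-preferred completion for the same input.
preference_row = {
    "input": {"messages": [{"role": "user", "content": "What is 17 * 3?"}]},
    "preferred_output": [
        {"role": "assistant", "reasoning": "17 * 3 = 51.", "content": "51"}
    ],
    "non_preferred_output": [
        {"role": "assistant", "reasoning": "Just guessing.", "content": "50"}
    ],
}

# A .jsonl file is simply one JSON object per line.
jsonl = "\n".join(json.dumps(r) for r in (conversational_row, preference_row))
print(jsonl.count("\n") + 1)  # 2 lines -> 2 training examples
```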
Supported Models
The following models support reasoning fine-tuning:| Organization | Model Name | Model String for API |
|---|---|---|
| Qwen | Qwen 3 0.6B Base | Qwen/Qwen3-0.6B-Base |
| Qwen | Qwen 3 0.6B | Qwen/Qwen3-0.6B |
| Qwen | Qwen 3 1.7B Base | Qwen/Qwen3-1.7B-Base |
| Qwen | Qwen 3 1.7B | Qwen/Qwen3-1.7B |
| Qwen | Qwen 3 4B Base | Qwen/Qwen3-4B-Base |
| Qwen | Qwen 3 4B | Qwen/Qwen3-4B |
| Qwen | Qwen 3 8B Base | Qwen/Qwen3-8B-Base |
| Qwen | Qwen 3 8B | Qwen/Qwen3-8B |
| Qwen | Qwen 3 14B Base | Qwen/Qwen3-14B-Base |
| Qwen | Qwen 3 14B | Qwen/Qwen3-14B |
| Qwen | Qwen 3 32B | Qwen/Qwen3-32B |
| Qwen | Qwen 3 32B 16k | Qwen/Qwen3-32B-16k |
| Qwen | Qwen 3 30B A3B Base | Qwen/Qwen3-30B-A3B-Base |
| Qwen | Qwen 3 30B A3B | Qwen/Qwen3-30B-A3B |
| Qwen | Qwen 3 235B A22B | Qwen/Qwen3-235B-A22B |
| Qwen | Qwen 3 Next 80B A3B Thinking | Qwen/Qwen3-Next-80B-A3B-Thinking |
| Z.ai | GLM 4.6 | zai-org/GLM-4.6 |
| Z.ai | GLM 4.7 | zai-org/GLM-4.7 |
Check and Upload Dataset
To upload your data, use the CLI or our Python library. The upload returns a file ID (beginning with file-) that you'll need to start your fine-tuning job, so store it somewhere before moving on.
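Before uploading, it can help to sanity-check the file locally. The sketch below is not the official validator that the CLI runs on upload; it is a stdlib-only pre-check based on the dataset requirements listed above:

```python
import json

def check_reasoning_jsonl(lines):
    """Lightweight local pre-check (an assumption-based sketch, not the
    official upload-time validator): every line must be valid JSON, and
    every assistant message should carry a final `content` field; the
    `reasoning`/`reasoning_content` chain of thought is optional."""
    problems = []
    for i, line in enumerate(lines, start=1):
        try:
            row = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {i}: not valid JSON")
            continue
        for msg in row.get("messages", []):
            if msg.get("role") == "assistant" and "content" not in msg:
                problems.append(f"line {i}: assistant message lacks 'content'")
    return problems

good = '{"messages": [{"role": "user", "content": "hi"}, {"role": "assistant", "reasoning": "greet back", "content": "hello"}]}'
bad = '{"messages": [{"role": "assistant", "reasoning": "no final answer"}]}'
print(check_reasoning_jsonl([good, bad]))  # one problem, for the second line
```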
Starting a Fine-tuning Job
We support both LoRA and full fine-tuning for reasoning models. For an exhaustive list of all the available fine-tuning parameters, refer to the Together AI Fine-tuning API Reference.
LoRA Fine-tuning (Recommended)
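As a hedged sketch of what a LoRA job request carries, the snippet below assembles the core parameters as a plain dict. The keyword names (training_file, model, lora, n_epochs, learning_rate, suffix) mirror common Together fine-tuning options but should be confirmed against the API reference, and the file ID is a placeholder for the one returned by your upload:

```python
# Illustrative parameters only; verify names and defaults against the
# Together AI Fine-tuning API Reference before running a real job.
lora_job = {
    "training_file": "file-<id-from-upload>",  # placeholder for your real file ID
    "model": "Qwen/Qwen3-8B",                  # any model string from the table above
    "lora": True,                              # LoRA instead of full fine-tuning
    "n_epochs": 3,
    "learning_rate": 1e-5,
    "suffix": "my-reasoning-model",            # appended to the output model name
}

# With the Python SDK this would be passed along the lines of
#   client.fine_tuning.create(**lora_job)
# For full fine-tuning, the same request is sent with "lora": False.
print(sorted(lora_job))
```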
Full Fine-tuning
Monitoring Your Fine-tuning Job
Fine-tuning can take time depending on the model size, dataset size, and hyperparameters. Your job will progress through several states: Pending, Queued, Running, Uploading, and Completed.
Dashboard Monitoring
You can monitor your job on the Together AI jobs dashboard.
Check Status via API
Using Your Fine-tuned Model
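Checking status via the API amounts to polling the job until it reaches a terminal state. In the minimal sketch below, `fetch_status` is a stand-in callable for the real API/SDK status call, and the failure states (Error, Cancelled) are assumptions beyond the states this guide lists:

```python
# "Completed" comes from this guide; "Error" and "Cancelled" are assumed
# terminal failure states.
TERMINAL = {"Completed", "Error", "Cancelled"}

def wait_for_job(fetch_status, max_polls=100):
    """Poll until the job reaches a terminal state, returning the states
    observed (Pending -> Queued -> Running -> Uploading -> Completed in
    the happy path). A real loop would time.sleep() between polls."""
    seen = []
    for _ in range(max_polls):
        state = fetch_status()
        seen.append(state)
        if state in TERMINAL:
            break
    return seen

# Simulated happy path, standing in for repeated API status calls:
states = iter(["Pending", "Queued", "Running", "Uploading", "Completed"])
print(wait_for_job(lambda: next(states)))
```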
Once your fine-tuning job completes, your model will be available for use. You can view your fine-tuned models in your models dashboard.
Dedicated Endpoint Deployment
You can now deploy your fine-tuned model on a dedicated endpoint for production use:
- Visit your models dashboard
- Find your fine-tuned model and click “+ CREATE DEDICATED ENDPOINT”
- Select your hardware configuration and scaling options
- Click “DEPLOY”
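When you query the deployed model, responses keep the chain of thought separate from the final answer. The sketch below assumes an OpenAI-style chat-completion payload with the thought in `reasoning_content` (or `reasoning`), per the field names in this guide; confirm the exact response shape against the inference docs:

```python
# Assumption-based response shape: an OpenAI-style chat completion where
# the assistant message carries both the chain of thought and the answer.
response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "reasoning_content": "The user asked for 2+2; that is 4.",
                "content": "4",
            }
        }
    ]
}

def split_reasoning(resp):
    """Return (chain_of_thought, final_answer) from a completion payload."""
    msg = resp["choices"][0]["message"]
    thought = msg.get("reasoning_content") or msg.get("reasoning")
    return thought, msg["content"]

thought, answer = split_reasoning(response)
print(answer)  # -> 4
```

This lets you log or display the reasoning separately while showing users only the final `content`.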