tool_calls with high reliability, useful for agents and any pipeline that depends on structured function invocation.
This page covers the function-calling data shape, supported models, and launch parameters.
Supported models
The following bases support function-calling fine-tuning. See supported models for context lengths and batch limits.| Organization | Model | API ID |
|---|---|---|
| Qwen | Qwen 2.5 (1.5B–72B) | Qwen/Qwen2.5-* |
| Qwen | Qwen 3 (0.6B–32B, 30B-A3B, 235B-A22B) | Qwen/Qwen3-* |
| Qwen | Qwen 3 Coder (30B-A3B, 480B-A35B) | Qwen/Qwen3-Coder-* |
| Qwen | Qwen 3 Next (80B-A3B Instruct, Thinking) | Qwen/Qwen3-Next-80B-A3B-* |
| Qwen | Qwen 3 VL (8B, 30B-A3B, 32B, 235B-A22B) | Qwen/Qwen3-VL-* |
| Qwen | Qwen 3.5 (0.8B–397B) | Qwen/Qwen3.5-* |
| Qwen | Qwen 3.6 35B A3B | Qwen/Qwen3.6-35B-A3B |
| Moonshot AI | Kimi K2 family (Base, Instruct, Thinking, 0905), Kimi K2.5 | moonshotai/Kimi-K2* |
| Z.ai | GLM 4.6, GLM 4.7, GLM 5, GLM 5.1 | zai-org/GLM-* |
| Gemma 4 31B IT, Gemma 4 26B A4B IT | google/gemma-4-* | |
| NVIDIA | Nemotron Nano 9B v2, Nemotron 3 Super 120B A12B BF16 | nvidia/NVIDIA-Nemotron-* |
| Meta | Llama 3.1 (8B, 70B, 405B), Llama 3.2 (1B, 3B), Llama 3.3 70B | meta-llama/Meta-Llama-3.1-*, meta-llama/Llama-3.2-*, meta-llama/Llama-3.3-* |
| Meta | Llama 4 Scout 17B 16E (Instruct, VLM), Llama 4 Maverick 17B 128E (Instruct, VLM) | meta-llama/Llama-4-* |
| OpenAI | GPT-OSS 20B, GPT-OSS 120B | openai/gpt-oss-* |
Prepare your data
Prepare data in a JSONL file. Each line should carry:messages: The conversation. Assistant messages can includetool_calls(a list of structured invocation objects) in place ofcontent. Tool results come back via messages with thetoolrole.tools: A list of available tools for the example.
Conversational format
Preference format
For preference fine-tuning, thetools array nests inside input. See Preference tuning for the broader DPO workflow.
Validate and upload
Upload your data using the Together Python/TypeScript SDK or the Together CLI:Launch the job
LoRA is the default and recommended training mode. Passlora=False for full fine-tuning.
Watch and deploy
Function-calling jobs use the same lifecycle as text jobs:- Poll the job with the SDK or CLI. Expect 10 to 30 minutes for a LoRA job on an 8B model with a few thousand examples.
- Deploy the result on a dedicated endpoint and call it with the same function-calling request shape as the base model.