- Training Precision Type indicates the precision type used during training for each model.
- AMP (Automatic Mixed Precision): AMP speeds up training and reduces memory usage compared with float32 while preserving convergence behavior. Learn more about AMP in this PyTorch blog.
- bf16 (bfloat16): All weights are kept in bf16. Some large models on our platform use full bf16 training for lower memory usage and faster training.
- For models with a batch size of 1, gradient accumulation over 8 steps is used, so the effective batch size is 8 (at the cost of slower iterations).
- Long-context fine-tuning of Llama 3.1 (8B) Reference, Llama 3.1 (70B) Reference, and Llama 3.1 Instruct (70B) Reference for context sizes of 32K-131K is only supported using the LoRA method.
- For Llama 3.1 (405B) fine-tuning, please contact us.
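
The gradient accumulation note above can be illustrated with a small, framework-free sketch. It assumes a toy one-parameter model with a squared-error loss; the function names and data are hypothetical and are not part of the platform's training code. It shows that averaging gradients over 8 micro-batches of size 1 gives the same update direction as a single batch of 8.

```python
def accumulated_gradient(samples, weight, accumulation_steps):
    """Average per-sample gradients over `accumulation_steps` micro-batches of size 1."""
    grad_sum = 0.0
    for x, y in samples[:accumulation_steps]:
        # Gradient of 0.5 * (weight * x - y) ** 2 with respect to weight.
        grad_sum += (weight * x - y) * x
    # One optimizer step would be taken with this averaged gradient.
    return grad_sum / accumulation_steps


def full_batch_gradient(samples, weight):
    """Gradient of the same loss computed over the whole batch at once."""
    return sum((weight * x - y) * x for x, y in samples) / len(samples)


# Toy data for the illustration: y = 2 * x.
samples = [(float(i), 2.0 * i) for i in range(1, 9)]

micro_batch_size = 1      # batch size listed in the table
accumulation_steps = 8    # accumulation steps stated in the note above
effective_batch_size = micro_batch_size * accumulation_steps

grad_accumulated = accumulated_gradient(samples, 0.5, accumulation_steps)
grad_full = full_batch_gradient(samples, 0.5)
```

Each optimizer step still needs 8 forward and backward passes, which is why the iteration time is slower than for a single pass with a native batch size of 8.
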
LoRA Fine-tuning
Organization | Model Name | Model String for API | Context Length (SFT) | Context Length (DPO) | Max Batch Size (SFT) | Max Batch Size (DPO) | Min Batch Size | Training Precision Type |
---|---|---|---|---|---|---|---|---|
OpenAI | gpt-oss-20b | openai/gpt-oss-20b | 16384 | 8192 | 8 | 8 | 8 | AMP |
OpenAI | gpt-oss-120b | openai/gpt-oss-120b | 16384 | 8192 | 16 | 16 | 16 | AMP |
DeepSeek | DeepSeek-R1-0528 | deepseek-ai/DeepSeek-R1-0528 | 16384 | 8192 | 32 | 32 | 32 | AMP |
DeepSeek | DeepSeek-R1 | deepseek-ai/DeepSeek-R1 | 16384 | 8192 | 32 | 32 | 32 | AMP |
DeepSeek | DeepSeek-V3.1 | deepseek-ai/DeepSeek-V3.1 | 16384 | 8192 | 32 | 32 | 32 | AMP |
DeepSeek | DeepSeek-V3-0324 | deepseek-ai/DeepSeek-V3-0324 | 16384 | 8192 | 32 | 32 | 32 | AMP |
DeepSeek | DeepSeek-V3 | deepseek-ai/DeepSeek-V3 | 16384 | 8192 | 32 | 32 | 32 | AMP |
DeepSeek | DeepSeek-V3.1-Base | deepseek-ai/DeepSeek-V3.1-Base | 16384 | 8192 | 32 | 32 | 32 | AMP |
DeepSeek | DeepSeek-V3-Base | deepseek-ai/DeepSeek-V3-Base | 16384 | 8192 | 32 | 32 | 32 | AMP |
DeepSeek | DeepSeek-R1-Distill-Llama-70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 24576 | 8192 | 8 | 8 | 8 | bf16 |
DeepSeek | DeepSeek-R1-Distill-Qwen-14B | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 65536 | 12288 | 8 | 8 | 8 | AMP |
DeepSeek | DeepSeek-R1-Distill-Qwen-1.5B | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | 131072 | 16384 | 8 | 8 | 8 | AMP |
Meta | meta-llama/Llama-4-Scout-17B-16E | meta-llama/Llama-4-Scout-17B-16E | 16384 | 8192 | 8 | 8 | 8 | AMP |
Meta | meta-llama/Llama-4-Scout-17B-16E-Instruct | meta-llama/Llama-4-Scout-17B-16E-Instruct | 16384 | 8192 | 8 | 8 | 8 | AMP |
Meta | meta-llama/Llama-4-Maverick-17B-128E | meta-llama/Llama-4-Maverick-17B-128E | 16384 | 8192 | 16 | 16 | 16 | AMP |
Meta | meta-llama/Llama-4-Maverick-17B-128E-Instruct | meta-llama/Llama-4-Maverick-17B-128E-Instruct | 16384 | 8192 | 16 | 16 | 16 | AMP |
Google | gemma-3-1b-it | google/gemma-3-1b-it | 32768 | 12288 | 8 | 8 | 8 | AMP |
Google | gemma-3-1b-pt | google/gemma-3-1b-pt | 32768 | 12288 | 8 | 8 | 8 | AMP |
Google | gemma-3-4b-it | google/gemma-3-4b-it | 131072 | 12288 | 8 | 8 | 8 | AMP |
Google | gemma-3-4b-pt | google/gemma-3-4b-pt | 131072 | 12288 | 8 | 8 | 8 | AMP |
Google | gemma-3-12b-it | google/gemma-3-12b-it | 16384 | 8192 | 8 | 8 | 8 | AMP |
Google | gemma-3-12b-pt | google/gemma-3-12b-pt | 65536 | 8192 | 8 | 8 | 8 | AMP |
Google | gemma-3-27b-it | google/gemma-3-27b-it | 49152 | 8192 | 8 | 8 | 8 | AMP |
Google | gemma-3-27b-pt | google/gemma-3-27b-pt | 49152 | 8192 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-0.6B | Qwen/Qwen3-0.6B | 32768 | 24576 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-0.6B-Base | Qwen/Qwen3-0.6B-Base | 32768 | 24576 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-1.7B | Qwen/Qwen3-1.7B | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-1.7B-Base | Qwen/Qwen3-1.7B-Base | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-4B | Qwen/Qwen3-4B | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-4B-Base | Qwen/Qwen3-4B-Base | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-8B | Qwen/Qwen3-8B | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-8B-Base | Qwen/Qwen3-8B-Base | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-14B | Qwen/Qwen3-14B | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-14B-Base | Qwen/Qwen3-14B-Base | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-32B | Qwen/Qwen3-32B | 24576 | 4096 | 8 | 8 | 8 | AMP |
Qwen | Qwen/Qwen3-30B-A3B-Base | Qwen/Qwen3-30B-A3B-Base | 8192 | 8192 | 16 | 8 | 8 | AMP |
Qwen | Qwen/Qwen3-30B-A3B | Qwen/Qwen3-30B-A3B | 8192 | 8192 | 16 | 8 | 8 | AMP |
Qwen | Qwen/Qwen3-30B-A3B-Instruct-2507 | Qwen/Qwen3-30B-A3B-Instruct-2507 | 8192 | 8192 | 16 | 8 | 8 | AMP |
Qwen | Qwen/Qwen3-235B-A22B | Qwen/Qwen3-235B-A22B | 32768 | 16384 | 1 | 1 | 1 | AMP |
Qwen | Qwen/Qwen3-235B-A22B-Instruct-2507 | Qwen/Qwen3-235B-A22B-Instruct-2507 | 32768 | 16384 | 1 | 1 | 1 | AMP |
Qwen | Qwen/Qwen3-Coder-30B-A3B-Instruct | Qwen/Qwen3-Coder-30B-A3B-Instruct | 8192 | 8192 | 16 | 8 | 8 | AMP |
Qwen | Qwen/Qwen3-Coder-480B-A35B-Instruct | Qwen/Qwen3-Coder-480B-A35B-Instruct | 131072 | 32768 | 1 | 1 | 1 | AMP |
Meta | Llama-3.3-70B-Instruct-Reference | meta-llama/Llama-3.3-70B-Instruct-Reference | 24576 | 8192 | 8 | 8 | 8 | bf16 |
Meta | Llama-3.2-3B-Instruct | meta-llama/Llama-3.2-3B-Instruct | 131072 | 24576 | 8 | 8 | 8 | AMP |
Meta | Llama-3.2-3B | meta-llama/Llama-3.2-3B | 131072 | 24576 | 8 | 8 | 8 | AMP |
Meta | Llama-3.2-1B-Instruct | meta-llama/Llama-3.2-1B-Instruct | 131072 | 24576 | 8 | 8 | 8 | AMP |
Meta | Llama-3.2-1B | meta-llama/Llama-3.2-1B | 131072 | 24576 | 8 | 8 | 8 | AMP |
Meta | Meta-Llama-3.1-8B-Instruct-Reference | meta-llama/Meta-Llama-3.1-8B-Instruct-Reference | 131072 | 16384 | 8 | 8 | 8 | AMP |
Meta | Meta-Llama-3.1-8B-Reference | meta-llama/Meta-Llama-3.1-8B-Reference | 131072 | 16384 | 8 | 8 | 8 | AMP |
Meta | Meta-Llama-3.1-70B-Instruct-Reference | meta-llama/Meta-Llama-3.1-70B-Instruct-Reference | 24576 | 8192 | 8 | 8 | 8 | bf16 |
Meta | Meta-Llama-3.1-70B-Reference | meta-llama/Meta-Llama-3.1-70B-Reference | 24576 | 8192 | 8 | 8 | 8 | bf16 |
Meta | Meta-Llama-3-8B-Instruct | meta-llama/Meta-Llama-3-8B-Instruct | 8192 | 8192 | 16 | 16 | 8 | AMP |
Meta | Meta-Llama-3-8B | meta-llama/Meta-Llama-3-8B | 8192 | 8192 | 16 | 16 | 8 | AMP |
Meta | Meta-Llama-3-70B-Instruct | meta-llama/Meta-Llama-3-70B-Instruct | 8192 | 8192 | 8 | 8 | 8 | bf16 |
Qwen | Qwen2.5-72B-Instruct | Qwen/Qwen2.5-72B-Instruct | 24576 | 8192 | 8 | 8 | 8 | AMP |
Qwen | Qwen2.5-72B | Qwen/Qwen2.5-72B | 24576 | 8192 | 8 | 8 | 8 | AMP |
Qwen | Qwen2.5-32B-Instruct | Qwen/Qwen2.5-32B-Instruct | 32768 | 12288 | 8 | 8 | 8 | AMP |
Qwen | Qwen2.5-32B | Qwen/Qwen2.5-32B | 49152 | 12288 | 8 | 8 | 8 | AMP |
Qwen | Qwen2.5-14B-Instruct | Qwen/Qwen2.5-14B-Instruct | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen2.5-14B | Qwen/Qwen2.5-14B | 65536 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen2.5-7B-Instruct | Qwen/Qwen2.5-7B-Instruct | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen2.5-7B | Qwen/Qwen2.5-7B | 131072 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen2.5-3B-Instruct | Qwen/Qwen2.5-3B-Instruct | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen2.5-3B | Qwen/Qwen2.5-3B | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen2.5-1.5B-Instruct | Qwen/Qwen2.5-1.5B-Instruct | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen2.5-1.5B | Qwen/Qwen2.5-1.5B | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen2-72B-Instruct | Qwen/Qwen2-72B-Instruct | 32768 | 8192 | 16 | 16 | 16 | AMP |
Qwen | Qwen2-72B | Qwen/Qwen2-72B | 32768 | 8192 | 16 | 16 | 16 | AMP |
Qwen | Qwen2-7B-Instruct | Qwen/Qwen2-7B-Instruct | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen2-7B | Qwen/Qwen2-7B | 131072 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen2-1.5B-Instruct | Qwen/Qwen2-1.5B-Instruct | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen2-1.5B | Qwen/Qwen2-1.5B | 131072 | 16384 | 8 | 8 | 8 | AMP |
Mistral AI | Mixtral-8x7B-Instruct-v0.1 | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32768 | 32768 | 8 | 8 | 8 | bf16 |
Mistral AI | Mixtral-8x7B-v0.1 | mistralai/Mixtral-8x7B-v0.1 | 32768 | 32768 | 8 | 8 | 8 | bf16 |
Mistral AI | Mistral-7B-Instruct-v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | 32768 | 8 | 8 | 8 | AMP |
Mistral AI | Mistral-7B-v0.1 | mistralai/Mistral-7B-v0.1 | 32768 | 32768 | 8 | 8 | 8 | AMP |
Teknium | OpenHermes-2p5-Mistral-7B | teknium/OpenHermes-2p5-Mistral-7B | 32768 | 32768 | 8 | 8 | 8 | AMP |
Meta | CodeLlama-7b-hf | codellama/CodeLlama-7b-hf | 16384 | 16384 | 16 | 16 | 8 | AMP |
Together | llama-2-7b-chat | togethercomputer/llama-2-7b-chat | 4096 | 4096 | 64 | 64 | 8 | AMP |
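
As a rough illustration of how the limits in this table might be applied before submitting a job, the sketch below checks a requested batch size and sequence length against three rows copied from the LoRA table. The dictionary layout, the `check_lora_request` helper, and its parameter names are assumptions made for this example; they are not part of any official SDK, and the platform may validate requests differently.

```python
# Limits copied from three rows of the LoRA fine-tuning table.
LORA_LIMITS = {
    "openai/gpt-oss-20b": {
        "context": {"sft": 16384, "dpo": 8192},
        "max_batch": {"sft": 8, "dpo": 8},
        "min_batch": 8,
    },
    "meta-llama/Meta-Llama-3-8B": {
        "context": {"sft": 8192, "dpo": 8192},
        "max_batch": {"sft": 16, "dpo": 16},
        "min_batch": 8,
    },
    "togethercomputer/llama-2-7b-chat": {
        "context": {"sft": 4096, "dpo": 4096},
        "max_batch": {"sft": 64, "dpo": 64},
        "min_batch": 8,
    },
}


def check_lora_request(model, method, batch_size, max_tokens):
    """Return a list of problems; an empty list means the request fits the table.

    `method` is "sft" or "dpo"; `max_tokens` is the longest training example.
    """
    limits = LORA_LIMITS.get(model)
    if limits is None:
        return [f"{model} is not in the local copy of the table"]

    problems = []
    if max_tokens > limits["context"][method]:
        problems.append(
            f"{max_tokens} tokens exceeds the {method.upper()} context length "
            f"of {limits['context'][method]}"
        )
    if batch_size < limits["min_batch"]:
        problems.append(
            f"batch size {batch_size} is below the minimum of {limits['min_batch']}"
        )
    if batch_size > limits["max_batch"][method]:
        problems.append(
            f"batch size {batch_size} is above the {method.upper()} maximum "
            f"of {limits['max_batch'][method]}"
        )
    return problems


for problem in check_lora_request("openai/gpt-oss-20b", "dpo", 16, 16384):
    print(problem)
```

Keeping SFT and DPO limits in separate keys mirrors the table, where context length and maximum batch size differ between the two methods while the minimum batch size is shared.
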
LoRA Long-context Fine-tuning
Organization | Model Name | Model String for API | Context Length (SFT) | Context Length (DPO) | Max Batch Size (SFT) | Max Batch Size (DPO) | Min Batch Size | Training Precision Type |
---|---|---|---|---|---|---|---|---|
DeepSeek | DeepSeek-R1-Distill-Llama-70B-32k | deepseek-ai/DeepSeek-R1-Distill-Llama-70B-32k | 32768 | 16384 | 1 | 1 | 1 | AMP |
DeepSeek | DeepSeek-R1-Distill-Llama-70B-131k | deepseek-ai/DeepSeek-R1-Distill-Llama-70B-131k | 131072 | 16384 | 1 | 1 | 1 | AMP |
Meta | Llama-3.3-70B-32k-Instruct-Reference | meta-llama/Llama-3.3-70B-32k-Instruct-Reference | 32768 | 32768 | 1 | 1 | 1 | AMP |
Meta | Llama-3.3-70B-131k-Instruct-Reference | meta-llama/Llama-3.3-70B-131k-Instruct-Reference | 131072 | 65536 | 1 | 1 | 1 | AMP |
Meta | Meta-Llama-3.1-8B-131k-Instruct-Reference | meta-llama/Meta-Llama-3.1-8B-131k-Instruct-Reference | 131072 | 131072 | 1 | 1 | 1 | AMP |
Meta | Meta-Llama-3.1-8B-131k-Reference | meta-llama/Meta-Llama-3.1-8B-131k-Reference | 131072 | 131072 | 1 | 1 | 1 | AMP |
Meta | Meta-Llama-3.1-70B-32k-Instruct-Reference | meta-llama/Meta-Llama-3.1-70B-32k-Instruct-Reference | 32768 | 32768 | 1 | 1 | 1 | AMP |
Meta | Meta-Llama-3.1-70B-131k-Instruct-Reference | meta-llama/Meta-Llama-3.1-70B-131k-Instruct-Reference | 131072 | 65536 | 1 | 1 | 1 | AMP |
Meta | Meta-Llama-3.1-70B-32k-Reference | meta-llama/Meta-Llama-3.1-70B-32k-Reference | 32768 | 32768 | 1 | 1 | 1 | AMP |
Meta | Meta-Llama-3.1-70B-131k-Reference | meta-llama/Meta-Llama-3.1-70B-131k-Reference | 131072 | 65536 | 1 | 1 | 1 | AMP |
Full Fine-tuning
Organization | Model Name | Model String for API | Context Length (SFT) | Context Length (DPO) | Max Batch Size (SFT) | Max Batch Size (DPO) | Min Batch Size | Training Precision Type |
---|---|---|---|---|---|---|---|---|
DeepSeek | DeepSeek-R1-Distill-Llama-70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 24576 | 8192 | 16 | 16 | 16 | bf16 |
DeepSeek | DeepSeek-R1-Distill-Qwen-14B | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 65536 | 12288 | 8 | 8 | 8 | AMP |
DeepSeek | DeepSeek-R1-Distill-Qwen-1.5B | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | 131072 | 16384 | 8 | 8 | 8 | AMP |
Google | gemma-3-1b-it | google/gemma-3-1b-it | 32768 | 12288 | 8 | 8 | 8 | AMP |
Google | gemma-3-1b-pt | google/gemma-3-1b-pt | 32768 | 12288 | 8 | 8 | 8 | AMP |
Google | gemma-3-4b-it | google/gemma-3-4b-it | 131072 | 12288 | 8 | 8 | 8 | AMP |
Google | gemma-3-4b-pt | google/gemma-3-4b-pt | 131072 | 12288 | 8 | 8 | 8 | AMP |
Google | gemma-3-12b-it | google/gemma-3-12b-it | 16384 | 8192 | 8 | 8 | 8 | AMP |
Google | gemma-3-12b-pt | google/gemma-3-12b-pt | 65536 | 8192 | 8 | 8 | 8 | AMP |
Google | gemma-3-27b-it | google/gemma-3-27b-it | 49152 | 8192 | 16 | 16 | 16 | AMP |
Google | gemma-3-27b-pt | google/gemma-3-27b-pt | 49152 | 8192 | 16 | 16 | 16 | AMP |
Qwen | Qwen3-0.6B | Qwen/Qwen3-0.6B | 32768 | 24576 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-0.6B-Base | Qwen/Qwen3-0.6B-Base | 32768 | 24576 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-1.7B | Qwen/Qwen3-1.7B | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-1.7B-Base | Qwen/Qwen3-1.7B-Base | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-4B | Qwen/Qwen3-4B | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-4B-Base | Qwen/Qwen3-4B-Base | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-8B | Qwen/Qwen3-8B | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-8B-Base | Qwen/Qwen3-8B-Base | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-14B | Qwen/Qwen3-14B | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-14B-Base | Qwen/Qwen3-14B-Base | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen3-32B | Qwen/Qwen3-32B | 24576 | 4096 | 16 | 16 | 16 | AMP |
Qwen | Qwen/Qwen3-30B-A3B-Base | Qwen/Qwen3-30B-A3B-Base | 8192 | 8192 | 8 | 8 | 8 | AMP |
Qwen | Qwen/Qwen3-30B-A3B | Qwen/Qwen3-30B-A3B | 8192 | 8192 | 8 | 8 | 8 | AMP |
Qwen | Qwen/Qwen3-30B-A3B-Instruct-2507 | Qwen/Qwen3-30B-A3B-Instruct-2507 | 8192 | 8192 | 8 | 8 | 8 | AMP |
Qwen | Qwen/Qwen3-Coder-30B-A3B-Instruct | Qwen/Qwen3-Coder-30B-A3B-Instruct | 8192 | 8192 | 8 | 8 | 8 | AMP |
Meta | Llama-3.3-70B-Instruct-Reference | meta-llama/Llama-3.3-70B-Instruct-Reference | 24576 | 8192 | 16 | 16 | 16 | bf16 |
Meta | Llama-3.2-3B-Instruct | meta-llama/Llama-3.2-3B-Instruct | 131072 | 24576 | 8 | 8 | 8 | AMP |
Meta | Llama-3.2-3B | meta-llama/Llama-3.2-3B | 131072 | 24576 | 8 | 8 | 8 | AMP |
Meta | Llama-3.2-1B-Instruct | meta-llama/Llama-3.2-1B-Instruct | 131072 | 24576 | 8 | 8 | 8 | AMP |
Meta | Llama-3.2-1B | meta-llama/Llama-3.2-1B | 131072 | 24576 | 8 | 8 | 8 | AMP |
Meta | Meta-Llama-3.1-8B-Instruct-Reference | meta-llama/Meta-Llama-3.1-8B-Instruct-Reference | 131072 | 16384 | 8 | 8 | 8 | AMP |
Meta | Meta-Llama-3.1-8B-Reference | meta-llama/Meta-Llama-3.1-8B-Reference | 131072 | 16384 | 8 | 8 | 8 | AMP |
Meta | Meta-Llama-3.1-70B-Instruct-Reference | meta-llama/Meta-Llama-3.1-70B-Instruct-Reference | 24576 | 8192 | 16 | 16 | 16 | bf16 |
Meta | Meta-Llama-3.1-70B-Reference | meta-llama/Meta-Llama-3.1-70B-Reference | 24576 | 8192 | 16 | 16 | 16 | bf16 |
Meta | Meta-Llama-3-8B-Instruct | meta-llama/Meta-Llama-3-8B-Instruct | 8192 | 8192 | 16 | 16 | 8 | AMP |
Meta | Meta-Llama-3-8B | meta-llama/Meta-Llama-3-8B | 8192 | 8192 | 16 | 16 | 8 | AMP |
Meta | Meta-Llama-3-70B-Instruct | meta-llama/Meta-Llama-3-70B-Instruct | 8192 | 8192 | 16 | 16 | 16 | bf16 |
Qwen | Qwen2-7B-Instruct | Qwen/Qwen2-7B-Instruct | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen2-7B | Qwen/Qwen2-7B | 131072 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen2-1.5B-Instruct | Qwen/Qwen2-1.5B-Instruct | 32768 | 16384 | 8 | 8 | 8 | AMP |
Qwen | Qwen2-1.5B | Qwen/Qwen2-1.5B | 131072 | 16384 | 8 | 8 | 8 | AMP |
Mistral AI | Mixtral-8x7B-Instruct-v0.1 | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32768 | 32768 | 16 | 16 | 16 | bf16 |
Mistral AI | Mixtral-8x7B-v0.1 | mistralai/Mixtral-8x7B-v0.1 | 32768 | 32768 | 16 | 16 | 16 | bf16 |
Mistral AI | Mistral-7B-Instruct-v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | 32768 | 8 | 8 | 8 | AMP |
Mistral AI | Mistral-7B-v0.1 | mistralai/Mistral-7B-v0.1 | 32768 | 32768 | 8 | 8 | 8 | AMP |
Teknium | OpenHermes-2p5-Mistral-7B | teknium/OpenHermes-2p5-Mistral-7B | 32768 | 32768 | 8 | 8 | 8 | AMP |
Meta | CodeLlama-7b-hf | codellama/CodeLlama-7b-hf | 16384 | 16384 | 16 | 16 | 8 | AMP |
Together | llama-2-7b-chat | togethercomputer/llama-2-7b-chat | 4096 | 4096 | 64 | 64 | 8 | AMP |