Supported Models

A list of all the models available for fine-tuning.

The following models are available to use with our fine-tuning API. Get started with fine-tuning a model!

  • Training Precision Type indicates the precision used during training for each model.
    • AMP (Automatic Mixed Precision): AMP trains faster and uses less memory than float32 while preserving convergence behavior. Learn more about AMP in this PyTorch blog.
    • bf16 (bfloat16): All weights are kept in bf16. Some large models on our platform use full bf16 training for better memory usage and training speed.
  • Long-context fine-tuning of Llama 3.1 (8B) Reference, Llama 3.1 Instruct (8B) Reference, Llama 3.1 (70B) Reference, and Llama 3.1 Instruct (70B) Reference for context sizes of 32K-131K is only supported with the LoRA method.
  • For Llama 3.1 (405B) Fine-tuning, please contact us.
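To use one of the models below, pass its "Model String for API" when creating a fine-tuning job. As a minimal sketch, the request body might be assembled like this; the field names (`model`, `training_file`, `lora`) and the file ID are illustrative assumptions, so check the fine-tuning API reference for the exact schema:

```python
def build_finetune_payload(model: str, training_file: str, use_lora: bool = True) -> dict:
    """Assemble a fine-tuning request body from a model string listed below.

    Validates that the model string has the '<organization>/<model>' shape
    used throughout the tables on this page.
    """
    org, _, name = model.partition("/")
    if not org or not name:
        raise ValueError(f"expected '<organization>/<model>', got {model!r}")
    return {
        "model": model,
        "training_file": training_file,  # ID of a previously uploaded dataset
        "lora": use_lora,                # False selects full fine-tuning
    }


payload = build_finetune_payload(
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
    "file-abc123",  # hypothetical uploaded-file ID, for illustration only
)
```

Note that LoRA and full fine-tuning have different batch-size limits for the same model, so validate your chosen batch size against the matching table below.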

LoRA Fine-tuning

| Organization | Model Name | Model String for API | Context Length | Max Batch Size | Max Batch Size (DPO) | Min Batch Size | Training Precision Type* |
|---|---|---|---|---|---|---|---|
| Google | google/gemma-3-27b-it | google/gemma-3-27b-it | 12288 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-27b-pt | google/gemma-3-27b-pt | 12288 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-12b-it | google/gemma-3-12b-it | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-12b-pt | google/gemma-3-12b-pt | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-4b-it | google/gemma-3-4b-it | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-4b-pt | google/gemma-3-4b-pt | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-1b-it | google/gemma-3-1b-it | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-1b-pt | google/gemma-3-1b-pt | 16384 | 8 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-32B | Qwen/Qwen3-32B | 8192 | 16 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-14B | Qwen/Qwen3-14B | 8192 | 24 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-14B-Base | Qwen/Qwen3-14B-Base | 8192 | 24 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-8B | Qwen/Qwen3-8B | 8192 | 32 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-8B-Base | Qwen/Qwen3-8B-Base | 8192 | 32 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-4B | Qwen/Qwen3-4B | 8192 | 32 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-4B-Base | Qwen/Qwen3-4B-Base | 8192 | 32 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-1.7B | Qwen/Qwen3-1.7B | 8192 | 40 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-1.7B-Base | Qwen/Qwen3-1.7B-Base | 8192 | 40 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-0.6B | Qwen/Qwen3-0.6B | 8192 | 40 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-0.6B-Base | Qwen/Qwen3-0.6B-Base | 8192 | 40 | 16 | 8 | AMP |
| DeepSeek | DeepSeek-R1-Distill-Llama-70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 8192 | 8 | 8 | 8 | AMP |
| DeepSeek | DeepSeek-R1-Distill-Qwen-14B | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 8192 | 40 | 16 | 8 | AMP |
| DeepSeek | DeepSeek-R1-Distill-Qwen-1.5B | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | 8192 | 48 | 24 | 8 | AMP |
| Meta | Llama 3.3 Instruct (70B) Reference | meta-llama/Llama-3.3-70B-Instruct-Reference | 8192 | 8 | 8 | 8 | AMP |
| Meta | Llama 3.2 Instruct (3B) | meta-llama/Llama-3.2-3B-Instruct | 8192 | 32 | 16 | 8 | AMP |
| Meta | Llama 3.2 Instruct (1B) | meta-llama/Llama-3.2-1B-Instruct | 8192 | 32 | 16 | 8 | AMP |
| Meta | Llama 3.1 (8B) Reference | meta-llama/Meta-Llama-3.1-8B-Reference | 8192 | 32 | 16 | 8 | AMP |
| Meta | Llama 3.1 Instruct (8B) Reference | meta-llama/Meta-Llama-3.1-8B-Instruct-Reference | 8192 | 32 | 16 | 8 | AMP |
| Meta | Llama 3.1 (70B) Reference | meta-llama/Meta-Llama-3.1-70B-Reference | 8192 | 8 | 8 | 8 | AMP |
| Meta | Llama 3.1 Instruct (70B) Reference | meta-llama/Meta-Llama-3.1-70B-Instruct-Reference | 8192 | 8 | 8 | 8 | AMP |
| Meta | Llama 3 (8B) | meta-llama/Meta-Llama-3-8B | 8192 | 32 | 16 | 8 | AMP |
| Meta | Llama 3 Instruct (8B) | meta-llama/Meta-Llama-3-8B-Instruct | 8192 | 32 | 16 | 8 | AMP |
| Meta | Llama 3 Instruct (70B) | meta-llama/Meta-Llama-3-70B-Instruct | 8192 | 8 | 8 | 8 | AMP |
| Meta | Llama-2 Chat (7B) | togethercomputer/llama-2-7b-chat | 4096 | 128 | 64 | 8 | AMP |
| Meta | CodeLlama (7B) | codellama/CodeLlama-7b-hf | 16384 | 32 | 16 | 8 | AMP |
| Mistral AI | Mixtral-8x7B (46.7B) | mistralai/Mixtral-8x7B-v0.1 | 32768 | 16 | 8 | 8 | AMP |
| Mistral AI | Mixtral-8x7B Instruct (46.7B) | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32768 | 16 | 8 | 8 | AMP |
| Mistral AI | Mistral 7B Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | 16 | 8 | 8 | AMP |
| Mistral AI | Mistral 7B v0.1 | mistralai/Mistral-7B-v0.1 | 8192 | 64 | 32 | 8 | AMP |
| Qwen | Qwen2.5-72B | Qwen/Qwen2.5-72B-Instruct | 8192 | 16 | 8 | 8 | AMP |
| Qwen | Qwen2.5-14B | Qwen/Qwen2.5-14B-Instruct | 8192 | 40 | 16 | 8 | AMP |
| Qwen | Qwen2-1.5B | Qwen/Qwen2-1.5B | 8192 | 48 | 24 | 8 | AMP |
| Qwen | Qwen2-1.5B-Instruct | Qwen/Qwen2-1.5B-Instruct | 8192 | 48 | 24 | 8 | AMP |
| Qwen | Qwen2-7B | Qwen/Qwen2-7B | 8192 | 32 | 16 | 8 | AMP |
| Qwen | Qwen2-7B-Instruct | Qwen/Qwen2-7B-Instruct | 8192 | 32 | 16 | 8 | AMP |
| Qwen | Qwen2-72B | Qwen/Qwen2-72B | 8192 | 8 | 8 | 8 | AMP |
| Qwen | Qwen2-72B-Instruct | Qwen/Qwen2-72B-Instruct | 8192 | 8 | 8 | 8 | AMP |
| Teknium | OpenHermes 2.5 Mistral 7B | teknium/OpenHermes-2p5-Mistral-7B | 8192 | 64 | 32 | 8 | AMP |

LoRA Long-context Fine-tuning

| Organization | Model Name | Model String for API | Context Length | Max Batch Size | Max Batch Size (DPO) | Min Batch Size | Training Precision Type* |
|---|---|---|---|---|---|---|---|
| DeepSeek | DeepSeek-R1-Distill-Llama-70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B-32k | 32768 | 1* | 1* | 1* | AMP |
| Meta | Llama 3.3 Instruct (70B) Reference | meta-llama/Llama-3.3-70B-32k-Instruct-Reference | 32768 | 1* | 1* | 1* | AMP |
| Meta | Llama 3.1 (8B) Reference | meta-llama/Meta-Llama-3.1-8B-32k-Reference | 32768 | 8 | 8 | 8 | AMP |
| Meta | Llama 3.1 Instruct (8B) Reference | meta-llama/Meta-Llama-3.1-8B-32k-Instruct-Reference | 32768 | 8 | 8 | 8 | AMP |
| Meta | Llama 3.1 (70B) Reference | meta-llama/Meta-Llama-3.1-70B-32k-Reference | 32768 | 1* | 1* | 1* | AMP |
| Meta | Llama 3.1 Instruct (70B) Reference | meta-llama/Meta-Llama-3.1-70B-32k-Instruct-Reference | 32768 | 1* | 1* | 1* | AMP |

1*: Gradient accumulation of 8 is used, so you effectively get a batch size of 8 (at the cost of slower iterations).
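The footnote above reduces to: effective batch size = per-step batch size × gradient accumulation steps. A minimal sketch of that arithmetic (the helper name is ours, not part of the API):

```python
def effective_batch_size(per_step_batch: int, grad_accum_steps: int) -> int:
    # Gradients from `grad_accum_steps` forward/backward passes are
    # accumulated before each optimizer update, so the optimizer sees
    # their product as the batch size.
    return per_step_batch * grad_accum_steps


# The 70B long-context rows train at batch size 1 with accumulation 8:
print(effective_batch_size(1, 8))  # -> 8
```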

Full Fine-tuning

| Organization | Model Name | Model String for API | Context Length | Max Batch Size | Max Batch Size (DPO) | Min Batch Size | Training Precision Type* |
|---|---|---|---|---|---|---|---|
| Google | google/gemma-3-27b-it | google/gemma-3-27b-it | 12288 | 16 | 8 | 16 | AMP |
| Google | google/gemma-3-27b-pt | google/gemma-3-27b-pt | 12288 | 16 | 8 | 16 | AMP |
| Google | google/gemma-3-12b-it | google/gemma-3-12b-it | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-12b-pt | google/gemma-3-12b-pt | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-4b-it | google/gemma-3-4b-it | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-4b-pt | google/gemma-3-4b-pt | 16384 | 8 | 8 | 8 | AMP |
| Google | google/gemma-3-1b-it | google/gemma-3-1b-it | 16384 | 24 | 8 | 8 | AMP |
| Google | google/gemma-3-1b-pt | google/gemma-3-1b-pt | 16384 | 24 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-32B | Qwen/Qwen3-32B | 8192 | 8 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-14B | Qwen/Qwen3-14B | 8192 | 16 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-14B-Base | Qwen/Qwen3-14B-Base | 8192 | 16 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-8B | Qwen/Qwen3-8B | 8192 | 24 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-8B-Base | Qwen/Qwen3-8B-Base | 8192 | 24 | 8 | 8 | AMP |
| Qwen | Qwen/Qwen3-4B | Qwen/Qwen3-4B | 8192 | 32 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-4B-Base | Qwen/Qwen3-4B-Base | 8192 | 32 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-1.7B | Qwen/Qwen3-1.7B | 8192 | 40 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-1.7B-Base | Qwen/Qwen3-1.7B-Base | 8192 | 40 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-0.6B | Qwen/Qwen3-0.6B | 8192 | 40 | 16 | 8 | AMP |
| Qwen | Qwen/Qwen3-0.6B-Base | Qwen/Qwen3-0.6B-Base | 8192 | 40 | 16 | 8 | AMP |
| DeepSeek | DeepSeek-R1-Distill-Llama-70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 8192 | 16 | 8 | 16 | bf16 |
| DeepSeek | DeepSeek-R1-Distill-Qwen-14B | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 8192 | 32 | 16 | 8 | AMP |
| DeepSeek | DeepSeek-R1-Distill-Qwen-1.5B | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | 8192 | 48 | 24 | 8 | AMP |
| Meta | Llama 3.3 Instruct (70B) Reference | meta-llama/Llama-3.3-70B-Instruct-Reference | 8192 | 16 | 8 | 16 | bf16 |
| Meta | Llama 3.1 (8B) Reference | meta-llama/Meta-Llama-3.1-8B-Reference | 8192 | 24 | 8 | 8 | AMP |
| Meta | Llama 3.1 Instruct (8B) Reference | meta-llama/Meta-Llama-3.1-8B-Instruct-Reference | 8192 | 24 | 8 | 8 | AMP |
| Meta | Llama 3.1 (70B) Reference | meta-llama/Meta-Llama-3.1-70B-Reference | 8192 | 16 | 8 | 16 | bf16 |
| Meta | Llama 3.1 Instruct (70B) Reference | meta-llama/Meta-Llama-3.1-70B-Instruct-Reference | 8192 | 16 | 8 | 16 | bf16 |
| Meta | Llama 3 (8B) | meta-llama/Meta-Llama-3-8B | 8192 | 24 | 8 | 8 | AMP |
| Meta | Llama 3 Instruct (8B) | meta-llama/Meta-Llama-3-8B-Instruct | 8192 | 24 | 8 | 8 | AMP |
| Meta | Llama 3 Instruct (70B) | meta-llama/Meta-Llama-3-70B-Instruct | 8192 | 16 | 8 | 16 | bf16 |
| Meta | Llama-2 Chat (7B) | togethercomputer/llama-2-7b-chat | 4096 | 96 | 48 | 8 | AMP |
| Meta | CodeLlama (7B) | codellama/CodeLlama-7b-hf | 16384 | 32 | 16 | 8 | AMP |
| Mistral AI | Mixtral-8x7B (46.7B) | mistralai/Mixtral-8x7B-v0.1 | 32768 | 16 | 8 | 16 | bf16 |
| Mistral AI | Mixtral-8x7B Instruct (46.7B) | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32768 | 16 | 8 | 16 | bf16 |
| Mistral AI | Mistral 7B Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | 16 | 8 | 8 | AMP |
| Mistral AI | Mistral 7B v0.1 | mistralai/Mistral-7B-v0.1 | 8192 | 64 | 32 | 8 | AMP |
| Qwen | Qwen2-1.5B | Qwen/Qwen2-1.5B | 8192 | 48 | 24 | 8 | AMP |
| Qwen | Qwen2-1.5B-Instruct | Qwen/Qwen2-1.5B-Instruct | 8192 | 48 | 24 | 8 | AMP |
| Qwen | Qwen2-7B | Qwen/Qwen2-7B | 8192 | 24 | 8 | 8 | AMP |
| Qwen | Qwen2-7B-Instruct | Qwen/Qwen2-7B-Instruct | 8192 | 24 | 8 | 8 | AMP |
| Teknium | OpenHermes 2.5 Mistral 7B | teknium/OpenHermes-2p5-Mistral-7B | 8192 | 64 | 32 | 8 | AMP |
Request a model