A list of all the models available for fine-tuning, including each model's API string, context length, batch-size limits, and training precision.

Organization | Model Name | Model String for API | Context Length | Max Batch Size | Max Batch Size (DPO) | Min Batch Size | Training Precision Type* |
---|---|---|---|---|---|---|---|
Google | google/gemma-3-27b-it | google/gemma-3-27b-it | 12288 | 8 | 8 | 8 | AMP
Google | google/gemma-3-27b-pt | google/gemma-3-27b-pt | 12288 | 8 | 8 | 8 | AMP
Google | google/gemma-3-12b-it | google/gemma-3-12b-it | 16384 | 8 | 8 | 8 | AMP
Google | google/gemma-3-12b-pt | google/gemma-3-12b-pt | 16384 | 8 | 8 | 8 | AMP
Google | google/gemma-3-4b-it | google/gemma-3-4b-it | 16384 | 8 | 8 | 8 | AMP
Google | google/gemma-3-4b-pt | google/gemma-3-4b-pt | 16384 | 8 | 8 | 8 | AMP
Google | google/gemma-3-1b-it | google/gemma-3-1b-it | 16384 | 8 | 8 | 8 | AMP
Google | google/gemma-3-1b-pt | google/gemma-3-1b-pt | 16384 | 8 | 8 | 8 | AMP
Qwen | Qwen3-32B | Qwen/Qwen3-32B | 8192 | 16 | 8 | 8 | AMP
Qwen | Qwen3-14B | Qwen/Qwen3-14B | 8192 | 24 | 8 | 8 | AMP
Qwen | Qwen3-14B-Base | Qwen/Qwen3-14B-Base | 8192 | 24 | 8 | 8 | AMP
Qwen | Qwen3-8B | Qwen/Qwen3-8B | 8192 | 32 | 16 | 8 | AMP
Qwen | Qwen3-8B-Base | Qwen/Qwen3-8B-Base | 8192 | 32 | 16 | 8 | AMP
Qwen | Qwen3-4B | Qwen/Qwen3-4B | 8192 | 32 | 16 | 8 | AMP
Qwen | Qwen3-4B-Base | Qwen/Qwen3-4B-Base | 8192 | 32 | 16 | 8 | AMP
Qwen | Qwen3-1.7B | Qwen/Qwen3-1.7B | 8192 | 40 | 16 | 8 | AMP
Qwen | Qwen3-1.7B-Base | Qwen/Qwen3-1.7B-Base | 8192 | 40 | 16 | 8 | AMP
Qwen | Qwen3-0.6B | Qwen/Qwen3-0.6B | 8192 | 40 | 16 | 8 | AMP
Qwen | Qwen3-0.6B-Base | Qwen/Qwen3-0.6B-Base | 8192 | 40 | 16 | 8 | AMP
DeepSeek | DeepSeek-R1-Distill-Llama-70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 8192 | 8 | 8 | 8 | AMP
DeepSeek | DeepSeek-R1-Distill-Qwen-14B | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 8192 | 40 | 16 | 8 | AMP
DeepSeek | DeepSeek-R1-Distill-Qwen-1.5B | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | 8192 | 48 | 24 | 8 | AMP
Meta | Llama 3.3 Instruct (70B) Reference | meta-llama/Llama-3.3-70B-Instruct-Reference | 8192 | 8 | 8 | 8 | AMP |
Meta | Llama 3.2 Instruct (3B) | meta-llama/Llama-3.2-3B-Instruct | 8192 | 32 | 16 | 8 | AMP |
Meta | Llama 3.2 Instruct (1B) | meta-llama/Llama-3.2-1B-Instruct | 8192 | 32 | 16 | 8 | AMP |
Meta | Llama 3.1 (8B) Reference | meta-llama/Meta-Llama-3.1-8B-Reference | 8192 | 32 | 16 | 8 | AMP |
Meta | Llama 3.1 Instruct (8B) Reference | meta-llama/Meta-Llama-3.1-8B-Instruct-Reference | 8192 | 32 | 16 | 8 | AMP |
Meta | Llama 3.1 (70B) Reference | meta-llama/Meta-Llama-3.1-70B-Reference | 8192 | 8 | 8 | 8 | AMP |
Meta | Llama 3.1 Instruct (70B) Reference | meta-llama/Meta-Llama-3.1-70B-Instruct-Reference | 8192 | 8 | 8 | 8 | AMP |
Meta | Llama 3 (8B) | meta-llama/Meta-Llama-3-8B | 8192 | 32 | 16 | 8 | AMP |
Meta | Llama 3 Instruct (8B) | meta-llama/Meta-Llama-3-8B-Instruct | 8192 | 32 | 16 | 8 | AMP |
Meta | Llama 3 Instruct (70B) | meta-llama/Meta-Llama-3-70B-Instruct | 8192 | 8 | 8 | 8 | AMP |
Meta | Llama-2 Chat (7B) | togethercomputer/llama-2-7b-chat | 4096 | 128 | 64 | 8 | AMP |
Meta | CodeLlama (7B) | codellama/CodeLlama-7b-hf | 16384 | 32 | 16 | 8 | AMP |
Mistral AI | Mixtral-8x7B (46.7B) | mistralai/Mixtral-8x7B-v0.1 | 32768 | 16 | 8 | 8 | AMP |
Mistral AI | Mixtral-8x7B Instruct (46.7B) | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32768 | 16 | 8 | 8 | AMP |
Mistral AI | Mistral 7B Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | 16 | 8 | 8 | AMP |
Mistral AI | Mistral 7B v0.1 | mistralai/Mistral-7B-v0.1 | 8192 | 64 | 32 | 8 | AMP |
Qwen | Qwen2.5-72B-Instruct | Qwen/Qwen2.5-72B-Instruct | 8192 | 16 | 8 | 8 | AMP
Qwen | Qwen2.5-14B-Instruct | Qwen/Qwen2.5-14B-Instruct | 8192 | 40 | 16 | 8 | AMP
Qwen | Qwen2-1.5B | Qwen/Qwen2-1.5B | 8192 | 48 | 24 | 8 | AMP |
Qwen | Qwen2-1.5B-Instruct | Qwen/Qwen2-1.5B-Instruct | 8192 | 48 | 24 | 8 | AMP |
Qwen | Qwen2-7B | Qwen/Qwen2-7B | 8192 | 32 | 16 | 8 | AMP |
Qwen | Qwen2-7B-Instruct | Qwen/Qwen2-7B-Instruct | 8192 | 32 | 16 | 8 | AMP |
Qwen | Qwen2-72B | Qwen/Qwen2-72B | 8192 | 8 | 8 | 8 | AMP |
Qwen | Qwen2-72B-Instruct | Qwen/Qwen2-72B-Instruct | 8192 | 8 | 8 | 8 | AMP |
Teknium | OpenHermes 2.5 Mistral 7B | teknium/OpenHermes-2p5-Mistral-7B | 8192 | 64 | 32 | 8 | AMP |
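When launching a job, the API expects the value from the "Model String for API" column, not the display name. A minimal sketch of a lookup helper, assuming a small excerpt of the table above; `resolve_model_string` and the `MODEL_STRINGS` dict are hypothetical illustrations, not part of any SDK:

```python
# Map display names to the exact strings the fine-tuning API expects.
# Entries are a small excerpt copied from the table above.
MODEL_STRINGS = {
    "Llama 3.1 Instruct (8B) Reference": "meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
    "Qwen2.5-72B-Instruct": "Qwen/Qwen2.5-72B-Instruct",
    "Mistral 7B Instruct v0.2": "mistralai/Mistral-7B-Instruct-v0.2",
}

def resolve_model_string(name: str) -> str:
    """Return the API model string for a display name, or raise with a hint."""
    try:
        return MODEL_STRINGS[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_STRINGS))
        raise ValueError(f"Unknown model {name!r}; known names: {known}") from None

print(resolve_model_string("Mistral 7B Instruct v0.2"))
# prints "mistralai/Mistral-7B-Instruct-v0.2"
```

Catching the lookup failure early gives a clearer error than submitting an invalid model string to the API.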

The models below are 32k-context variants, available with an extended context length of 32,768 tokens:

Organization | Model Name | Model String for API | Context Length | Max Batch Size | Max Batch Size (DPO) | Min Batch Size | Training Precision Type* |
---|---|---|---|---|---|---|---|
DeepSeek | DeepSeek-R1-Distill-Llama-70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B-32k | 32768 | 1* | 1* | 1* | AMP
Meta | Llama 3.3 Instruct (70B) Reference | meta-llama/Llama-3.3-70B-32k-Instruct-Reference | 32768 | 1* | 1* | 1* | AMP |
Meta | Llama 3.1 (8B) Reference | meta-llama/Meta-Llama-3.1-8B-32k-Reference | 32768 | 8 | 8 | 8 | AMP |
Meta | Llama 3.1 Instruct (8B) Reference | meta-llama/Meta-Llama-3.1-8B-32k-Instruct-Reference | 32768 | 8 | 8 | 8 | AMP |
Meta | Llama 3.1 (70B) Reference | meta-llama/Meta-Llama-3.1-70B-32k-Reference | 32768 | 1* | 1* | 1* | AMP |
Meta | Llama 3.1 Instruct (70B) Reference | meta-llama/Meta-Llama-3.1-70B-32k-Instruct-Reference | 32768 | 1* | 1* | 1* | AMP |
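Training examples that exceed a model's context length are a common source of wasted jobs, so it can help to sanity-check a dataset against the limits above before uploading. A minimal sketch, assuming a crude ~4-characters-per-token estimate (the model's real tokenizer is authoritative; `CONTEXT_LENGTHS` is an excerpt from the tables, and the helper functions are hypothetical):

```python
# Rough pre-flight check: which examples likely exceed the model's context?
# Context lengths are copied from the tables above (excerpt only).
CONTEXT_LENGTHS = {
    "meta-llama/Meta-Llama-3.1-8B-32k-Reference": 32768,
    "meta-llama/Meta-Llama-3.1-8B-Reference": 8192,
}

def estimated_tokens(text: str) -> int:
    """Very rough estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def examples_exceeding_context(examples: list[str], model: str) -> list[int]:
    """Indices of examples whose estimated token count exceeds the context."""
    limit = CONTEXT_LENGTHS[model]
    return [i for i, ex in enumerate(examples) if estimated_tokens(ex) > limit]

data = ["short example", "x" * 200_000]  # second one is roughly 50k tokens
print(examples_exceeding_context(data, "meta-llama/Meta-Llama-3.1-8B-32k-Reference"))
# prints [1]
```

For the 8k-context base variants, long documents that fail this check may still fit the 32k variants listed above.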

The following models are available for full fine-tuning, with their own batch-size and precision limits:

Organization | Model Name | Model String for API | Context Length | Max Batch Size | Max Batch Size (DPO) | Min Batch Size | Training Precision Type* |
---|---|---|---|---|---|---|---|
Google | google/gemma-3-27b-it | google/gemma-3-27b-it | 12288 | 8 | 8 | 8 | AMP
Google | google/gemma-3-27b-pt | google/gemma-3-27b-pt | 12288 | 8 | 8 | 8 | AMP
Google | google/gemma-3-12b-it | google/gemma-3-12b-it | 16384 | 8 | 8 | 8 | AMP
Google | google/gemma-3-12b-pt | google/gemma-3-12b-pt | 16384 | 8 | 8 | 8 | AMP
Google | google/gemma-3-4b-it | google/gemma-3-4b-it | 16384 | 8 | 8 | 8 | AMP
Google | google/gemma-3-4b-pt | google/gemma-3-4b-pt | 16384 | 8 | 8 | 8 | AMP
Google | google/gemma-3-1b-it | google/gemma-3-1b-it | 16384 | 24 | 8 | 8 | AMP
Google | google/gemma-3-1b-pt | google/gemma-3-1b-pt | 16384 | 24 | 8 | 8 | AMP
Qwen | Qwen3-32B | Qwen/Qwen3-32B | 8192 | 8 | 8 | 8 | AMP
Qwen | Qwen3-14B | Qwen/Qwen3-14B | 8192 | 16 | 8 | 8 | AMP
Qwen | Qwen3-14B-Base | Qwen/Qwen3-14B-Base | 8192 | 16 | 8 | 8 | AMP
Qwen | Qwen3-8B | Qwen/Qwen3-8B | 8192 | 24 | 8 | 8 | AMP
Qwen | Qwen3-8B-Base | Qwen/Qwen3-8B-Base | 8192 | 24 | 8 | 8 | AMP
Qwen | Qwen3-4B | Qwen/Qwen3-4B | 8192 | 32 | 16 | 8 | AMP
Qwen | Qwen3-4B-Base | Qwen/Qwen3-4B-Base | 8192 | 32 | 16 | 8 | AMP
Qwen | Qwen3-1.7B | Qwen/Qwen3-1.7B | 8192 | 40 | 16 | 8 | AMP
Qwen | Qwen3-1.7B-Base | Qwen/Qwen3-1.7B-Base | 8192 | 40 | 16 | 8 | AMP
Qwen | Qwen3-0.6B | Qwen/Qwen3-0.6B | 8192 | 40 | 16 | 8 | AMP
Qwen | Qwen3-0.6B-Base | Qwen/Qwen3-0.6B-Base | 8192 | 40 | 16 | 8 | AMP
DeepSeek | DeepSeek-R1-Distill-Llama-70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 8192 | 16 | 8 | 16 | bf16
DeepSeek | DeepSeek-R1-Distill-Qwen-14B | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 8192 | 32 | 16 | 8 | AMP
DeepSeek | DeepSeek-R1-Distill-Qwen-1.5B | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | 8192 | 48 | 24 | 8 | AMP
Meta | Llama 3.3 Instruct (70B) Reference | meta-llama/Llama-3.3-70B-Instruct-Reference | 8192 | 16 | 8 | 16 | bf16 |
Meta | Llama 3.1 (8B) Reference | meta-llama/Meta-Llama-3.1-8B-Reference | 8192 | 24 | 8 | 8 | AMP |
Meta | Llama 3.1 Instruct (8B) Reference | meta-llama/Meta-Llama-3.1-8B-Instruct-Reference | 8192 | 24 | 8 | 8 | AMP |
Meta | Llama 3.1 (70B) Reference | meta-llama/Meta-Llama-3.1-70B-Reference | 8192 | 16 | 8 | 16 | bf16 |
Meta | Llama 3.1 Instruct (70B) Reference | meta-llama/Meta-Llama-3.1-70B-Instruct-Reference | 8192 | 16 | 8 | 16 | bf16 |
Meta | Llama 3 (8B) | meta-llama/Meta-Llama-3-8B | 8192 | 24 | 8 | 8 | AMP |
Meta | Llama 3 Instruct (8B) | meta-llama/Meta-Llama-3-8B-Instruct | 8192 | 24 | 8 | 8 | AMP |
Meta | Llama 3 Instruct (70B) | meta-llama/Meta-Llama-3-70B-Instruct | 8192 | 16 | 8 | 16 | bf16 |
Meta | Llama-2 Chat (7B) | togethercomputer/llama-2-7b-chat | 4096 | 96 | 48 | 8 | AMP |
Meta | CodeLlama (7B) | codellama/CodeLlama-7b-hf | 16384 | 32 | 16 | 8 | AMP |
Mistral AI | Mixtral-8x7B (46.7B) | mistralai/Mixtral-8x7B-v0.1 | 32768 | 16 | 8 | 16 | bf16 |
Mistral AI | Mixtral-8x7B Instruct (46.7B) | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32768 | 16 | 8 | 16 | bf16 |
Mistral AI | Mistral 7B Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | 16 | 8 | 8 | AMP |
Mistral AI | Mistral 7B v0.1 | mistralai/Mistral-7B-v0.1 | 8192 | 64 | 32 | 8 | AMP |
Qwen | Qwen2-1.5B | Qwen/Qwen2-1.5B | 8192 | 48 | 24 | 8 | AMP |
Qwen | Qwen2-1.5B-Instruct | Qwen/Qwen2-1.5B-Instruct | 8192 | 48 | 24 | 8 | AMP |
Qwen | Qwen2-7B | Qwen/Qwen2-7B | 8192 | 24 | 8 | 8 | AMP |
Qwen | Qwen2-7B-Instruct | Qwen/Qwen2-7B-Instruct | 8192 | 24 | 8 | 8 | AMP |
Teknium | OpenHermes 2.5 Mistral 7B | teknium/OpenHermes-2p5-Mistral-7B | 8192 | 64 | 32 | 8 | AMP |
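Because the maximum batch size differs between standard fine-tuning and DPO, a requested batch size that is valid for one method may be rejected for the other. A minimal sketch of a client-side range check, assuming a small excerpt of the limits in the table above; `LIMITS` and `validate_batch_size` are hypothetical helpers, not part of any SDK:

```python
# Per-model batch-size limits copied from the table above (excerpt only).
# Each entry is (min, max for standard fine-tuning, max for DPO).
LIMITS = {
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Reference": (8, 24, 8),
    "Qwen/Qwen2-1.5B": (8, 48, 24),
}

def validate_batch_size(model: str, batch_size: int, method: str = "sft") -> int:
    """Raise ValueError if batch_size is outside the allowed range for the model/method."""
    lo, max_sft, max_dpo = LIMITS[model]
    hi = max_dpo if method == "dpo" else max_sft
    if not lo <= batch_size <= hi:
        raise ValueError(
            f"batch_size {batch_size} out of range [{lo}, {hi}] "
            f"for {model} with method={method!r}"
        )
    return batch_size
```

For example, `validate_batch_size("Qwen/Qwen2-1.5B", 32)` passes, while the same batch size with `method="dpo"` fails because the DPO maximum for that model is 24.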