Fine-tuning Models
A list of all the models available for fine-tuning.
The following models are available through our fine-tuning API. Get started by fine-tuning a model!
- Training Precision Type indicates the numeric precision used during training for each model.
- AMP (Automatic Mixed Precision): AMP speeds up training and reduces memory usage while preserving the convergence behavior of full float32 training. Learn more about AMP in this PyTorch blog.
- bf16 (bfloat16): All weights are kept in bf16. Some large models on our platform use full bf16 training for lower memory usage and faster training.
- Long-context fine-tuning of Llama 3.1 (8B) Reference, Llama 3.1 Instruct (8B) Reference, Llama 3.1 (70B) Reference, and Llama 3.1 Instruct (70B) Reference at context sizes of 32K-131K is supported only with the LoRA method.
- For Llama 3.1 (405B) fine-tuning, please contact us.
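As a starting point, a fine-tuning job can be submitted over HTTP with one of the model strings from the tables below. This is a minimal sketch only: the endpoint path, JSON field names, and the file ID are illustrative assumptions, not confirmed API details, so check the API reference before using it.

```python
import json
import urllib.request

API_URL = "https://api.together.xyz/v1/fine-tunes"  # assumed endpoint path

def build_finetune_payload(model, training_file, n_epochs=1, batch_size=8, lora=True):
    """Assemble the JSON body for a fine-tuning request (field names assumed)."""
    return {
        "model": model,
        "training_file": training_file,
        "n_epochs": n_epochs,
        "batch_size": batch_size,
        "lora": lora,
    }

payload = build_finetune_payload(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
    training_file="file-abc123",  # hypothetical uploaded-file ID
    batch_size=32,                # max LoRA batch size for this model (see table)
)

req = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer $TOGETHER_API_KEY",
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req)  # uncomment to actually submit the job
```

The batch size must fall inside the model's supported range listed in the tables below.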
LoRA Fine-tuning
Organization | Model Name | Model String for API | Context Length | Max Batch Size | Min Batch Size | Training Precision Type* |
---|---|---|---|---|---|---|
Meta | Llama 3.3 Instruct (70B) Reference | meta-llama/Llama-3.3-70B-Instruct-Reference | 8192 | 8 | 8 | AMP |
Meta | Llama 3.1 (8B) Reference | meta-llama/Meta-Llama-3.1-8B-Reference | 8192 | 32 | 8 | AMP |
Meta | Llama 3.1 Instruct (8B) Reference | meta-llama/Meta-Llama-3.1-8B-Instruct-Reference | 8192 | 32 | 8 | AMP |
Meta | Llama 3.1 (70B) Reference | meta-llama/Meta-Llama-3.1-70B-Reference | 8192 | 8 | 8 | AMP |
Meta | Llama 3.1 Instruct (70B) Reference | meta-llama/Meta-Llama-3.1-70B-Instruct-Reference | 8192 | 8 | 8 | AMP |
Meta | Llama 3 (8B) | meta-llama/Meta-Llama-3-8B | 8192 | 32 | 8 | AMP |
Meta | Llama 3 Instruct (8B) | meta-llama/Meta-Llama-3-8B-Instruct | 8192 | 32 | 8 | AMP |
Meta | Llama 3 (70B) | meta-llama/Meta-Llama-3-70B | 8192 | 8 | 8 | AMP |
Meta | Llama 3 Instruct (70B) | meta-llama/Meta-Llama-3-70B-Instruct | 8192 | 8 | 8 | AMP |
Meta | Llama-2 (7B) | togethercomputer/llama-2-7b | 4096 | 128 | 8 | AMP |
Meta | Llama-2 Chat (7B) | togethercomputer/llama-2-7b-chat | 4096 | 128 | 8 | AMP |
Meta | Llama-2 (13B) | togethercomputer/llama-2-13b | 4096 | 96 | 8 | AMP |
Meta | Llama-2 Chat (13B) | togethercomputer/llama-2-13b-chat | 4096 | 96 | 8 | AMP |
Meta | Llama-2 (70B) | togethercomputer/llama-2-70b | 4096 | 48 | 8 | AMP |
Meta | Llama-2 Chat (70B) | togethercomputer/llama-2-70b-chat | 4096 | 48 | 8 | AMP |
Meta | CodeLlama (7B) | codellama/CodeLlama-7b-hf | 16384 | 32 | 8 | AMP |
Meta | CodeLlama Python (7B) | codellama/CodeLlama-7b-Python-hf | 16384 | 32 | 8 | AMP |
Meta | CodeLlama Instruct (7B) | codellama/CodeLlama-7b-Instruct-hf | 16384 | 32 | 8 | AMP |
Meta | CodeLlama (13B) | codellama/CodeLlama-13b-hf | 16384 | 32 | 8 | AMP |
Meta | CodeLlama Python (13B) | codellama/CodeLlama-13b-Python-hf | 16384 | 32 | 8 | AMP |
Meta | CodeLlama Instruct (13B) | codellama/CodeLlama-13b-Instruct-hf | 16384 | 32 | 8 | AMP |
Mistral AI | Mixtral-8x7B (46.7B) | mistralai/Mixtral-8x7B-v0.1 | 32768 | 16 | 8 | AMP |
Mistral AI | Mixtral-8x7B Instruct (46.7B) | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32768 | 16 | 8 | AMP |
NousResearch | Nous Hermes 2 - Mixtral 8x7B-DPO (46.7B) | NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO | 32768 | 16 | 8 | AMP |
NousResearch | Nous Hermes 2 - Mixtral 8x7B-SFT (46.7B) | NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT | 32768 | 16 | 8 | AMP |
Mistral AI | Mistral 7B Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | 16 | 8 | AMP |
Mistral AI | Mistral 7B Instruct v0.1 | mistralai/Mistral-7B-Instruct-v0.1 | 8192 | 64 | 8 | AMP |
Mistral AI | Mistral 7B v0.1 | mistralai/Mistral-7B-v0.1 | 8192 | 64 | 8 | AMP |
Qwen | Qwen2-1.5B | Qwen/Qwen2-1.5B | 8192 | 32 | 8 | AMP |
Qwen | Qwen2-1.5B-Instruct | Qwen/Qwen2-1.5B-Instruct | 8192 | 32 | 8 | AMP |
Qwen | Qwen2-7B | Qwen/Qwen2-7B | 8192 | 32 | 8 | AMP |
Qwen | Qwen2-7B-Instruct | Qwen/Qwen2-7B-Instruct | 8192 | 32 | 8 | AMP |
Qwen | Qwen2-72B | Qwen/Qwen2-72B | 8192 | 8 | 8 | AMP |
Qwen | Qwen2-72B-Instruct | Qwen/Qwen2-72B-Instruct | 8192 | 8 | 8 | AMP |
Teknium | OpenHermes 2.5 Mistral 7B | teknium/OpenHermes-2p5-Mistral-7B | 8192 | 64 | 8 | AMP |
Hugging Face H4 | Zephyr 7B ß | HuggingFaceH4/zephyr-7b-beta | 8192 | 96 | 8 | AMP |
Upstage | SOLAR Instruct v1 (11B) | upstage/SOLAR-10.7B-Instruct-v1.0 | 4096 | 32 | 8 | AMP |
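Each model constrains the batch size to the min/max range shown above. A small helper can clamp a requested batch size into a model's supported range before submitting a job; the limits below are hand-copied from the table for a few models, purely for illustration.

```python
# (min, max) LoRA batch-size limits for a few models, copied from the table above.
LORA_BATCH_LIMITS = {
    "meta-llama/Meta-Llama-3.1-8B-Reference": (8, 32),
    "meta-llama/Meta-Llama-3.1-70B-Reference": (8, 8),
    "togethercomputer/llama-2-7b": (8, 128),
    "mistralai/Mixtral-8x7B-v0.1": (8, 16),
}

def clamp_batch_size(model, requested):
    """Clamp a requested batch size into the model's supported [min, max] range."""
    lo, hi = LORA_BATCH_LIMITS[model]
    return max(lo, min(hi, requested))
```

For example, `clamp_batch_size("togethercomputer/llama-2-7b", 256)` returns 128, and any request for the 70B models collapses to 8, their only supported value.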
LoRA Long-context Fine-tuning
Organization | Model Name | Model String for API | Context Length | Max Batch Size | Min Batch Size | Training Precision Type* |
---|---|---|---|---|---|---|
Meta | Llama 3.3 Instruct (70B) Reference | meta-llama/Llama-3.3-70B-32k-Instruct-Reference | 32768 | 1* | 1* | AMP |
Meta | Llama 3.1 (8B) Reference | meta-llama/Meta-Llama-3.1-8B-32k-Reference | 32768 | 8 | 8 | AMP |
Meta | Llama 3.1 Instruct (8B) Reference | meta-llama/Meta-Llama-3.1-8B-32k-Instruct-Reference | 32768 | 8 | 8 | AMP |
Meta | Llama 3.1 (70B) Reference | meta-llama/Meta-Llama-3.1-70B-32k-Reference | 32768 | 1* | 1* | AMP |
Meta | Llama 3.1 Instruct (70B) Reference | meta-llama/Meta-Llama-3.1-70B-32k-Instruct-Reference | 32768 | 1* | 1* | AMP |
Meta | Llama 3 (8B) | togethercomputer/Llama-3-8b-32k | 32768 | 16 | 8 | AMP |
Meta | Llama-2-7B-32K (7B) | togethercomputer/LLaMA-2-7B-32K | 32768 | 16 | 8 | AMP |
Meta | Llama-2-7B-32K-Instruct (7B) | togethercomputer/LLaMA-2-7B-32K-Instruct | 32768 | 16 | 8 | AMP |
1*: Gradient accumulation over 8 steps is used, so you effectively get a batch size of 8 (at the cost of slower iterations).
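The footnote works out as follows: gradient accumulation sums gradients over several micro-batches before each optimizer step, so the effective batch size is the micro-batch size times the number of accumulation steps.

```python
def effective_batch_size(micro_batch_size, accumulation_steps):
    """Gradient accumulation sums gradients over several micro-batches
    before each optimizer step, so the effective batch size is the product."""
    return micro_batch_size * accumulation_steps

# The 70B long-context models run micro-batches of 1 with 8 accumulation steps:
effective_batch_size(1, 8)  # → 8
```

Each optimizer step therefore sees the same amount of data as a batch of 8, but spread over 8 forward/backward passes, which is why iteration time is slower.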
Full Fine-tuning
Organization | Model Name | Model String for API | Context Length | Max Batch Size | Min Batch Size | Training Precision Type* |
---|---|---|---|---|---|---|
Meta | Llama 3.3 Instruct (70B) Reference | meta-llama/Llama-3.3-70B-Instruct-Reference | 8192 | 16 | 16 | bf16 |
Meta | Llama 3.1 (8B) Reference | meta-llama/Meta-Llama-3.1-8B-Reference | 8192 | 24 | 8 | AMP |
Meta | Llama 3.1 Instruct (8B) Reference | meta-llama/Meta-Llama-3.1-8B-Instruct-Reference | 8192 | 24 | 8 | AMP |
Meta | Llama 3.1 (70B) Reference | meta-llama/Meta-Llama-3.1-70B-Reference | 8192 | 16 | 16 | bf16 |
Meta | Llama 3.1 Instruct (70B) Reference | meta-llama/Meta-Llama-3.1-70B-Instruct-Reference | 8192 | 16 | 16 | bf16 |
Meta | Llama 3 (8B) | meta-llama/Meta-Llama-3-8B | 8192 | 24 | 8 | AMP |
Meta | Llama 3 Instruct (8B) | meta-llama/Meta-Llama-3-8B-Instruct | 8192 | 24 | 8 | AMP |
Meta | Llama 3 (70B) | meta-llama/Meta-Llama-3-70B | 8192 | 16 | 16 | bf16 |
Meta | Llama 3 Instruct (70B) | meta-llama/Meta-Llama-3-70B-Instruct | 8192 | 16 | 16 | bf16 |
Together | Llama-2-7B-32K (7B) | togethercomputer/LLaMA-2-7B-32K | 32768 | 16 | 8 | AMP |
Together | Llama-2-7B-32K-Instruct (7B) | togethercomputer/Llama-2-7B-32K-Instruct | 32768 | 16 | 8 | AMP |
Meta | Llama-2 (7B) | togethercomputer/llama-2-7b | 4096 | 96 | 8 | AMP |
Meta | Llama-2 Chat (7B) | togethercomputer/llama-2-7b-chat | 4096 | 96 | 8 | AMP |
Meta | Llama-2 (13B) | togethercomputer/llama-2-13b | 4096 | 40 | 8 | AMP |
Meta | Llama-2 Chat (13B) | togethercomputer/llama-2-13b-chat | 4096 | 40 | 8 | AMP |
Meta | Llama-2 (70B) | togethercomputer/llama-2-70b | 4096 | 64 | 16 | bf16 |
Meta | Llama-2 Chat (70B) | togethercomputer/llama-2-70b-chat | 4096 | 64 | 16 | bf16 |
Meta | CodeLlama (7B) | codellama/CodeLlama-7b-hf | 16384 | 32 | 8 | AMP |
Meta | CodeLlama Python (7B) | codellama/CodeLlama-7b-Python-hf | 16384 | 32 | 8 | AMP |
Meta | CodeLlama Instruct (7B) | codellama/CodeLlama-7b-Instruct-hf | 16384 | 32 | 8 | AMP |
Meta | CodeLlama (13B) | codellama/CodeLlama-13b-hf | 16384 | 16 | 8 | AMP |
Meta | CodeLlama Python (13B) | codellama/CodeLlama-13b-Python-hf | 16384 | 16 | 8 | AMP |
Meta | CodeLlama Instruct (13B) | codellama/CodeLlama-13b-Instruct-hf | 16384 | 16 | 8 | AMP |
Mistral AI | Mixtral-8x7B (46.7B) | mistralai/Mixtral-8x7B-v0.1 | 32768 | 16 | 16 | bf16 |
Mistral AI | Mixtral-8x7B Instruct (46.7B) | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32768 | 16 | 16 | bf16 |
NousResearch | Nous Hermes 2 - Mixtral 8x7B-DPO (46.7B) | NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO | 32768 | 16 | 16 | bf16 |
NousResearch | Nous Hermes 2 - Mixtral 8x7B-SFT (46.7B) | NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT | 32768 | 16 | 16 | bf16 |
Mistral AI | Mistral 7B Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | 16 | 8 | AMP |
Mistral AI | Mistral 7B Instruct v0.1 | mistralai/Mistral-7B-Instruct-v0.1 | 8192 | 64 | 8 | AMP |
Mistral AI | Mistral 7B v0.1 | mistralai/Mistral-7B-v0.1 | 8192 | 64 | 8 | AMP |
Qwen | Qwen2-1.5B | Qwen/Qwen2-1.5B | 8192 | 32 | 8 | AMP |
Qwen | Qwen2-1.5B-Instruct | Qwen/Qwen2-1.5B-Instruct | 8192 | 32 | 8 | AMP |
Qwen | Qwen2-7B | Qwen/Qwen2-7B | 8192 | 24 | 8 | AMP |
Qwen | Qwen2-7B-Instruct | Qwen/Qwen2-7B-Instruct | 8192 | 24 | 8 | AMP |
Teknium | OpenHermes 2.5 Mistral 7B | teknium/OpenHermes-2p5-Mistral-7B | 8192 | 64 | 8 | AMP |
Hugging Face H4 | Zephyr 7B ß | HuggingFaceH4/zephyr-7b-beta | 8192 | 64 | 8 | AMP |
Upstage | SOLAR Instruct v1 (11B) | upstage/SOLAR-10.7B-Instruct-v1.0 | 4096 | 32 | 8 | AMP |