# Supported models

> A list of all the models available for fine-tuning.

The following models can be fine-tuned with our fine-tuning API. To get started, see [fine-tuning a model](/docs/fine-tuning-quickstart).

**Note:** The batch sizes listed below refer to packed batch sizes for text formats. For more details on packing behavior and data formats, see the [Data Preparation](/docs/fine-tuning-data-preparation) page.

<Warning>Models with the `-Reference` suffix can be fine-tuned but **cannot be deployed as dedicated endpoints**. To verify deployability before training, run `client.endpoints.list_hardware(model="<base-model>")`. A 404 response means that fine-tunes of that base model cannot be hosted on a dedicated endpoint.</Warning>
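
As a quick local pre-check, the suffix rule in the warning above can be applied to a list of candidate model strings before any training job is submitted. The sketch below is illustrative only: `is_reference_model` and `candidates` are hypothetical names, and a model string without the suffix is not guaranteed to be deployable, so the `list_hardware` call remains the check to rely on.

```python
REFERENCE_SUFFIX = "-Reference"


def is_reference_model(model_string: str) -> bool:
    """Return True if the model string ends with the -Reference suffix."""
    return model_string.endswith(REFERENCE_SUFFIX)


# Hypothetical shortlist of base models, taken from the tables on this page.
candidates = [
    "meta-llama/Llama-3.3-70B-Instruct-Reference",
    "meta-llama/Llama-3.2-3B-Instruct",
    "Qwen/Qwen3-8B",
]

# Models flagged here can be fine-tuned but not deployed as dedicated endpoints.
not_deployable = [m for m in candidates if is_reference_model(m)]
print(not_deployable)
```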

[*Request a model*](https://www.together.ai/forms/model-requests)

## LoRA Fine-tuning

| Organization | Model Name                                 | Model String for API                                  | Context Length (SFT) | Context Length (DPO) | Max Batch Size (SFT) | Max Batch Size (DPO) | Min Batch Size | Gradient Accumulation Steps |
| ------------ | ------------------------------------------ | ----------------------------------------------------- | -------------------- | -------------------- | -------------------- | -------------------- | -------------- | --------------------------- |
| Qwen         | Qwen3.5-397B-A17B                          | Qwen/Qwen3.5-397B-A17B                                | 32768                | 16384                | 16                   | 16                   | 16             | 1                           |
| Qwen         | Qwen3.5-122B-A10B                          | Qwen/Qwen3.5-122B-A10B                                | 65536                | 32768                | 16                   | 16                   | 16             | 1                           |
| Qwen         | Qwen3.5-35B-A3B                            | Qwen/Qwen3.5-35B-A3B                                  | 65536                | 32768                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen3.6-35B-A3B                            | Qwen/Qwen3.6-35B-A3B                                  | 65536                | 32768                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen3.5-35B-A3B-Base                       | Qwen/Qwen3.5-35B-A3B-Base                             | 65536                | 32768                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen3.5-27B                                | Qwen/Qwen3.5-27B                                      | 32768                | 16384                | 16                   | 16                   | 16             | 1                           |
| Qwen         | Qwen3.5-9B                                 | Qwen/Qwen3.5-9B                                       | 65536                | 49152                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen3.5-4B                                 | Qwen/Qwen3.5-4B                                       | 131072               | 65536                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen3.5-2B                                 | Qwen/Qwen3.5-2B                                       | 131072               | 131072               | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen3.5-0.8B                               | Qwen/Qwen3.5-0.8B                                     | 131072               | 131072               | 8                    | 8                    | 8              | 1                           |
| Moonshot AI  | Kimi-K2.5                                  | moonshotai/Kimi-K2.5                                  | 32768                | 16384                | 4                    | 4                    | 4              | 8                           |
| Moonshot AI  | Kimi-K2-Thinking                           | moonshotai/Kimi-K2-Thinking                           | 32768                | 16384                | 4                    | 4                    | 4              | 8                           |
| Moonshot AI  | Kimi-K2-Instruct-0905                      | moonshotai/Kimi-K2-Instruct-0905                      | 32768                | 16384                | 4                    | 4                    | 4              | 8                           |
| Moonshot AI  | Kimi-K2-Instruct                           | moonshotai/Kimi-K2-Instruct                           | 32768                | 16384                | 4                    | 4                    | 4              | 8                           |
| Moonshot AI  | Kimi-K2-Base                               | moonshotai/Kimi-K2-Base                               | 32768                | 16384                | 4                    | 4                    | 4              | 8                           |
| Z.ai         | GLM-5.1                                    | zai-org/GLM-5.1                                       | 50688                | 25344                | 1                    | 1                    | 1              | 1                           |
| Z.ai         | GLM-5                                      | zai-org/GLM-5                                         | 50688                | 25344                | 1                    | 1                    | 1              | 1                           |
| Z.ai         | GLM-4.7                                    | zai-org/GLM-4.7                                       | 128000               | 64000                | 1                    | 1                    | 1              | 8                           |
| Z.ai         | GLM-4.6                                    | zai-org/GLM-4.6                                       | 128000               | 64000                | 1                    | 1                    | 1              | 8                           |
| OpenAI       | gpt-oss-20b                                | openai/gpt-oss-20b                                    | 24576                | 24576                | 8                    | 8                    | 8              | 1                           |
| OpenAI       | gpt-oss-120b                               | openai/gpt-oss-120b                                   | 16384                | 16384                | 16                   | 16                   | 16             | 1                           |
| DeepSeek     | DeepSeek-R1-0528                           | deepseek-ai/DeepSeek-R1-0528                          | 131072               | 32768                | 2                    | 2                    | 2              | 8                           |
| DeepSeek     | DeepSeek-R1                                | deepseek-ai/DeepSeek-R1                               | 131072               | 49152                | 2                    | 2                    | 2              | 8                           |
| DeepSeek     | DeepSeek-V3.1                              | deepseek-ai/DeepSeek-V3.1                             | 131072               | 32768                | 2                    | 2                    | 2              | 8                           |
| DeepSeek     | DeepSeek-V3-0324                           | deepseek-ai/DeepSeek-V3-0324                          | 131072               | 32768                | 2                    | 2                    | 2              | 8                           |
| DeepSeek     | DeepSeek-V3                                | deepseek-ai/DeepSeek-V3                               | 131072               | 32768                | 2                    | 2                    | 2              | 8                           |
| DeepSeek     | DeepSeek-V3.1-Base                         | deepseek-ai/DeepSeek-V3.1-Base                        | 131072               | 32768                | 2                    | 2                    | 2              | 8                           |
| DeepSeek     | DeepSeek-V3-Base                           | deepseek-ai/DeepSeek-V3-Base                          | 131072               | 32768                | 2                    | 2                    | 2              | 8                           |
| DeepSeek     | DeepSeek-R1-Distill-Llama-70B              | deepseek-ai/DeepSeek-R1-Distill-Llama-70B             | 24576                | 12288                | 8                    | 8                    | 8              | 1                           |
| DeepSeek     | DeepSeek-R1-Distill-Llama-70B-32k          | deepseek-ai/DeepSeek-R1-Distill-Llama-70B-32k         | 32768                | 32768                | 1                    | 1                    | 1              | 8                           |
| DeepSeek     | DeepSeek-R1-Distill-Llama-70B-131k         | deepseek-ai/DeepSeek-R1-Distill-Llama-70B-131k        | 131072               | 32768                | 1                    | 1                    | 1              | 8                           |
| DeepSeek     | DeepSeek-R1-Distill-Qwen-14B               | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B              | 65536                | 32768                | 8                    | 8                    | 8              | 1                           |
| DeepSeek     | DeepSeek-R1-Distill-Qwen-1.5B              | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B             | 131072               | 131072               | 8                    | 8                    | 8              | 1                           |
| Meta         | Llama-4-Scout-17B-16E                      | meta-llama/Llama-4-Scout-17B-16E                      | 65536                | 12288                | 8                    | 8                    | 8              | 1                           |
| Meta         | Llama-4-Scout-17B-16E-Instruct             | meta-llama/Llama-4-Scout-17B-16E-Instruct             | 65536                | 12288                | 8                    | 8                    | 8              | 1                           |
| Meta         | Llama-4-Scout-17B-16E-Instruct-VLM         | meta-llama/Llama-4-Scout-17B-16E-Instruct-VLM         | 32768                | 32768                | 8                    | 8                    | 8              | 1                           |
| Meta         | Llama-4-Maverick-17B-128E                  | meta-llama/Llama-4-Maverick-17B-128E                  | 16384                | 16384                | 16                   | 16                   | 16             | 1                           |
| Meta         | Llama-4-Maverick-17B-128E-Instruct         | meta-llama/Llama-4-Maverick-17B-128E-Instruct         | 16384                | 24576                | 16                   | 16                   | 16             | 1                           |
| Meta         | Llama-4-Maverick-17B-128E-Instruct-VLM     | meta-llama/Llama-4-Maverick-17B-128E-Instruct-VLM     | 16384                | 16384                | 16                   | 16                   | 16             | 1                           |
| Google       | gemma-3-270m                               | google/gemma-3-270m                                   | 32768                | 32768                | 128                  | 128                  | 8              | 1                           |
| Google       | gemma-3-270m-it                            | google/gemma-3-270m-it                                | 32768                | 32768                | 128                  | 128                  | 8              | 1                           |
| Google       | gemma-3-1b-it                              | google/gemma-3-1b-it                                  | 32768                | 32768                | 32                   | 32                   | 8              | 1                           |
| Google       | gemma-3-1b-pt                              | google/gemma-3-1b-pt                                  | 32768                | 32768                | 32                   | 32                   | 8              | 1                           |
| Google       | gemma-3-4b-it                              | google/gemma-3-4b-it                                  | 131072               | 65536                | 8                    | 8                    | 8              | 1                           |
| Google       | gemma-3-4b-it-VLM                          | google/gemma-3-4b-it-VLM                              | 32768                | 32768                | 8                    | 8                    | 8              | 1                           |
| Google       | gemma-3-4b-pt                              | google/gemma-3-4b-pt                                  | 131072               | 65536                | 8                    | 8                    | 8              | 1                           |
| Google       | gemma-3-12b-it                             | google/gemma-3-12b-it                                 | 65536                | 49152                | 8                    | 8                    | 8              | 1                           |
| Google       | gemma-3-12b-it-VLM                         | google/gemma-3-12b-it-VLM                             | 32768                | 32768                | 8                    | 8                    | 8              | 1                           |
| Google       | gemma-3-12b-pt                             | google/gemma-3-12b-pt                                 | 65536                | 49152                | 8                    | 8                    | 8              | 1                           |
| Google       | gemma-3-27b-it                             | google/gemma-3-27b-it                                 | 49152                | 24576                | 8                    | 8                    | 8              | 1                           |
| Google       | gemma-3-27b-it-VLM                         | google/gemma-3-27b-it-VLM                             | 32768                | 24576                | 8                    | 8                    | 8              | 1                           |
| Google       | gemma-3-27b-pt                             | google/gemma-3-27b-pt                                 | 49152                | 24576                | 8                    | 8                    | 8              | 1                           |
| Google       | gemma-4-31B-it                             | google/gemma-4-31B-it                                 | 49152                | 24576                | 4                    | 4                    | 4              | 2                           |
| Google       | gemma-4-26B-A4B-it                         | google/gemma-4-26B-A4B-it                             | 49152                | 24576                | 4                    | 4                    | 4              | 2                           |
| Qwen         | Qwen3-Next-80B-A3B-Instruct                | Qwen/Qwen3-Next-80B-A3B-Instruct                      | 16384                | 24576                | 16                   | 16                   | 16             | 1                           |
| Qwen         | Qwen3-Next-80B-A3B-Thinking                | Qwen/Qwen3-Next-80B-A3B-Thinking                      | 16384                | 24576                | 16                   | 16                   | 16             | 1                           |
| Qwen         | Qwen3-0.6B                                 | Qwen/Qwen3-0.6B                                       | 40960                | 40960                | 64                   | 64                   | 8              | 1                           |
| Qwen         | Qwen3-0.6B-Base                            | Qwen/Qwen3-0.6B-Base                                  | 32768                | 32768                | 64                   | 64                   | 8              | 1                           |
| Qwen         | Qwen3-1.7B                                 | Qwen/Qwen3-1.7B                                       | 40960                | 40960                | 32                   | 32                   | 8              | 1                           |
| Qwen         | Qwen3-1.7B-Base                            | Qwen/Qwen3-1.7B-Base                                  | 32768                | 32768                | 32                   | 32                   | 8              | 1                           |
| Qwen         | Qwen3-4B                                   | Qwen/Qwen3-4B                                         | 40960                | 40960                | 16                   | 16                   | 8              | 1                           |
| Qwen         | Qwen3-4B-Base                              | Qwen/Qwen3-4B-Base                                    | 32768                | 32768                | 16                   | 16                   | 8              | 1                           |
| Qwen         | Qwen3-8B                                   | Qwen/Qwen3-8B                                         | 40960                | 40960                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen3-8B-Base                              | Qwen/Qwen3-8B-Base                                    | 32768                | 32768                | 16                   | 16                   | 8              | 1                           |
| Qwen         | Qwen3-14B                                  | Qwen/Qwen3-14B                                        | 40960                | 40960                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen3-14B-Base                             | Qwen/Qwen3-14B-Base                                   | 32768                | 32768                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen3-32B                                  | Qwen/Qwen3-32B                                        | 40960                | 24576                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen3-30B-A3B-Base                         | Qwen/Qwen3-30B-A3B-Base                               | 8192                 | 32768                | 16                   | 16                   | 8              | 1                           |
| Qwen         | Qwen3-30B-A3B                              | Qwen/Qwen3-30B-A3B                                    | 8192                 | 32768                | 16                   | 16                   | 8              | 1                           |
| Qwen         | Qwen3-30B-A3B-Instruct-2507                | Qwen/Qwen3-30B-A3B-Instruct-2507                      | 8192                 | 32768                | 16                   | 16                   | 8              | 1                           |
| Qwen         | Qwen3-235B-A22B                            | Qwen/Qwen3-235B-A22B                                  | 40960                | 32768                | 8                    | 8                    | 8              | 2                           |
| Qwen         | Qwen3-235B-A22B-Instruct-2507              | Qwen/Qwen3-235B-A22B-Instruct-2507                    | 49152                | 32768                | 8                    | 8                    | 8              | 2                           |
| Qwen         | Qwen3-Coder-30B-A3B-Instruct               | Qwen/Qwen3-Coder-30B-A3B-Instruct                     | 262144               | 262144               | 2                    | 2                    | 2              | 4                           |
| Qwen         | Qwen3-Coder-480B-A35B-Instruct             | Qwen/Qwen3-Coder-480B-A35B-Instruct                   | 262144               | 65536                | 2                    | 2                    | 2              | 8                           |
| Qwen         | Qwen3-VL-8B-Instruct                       | Qwen/Qwen3-VL-8B-Instruct                             | 24576                | 16384                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen3-VL-32B-Instruct                      | Qwen/Qwen3-VL-32B-Instruct                            | 16384                | 16384                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen3-VL-30B-A3B-Instruct                  | Qwen/Qwen3-VL-30B-A3B-Instruct                        | 16384                | 16384                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen3-VL-235B-A22B-Instruct                | Qwen/Qwen3-VL-235B-A22B-Instruct                      | 16384                | 12288                | 16                   | 16                   | 16             | 1                           |
| NVIDIA       | NVIDIA-Nemotron-Nano-9B-v2                 | nvidia/NVIDIA-Nemotron-Nano-9B-v2                     | 32768                | 16384                | 8                    | 8                    | 8              | 1                           |
| Meta         | Llama-3.3-70B-Instruct-Reference           | meta-llama/Llama-3.3-70B-Instruct-Reference           | 24576                | 12288                | 8                    | 8                    | 8              | 1                           |
| Meta         | Llama-3.3-70B-32k-Instruct-Reference       | meta-llama/Llama-3.3-70B-32k-Instruct-Reference       | 32768                | 32768                | 1                    | 1                    | 1              | 8                           |
| Meta         | Llama-3.3-70B-131k-Instruct-Reference      | meta-llama/Llama-3.3-70B-131k-Instruct-Reference      | 131072               | 65536                | 1                    | 1                    | 1              | 8                           |
| Meta         | Llama-3.2-3B-Instruct                      | meta-llama/Llama-3.2-3B-Instruct                      | 131072               | 65536                | 8                    | 8                    | 8              | 1                           |
| Meta         | Llama-3.2-3B                               | meta-llama/Llama-3.2-3B                               | 131072               | 65536                | 8                    | 8                    | 8              | 1                           |
| Meta         | Llama-3.2-1B-Instruct                      | meta-llama/Llama-3.2-1B-Instruct                      | 131072               | 131072               | 8                    | 8                    | 8              | 1                           |
| Meta         | Llama-3.2-1B                               | meta-llama/Llama-3.2-1B                               | 131072               | 131072               | 8                    | 8                    | 8              | 1                           |
| Meta         | Meta-Llama-3.1-8B-Instruct-Reference       | meta-llama/Meta-Llama-3.1-8B-Instruct-Reference       | 131072               | 65536                | 8                    | 8                    | 8              | 1                           |
| Meta         | Meta-Llama-3.1-8B-131k-Instruct-Reference  | meta-llama/Meta-Llama-3.1-8B-131k-Instruct-Reference  | 131072               | 131072               | 4                    | 4                    | 1              | 1                           |
| Meta         | Meta-Llama-3.1-8B-Reference                | meta-llama/Meta-Llama-3.1-8B-Reference                | 131072               | 65536                | 8                    | 8                    | 8              | 1                           |
| Meta         | Meta-Llama-3.1-8B-131k-Reference           | meta-llama/Meta-Llama-3.1-8B-131k-Reference           | 131072               | 131072               | 4                    | 4                    | 1              | 1                           |
| Meta         | Meta-Llama-3.1-70B-Instruct-Reference      | meta-llama/Meta-Llama-3.1-70B-Instruct-Reference      | 24576                | 12288                | 8                    | 8                    | 8              | 1                           |
| Meta         | Meta-Llama-3.1-70B-32k-Instruct-Reference  | meta-llama/Meta-Llama-3.1-70B-32k-Instruct-Reference  | 32768                | 32768                | 1                    | 1                    | 1              | 8                           |
| Meta         | Meta-Llama-3.1-70B-131k-Instruct-Reference | meta-llama/Meta-Llama-3.1-70B-131k-Instruct-Reference | 131072               | 65536                | 1                    | 1                    | 1              | 8                           |
| Meta         | Meta-Llama-3.1-70B-Reference               | meta-llama/Meta-Llama-3.1-70B-Reference               | 24576                | 12288                | 8                    | 8                    | 8              | 1                           |
| Meta         | Meta-Llama-3.1-70B-32k-Reference           | meta-llama/Meta-Llama-3.1-70B-32k-Reference           | 32768                | 32768                | 1                    | 1                    | 1              | 8                           |
| Meta         | Meta-Llama-3.1-70B-131k-Reference          | meta-llama/Meta-Llama-3.1-70B-131k-Reference          | 131072               | 65536                | 1                    | 1                    | 1              | 8                           |
| Meta         | Meta-Llama-3-8B-Instruct                   | meta-llama/Meta-Llama-3-8B-Instruct                   | 8192                 | 8192                 | 64                   | 64                   | 8              | 1                           |
| Meta         | Meta-Llama-3-8B                            | meta-llama/Meta-Llama-3-8B                            | 8192                 | 8192                 | 64                   | 64                   | 8              | 1                           |
| Meta         | Meta-Llama-3-70B-Instruct                  | meta-llama/Meta-Llama-3-70B-Instruct                  | 8192                 | 8192                 | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen2.5-72B-Instruct                       | Qwen/Qwen2.5-72B-Instruct                             | 24576                | 12288                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen2.5-72B                                | Qwen/Qwen2.5-72B                                      | 24576                | 12288                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen2.5-32B-Instruct                       | Qwen/Qwen2.5-32B-Instruct                             | 32768                | 32768                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen2.5-32B                                | Qwen/Qwen2.5-32B                                      | 49152                | 32768                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen2.5-14B-Instruct                       | Qwen/Qwen2.5-14B-Instruct                             | 32768                | 32768                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen2.5-14B                                | Qwen/Qwen2.5-14B                                      | 65536                | 49152                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen2.5-7B-Instruct                        | Qwen/Qwen2.5-7B-Instruct                              | 32768                | 32768                | 16                   | 16                   | 8              | 1                           |
| Qwen         | Qwen2.5-7B                                 | Qwen/Qwen2.5-7B                                       | 131072               | 65536                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen2.5-3B-Instruct                        | Qwen/Qwen2.5-3B-Instruct                              | 32768                | 32768                | 32                   | 32                   | 8              | 1                           |
| Qwen         | Qwen2.5-3B                                 | Qwen/Qwen2.5-3B                                       | 32768                | 32768                | 32                   | 32                   | 8              | 1                           |
| Qwen         | Qwen2.5-1.5B-Instruct                      | Qwen/Qwen2.5-1.5B-Instruct                            | 32768                | 32768                | 32                   | 32                   | 8              | 1                           |
| Qwen         | Qwen2.5-1.5B                               | Qwen/Qwen2.5-1.5B                                     | 131072               | 131072               | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen2-72B-Instruct                         | Qwen/Qwen2-72B-Instruct                               | 32768                | 16384                | 16                   | 16                   | 16             | 1                           |
| Qwen         | Qwen2-72B                                  | Qwen/Qwen2-72B                                        | 32768                | 16384                | 16                   | 16                   | 16             | 1                           |
| Qwen         | Qwen2-7B-Instruct                          | Qwen/Qwen2-7B-Instruct                                | 32768                | 32768                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen2-7B                                   | Qwen/Qwen2-7B                                         | 131072               | 24576                | 8                    | 8                    | 8              | 1                           |
| Qwen         | Qwen2-1.5B-Instruct                        | Qwen/Qwen2-1.5B-Instruct                              | 32768                | 32768                | 32                   | 32                   | 8              | 1                           |
| Qwen         | Qwen2-1.5B                                 | Qwen/Qwen2-1.5B                                       | 131072               | 131072               | 8                    | 8                    | 8              | 1                           |
| Mistral      | Mixtral-8x7B-Instruct-v0.1                 | mistralai/Mixtral-8x7B-Instruct-v0.1                  | 32768                | 32768                | 8                    | 8                    | 8              | 1                           |
| Mistral      | Mixtral-8x7B-v0.1                          | mistralai/Mixtral-8x7B-v0.1                           | 32768                | 32768                | 8                    | 8                    | 8              | 1                           |
| Mistral      | Mistral-7B-Instruct-v0.2                   | mistralai/Mistral-7B-Instruct-v0.2                    | 32768                | 32768                | 16                   | 16                   | 8              | 1                           |
| Mistral      | Mistral-7B-v0.1                            | mistralai/Mistral-7B-v0.1                             | 32768                | 32768                | 16                   | 16                   | 8              | 1                           |
| Together     | llama-2-7b-chat                            | togethercomputer/llama-2-7b-chat                      | 4096                 | 4096                 | 128                  | 128                  | 8              | 1                           |
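
Because the SFT and DPO context lengths can differ for the same model, it can help to check your longest training example against the right column before starting a job. The snippet below is a minimal sketch, not part of the Together SDK. `LORA_CONTEXT` is a hypothetical, hand-copied subset of the rows above, `fits_context` is an illustrative helper, and the sketch assumes the listed context length is the maximum number of tokens in a single training example.

```python
# Hypothetical subset of the LoRA table above:
# model string -> context length per training method.
LORA_CONTEXT = {
    "Qwen/Qwen2-72B-Instruct": {"sft": 32768, "dpo": 16384},
    "Qwen/Qwen2-7B": {"sft": 131072, "dpo": 24576},
    "togethercomputer/llama-2-7b-chat": {"sft": 4096, "dpo": 4096},
}


def fits_context(model: str, method: str, num_tokens: int) -> bool:
    """Return True if an example of num_tokens fits the listed context length."""
    if model not in LORA_CONTEXT:
        raise KeyError(f"{model} is not in the local copy of the table")
    limits = LORA_CONTEXT[model]
    if method not in limits:
        raise ValueError("method must be 'sft' or 'dpo'")
    return num_tokens <= limits[method]


# A 30,000-token example fits Qwen2-7B for SFT but not for DPO.
print(fits_context("Qwen/Qwen2-7B", "sft", 30000))
print(fits_context("Qwen/Qwen2-7B", "dpo", 30000))
```

Keeping the limits in a plain dictionary means the check runs offline, but the copy has to be refreshed by hand whenever the table changes.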

## Full Fine-tuning

| Organization | Model Name                            | Model String for API                             | Context Length (SFT) | Context Length (DPO) | Max Batch Size (SFT) | Max Batch Size (DPO) | Min Batch Size |
| ------------ | ------------------------------------- | ------------------------------------------------ | -------------------- | -------------------- | -------------------- | -------------------- | -------------- |
| Qwen         | Qwen3.5-27B                           | Qwen/Qwen3.5-27B                                 | 32768                | 16384                | 16                   | 16                   | 16             |
| Qwen         | Qwen3.5-9B                            | Qwen/Qwen3.5-9B                                  | 65536                | 49152                | 8                    | 8                    | 8              |
| Qwen         | Qwen3.5-4B                            | Qwen/Qwen3.5-4B                                  | 131072               | 65536                | 8                    | 8                    | 8              |
| Qwen         | Qwen3.5-2B                            | Qwen/Qwen3.5-2B                                  | 131072               | 131072               | 8                    | 8                    | 8              |
| Qwen         | Qwen3.5-0.8B                          | Qwen/Qwen3.5-0.8B                                | 131072               | 131072               | 8                    | 8                    | 8              |
| DeepSeek     | DeepSeek-R1-Distill-Llama-70B         | deepseek-ai/DeepSeek-R1-Distill-Llama-70B        | 24576                | 12288                | 32                   | 32                   | 32             |
| DeepSeek     | DeepSeek-R1-Distill-Qwen-14B          | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B         | 65536                | 32768                | 8                    | 8                    | 8              |
| DeepSeek     | DeepSeek-R1-Distill-Qwen-1.5B         | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B        | 131072               | 131072               | 8                    | 8                    | 8              |
| Google       | gemma-3-270m                          | google/gemma-3-270m                              | 32768                | 32768                | 128                  | 128                  | 8              |
| Google       | gemma-3-270m-it                       | google/gemma-3-270m-it                           | 32768                | 32768                | 128                  | 128                  | 8              |
| Google       | gemma-3-1b-it                         | google/gemma-3-1b-it                             | 32768                | 32768                | 64                   | 64                   | 8              |
| Google       | gemma-3-1b-pt                         | google/gemma-3-1b-pt                             | 32768                | 32768                | 64                   | 64                   | 8              |
| Google       | gemma-3-4b-it                         | google/gemma-3-4b-it                             | 131072               | 65536                | 8                    | 8                    | 8              |
| Google       | gemma-3-4b-it-VLM                     | google/gemma-3-4b-it-VLM                         | 32768                | 32768                | 8                    | 8                    | 8              |
| Google       | gemma-3-4b-pt                         | google/gemma-3-4b-pt                             | 131072               | 65536                | 8                    | 8                    | 8              |
| Google       | gemma-3-12b-it                        | google/gemma-3-12b-it                            | 65536                | 49152                | 8                    | 8                    | 8              |
| Google       | gemma-3-12b-it-VLM                    | google/gemma-3-12b-it-VLM                        | 32768                | 32768                | 8                    | 8                    | 8              |
| Google       | gemma-3-12b-pt                        | google/gemma-3-12b-pt                            | 65536                | 49152                | 8                    | 8                    | 8              |
| Google       | gemma-3-27b-it                        | google/gemma-3-27b-it                            | 49152                | 24576                | 16                   | 16                   | 16             |
| Google       | gemma-3-27b-it-VLM                    | google/gemma-3-27b-it-VLM                        | 32768                | 24576                | 16                   | 16                   | 16             |
| Google       | gemma-3-27b-pt                        | google/gemma-3-27b-pt                            | 49152                | 24576                | 16                   | 16                   | 16             |
| Google       | gemma-4-31B-it                        | google/gemma-4-31B-it                            | 49152                | 24576                | 8                    | 8                    | 8              |
| Google       | gemma-4-26B-A4B-it                    | google/gemma-4-26B-A4B-it                        | 49152                | 24576                | 8                    | 8                    | 8              |
| Qwen         | Qwen3-0.6B                            | Qwen/Qwen3-0.6B                                  | 40960                | 40960                | 64                   | 64                   | 8              |
| Qwen         | Qwen3-0.6B-Base                       | Qwen/Qwen3-0.6B-Base                             | 32768                | 32768                | 64                   | 64                   | 8              |
| Qwen         | Qwen3-1.7B                            | Qwen/Qwen3-1.7B                                  | 40960                | 40960                | 32                   | 32                   | 8              |
| Qwen         | Qwen3-1.7B-Base                       | Qwen/Qwen3-1.7B-Base                             | 32768                | 32768                | 32                   | 32                   | 8              |
| Qwen         | Qwen3-4B                              | Qwen/Qwen3-4B                                    | 40960                | 40960                | 16                   | 16                   | 8              |
| Qwen         | Qwen3-4B-Base                         | Qwen/Qwen3-4B-Base                               | 32768                | 32768                | 16                   | 16                   | 8              |
| Qwen         | Qwen3-8B                              | Qwen/Qwen3-8B                                    | 40960                | 40960                | 8                    | 8                    | 8              |
| Qwen         | Qwen3-8B-Base                         | Qwen/Qwen3-8B-Base                               | 32768                | 32768                | 16                   | 16                   | 8              |
| Qwen         | Qwen3-14B                             | Qwen/Qwen3-14B                                   | 40960                | 40960                | 8                    | 8                    | 8              |
| Qwen         | Qwen3-14B-Base                        | Qwen/Qwen3-14B-Base                              | 32768                | 32768                | 8                    | 8                    | 8              |
| Qwen         | Qwen3-32B                             | Qwen/Qwen3-32B                                   | 40960                | 24576                | 16                   | 16                   | 16             |
| Qwen         | Qwen3-VL-8B-Instruct                  | Qwen/Qwen3-VL-8B-Instruct                        | 24576                | 16384                | 8                    | 8                    | 8              |
| Qwen         | Qwen3-VL-32B-Instruct                 | Qwen/Qwen3-VL-32B-Instruct                       | 16384                | 16384                | 16                   | 16                   | 16             |
| Qwen         | Qwen3-VL-30B-A3B-Instruct             | Qwen/Qwen3-VL-30B-A3B-Instruct                   | 16384                | 16384                | 8                    | 8                    | 8              |
| NVIDIA       | NVIDIA-Nemotron-Nano-9B-v2            | nvidia/NVIDIA-Nemotron-Nano-9B-v2                | 32768                | 16384                | 8                    | 8                    | 8              |
| Meta         | Llama-3.3-70B-Instruct-Reference      | meta-llama/Llama-3.3-70B-Instruct-Reference      | 24576                | 12288                | 32                   | 32                   | 32             |
| Meta         | Llama-3.2-3B-Instruct                 | meta-llama/Llama-3.2-3B-Instruct                 | 131072               | 65536                | 8                    | 8                    | 8              |
| Meta         | Llama-3.2-3B                          | meta-llama/Llama-3.2-3B                          | 131072               | 65536                | 8                    | 8                    | 8              |
| Meta         | Llama-3.2-1B-Instruct                 | meta-llama/Llama-3.2-1B-Instruct                 | 131072               | 131072               | 8                    | 8                    | 8              |
| Meta         | Llama-3.2-1B                          | meta-llama/Llama-3.2-1B                          | 131072               | 131072               | 8                    | 8                    | 8              |
| Meta         | Meta-Llama-3.1-8B-Instruct-Reference  | meta-llama/Meta-Llama-3.1-8B-Instruct-Reference  | 131072               | 65536                | 8                    | 8                    | 8              |
| Meta         | Meta-Llama-3.1-8B-Reference           | meta-llama/Meta-Llama-3.1-8B-Reference           | 131072               | 65536                | 8                    | 8                    | 8              |
| Meta         | Meta-Llama-3.1-70B-Instruct-Reference | meta-llama/Meta-Llama-3.1-70B-Instruct-Reference | 24576                | 12288                | 32                   | 32                   | 32             |
| Meta         | Meta-Llama-3.1-70B-Reference          | meta-llama/Meta-Llama-3.1-70B-Reference          | 24576                | 12288                | 32                   | 32                   | 32             |
| Meta         | Meta-Llama-3-8B-Instruct              | meta-llama/Meta-Llama-3-8B-Instruct              | 8192                 | 8192                 | 64                   | 64                   | 8              |
| Meta         | Meta-Llama-3-8B                       | meta-llama/Meta-Llama-3-8B                       | 8192                 | 8192                 | 64                   | 64                   | 8              |
| Meta         | Meta-Llama-3-70B-Instruct             | meta-llama/Meta-Llama-3-70B-Instruct             | 8192                 | 8192                 | 32                   | 32                   | 32             |
| Qwen         | Qwen2-7B-Instruct                     | Qwen/Qwen2-7B-Instruct                           | 32768                | 32768                | 8                    | 8                    | 8              |
| Qwen         | Qwen2-7B                              | Qwen/Qwen2-7B                                    | 131072               | 24576                | 8                    | 8                    | 8              |
| Qwen         | Qwen2-1.5B-Instruct                   | Qwen/Qwen2-1.5B-Instruct                         | 32768                | 32768                | 32                   | 32                   | 8              |
| Qwen         | Qwen2-1.5B                            | Qwen/Qwen2-1.5B                                  | 131072               | 131072               | 8                    | 8                    | 8              |
| Mistral      | Mixtral-8x7B-Instruct-v0.1            | mistralai/Mixtral-8x7B-Instruct-v0.1             | 32768                | 32768                | 16                   | 16                   | 16             |
| Mistral      | Mixtral-8x7B-v0.1                     | mistralai/Mixtral-8x7B-v0.1                      | 32768                | 32768                | 16                   | 16                   | 16             |
| Mistral      | Mistral-7B-Instruct-v0.2              | mistralai/Mistral-7B-Instruct-v0.2               | 32768                | 32768                | 16                   | 16                   | 8              |
| Mistral      | Mistral-7B-v0.1                       | mistralai/Mistral-7B-v0.1                        | 32768                | 32768                | 16                   | 16                   | 8              |
| Together     | llama-2-7b-chat                       | togethercomputer/llama-2-7b-chat                 | 4096                 | 4096                 | 128                  | 128                  | 8              |
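
The minimum and maximum batch sizes bound the value you can request for each model. As a rough illustration, the sketch below clamps a requested batch size into the listed range. `FULL_FT_BATCH` is a hypothetical, hand-copied subset of the rows above and `clamp_batch_size` is an illustrative helper, not a Together API. How the service itself treats an out-of-range value is not covered here.

```python
from typing import NamedTuple


class BatchLimits(NamedTuple):
    min_batch: int
    max_sft: int
    max_dpo: int


# Hypothetical subset of the full fine-tuning table above.
FULL_FT_BATCH = {
    "google/gemma-3-1b-it": BatchLimits(8, 64, 64),
    "meta-llama/Llama-3.3-70B-Instruct-Reference": BatchLimits(32, 32, 32),
    "mistralai/Mixtral-8x7B-v0.1": BatchLimits(16, 16, 16),
}


def clamp_batch_size(model: str, requested: int, method: str = "sft") -> int:
    """Clamp requested into [min batch size, max batch size] for the given method."""
    if method not in ("sft", "dpo"):
        raise ValueError("method must be 'sft' or 'dpo'")
    limits = FULL_FT_BATCH[model]
    upper = limits.max_sft if method == "sft" else limits.max_dpo
    return max(limits.min_batch, min(requested, upper))


# Requests outside the listed range are pulled to the nearest bound.
print(clamp_batch_size("google/gemma-3-1b-it", 128))
print(clamp_batch_size("meta-llama/Llama-3.3-70B-Instruct-Reference", 16))
```

For models whose minimum and maximum are equal, such as the 70B rows, the only value in range is that single number, so the helper always returns it.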
