Chat Models

See which open-source chat models we currently host, or learn how to configure and host your own.

Our Chat API has built-in support for many popular models we host via our serverless endpoints, as well as any model that you configure and host yourself using our dedicated GPU infrastructure.

When using one of our serverless models, you'll be charged based on the number of tokens you use in your queries. For dedicated models that you configure and run yourself, you'll be charged per minute for as long as your endpoint is running. You can start or stop your endpoint at any time using our online playground.

To learn more about the pricing for our serverless endpoints, check out our pricing page.
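Because serverless billing is per token, you can estimate the cost of a query directly from its token counts. The sketch below uses made-up example rates (the real per-million-token prices are on the pricing page):

```python
# Hypothetical example rates -- see the pricing page for the real per-model figures.
PRICE_PER_M_INPUT = 0.18   # USD per million input tokens (made-up figure)
PRICE_PER_M_OUTPUT = 0.18  # USD per million output tokens (made-up figure)

def serverless_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one serverless query from its token counts."""
    return (input_tokens * PRICE_PER_M_INPUT
            + output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000

# e.g. a 2,000-token prompt that produces a 500-token completion:
print(round(serverless_cost(2_000, 500), 6))  # -> 0.00045 at the example rates
```

Dedicated endpoints bill per minute of uptime instead, so their cost depends on how long the endpoint runs, not on token volume.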

We recently added vision models! To find them, see Vision Models.

Hosted models

In the table below, models marked as "Turbo" are quantized to FP8 and those marked as "Lite" are quantized to INT4. All our other models run at full precision (FP16).

If you're not sure which chat model to use, we currently recommend Llama 3.1 8B Turbo (meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo) to get started.
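To call the recommended model through the Chat API, you send a JSON body in the common chat-completions shape. The snippet below is a minimal sketch: the endpoint URL and environment-variable names are placeholders (substitute the real values from your account), and the network call only runs if a key is configured.

```python
import json
import os
import urllib.request

# Placeholder endpoint and env-var names -- substitute your provider's real values.
API_URL = os.environ.get("CHAT_API_URL", "https://api.example.com/v1/chat/completions")
API_KEY = os.environ.get("CHAT_API_KEY")

def build_chat_request(model: str, user_message: str, max_tokens: int = 256) -> dict:
    """Assemble the JSON body for a chat completion call."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "max_tokens": max_tokens,
    }

# The recommended starter model from the table below.
payload = build_chat_request(
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    "Give me one sentence about GPUs.",
)
print(json.dumps(payload, indent=2))

if API_KEY:  # only attempt the network call when a key is configured
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
```

Swapping in any other API model string from the table is enough to target a different model; the request shape stays the same.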

| Organization | Model Name | API Model String | Context Length | Quantization |
|---|---|---|---|---|
| Meta | Llama 3.1 8B Instruct Turbo | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 131072 | FP8 |
| Meta | Llama 3.1 70B Instruct Turbo | meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | 131072 | FP8 |
| Meta | Llama 3.1 405B Instruct Turbo | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 130815 | FP8 |
| Meta | Llama 3 8B Instruct Turbo | meta-llama/Meta-Llama-3-8B-Instruct-Turbo | 8192 | FP8 |
| Meta | Llama 3 70B Instruct Turbo | meta-llama/Meta-Llama-3-70B-Instruct-Turbo | 8192 | FP8 |
| Meta | Llama 3.2 3B Instruct Turbo | meta-llama/Llama-3.2-3B-Instruct-Turbo | 131072 | FP16 |
| Meta | Llama 3 8B Instruct Lite | meta-llama/Meta-Llama-3-8B-Instruct-Lite | 8192 | INT4 |
| Meta | Llama 3 70B Instruct Lite | meta-llama/Meta-Llama-3-70B-Instruct-Lite | 8192 | INT4 |
| Meta | Llama 3 8B Instruct Reference | meta-llama/Llama-3-8b-chat-hf | 8192 | FP16 |
| Meta | Llama 3 70B Instruct Reference | meta-llama/Llama-3-70b-chat-hf | 8192 | FP16 |
| Microsoft | WizardLM-2 8x22B | microsoft/WizardLM-2-8x22B | 65536 | FP16 |
| Google | Gemma 2 27B | google/gemma-2-27b-it | 8192 | FP16 |
| Google | Gemma 2 9B | google/gemma-2-9b-it | 8192 | FP16 |
| databricks | DBRX Instruct | databricks/dbrx-instruct | 32768 | FP16 |
| DeepSeek | DeepSeek LLM Chat (67B) | deepseek-ai/deepseek-llm-67b-chat | 4096 | FP16 |
| Google | Gemma Instruct (2B) | google/gemma-2b-it | 8192 | FP16 |
| Gryphe | MythoMax-L2 (13B) | Gryphe/MythoMax-L2-13b | 4096 | FP16 |
| Meta | LLaMA-2 Chat (13B) | meta-llama/Llama-2-13b-chat-hf | 4096 | FP16 |
| mistralai | Mistral (7B) Instruct | mistralai/Mistral-7B-Instruct-v0.1 | 8192 | FP16 |
| mistralai | Mistral (7B) Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | FP16 |
| mistralai | Mistral (7B) Instruct v0.3 | mistralai/Mistral-7B-Instruct-v0.3 | 32768 | FP16 |
| mistralai | Mixtral-8x7B Instruct (46.7B) | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32768 | FP16 |
| mistralai | Mixtral-8x22B Instruct (141B) | mistralai/Mixtral-8x22B-Instruct-v0.1 | 65536 | FP16 |
| NousResearch | Nous Hermes 2 - Mixtral 8x7B-DPO (46.7B) | NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO | 32768 | FP16 |
| NousResearch | [Deprecating 10/1] Nous Hermes-2 Yi (34B) | NousResearch/Nous-Hermes-2-Yi-34B | 4096 | FP16 |
| NousResearch | [Deprecating 10/1] Hermes 3 - Llama-3.1 405B | NousResearch/Hermes-3-Llama-3.1-405B-Turbo | 8192 | FP8 |
| Qwen | Qwen 1.5 Chat (72B) | Qwen/Qwen1.5-72B-Chat | 32768 | FP16 |
| Qwen | Qwen 1.5 Chat (110B) | Qwen/Qwen1.5-110B-Chat | 32768 | FP16 |
| Qwen | Qwen 2 Instruct (72B) | Qwen/Qwen2-72B-Instruct | 32768 | FP16 |
| Together | StripedHyena Nous (7B) | togethercomputer/StripedHyena-Nous-7B | 32768 | FP16 |
| upstage | Upstage SOLAR Instruct v1 (11B) | upstage/SOLAR-10.7B-Instruct-v1.0 | 4096 | FP16 |
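The context length column caps the total tokens (prompt plus completion) a model can handle in one request, so it's worth checking before you pick a model for long inputs. A small sketch of that check, using a hand-copied subset of the table above:

```python
# Context lengths for a few models, copied from the table above.
CONTEXT_LENGTHS = {
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo": 131072,
    "meta-llama/Meta-Llama-3-8B-Instruct-Turbo": 8192,
    "mistralai/Mixtral-8x7B-Instruct-v0.1": 32768,
    "Qwen/Qwen2-72B-Instruct": 32768,
}

def models_for_budget(required_tokens: int) -> list[str]:
    """Return models whose context window fits a prompt + completion budget."""
    return sorted(m for m, n in CONTEXT_LENGTHS.items() if n >= required_tokens)

# A 100k-token job rules out everything but the long-context Llama 3.1 model:
print(models_for_budget(100_000))  # -> ['meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo']
```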

Request a model

Don't see a model you want to use?

Send us a Model Request here →