Inference Models

Explore all the open source models we offer.

Serverless Endpoints

Pre-configured instances of popular models hosted for free, priced per 1M tokens used. The below models are available through our inference API as serverless endpoints.

Request for a model to be added to serverless endpoints or for dedicated instance or capacity for these models.

Chat Models

Use our Chat Completions endpoint for Chat Models.

OrganizationModel NameModel String for APIContext length
01.AI01-ai Yi Chat (34B)zero-one-ai/Yi-34B-Chat4096
AustismChronos Hermes (13B)Austism/chronos-hermes-13b2048
cognitivecomputationsDolphin 2.5 Mixtral 8x7bcognitivecomputations/dolphin-2.5-mixtral-8x7b32768
databricksDBRX Instructdatabricks/dbrx-instruct32768
DeepSeekDeepseek Coder Instruct (33B)deepseek-ai/deepseek-coder-33b-instruct16384
DeepSeekDeepSeek LLM Chat (67B)deepseek-ai/deepseek-llm-67b-chat4096
garage-bAIndPlatypus2 Instruct (70B)garage-bAInd/Platypus2-70B-instruct4096
GoogleGemma Instruct (2B)google/gemma-2b-it8192
GoogleGemma Instruct (7B)google/gemma-7b-it8192
GrypheMythoMax-L2 (13B)Gryphe/MythoMax-L2-13b4096
GrypheMythoMax-L2 Lite (13B)Gryphe/MythoMax-L2-13b-Lite4096
LM SysVicuna v1.5 (13B)lmsys/vicuna-13b-v1.54096
LM SysVicuna v1.5 (7B)lmsys/vicuna-7b-v1.54096
MetaCode Llama Instruct (13B)codellama/CodeLlama-13b-Instruct-hf16384
MetaCode Llama Instruct (34B)codellama/CodeLlama-34b-Instruct-hf16384
MetaCode Llama Instruct (70B)codellama/CodeLlama-70b-Instruct-hf4096
MetaCode Llama Instruct (7B)codellama/CodeLlama-7b-Instruct-hf16384
MetaLLaMA-2 Chat (70B)meta-llama/Llama-2-70b-chat-hf4096
MetaLLaMA-2 Chat (13B)meta-llama/Llama-2-13b-chat-hf4096
MetaLLaMA-2 Chat (7B)meta-llama/Llama-2-7b-chat-hf4096
MetaLLaMA-3 Chat (8B)meta-llama/Llama-3-8b-chat-hf8192
MetaLLaMA-3 Chat (70B)meta-llama/Llama-3-70b-chat-hf8192
MetaLLaMA-3 Chat (8B) Turbometa-llama/Meta-Llama-3-8B-Instruct-Turbo8192
MetaLLaMA-3 Chat (70B) Turbometa-llama/Meta-Llama-3-70B-Instruct-Turbo8192
MetaLlama 3.1 8B Instruct Turbometa-llama/Meta-Llama-3.1-8B-Instruct-Turbo8192
MetaLlama 3.1 70B Instruct Turbometa-llama/Meta-Llama-3.1-70B-Instruct-Turbo8192
MetaLlama 3.1 405B Instruct Turbometa-llama/Meta-Llama-3.1-405B-Instruct-Turbo4096
mistralaiMistral (7B) Instructmistralai/Mistral-7B-Instruct-v0.18192
mistralaiMistral (7B) Instruct v0.2mistralai/Mistral-7B-Instruct-v0.232768
mistralaiMistral (7B) Instruct v0.3mistralai/Mistral-7B-Instruct-v0.332768
mistralaiMixtral-8x7B Instruct (46.7B)mistralai/Mixtral-8x7B-Instruct-v0.132768
mistralaiMixtral-8x22B Instruct (141B)mistralai/Mixtral-8x22B-Instruct-v0.165536
NousResearchNous Capybara v1.9 (7B)NousResearch/Nous-Capybara-7B-V1p98192
NousResearchNous Hermes 2 - Mistral DPO (7B)NousResearch/Nous-Hermes-2-Mistral-7B-DPO32768
NousResearchNous Hermes 2 - Mixtral 8x7B-DPO (46.7B)NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO32768
NousResearchNous Hermes 2 - Mixtral 8x7B-SFT (46.7B)NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT32768
NousResearchNous Hermes LLaMA-2 (7B)NousResearch/Nous-Hermes-llama-2-7b4096
NousResearchNous Hermes Llama-2 (13B)NousResearch/Nous-Hermes-Llama2-13b4096
NousResearchNous Hermes-2 Yi (34B)NousResearch/Nous-Hermes-2-Yi-34B4096
OpenChatOpenChat 3.5 (7B)openchat/openchat-3.5-12108192
OpenOrcaOpenOrca Mistral (7B) 8KOpen-Orca/Mistral-7B-OpenOrca8192
QwenQwen 1.5 Chat (0.5B)Qwen/Qwen1.5-0.5B-Chat32768
QwenQwen 1.5 Chat (1.8B)Qwen/Qwen1.5-1.8B-Chat32768
QwenQwen 1.5 Chat (4B)Qwen/Qwen1.5-4B-Chat32768
QwenQwen 1.5 Chat (7B)Qwen/Qwen1.5-7B-Chat32768
QwenQwen 1.5 Chat (14B)Qwen/Qwen1.5-14B-Chat32768
QwenQwen 1.5 Chat (32B)Qwen/Qwen1.5-32B-Chat32768
QwenQwen 1.5 Chat (72B)Qwen/Qwen1.5-72B-Chat32768
QwenQwen 1.5 Chat (110B)Qwen/Qwen1.5-110B-Chat32768
QwenQwen 2 Instruct (72B)Qwen/Qwen2-72B-Instruct32768
Snorkel AISnorkel Mistral PairRM DPO (7B)snorkelai/Snorkel-Mistral-PairRM-DPO32768
SnowflakeSnowflake Arctic InstructSnowflake/snowflake-arctic-instruct4096
StanfordAlpaca (7B)togethercomputer/alpaca-7b2048
TekniumOpenHermes-2-Mistral (7B)teknium/OpenHermes-2-Mistral-7B8192
TekniumOpenHermes-2.5-Mistral (7B)teknium/OpenHermes-2p5-Mistral-7B8192
TogetherLLaMA-2-7B-32K-Instruct (7B)togethercomputer/Llama-2-7B-32K-Instruct32768
TogetherRedPajama-INCITE Chat (3B)togethercomputer/RedPajama-INCITE-Chat-3B-v12048
TogetherRedPajama-INCITE Chat (7B)togethercomputer/RedPajama-INCITE-7B-Chat2048
TogetherStripedHyena Nous (7B)togethercomputer/StripedHyena-Nous-7B32768
Undi95ReMM SLERP L2 (13B)Undi95/ReMM-SLERP-L2-13B4096
Undi95Toppy M (7B)Undi95/Toppy-M-7B4096
WizardLMWizardLM v1.2 (13B)WizardLM/WizardLM-13B-V1.24096
upstageUpstage SOLAR Instruct v1 (11B)upstage/SOLAR-10.7B-Instruct-v1.04096

Language Models

Use our Completions endpoint for Language Models.

OrganizationModel NameModel String for APIContext length
01.AI01-ai Yi Base (34B)zero-one-ai/Yi-34B4096
01.AI01-ai Yi Base (6B)zero-one-ai/Yi-6B4096
Mistral AIMixtral 8X22Bmistralai/Mixtral-8x22B65536
GoogleGemma (2B)google/gemma-2b8192
GoogleGemma (7B)google/gemma-7b8192
MetaLLaMA-2 (70B)meta-llama/Llama-2-70b-hf4096
MetaLLaMA-2 (13B)meta-llama/Llama-2-13b-hf4096
MetaLLaMA-2 (7B)meta-llama/Llama-2-7b-hf4096
MetaLLaMA-3 (8B)meta-llama/Llama-3-8b-hf8192
MetaLLaMA-3 (70B)meta-llama/Meta-Llama-3-70B8192
MicrosoftMicrosoft Phi-2microsoft/phi-22048
NexusflowNexusRaven (13B)Nexusflow/NexusRaven-V2-13B16384
QwenQwen 1.5 (0.5B)Qwen/Qwen1.5-0.5B32768
QwenQwen 1.5 (1.8B)Qwen/Qwen1.5-1.8B32768
QwenQwen 1.5 (4B)Qwen/Qwen1.5-4B32768
QwenQwen 1.5 (7B)Qwen/Qwen1.5-7B32768
QwenQwen 1.5 (14B)Qwen/Qwen1.5-14B32768
QwenQwen 1.5 (32B)Qwen/Qwen1.5-32B32768
QwenQwen 1.5 (72B)Qwen/Qwen1.5-72B4096
TogetherGPT-JT-Moderation (6B)togethercomputer/GPT-JT-Moderation-6B2048
TogetherLLaMA-2-32K (7B)togethercomputer/LLaMA-2-7B-32K32768
TogetherRedPajama-INCITE (3B)togethercomputer/RedPajama-INCITE-Base-3B-v12048
TogetherRedPajama-INCITE (7B)togethercomputer/RedPajama-INCITE-7B-Base2048
TogetherRedPajama-INCITE Instruct (3B)togethercomputer/RedPajama-INCITE-Instruct-3B-v12048
TogetherRedPajama-INCITE Instruct (7B)togethercomputer/RedPajama-INCITE-7B-Instruct2048
TogetherStripedHyena Hessian (7B)togethercomputer/StripedHyena-Hessian-7B32768
mistralaiMistral (7B)mistralai/Mistral-7B-v0.18192
mistralaiMixtral-8x7B (46.7B)mistralai/Mixtral-8x7B-v0.132768

Code Models

Use our Completions endpoint for Code Models.

OrganizationModel NameModel String for APIContext length
MetaCode Llama Python (70B)codellama/CodeLlama-70b-Python-hf4096
MetaCode Llama Python (34B)codellama/CodeLlama-34b-Python-hf16384
MetaCode Llama Python (13B)codellama/CodeLlama-13b-Python-hf16384
MetaCode Llama Python (7B)codellama/CodeLlama-7b-Python-hf16384
PhindPhind Code LLaMA v2 (34B)Phind/Phind-CodeLlama-34B-v216384
WizardLMWizardCoder Python v1.0 (34B)WizardLM/WizardCoder-Python-34B-V1.08192

Image Models

Use our Completions endpoint for Image Models.

OrganizationModel NameModel String for API
Prompt HeroOpenjourney v4prompthero/openjourney
Runway MLStable Diffusion 1.5runwayml/stable-diffusion-v1-5
SG161222Realistic Vision 3.0SG161222/Realistic_Vision_V3.0_VAE
Stability AIStable Diffusion 2.1stabilityai/stable-diffusion-2-1
Stability AIStable Diffusion XL 1.0stabilityai/stable-diffusion-xl-base-1.0
WavymulderAnalog Diffusionwavymulder/Analog-Diffusion

Moderation Models

Use our Completions endpoint to run a moderation model as a standalone classifier, or use it alongside any of the other models above as a filter to safeguard responses from 100+ models, by specifying the parameter "safety_model": "MODEL_API_STRING"

OrganizationModel NameModel String for APIContext length
MetaLlama Guard (7B)Meta-Llama/Llama-Guard-7b4096
MetaLLama Guard 3 (8B)meta-llama/Meta-Llama-Guard-3-8B4096

Genomic Models

Use our Completions endpoint for Genomic Models.

OrganizationModel NameModel String for APIContext length
TogetherEvo-1 Base (8K)togethercomputer/evo-1-8k-base8192*
TogetherEvo-1 Base (131K)togethercomputer/evo-1-131k-base131072*

* Evo-1 models can handle up to 4096 input tokens, while output sequences can extend up to the difference between the context length and the input sequence length.

Model Request

Don't see a model you want to use? Go to our contact page and add or upvote the model(s) you'd like to use on our API!

Dedicated Instances

Customizable on-demand deployable model instances, priced by hour hosted. All models in the serverless endpoints are available for hosting as private dedicated instances. Additionally, the below models are also available for hosting as private dedicated instances. Request an instance.

Chat Models

OrganizationModel NameModel String for API
DatabricksDolly v2 (12B)databricks/dolly-v2-12b
DatabricksDolly v2 (3B)databricks/dolly-v2-3b
DatabricksDolly v2 (7B)databricks/dolly-v2-7b
DiscoResearchDiscoLM Mixtral 8x7b (46.7B)DiscoResearch/DiscoLM-mixtral-8x7b-v2
HuggingFaceZephyr-7B-ßHuggingFaceH4/zephyr-7b-beta
HuggingFaceH4StarCoderChat Alpha (16B)HuggingFaceH4/starchat-alpha
LAIONOpen-Assistant StableLM SFT-7 (7B)OpenAssistant/stablelm-7b-sft-v7-epoch-3
LM SysKoala (13B)togethercomputer/Koala-13B
LM SysKoala (7B)togethercomputer/Koala-7B
LM SysVicuna v1.3 (13B)lmsys/vicuna-13b-v1.3
LM SysVicuna v1.3 (7B)lmsys/vicuna-7b-v1.3
LM SysVicuna-FastChat-T5 (3B)lmsys/fastchat-t5-3b-v1.0
Mosaic MLMPT-Chat (30B)togethercomputer/mpt-30b-chat
Mosaic MLMPT-Chat (7B)togethercomputer/mpt-7b-chat
NousResearchNous Hermes LLaMA-2 (70B)NousResearch/Nous-Hermes-Llama2-70b
QwenQwen Chat (7B)Qwen/Qwen-7B-Chat
QwenQwen Chat (14B)Qwen/Qwen-14B-Chat
TIIFalcon Instruct (7B)tiiuae/falcon-7b-instruct
TIIFalcon Instruct (40B)tiiuae/falcon-40b-instruct
Tim DettmersGuanaco (13B)togethercomputer/guanaco-13b
Tim DettmersGuanaco (33B)togethercomputer/guanaco-33b
Tim DettmersGuanaco (65B)togethercomputer/guanaco-65b
Tim DettmersGuanaco (7B)togethercomputer/guanaco-7b
TogetherGPT-NeoXT-Chat-Base (20B)togethercomputer/GPT-NeoXT-Chat-Base-20B

Language Models

OrganizationModel NameModel String for APIContext length
DefogSqlcoder (15B)defog/sqlcoder8192
EleutherAIGPT-J (6B)EleutherAI/gpt-j-6b2048
EleutherAIGPT-NeoX (20B)EleutherAI/gpt-neox-20b2048
EleutherAILlemma (7B)EleutherAI/llemma_7b4096
GoogleFlan T5 XL (3B)google/flan-t5-xl512
GoogleFlan T5 XXL (11B)google/flan-t5-xxl512
MetaLLaMA (7B)huggyllama/llama-7b2048
MetaLLaMA (13B)huggyllama/llama-13b2048
MetaLLaMA (30B)huggyllama/llama-30b2048
MetaLLaMA (65B)huggyllama/llama-65b2048
MetaLLaMA (7B)huggyllama/llama-7b2048
Mosaic MLMPT (7B)mosaicml/mpt-7b2048
Mosaic MLMPT-Instruct (7B)mosaicml/mpt-7b-instruct2048
Nous ResearchNous Hermes (13B)NousResearch/Nous-Hermes-13b2048
Numbers StationNSQL (6B)NumbersStation/nsql-6B2048
QwenQwen (7B)Qwen/Qwen-7B2048
QwenQwen (14B)Qwen/Qwen-14B2048
Stability AIStableLM-Base-Alpha (3B)stabilityai/stablelm-base-alpha-3b4096
Stability AIStableLM-Base-Alpha (7B)stabilityai/stablelm-base-alpha-7b4096
TIIFalcon (7B)tiiuae/falcon-7b2048
TIIFalcon (7B)tiiuae/falcon-40b2048
TogetherGPT-JT (6B)togethercomputer/GPT-JT-6B-v12048
WizardLMWizardLM v1.0 (70B)WizardLM/WizardLM-70B-V1.04096

Code Models

OrganizationModel NameModel String for APIContext length
BigCodeStarCoder (16B)bigcode/starcoder8192
MetaCode Llama (70B)codellama/CodeLlama-70b-hf16384
MetaCode Llama Python (70B)codellama/CodeLlama-70b-Python-hf4096
MetaCode Llama Instruct (70B)codellama/CodeLlama-70b-Instruct-hf4096
Numbers StationNSQL LLaMA-2 (7B)NumbersStation/nsql-llama-2-7B4096
PhindPhind Code LLaMA Python v1 (34B)Phind/Phind-CodeLlama-34B-Python-v116384
ReplitReplit-Code-v1 (3B)replit/replit-code-v1-3b2048
SalesforceCodeGen2 (16B)Salesforce/codegen2-16B2048
SalesforceCodeGen2 (7B)Salesforce/codegen2-7B2048


Request a model