Inference Models
Explore all the open source models we offer.
Serverless Endpoints
Pre-configured instances of popular models hosted for free, priced per 1M tokens used. The below models are available through our inference API as serverless endpoints.
Request for a model to be added to serverless endpoints or for dedicated instance or capacity for these models.
Chat Models
Use our Chat Completions endpoint for Chat Models.
Organization | Model Name | Model String for API | Context length |
---|---|---|---|
01.AI | 01-ai Yi Chat (34B) | zero-one-ai/Yi-34B-Chat | 4096 |
Austism | Chronos Hermes (13B) | Austism/chronos-hermes-13b | 2048 |
cognitivecomputations | Dolphin 2.5 Mixtral 8x7b | cognitivecomputations/dolphin-2.5-mixtral-8x7b | 32768 |
databricks | DBRX Instruct | databricks/dbrx-instruct | 32768 |
DeepSeek | Deepseek Coder Instruct (33B) | deepseek-ai/deepseek-coder-33b-instruct | 16384 |
DeepSeek | DeepSeek LLM Chat (67B) | deepseek-ai/deepseek-llm-67b-chat | 4096 |
garage-bAInd | Platypus2 Instruct (70B) | garage-bAInd/Platypus2-70B-instruct | 4096 |
Gemma Instruct (2B) | google/gemma-2b-it | 8192 | |
Gemma Instruct (7B) | google/gemma-7b-it | 8192 | |
Gryphe | MythoMax-L2 (13B) | Gryphe/MythoMax-L2-13b | 4096 |
Gryphe | MythoMax-L2 Lite (13B) | Gryphe/MythoMax-L2-13b-Lite | 4096 |
LM Sys | Vicuna v1.5 (13B) | lmsys/vicuna-13b-v1.5 | 4096 |
LM Sys | Vicuna v1.5 (7B) | lmsys/vicuna-7b-v1.5 | 4096 |
Meta | Code Llama Instruct (13B) | codellama/CodeLlama-13b-Instruct-hf | 16384 |
Meta | Code Llama Instruct (34B) | codellama/CodeLlama-34b-Instruct-hf | 16384 |
Meta | Code Llama Instruct (70B) | codellama/CodeLlama-70b-Instruct-hf | 4096 |
Meta | Code Llama Instruct (7B) | codellama/CodeLlama-7b-Instruct-hf | 16384 |
Meta | LLaMA-2 Chat (70B) | meta-llama/Llama-2-70b-chat-hf | 4096 |
Meta | LLaMA-2 Chat (13B) | meta-llama/Llama-2-13b-chat-hf | 4096 |
Meta | LLaMA-2 Chat (7B) | meta-llama/Llama-2-7b-chat-hf | 4096 |
Meta | LLaMA-3 Chat (8B) | meta-llama/Llama-3-8b-chat-hf | 8192 |
Meta | LLaMA-3 Chat (70B) | meta-llama/Llama-3-70b-chat-hf | 8192 |
Meta | LLaMA-3 Chat (8B) Turbo | meta-llama/Meta-Llama-3-8B-Instruct-Turbo | 8192 |
Meta | LLaMA-3 Chat (70B) Turbo | meta-llama/Meta-Llama-3-70B-Instruct-Turbo | 8192 |
Meta | Llama 3.1 8B Instruct Turbo | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 8192 |
Meta | Llama 3.1 70B Instruct Turbo | meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo | 8192 |
Meta | Llama 3.1 405B Instruct Turbo | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 4096 |
mistralai | Mistral (7B) Instruct | mistralai/Mistral-7B-Instruct-v0.1 | 8192 |
mistralai | Mistral (7B) Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 |
mistralai | Mistral (7B) Instruct v0.3 | mistralai/Mistral-7B-Instruct-v0.3 | 32768 |
mistralai | Mixtral-8x7B Instruct (46.7B) | mistralai/Mixtral-8x7B-Instruct-v0.1 | 32768 |
mistralai | Mixtral-8x22B Instruct (141B) | mistralai/Mixtral-8x22B-Instruct-v0.1 | 65536 |
NousResearch | Nous Capybara v1.9 (7B) | NousResearch/Nous-Capybara-7B-V1p9 | 8192 |
NousResearch | Nous Hermes 2 - Mistral DPO (7B) | NousResearch/Nous-Hermes-2-Mistral-7B-DPO | 32768 |
NousResearch | Nous Hermes 2 - Mixtral 8x7B-DPO (46.7B) | NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO | 32768 |
NousResearch | Nous Hermes 2 - Mixtral 8x7B-SFT (46.7B) | NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT | 32768 |
NousResearch | Nous Hermes LLaMA-2 (7B) | NousResearch/Nous-Hermes-llama-2-7b | 4096 |
NousResearch | Nous Hermes Llama-2 (13B) | NousResearch/Nous-Hermes-Llama2-13b | 4096 |
NousResearch | Nous Hermes-2 Yi (34B) | NousResearch/Nous-Hermes-2-Yi-34B | 4096 |
OpenChat | OpenChat 3.5 (7B) | openchat/openchat-3.5-1210 | 8192 |
OpenOrca | OpenOrca Mistral (7B) 8K | Open-Orca/Mistral-7B-OpenOrca | 8192 |
Qwen | Qwen 1.5 Chat (0.5B) | Qwen/Qwen1.5-0.5B-Chat | 32768 |
Qwen | Qwen 1.5 Chat (1.8B) | Qwen/Qwen1.5-1.8B-Chat | 32768 |
Qwen | Qwen 1.5 Chat (4B) | Qwen/Qwen1.5-4B-Chat | 32768 |
Qwen | Qwen 1.5 Chat (7B) | Qwen/Qwen1.5-7B-Chat | 32768 |
Qwen | Qwen 1.5 Chat (14B) | Qwen/Qwen1.5-14B-Chat | 32768 |
Qwen | Qwen 1.5 Chat (32B) | Qwen/Qwen1.5-32B-Chat | 32768 |
Qwen | Qwen 1.5 Chat (72B) | Qwen/Qwen1.5-72B-Chat | 32768 |
Qwen | Qwen 1.5 Chat (110B) | Qwen/Qwen1.5-110B-Chat | 32768 |
Qwen | Qwen 2 Instruct (72B) | Qwen/Qwen2-72B-Instruct | 32768 |
Snorkel AI | Snorkel Mistral PairRM DPO (7B) | snorkelai/Snorkel-Mistral-PairRM-DPO | 32768 |
Snowflake | Snowflake Arctic Instruct | Snowflake/snowflake-arctic-instruct | 4096 |
Stanford | Alpaca (7B) | togethercomputer/alpaca-7b | 2048 |
Teknium | OpenHermes-2-Mistral (7B) | teknium/OpenHermes-2-Mistral-7B | 8192 |
Teknium | OpenHermes-2.5-Mistral (7B) | teknium/OpenHermes-2p5-Mistral-7B | 8192 |
Together | LLaMA-2-7B-32K-Instruct (7B) | togethercomputer/Llama-2-7B-32K-Instruct | 32768 |
Together | RedPajama-INCITE Chat (3B) | togethercomputer/RedPajama-INCITE-Chat-3B-v1 | 2048 |
Together | RedPajama-INCITE Chat (7B) | togethercomputer/RedPajama-INCITE-7B-Chat | 2048 |
Together | StripedHyena Nous (7B) | togethercomputer/StripedHyena-Nous-7B | 32768 |
Undi95 | ReMM SLERP L2 (13B) | Undi95/ReMM-SLERP-L2-13B | 4096 |
Undi95 | Toppy M (7B) | Undi95/Toppy-M-7B | 4096 |
WizardLM | WizardLM v1.2 (13B) | WizardLM/WizardLM-13B-V1.2 | 4096 |
upstage | Upstage SOLAR Instruct v1 (11B) | upstage/SOLAR-10.7B-Instruct-v1.0 | 4096 |
Language Models
Use our Completions endpoint for Language Models.
Organization | Model Name | Model String for API | Context length |
---|---|---|---|
01.AI | 01-ai Yi Base (34B) | zero-one-ai/Yi-34B | 4096 |
01.AI | 01-ai Yi Base (6B) | zero-one-ai/Yi-6B | 4096 |
Mistral AI | Mixtral 8X22B | mistralai/Mixtral-8x22B | 65536 |
Gemma (2B) | google/gemma-2b | 8192 | |
Gemma (7B) | google/gemma-7b | 8192 | |
Meta | LLaMA-2 (70B) | meta-llama/Llama-2-70b-hf | 4096 |
Meta | LLaMA-2 (13B) | meta-llama/Llama-2-13b-hf | 4096 |
Meta | LLaMA-2 (7B) | meta-llama/Llama-2-7b-hf | 4096 |
Meta | LLaMA-3 (8B) | meta-llama/Llama-3-8b-hf | 8192 |
Meta | LLaMA-3 (70B) | meta-llama/Meta-Llama-3-70B | 8192 |
Microsoft | Microsoft Phi-2 | microsoft/phi-2 | 2048 |
Nexusflow | NexusRaven (13B) | Nexusflow/NexusRaven-V2-13B | 16384 |
Qwen | Qwen 1.5 (0.5B) | Qwen/Qwen1.5-0.5B | 32768 |
Qwen | Qwen 1.5 (1.8B) | Qwen/Qwen1.5-1.8B | 32768 |
Qwen | Qwen 1.5 (4B) | Qwen/Qwen1.5-4B | 32768 |
Qwen | Qwen 1.5 (7B) | Qwen/Qwen1.5-7B | 32768 |
Qwen | Qwen 1.5 (14B) | Qwen/Qwen1.5-14B | 32768 |
Qwen | Qwen 1.5 (32B) | Qwen/Qwen1.5-32B | 32768 |
Qwen | Qwen 1.5 (72B) | Qwen/Qwen1.5-72B | 4096 |
Together | GPT-JT-Moderation (6B) | togethercomputer/GPT-JT-Moderation-6B | 2048 |
Together | LLaMA-2-32K (7B) | togethercomputer/LLaMA-2-7B-32K | 32768 |
Together | RedPajama-INCITE (3B) | togethercomputer/RedPajama-INCITE-Base-3B-v1 | 2048 |
Together | RedPajama-INCITE (7B) | togethercomputer/RedPajama-INCITE-7B-Base | 2048 |
Together | RedPajama-INCITE Instruct (3B) | togethercomputer/RedPajama-INCITE-Instruct-3B-v1 | 2048 |
Together | RedPajama-INCITE Instruct (7B) | togethercomputer/RedPajama-INCITE-7B-Instruct | 2048 |
Together | StripedHyena Hessian (7B) | togethercomputer/StripedHyena-Hessian-7B | 32768 |
mistralai | Mistral (7B) | mistralai/Mistral-7B-v0.1 | 8192 |
mistralai | Mixtral-8x7B (46.7B) | mistralai/Mixtral-8x7B-v0.1 | 32768 |
Code Models
Use our Completions endpoint for Code Models.
Organization | Model Name | Model String for API | Context length |
---|---|---|---|
Meta | Code Llama Python (70B) | codellama/CodeLlama-70b-Python-hf | 4096 |
Meta | Code Llama Python (34B) | codellama/CodeLlama-34b-Python-hf | 16384 |
Meta | Code Llama Python (13B) | codellama/CodeLlama-13b-Python-hf | 16384 |
Meta | Code Llama Python (7B) | codellama/CodeLlama-7b-Python-hf | 16384 |
Phind | Phind Code LLaMA v2 (34B) | Phind/Phind-CodeLlama-34B-v2 | 16384 |
WizardLM | WizardCoder Python v1.0 (34B) | WizardLM/WizardCoder-Python-34B-V1.0 | 8192 |
Image Models
Use our Completions endpoint for Image Models.
Organization | Model Name | Model String for API |
---|---|---|
Prompt Hero | Openjourney v4 | prompthero/openjourney |
Runway ML | Stable Diffusion 1.5 | runwayml/stable-diffusion-v1-5 |
SG161222 | Realistic Vision 3.0 | SG161222/Realistic_Vision_V3.0_VAE |
Stability AI | Stable Diffusion 2.1 | stabilityai/stable-diffusion-2-1 |
Stability AI | Stable Diffusion XL 1.0 | stabilityai/stable-diffusion-xl-base-1.0 |
Wavymulder | Analog Diffusion | wavymulder/Analog-Diffusion |
Moderation Models
Use our Completions endpoint to run a moderation model as a standalone classifier, or use it alongside any of the other models above as a filter to safeguard responses from 100+ models, by specifying the parameter "safety_model": "MODEL_API_STRING"
Organization | Model Name | Model String for API | Context length |
---|---|---|---|
Meta | Llama Guard (7B) | Meta-Llama/Llama-Guard-7b | 4096 |
Meta | LLama Guard 3 (8B) | meta-llama/Meta-Llama-Guard-3-8B | 4096 |
Genomic Models
Use our Completions endpoint for Genomic Models.
Organization | Model Name | Model String for API | Context length |
---|---|---|---|
Together | Evo-1 Base (8K) | togethercomputer/evo-1-8k-base | 8192* |
Together | Evo-1 Base (131K) | togethercomputer/evo-1-131k-base | 131072* |
* Evo-1 models can handle up to 4096 input tokens, while output sequences can extend up to the difference between the context length and the input sequence length.
Model Request
Don't see a model you want to use? Go to our contact page and add or upvote the model(s) you'd like to use on our API!
Dedicated Instances
Customizable on-demand deployable model instances, priced by hour hosted. All models in the serverless endpoints are available for hosting as private dedicated instances. Additionally, the below models are also available for hosting as private dedicated instances. Request an instance.
Chat Models
Organization | Model Name | Model String for API | |
---|---|---|---|
Databricks | Dolly v2 (12B) | databricks/dolly-v2-12b | |
Databricks | Dolly v2 (3B) | databricks/dolly-v2-3b | |
Databricks | Dolly v2 (7B) | databricks/dolly-v2-7b | |
DiscoResearch | DiscoLM Mixtral 8x7b (46.7B) | DiscoResearch/DiscoLM-mixtral-8x7b-v2 | |
HuggingFace | Zephyr-7B-ß | HuggingFaceH4/zephyr-7b-beta | |
HuggingFaceH4 | StarCoderChat Alpha (16B) | HuggingFaceH4/starchat-alpha | |
LAION | Open-Assistant StableLM SFT-7 (7B) | OpenAssistant/stablelm-7b-sft-v7-epoch-3 | |
LM Sys | Koala (13B) | togethercomputer/Koala-13B | |
LM Sys | Koala (7B) | togethercomputer/Koala-7B | |
LM Sys | Vicuna v1.3 (13B) | lmsys/vicuna-13b-v1.3 | |
LM Sys | Vicuna v1.3 (7B) | lmsys/vicuna-7b-v1.3 | |
LM Sys | Vicuna-FastChat-T5 (3B) | lmsys/fastchat-t5-3b-v1.0 | |
Mosaic ML | MPT-Chat (30B) | togethercomputer/mpt-30b-chat | |
Mosaic ML | MPT-Chat (7B) | togethercomputer/mpt-7b-chat | |
NousResearch | Nous Hermes LLaMA-2 (70B) | NousResearch/Nous-Hermes-Llama2-70b | |
Qwen | Qwen Chat (7B) | Qwen/Qwen-7B-Chat | |
Qwen | Qwen Chat (14B) | Qwen/Qwen-14B-Chat | |
TII | Falcon Instruct (7B) | tiiuae/falcon-7b-instruct | |
TII | Falcon Instruct (40B) | tiiuae/falcon-40b-instruct | |
Tim Dettmers | Guanaco (13B) | togethercomputer/guanaco-13b | |
Tim Dettmers | Guanaco (33B) | togethercomputer/guanaco-33b | |
Tim Dettmers | Guanaco (65B) | togethercomputer/guanaco-65b | |
Tim Dettmers | Guanaco (7B) | togethercomputer/guanaco-7b | |
Together | GPT-NeoXT-Chat-Base (20B) | togethercomputer/GPT-NeoXT-Chat-Base-20B |
Language Models
Organization | Model Name | Model String for API | Context length |
---|---|---|---|
Defog | Sqlcoder (15B) | defog/sqlcoder | 8192 |
EleutherAI | GPT-J (6B) | EleutherAI/gpt-j-6b | 2048 |
EleutherAI | GPT-NeoX (20B) | EleutherAI/gpt-neox-20b | 2048 |
EleutherAI | Llemma (7B) | EleutherAI/llemma_7b | 4096 |
Flan T5 XL (3B) | google/flan-t5-xl | 512 | |
Flan T5 XXL (11B) | google/flan-t5-xxl | 512 | |
Meta | LLaMA (7B) | huggyllama/llama-7b | 2048 |
Meta | LLaMA (13B) | huggyllama/llama-13b | 2048 |
Meta | LLaMA (30B) | huggyllama/llama-30b | 2048 |
Meta | LLaMA (65B) | huggyllama/llama-65b | 2048 |
Meta | LLaMA (7B) | huggyllama/llama-7b | 2048 |
Mosaic ML | MPT (7B) | mosaicml/mpt-7b | 2048 |
Mosaic ML | MPT-Instruct (7B) | mosaicml/mpt-7b-instruct | 2048 |
Nous Research | Nous Hermes (13B) | NousResearch/Nous-Hermes-13b | 2048 |
Numbers Station | NSQL (6B) | NumbersStation/nsql-6B | 2048 |
Qwen | Qwen (7B) | Qwen/Qwen-7B | 2048 |
Qwen | Qwen (14B) | Qwen/Qwen-14B | 2048 |
Stability AI | StableLM-Base-Alpha (3B) | stabilityai/stablelm-base-alpha-3b | 4096 |
Stability AI | StableLM-Base-Alpha (7B) | stabilityai/stablelm-base-alpha-7b | 4096 |
TII | Falcon (7B) | tiiuae/falcon-7b | 2048 |
TII | Falcon (7B) | tiiuae/falcon-40b | 2048 |
Together | GPT-JT (6B) | togethercomputer/GPT-JT-6B-v1 | 2048 |
WizardLM | WizardLM v1.0 (70B) | WizardLM/WizardLM-70B-V1.0 | 4096 |
Code Models
Organization | Model Name | Model String for API | Context length |
---|---|---|---|
BigCode | StarCoder (16B) | bigcode/starcoder | 8192 |
Meta | Code Llama (70B) | codellama/CodeLlama-70b-hf | 16384 |
Meta | Code Llama Python (70B) | codellama/CodeLlama-70b-Python-hf | 4096 |
Meta | Code Llama Instruct (70B) | codellama/CodeLlama-70b-Instruct-hf | 4096 |
Numbers Station | NSQL LLaMA-2 (7B) | NumbersStation/nsql-llama-2-7B | 4096 |
Phind | Phind Code LLaMA Python v1 (34B) | Phind/Phind-CodeLlama-34B-Python-v1 | 16384 |
Replit | Replit-Code-v1 (3B) | replit/replit-code-v1-3b | 2048 |
Salesforce | CodeGen2 (16B) | Salesforce/codegen2-16B | 2048 |
Salesforce | CodeGen2 (7B) | Salesforce/codegen2-7B | 2048 |
Updated 2 months ago