Serverless Endpoints

Pre-configured instances of popular models hosted for free, priced per 1M tokens used. The below models are available through our inference API as serverless endpoints.

Request for a model to be added to serverless endpoints or for dedicated instance or capacity for these models.

Chat Models

Use our Chat Completions endpoint for Chat Models.

Organization	Model Name	Model String for API	Context length
01.AI	01-ai Yi Chat (34B)	zero-one-ai/Yi-34B-Chat	4096
Austism	Chronos Hermes (13B)	Austism/chronos-hermes-13b	2048
cognitivecomputations	Dolphin 2.5 Mixtral 8x7b	cognitivecomputations/dolphin-2.5-mixtral-8x7b	32768
databricks	DBRX Instruct	databricks/dbrx-instruct	32768
DeepSeek	Deepseek Coder Instruct (33B)	deepseek-ai/deepseek-coder-33b-instruct	16384
DeepSeek	DeepSeek LLM Chat (67B)	deepseek-ai/deepseek-llm-67b-chat	4096
garage-bAInd	Platypus2 Instruct (70B)	garage-bAInd/Platypus2-70B-instruct	4096
Google	Gemma Instruct (2B)	google/gemma-2b-it	8192
Google	Gemma Instruct (7B)	google/gemma-7b-it	8192
Gryphe	MythoMax-L2 (13B)	Gryphe/MythoMax-L2-13b	4096
Gryphe	MythoMax-L2 Lite (13B)	Gryphe/MythoMax-L2-13b-Lite	4096
LM Sys	Vicuna v1.5 (13B)	lmsys/vicuna-13b-v1.5	4096
LM Sys	Vicuna v1.5 (7B)	lmsys/vicuna-7b-v1.5	4096
Meta	Code Llama Instruct (13B)	codellama/CodeLlama-13b-Instruct-hf	16384
Meta	Code Llama Instruct (34B)	codellama/CodeLlama-34b-Instruct-hf	16384
Meta	Code Llama Instruct (70B)	codellama/CodeLlama-70b-Instruct-hf	4096
Meta	Code Llama Instruct (7B)	codellama/CodeLlama-7b-Instruct-hf	16384
Meta	LLaMA-2 Chat (70B)	meta-llama/Llama-2-70b-chat-hf	4096
Meta	LLaMA-2 Chat (13B)	meta-llama/Llama-2-13b-chat-hf	4096
Meta	LLaMA-2 Chat (7B)	meta-llama/Llama-2-7b-chat-hf	4096
Meta	LLaMA-3 Chat (8B)	meta-llama/Llama-3-8b-chat-hf	8192
Meta	LLaMA-3 Chat (70B)	meta-llama/Llama-3-70b-chat-hf	8192
Meta	LLaMA-3 Chat (8B) Turbo	meta-llama/Meta-Llama-3-8B-Instruct-Turbo	8192
Meta	LLaMA-3 Chat (70B) Turbo	meta-llama/Meta-Llama-3-70B-Instruct-Turbo	8192
Meta	Llama 3.1 8B Instruct Turbo	meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo	8192
Meta	Llama 3.1 70B Instruct Turbo	meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo	8192
Meta	Llama 3.1 405B Instruct Turbo	meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo	4096
mistralai	Mistral (7B) Instruct	mistralai/Mistral-7B-Instruct-v0.1	8192
mistralai	Mistral (7B) Instruct v0.2	mistralai/Mistral-7B-Instruct-v0.2	32768
mistralai	Mistral (7B) Instruct v0.3	mistralai/Mistral-7B-Instruct-v0.3	32768
mistralai	Mixtral-8x7B Instruct (46.7B)	mistralai/Mixtral-8x7B-Instruct-v0.1	32768
mistralai	Mixtral-8x22B Instruct (141B)	mistralai/Mixtral-8x22B-Instruct-v0.1	65536
NousResearch	Nous Capybara v1.9 (7B)	NousResearch/Nous-Capybara-7B-V1p9	8192
NousResearch	Nous Hermes 2 - Mistral DPO (7B)	NousResearch/Nous-Hermes-2-Mistral-7B-DPO	32768
NousResearch	Nous Hermes 2 - Mixtral 8x7B-DPO (46.7B)	NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO	32768
NousResearch	Nous Hermes 2 - Mixtral 8x7B-SFT (46.7B)	NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT	32768
NousResearch	Nous Hermes LLaMA-2 (7B)	NousResearch/Nous-Hermes-llama-2-7b	4096
NousResearch	Nous Hermes Llama-2 (13B)	NousResearch/Nous-Hermes-Llama2-13b	4096
NousResearch	Nous Hermes-2 Yi (34B)	NousResearch/Nous-Hermes-2-Yi-34B	4096
OpenChat	OpenChat 3.5 (7B)	openchat/openchat-3.5-1210	8192
OpenOrca	OpenOrca Mistral (7B) 8K	Open-Orca/Mistral-7B-OpenOrca	8192
Qwen	Qwen 1.5 Chat (0.5B)	Qwen/Qwen1.5-0.5B-Chat	32768
Qwen	Qwen 1.5 Chat (1.8B)	Qwen/Qwen1.5-1.8B-Chat	32768
Qwen	Qwen 1.5 Chat (4B)	Qwen/Qwen1.5-4B-Chat	32768
Qwen	Qwen 1.5 Chat (7B)	Qwen/Qwen1.5-7B-Chat	32768
Qwen	Qwen 1.5 Chat (14B)	Qwen/Qwen1.5-14B-Chat	32768
Qwen	Qwen 1.5 Chat (32B)	Qwen/Qwen1.5-32B-Chat	32768
Qwen	Qwen 1.5 Chat (72B)	Qwen/Qwen1.5-72B-Chat	32768
Qwen	Qwen 1.5 Chat (110B)	Qwen/Qwen1.5-110B-Chat	32768
Qwen	Qwen 2 Instruct (72B)	Qwen/Qwen2-72B-Instruct	32768
Snorkel AI	Snorkel Mistral PairRM DPO (7B)	snorkelai/Snorkel-Mistral-PairRM-DPO	32768
Snowflake	Snowflake Arctic Instruct	Snowflake/snowflake-arctic-instruct	4096
Stanford	Alpaca (7B)	togethercomputer/alpaca-7b	2048
Teknium	OpenHermes-2-Mistral (7B)	teknium/OpenHermes-2-Mistral-7B	8192
Teknium	OpenHermes-2.5-Mistral (7B)	teknium/OpenHermes-2p5-Mistral-7B	8192
Together	LLaMA-2-7B-32K-Instruct (7B)	togethercomputer/Llama-2-7B-32K-Instruct	32768
Together	RedPajama-INCITE Chat (3B)	togethercomputer/RedPajama-INCITE-Chat-3B-v1	2048
Together	RedPajama-INCITE Chat (7B)	togethercomputer/RedPajama-INCITE-7B-Chat	2048
Together	StripedHyena Nous (7B)	togethercomputer/StripedHyena-Nous-7B	32768
Undi95	ReMM SLERP L2 (13B)	Undi95/ReMM-SLERP-L2-13B	4096
Undi95	Toppy M (7B)	Undi95/Toppy-M-7B	4096
WizardLM	WizardLM v1.2 (13B)	WizardLM/WizardLM-13B-V1.2	4096
upstage	Upstage SOLAR Instruct v1 (11B)	upstage/SOLAR-10.7B-Instruct-v1.0	4096

Language Models

Use our Completions endpoint for Language Models.

Organization	Model Name	Model String for API	Context length
01.AI	01-ai Yi Base (34B)	zero-one-ai/Yi-34B	4096
01.AI	01-ai Yi Base (6B)	zero-one-ai/Yi-6B	4096
Mistral AI	Mixtral 8X22B	mistralai/Mixtral-8x22B	65536
Google	Gemma (2B)	google/gemma-2b	8192
Google	Gemma (7B)	google/gemma-7b	8192
Meta	LLaMA-2 (70B)	meta-llama/Llama-2-70b-hf	4096
Meta	LLaMA-2 (13B)	meta-llama/Llama-2-13b-hf	4096
Meta	LLaMA-2 (7B)	meta-llama/Llama-2-7b-hf	4096
Meta	LLaMA-3 (8B)	meta-llama/Llama-3-8b-hf	8192
Meta	LLaMA-3 (70B)	meta-llama/Meta-Llama-3-70B	8192
Microsoft	Microsoft Phi-2	microsoft/phi-2	2048
Nexusflow	NexusRaven (13B)	Nexusflow/NexusRaven-V2-13B	16384
Qwen	Qwen 1.5 (0.5B)	Qwen/Qwen1.5-0.5B	32768
Qwen	Qwen 1.5 (1.8B)	Qwen/Qwen1.5-1.8B	32768
Qwen	Qwen 1.5 (4B)	Qwen/Qwen1.5-4B	32768
Qwen	Qwen 1.5 (7B)	Qwen/Qwen1.5-7B	32768
Qwen	Qwen 1.5 (14B)	Qwen/Qwen1.5-14B	32768
Qwen	Qwen 1.5 (32B)	Qwen/Qwen1.5-32B	32768
Qwen	Qwen 1.5 (72B)	Qwen/Qwen1.5-72B	4096
Together	GPT-JT-Moderation (6B)	togethercomputer/GPT-JT-Moderation-6B	2048
Together	LLaMA-2-32K (7B)	togethercomputer/LLaMA-2-7B-32K	32768
Together	RedPajama-INCITE (3B)	togethercomputer/RedPajama-INCITE-Base-3B-v1	2048
Together	RedPajama-INCITE (7B)	togethercomputer/RedPajama-INCITE-7B-Base	2048
Together	RedPajama-INCITE Instruct (3B)	togethercomputer/RedPajama-INCITE-Instruct-3B-v1	2048
Together	RedPajama-INCITE Instruct (7B)	togethercomputer/RedPajama-INCITE-7B-Instruct	2048
Together	StripedHyena Hessian (7B)	togethercomputer/StripedHyena-Hessian-7B	32768
mistralai	Mistral (7B)	mistralai/Mistral-7B-v0.1	8192
mistralai	Mixtral-8x7B (46.7B)	mistralai/Mixtral-8x7B-v0.1	32768

Code Models

Use our Completions endpoint for Code Models.

Organization	Model Name	Model String for API	Context length
Meta	Code Llama Python (70B)	codellama/CodeLlama-70b-Python-hf	4096
Meta	Code Llama Python (34B)	codellama/CodeLlama-34b-Python-hf	16384
Meta	Code Llama Python (13B)	codellama/CodeLlama-13b-Python-hf	16384
Meta	Code Llama Python (7B)	codellama/CodeLlama-7b-Python-hf	16384
Phind	Phind Code LLaMA v2 (34B)	Phind/Phind-CodeLlama-34B-v2	16384
WizardLM	WizardCoder Python v1.0 (34B)	WizardLM/WizardCoder-Python-34B-V1.0	8192

Image Models

Use our Completions endpoint for Image Models.

Organization	Model Name	Model String for API
Prompt Hero	Openjourney v4	prompthero/openjourney
Runway ML	Stable Diffusion 1.5	runwayml/stable-diffusion-v1-5
SG161222	Realistic Vision 3.0	SG161222/Realistic_Vision_V3.0_VAE
Stability AI	Stable Diffusion 2.1	stabilityai/stable-diffusion-2-1
Stability AI	Stable Diffusion XL 1.0	stabilityai/stable-diffusion-xl-base-1.0
Wavymulder	Analog Diffusion	wavymulder/Analog-Diffusion

Moderation Models

Use our Completions endpoint to run a moderation model as a standalone classifier, or use it alongside any of the other models above as a filter to safeguard responses from 100+ models, by specifying the parameter "safety_model": "MODEL_API_STRING"

Organization	Model Name	Model String for API	Context length
Meta	Llama Guard (7B)	Meta-Llama/Llama-Guard-7b	4096
Meta	LLama Guard 3 (8B)	meta-llama/Meta-Llama-Guard-3-8B	4096

Genomic Models

Use our Completions endpoint for Genomic Models.

Organization	Model Name	Model String for API	Context length
Together	Evo-1 Base (8K)	togethercomputer/evo-1-8k-base	8192*
Together	Evo-1 Base (131K)	togethercomputer/evo-1-131k-base	131072*

* Evo-1 models can handle up to 4096 input tokens, while output sequences can extend up to the difference between the context length and the input sequence length.

Model Request

Don't see a model you want to use? Go to our contact page and add or upvote the model(s) you'd like to use on our API!

Dedicated Instances

Customizable on-demand deployable model instances, priced by hour hosted. All models in the serverless endpoints are available for hosting as private dedicated instances. Additionally, the below models are also available for hosting as private dedicated instances. Request an instance.

Chat Models

Organization	Model Name	Model String for API
Databricks	Dolly v2 (12B)	databricks/dolly-v2-12b
Databricks	Dolly v2 (3B)	databricks/dolly-v2-3b
Databricks	Dolly v2 (7B)	databricks/dolly-v2-7b
DiscoResearch	DiscoLM Mixtral 8x7b (46.7B)	DiscoResearch/DiscoLM-mixtral-8x7b-v2
HuggingFace	Zephyr-7B-ß	HuggingFaceH4/zephyr-7b-beta
HuggingFaceH4	StarCoderChat Alpha (16B)	HuggingFaceH4/starchat-alpha
LAION	Open-Assistant StableLM SFT-7 (7B)	OpenAssistant/stablelm-7b-sft-v7-epoch-3
LM Sys	Koala (13B)	togethercomputer/Koala-13B
LM Sys	Koala (7B)	togethercomputer/Koala-7B
LM Sys	Vicuna v1.3 (13B)	lmsys/vicuna-13b-v1.3
LM Sys	Vicuna v1.3 (7B)	lmsys/vicuna-7b-v1.3
LM Sys	Vicuna-FastChat-T5 (3B)	lmsys/fastchat-t5-3b-v1.0
Mosaic ML	MPT-Chat (30B)	togethercomputer/mpt-30b-chat
Mosaic ML	MPT-Chat (7B)	togethercomputer/mpt-7b-chat
NousResearch	Nous Hermes LLaMA-2 (70B)	NousResearch/Nous-Hermes-Llama2-70b
Qwen	Qwen Chat (7B)	Qwen/Qwen-7B-Chat
Qwen	Qwen Chat (14B)	Qwen/Qwen-14B-Chat
TII	Falcon Instruct (7B)	tiiuae/falcon-7b-instruct
TII	Falcon Instruct (40B)	tiiuae/falcon-40b-instruct
Tim Dettmers	Guanaco (13B)	togethercomputer/guanaco-13b
Tim Dettmers	Guanaco (33B)	togethercomputer/guanaco-33b
Tim Dettmers	Guanaco (65B)	togethercomputer/guanaco-65b
Tim Dettmers	Guanaco (7B)	togethercomputer/guanaco-7b
Together	GPT-NeoXT-Chat-Base (20B)	togethercomputer/GPT-NeoXT-Chat-Base-20B

Language Models

Organization	Model Name	Model String for API	Context length
Defog	Sqlcoder (15B)	defog/sqlcoder	8192
EleutherAI	GPT-J (6B)	EleutherAI/gpt-j-6b	2048
EleutherAI	GPT-NeoX (20B)	EleutherAI/gpt-neox-20b	2048
EleutherAI	Llemma (7B)	EleutherAI/llemma_7b	4096
Google	Flan T5 XL (3B)	google/flan-t5-xl	512
Google	Flan T5 XXL (11B)	google/flan-t5-xxl	512
Meta	LLaMA (7B)	huggyllama/llama-7b	2048
Meta	LLaMA (13B)	huggyllama/llama-13b	2048
Meta	LLaMA (30B)	huggyllama/llama-30b	2048
Meta	LLaMA (65B)	huggyllama/llama-65b	2048
Meta	LLaMA (7B)	huggyllama/llama-7b	2048
Mosaic ML	MPT (7B)	mosaicml/mpt-7b	2048
Mosaic ML	MPT-Instruct (7B)	mosaicml/mpt-7b-instruct	2048
Nous Research	Nous Hermes (13B)	NousResearch/Nous-Hermes-13b	2048
Numbers Station	NSQL (6B)	NumbersStation/nsql-6B	2048
Qwen	Qwen (7B)	Qwen/Qwen-7B	2048
Qwen	Qwen (14B)	Qwen/Qwen-14B	2048
Stability AI	StableLM-Base-Alpha (3B)	stabilityai/stablelm-base-alpha-3b	4096
Stability AI	StableLM-Base-Alpha (7B)	stabilityai/stablelm-base-alpha-7b	4096
TII	Falcon (7B)	tiiuae/falcon-7b	2048
TII	Falcon (7B)	tiiuae/falcon-40b	2048
Together	GPT-JT (6B)	togethercomputer/GPT-JT-6B-v1	2048
WizardLM	WizardLM v1.0 (70B)	WizardLM/WizardLM-70B-V1.0	4096

Code Models

Organization	Model Name	Model String for API	Context length
BigCode	StarCoder (16B)	bigcode/starcoder	8192
Meta	Code Llama (70B)	codellama/CodeLlama-70b-hf	16384
Meta	Code Llama Python (70B)	codellama/CodeLlama-70b-Python-hf	4096
Meta	Code Llama Instruct (70B)	codellama/CodeLlama-70b-Instruct-hf	4096
Numbers Station	NSQL LLaMA-2 (7B)	NumbersStation/nsql-llama-2-7B	4096
Phind	Phind Code LLaMA Python v1 (34B)	Phind/Phind-CodeLlama-34B-Python-v1	16384
Replit	Replit-Code-v1 (3B)	replit/replit-code-v1-3b	2048
Salesforce	CodeGen2 (16B)	Salesforce/codegen2-16B	2048
Salesforce	CodeGen2 (7B)	Salesforce/codegen2-7B	2048

Request a model