Code/Language Models

See which open-source language and code models we currently host, or learn how to configure and host your own.

Our Completions API has built-in support for many popular models we host via our serverless endpoints, as well as any model that you configure and host yourself using our dedicated GPU infrastructure.
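A minimal request might look like the following sketch. It assumes the v1 REST Completions endpoint, bearer-token authentication with an API key stored in the TOGETHER_API_KEY environment variable, and Python's requests library; substitute your own base URL, key, and model string as needed.

```python
# Minimal sketch of a completions request against a hosted serverless model.
# Assumptions: the v1 completions REST endpoint, an API key in TOGETHER_API_KEY,
# and an OpenAI-style response body with a "choices" list.
import os
import requests

resp = requests.post(
    "https://api.together.xyz/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "mistralai/Mixtral-8x7B-v0.1",  # any model string from the tables below
        "prompt": "The capital of France is",
        "max_tokens": 32,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```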

When using one of our hosted serverless models, you'll be charged based on the number of tokens you use in your queries. For dedicated models you configure and run yourself, you'll be charged per minute for as long as your endpoint is running. You can start or stop your endpoint at any time using our online playground.

To learn more about the pricing for both our serverless and dedicated endpoints, check out our pricing page.

Hosted models

Language Models

Use our Completions endpoint for Language Models.

| Organization | Model Name | Model String for API | Context length |
|---|---|---|---|
| 01.AI | 01-ai Yi Base (34B) | zero-one-ai/Yi-34B | 4096 |
| 01.AI | 01-ai Yi Base (6B) | zero-one-ai/Yi-6B | 4096 |
| Mistral AI | Mixtral 8X22B | mistralai/Mixtral-8x22B | 65536 |
| Google | Gemma (2B) | google/gemma-2b | 8192 |
| Google | Gemma (7B) | google/gemma-7b | 8192 |
| Meta | LLaMA-2 (70B) | meta-llama/Llama-2-70b-hf | 4096 |
| Meta | LLaMA-2 (13B) | meta-llama/Llama-2-13b-hf | 4096 |
| Meta | LLaMA-2 (7B) | meta-llama/Llama-2-7b-hf | 4096 |
| Meta | LLaMA-3 (8B) | meta-llama/Llama-3-8b-hf | 8192 |
| Meta | LLaMA-3 (70B) | meta-llama/Meta-Llama-3-70B | 8192 |
| Microsoft | Microsoft Phi-2 | microsoft/phi-2 | 2048 |
| Nexusflow | NexusRaven (13B) | Nexusflow/NexusRaven-V2-13B | 16384 |
| Qwen | Qwen 1.5 (0.5B) | Qwen/Qwen1.5-0.5B | 32768 |
| Qwen | Qwen 1.5 (1.8B) | Qwen/Qwen1.5-1.8B | 32768 |
| Qwen | Qwen 1.5 (4B) | Qwen/Qwen1.5-4B | 32768 |
| Qwen | Qwen 1.5 (7B) | Qwen/Qwen1.5-7B | 32768 |
| Qwen | Qwen 1.5 (14B) | Qwen/Qwen1.5-14B | 32768 |
| Qwen | Qwen 1.5 (32B) | Qwen/Qwen1.5-32B | 32768 |
| Qwen | Qwen 1.5 (72B) | Qwen/Qwen1.5-72B | 4096 |
| Together | GPT-JT-Moderation (6B) | togethercomputer/GPT-JT-Moderation-6B | 2048 |
| Together | LLaMA-2-32K (7B) | togethercomputer/LLaMA-2-7B-32K | 32768 |
| Together | RedPajama-INCITE (3B) | togethercomputer/RedPajama-INCITE-Base-3B-v1 | 2048 |
| Together | RedPajama-INCITE (7B) | togethercomputer/RedPajama-INCITE-7B-Base | 2048 |
| Together | RedPajama-INCITE Instruct (3B) | togethercomputer/RedPajama-INCITE-Instruct-3B-v1 | 2048 |
| Together | RedPajama-INCITE Instruct (7B) | togethercomputer/RedPajama-INCITE-7B-Instruct | 2048 |
| Together | StripedHyena Hessian (7B) | togethercomputer/StripedHyena-Hessian-7B | 32768 |
| Mistral AI | Mistral (7B) | mistralai/Mistral-7B-v0.1 | 8192 |
| Mistral AI | Mixtral-8x7B (46.7B) | mistralai/Mixtral-8x7B-v0.1 | 32768 |

Code Models

Use our Completions endpoint for Code Models.

| Organization | Model Name | Model String for API | Context length |
|---|---|---|---|
| Meta | Code Llama Python (70B) | codellama/CodeLlama-70b-Python-hf | 4096 |
| Meta | Code Llama Python (34B) | codellama/CodeLlama-34b-Python-hf | 16384 |
| Meta | Code Llama Python (13B) | codellama/CodeLlama-13b-Python-hf | 16384 |
| Meta | Code Llama Python (7B) | codellama/CodeLlama-7b-Python-hf | 16384 |
| Phind | Phind Code LLaMA v2 (34B) | Phind/Phind-CodeLlama-34B-v2 | 16384 |
| WizardLM | WizardCoder Python v1.0 (34B) | WizardLM/WizardCoder-Python-34B-V1.0 | 8192 |
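Any of these code model strings can be passed to the same Completions endpoint. The sketch below reuses the endpoint and authentication assumptions from the earlier example; the prompt and stop sequences are illustrative.

```python
# Sketch of a code-completion request using one of the code model strings above.
# Same assumptions as the earlier example: v1 completions endpoint, API key in
# TOGETHER_API_KEY, OpenAI-style response shape.
import os
import requests

resp = requests.post(
    "https://api.together.xyz/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "codellama/CodeLlama-34b-Python-hf",
        "prompt": "# Return the n-th Fibonacci number\ndef fib(n):",
        "max_tokens": 128,
        "temperature": 0.2,
        "stop": ["\ndef ", "\nclass "],  # cut generation at the next top-level definition
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```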

Moderation Models

Use our Completions endpoint to run a moderation model as a standalone classifier, or use it alongside any of the 100+ models above as a filter that safeguards their responses by specifying the parameter "safety_model": "MODEL_API_STRING" (see the example below the table).

| Organization | Model Name | Model String for API | Context length |
|---|---|---|---|
| Meta | Llama Guard (7B) | Meta-Llama/Llama-Guard-7b | 4096 |
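Here is a sketch of attaching Llama Guard as a safety filter on another model's completions via the safety_model parameter, under the same endpoint and authentication assumptions as the earlier examples.

```python
# Sketch: generate with one model while filtering its output with a moderation
# model passed as "safety_model". Assumptions as in the earlier examples.
import os
import requests

resp = requests.post(
    "https://api.together.xyz/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "meta-llama/Llama-2-13b-hf",
        "prompt": "Write a short product description for a kitchen knife.",
        "max_tokens": 64,
        "safety_model": "Meta-Llama/Llama-Guard-7b",  # moderation model string from the table above
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```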

Genomic Models

Use our Completions endpoint for Genomic Models.

| Organization | Model Name | Model String for API | Context length |
|---|---|---|---|
| Together | Evo-1 Base (8K) | togethercomputer/evo-1-8k-base | 8192* |
| Together | Evo-1 Base (131K) | togethercomputer/evo-1-131k-base | 131072* |

* Evo-1 models accept up to 4096 input tokens; the output sequence can extend up to the context length minus the input sequence length.
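The sketch below illustrates that budget for the 131K Evo-1 model: the input is kept under the 4096-token limit and max_tokens is capped at the context length minus the input length. The character-based token count is a rough stand-in for the model's actual tokenizer, and the endpoint and authentication assumptions match the earlier examples.

```python
# Sketch of an Evo-1 request that respects the input/output budget noted above.
# Assumptions as in the earlier examples; per-character token counting is only
# an approximation of the real tokenizer.
import os
import requests

CONTEXT_LENGTH = 131072                 # togethercomputer/evo-1-131k-base
dna_prompt = "ATGGCGT" * 100            # example nucleotide sequence
input_tokens = len(dna_prompt)          # rough stand-in for the tokenized length
assert input_tokens <= 4096, "Evo-1 accepts at most 4096 input tokens"

resp = requests.post(
    "https://api.together.xyz/v1/completions",
    headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
    json={
        "model": "togethercomputer/evo-1-131k-base",
        "prompt": dna_prompt,
        "max_tokens": min(1024, CONTEXT_LENGTH - input_tokens),
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```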

Dedicated Instances

Language Models

| Organization | Model Name | Model String for API | Context length |
|---|---|---|---|
| Defog | Sqlcoder (15B) | defog/sqlcoder | 8192 |
| EleutherAI | GPT-J (6B) | EleutherAI/gpt-j-6b | 2048 |
| EleutherAI | GPT-NeoX (20B) | EleutherAI/gpt-neox-20b | 2048 |
| EleutherAI | Llemma (7B) | EleutherAI/llemma_7b | 4096 |
| EleutherAI | Pythia (12B) | EleutherAI/pythia-12b-v0 | 2048 |
| EleutherAI | Pythia (1B) | EleutherAI/pythia-1b-v0 | 2048 |
| EleutherAI | Pythia (2.8B) | EleutherAI/pythia-2.8b-v0 | 2048 |
| EleutherAI | Pythia (6.9B) | EleutherAI/pythia-6.9b | 2048 |
| Google | Flan T5 XL (3B) | google/flan-t5-xl | 512 |
| Google | Flan T5 XXL (11B) | google/flan-t5-xxl | 512 |
| Meta | LLaMA (7B) | huggyllama/llama-7b | 2048 |
| Meta | LLaMA (13B) | huggyllama/llama-13b | 2048 |
| Meta | LLaMA (30B) | huggyllama/llama-30b | 2048 |
| Meta | LLaMA (65B) | huggyllama/llama-65b | 2048 |
| Mosaic ML | MPT (7B) | mosaicml/mpt-7b | 2048 |
| Mosaic ML | MPT-Instruct (7B) | mosaicml/mpt-7b-instruct | 2048 |
| Nous Research | Nous Hermes (13B) | NousResearch/Nous-Hermes-13b | 2048 |
| Numbers Station | NSQL (6B) | NumbersStation/nsql-6B | 2048 |
| Qwen | Qwen (7B) | Qwen/Qwen-7B | 2048 |
| Qwen | Qwen (14B) | Qwen/Qwen-14B | 2048 |
| Stability AI | StableLM-Base-Alpha (3B) | stabilityai/stablelm-base-alpha-3b | 4096 |
| Stability AI | StableLM-Base-Alpha (7B) | stabilityai/stablelm-base-alpha-7b | 4096 |
| TII | Falcon (7B) | tiiuae/falcon-7b | 2048 |
| TII | Falcon (40B) | tiiuae/falcon-40b | 2048 |
| Together | GPT-JT (6B) | togethercomputer/GPT-JT-6B-v1 | 2048 |
| WizardLM | WizardLM v1.0 (70B) | WizardLM/WizardLM-70B-V1.0 | 4096 |

Code Models

| Organization | Model Name | Model String for API | Context length |
|---|---|---|---|
| BigCode | StarCoder (16B) | bigcode/starcoder | 8192 |
| Meta | Code Llama (70B) | codellama/CodeLlama-70b-hf | 16384 |
| Meta | Code Llama Python (70B) | codellama/CodeLlama-70b-Python-hf | 4096 |
| Meta | Code Llama Instruct (70B) | codellama/CodeLlama-70b-Instruct-hf | 4096 |
| Numbers Station | NSQL LLaMA-2 (7B) | NumbersStation/nsql-llama-2-7B | 4096 |
| Phind | Phind Code LLaMA Python v1 (34B) | Phind/Phind-CodeLlama-34B-Python-v1 | 16384 |
| Replit | Replit-Code-v1 (3B) | replit/replit-code-v1-3b | 2048 |
| Salesforce | CodeGen2 (16B) | Salesforce/codegen2-16B | 2048 |
| Salesforce | CodeGen2 (7B) | Salesforce/codegen2-7B | 2048 |

Request a model

Don't see a model you want to use?

Send us a Model Request here →