Introduction

Welcome to the Together AI docs! Together AI makes it easy to run or fine-tune leading open source models with only a few lines of code. We offer a variety of generative AI services:

Serverless Models - Use our API or playground to run dozens of models with pay-as-you-go pricing.
Fine-Tuning - Fine-tune models on your own data in 5 minutes, then run the fine-tuned model for inference.
Dedicated Endpoints - Run models on your own private GPUs, with a one-month minimum commitment.
GPU Clusters - If you’re interested in private, state-of-the-art clusters with H100 GPUs, contact us.

Quickstart

See our full quickstart for how to get started with our API in 1 minute.

from together import Together

# Together() reads the API key from the TOGETHER_API_KEY environment variable.
client = Together()

completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[
        {
            "role": "user",
            "content": "What are the top 3 things to do in New York?",
        }
    ],
)

print(completion.choices[0].message.content)
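
The SDK call above can also be expressed as a plain HTTP request, which may help when working without the Python client. The following is a minimal sketch, assuming the OpenAI-compatible chat completions endpoint at `https://api.together.xyz/v1/chat/completions` and an API key in the `TOGETHER_API_KEY` environment variable. The helper names `build_chat_request` and `send_chat_request` are hypothetical and are not part of the SDK.

```python
import json
import os
import urllib.request

# Assumed endpoint: Together's OpenAI-compatible chat completions route.
API_URL = "https://api.together.xyz/v1/chat/completions"


def build_chat_request(prompt, model, api_key):
    """Build (but do not send) a chat completion request. Hypothetical helper."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request):
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_chat_request(
    "What are the top 3 things to do in New York?",
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    os.environ.get("TOGETHER_API_KEY", "YOUR_API_KEY"),
)
```

With a valid key, `send_chat_request(request)` should return a JSON object whose reply text sits under `choices[0].message.content`, mirroring the attribute path used in the SDK example.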

Which model should I use?

Together hosts many popular models via our serverless endpoints. For each of these, you’ll be charged based on the number of tokens you use and the size of the model. Here are all the different types of models that we support:

Don’t see a model you want to use? Send us a request to add or upvote the model you’d love to see us add to our serverless infrastructure.
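
To make the token-based pricing described above concrete, here is a rough sketch of the arithmetic. It assumes a single per-million-token rate applied equally to prompt and completion tokens, which is a simplification. The rate used below is a made-up placeholder, not a real Together AI price, and `estimate_cost_usd` is a hypothetical helper.

```python
def estimate_cost_usd(prompt_tokens, completion_tokens, usd_per_million_tokens):
    """Estimate the cost of one request from its token counts.

    Assumes prompt and completion tokens are billed at the same rate.
    """
    total_tokens = prompt_tokens + completion_tokens
    return total_tokens * usd_per_million_tokens / 1_000_000


# Placeholder rate for illustration only; check the pricing page for real values.
PLACEHOLDER_USD_PER_MILLION = 0.20

cost = estimate_cost_usd(
    prompt_tokens=1_200,
    completion_tokens=300,
    usd_per_million_tokens=PLACEHOLDER_USD_PER_MILLION,
)
print(f"Estimated cost: ${cost:.6f}")
```

Since larger models are charged at higher rates, the same request costs proportionally more when `usd_per_million_tokens` goes up.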

Together Cookbook

See the Together Cookbook – a collection of notebooks showcasing use cases of open-source models with Together AI. Examples include RAG (text + multimodal), Semantic Search, Rerankers, & Structured JSON extraction.

Example apps

We’ve built a number of full-stack open source example apps that you can reference. These production-ready apps have over 500k users & 10k GitHub stars combined – all fully open source and built on Together AI.

LlamaCoder (GitHub) – an open source take on Claude Artifacts that can generate full React apps from a single prompt. Built on Llama 3.1 405B powered by Together inference.
BlinkShot (GitHub) – a realtime AI image generator using Flux Schnell on Together AI. Type in a prompt and images are generated as you type.
TurboSeek (GitHub) – an AI search engine inspired by Perplexity. It uses a search API (Serper) along with an LLM (Mixtral) to answer questions.
Napkins.dev (GitHub) – a wireframe-to-app tool. It uses Llama 3.2 vision to read in screenshots and Llama 3.1 405B to write code for them.
PDFToChat (GitHub) – a site that lets you chat with your PDFs. Uses RAG with Together embeddings, inference with Llama 3, authentication with Clerk, & MongoDB/Pinecone for the vector database.
LlamaTutor (GitHub) – a personal tutor that can explain any topic at any education level by using a search API along with Llama 3.1.
NotesGPT (GitHub) – an AI note taker that converts your voice notes into organized summaries and clear action items using AI. Uses Together inference (Mixtral) with JSON mode.
CareerExplorer (GitHub) – a site that takes in a resume and suggests career paths based on your strengths and interests. Uses Llama 3 and demonstrates how to parse PDFs and chain multiple calls together.
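
Several of the apps above chain multiple model calls, feeding the output of one call into the next. The sketch below illustrates that general pattern only. It is not taken from any of these apps, and the function names and prompts are hypothetical. The model call is passed in as a plain callable so the example runs without network access.

```python
from typing import Callable


def suggest_career_paths(resume_text: str, complete: Callable[[str], str]) -> str:
    """Chain two model calls: extract skills, then suggest career paths."""
    # Step 1: ask the model to pull the key skills out of the resume text.
    skills = complete("List the key skills in this resume:\n" + resume_text)
    # Step 2: feed the first call's output into a second prompt.
    return complete(
        "Suggest three career paths for someone with these skills:\n" + skills
    )


def fake_complete(prompt: str) -> str:
    """Stand-in for a real model call, used here for illustration."""
    return f"[reply to a prompt of {len(prompt)} characters]"


print(suggest_career_paths("Five years of Python and data analysis.", fake_complete))
```

In practice, `complete` could be a small wrapper around `client.chat.completions.create` that sends the prompt as a single user message and returns `choices[0].message.content`.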

Next steps

Check out our Quickstart to get started with our API in 1 minute.
Explore our cookbook for Python recipes with Together AI.
Explore our demos for full-stack open source example apps.
Check out the Together AI playground to try out different models.
See our integrations with leading LLM frameworks.
