Introduction
Introduction to Together AI and all its services.
👋 Welcome to the Together AI docs! Together AI makes it easy to run or fine-tune leading open source models with only a few lines of code. We offer a variety of generative AI services:
- Serverless models - Use the API or playground to evaluate 100+ models that run out of the box with our Inference Engine. You only pay per token/image.
- On-demand dedicated endpoints - Run models on your own private GPU, with a pay-per-second usage model. Start dedicated endpoints here and review our docs.
- Monthly reserved dedicated endpoints - Larger capacity reserved instances starting at a one month minimum, including VPC options for large deployments. Contact us.
- Fine-Tuning - Fine-tune with a few commands and deploy your fine-tuned model for inference.
- GPU Clusters - If you're interested in private, state of the art clusters with A100 or H100 GPUs, contact us.
Quickstart
See our full quickstart for how to get started with our API in 1 minute.
```python
from together import Together

client = Together()

completion = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[{"role": "user", "content": "What are the top 3 things to do in New York?"}],
)

print(completion.choices[0].message.content)
```
```typescript
import Together from 'together-ai';

const together = new Together();

const completion = await together.chat.completions.create({
  model: 'meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo',
  messages: [{ role: 'user', content: 'Top 3 things to do in New York?' }],
});

console.log(completion.choices[0].message.content);
```
```shell
curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    "messages": [
      {"role": "user", "content": "What are some fun things to do in New York?"}
    ]
  }'
```
Cookbook
See the Together Cookbook – a collection of notebooks showcasing use cases of open-source models with Together AI. Examples include RAG (text + multimodal), Semantic Search, Rerankers, & Structured JSON extraction.
Example apps
We've built a number of full-stack open source example apps that you can reference. These production-ready apps have over 200k users & 6.5k GitHub stars combined – all fully open source and built on Together AI.
- LlamaCoder (GitHub) – an open-source Claude Artifacts clone that can generate full React apps from a single prompt. Built on Llama 3.1 405B and powered by Together inference.
- BlinkShot (GitHub) – a realtime AI image generator using Flux Schnell on Together AI. Type in a prompt and images will get generated as you type.
- TurboSeek (GitHub) – an AI search engine inspired by Perplexity. It uses a search API (Serper) along with an LLM (Mixtral) to answer any question.
- Napkins.dev (GitHub) – a wireframe to app tool. It uses Llama 3.2 vision to read in screenshots and write code for them using Llama 3.1 405B.
- PDFToChat (GitHub) – a site that lets you chat with your PDFs. Uses RAG with Together embeddings, inference with Llama 3, authentication with Clerk, & MongoDB/Pinecone for the vector database.
- LlamaTutor (GitHub) – a personal tutor that can explain any topic at any education level by using a search API along with Llama 3.1.
- NotesGPT (GitHub) – an AI note taker that converts your voice notes into organized summaries and clear action items. Uses Together inference (Mixtral) with JSON mode.
- CareerExplorer (GitHub) – a site that takes in a resume and suggests career paths based on your strengths and interests. Uses Llama 3 and demonstrates how to parse PDFs and chain multiple calls together.
Which model should I use?
Together hosts many popular models via our serverless endpoints. You can also use our dedicated GPU infrastructure to configure and host your own model.
When using one of our hosted serverless models, you'll be charged based on the number of tokens you use in your queries. For dedicated models you configure and run yourself, you'll be charged per minute for as long as your endpoint is running. You can start or stop your endpoint at any time using our online playground.
To learn more about the pricing for both our serverless and dedicated endpoints, visit our pricing page.
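To build intuition for the trade-off, here is a back-of-the-envelope cost comparison between the two billing models. All rates below are hypothetical placeholders, not real Together AI prices – always check the pricing page for current rates.

```python
# Sketch: comparing serverless (per-token) vs dedicated (per-minute) billing.
# The rates used here are made-up placeholders, not actual Together AI pricing.

def serverless_cost(tokens: int, price_per_million: float) -> float:
    """Cost of processing `tokens` tokens at a per-1M-token rate."""
    return tokens / 1_000_000 * price_per_million

def dedicated_cost(minutes: float, price_per_minute: float) -> float:
    """Cost of keeping a dedicated endpoint running for `minutes`."""
    return minutes * price_per_minute

# Example: 5M tokens/day at a hypothetical $0.18 per 1M tokens,
# vs. a hypothetical $0.10/min endpoint running 8 hours/day.
print(serverless_cost(5_000_000, 0.18))  # ~0.9 dollars/day
print(dedicated_cost(8 * 60, 0.10))      # ~48 dollars/day
```

The crossover point depends entirely on your traffic: steady, high-volume workloads tend to favor dedicated endpoints, while bursty or low-volume usage favors serverless per-token billing.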
Check out these pages to see our current list of available models:
- Chat models
- Vision models
- Language and code models
- Image models
- Embedding models
- Rerank models
- Fine-tuning models
Don't see a model you want to use? Send us a request to add or upvote the model you'd love to see us add to our serverless infrastructure.
Next steps
- Check out the Together AI playground to try out different models.
- Learn how to stream responses back to your applications.
- Explore our examples to learn about various use cases.
- See our integrations with leading LLM frameworks.