End-to-end example
The script below builds a tiny RAG pipeline with no external dependencies beyond the Together SDK. It embeds a small corpus, stores the vectors in memory, retrieves the top matches by cosine similarity, and passes them into a chat completion as context.Python
intfloat/multilingual-e5-large-instruct), persist vectors in a database, and add a reranking stage to improve precision before generation.
Add a rerank stage
A reranker is a second-stage model that re-scores the top results from your vector search using the query and document together. Rerank improves precision when the top of your similarity ranking is noisy or when you only have room for a few documents in the prompt. See the Rerank guide for details.Rerank models like
mixedbread-ai/mxbai-rerank-large-v2 are only available on dedicated endpoints. Spin one up before running the snippet below, then point RERANK_MODEL at it.Python
Vector store integrations
The in-memory store above is fine for a few hundred documents. For larger corpora, persist your vectors in a dedicated vector database. Together embeddings work with any store that accepts raw float vectors.Pinecone
Pinecone is a managed vector database with a serverless tier. Embed with Together, then upsert and query through the Pinecone client.Python
MongoDB Atlas Vector Search
MongoDB Atlas adds vector search on top of a regular Mongo collection. Store the embedding alongside the document and define a vector index on the embedding field.Python
$vectorSearch in an aggregation pipeline. The full walkthrough is in the MongoDB + Together AI tutorial.
Pixeltable
Pixeltable is a declarative table for unstructured data. It can call Together embeddings as a column expression, so chunking, embedding, and indexing all live in your table definition.Python
Other frameworks
Together is also a first-class provider in the major LLM application frameworks:- LangChain:
langchain-togethershipsTogetherEmbeddingsand aChatTogethermodel. See the LangChain + Together RAG tutorial. - LlamaIndex:
TogetherEmbeddingandTogetherLLMplug straight into aVectorStoreIndex. See the LlamaIndex + Together RAG tutorial.
Beyond the basics
Once your pipeline is working, the next questions are usually about chunking strategy, retrieval quality, and evaluation. Start here:- Embeddings. Available models, batch shapes, and the
client.embeddings.createreference. - Rerank. When to add a reranker, supported models, and JSON-rank-fields mode.
- Quickstart: RAG. End-to-end Paul Graham essay example with chunking, embedding, retrieval, rerank, and generation.
- Building a RAG workflow. Longer guide that walks through document loading, chunking, and prompt construction.
- How to implement contextual RAG from Anthropic. Apply Anthropic’s contextual retrieval technique using Together embeddings and rerank.
- How to improve search with rerankers. Side-by-side comparison of vector search alone versus vector search plus rerank.