# Together AI docs

> Documentation for the Together AI platform for inference and training.

## Docs

- [Manage your account](https://docs.together.ai/docs/account-management.md): Sign up for Together AI, get your API key, and manage your account settings
- [Agent integrations](https://docs.together.ai/docs/agent-integrations.md): Using OSS agent frameworks with Together AI
- [Coding agent setup](https://docs.together.ai/docs/agent-skills.md): Make your AI coding agent Together-AI-aware with ready-made skills for code generation and an MCP server for live docs lookup.
- [Agno](https://docs.together.ai/docs/agno.md): Using Agno with Together AI
- [LLM evaluations](https://docs.together.ai/docs/ai-evaluations.md): Learn how to run LLM-as-a-Judge evaluations
- [AI evaluations UI](https://docs.together.ai/docs/ai-evaluations-ui.md): Guide to using the AI Evaluations UI for model assessment
- [Build an AI search engine (OSS Perplexity clone)](https://docs.together.ai/docs/ai-search-engine.md): How to build an AI search engine inspired by Perplexity with Next.js and Together AI
- [Build an interactive AI tutor with Llama 3.1](https://docs.together.ai/docs/ai-tutor.md): Learn how we built LlamaTutor from scratch: an open source AI tutor with 90k users.
- [Authentication](https://docs.together.ai/docs/api-keys-authentication.md): Create, manage, and authenticate with project-scoped API keys.
- [AutoGen(AG2)](https://docs.together.ai/docs/autogen.md): Using AutoGen(AG2) with Together AI
- [Credits](https://docs.together.ai/docs/billing-credits.md): Understanding credits and billing basics on Together AI.
- [Payment methods & invoices](https://docs.together.ai/docs/billing-payment-methods.md): Managing payment cards, ACH transfers, viewing invoices, and updating billing details.
- [Billing troubleshooting](https://docs.together.ai/docs/billing-troubleshooting.md): Resolving payment issues, understanding charges, and managing billing problems.
- [Usage limits & analytics](https://docs.together.ai/docs/billing-usage-limits.md): Understanding rate limits, model access, and cost analytics on Together AI.
- [Build a RAG workflow](https://docs.together.ai/docs/building-a-rag-workflow.md): Learn how to build a RAG workflow with Together AI embedding and chat endpoints!
- [Changelog](https://docs.together.ai/docs/changelog.md)
- [Cluster storage](https://docs.together.ai/docs/cluster-storage.md): Understand storage types, persistence, and best practices for GPU clusters
- [Composio](https://docs.together.ai/docs/composio.md): Using Composio with Together AI
- [Conditional Workflow](https://docs.together.ai/docs/conditional-workflows.md): Adapt to different tasks by conditionally routing to different LLMs and tools.
- [Quickstart](https://docs.together.ai/docs/containers-quickstart.md): Deploy your first container in 20 minutes.
- [CrewAI](https://docs.together.ai/docs/crewai.md): Using CrewAI with Together AI
- [Build an AI data analyst](https://docs.together.ai/docs/data-analyst-agent.md): Learn how to use a code interpreter to build an AI data analyst with E2B and Together AI.
- [Overview](https://docs.together.ai/docs/dedicated-container-inference.md): Deploy custom containers on Together's managed GPU infrastructure with automatic scaling, job queues, and built-in observability.
- [Upload a LoRA adapter](https://docs.together.ai/docs/dedicated-endpoints/adapter.md): Upload a custom LoRA adapter from Hugging Face or S3 and serve it on a dedicated endpoint.
- [Upload a model](https://docs.together.ai/docs/dedicated-endpoints/custom-models.md): Upload a custom or fine-tuned model from Hugging Face or S3 and serve it on a dedicated endpoint.
- [Manage dedicated endpoints](https://docs.together.ai/docs/dedicated-endpoints/manage.md): Create, start, stop, restart, list, update, and delete dedicated endpoints via the web UI or the Together API.
- [Available models](https://docs.together.ai/docs/dedicated-endpoints/models.md): View the models you can deploy to dedicated endpoints.
- [Overview](https://docs.together.ai/docs/dedicated-endpoints/overview.md): Reserved-hardware inference endpoints with predictable performance, no shared rate limits, and per-endpoint configuration.
- [Quickstart](https://docs.together.ai/docs/dedicated-endpoints/quickstart.md): Pick a model, deploy a dedicated endpoint with one CLI command, and send your first request in under 5 minutes.
- [Scaling](https://docs.together.ai/docs/dedicated-endpoints/scaling.md): How dedicated endpoints scale, how that affects cost, and when to choose vertical vs. horizontal scaling.
- [Endpoint settings](https://docs.together.ai/docs/dedicated-endpoints/settings.md): Configure replica count, hardware, decoding optimizations, and prompt caching on a dedicated endpoint.
- [Image generation with Flux2](https://docs.together.ai/docs/dedicated_containers_image.md): Deploy a Flux2 image generation model on Together's managed GPU infrastructure using Dedicated Containers.
- [Video generation with Wan 2.1](https://docs.together.ai/docs/dedicated_containers_video.md): Deploy a multi-GPU video generation model on Together's managed GPU infrastructure using Dedicated Containers.
- [DeepSeek V3.1 quickstart](https://docs.together.ai/docs/deepseek-3-1-quickstart.md): How to get started with DeepSeek V3.1
- [DeepSeek R1 quickstart](https://docs.together.ai/docs/deepseek-r1.md): How to get the most out of reasoning models like DeepSeek-R1.
- [DeepSeek V4 Pro quickstart](https://docs.together.ai/docs/deepseek-v4-quickstart.md): Call DeepSeek V4 Pro on Together for hybrid reasoning, long-context, and tool-using workloads.
- [Deploying a fine-tuned model](https://docs.together.ai/docs/deploying-a-fine-tuned-model.md): Once your fine-tune job completes, you should see your new model in [your models dashboard](https://api.together.ai/models).
- [Jig CLI](https://docs.together.ai/docs/deployments-jig.md): Build, push, and deploy containers to Together's managed GPU infrastructure.
- [Queue API](https://docs.together.ai/docs/deployments-queue.md): Submit, monitor, and manage asynchronous jobs for your Dedicated Container deployments.
- [Sprocket SDK](https://docs.together.ai/docs/deployments-sprocket.md): A Python SDK for building inference workers that support both synchronous and asynchronous requests via Together's platform.
- [Deprecations](https://docs.together.ai/docs/deprecations.md): Together AI's model lifecycle policy, including upgrades, redirects, and deprecation schedules.
- [DSPy](https://docs.together.ai/docs/dspy.md): Using DSPy with Together AI
- [Error codes](https://docs.together.ai/docs/error-codes.md): An overview of error status codes, their causes, and quick fixes
- [Supported models](https://docs.together.ai/docs/evaluations-supported-models.md): Supported models for Evaluations
- [Fine-tuning BYOM](https://docs.together.ai/docs/fine-tuning-byom.md): Bring Your Own Model: Fine-tune Custom Models from the Hugging Face Hub
- [Data preparation](https://docs.together.ai/docs/fine-tuning-data-preparation.md): The Together Fine-tuning API accepts two data formats for training dataset files: text data and tokenized data (in the form of Parquet files). Learn about the variants of each format and the scenarios in which each is most useful.
- [Fine-tuning FAQs](https://docs.together.ai/docs/fine-tuning-faqs.md)
- [Function calling fine-tuning](https://docs.together.ai/docs/fine-tuning-function-calling.md): Learn how to fine-tune models with function calling capabilities using Together AI.
- [LoRA supported modules](https://docs.together.ai/docs/fine-tuning-lora-supported-modules.md): Supported target modules for LoRA fine-tuning by model
- [Supported models](https://docs.together.ai/docs/fine-tuning-models.md): A list of all the models available for fine-tuning.
- [Overview](https://docs.together.ai/docs/fine-tuning-overview.md): Adapt a base model to your task or domain by training it on your data.
- [Pricing](https://docs.together.ai/docs/fine-tuning-pricing.md): Fine-tuning pricing at Together AI is based on the total number of tokens processed during your job.
- [Fine-tuning quickstart](https://docs.together.ai/docs/fine-tuning-quickstart.md): Fine-tune a model end-to-end: prepare data, launch a job, and evaluate the result.
- [Reasoning fine-tuning](https://docs.together.ai/docs/fine-tuning-reasoning.md): Learn how to fine-tune reasoning models with chain-of-thought data using Together AI.
- [Vision-language fine-tuning](https://docs.together.ai/docs/fine-tuning-vlm.md): Learn how to fine-tune Vision-Language Models (VLMs) on image+text data using Together AI.
- [GLM-5 quickstart](https://docs.together.ai/docs/glm-5-quickstart.md): How to get the most out of GLM-5 for reasoning and agentic tasks.
- [GPT-OSS quickstart](https://docs.together.ai/docs/gpt-oss.md): Get started with GPT-OSS, OpenAI's open-source reasoning model duo.
- [API & integrations](https://docs.together.ai/docs/gpu-clusters-api.md): Manage clusters programmatically with the CLI, REST API, Terraform, and third-party tools
- [Billing & pricing](https://docs.together.ai/docs/gpu-clusters-billing.md): Understand billing, pricing, and lifecycle policies for GPU Clusters
- [Cluster management](https://docs.together.ai/docs/gpu-clusters-management.md): Manage, scale, and operate your GPU clusters
- [Overview](https://docs.together.ai/docs/gpu-clusters-overview.md): High-performance GPU clusters for training, fine-tuning, and large-scale AI workloads
- [Quickstart](https://docs.together.ai/docs/gpu-clusters-quickstart.md): Get started with GPU Clusters in minutes.
- [Overview](https://docs.together.ai/docs/guides.md): Quickstarts and step-by-step guides for building with Together AI.
- [Health checks and node repair](https://docs.together.ai/docs/health-checks.md): Proactively validate GPU node health and trigger repairs when issues are found
- [How to build a Lovable clone with Kimi K2](https://docs.together.ai/docs/how-to-build-a-lovable-clone-with-kimi-k2.md): Learn how to build a full-stack Next.js app that can generate React apps from a single prompt.
- [Build coding agents](https://docs.together.ai/docs/how-to-build-coding-agents.md): How to build your own simple code editing agent from scratch in 400 lines of code!
- [Build a Phone Voice Agent with Together AI](https://docs.together.ai/docs/how-to-build-phone-voice-agent.md): Build a real-time phone voice agent from scratch with Twilio Media Streams, Together AI realtime STT, chat completions, realtime TTS, and local voice activity detection.
- [How to build an AI audio transcription app with Whisper](https://docs.together.ai/docs/how-to-build-real-time-audio-transcription-app.md): Learn how to build a real-time AI audio transcription app with Whisper, Next.js, and Together AI.
- [Implement contextual RAG from Anthropic](https://docs.together.ai/docs/how-to-implement-contextual-rag-from-anthropic.md): An open source line-by-line implementation and explanation of Contextual RAG from Anthropic!
- [Improve search with rerankers](https://docs.together.ai/docs/how-to-improve-search-with-rerankers.md): Learn how you can improve semantic search quality with reranker models!
- [How to use Cline with DeepSeek V3 to build faster](https://docs.together.ai/docs/how-to-use-cline.md): Use Cline (an AI coding agent) with DeepSeek V3 (a powerful open source model) to code faster.
- [Quickstart: How to Use OpenClaw with Together AI](https://docs.together.ai/docs/how-to-use-openclaw.md): Learn how to pair OpenClaw, a powerful autonomous agent, with frontier OSS models on Together AI like Kimi K2.5 and GLM 4.7.
- [How to use OpenCode with Together AI to build faster](https://docs.together.ai/docs/how-to-use-opencode.md): Learn how to combine OpenCode, a powerful terminal-based AI coding agent, with Together AI models like DeepSeek V3 to supercharge your development workflow.
- [How to use Qwen Code with Together AI for enhanced development workflow](https://docs.together.ai/docs/how-to-use-qwen-code.md): Learn how to configure Qwen Code, a powerful AI-powered command-line workflow tool, with Together AI models to supercharge your coding workflow with advanced code understanding and automation.
- [IAM model](https://docs.together.ai/docs/identity-access-management.md): How users, credentials, and resources are organized across the Together platform
- [Manage batch jobs](https://docs.together.ai/docs/inference/batch/manage.md): Status, results, errors, and operational reference for the Together Batch API.
- [Overview](https://docs.together.ai/docs/inference/batch/overview.md): Run asynchronous batch workloads at up to 50% lower cost.
- [Run a batch job](https://docs.together.ai/docs/inference/batch/tutorial.md): Prepare a JSONL file, upload it, start a batch job, poll until it finishes, and retrieve results.
- [Log probabilities](https://docs.together.ai/docs/inference/chat/logprobs.md): Return per-token log probabilities to measure model confidence and route low-confidence outputs to a stronger model.
- [Send chat completions](https://docs.together.ai/docs/inference/chat/overview.md): Query chat models with single prompts, multi-turn conversations, and system prompts.
- [Parameters](https://docs.together.ai/docs/inference/chat/parameters.md): The full list of parameters you can pass to the chat completions endpoint.
- [Reasoning](https://docs.together.ai/docs/inference/chat/reasoning.md): Use reasoning models that think step-by-step before answering.
- [Structured outputs](https://docs.together.ai/docs/inference/chat/structured-outputs.md): Use JSON mode to get structured outputs from supported chat models.
- [Generate embeddings](https://docs.together.ai/docs/inference/embeddings/embeddings.md): Turn text into vector embeddings for search, classification, recommendations, and RAG.
- [Retrieval-augmented generation](https://docs.together.ai/docs/inference/embeddings/rag.md): Build a retrieval-augmented generation pipeline with Together embeddings, rerank, and chat completions.
- [Rerank](https://docs.together.ai/docs/inference/embeddings/rerank.md): Reorder retrieved documents by relevance to a query for sharper search and RAG results.
- [Agentic function calling patterns](https://docs.together.ai/docs/inference/function-calling/agentic.md): Tool use across multiple steps or conversation turns, covering multi-step and multi-turn agent loops.
- [Function calling patterns](https://docs.together.ai/docs/inference/function-calling/overview.md): Function calling lets LLMs respond with structured function names and arguments your application can execute.
- [Call functions in parallel](https://docs.together.ai/docs/inference/function-calling/parallel.md): Multiple tool calls in one response, covering parallel calls to the same tool and to different tools.
- [Call functions](https://docs.together.ai/docs/inference/function-calling/single-call.md): One tool call per response, covering simple and multiple-tool patterns.
- [Text-to-image generation](https://docs.together.ai/docs/inference/images/overview.md): Generate images from text prompts.
- [Image generation parameters](https://docs.together.ai/docs/inference/images/parameters.md): Parameter reference for the images API: dimensions, quality control, base64 responses, safety checker, and troubleshooting.
- [Image-to-image generation](https://docs.together.ai/docs/inference/images/reference-images.md): Edit or transform an existing image by passing `image_url` (Kontext) or `reference_images` (FLUX.2 and Google models).
- [OpenAI compatibility](https://docs.together.ai/docs/inference/openai-compatibility.md): Point your OpenAI Python or TypeScript client at Together AI to call open-source models without rewriting your app.
- [Overview](https://docs.together.ai/docs/inference/overview.md): Run inference on 100+ open-source models.
- [Pricing](https://docs.together.ai/docs/inference/pricing.md): How Together AI bills for inference.
- [Recommended models](https://docs.together.ai/docs/inference/recommended-models.md): Our picks for common inference use cases.
- [Third-party integrations](https://docs.together.ai/docs/inference/sdk-integrations.md): Use Together AI models through partner SDKs and integrations.
- [Generate speech](https://docs.together.ai/docs/inference/text-to-speech/overview.md): Generate speech audio from text with Together AI text-to-speech models.
- [Text-to-speech streaming](https://docs.together.ai/docs/inference/text-to-speech/streaming.md): Stream audio over HTTP for low time-to-first-byte and access raw PCM bytes.
- [WebSocket API](https://docs.together.ai/docs/inference/text-to-speech/websocket.md): Stream text in and audio out over a single WebSocket connection for the lowest interactive latency.
- [Advanced transcription options](https://docs.together.ai/docs/inference/transcription/features.md): Speaker diarization, word-level timestamps, response formats, async support, and best practices.
- [Transcribe audio](https://docs.together.ai/docs/inference/transcription/overview.md): Transcribe and translate audio into text.
- [Streaming transcription](https://docs.together.ai/docs/inference/transcription/streaming.md): Use the real-time WebSocket API for low-latency, incremental speech-to-text.
- [Audio translation](https://docs.together.ai/docs/inference/transcription/translation.md): Translate speech in any language into English text.
- [Voice activity detection](https://docs.together.ai/docs/inference/transcription/voice-activity-detection.md): Configure voice activity detection to control how speech segments are detected in real-time transcription.
- [Audio input for videos](https://docs.together.ai/docs/inference/videos/audio-input.md): Drive video generation with an audio file for lip sync, beat-matched motion, or narration.
- [Generate videos](https://docs.together.ai/docs/inference/videos/overview.md): Generate videos from text and image prompts.
- [Video generation parameters](https://docs.together.ai/docs/inference/videos/parameters.md): Reference for video generation parameters, including guidance scale and quality control.
- [Reference images and keyframes](https://docs.together.ai/docs/inference/videos/reference-and-keyframes.md): Guide visual style with reference images and control specific frames in your video.
- [Vision-language function calling](https://docs.together.ai/docs/inference/vision/function-calling.md): Combine image understanding with tool use on Together AI vision-language models.
- [Vision input modes](https://docs.together.ai/docs/inference/vision/inputs.md): Send local images, video URLs, or multiple images to a vision model in a single request.
- [Use image inputs](https://docs.together.ai/docs/inference/vision/overview.md): Run vision-language models on Together: pass images alongside text and get structured replies, transcripts, comparisons, or extracted data.
- [Structured extraction with vision](https://docs.together.ai/docs/inference/vision/structured-extraction.md): Combine image input with a JSON schema to extract typed data from screenshots, documents, and photos.
- [Iterative Workflow](https://docs.together.ai/docs/iterative-workflow.md): Iteratively call LLMs to optimize task performance.
- [Kimi K2 quickstart](https://docs.together.ai/docs/kimi-k2-quickstart.md): How to get the most out of models like Kimi K2.
- [Kimi K2 Thinking quickstart](https://docs.together.ai/docs/kimi-k2-thinking-quickstart.md): How to get the most out of reasoning models like Kimi K2 Thinking.
- [Kimi K2.5 quickstart](https://docs.together.ai/docs/kimi-k2.5-quickstart.md): How to get the most out of Kimi's new K2.5 model.
- [LangGraph](https://docs.together.ai/docs/langgraph.md): Using LangGraph with Together AI
- [Llama 4 quickstart](https://docs.together.ai/docs/llama4-quickstart.md): How to get the most out of the new Llama 4 models.
- [LoRA fine-tuning](https://docs.together.ai/docs/lora-training-and-inference.md): Fine-tune and run dedicated inference for a model with LoRA adapters
- [Together Mixture of Agents (MoA)](https://docs.together.ai/docs/mixture-of-agents.md)
- [How to run nanochat on Instant Clusters⚡️](https://docs.together.ai/docs/nanochat-on-instant-clusters.md): Learn how to train Andrej Karpathy's end-to-end ChatGPT clone on Together's on-demand GPU clusters
- [Quickstart: Next.js](https://docs.together.ai/docs/nextjs-chat-quickstart.md): Build an app that can ask a single question or chat with an LLM using Next.js and Together AI.
- [Build an open source NotebookLM: PDF to podcast](https://docs.together.ai/docs/open-notebooklm-pdf-to-podcast.md): In this guide we will see how to create a podcast from a PDF input!
- [Organizations](https://docs.together.ai/docs/organizations.md): Create and manage your Together Organization, invite Members, and configure billing
- [Parallel Workflow](https://docs.together.ai/docs/parallel-workflows.md): Execute multiple LLM calls in parallel and aggregate the results afterwards.
- [Preference fine-tuning](https://docs.together.ai/docs/preference-fine-tuning.md): Learn how to use preference fine-tuning on the Together Fine-Tuning Platform
- [Privacy and security](https://docs.together.ai/docs/privacy-and-security.md): How Together handles your inputs, outputs, and account data, plus enterprise options for data residency and private networking.
- [Projects](https://docs.together.ai/docs/projects.md): Create isolated workspaces to organize resources, manage team access, and scope API keys
- [PydanticAI](https://docs.together.ai/docs/pydanticai.md): Using PydanticAI with Together AI
- [Python v2 SDK Migration Guide](https://docs.together.ai/docs/pythonv2-migration-guide.md): Migrate from the Together Python SDK v1 to v2, the new Together AI Python SDK with improved type safety and a modern architecture.
- [Quickstart](https://docs.together.ai/docs/quickstart.md): Make your first request to Together AI in a few minutes.
- [FLUX.2 quickstart](https://docs.together.ai/docs/quickstart-flux.md): Learn how to use FLUX.2, the next-generation image model with advanced prompting capabilities
- [FLUX Kontext quickstart](https://docs.together.ai/docs/quickstart-flux-kontext.md): Learn how to use Flux's new in-context image generation models
- [FLUX LoRA quickstart](https://docs.together.ai/docs/quickstart-flux-lora.md)
- [Quickstart: How to do OCR](https://docs.together.ai/docs/quickstart-how-to-do-ocr.md): A step-by-step guide to OCR with Together AI's vision models and structured outputs
- [Quickstart: Retrieval Augmented Generation (RAG)](https://docs.together.ai/docs/quickstart-retrieval-augmented-generation-rag.md): How to build a RAG workflow in under 5 mins!
- [Hugging Face Inference quickstart](https://docs.together.ai/docs/quickstart-using-hugging-face-inference.md): This guide will walk you through how to use Together models with Hugging Face Inference.
- [Roles & permissions (RBAC)](https://docs.together.ai/docs/roles-permissions.md): Understand Organization and Project role-based access control (RBAC), including the Admin and Member roles and what each can do across the Together platform
- [Sequential Workflow](https://docs.together.ai/docs/sequential-agent-workflow.md): Coordinating a chain of LLM calls to solve a complex task.
- [Serverless models](https://docs.together.ai/docs/serverless/models.md): Browse the catalog of available models for instant inference.
- [Serverless rate limits](https://docs.together.ai/docs/serverless/rate-limits.md): Together AI applies dynamic per-model rate limits that scale with your sustained traffic on serverless inference.
- [Slurm management system](https://docs.together.ai/docs/slurm.md)
- [Slurm configuration](https://docs.together.ai/docs/slurm-configuration.md): Customize Slurm cluster settings to match your workload requirements
- [Single sign-on (SSO)](https://docs.together.ai/docs/sso.md): Connect your Identity Provider for secure, automated team access to Together
- [Support](https://docs.together.ai/docs/support.md): Search the support portal, file a ticket, or reach the Together AI team by email, Slack, or Discord.
- [Code interpreter](https://docs.together.ai/docs/together-code-interpreter.md): Execute LLM-generated code seamlessly with a simple API call.
- [Code sandbox](https://docs.together.ai/docs/together-code-sandbox.md): Level up generative code tooling with fast, secure code sandboxes at scale
- [Architecture](https://docs.together.ai/docs/together-deployments.md): Architecture, deployment lifecycle, and core concepts for dedicated container inference.
- [Quickstart: Using Mastra with Together AI](https://docs.together.ai/docs/using-together-with-mastra.md): This guide will walk you through how to use Together models with Mastra.
- [Vercel AI SDK quickstart](https://docs.together.ai/docs/using-together-with-vercels-ai-sdk.md): This guide will walk you through how to use Together models with the Vercel AI SDK.
- [Wan 2.7 quickstart](https://docs.together.ai/docs/wan2.7-quickstart.md): Generate videos from text, images, and reference materials with the Wan 2.7 model family.
- [Agent Workflows](https://docs.together.ai/docs/workflows.md): Orchestrating multiple language model calls together to solve complex tasks.
- [Together Cookbooks & Example Apps](https://docs.together.ai/examples.md): Explore our vast library of open-source cookbooks & example apps
- [How to build a real-time image generator with Flux and Together AI](https://docs.together.ai/external-link-02.md)
- [Overview](https://docs.together.ai/intro.md): Run, train, and serve open-source AI models on Together AI.
- [Python Library](https://docs.together.ai/python-library.md)
- [Create audio generation request](https://docs.together.ai/reference/audio-speech.md): Generate audio from input text
- [Create realtime text-to-speech](https://docs.together.ai/reference/audio-speech-websocket.md): Establishes a WebSocket connection for real-time text-to-speech generation. This endpoint uses the WebSocket protocol (`wss://api.together.ai/v1/audio/speech/websocket`) for bidirectional streaming communication.
- [Create audio transcription request](https://docs.together.ai/reference/audio-transcriptions.md): Transcribes audio into text
- [Real-time audio transcription via WebSocket](https://docs.together.ai/reference/audio-transcriptions-realtime.md): Establishes a WebSocket connection for real-time audio transcription. This endpoint uses the WebSocket protocol (`wss://api.together.ai/v1/realtime`) for bidirectional streaming communication.
- [Create audio translation request](https://docs.together.ai/reference/audio-translations.md): Translates audio into English
- [Cancel a batch job](https://docs.together.ai/reference/batch-cancel.md): Cancel a batch job by ID
- [Create a batch job](https://docs.together.ai/reference/batch-create.md): Create a new batch job with the given input file and endpoint
- [Get a batch job](https://docs.together.ai/reference/batch-get.md): Get details of a batch job by ID
- [List batch jobs](https://docs.together.ai/reference/batch-list.md): List all batch jobs for the authenticated user
- [Create chat completion](https://docs.together.ai/reference/chat-completions.md): Generate a model response for a given chat conversation. Supports single queries and multi-turn conversations with system, user, and assistant messages.
- [Clusters](https://docs.together.ai/reference/cli/clusters.md): Reserve, configure, and manage GPU clusters from your terminal.
- [Endpoints](https://docs.together.ai/reference/cli/endpoints.md): Create, update, and manage dedicated inference endpoints from your terminal.
- [Evals](https://docs.together.ai/reference/cli/evals.md): Create and manage model-evaluation jobs from your terminal, including classify, score, and compare evals.
- [Files](https://docs.together.ai/reference/cli/files.md): Upload and manage datasets for use in fine-tuning, evals, and batch inference.
- [Fine-tuning](https://docs.together.ai/reference/cli/finetune.md): Create, monitor, and manage fine-tuning jobs from your terminal.
- [Get started](https://docs.together.ai/reference/cli/getting-started.md): Install the Together CLI to deploy endpoints, fine-tune models, and manage GPU clusters from your terminal.
- [Jig CLI reference](https://docs.together.ai/reference/cli/jig.md): CLI commands, pyproject.toml configuration, environment variables, and Python SDK for dedicated containers.
- [Models](https://docs.together.ai/reference/cli/models.md): List Together AI models and upload your own from Hugging Face or S3.
- [Telemetry](https://docs.together.ai/reference/cli/telemetry.md): Understand what the Together CLI tracks, how to opt out, and where the local config file lives.
- [Create a GPU cluster](https://docs.together.ai/reference/clusters-create.md): Create an Instant Cluster on Together's high-performance GPU clusters. With features like on-demand scaling, long-lived resizable high-bandwidth shared DC-local storage, Kubernetes and Slurm cluster flavors, a REST API, and Terraform support, you can run workloads flexibly without complex infrastruc…
- [Delete GPU cluster by cluster ID](https://docs.together.ai/reference/clusters-delete.md): Delete a GPU cluster by cluster ID.
- [Get GPU cluster by cluster ID](https://docs.together.ai/reference/clusters-get.md): Retrieve information about a specific GPU cluster.
- [List all GPU clusters](https://docs.together.ai/reference/clusters-list.md): List all GPU clusters.
- [List regions and corresponding supported driver versions](https://docs.together.ai/reference/clusters-list-regions.md)
- [Update a GPU cluster](https://docs.together.ai/reference/clusters-update.md): Update the configuration of an existing GPU cluster.
- [Create a shared volume](https://docs.together.ai/reference/clusters_storages-create.md): Instant Clusters supports long-lived, resizable in-DC shared storage with user data persistence. You can dynamically create and attach volumes to your cluster at cluster creation time, and resize them as your data grows. All shared storage is backed by multi-NIC bare metal paths, ensuring high-throughput…
- [Delete a shared volume by ID](https://docs.together.ai/reference/clusters_storages-delete.md): Delete a shared volume. Deletion fails if the volume is attached to a cluster.
- [Get a shared volume by ID](https://docs.together.ai/reference/clusters_storages-get.md): Retrieve information about a specific shared volume.
- [List all shared volumes](https://docs.together.ai/reference/clusters_storages-list.md): List all shared volumes.
- [Update a shared volume](https://docs.together.ai/reference/clusters_storages-update.md): Update the configuration of an existing shared volume.
- [Create completion](https://docs.together.ai/reference/completions.md): Generate text completions for a given prompt using a language, code, or image model.
- [Create an evaluation job](https://docs.together.ai/reference/create-evaluation.md)
- [Create video](https://docs.together.ai/reference/create-videos.md): Create a video
- [Create a dedicated endpoint](https://docs.together.ai/reference/createendpoint.md): Creates a new dedicated endpoint for serving models. The endpoint starts automatically after creation. You can deploy any supported model on hardware configurations that meet the model's requirements.
- [Sprocket SDK reference](https://docs.together.ai/reference/dci-reference-sprocket.md): API reference for Sprocket classes, functions, and configuration.
- [Delete a file](https://docs.together.ai/reference/delete-files-id.md): Delete a previously uploaded data file.
- [Delete a fine-tune job](https://docs.together.ai/reference/delete-fine-tunes-id.md): Delete a fine-tuning job.
- [Delete endpoint](https://docs.together.ai/reference/deleteendpoint.md): Permanently deletes an endpoint. This action cannot be undone.
- [Create a new deployment](https://docs.together.ai/reference/deployments-create.md): Create a new deployment with the specified configuration
- [Delete a deployment](https://docs.together.ai/reference/deployments-delete.md): Delete an existing deployment
- [Get a deployment by ID or name](https://docs.together.ai/reference/deployments-get.md): Retrieve details of a specific deployment by its ID or name
- [Get the list of deployments](https://docs.together.ai/reference/deployments-list.md): Get a list of all deployments in your project
- [Get logs for a deployment](https://docs.together.ai/reference/deployments-logs.md): Retrieve logs from a deployment, optionally filtered by replica ID.
- [Create a new secret](https://docs.together.ai/reference/deployments-secrets-create.md): Create a new secret to store sensitive configuration values
- [Delete a secret](https://docs.together.ai/reference/deployments-secrets-delete.md): Delete an existing secret
- [Get a secret by ID or name](https://docs.together.ai/reference/deployments-secrets-get.md): Retrieve details of a specific secret by its ID or name
- [Get the list of project secrets](https://docs.together.ai/reference/deployments-secrets-list.md): Retrieve all secrets in your project
- [Update a secret](https://docs.together.ai/reference/deployments-secrets-update.md): Update an existing secret's value or metadata
- [Download a file](https://docs.together.ai/reference/deployments-storage-get.md): Download a file by redirecting to a signed URL
- [Create a new volume](https://docs.together.ai/reference/deployments-storage-volumes-create.md): Create a new volume to preload files in deployments
- [Delete a volume](https://docs.together.ai/reference/deployments-storage-volumes-delete.md): Delete an existing volume
- [Get a volume by ID or name](https://docs.together.ai/reference/deployments-storage-volumes-get.md): Retrieve details of a specific volume by its ID or name
- [Get the list of project volumes](https://docs.together.ai/reference/deployments-storage-volumes-list.md): Retrieve all volumes in your project
- [Update a volume](https://docs.together.ai/reference/deployments-storage-volumes-update.md): Update an existing volume's configuration or contents
- [Update a deployment](https://docs.together.ai/reference/deployments-update.md): Update an existing deployment configuration
- [Create embedding](https://docs.together.ai/reference/embeddings.md): Generate vector embeddings for one or more text inputs. Returns numerical arrays representing semantic meaning, useful for search, classification, and retrieval.
- [Get evaluation job details](https://docs.together.ai/reference/get-evaluation.md)
- [Get evaluation job status and results](https://docs.together.ai/reference/get-evaluation-status.md)
- [List all files](https://docs.together.ai/reference/get-files.md): List the metadata for all uploaded data files.
- [Retrieve file metadata](https://docs.together.ai/reference/get-files-id.md): Retrieve the metadata for a single uploaded data file.
- [Get file contents](https://docs.together.ai/reference/get-files-id-content.md): Get the contents of a single uploaded data file.
- [List all jobs](https://docs.together.ai/reference/get-fine-tunes.md): List the metadata for all fine-tuning jobs. Returns a list of `FinetuneResponseTruncated` objects.
- [List job](https://docs.together.ai/reference/get-fine-tunes-id.md): List the metadata for a single fine-tuning job.
- [List checkpoints](https://docs.together.ai/reference/get-fine-tunes-id-checkpoint.md): List the checkpoints for a single fine-tuning job.
- [List job events](https://docs.together.ai/reference/get-fine-tunes-id-events.md): List the events for a single fine-tuning job.
- [Download model](https://docs.together.ai/reference/get-finetune-download.md): Receive a compressed fine-tuned model or checkpoint.
- [Fetch video metadata](https://docs.together.ai/reference/get-videos-id.md): Fetch video metadata
- [Get endpoint by ID](https://docs.together.ai/reference/getendpoint.md): Retrieves details about a specific endpoint, including its current state, configuration, and scaling settings.
- [Get model list](https://docs.together.ai/reference/list-evaluation-models.md)
- [Get all evaluation jobs](https://docs.together.ai/reference/list-evaluations.md)
- [List all endpoints](https://docs.together.ai/reference/listendpoints.md): Returns a list of all endpoints associated with your account. You can filter the results by type (dedicated or serverless).
- [List available hardware configurations](https://docs.together.ai/reference/listhardware.md): Returns a list of available hardware configurations for deploying models. When a model parameter is provided, it returns only hardware configurations compatible with that model, including their current availability status.
- [List all models](https://docs.together.ai/reference/models.md): Lists all of Together's open-source models
- [Create job](https://docs.together.ai/reference/post-fine-tunes.md): Create a fine-tuning job with the provided model and training data.
- [Cancel job](https://docs.together.ai/reference/post-fine-tunes-id-cancel.md): Cancel a currently running fine-tuning job. Returns a FinetuneResponseTruncated object.
- [Create image](https://docs.together.ai/reference/post-images-generations.md): Use an image model to generate an image for a given prompt.
- [Cancel a queued job](https://docs.together.ai/reference/queue-cancel.md): Cancel a pending job. Only jobs in pending status can be canceled. Running jobs cannot be stopped. Returns the job status after the attempt. If the job is not pending, returns 409 with the current status unchanged.
- [Get queue metrics](https://docs.together.ai/reference/queue-metrics.md): Get the current queue statistics for a model, including pending and running job counts.
- [Get job status](https://docs.together.ai/reference/queue-status.md): Poll the current status of a previously submitted job. Provide the request_id and model as query parameters.
- [Submit a queued job](https://docs.together.ai/reference/queue-submit.md): Submit a new job to the queue for asynchronous processing. Jobs are processed in strict priority order (higher priority first, FIFO within the same priority). Returns a request ID that can be used to poll status or cancel the job.
- [Remediation approve](https://docs.together.ai/reference/remediation-approve.md): Approves a pending remediation.
- [Remediation cancel](https://docs.together.ai/reference/remediation-cancel.md): Cancels a pending remediation.
- [Remediation create](https://docs.together.ai/reference/remediation-create.md): Creates a new remediation for an instance.
- [Remediation get](https://docs.together.ai/reference/remediation-get.md): Retrieve the status of a specific remediation on a specific instance in a specific cluster.
- [Remediation list](https://docs.together.ai/reference/remediation-list.md): Lists remediations for an instance or cluster.
- [Remediation reject](https://docs.together.ai/reference/remediation-reject.md): Rejects a pending remediation.
- [Create a rerank request](https://docs.together.ai/reference/rerank.md): Rerank a list of documents by relevance to a query. Returns a relevance score and ordering index for each document.
- [Execute code](https://docs.together.ai/reference/tci-execute.md): Executes the given code snippet and returns the output. Without a session_id, a new session is created to run the code. If you pass a valid session_id, the code runs in that session. This is useful for running multiple code snippets in the same environment, because dependencies and similar things ar…
- [List active sessions](https://docs.together.ai/reference/tci-sessions.md): Lists all your currently active sessions.
- [Update endpoint, this can also be used to start or stop a dedicated endpoint](https://docs.together.ai/reference/updateendpoint.md): Updates an existing endpoint's configuration. You can modify the display name, autoscaling settings, or change the endpoint's state (start/stop).
- [Upload a file](https://docs.together.ai/reference/upload-file.md): Upload a file with specified purpose, file name, and file type.
- [Upload a custom model or adapter](https://docs.together.ai/reference/upload-model.md): Upload a custom model or adapter from Hugging Face or S3
- [TypeScript Library](https://docs.together.ai/typescript-library.md)

## OpenAPI Specs

- [openapi](https://docs.together.ai/openapi.yaml)
- [clusters-remediation-openapi](https://docs.together.ai/clusters-remediation-openapi.yaml)
- [tcloud](https://docs.together.ai/tcloud.yaml)
- [deprecated-spec](https://docs.together.ai/deprecated-spec.json)