# Overview

> Run inference on 100+ open-source models.

Together AI offers two ways to run inference:

- **[Serverless models](/docs/serverless/models):** A shared fleet of popular open models you can call through a per-token API. No GPUs to provision or manage. Best for prototyping, or apps with variable traffic.
- **[Dedicated endpoints](/docs/dedicated-endpoints/overview):** A single model running on GPUs reserved for you, billed per minute by hardware. Best for apps with steady traffic, consistent latency, or for serving fine-tuned models.

## Get started

- Set up an API key and make your first call in Python, TypeScript, or cURL.
- Our picks for common inference use cases.
- How Together AI bills for inference.

## Shared inference API

Serverless models and dedicated endpoints use the same inference APIs for generating and retrieving model outputs. Apps work on either deployment mode without code changes; just swap the `model` parameter:

```python
from together import Together

client = Together()

# Serverless model request
response = client.chat.completions.create(
    model="moonshotai/Kimi-K2.6",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Dedicated endpoint request
response = client.chat.completions.create(
    model="/Qwen/Qwen3.5-9B-FP8-bb04c904",
    messages=[{"role": "user", "content": "Hello!"}],
)
```

```typescript
import Together from "together-ai";

const client = new Together();

// Serverless model request
let response = await client.chat.completions.create({
  model: "moonshotai/Kimi-K2.6",
  messages: [{ role: "user", content: "Hello!" }],
});

// Dedicated endpoint request
response = await client.chat.completions.create({
  model: "/Qwen/Qwen3.5-9B-FP8-bb04c904",
  messages: [{ role: "user", content: "Hello!" }],
});
```

```bash
# Serverless model request
curl -X POST "https://api.together.ai/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "moonshotai/Kimi-K2.6",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

# Dedicated endpoint request
curl -X POST "https://api.together.ai/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/Qwen/Qwen3.5-9B-FP8-bb04c904",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

## Integrations

- Drop-in replacement for OpenAI clients.
- Together SDKs and framework wiring.

## Batch processing

If your workload doesn't need a real-time response, submit it as a batch job for up to 50% off serverless rates.

## Model capabilities

- Chat completions, streaming, parameters.
- Tool use and agentic loops.
- Pass images alongside text.
- FLUX, Kontext, and Google models.
- Text-to-video and image-to-video.
- Batch and streaming transcription.
- HTTP and WebSocket audio output.
- Vectors, rerankers, and RAG.
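
As a rough illustration of the point made under "Shared inference API", the sketch below builds the same chat completions request for a serverless model and for a dedicated endpoint, then checks which payload fields differ. It uses only the Python standard library and sends nothing over the network. The helper name `build_chat_request` is hypothetical and not part of any Together SDK; the URL, headers, and model strings are copied from the cURL examples on this page.

```python
import json
import os
import urllib.request

# Endpoint URL as it appears in the cURL examples on this page.
API_URL = "https://api.together.ai/v1/chat/completions"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build, but do not send, a chat completions request.

    Hypothetical helper for illustration only; not a Together SDK function.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    # The key is read from the environment, mirroring the cURL examples.
    api_key = os.environ.get("TOGETHER_API_KEY", "")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Model strings copied verbatim from the examples on this page.
serverless = build_chat_request("moonshotai/Kimi-K2.6", "Hello!")
dedicated = build_chat_request("/Qwen/Qwen3.5-9B-FP8-bb04c904", "Hello!")

serverless_body = json.loads(serverless.data)
dedicated_body = json.loads(dedicated.data)

# Collect the payload fields whose values differ between the two requests.
changed = [
    key for key in serverless_body
    if serverless_body[key] != dedicated_body[key]
]
print(changed)  # ['model']
```

Passing either request object to `urllib.request.urlopen` would perform the actual call, which requires a valid `TOGETHER_API_KEY`. In application code the official SDK shown earlier is the more natural choice; this standard-library version only makes the structure of the request explicit.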