> ## Documentation Index
> Fetch the complete documentation index at: https://docs.together.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Overview

> Reserved-hardware inference endpoints with predictable performance, no shared rate limits, and per-endpoint configuration.

<Tip>
  Using a coding agent? Install the [together-dedicated-endpoints](/docs/agent-skills) skill to let your agent create and manage dedicated endpoints for you.
</Tip>

A dedicated endpoint serves a single model on hardware reserved only for you, offering predictable latency and no [shared-fleet rate limits](/docs/serverless/rate-limits). They are highly configurable, allowing you to upload custom fine-tuned models, and configure autoscaling and decoding optimizations to match your workload.

Dedicated endpoints use the same [inference APIs](/docs/inference/overview#shared-inference-api) as [serverless models](/docs/serverless/models), allowing you to prototype with serverless, then switch to dedicated endpoints without changing your application code.

## Get started

<CardGroup cols={3}>
  <Card title="Quickstart" icon="rocket" href="/docs/dedicated-endpoints/quickstart">
    Deploy and call your first endpoint in 5 minutes.
  </Card>

  <Card title="Manage endpoints" icon="tool" href="/docs/dedicated-endpoints/manage">
    Create, start, stop, update, and delete via the UI or API.
  </Card>

  <Card title="Endpoint settings" icon="adjustments-horizontal" href="/docs/dedicated-endpoints/settings">
    Configure endpoint hardware, autoscaling, decoding, prompt caching.
  </Card>

  <Card title="Inference APIs" icon="code" href="/docs/inference/overview">
    Explore the API surface for chat, vision, audio, embeddings, and more.
  </Card>

  <Card title="Available models" icon="list" href="/docs/dedicated-endpoints/models">
    Browse Together-hosted models you can deploy on dedicated endpoints.
  </Card>

  <Card title="Upload a custom model" icon="upload" href="/docs/dedicated-endpoints/custom-models">
    Upload your own model weights.
  </Card>
</CardGroup>

## Pricing

Dedicated endpoints bill per-minute by hardware while the endpoint is running, regardless of your model or request volume.

| Hardware type | Cost/hour |
| ------------- | --------- |
| 1x H100 80GB  | \$3.99    |
| 1x H200 141GB | \$5.49    |
| 1x B200 180GB | \$9.95    |

Each running [replica](/docs/dedicated-endpoints/settings#replica-count) bills independently, and stop billing as soon as they are [scaled down](/docs/dedicated-endpoints/scaling).
