

Prerequisites

Before you begin, make sure you have a Together AI account with an API key, and the tg CLI installed.

Step 1: Pick a model

You can deploy any model from the dedicated endpoint model catalog, or upload your own custom model. For this quickstart we’ll use Qwen/Qwen3.5-9B-FP8.

Step 2: Pick your hardware

Some models can be deployed on multiple hardware types at different price points. List compatible hardware options with the CLI:
Shell
tg endpoints hardware --model Qwen/Qwen3.5-9B-FP8
You’ll see output similar to this:
Hardware ID              GPU    Memory    Count    Price (per minute)    Availability
1x_nvidia_h100_80gb_sxm  h100   80GB      1        $0.06                 ✓ available
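Per-minute billing adds up linearly while the endpoint runs, so it's worth estimating the cost before deploying. A quick back-of-the-envelope calculation using the listed price:

```python
# Per-minute price from the hardware listing above.
price_per_minute = 0.06  # USD for 1x_nvidia_h100_80gb_sxm

hourly = price_per_minute * 60  # cost per hour of uptime
daily = hourly * 24             # cost per day of uptime
print(f"${hourly:.2f}/hour, ${daily:.2f}/day")  # → $3.60/hour, $86.40/day
```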

Step 3: Deploy the endpoint

Create the endpoint with the tg endpoints create command, using the hardware ID output from the previous step. The --wait flag blocks until the endpoint is ready:
Shell
tg endpoints create \
  --model Qwen/Qwen3.5-9B-FP8 \
  --hardware 1x_nvidia_h100_80gb_sxm \
  --display-name "My quickstart endpoint" \
  --wait
When it returns, copy the endpoint name from the Name field (e.g., tester/Qwen/Qwen3.5-9B-FP8-bb04c904).
The endpoint name is passed to the model parameter for API inference requests. The endpoint ID (e.g., endpoint-e6c6b82f-...) is used for management operations like start, stop, update, and delete.
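The two identifiers are easy to tell apart: IDs carry the endpoint- prefix, while names look like model paths. A small illustrative helper (not part of the Together SDK) that captures the distinction:

```python
# Illustrative helper (not part of the Together SDK): endpoint IDs start with
# the "endpoint-" prefix, while endpoint names look like model paths.
def is_endpoint_id(value: str) -> bool:
    return value.startswith("endpoint-")

print(is_endpoint_id("endpoint-e6c6b82f-1234"))               # True  -> use for stop/delete
print(is_endpoint_id("tester/Qwen/Qwen3.5-9B-FP8-bb04c904"))  # False -> use as the model parameter
```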

Step 4: Send a request

Send a request to your endpoint, passing the name you copied in the previous step into the model parameter:
from together import Together

client = Together()  # reads your API key from the TOGETHER_API_KEY environment variable

response = client.chat.completions.create(
    model="tester/Qwen/Qwen3.5-9B-FP8-bb04c904",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
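Under the hood, the SDK call maps onto Together's OpenAI-compatible REST API. A hedged sketch of the equivalent raw request, assuming the standard https://api.together.xyz/v1/chat/completions path:

```python
import json
import os
import urllib.request

# Endpoint name copied from Step 3 (yours will differ).
ENDPOINT_NAME = "tester/Qwen/Qwen3.5-9B-FP8-bb04c904"

# The endpoint name goes in the "model" field, exactly as with the SDK.
payload = {
    "model": ENDPOINT_NAME,
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = urllib.request.Request(
    "https://api.together.xyz/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)
# Uncomment to send (requires a valid TOGETHER_API_KEY and a running endpoint):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```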
Congrats! You just deployed and called your first dedicated endpoint on Together AI.

Stop the endpoint

Dedicated endpoints bill per minute as long as they’re running. Stop your endpoint when you no longer need it so you don’t accrue charges:
Shell
tg endpoints stop <endpoint_id>
Find the endpoint ID in the ID field of tg endpoints retrieve, or run tg endpoints list to see all your endpoints.

Next steps

Available models

Browse the list of available models for instant deployment.

Manage endpoints

Create, start, stop, restart, list, update, and delete dedicated endpoints via the web UI, API, or CLI.

Endpoint settings

Configure endpoint hardware, autoscaling, decoding, and prompt caching.

Upload a custom model

Upload your own model weights.