

Prerequisites

Before you begin, make sure you have a Together AI account with an API key, and the tg CLI installed.

Step 1: Pick a model

You can deploy any model from the dedicated endpoint model catalog, or upload your own custom model. For this quickstart we’ll use Qwen/Qwen3.5-9B-FP8.

Step 2: Pick your hardware

Some models can be deployed on multiple hardware types at different price points. List compatible hardware options with the CLI:
Shell
tg endpoints hardware --model Qwen/Qwen3.5-9B-FP8
You’ll see output similar to this:
Hardware ID              GPU    Memory    Count    Price (per minute)    Availability
1x_nvidia_h100_80gb_sxm  h100   80GB      1        $0.06                 ✓ available
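Per-minute billing adds up linearly while the endpoint runs, so it's worth estimating the cost before deploying. A quick back-of-the-envelope calculation using the listed price:

```python
# Per-minute price from the hardware listing above.
price_per_minute = 0.06  # USD for 1x_nvidia_h100_80gb_sxm

hourly = price_per_minute * 60  # cost per hour of uptime
daily = hourly * 24             # cost per day of uptime
print(f"${hourly:.2f}/hour, ${daily:.2f}/day")  # → $3.60/hour, $86.40/day
```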

Step 3: Deploy the endpoint

Create the endpoint with the tg endpoints create command, using the hardware ID output from the previous step. The --wait flag blocks until the endpoint is ready:
Shell
tg endpoints create \
  --model Qwen/Qwen3.5-9B-FP8 \
  --hardware 1x_nvidia_h100_80gb_sxm \
  --display-name "My quickstart endpoint" \
  --wait
When it returns, copy the endpoint name from the Name field (e.g., tester/Qwen/Qwen3.5-9B-FP8-bb04c904).
The endpoint name is passed to the model parameter for API inference requests. The endpoint ID (e.g., endpoint-e6c6b82f-...) is used for management operations like start, stop, update, and delete.
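The two identifiers are easy to tell apart: IDs carry the endpoint- prefix, while names look like model paths. A small illustrative helper (not part of the Together SDK) that captures the distinction:

```python
# Illustrative helper (not part of the Together SDK): endpoint IDs start with
# the "endpoint-" prefix, while endpoint names look like model paths.
def is_endpoint_id(value: str) -> bool:
    return value.startswith("endpoint-")

print(is_endpoint_id("endpoint-e6c6b82f-1234"))               # True  -> use for stop/delete
print(is_endpoint_id("tester/Qwen/Qwen3.5-9B-FP8-bb04c904"))  # False -> use as the model parameter
```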

Step 4: Send a request

Send a request to your endpoint, passing the name you copied in the previous step into the model parameter:
from together import Together

client = Together()  # reads your API key from the TOGETHER_API_KEY environment variable

response = client.chat.completions.create(
    model="tester/Qwen/Qwen3.5-9B-FP8-bb04c904",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
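Under the hood, the SDK call maps onto Together's OpenAI-compatible REST API. A hedged sketch of the equivalent raw request, assuming the standard https://api.together.xyz/v1/chat/completions path:

```python
import json
import os
import urllib.request

# Endpoint name copied from Step 3 (yours will differ).
ENDPOINT_NAME = "tester/Qwen/Qwen3.5-9B-FP8-bb04c904"

# The endpoint name goes in the "model" field, exactly as with the SDK.
payload = {
    "model": ENDPOINT_NAME,
    "messages": [{"role": "user", "content": "Hello!"}],
}
req = urllib.request.Request(
    "https://api.together.xyz/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}",
        "Content-Type": "application/json",
    },
)
# Uncomment to send (requires a valid TOGETHER_API_KEY and a running endpoint):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```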
Congrats! You just deployed and called your first dedicated endpoint on Together AI.

Stop the endpoint

Dedicated endpoints bill per minute as long as they’re running. Stop your endpoint when you no longer need it so you don’t accrue charges:
Shell
tg endpoints stop <endpoint_id>
Find the endpoint ID in the ID field of tg endpoints retrieve, or run tg endpoints list to see all your endpoints.

Next steps

Available models

Browse the list of available models for instant deployment.

Manage endpoints

Create, start, stop, restart, list, update, and delete dedicated endpoints via the web UI, API, or CLI.

Endpoint settings

Configure endpoint hardware, autoscaling, decoding, and prompt caching.

Upload a custom model

Upload your own model weights.