Documentation Index
Fetch the complete documentation index at: https://docs.together.ai/llms.txt
Use this file to discover all available pages before exploring further.
Prerequisites
Before you begin, make sure you have:
- Created an account and generated an API key.
- Set your API key as an environment variable in your terminal.
- Installed the Together CLI on your machine.
- Installed the Python or TypeScript SDK.
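For the API-key step above, a typical setup looks like the following (TOGETHER_API_KEY is the environment variable the Together SDKs read by default; substitute your own key):

```shell
# Store your Together AI API key in the environment for this terminal session.
export TOGETHER_API_KEY="your-api-key-here"
```

Add the line to your shell profile (e.g., ~/.bashrc or ~/.zshrc) to persist it across sessions.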
Step 1: Pick a model
You can deploy any model from the dedicated endpoint model catalog, or upload your own custom model. For this quickstart we'll use Qwen/Qwen3.5-9B-FP8.
Step 2: Pick your hardware
Some models can be deployed on multiple hardware types at different price points. List compatible hardware options with the CLI:
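A sketch of the listing step (the exact subcommand and flag name are assumptions, not confirmed by this page; check `tg endpoints --help` on your CLI version):

```shell
# Hypothetical: list hardware types compatible with the chosen model
tg endpoints hardware --model Qwen/Qwen3.5-9B-FP8
```

Note the hardware ID of the option you want; you'll pass it when creating the endpoint.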
Step 3: Deploy the endpoint
Create the endpoint with the tg endpoints create command, using the hardware ID output from the previous step. The --wait flag blocks until the endpoint is ready:
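A sketch of the create command (the --wait flag comes from this page; the --model and --hardware flag names are assumptions, so verify with `tg endpoints create --help`):

```shell
# Hypothetical flag names: create a dedicated endpoint and block until it is ready.
# Replace <hardware-id> with an ID from the previous step.
tg endpoints create \
  --model Qwen/Qwen3.5-9B-FP8 \
  --hardware <hardware-id> \
  --wait
```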
Once the endpoint is created, copy its name from the Name field (e.g., tester/Qwen/Qwen3.5-9B-FP8-bb04c904). The endpoint name is passed to the model parameter for API inference requests. The endpoint ID (e.g., endpoint-e6c6b82f-...) is used for management operations like start, stop, update, and delete.
Step 4: Send a request
Send a request to your endpoint, passing the name you copied in the previous step into the model parameter:
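A minimal request using the Python SDK might look like the following (the endpoint name below is the example from Step 3; substitute your own, and note it requires TOGETHER_API_KEY to be set):

```python
from together import Together

# Reads TOGETHER_API_KEY from the environment by default.
client = Together()

response = client.chat.completions.create(
    # Pass the endpoint *name* (not the endpoint ID) as the model.
    model="tester/Qwen/Qwen3.5-9B-FP8-bb04c904",
    messages=[{"role": "user", "content": "Hello! What can you do?"}],
)
print(response.choices[0].message.content)
```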
Congrats! You just deployed and called your first dedicated endpoint on Together AI.
Stop the endpoint
Dedicated endpoints bill per minute as long as they're running. Stop your endpoint when you no longer need it so you don't accrue charges:
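A sketch of the stop step (stop is listed above as a management operation that takes the endpoint ID, but the exact subcommand shape is an assumption; check `tg endpoints --help`):

```shell
# Hypothetical: stop a running endpoint by its endpoint ID
tg endpoints stop <endpoint-id>
```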
You can find the endpoint ID in the ID field of tg endpoints retrieve, or run tg endpoints list to see all your endpoints.
Next steps
Available models
Browse the list of available models for instant deployment.
Manage endpoints
Create, start, stop, restart, list, update, and delete dedicated endpoints via the web UI, API, or CLI.
Endpoint settings
Configure endpoint hardware, autoscaling, decoding, and prompt caching.
Upload a custom model
Upload your own model weights.