Create and manage dedicated endpoints for model inference.
Endpoint ID
Many commands require an ENDPOINT_ID to identify which endpoint to operate on. The endpoint ID is a unique identifier assigned at creation time, in the format endpoint-<uuid>.
For example: endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462.
The endpoint ID is different from the model name (e.g., meta-llama/Llama-3.3-70B-Instruct-Turbo) or the display name you set with --display-name.
Find your endpoint ID
To find your endpoint ID, you can:
- Run the tg endpoints create command to create an endpoint. The endpoint ID is returned in the output.
- Run the tg endpoints list command to list all your endpoints. The endpoint ID is displayed for each endpoint.
- View the endpoint details page in the Together AI console.
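
Because the endpoint ID is easy to confuse with a model name or display name, a quick format check can catch a wrong value before it is passed to a command. The sketch below is one way to do that in POSIX shell. The helper name is_endpoint_id is hypothetical, and the pattern assumes the UUID part is lowercase hexadecimal in the usual 8-4-4-4-12 grouping, as in the example ID above.

```shell
# Return 0 if $1 looks like endpoint-<uuid>, non-zero otherwise.
# Assumption: the UUID part is lowercase hex in 8-4-4-4-12 grouping.
is_endpoint_id() {
  printf '%s\n' "$1" | grep -Eq '^endpoint-[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'
}

# A well-formed ID passes the check.
is_endpoint_id "endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462" && echo "looks like an endpoint ID"

# A model name does not.
is_endpoint_id "meta-llama/Llama-3.3-70B-Instruct-Turbo" || echo "not an endpoint ID"
```

The check only confirms the shape of the string. It does not confirm that the endpoint exists on your account.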
Create
Create a new dedicated endpoint.
Parameters
| Flag | Description |
|---|---|
| --model [string] | (required) The model to deploy. |
| --hardware [string] | (required) GPU type to use for inference. Use tg endpoints hardware to discover available GPU identifiers. |
| --min-replicas [number] | Minimum number of replicas to deploy. Default: 1. |
| --max-replicas [number] | Maximum number of replicas to deploy. Default: 1. |
| --display-name [string] | A human-readable name for the endpoint. |
| --no-auto-start | Create the endpoint in STOPPED state instead of auto-starting it. |
| --no-speculative-decoding | Disable speculative decoding for this endpoint. |
| --inactive-timeout [number] | Minutes of inactivity after which the endpoint auto-stops. Set to 0 to disable. |
| --availability-zone [string] | Start the endpoint in a specific availability zone (e.g. us-central-4b). Use tg endpoints availability-zones to discover valid options. |
| --wait | Wait for the endpoint to be ready after creation. Cannot be combined with --json. |
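
To show how the flags in the table combine, here is a minimal sketch that prints a tg endpoints create command rather than running it, so it can be reviewed first. The helper name print_create_cmd is hypothetical, HARDWARE_ID is a placeholder rather than a real identifier, and the replica counts, display name, and timeout are illustrative values only.

```shell
# Print (not run) a `tg endpoints create` command built from the flags above.
# $1 = model name, $2 = hardware identifier (look one up with `tg endpoints hardware`).
# The replica counts, display name, and timeout are illustrative values.
print_create_cmd() {
  printf 'tg endpoints create --model %s --hardware %s' "$1" "$2"
  printf ' --min-replicas 1 --max-replicas 2'
  printf ' --display-name my-endpoint --inactive-timeout 30 --wait\n'
}

# HARDWARE_ID is a placeholder, not a real GPU identifier.
print_create_cmd "meta-llama/Llama-3.3-70B-Instruct-Turbo" "HARDWARE_ID"
```

Only --model and --hardware are required, so the remaining flags can be dropped. Since --wait cannot be combined with --json, leave --wait out if you want JSON output.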
--no-prompt-cache is accepted for backward compatibility but no longer has any effect.
Hardware
List all hardware options (optionally filtered by model and availability).
Parameters
| Flag | Description |
|---|---|
| --model [string] | Filter hardware that is compatible with a given model. |
| --available | Filter for only hardware that is currently available. |
Retrieve
Print details for a specific endpoint.
Update
Update the configuration of an existing endpoint.
Parameters
At least one update flag must be supplied.
| Flag | Description |
|---|---|
| --display-name [string] | New human-readable name for the endpoint. |
| --min-replicas [number] | New minimum number of replicas to maintain. |
| --max-replicas [number] | New maximum number of replicas to scale up to. |
| --inactive-timeout [number] | Minutes of inactivity after which the endpoint auto-stops. Set to 0 to disable. |
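
The rule that at least one update flag must be supplied can be enforced in a wrapper script before the CLI is called. The sketch below prints the command instead of running it. It rests on two assumptions not confirmed by this page: that the subcommand is spelled tg endpoints update, and that ENDPOINT_ID is passed as a positional argument. The helper name print_update_cmd is hypothetical.

```shell
# Print (not run) an update command from whichever settings are provided.
# Assumptions: the subcommand is `tg endpoints update` and ENDPOINT_ID is positional.
# Usage: print_update_cmd ENDPOINT_ID [MIN_REPLICAS] [MAX_REPLICAS] [INACTIVE_TIMEOUT]
# Pass an empty string to leave a setting unchanged.
print_update_cmd() {
  endpoint_id="$1"
  flags=""
  if [ -n "${2:-}" ]; then flags="$flags --min-replicas $2"; fi
  if [ -n "${3:-}" ]; then flags="$flags --max-replicas $3"; fi
  if [ -n "${4:-}" ]; then flags="$flags --inactive-timeout $4"; fi
  # Refuse to build a command that carries no update flag.
  if [ -z "$flags" ]; then
    echo "error: supply at least one update flag" >&2
    return 1
  fi
  printf 'tg endpoints update %s%s\n' "$endpoint_id" "$flags"
}

# Scale to 1-3 replicas and disable the inactivity auto-stop (timeout 0).
print_update_cmd "endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462" 1 3 0
```

An inactive timeout of 0 is treated as a supplied value, which matches the table: 0 disables auto-stop rather than meaning "unchanged".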
Start
Start a dedicated endpoint.
Parameters
| Flag | Description |
|---|---|
| --wait | Wait for the endpoint to start. |
Stop
Stop a dedicated endpoint.
Parameters
| Flag | Description |
|---|---|
| --wait | Wait for the endpoint to stop. |
Delete
Delete a dedicated endpoint.
List
List your dedicated endpoints.
Options
| Options | Description |
|---|---|
| --usage-type [on-demand \| reserved] | Filter by usage type. |
| --after [string] | Pagination cursor. |
--mine and --type are accepted for backward compatibility but no longer have any effect. tg endpoints list already returns the dedicated endpoints on your account.
Availability zones
List the availability zones you can deploy endpoints into.