Create and manage dedicated endpoints for model inference.

Endpoint ID

Many commands require an ENDPOINT_ID to identify which endpoint to operate on. The endpoint ID is a unique identifier assigned at creation time, in the format endpoint-<uuid>. For example: endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462.
The endpoint ID is different from the model name (e.g., meta-llama/Llama-3.3-70B-Instruct-Turbo) or the display name you set with --display-name.
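Because the ID has a fixed shape, a script can check an argument before handing it to a command, which catches the common mistake of passing a model name or display name instead. The sketch below assumes the <uuid> part is a standard 8-4-4-4-12 hexadecimal UUID, as in the example above. is_endpoint_id is a hypothetical helper name, not part of the tg CLI.

```shell
# Hypothetical helper (not part of the tg CLI).
# Succeeds if $1 looks like endpoint-<uuid>.
# Assumption: <uuid> is a standard 8-4-4-4-12 hexadecimal UUID.
is_endpoint_id() {
  printf '%s\n' "$1" |
    grep -Eq '^endpoint-[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# An endpoint ID passes; a model name does not.
is_endpoint_id "endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462" && echo "looks like an endpoint ID"
is_endpoint_id "meta-llama/Llama-3.3-70B-Instruct-Turbo" || echo "not an endpoint ID"
```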

Find your endpoint ID

To find your endpoint ID, you can:
  • Run the tg endpoints create command to create an endpoint. The endpoint ID is returned in the output.
  • Run the tg endpoints list command to list all your endpoints. The endpoint ID is displayed for each endpoint.
  • View the endpoint details page in the Together AI console.
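For scripting, the IDs can be pulled out of the list output with a pattern match. This is a minimal sketch that assumes only that each ID is printed literally as endpoint-<uuid>; it makes no other assumption about how tg endpoints list lays out its output. list_endpoint_ids is a hypothetical helper name.

```shell
# Hypothetical helper (not part of the tg CLI).
# Prints each distinct endpoint ID found in the output of `tg endpoints list`.
# Assumption: IDs appear literally as endpoint-<uuid> somewhere in the output.
list_endpoint_ids() {
  tg endpoints list |
    grep -Eo 'endpoint-[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}' |
    sort -u
}
```

Calling list_endpoint_ids prints one ID per line, which is convenient for piping into a loop.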

Create

Create a new dedicated endpoint.
Shell
tg endpoints create \
  --model meta-llama/Llama-3.3-70B-Instruct-Turbo \
  --hardware 4x_nvidia_h100_80gb_sxm \
  --display-name "My Endpoint" \
  --wait

Parameters

| Flag | Description |
| --- | --- |
| --model [string] | (required) The model to deploy. |
| --hardware [string] | (required) GPU type to use for inference. Use tg endpoints hardware to discover available GPU identifiers. |
| --min-replicas [number] | Minimum number of replicas to deploy. Default: 1. |
| --max-replicas [number] | Maximum number of replicas to deploy. Default: 1. |
| --display-name [string] | A human-readable name for the endpoint. |
| --no-auto-start | Create the endpoint in STOPPED state instead of auto-starting it. |
| --no-speculative-decoding | Disable speculative decoding for this endpoint. |
| --inactive-timeout [number] | Minutes of inactivity after which the endpoint auto-stops. Set to 0 to disable. |
| --availability-zone [string] | Start the endpoint in a specific availability zone (e.g. us-central-4b). Use tg endpoints availability-zones to discover valid options. |
| --wait | Wait for the endpoint to be ready after creation. Cannot be combined with --json. |

--no-prompt-cache is accepted for backward compatibility but no longer has any effect.
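Since --wait cannot be combined with --json, a wrapper can reject that combination before calling the CLI. The following is a sketch under that one documented constraint; create_endpoint is a hypothetical helper name, and the exit code 2 is an arbitrary choice for illustration.

```shell
# Hypothetical wrapper (not part of the tg CLI).
# Refuses to run when both --wait and --json are passed, then forwards
# all arguments unchanged to `tg endpoints create`.
create_endpoint() {
  has_wait=0
  has_json=0
  for arg in "$@"; do
    case "$arg" in
      --wait) has_wait=1 ;;
      --json) has_json=1 ;;
    esac
  done
  if [ "$has_wait" -eq 1 ] && [ "$has_json" -eq 1 ]; then
    echo "error: --wait cannot be combined with --json" >&2
    return 2
  fi
  tg endpoints create "$@"
}
```

It takes the same flags as the command it wraps, for example create_endpoint --model meta-llama/Llama-3.3-70B-Instruct-Turbo --hardware 4x_nvidia_h100_80gb_sxm --no-auto-start.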

Hardware

List all hardware options (optionally filtered by model and availability).
tg endpoints hardware

Parameters

| Flag | Description |
| --- | --- |
| --model [string] | Filter hardware that is compatible with a given model. |
| --available | Filter for only hardware that is currently available. |
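Before creating an endpoint, it can help to narrow the hardware list to options that fit the model and are available now. The sketch below assumes the two filters can be combined in one call; available_hardware_for is a hypothetical helper name.

```shell
# Hypothetical helper (not part of the tg CLI).
# Lists hardware compatible with the model in $1 that is currently available.
# Assumption: --model and --available can be used together.
available_hardware_for() {
  tg endpoints hardware --model "$1" --available
}
```

For example, available_hardware_for meta-llama/Llama-3.3-70B-Instruct-Turbo would show candidate values for the --hardware flag of tg endpoints create.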

Retrieve

Print details for a specific endpoint.
Shell
tg endpoints retrieve endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Update

Update the configuration of an existing endpoint.
Shell
tg endpoints update endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462 \
  --min-replicas 2 \
  --max-replicas 4

Parameters

At least one update flag must be supplied.

| Flag | Description |
| --- | --- |
| --display-name [string] | New human-readable name for the endpoint. |
| --min-replicas [number] | New minimum number of replicas to maintain. |
| --max-replicas [number] | New maximum number of replicas to scale up to. |
| --inactive-timeout [number] | Minutes of inactivity after which the endpoint auto-stops. Set to 0 to disable. |
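The "at least one update flag" rule can be checked in a script before the CLI is invoked. This is a minimal sketch; update_endpoint is a hypothetical helper name, and it only checks that something follows the endpoint ID, not that the flag is valid.

```shell
# Hypothetical wrapper (not part of the tg CLI).
# Requires an endpoint ID plus at least one further argument, then forwards
# everything to `tg endpoints update`.
update_endpoint() {
  if [ "$#" -lt 2 ]; then
    echo "usage: update_endpoint ENDPOINT_ID FLAG [VALUE]..." >&2
    return 2
  fi
  tg endpoints update "$@"
}
```

For example, update_endpoint endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462 --inactive-timeout 0 would disable the auto-stop timer.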

Start

Start a dedicated endpoint.
Shell
tg endpoints start endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Parameters

| Flag | Description |
| --- | --- |
| --wait | Wait for the endpoint to start. |

Stop

Stop a dedicated endpoint.
Shell
tg endpoints stop endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Parameters

| Flag | Description |
| --- | --- |
| --wait | Wait for the endpoint to stop. |
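The stop and start commands can be chained into a restart. The sketch below uses --wait on both so that the start is only issued once the stop has finished. restart_endpoint is a hypothetical helper name, and placing --wait after the endpoint ID is an assumption based on the flag placement in the update example.

```shell
# Hypothetical helper (not part of the tg CLI).
# Stops the endpoint in $1, waits for it to stop, then starts it and waits again.
# The start is skipped if the stop command fails.
restart_endpoint() {
  tg endpoints stop "$1" --wait && tg endpoints start "$1" --wait
}
```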

Delete

Delete a dedicated endpoint.
Shell
tg endpoints delete endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

List

List your dedicated endpoints.
Shell
tg endpoints list

Options

| Option | Description |
| --- | --- |
| --usage-type [on-demand \| reserved] | Filter by usage type. |
| --after [string] | Pagination cursor. |

--mine and --type are accepted for backward compatibility but no longer have any effect. tg endpoints list already returns the dedicated endpoints on your account.
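Since --usage-type accepts only two values, a script can validate the value before calling the CLI. This is a sketch; list_by_usage_type is a hypothetical helper name.

```shell
# Hypothetical helper (not part of the tg CLI).
# Lists endpoints filtered by usage type, accepting only the two documented values.
list_by_usage_type() {
  case "$1" in
    on-demand|reserved)
      tg endpoints list --usage-type "$1"
      ;;
    *)
      echo "error: usage type must be on-demand or reserved" >&2
      return 2
      ;;
  esac
}
```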

Availability zones

List the availability zones you can deploy endpoints into.
Shell
tg endpoints availability-zones