
Setup

See our Getting Started guide for initial setup.

Endpoint ID

Many commands require an ENDPOINT_ID to identify which endpoint to operate on. The endpoint ID is a unique identifier assigned when an endpoint is created, in the format:
endpoint-<uuid>
For example: endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462
The endpoint ID is different from the model name (e.g., mistralai/Mixtral-8x7B-Instruct-v0.1) or the display name you set with --display-name.
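Because the ID has a fixed shape, a script can check a value before passing it to a command. The sketch below is a hypothetical helper (is_endpoint_id is not part of the CLI), and it assumes the <uuid> part uses the canonical 8-4-4-4-12 hexadecimal layout seen in the example above.

```shell
# Hypothetical helper, not part of the together CLI.
# Succeeds (exit status 0) if $1 looks like endpoint-<uuid>.
# Assumption: <uuid> is the canonical 8-4-4-4-12 hexadecimal form.
is_endpoint_id() {
  printf '%s\n' "${1:-}" |
    grep -Eq '^endpoint-[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Usage:
#   is_endpoint_id "$ENDPOINT_ID" || echo "not an endpoint ID" >&2
```

A model name or a display name fails this check, which catches the most common mix-up early.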

How to find your endpoint ID

You can find your endpoint ID in the following ways:
  1. From the create command output: The endpoint ID is returned when you create an endpoint.
  2. Using the list command: Run together endpoints list --mine to see all your endpoints with their IDs.
  3. From the web interface: The endpoint ID is shown in the endpoint details page on the Together AI console.
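For scripting, the ID can also be pulled out of command output by pattern. This is a minimal sketch that assumes only that the output contains IDs in the endpoint-<uuid> form; the exact layout of the list output is not assumed, and extract_endpoint_ids is a hypothetical helper name.

```shell
# Hypothetical helper: print every endpoint-<uuid> found on stdin, one per line.
# It scans for the ID pattern only and makes no assumption about the
# surrounding output layout. Needs a grep that supports -o (GNU and BSD do).
extract_endpoint_ids() {
  grep -Eo 'endpoint-[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}'
}

# Usage:
#   together endpoints list --mine | extract_endpoint_ids
```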

Create

Create a new dedicated inference endpoint.

Usage

together endpoints create \
  --model mistralai/Mixtral-8x7B-Instruct-v0.1 \
  --hardware 4x_nvidia_h100_80gb_sxm \
  --display-name "My Endpoint" \
  --wait

Options

| Options | Argument | Description |
| --- | --- | --- |
| --model | string | (required) The model to deploy |
| --hardware | string | (required) GPU type to use for inference |
| --min-replicas | number | Minimum number of replicas to deploy |
| --max-replicas | number | Maximum number of replicas to deploy |
| --display-name | string | A human-readable name for the endpoint |
| --no-auto-start | | Create the endpoint in STOPPED state instead of auto-starting it |
| --no-speculative-decoding | | Disable speculative decoding for this endpoint |
| --availability-zone | together endpoints availability-zones | Start endpoint in specified availability zone |
| --wait | | Wait for the endpoint to be ready after creation |
| --json | | Outputs in JSON |
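As an illustration of combining the optional flags with the required --model and --hardware, the sketch below wraps a create call that sets a replica range and leaves the endpoint stopped. The function name and the replica values are hypothetical; only flags listed above are used.

```shell
# Hypothetical wrapper: create an endpoint without starting it.
#   $1 = model, $2 = hardware, $3 = display name
# The replica range (1 to 2) is an arbitrary example value.
create_stopped_endpoint() {
  together endpoints create \
    --model "$1" \
    --hardware "$2" \
    --display-name "$3" \
    --min-replicas 1 \
    --max-replicas 2 \
    --no-auto-start
}

# Usage:
#   create_stopped_endpoint mistralai/Mixtral-8x7B-Instruct-v0.1 \
#     4x_nvidia_h100_80gb_sxm "My Endpoint"
```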

Hardware

List all the hardware options, optionally filtered by model.

Usage

together endpoints hardware [OPTIONS]

Options

| Options | Argument | Description |
| --- | --- | --- |
| --model | TEXT | Filter hardware options by model |
| --json | | Print output in JSON format |
| --available | | Print only available hardware options (can only be used if model is passed in) |
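Since --available is only valid together with --model, a script can enforce that pairing before calling the CLI. This is a minimal sketch; available_hardware is a hypothetical helper name.

```shell
# Hypothetical helper: list hardware currently available for a model.
# --available requires --model, so refuse to run without a model name.
available_hardware() {
  if [ -z "${1:-}" ]; then
    echo "usage: available_hardware MODEL" >&2
    return 2
  fi
  together endpoints hardware --model "$1" --available
}

# Usage:
#   available_hardware mistralai/Mixtral-8x7B-Instruct-v0.1
```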

Retrieve

Print details for a specific endpoint.

Usage

together endpoints retrieve endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Options

| Options | Description |
| --- | --- |
| --json | Print output in JSON format |

Update

Update an existing endpoint by passing the options to change, followed by the endpoint ID. To find the endpoint ID, list your endpoints with together endpoints list --mine.

Usage

together endpoints update --min-replicas 2 --max-replicas 4 endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Options

Note: Both --min-replicas and --max-replicas must be specified together.

| Options | Argument | Description |
| --- | --- | --- |
| --display-name | TEXT | A new human-readable name for the endpoint |
| --min-replicas | INTEGER | New minimum number of replicas to maintain |
| --max-replicas | INTEGER | New maximum number of replicas to scale up to |
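Because the two replica flags must be passed together, a small wrapper that always sends both avoids an incomplete update. The sketch below is hypothetical (scale_endpoint is not a CLI command). Its check that the minimum does not exceed the maximum is an added assumption, not something this page states.

```shell
# Hypothetical wrapper: set both replica bounds in one update call.
#   $1 = endpoint ID, $2 = min replicas, $3 = max replicas
scale_endpoint() {
  if [ "$#" -ne 3 ]; then
    echo "usage: scale_endpoint ENDPOINT_ID MIN MAX" >&2
    return 2
  fi
  # Reject anything that is not a non-negative integer.
  for n in "$2" "$3"; do
    case "$n" in
      ''|*[!0-9]*)
        echo "MIN and MAX must be non-negative integers" >&2
        return 2 ;;
    esac
  done
  # Assumed sanity check (not stated in the docs): min must not exceed max.
  if [ "$2" -gt "$3" ]; then
    echo "min replicas ($2) exceeds max replicas ($3)" >&2
    return 2
  fi
  together endpoints update --min-replicas "$2" --max-replicas "$3" "$1"
}

# Usage:
#   scale_endpoint endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462 2 4
```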

Start

Start a dedicated inference endpoint.

Usage

together endpoints start endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Options

| Options | Description |
| --- | --- |
| --wait | Wait for the endpoint to start |

Stop

Stop a dedicated inference endpoint.

Usage

together endpoints stop endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Options

| Options | Description |
| --- | --- |
| --wait | Wait for the endpoint to stop |
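Stop and start can be chained to restart an endpoint, with --wait on the stop call so the start is not issued while the shutdown is still in progress. This is a minimal sketch with a hypothetical restart_endpoint helper; placing --wait before the endpoint ID mirrors the update usage and is an assumption about argument order.

```shell
# Hypothetical helper: stop an endpoint, wait for it, then start it again.
#   $1 = endpoint ID
restart_endpoint() {
  # Only start if the stop (including the wait) succeeded.
  together endpoints stop --wait "$1" &&
    together endpoints start --wait "$1"
}

# Usage:
#   restart_endpoint endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462
```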

Delete

Delete a dedicated inference endpoint.

Usage

together endpoints delete endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462
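Since the command takes an endpoint ID rather than a model name or display name, a script may want to guard the call. The sketch below uses a hypothetical delete_endpoint wrapper that only checks for the endpoint- prefix; it does not verify that the endpoint exists.

```shell
# Hypothetical wrapper: refuse to call delete unless $1 has the endpoint- prefix.
delete_endpoint() {
  case "${1:-}" in
    endpoint-?*)
      together endpoints delete "$1" ;;
    *)
      echo "refusing to delete: '${1:-}' is not an endpoint ID" >&2
      return 2 ;;
  esac
}

# Usage:
#   delete_endpoint endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462
```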

List

List endpoints, optionally filtered by type.

Usage

together endpoints list --type dedicated

Options

| Options | Description |
| --- | --- |
| --json | Print output in JSON format |
| --type [dedicated \| serverless] | Filter by endpoint type |
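The type filter accepts only the two values shown. A wrapper can reject anything else before invoking the CLI, as in this sketch; list_endpoints is a hypothetical name.

```shell
# Hypothetical wrapper: list endpoints of one type, validating the type first.
#   $1 = dedicated | serverless
list_endpoints() {
  case "${1:-}" in
    dedicated|serverless)
      together endpoints list --type "$1" ;;
    *)
      echo "usage: list_endpoints dedicated|serverless" >&2
      return 2 ;;
  esac
}

# Usage:
#   list_endpoints dedicated
```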