Create, update and delete endpoints via the CLI

Create

Create a new dedicated inference endpoint.

Usage

together endpoints create [MODEL] [GPU] [OPTIONS]

Example

together endpoints create \
--model mistralai/Mixtral-8x7B-Instruct-v0.1 \
--gpu h100 \
--gpu-count 2 \
--display-name "My Endpoint" \
--wait

Options

Option                             Description
--model TEXT                       (required) The model to deploy
--gpu [h100|a100|l40|l40s|rtx-6000]  (required) GPU type to use for inference
--min-replicas INTEGER             Minimum number of replicas to deploy
--max-replicas INTEGER             Maximum number of replicas to deploy
--gpu-count INTEGER                Number of GPUs to use per replica
--display-name TEXT                A human-readable name for the endpoint
--no-prompt-cache                  Disable the prompt cache for this endpoint
--no-speculative-decoding          Disable speculative decoding for this endpoint
--no-auto-start                    Create the endpoint in STOPPED state instead of auto-starting it
--wait                             Wait for the endpoint to be ready after creation
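
The options above can be combined in a small wrapper. The following is a minimal sketch, not part of the CLI: create_stopped_endpoint is a hypothetical helper name, and the sketch assumes the together CLI is installed and authenticated. It creates an endpoint with --no-auto-start so that nothing runs until you start it explicitly.

```shell
# Hypothetical helper: create an endpoint that stays in the STOPPED state.
# Arguments: MODEL GPU DISPLAY_NAME
create_stopped_endpoint() {
    if [ "$#" -ne 3 ]; then
        echo "usage: create_stopped_endpoint MODEL GPU DISPLAY_NAME" >&2
        return 2
    fi

    # --no-auto-start leaves the new endpoint stopped instead of starting it.
    together endpoints create \
        --model "$1" \
        --gpu "$2" \
        --display-name "$3" \
        --no-auto-start
}

# Example call (uncomment to run against your account):
# create_stopped_endpoint mistralai/Mixtral-8x7B-Instruct-v0.1 h100 "My Endpoint"
```

The endpoint can then be brought up later with the start command.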

Hardware

List all the hardware options, optionally filtered by model.

Usage

together endpoints hardware [OPTIONS]

Example

together endpoints hardware --model mistralai/Mixtral-8x7B-Instruct-v0.1

Options

Option        Description
--model TEXT  Filter hardware options by model
--json        Print output in JSON format
--available   Print only available hardware options (can only be used if --model is passed in)
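
Because --available only works together with --model, a wrapper can enforce that pairing before calling the CLI. This is a sketch under assumptions: list_available_hardware is a hypothetical helper name, and any extra arguments (for example --json) are passed through unchanged on the assumption that the flags can be combined.

```shell
# Hypothetical helper: show only the hardware currently available for a model.
# Arguments: MODEL [extra flags passed through, e.g. --json]
list_available_hardware() {
    if [ -z "$1" ]; then
        echo "error: a model is required when using --available" >&2
        return 2
    fi

    model="$1"
    shift

    # --available is only valid when --model is also given.
    together endpoints hardware --model "$model" --available "$@"
}

# Example call (uncomment to run against your account):
# list_available_hardware mistralai/Mixtral-8x7B-Instruct-v0.1 --json
```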

Get

Print details for a specific endpoint.

Usage

together endpoints get [OPTIONS] ENDPOINT_ID

Example

together endpoints get endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Options

Option  Description
--json  Print output in JSON format

Update

Update an existing endpoint by listing the changes followed by the endpoint ID.

You can find the endpoint ID by listing your dedicated endpoints.

Usage

together endpoints update [OPTIONS] ENDPOINT_ID

Example

together endpoints update --min-replicas 2 --max-replicas 4 endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Options

Note: --min-replicas and --max-replicas must be specified together.

Option                  Description
--display-name TEXT     A new human-readable name for the endpoint
--min-replicas INTEGER  New minimum number of replicas to maintain
--max-replicas INTEGER  New maximum number of replicas to scale up to
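
Since both replica bounds have to be passed together, a helper that always sets the pair avoids half-specified updates. The sketch below is illustrative only: scale_endpoint is a hypothetical name, and the check that the minimum does not exceed the maximum is an assumption about sensible input, not documented CLI behavior.

```shell
# Hypothetical helper: set both replica bounds in one call.
# Arguments: ENDPOINT_ID MIN_REPLICAS MAX_REPLICAS
scale_endpoint() {
    if [ "$#" -ne 3 ]; then
        echo "usage: scale_endpoint ENDPOINT_ID MIN_REPLICAS MAX_REPLICAS" >&2
        return 2
    fi

    # Reject anything that is not a non-negative integer.
    for n in "$2" "$3"; do
        case "$n" in
            ''|*[!0-9]*)
                echo "error: replica counts must be non-negative integers" >&2
                return 2
                ;;
        esac
    done

    # Assumption: the minimum should not exceed the maximum.
    if [ "$2" -gt "$3" ]; then
        echo "error: MIN_REPLICAS is greater than MAX_REPLICAS" >&2
        return 2
    fi

    together endpoints update --min-replicas "$2" --max-replicas "$3" "$1"
}

# Example call (uncomment to run against your account):
# scale_endpoint endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462 2 4
```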

Start

Start a dedicated inference endpoint.

Usage

together endpoints start [OPTIONS] ENDPOINT_ID

Example

together endpoints start endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Options

Option  Description
--wait  Wait for the endpoint to start

Stop

Stop a dedicated inference endpoint.

Usage

together endpoints stop [OPTIONS] ENDPOINT_ID

Example

together endpoints stop endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Options

Option  Description
--wait  Wait for the endpoint to stop
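
The stop and start commands can be chained to restart an endpoint. This is a minimal sketch, assuming the together CLI is installed and authenticated; restart_endpoint is a hypothetical helper name. Passing --wait to stop means start is only issued once the endpoint has actually stopped.

```shell
# Hypothetical helper: restart an endpoint by stopping and then starting it.
# Arguments: ENDPOINT_ID
restart_endpoint() {
    if [ -z "$1" ]; then
        echo "usage: restart_endpoint ENDPOINT_ID" >&2
        return 2
    fi

    # Start runs only if the stop command succeeded.
    together endpoints stop --wait "$1" &&
        together endpoints start --wait "$1"
}

# Example call (uncomment to run against your account):
# restart_endpoint endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462
```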

Delete

Delete a dedicated inference endpoint.

Usage

together endpoints delete [OPTIONS] ENDPOINT_ID

Example

together endpoints delete endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

List

List endpoints, optionally filtered by type.

Usage

together endpoints list [OPTIONS]

Example

together endpoints list --type dedicated

Options


Option                         Description
--json                         Print output in JSON format
--type [dedicated|serverless]  Filter by endpoint type
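
For scripting, it can help to validate the type filter before calling the CLI. The sketch below is illustrative: list_endpoints_json is a hypothetical helper name, and it assumes --type and --json can be used in the same call. The structure of the JSON output is not described here, so the sketch does not parse it.

```shell
# Hypothetical helper: list endpoints as JSON, optionally filtered by type.
# Arguments: [dedicated|serverless]
list_endpoints_json() {
    case "$1" in
        dedicated|serverless)
            together endpoints list --type "$1" --json
            ;;
        '')
            # No filter given: list every endpoint.
            together endpoints list --json
            ;;
        *)
            echo "error: type must be 'dedicated' or 'serverless'" >&2
            return 2
            ;;
    esac
}

# Example call (uncomment to run against your account):
# list_endpoints_json dedicated
```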


Help

See all available commands with:

together endpoints --help