Setup
See our Getting Started guide for initial setup.
Endpoint ID
Many commands require an ENDPOINT_ID to identify the endpoint to operate on. The endpoint ID is a unique identifier assigned when an endpoint is created, in the following format:
endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462
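Because the format is fixed, a script can check a value before passing it to a command. This also catches a model name or display name passed by mistake. The sketch below assumes every endpoint ID is endpoint- followed by a lowercase hexadecimal UUID, as in the example above; is_endpoint_id is a hypothetical helper name.

```shell
# Hypothetical helper: succeeds if $1 looks like an endpoint ID.
# Assumes IDs are "endpoint-" followed by a lowercase UUID (8-4-4-4-12 hex digits).
is_endpoint_id() {
  printf '%s\n' "${1:-}" |
    grep -Eq '^endpoint-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

# Example:
#   is_endpoint_id endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462   # succeeds
#   is_endpoint_id mistralai/Mixtral-8x7B-Instruct-v0.1            # fails: a model name
```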
The endpoint ID is different from the model name (e.g., mistralai/Mixtral-8x7B-Instruct-v0.1) and from the display name you set with --display-name.
How to find your endpoint ID
You can find your endpoint ID in any of the following ways:
- From the create command output: the endpoint ID is returned when you create an endpoint.
- Using the list command: run together endpoints list --mine to see all your endpoints with their IDs.
- From the web interface: the endpoint ID is shown on the endpoint details page in the Together AI console.
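When scripting against the CLI, you can pull IDs out of the list output by matching the ID pattern instead of parsing the output layout, which this page does not document. A minimal sketch, assuming the same endpoint-&lt;UUID&gt; format as above; extract_endpoint_ids is a hypothetical helper, and grep -o is a GNU/BSD extension rather than POSIX.

```shell
# Hypothetical helper: read text on stdin and print each distinct endpoint ID once.
# Matches only the ID pattern, so it does not depend on how the CLI formats its table.
extract_endpoint_ids() {
  grep -oE 'endpoint-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' | sort -u
}

# Example:
#   together endpoints list --mine | extract_endpoint_ids
```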
Create
Create a new dedicated inference endpoint.
Usage
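The sketch below wraps the create command so that the two required options are checked before anything is sent, and the optional display name is added only when given. It assumes the subcommand is together endpoints create, matching this section's name; create_endpoint and run are hypothetical helpers, and setting DRY_RUN=1 prints the command instead of running it.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: create_endpoint MODEL HARDWARE [DISPLAY_NAME]
create_endpoint() {
  local model="${1:-}" hardware="${2:-}" name="${3:-}"
  # --model and --hardware are the two required options
  if [ -z "$model" ] || [ -z "$hardware" ]; then
    echo "usage: create_endpoint MODEL HARDWARE [DISPLAY_NAME]" >&2
    return 2
  fi
  # Rebuild the positional parameters as the command line
  set -- together endpoints create --model "$model" --hardware "$hardware"
  if [ -n "$name" ]; then
    set -- "$@" --display-name "$name"
  fi
  # --wait blocks until the endpoint is ready
  run "$@" --wait
}

# Example (pick HARDWARE_ID from the output of together endpoints hardware):
#   create_endpoint mistralai/Mixtral-8x7B-Instruct-v0.1 "$HARDWARE_ID" my-endpoint
```

Options such as --min-replicas, --max-replicas, or --no-auto-start can be appended with further set -- "$@" lines in the same way.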
Options
| Options | Argument | Description |
|---|---|---|
--model | string | (required) The model to deploy |
--hardware | string | (required) GPU type to use for inference |
--min-replicas | number | Minimum number of replicas to deploy |
--max-replicas | number | Maximum number of replicas to deploy |
--display-name | string | A human-readable name for the endpoint |
--no-auto-start | | Create the endpoint in the STOPPED state instead of starting it automatically |
--no-speculative-decoding | | Disable speculative decoding for this endpoint |
--availability-zone | string | Start the endpoint in the specified availability zone (run together endpoints availability-zones to list the zones) |
--wait | | Wait for the endpoint to be ready after creation |
--json | | Print output in JSON format |
Hardware
List all hardware options, optionally filtered by model.
Usage
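Because --available can only be used together with --model, a small wrapper can make that dependency impossible to get wrong. A minimal sketch, assuming the subcommand is together endpoints hardware, matching this section's name; list_hardware and run are hypothetical helpers, and DRY_RUN=1 prints the command instead of running it.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: list_hardware [MODEL [available]]
list_hardware() {
  if [ -z "${1:-}" ]; then
    # No model: list every hardware option
    run together endpoints hardware
  elif [ "${2:-}" = available ]; then
    # --available is only passed when a model is given
    run together endpoints hardware --model "$1" --available
  else
    run together endpoints hardware --model "$1"
  fi
}

# Example:
#   list_hardware mistralai/Mixtral-8x7B-Instruct-v0.1 available
```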
Options
| Options | Argument | Description |
|---|---|---|
--model | TEXT | Filter hardware options by model |
--json | | Print output in JSON format |
--available | | Print only available hardware options (can only be used together with --model) |
Retrieve
Print details for a specific endpoint.
Usage
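A sketch of fetching one endpoint's details as JSON, rejecting anything that does not start with endpoint- so that a model name is not passed by mistake. It assumes the subcommand is named get, as in recent versions of the together CLI; confirm with together endpoints --help. get_endpoint and run are hypothetical helpers, and DRY_RUN=1 prints the command instead of running it.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: get_endpoint ENDPOINT_ID
get_endpoint() {
  case "${1:-}" in
    endpoint-*)
      # Assumed subcommand name: get
      run together endpoints get "$1" --json
      ;;
    *)
      echo "expected an endpoint ID (endpoint-...), got: ${1:-<none>}" >&2
      return 2
      ;;
  esac
}
```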
Options
| Options | Description |
|---|---|
--json | Print output in JSON format |
Update
Update an existing endpoint by passing the options you want to change, followed by the endpoint ID. You can find the endpoint ID by listing your dedicated endpoints.
Usage
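Since --min-replicas and --max-replicas must be specified together, a wrapper that takes both counts as required arguments avoids a rejected call. A minimal sketch, assuming the subcommand is together endpoints update with the options placed before the endpoint ID; update_replicas and run are hypothetical helpers, and the min-not-above-max check is an extra assumption of this sketch rather than a documented rule.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: update_replicas ENDPOINT_ID MIN MAX
update_replicas() {
  if [ $# -ne 3 ]; then
    echo "usage: update_replicas ENDPOINT_ID MIN MAX (both counts are required)" >&2
    return 2
  fi
  # Both counts must be non-negative integers
  for n in "$2" "$3"; do
    case "$n" in
      ''|*[!0-9]*) echo "replica counts must be integers, got: $n" >&2; return 2 ;;
    esac
  done
  # Assumed sanity check: the minimum should not exceed the maximum
  if [ "$2" -gt "$3" ]; then
    echo "min replicas ($2) is greater than max replicas ($3)" >&2
    return 2
  fi
  # Options first, endpoint ID last
  run together endpoints update --min-replicas "$2" --max-replicas "$3" "$1"
}
```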
Options
Note: --min-replicas and --max-replicas must be specified together.
| Options | Argument | Description |
|---|---|---|
--display-name | string | A new human-readable name for the endpoint |
--min-replicas | integer | New minimum number of replicas to maintain |
--max-replicas | integer | New maximum number of replicas to scale up to |
Start
Start a dedicated inference endpoint.
Usage
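With --wait, the command returns only after the endpoint has started, so a script can act on its exit status. A minimal sketch, assuming the subcommand is together endpoints start, that the endpoint ID is passed as a positional argument, and that the CLI exits non-zero when starting fails; start_endpoint and run are hypothetical helpers.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: start_endpoint ENDPOINT_ID
start_endpoint() {
  if [ -z "${1:-}" ]; then
    echo "usage: start_endpoint ENDPOINT_ID" >&2
    return 2
  fi
  # --wait makes the exit status reflect the outcome (assumed non-zero on failure)
  if run together endpoints start "$1" --wait; then
    echo "started: $1"
  else
    echo "failed to start: $1" >&2
    return 1
  fi
}
```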
Options
| Options | Description |
|---|---|
--wait | Wait for the endpoint to start |
Stop
Stop a dedicated inference endpoint.
Usage
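To stop several endpoints in one go, loop over their IDs and stop each one in turn. A minimal sketch, assuming the subcommand is together endpoints stop with the endpoint ID as a positional argument; stop_endpoints and run are hypothetical helpers. Arguments that do not start with endpoint- are skipped, and the function returns non-zero if any endpoint was skipped or failed to stop.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: stop_endpoints ENDPOINT_ID...
stop_endpoints() {
  local rc=0
  for id in "$@"; do
    case "$id" in
      endpoint-*)
        # --wait stops one endpoint fully before moving on to the next
        run together endpoints stop "$id" --wait || rc=1
        ;;
      *)
        echo "skipping argument that is not an endpoint ID: $id" >&2
        rc=1
        ;;
    esac
  done
  return "$rc"
}
```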
Options
| Options | Description |
|---|---|
--wait | Wait for the endpoint to stop |
Delete
Delete a dedicated inference endpoint.
Usage
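In scripts it can help to confirm before deleting. The sketch below adds its own confirmation prompt in the script, independent of whatever the CLI itself does, and assumes the subcommand is together endpoints delete with the endpoint ID as a positional argument; delete_endpoint and run are hypothetical helpers.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: delete_endpoint ENDPOINT_ID (reads y/N from stdin)
delete_endpoint() {
  if [ -z "${1:-}" ]; then
    echo "usage: delete_endpoint ENDPOINT_ID" >&2
    return 2
  fi
  # Prompt on stderr so stdout carries only the command's own output
  printf 'Delete endpoint %s? [y/N] ' "$1" >&2
  local answer=""
  read -r answer || answer=""
  case "$answer" in
    y|Y|yes) run together endpoints delete "$1" ;;
    *) echo "aborted" >&2; return 1 ;;
  esac
}
```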
List
Usage
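A sketch of listing endpoints as JSON, optionally filtered by type, with the type checked against the two allowed values before the call. It assumes the filter flag is spelled --type; confirm with together endpoints list --help. list_endpoints and run are hypothetical helpers, and DRY_RUN=1 prints the command instead of running it.

```shell
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: list_endpoints [dedicated|serverless]
list_endpoints() {
  case "${1:-}" in
    "")
      run together endpoints list --json
      ;;
    dedicated|serverless)
      # Assumed flag name: --type
      run together endpoints list --type "$1" --json
      ;;
    *)
      echo "type must be dedicated or serverless, got: $1" >&2
      return 2
      ;;
  esac
}
```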
Options
| Options | Description |
|---|---|
--json | Print output in JSON format |
--type | Filter by endpoint type: dedicated or serverless |