Create
Create a new dedicated inference endpoint.
Options
Options | Description |
---|---|
--model - TEXT | (required) The model to deploy |
--gpu [h100, a100, l40, l40s, rtx-6000] | (required) GPU type to use for inference |
--min-replicas - INTEGER | Minimum number of replicas to deploy |
--max-replicas - INTEGER | Maximum number of replicas to deploy |
--gpu-count - INTEGER | Number of GPUs to use per replica |
--display-name - TEXT | A human-readable name for the endpoint |
--no-prompt-cache | Disable the prompt cache for this endpoint |
--no-speculative-decoding | Disable speculative decoding for this endpoint |
--no-auto-start | Create the endpoint in the STOPPED state instead of starting it automatically |
--wait | Wait for the endpoint to be ready after creation |
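The full command line is not shown on this page, so the following is only a dry-run sketch. `build_create_args` is a hypothetical shell function: it checks the two required options, accepts only the GPU types listed in the table, and prints the resulting option list for you to append to your actual create command. Its check that `--min-replicas` does not exceed `--max-replicas` is an assumption, not a documented rule.

```shell
# Hypothetical dry-run helper: validates the Create options documented above
# and prints them. It does not run the CLI, whose command is not shown here.
# Values containing spaces are printed unquoted.
#
# Usage: build_create_args MODEL GPU [MIN_REPLICAS MAX_REPLICAS]
build_create_args() {
  local model="$1" gpu="$2" min="${3:-}" max="${4:-}"

  # --model and --gpu are marked (required) in the options table.
  if [ -z "$model" ] || [ -z "$gpu" ]; then
    echo "error: --model and --gpu are required" >&2
    return 1
  fi

  # Accept only the GPU types listed for --gpu.
  case "$gpu" in
    h100|a100|l40|l40s|rtx-6000) ;;
    *) echo "error: unsupported GPU type: $gpu" >&2; return 1 ;;
  esac

  # Assumption, not a documented rule: reject a minimum above the maximum.
  if [ -n "$min" ] && [ -n "$max" ] && [ "$min" -gt "$max" ]; then
    echo "error: --min-replicas ($min) exceeds --max-replicas ($max)" >&2
    return 1
  fi

  local args="--model $model --gpu $gpu"
  [ -n "$min" ] && args="$args --min-replicas $min"
  [ -n "$max" ] && args="$args --max-replicas $max"
  printf '%s\n' "$args"
}
```

For example, `build_create_args my-org/my-model h100 1 2` prints `--model my-org/my-model --gpu h100 --min-replicas 1 --max-replicas 2`, where `my-org/my-model` is a placeholder model name.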
Hardware
List all the hardware options, optionally filtered by model.
Options
Options | Description |
---|---|
--model - TEXT | Filter hardware options by model |
--json | Print output in JSON format |
--available | Print only available hardware options (can only be used together with --model) |
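Because `--available` is only valid alongside `--model`, a wrapper script can reject the invalid combination before calling the CLI. A minimal sketch: `build_hardware_args` is a hypothetical helper that only validates and prints the options, since the full command is not shown on this page.

```shell
# Hypothetical dry-run helper for the Hardware options: enforces that
# --available is only used together with --model, then prints the options.
#
# Usage: build_hardware_args [--model MODEL] [--available] [--json]
build_hardware_args() {
  local model="" available=0 json=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --model)
        if [ "$#" -lt 2 ]; then echo "error: --model needs a value" >&2; return 1; fi
        model="$2"; shift 2 ;;
      --available) available=1; shift ;;
      --json) json=1; shift ;;
      *) echo "error: unknown option: $1" >&2; return 1 ;;
    esac
  done

  # Documented constraint: --available requires a model.
  if [ "$available" -eq 1 ] && [ -z "$model" ]; then
    echo "error: --available can only be used together with --model" >&2
    return 1
  fi

  local args=""
  [ -n "$model" ] && args="$args --model $model"
  [ "$available" -eq 1 ] && args="$args --available"
  [ "$json" -eq 1 ] && args="$args --json"
  # Strip the leading space before printing.
  printf '%s\n' "${args# }"
}
```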
Get
Print details for a specific endpoint.
Options
Options | Description |
---|---|
--json | Print output in JSON format |
Update
Update an existing endpoint by passing the options to change, followed by the endpoint ID. You can find the endpoint ID by listing your dedicated endpoints.
Options
Note: Both --min-replicas and --max-replicas must be specified together.
Options | Description |
---|---|
--display-name - TEXT | A new human-readable name for the endpoint |
--min-replicas - INTEGER | New minimum number of replicas to maintain |
--max-replicas - INTEGER | New maximum number of replicas to scale up to |
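A wrapper script can enforce the rule that the two replica bounds are passed together, and keep the endpoint ID after the options as described above. A minimal dry-run sketch: `build_update_args` and the ID `my-endpoint-id` are hypothetical, and the function prints arguments rather than running anything.

```shell
# Hypothetical dry-run helper for the Update options. Requires an endpoint ID,
# enforces that --min-replicas and --max-replicas appear together, and prints
# the options followed by the ID. Values with spaces are printed unquoted.
#
# Usage: build_update_args [--display-name NAME] [--min-replicas N --max-replicas M] ENDPOINT_ID
build_update_args() {
  local endpoint_id="" display_name="" min="" max=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --display-name|--min-replicas|--max-replicas)
        if [ "$#" -lt 2 ]; then echo "error: $1 needs a value" >&2; return 1; fi
        case "$1" in
          --display-name) display_name="$2" ;;
          --min-replicas) min="$2" ;;
          --max-replicas) max="$2" ;;
        esac
        shift 2 ;;
      -*) echo "error: unknown option: $1" >&2; return 1 ;;
      *) endpoint_id="$1"; shift ;;
    esac
  done

  if [ -z "$endpoint_id" ]; then
    echo "error: an endpoint ID is required" >&2
    return 1
  fi

  # Documented rule: the replica bounds must be specified together.
  if { [ -n "$min" ] && [ -z "$max" ]; } || { [ -z "$min" ] && [ -n "$max" ]; }; then
    echo "error: --min-replicas and --max-replicas must be specified together" >&2
    return 1
  fi

  local args=""
  [ -n "$display_name" ] && args="$args --display-name $display_name"
  [ -n "$min" ] && args="$args --min-replicas $min --max-replicas $max"
  # Options first, endpoint ID last.
  args="$args $endpoint_id"
  printf '%s\n' "${args# }"
}
```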
Start
Start a dedicated inference endpoint.
Options
Options | Description |
---|---|
--wait | Wait for the endpoint to start |
Stop
Stop a dedicated inference endpoint.
Options
Options | Description |
---|---|
--wait | Wait for the endpoint to stop |
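Both Start and Stop accept `--wait`, which blocks until the endpoint has started or stopped. If a script instead needs its own timeout, one option is a bounded polling loop like the sketch below; this page does not describe how the CLI implements `--wait`, so this is not that mechanism. `wait_for_state` and `get_endpoint_state` are hypothetical. The stub reports the made-up state `PENDING` twice and then `STOPPED`, standing in for a real lookup (for example, one built on the Get command's `--json` output).

```shell
# Hypothetical polling helper: calls get_endpoint_state until it prints
# EXPECTED or MAX_ATTEMPTS lookups have been made, sleeping DELAY seconds
# between lookups.
#
# Usage: wait_for_state EXPECTED MAX_ATTEMPTS DELAY_SECONDS
wait_for_state() {
  local expected="$1" max_attempts="$2" delay="$3" attempt=1 state=""
  while [ "$attempt" -le "$max_attempts" ]; do
    state=$(get_endpoint_state)
    if [ "$state" = "$expected" ]; then
      echo "reached $expected after $attempt attempt(s)"
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $expected (last state: $state)" >&2
  return 1
}

# Hypothetical stub replacing a real status lookup. The call count lives in a
# temp file because $(...) runs the stub in a subshell. PENDING is a made-up
# state name; only STOPPED appears on this page.
STUB_COUNTER_FILE=$(mktemp)
trap 'rm -f "$STUB_COUNTER_FILE"' EXIT
echo 0 > "$STUB_COUNTER_FILE"
get_endpoint_state() {
  local n
  n=$(( $(cat "$STUB_COUNTER_FILE") + 1 ))
  echo "$n" > "$STUB_COUNTER_FILE"
  if [ "$n" -ge 3 ]; then echo STOPPED; else echo PENDING; fi
}
```

With the stub above and a fresh counter, `wait_for_state STOPPED 5 0` prints `reached STOPPED after 3 attempt(s)`, while a budget of two attempts times out.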
Delete
Delete a dedicated inference endpoint.
List
List your endpoints, optionally filtered by endpoint type.
Options
Options | Description |
---|---|
--json | Print output in JSON format |
--type [dedicated, serverless] | Filter by endpoint type |
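A small dry-run sketch, assuming the endpoint-type filter is passed as `--type` with one of the two documented values. `build_list_args` is hypothetical and only validates and prints the options; it does not run the CLI.

```shell
# Hypothetical dry-run helper for the List options. Assumes the filter is
# spelled --type and accepts only the two documented endpoint types.
#
# Usage: build_list_args [--type dedicated|serverless] [--json]
build_list_args() {
  local endpoint_type="" json=0
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --type)
        if [ "$#" -lt 2 ]; then echo "error: --type needs a value" >&2; return 1; fi
        endpoint_type="$2"; shift 2 ;;
      --json) json=1; shift ;;
      *) echo "error: unknown option: $1" >&2; return 1 ;;
    esac
  done

  # An empty type means "no filter"; anything else must be a documented type.
  case "$endpoint_type" in
    ""|dedicated|serverless) ;;
    *) echo "error: --type must be dedicated or serverless" >&2; return 1 ;;
  esac

  local args=""
  [ -n "$endpoint_type" ] && args="$args --type $endpoint_type"
  [ "$json" -eq 1 ] && args="$args --json"
  # Strip the leading space before printing.
  printf '%s\n' "${args# }"
}
```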
Help
See all commands with: