Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
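A minimal sketch of building this header in Python. The helper name `auth_headers` and the environment variable `API_TOKEN` are illustrative assumptions, not part of this reference.

```python
import os


def auth_headers(token: str) -> dict[str, str]:
    """Build the Authorization header described above (hypothetical helper)."""
    if not token:
        raise ValueError("an auth token is required")
    # The value is the literal word "Bearer", one space, then the token itself.
    return {"Authorization": f"Bearer {token}"}


# Example: read the token from an environment variable (the name is an assumption).
headers = auth_headers(os.environ.get("API_TOKEN") or "example-token")
```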
Body
The model to deploy on this endpoint
"meta-llama/Llama-3-8b-chat-hf"
The hardware configuration to use for this endpoint
"1x_nvidia_a100_80gb_sxm"
Configuration for automatic scaling of the endpoint
A human-readable name for the endpoint
"My Llama3 70b endpoint"
Whether to disable the prompt cache for this endpoint
Whether to disable speculative decoding for this endpoint
The desired state of the endpoint
Available options: STARTED, STOPPED
"STARTED"
The number of minutes of inactivity after which the endpoint is automatically stopped. Set it to null, omit it, or set it to 0 to disable the automatic timeout.
60
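The body fields above can be assembled as in the following sketch. This page does not show the JSON key names, so the keys used here (`model`, `hardware`, `display_name`, `autoscaling`, `disable_prompt_cache`, `disable_speculative_decoding`, `state`, `inactive_timeout`) and the `min_replicas`/`max_replicas` shape of the autoscaling object are assumptions; check them against the live API schema before use. The sketch encodes two rules stated above: the desired state must be STARTED or STOPPED, and a null, omitted, or zero timeout all mean "no automatic timeout", so the key is only sent for a positive value.

```python
import json
from typing import Optional

# Allowed values for the desired state, as listed above.
ALLOWED_STATES = {"STARTED", "STOPPED"}


def build_endpoint_body(
    model: str,
    hardware: str,
    display_name: str,
    min_replicas: int = 1,
    max_replicas: int = 1,
    state: str = "STARTED",
    inactive_timeout: Optional[int] = None,
    disable_prompt_cache: bool = False,
    disable_speculative_decoding: bool = False,
) -> dict:
    """Assemble a request body (hypothetical helper; all JSON key names are assumptions)."""
    if state not in ALLOWED_STATES:
        raise ValueError(f"state must be one of {sorted(ALLOWED_STATES)}")
    # Basic sanity check in this sketch, not a documented API constraint.
    if min_replicas < 0 or max_replicas < min_replicas:
        raise ValueError("expected 0 <= min_replicas <= max_replicas")
    body = {
        "model": model,
        "hardware": hardware,
        "display_name": display_name,
        # Assumed shape of the autoscaling configuration object.
        "autoscaling": {"min_replicas": min_replicas, "max_replicas": max_replicas},
        "disable_prompt_cache": disable_prompt_cache,
        "disable_speculative_decoding": disable_speculative_decoding,
        "state": state,
    }
    # None and 0 both disable the timeout, so only a positive value is sent.
    if inactive_timeout:
        if inactive_timeout < 0:
            raise ValueError("inactive_timeout must be >= 0 minutes")
        body["inactive_timeout"] = inactive_timeout
    return body


# Example values taken from the field descriptions above.
body = build_endpoint_body(
    "meta-llama/Llama-3-8b-chat-hf",
    "1x_nvidia_a100_80gb_sxm",
    "My Llama3 70b endpoint",
    inactive_timeout=60,
)
print(json.dumps(body, indent=2))
```

Omitting the key rather than sending 0 is a design choice here; per the description above, either form disables the timeout.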
Response
200: Details about a dedicated endpoint deployment
The type of object
Available options: endpoint
"endpoint"
Unique identifier for the endpoint
"endpoint-d23901de-ef8f-44bf-b3e7-de9c1ca8f2d7"
System name for the endpoint
"devuser/meta-llama/Llama-3-8b-chat-hf-a32b82a1"
Human-readable name for the endpoint
"My Llama3 70b endpoint"
The model deployed on this endpoint
"meta-llama/Llama-3-8b-chat-hf"
The hardware configuration used for this endpoint
"1x_nvidia_a100_80gb_sxm"
The type of endpoint
Available options: dedicated
"dedicated"
The owner of this endpoint
"devuser"
Current state of the endpoint
Available options: PENDING, STARTING, STARTED, STOPPING, STOPPED, ERROR
"STARTED"
Configuration for automatic scaling of the endpoint
Timestamp when the endpoint was created
"2025-02-04T10:43:55.405Z"
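A sketch of consuming a 200 response in Python, using the example values above. As with the request, this page omits the JSON key names, so `object`, `id`, `name`, `display_name`, `model`, `hardware`, `type`, `owner`, `state`, and `created_at` are assumptions, and the autoscaling object is left out because its fields are not described here. The hypothetical `wait_until_started` helper polls a caller-supplied function (for example, one that re-fetches the endpoint) until the state is STARTED. Treating STOPPING, STOPPED, and ERROR as reasons to give up is a choice made for this sketch, not documented behavior.

```python
import time
from dataclasses import dataclass
from datetime import datetime
from typing import Callable

# Possible endpoint states, as listed in the response reference above.
ENDPOINT_STATES = {"PENDING", "STARTING", "STARTED", "STOPPING", "STOPPED", "ERROR"}
# States this sketch treats as "will not become STARTED without intervention".
GIVE_UP_STATES = {"STOPPING", "STOPPED", "ERROR"}


@dataclass
class Endpoint:
    id: str
    name: str
    display_name: str
    model: str
    hardware: str
    type: str
    owner: str
    state: str
    created_at: datetime


def parse_endpoint(data: dict) -> Endpoint:
    """Parse a 200 response body (the JSON key names are assumptions)."""
    if data.get("object") != "endpoint":
        raise ValueError("response is not an endpoint object")
    if data["state"] not in ENDPOINT_STATES:
        raise ValueError(f"unknown endpoint state: {data['state']}")
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    created = datetime.fromisoformat(data["created_at"].replace("Z", "+00:00"))
    return Endpoint(
        id=data["id"],
        name=data["name"],
        display_name=data["display_name"],
        model=data["model"],
        hardware=data["hardware"],
        type=data["type"],
        owner=data["owner"],
        state=data["state"],
        created_at=created,
    )


def wait_until_started(
    get_state: Callable[[], str],
    poll_seconds: float = 10.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll a caller-supplied state getter until the endpoint is STARTED (hypothetical helper)."""
    for _ in range(max_polls):
        state = get_state()
        if state == "STARTED":
            return
        if state in GIVE_UP_STATES:
            raise RuntimeError(f"endpoint reached {state} instead of STARTED")
        sleep(poll_seconds)  # PENDING or STARTING: wait, then check again
    raise TimeoutError("endpoint did not reach STARTED within max_polls")


# Example values taken from the response reference above.
sample = {
    "object": "endpoint",
    "id": "endpoint-d23901de-ef8f-44bf-b3e7-de9c1ca8f2d7",
    "name": "devuser/meta-llama/Llama-3-8b-chat-hf-a32b82a1",
    "display_name": "My Llama3 70b endpoint",
    "model": "meta-llama/Llama-3-8b-chat-hf",
    "hardware": "1x_nvidia_a100_80gb_sxm",
    "type": "dedicated",
    "owner": "devuser",
    "state": "STARTED",
    "created_at": "2025-02-04T10:43:55.405Z",
}
endpoint = parse_endpoint(sample)
```

Injecting `get_state` and `sleep` keeps the polling logic independent of any particular HTTP client and lets it be exercised without network access.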