POST /endpoints
from together import Together
import os

# The SDK reads the API key from the TOGETHER_API_KEY environment variable
client = Together(
    api_key=os.environ.get("TOGETHER_API_KEY"),
)

# Deploy a dedicated endpoint for the model on a single A100,
# autoscaling between 2 and 5 replicas
endpoint = client.endpoints.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    hardware="1x_nvidia_a100_80gb_sxm",
    min_replicas=2,
    max_replicas=5,
)

print(endpoint.id)
Example response:

{
  "object": "endpoint",
  "id": "endpoint-d23901de-ef8f-44bf-b3e7-de9c1ca8f2d7",
  "name": "devuser/meta-llama/Llama-3-8b-chat-hf-a32b82a1",
  "display_name": "My Llama3 8b endpoint",
  "model": "meta-llama/Llama-3-8b-chat-hf",
  "hardware": "1x_nvidia_a100_80gb_sxm",
  "type": "dedicated",
  "owner": "devuser",
  "state": "STARTED",
  "autoscaling": {
    "min_replicas": 123,
    "max_replicas": 123
  },
  "created_at": "2025-02-04T10:43:55.405Z"
}

Authorizations

Authorization
string
header
default:default
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
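For calls made without the SDK, the header can be built by hand. A minimal sketch (the helper name and the `Content-Type` header are my additions, not part of this reference):

```python
import os

def build_auth_headers(token: str) -> dict:
    """Build the Bearer authentication header of the form "Bearer <token>"."""
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }

headers = build_auth_headers(os.environ.get("TOGETHER_API_KEY", "<token>"))
print(headers["Authorization"])
```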

Body

application/json
model
string
required

The model to deploy on this endpoint

hardware
string
required

The hardware configuration to use for this endpoint

autoscaling
object
required

Configuration for automatic scaling of the endpoint
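The fields of this object are not expanded here, but the response example below shows it carrying `min_replicas` and `max_replicas`. A sketch of a sanity check for an object of that shape (the validation rules are my assumptions, not documented constraints):

```python
def validate_autoscaling(autoscaling: dict) -> None:
    """Sanity-check an autoscaling config of the shape seen in the
    response example: {"min_replicas": ..., "max_replicas": ...}."""
    lo = autoscaling["min_replicas"]
    hi = autoscaling["max_replicas"]
    if not (isinstance(lo, int) and isinstance(hi, int)):
        raise TypeError("replica counts must be integers")
    if lo < 1 or hi < lo:
        raise ValueError("expected 1 <= min_replicas <= max_replicas")

# A config like the one used in the SDK example above passes
validate_autoscaling({"min_replicas": 2, "max_replicas": 5})
```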

display_name
string

A human-readable name for the endpoint

disable_prompt_cache
boolean
default:false

Whether to disable the prompt cache for this endpoint

disable_speculative_decoding
boolean
default:false

Whether to disable speculative decoding for this endpoint

state
enum<string>
default:STARTED

The desired state of the endpoint

Available options:
STARTED,
STOPPED
Example:

"STARTED"

inactive_timeout
integer | null

The number of minutes of inactivity after which the endpoint is automatically stopped. Set it to null, set it to 0, or omit it to disable the automatic timeout.

Example:

60
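Since null, 0, and an omitted field all disable the timeout, a small normalizer (my own helper, not part of the API) makes the three cases explicit:

```python
def inactive_timeout_minutes(body: dict):
    """Return the effective inactivity timeout in minutes, or None when
    disabled. null (None), 0, and an omitted field all mean "never auto-stop"."""
    value = body.get("inactive_timeout")
    return value if value else None

print(inactive_timeout_minutes({"inactive_timeout": 60}))  # 60
print(inactive_timeout_minutes({"inactive_timeout": 0}))   # None
print(inactive_timeout_minutes({}))                        # None
```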

availability_zone
string

Create the endpoint in a specified availability zone (e.g., us-central-4b)
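Putting the body parameters together, a plausible JSON payload looks like the following. Field values are illustrative, and the shape of the `autoscaling` object is inferred from the response example rather than stated in this reference:

```python
import json

request_body = {
    # Required fields
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    "hardware": "1x_nvidia_a100_80gb_sxm",
    "autoscaling": {"min_replicas": 2, "max_replicas": 5},
    # Optional fields
    "display_name": "my-llama-endpoint",
    "disable_prompt_cache": False,
    "disable_speculative_decoding": False,
    "state": "STARTED",
    "inactive_timeout": 60,
    "availability_zone": "us-central-4b",
}

payload = json.dumps(request_body)
print(payload)
```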

Response

200

Details about a dedicated endpoint deployment

object
enum<string>
required

The type of object

Available options:
endpoint
Example:

"endpoint"

id
string
required

Unique identifier for the endpoint

Example:

"endpoint-d23901de-ef8f-44bf-b3e7-de9c1ca8f2d7"

name
string
required

System name for the endpoint

Example:

"devuser/meta-llama/Llama-3-8b-chat-hf-a32b82a1"

display_name
string
required

Human-readable name for the endpoint

Example:

"My Llama3 8b endpoint"

model
string
required

The model deployed on this endpoint

Example:

"meta-llama/Llama-3-8b-chat-hf"

hardware
string
required

The hardware configuration used for this endpoint

Example:

"1x_nvidia_a100_80gb_sxm"

type
enum<string>
required

The type of endpoint

Available options:
dedicated
Example:

"dedicated"

owner
string
required

The owner of this endpoint

Example:

"devuser"

state
enum<string>
required

Current state of the endpoint

Available options:
PENDING,
STARTING,
STARTED,
STOPPING,
STOPPED,
ERROR
Example:

"STARTED"

autoscaling
object
required

Configuration for automatic scaling of the endpoint

created_at
string<date-time>
required

Timestamp when the endpoint was created

Example:

"2025-02-04T10:43:55.405Z"
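A newly created endpoint typically passes through the transitional states before reaching STARTED, so callers often poll until the state settles. A hedged sketch: the `client.endpoints.get(...)` call and its return shape are assumptions modeled on the SDK example above, and `is_settled` is my helper, not part of the API:

```python
import time

# States from which the endpoint will not progress on its own,
# per the state enum in the response schema above
SETTLED_STATES = {"STARTED", "STOPPED", "ERROR"}

def is_settled(state: str) -> bool:
    """True when the endpoint has reached a stable (non-transitional) state."""
    return state in SETTLED_STATES

def wait_until_settled(client, endpoint_id: str, poll_seconds: float = 10.0) -> str:
    """Poll the endpoint until it leaves PENDING/STARTING/STOPPING.

    Assumes client.endpoints.get(endpoint_id) returns an object with a
    .state attribute mirroring the response schema above."""
    while True:
        state = client.endpoints.get(endpoint_id).state
        if is_settled(state):
            return state
        time.sleep(poll_seconds)

print(is_settled("STARTING"), is_settled("STARTED"))
```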