POST /endpoints
import os

from together import Together

client = Together(
    api_key=os.environ.get("TOGETHER_API_KEY"),
)

# Create a dedicated endpoint; the replica bounds drive autoscaling.
endpoint = client.endpoints.create(
    model="meta-llama/Llama-3-8b-chat-hf",
    hardware="1x_nvidia_a100_80gb_sxm",
    min_replicas=2,
    max_replicas=5,
)

print(endpoint.id)

Example response:
{
  "object": "endpoint",
  "id": "endpoint-d23901de-ef8f-44bf-b3e7-de9c1ca8f2d7",
  "name": "devuser/meta-llama/Llama-3-8b-chat-hf-a32b82a1",
  "display_name": "My Llama3 70b endpoint",
  "model": "meta-llama/Llama-3-8b-chat-hf",
  "hardware": "1x_nvidia_a100_80gb_sxm",
  "type": "dedicated",
  "owner": "devuser",
  "state": "STARTED",
  "autoscaling": {
    "min_replicas": 2,
    "max_replicas": 5
  },
  "created_at": "2025-02-04T10:43:55.405Z"
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
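
For raw HTTP clients, the same call is made by placing the token in the Authorization header. Below is a minimal sketch using the requests library; the base URL https://api.together.xyz/v1 is an assumption of this example and is not stated on this page.

import os

import requests

url = "https://api.together.xyz/v1/endpoints"  # assumed base URL

headers = {
    # Bearer authentication header of the form "Bearer <token>".
    "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
    "Content-Type": "application/json",
}

payload = {
    "model": "meta-llama/Llama-3-8b-chat-hf",
    "hardware": "1x_nvidia_a100_80gb_sxm",
    # Unlike the SDK example above, the raw body nests the replica
    # bounds under the required "autoscaling" object.
    "autoscaling": {"min_replicas": 2, "max_replicas": 5},
}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json()["id"])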

Body

application/json
model
string
required

The model to deploy on this endpoint

Examples:

"meta-llama/Llama-3-8b-chat-hf"

hardware
string
required

The hardware configuration to use for this endpoint

Examples:

"1x_nvidia_a100_80gb_sxm"

autoscaling
object
required

Configuration for automatic scaling of the endpoint, containing the min_replicas and max_replicas bounds (see the request-body sketch after this parameter list)

display_name
string

A human-readable name for the endpoint

Examples:

"My Llama3 70b endpoint"

disable_prompt_cache
boolean
default:false

Whether to disable the prompt cache for this endpoint

disable_speculative_decoding
boolean
default:false

Whether to disable speculative decoding for this endpoint

state
enum<string>
default:STARTED

The desired state of the endpoint

Available options:
STARTED,
STOPPED
Example:

"STARTED"

inactive_timeout
integer | null

The number of minutes of inactivity after which the endpoint is automatically stopped. Set to null, omit, or set to 0 to disable the automatic timeout.

Example:

60
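
Putting the required and optional parameters together, a full request body might look like the sketch below. The values are illustrative; only the field names, defaults, and enum options come from the schema above.

# Illustrative request body covering the parameters above.
payload = {
    "model": "meta-llama/Llama-3-8b-chat-hf",   # required
    "hardware": "1x_nvidia_a100_80gb_sxm",      # required
    "autoscaling": {                            # required
        "min_replicas": 2,
        "max_replicas": 5,
    },
    "display_name": "My Llama3 70b endpoint",   # optional
    "disable_prompt_cache": False,              # default: false
    "disable_speculative_decoding": False,      # default: false
    "state": "STARTED",                         # or "STOPPED"
    "inactive_timeout": 60,                     # minutes; null or 0 disables
}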

Response

200

Details about a dedicated endpoint deployment

object
enum<string>
required

The type of object

Available options:
endpoint
Example:

"endpoint"

id
string
required

Unique identifier for the endpoint

Example:

"endpoint-d23901de-ef8f-44bf-b3e7-de9c1ca8f2d7"

name
string
required

System name for the endpoint

Example:

"devuser/meta-llama/Llama-3-8b-chat-hf-a32b82a1"

display_name
string
required

Human-readable name for the endpoint

Example:

"My Llama3 70b endpoint"

model
string
required

The model deployed on this endpoint

Example:

"meta-llama/Llama-3-8b-chat-hf"

hardware
string
required

The hardware configuration used for this endpoint

Example:

"1x_nvidia_a100_80gb_sxm"

type
enum<string>
required

The type of endpoint

Available options:
dedicated
Example:

"dedicated"

owner
string
required

The owner of this endpoint

Example:

"devuser"

state
enum<string>
required

Current state of the endpoint (see the polling sketch at the end of this page)

Available options:
PENDING,
STARTING,
STARTED,
STOPPING,
STOPPED,
ERROR
Example:

"STARTED"

autoscaling
object
required

Configuration for automatic scaling of the endpoint

created_at
string<date-time>
required

Timestamp when the endpoint was created

Example:

"2025-02-04T10:43:55.405Z"