Dedicated Endpoints API
Deploy models as dedicated endpoints using the Together AI API
The Together AI Dedicated Endpoints API allows you to deploy models as dedicated endpoints with custom hardware configurations and autoscaling capabilities.
This guide walks through the key API endpoints for managing dedicated model deployments.
Authentication
All API requests require authentication using your Together API key. Set your API key in the Authorization header:
curl -H "Authorization: Bearer YOUR_API_KEY" https://api.together.xyz/v1/...
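To avoid repeating the header on every call, the pattern above can be wrapped in a small shell function. The following is a minimal sketch, not part of the Together API: the function name `together_api` and the `TOGETHER_API_KEY` environment variable are assumptions chosen for illustration.

```shell
# Hypothetical wrapper around curl. It reads the key from TOGETHER_API_KEY
# (an assumed variable name) so the key never appears in the command itself.
together_api() {
  method="$1"   # HTTP method, e.g. GET or POST
  path="$2"     # API path, e.g. /v1/endpoints
  shift 2       # any remaining arguments are passed through to curl

  if [ -z "${TOGETHER_API_KEY:-}" ]; then
    echo "TOGETHER_API_KEY is not set" >&2
    return 1
  fi

  curl -sS -X "$method" "https://api.together.xyz${path}" \
    -H "Authorization: Bearer ${TOGETHER_API_KEY}" \
    "$@"
}
```

With the key exported, a call such as `together_api GET /v1/endpoints` sends the same request as the equivalent full curl command.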
API Endpoints
1. Models
Before creating a dedicated endpoint, you'll need to select or import a model to deploy.
List Available Models
Lists models that can be deployed as dedicated endpoints.
curl -X GET https://api.together.xyz/v0/models \
-H "Authorization: Bearer YOUR_API_KEY"
List Available Models Response
{
"object": "model",
"id": "model-42168e53-cfc5-474e-adb1-06fcd7aba56b",
"name": "meta-llama/Llama-3-70b-chat-hf",
"display_name": "Llama 3 70B Chat",
"owner": {
"user": "together",
"organization": "together.ai"
},
"type": "chat",
"num_parameters": 70000000000,
"context_length": 4096,
"chat_config": {
"chat_template": "...",
"add_generation_prompt": true,
"bos_token": "<s>",
"eos_token": "</s>",
"stop": ["</s>"]
}
}
Create Model Request
Select a supported model from Together AI.
curl -X POST https://api.together.xyz/v0/models \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model_name": "mistral-large-2407",
"model_source": "mistralai/Mistral-Large-Instruct-2407",
"hf_token": "hf_YIIfXdff2eca3c3ac99fa262b6976",
"description": "Official Mistral Large Instruct Model"
}'
Create Model Response
{
"message": "Processing model weights",
"data": {
"job_id": "job-3231364a-b1ef-41c1-b5f1-42414e70cc4f",
"model_name": "devuser/mistral-large-2407",
"model_source": "mistralai/Mistral-Large-Instruct-2407"
}
}
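The `job_id` in this response is what the job endpoints expect later, so it is useful to capture it. The sketch below pulls it out with `sed`; the helper name `extract_job_id` is hypothetical, and the pattern assumes the field appears as `"job_id": "..."` on a single line. A JSON-aware tool would be more robust if one is available.

```shell
# Hypothetical helper: reads a Create Model response on stdin and prints
# the first job_id value it finds. This is a rough text match, not a JSON parser.
extract_job_id() {
  sed -n 's/.*"job_id"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

For example, piping the response shown above through `extract_job_id` prints `job-3231364a-b1ef-41c1-b5f1-42414e70cc4f`.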
2. Hardware Configuration
List Available Hardware
Get information about available hardware configurations for your endpoints.
curl -X GET https://api.together.xyz/v1/hardware \
-H "Authorization: Bearer YOUR_API_KEY"
List Available Hardware Response
{
"object": "list",
"data": [
{
"object": "hardware",
"name": "2x_nvidia_a100_80gb_sxm",
"pricing": {
"input": 0,
"output": 0,
"cents_per_minute": 5.42
},
"specs": {
"gpu_type": "a100-80gb",
"gpu_link": "sxm",
"gpu_memory": 80,
"gpu_count": 2
},
"updated_at": "2024-01-01T00:00:00Z"
}
]
}
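The `cents_per_minute` field can be turned into an hourly estimate with simple arithmetic: 5.42 cents per minute × 60 minutes = 325.2 cents, or about $3.25 per hour. The sketch below does that conversion; the helper name `hourly_cost_usd` is hypothetical, and multiplying by the replica count assumes each replica is billed at the listed rate, which this guide does not state.

```shell
# Hypothetical helper: converts cents per minute into US dollars per hour.
# Arguments: cents_per_minute, replica count (defaults to 1).
# Assumption: every replica is billed at the listed per-minute rate.
hourly_cost_usd() {
  awk -v cents="$1" -v replicas="${2:-1}" \
    'BEGIN { printf "%.2f\n", cents * 60 * replicas / 100 }'
}

hourly_cost_usd 5.42 1   # prints 3.25
hourly_cost_usd 5.42 2   # prints 6.50
```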
Get Model-Specific Hardware Options
Retrieve the hardware configurations available for a specific model.
# Request
curl -X GET https://api.together.xyz/v0/models/{model_id}/hardware \
-H "Authorization: Bearer YOUR_API_KEY"
Get Model Hardware Response
{
"object": "list",
"data": [
{
"object": "hardware",
"name": "2x_nvidia_a100_80gb_sxm",
"pricing": {
"input": 0,
"output": 0,
"cents_per_minute": 5.42
},
"specs": {
"gpu_type": "a100-80gb",
"gpu_link": "sxm",
"gpu_memory": 80,
"gpu_count": 2
},
"availability": {
"status": "available"
}
}
]
}
3. Endpoints Management
Create Endpoint Request
Create a new dedicated endpoint for your model.
curl -X POST https://api.together.xyz/v1/endpoints \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-3-70b-chat-hf",
"suffix": "prodDeploy7",
"hardware_name": "2x_nvidia_a100_80gb_sxm"
}'
Create Endpoint Response
{
"object": "endpoint",
"id": "endpoint-4c88c864-9f71-4c50-9b32-879001a46511",
"name": "devuser/meta-llama/Llama-3-70b-chat-hf-prodDeploy7",
"model": {
"object": "model",
"id": "model-42168e53-cfc5-474e-adb1-06fcd7aba56b",
"name": "meta-llama/Llama-3-70b-chat-hf"
},
"owner": {
"user": "devuser",
"organization": "together.ai"
},
"deployments": [],
"created_at": "2024-11-01T12:41:33.728Z"
}
Start Endpoint Request
Start a created endpoint with optional autoscaling configuration.
curl -X POST https://api.together.xyz/v1/endpoints/{endpoint_id}/start \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"autoscaling": {
"min_replicas": 2,
"max_replicas": 5
}
}'
Start Endpoint Response
{
"object": "endpoint",
"id": "endpoint-4c88c864-9f71-4c50-9b32-879001a46511",
"deployments": [
{
"object": "deployment",
"status": "pending",
"autoscaling": {
"min_replicas": 2,
"max_replicas": 5
},
"hardware": {
"name": "2x_nvidia_a100_80gb_sxm",
"specs": {
"gpu_type": "a100-80gb",
"gpu_count": 2
}
}
}
]
}
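When the replica counts come from variables, it can help to build the request body in one place. The following is a minimal sketch: the helper name `start_payload` is hypothetical, and the check that `min_replicas` does not exceed `max_replicas` is a client-side sanity check assumed here, not a documented API rule.

```shell
# Hypothetical helper: prints the JSON body for the start request.
# Arguments: min_replicas, max_replicas (both integers).
start_payload() {
  min="$1"
  max="$2"
  # Assumed sanity check; the API's own validation rules are not described here.
  if [ "$min" -gt "$max" ]; then
    echo "min_replicas ($min) must not exceed max_replicas ($max)" >&2
    return 1
  fi
  printf '{"autoscaling": {"min_replicas": %d, "max_replicas": %d}}\n' "$min" "$max"
}
```

The output can then be passed to the start request with `-d "$(start_payload 2 5)"`.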
Stop Endpoint Request
Stop a running endpoint deployment.
curl -X POST https://api.together.xyz/v1/endpoints/{endpoint_id}/stop \
-H "Authorization: Bearer YOUR_API_KEY"
Stop Endpoint Response
{
"object": "endpoint",
"id": "endpoint-4c88c864-9f71-4c50-9b32-879001a46511",
"deployments": [
{
"object": "deployment",
"status": "stopping",
"autoscaling": {
"min_replicas": 2,
"max_replicas": 5
},
"hardware": {
"name": "2x_nvidia_a100_80gb_sxm",
"specs": {
"gpu_type": "a100-80gb",
"gpu_count": 2
}
}
}
]
}
Delete Endpoint
Remove an endpoint deployment.
curl -X DELETE https://api.together.xyz/v1/endpoints/{endpoint_id} \
-H "Authorization: Bearer YOUR_API_KEY"
Delete Endpoint Response
{
"object": "endpoint",
"id": "endpoint-4c88c864-9f71-4c50-9b32-879001a46511",
"deployments": [
{
"object": "deployment",
"status": "deleting",
"autoscaling": {
"min_replicas": 2,
"max_replicas": 5
},
"hardware": {
"name": "2x_nvidia_a100_80gb_sxm",
"specs": {
"gpu_type": "a100-80gb",
"gpu_count": 2
}
}
}
]
}
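The start, stop, and delete requests differ only in HTTP method and path, so they can share one helper. This is a sketch under stated assumptions: the function name `endpoint_request` and the `TOGETHER_API_KEY` environment variable are hypothetical, and `start` is sent without a body, relying on the autoscaling configuration being optional.

```shell
# Hypothetical helper: sends a lifecycle request for one endpoint.
# Arguments: action (start, stop, or delete) and the endpoint ID.
endpoint_request() {
  action="$1"
  endpoint_id="$2"
  base_url="https://api.together.xyz/v1/endpoints/${endpoint_id}"

  case "$action" in
    start|stop)
      method="POST"
      url="${base_url}/${action}"
      ;;
    delete)
      method="DELETE"
      url="$base_url"
      ;;
    *)
      echo "unknown action: $action" >&2
      return 1
      ;;
  esac

  curl -sS -X "$method" "$url" \
    -H "Authorization: Bearer ${TOGETHER_API_KEY}"
}
```

For example, `endpoint_request stop endpoint-4c88c864-9f71-4c50-9b32-879001a46511` issues the stop request shown earlier.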
List Endpoints
Retrieve a list of all your endpoints.
curl -X GET https://api.together.xyz/v1/endpoints \
-H "Authorization: Bearer YOUR_API_KEY"
List Endpoints Response
{
"object": "list",
"data": [
{
"object": "endpoint",
"id": "endpoint-4c88c864-9f71-4c50-9b32-879001a46511",
"name": "devuser/meta-llama/Llama-3-70b-chat-hf-prodDeploy7",
"model": {
"object": "model",
"id": "model-42168e53-cfc5-474e-adb1-06fcd7aba56b",
"name": "meta-llama/Llama-3-70b-chat-hf"
},
"owner": {
"user": "devuser",
"organization": "together.ai"
},
"deployments": [],
"created_at": "2024-11-01T12:41:33.728Z"
}
]
}
4. Job Management
List Jobs
View all jobs related to your endpoints and model imports.
curl -X GET https://api.together.xyz/v1/jobs \
-H "Authorization: Bearer YOUR_API_KEY"
List Jobs Response
{
"type": "model_import",
"job_id": "job-3231364a-b1ef-41c1-b5f1-42414e70cc4f",
"status": "processing",
"status_updates": [
{
"status": "started",
"message": "Starting model import",
"timestamp": "2024-01-01T00:00:00Z"
}
],
"args": {
"model_name": "mistral-large-2407"
},
"created_at": "2024-01-01T00:00:00Z",
"updated_at": "2024-01-01T00:00:01Z"
}
Get Job Status
Check the status of a specific job.
curl -X GET https://api.together.xyz/v1/jobs/{job_id} \
-H "Authorization: Bearer YOUR_API_KEY"
Get Job Status Response
{
"object": "list",
"data": [
{
"type": "model_import",
"job_id": "job-3231364a-b1ef-41c1-b5f1-42414e70cc4f",
"status": "completed",
"status_updates": [
{
"status": "completed",
"message": "Model import successful",
"timestamp": "2024-01-01T00:01:00Z"
}
],
"args": {
"model_name": "mistral-large-2407"
},
"created_at": "2024-01-01T00:00:00Z",
"updated_at": "2024-01-01T00:01:00Z"
}
]
}
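Because a model import runs asynchronously, a script usually polls the job until it finishes. The loop below is a minimal sketch: the function names, the `TOGETHER_API_KEY` variable, the default interval, and the `failed` and `error` status names are all assumptions (only `started`, `processing`, and `completed` appear in this guide). The `sed` extraction also assumes a pretty-printed response in which the job's own `status` field appears before the `status_updates` entries.

```shell
# Hypothetical helper: prints the status string of one job.
job_status() {
  curl -sS "https://api.together.xyz/v1/jobs/$1" \
    -H "Authorization: Bearer ${TOGETHER_API_KEY}" |
    sed -n 's/^[[:space:]]*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
    head -n 1
}

# Hypothetical helper: polls until the job completes, fails, or times out.
# Arguments: job_id, max attempts (default 60), seconds between polls (default 10).
wait_for_job() {
  job_id="$1"
  max_attempts="${2:-60}"
  interval="${3:-10}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    job_state="$(job_status "$job_id")"
    case "$job_state" in
      completed) echo "completed"; return 0 ;;
      failed|error) echo "$job_state"; return 1 ;;  # assumed failure names
    esac
    sleep "$interval"
    attempt=$((attempt + 1))
  done
  echo "timeout"
  return 2
}
```

A call such as `wait_for_job job-3231364a-b1ef-41c1-b5f1-42414e70cc4f` returns exit status 0 once the job reports `completed`.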
Error Handling
The API uses standard HTTP response codes and returns error details in JSON format. For example:
{
"error": {
"message": "Internal server error",
"type": "server_error"
}
}
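In a script, the status code and the error body can be checked together. The sketch below uses curl's `-o` and `-w '%{http_code}'` options to capture both; the function name `api_get` and the `TOGETHER_API_KEY` variable are hypothetical, and treating every non-2xx code as a failure is an assumption about how a client would want to behave.

```shell
# Hypothetical helper: GETs a URL, prints the body on success, and
# prints the status code and error body to stderr on failure.
api_get() {
  url="$1"
  body_file="$(mktemp)"

  # -o saves the response body; -w prints only the HTTP status code.
  http_code="$(curl -sS -o "$body_file" -w '%{http_code}' "$url" \
    -H "Authorization: Bearer ${TOGETHER_API_KEY}")" || true

  case "$http_code" in
    2??)
      cat "$body_file"
      rm -f "$body_file"
      ;;
    *)
      echo "HTTP ${http_code}: request failed" >&2
      cat "$body_file" >&2
      rm -f "$body_file"
      return 1
      ;;
  esac
}
```

For example, `api_get https://api.together.xyz/v1/endpoints` prints the endpoint list on success and returns a non-zero exit status when the API responds with an error like the one above.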