Dedicated Endpoints API

Deploy models as dedicated endpoints using the Together AI API

The Together AI Dedicated Endpoints API allows you to deploy models as dedicated endpoints with custom hardware configurations and autoscaling capabilities.

This guide walks through the key API endpoints for managing dedicated model deployments.

Authentication

All API requests require authentication using your Together API key. Set your API key in the Authorization header:

curl -H "Authorization: Bearer YOUR_API_KEY" https://api.together.xyz/v1/...
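
In code, the same header can be assembled once and reused across requests; a minimal Python sketch (reading the key from a `TOGETHER_API_KEY` environment variable is our own convention, not mandated by the API):

```python
import os

def auth_headers(api_key=None):
    """Build the Authorization header used by every Together AI request.

    Falls back to a TOGETHER_API_KEY environment variable when no key
    is passed (the variable name is our convention, not the API's).
    """
    key = api_key or os.environ["TOGETHER_API_KEY"]
    return {"Authorization": f"Bearer {key}"}

print(auth_headers("YOUR_API_KEY"))
```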

API Endpoints

1. Models

Before creating a dedicated endpoint, you'll need to select or import a model to deploy.

List Available Models

Lists models that can be deployed as dedicated endpoints.

curl -X GET https://api.together.xyz/v0/models \
  -H "Authorization: Bearer YOUR_API_KEY"

List Available Models Response

{
  "object": "model",
  "id": "model-42168e53-cfc5-474e-adb1-06fcd7aba56b",
  "name": "meta-llama/Llama-3-70b-chat-hf",
  "display_name": "Llama 3 70B Chat",
  "owner": {
    "user": "together",
    "organization": "together.ai"
  },
  "type": "chat",
  "num_parameters": 70000000000,
  "context_length": 4096,
  "chat_config": {
    "chat_template": "...",
    "add_generation_prompt": true,
    "bos_token": "<s>",
    "eos_token": "</s>",
    "stop": ["</s>"]
  }
}
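
Clients typically parse this response to check a model's fitness before deploying it; a small sketch over a trimmed copy of the object above:

```python
import json

# A trimmed copy of the model object returned above.
raw = """{
  "name": "meta-llama/Llama-3-70b-chat-hf",
  "type": "chat",
  "num_parameters": 70000000000,
  "context_length": 4096
}"""

model = json.loads(raw)
# Prompt plus completion must fit within context_length tokens.
is_chat = model["type"] == "chat"
print(model["name"], model["context_length"], is_chat)
```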

Create Model Request

Import a supported model for deployment. The example below pulls a model from Hugging Face; the `hf_token` value is a placeholder and is only needed for gated or private repositories.

curl -X POST https://api.together.xyz/v0/models \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_name": "mistral-large-2407",
    "model_source": "mistralai/Mistral-Large-Instruct-2407",
    "hf_token": "hf_YIIfXdff2eca3c3ac99fa262b6976",
    "description": "Official Mistral Large Instruct Model"
  }'

Create Model Response

{
  "message": "Processing model weights",
  "data": {
    "job_id": "job-3231364a-b1ef-41c1-b5f1-42414e70cc4f",
    "model_name": "devuser/mistral-large-2407",
    "model_source": "mistralai/Mistral-Large-Instruct-2407"
  }
}
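
The import runs asynchronously, so the useful piece of this response is the `job_id`, which can later be checked through the Jobs API (section 4); for example:

```python
import json

# Response body from POST /v0/models, as shown above.
resp = json.loads("""{
  "message": "Processing model weights",
  "data": {
    "job_id": "job-3231364a-b1ef-41c1-b5f1-42414e70cc4f",
    "model_name": "devuser/mistral-large-2407",
    "model_source": "mistralai/Mistral-Large-Instruct-2407"
  }
}""")

# Keep the job id around: the model is not deployable until the
# corresponding job reports completion.
job_id = resp["data"]["job_id"]
print(job_id)
```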

2. Hardware Configuration

List Available Hardware

Get information about available hardware configurations for your endpoints.

curl -X GET https://api.together.xyz/v1/hardware \
  -H "Authorization: Bearer YOUR_API_KEY"

List Available Hardware Response

{
  "object": "list",
  "data": [
    {
      "object": "hardware",
      "name": "2x_nvidia_a100_80gb_sxm",
      "pricing": {
        "input": 0,
        "output": 0,
        "cents_per_minute": 5.42
      },
      "specs": {
        "gpu_type": "a100-80gb",
        "gpu_link": "sxm",
        "gpu_memory": 80,
        "gpu_count": 2
      },
      "updated_at": "2024-01-01T00:00:00Z"
    }
  ]
}
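
`pricing.cents_per_minute` is the rate for the whole configuration while it runs (`input` and `output` are shown as 0 here, since dedicated hardware is billed by time rather than tokens). Converting it to an hourly dollar figure is simple arithmetic:

```python
def hourly_cost_usd(cents_per_minute):
    """Convert a hardware config's per-minute rate to US dollars per hour."""
    return cents_per_minute * 60 / 100

# 2x A100 80GB SXM at 5.42 cents/minute:
print(round(hourly_cost_usd(5.42), 3))  # 3.252
```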

Get Model-Specific Hardware Options

Retrieve hardware configurations available for a specific model.

curl -X GET https://api.together.xyz/v0/models/{model_id}/hardware \
  -H "Authorization: Bearer YOUR_API_KEY"

Get Model Hardware Response

{
  "object": "list",
  "data": [
    {
      "object": "hardware",
      "name": "2x_nvidia_a100_80gb_sxm",
      "pricing": {
        "input": 0,
        "output": 0,
        "cents_per_minute": 5.42
      },
      "specs": {
        "gpu_type": "a100-80gb",
        "gpu_link": "sxm",
        "gpu_memory": 80,
        "gpu_count": 2
      },
      "availability": {
        "status": "available"
      }
    }
  ]
}
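
When a model offers several configurations, a client can pick one by its reported availability; a minimal sketch, assuming the response shape above:

```python
# Pick the first hardware configuration reported as available
# (response shape as shown above).
def first_available(hardware_list):
    for hw in hardware_list:
        if hw.get("availability", {}).get("status") == "available":
            return hw["name"]
    return None

configs = [{"name": "2x_nvidia_a100_80gb_sxm",
            "availability": {"status": "available"}}]
print(first_available(configs))
```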

3. Endpoints Management

Create Endpoint Request

Create a new dedicated endpoint for your model.

curl -X POST https://api.together.xyz/v1/endpoints \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3-70b-chat-hf",
    "suffix": "prodDeploy7",
    "hardware_name": "2x_nvidia_a100_80gb_sxm"
  }'

Create Endpoint Response

{
  "object": "endpoint",
  "id": "endpoint-4c88c864-9f71-4c50-9b32-879001a46511",
  "name": "devuser/meta-llama/Llama-3-70b-chat-hf-prodDeploy7",
  "model": {
    "object": "model",
    "id": "model-42168e53-cfc5-474e-adb1-06fcd7aba56b",
    "name": "meta-llama/Llama-3-70b-chat-hf"
  },
  "owner": {
    "user": "devuser",
    "organization": "together.ai"
  },
  "deployments": [],
  "created_at": "2024-11-01T12:41:33.728Z"
}
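
Judging from this response, the returned `name` appears to combine the owner's user name, the model name, and the `suffix` from the request. A sketch of that apparent pattern (an observation from the example above, not a documented guarantee):

```python
def endpoint_name(user, model, suffix):
    """Reconstruct the endpoint-name pattern seen in the response above:
    <user>/<model>-<suffix>. This mirrors the example, not a spec."""
    return f"{user}/{model}-{suffix}"

print(endpoint_name("devuser", "meta-llama/Llama-3-70b-chat-hf", "prodDeploy7"))
# devuser/meta-llama/Llama-3-70b-chat-hf-prodDeploy7
```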

Start Endpoint Request

Start a created endpoint with optional autoscaling configuration.

curl -X POST https://api.together.xyz/v1/endpoints/{endpoint_id}/start \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "autoscaling": {
      "min_replicas": 2,
      "max_replicas": 5
    }
  }'

Start Endpoint Response

{
  "object": "endpoint",
  "id": "endpoint-4c88c864-9f71-4c50-9b32-879001a46511",
  "deployments": [
    {
      "object": "deployment",
      "status": "pending",
      "autoscaling": {
        "min_replicas": 2,
        "max_replicas": 5
      },
      "hardware": {
        "name": "2x_nvidia_a100_80gb_sxm",
        "specs": {
          "gpu_type": "a100-80gb",
          "gpu_count": 2
        }
      }
    }
  ]
}
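
A small client-side helper can build and sanity-check the autoscaling block before sending it; the bounds check below is our own safeguard, not a documented API rule:

```python
def autoscaling_config(min_replicas, max_replicas):
    """Build the autoscaling block for the start-endpoint request.

    The bounds check is a client-side safeguard of our own; the API's
    actual validation rules may differ.
    """
    if not 1 <= min_replicas <= max_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    return {"autoscaling": {"min_replicas": min_replicas,
                            "max_replicas": max_replicas}}

print(autoscaling_config(2, 5))
```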

Stop Endpoint Request

Stop a running endpoint deployment.

curl -X POST https://api.together.xyz/v1/endpoints/{endpoint_id}/stop \
  -H "Authorization: Bearer YOUR_API_KEY"

Stop Endpoint Response

{
  "object": "endpoint",
  "id": "endpoint-4c88c864-9f71-4c50-9b32-879001a46511",
  "deployments": [
    {
      "object": "deployment",
      "status": "stopping",
      "autoscaling": {
        "min_replicas": 2,
        "max_replicas": 5
      },
      "hardware": {
        "name": "2x_nvidia_a100_80gb_sxm",
        "specs": {
          "gpu_type": "a100-80gb",
          "gpu_count": 2
        }
      }
    }
  ]
}

Delete Endpoint

Remove an endpoint deployment.

curl -X DELETE https://api.together.xyz/v1/endpoints/{endpoint_id} \
  -H "Authorization: Bearer YOUR_API_KEY"

Delete Endpoint Response

{
  "object": "endpoint",
  "id": "endpoint-4c88c864-9f71-4c50-9b32-879001a46511",
  "deployments": [
    {
      "object": "deployment",
      "status": "deleting",
      "autoscaling": {
        "min_replicas": 2,
        "max_replicas": 5
      },
      "hardware": {
        "name": "2x_nvidia_a100_80gb_sxm",
        "specs": {
          "gpu_type": "a100-80gb",
          "gpu_count": 2
        }
      }
    }
  ]
}

List Endpoints

Retrieve a list of all your endpoints.

curl -X GET https://api.together.xyz/v1/endpoints \
  -H "Authorization: Bearer YOUR_API_KEY"

List Endpoints Response

{
  "object": "list",
  "data": [
    {
      "object": "endpoint",
      "id": "endpoint-4c88c864-9f71-4c50-9b32-879001a46511",
      "name": "devuser/meta-llama/Llama-3-70b-chat-hf-prodDeploy7",
      "model": {
        "object": "model",
        "id": "model-42168e53-cfc5-474e-adb1-06fcd7aba56b",
        "name": "meta-llama/Llama-3-70b-chat-hf"
      },
      "owner": {
        "user": "devuser",
        "organization": "together.ai"
      },
      "deployments": [],
      "created_at": "2024-11-01T12:41:33.728Z"
    }
  ]
}
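
The start, stop, and delete calls take an endpoint id rather than a name, so a common helper resolves a name to an id from this list; a sketch assuming the response shape above:

```python
# Look up an endpoint's id by its name in the list response
# (response shape as shown above).
def endpoint_id_by_name(list_response, name):
    for ep in list_response.get("data", []):
        if ep["name"] == name:
            return ep["id"]
    return None

listing = {"data": [{"id": "endpoint-4c88c864-9f71-4c50-9b32-879001a46511",
                     "name": "devuser/meta-llama/Llama-3-70b-chat-hf-prodDeploy7"}]}
print(endpoint_id_by_name(listing, "devuser/meta-llama/Llama-3-70b-chat-hf-prodDeploy7"))
```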

4. Job Management

List Jobs

View all jobs related to your endpoints and model imports.

curl -X GET https://api.together.xyz/v1/jobs \
  -H "Authorization: Bearer YOUR_API_KEY"

List Jobs Response

{
  "type": "model_import",
  "job_id": "job-3231364a-b1ef-41c1-b5f1-42414e70cc4f",
  "status": "processing",
  "status_updates": [
    {
      "status": "started",
      "message": "Starting model import",
      "timestamp": "2024-01-01T00:00:00Z"
    }
  ],
  "args": {
    "model_name": "mistral-large-2407"
  },
  "created_at": "2024-01-01T00:00:00Z",
  "updated_at": "2024-01-01T00:00:01Z"
}

Get Job Status

Check the status of a specific job.

curl -X GET https://api.together.xyz/v1/jobs/{job_id} \
  -H "Authorization: Bearer YOUR_API_KEY"

Get Job Status Response

{
  "object": "list",
  "data": [
    {
      "type": "model_import",
      "job_id": "job-3231364a-b1ef-41c1-b5f1-42414e70cc4f",
      "status": "completed",
      "status_updates": [
        {
          "status": "completed",
          "message": "Model import successful",
          "timestamp": "2024-01-01T00:01:00Z"
        }
      ],
      "args": {
        "model_name": "mistral-large-2407"
      },
      "created_at": "2024-01-01T00:00:00Z",
      "updated_at": "2024-01-01T00:01:00Z"
    }
  ]
}
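
Since model imports and endpoint operations run asynchronously, a typical client polls the job until it reaches a terminal status. A sketch where `fetch_status` stands in for the HTTP call; treating `"completed"` and `"failed"` as terminal is our assumption (the examples above show only `"processing"` and `"completed"`):

```python
import time

def wait_for_job(fetch_status, poll_seconds=5.0, timeout=600.0):
    """Poll until a job leaves the "processing" state.

    `fetch_status` is any callable returning the current status string;
    in real use it would issue GET /v1/jobs/{job_id} (omitted here).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("job did not finish in time")

# Simulated fetcher that completes on the second poll:
statuses = iter(["processing", "completed"])
print(wait_for_job(lambda: next(statuses), poll_seconds=0))
```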

Error Handling

The API uses standard HTTP response codes and returns error details in JSON format. For example:

{
  "error": {
    "message": "Internal server error",
    "type": "server_error"
  }
}
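
A thin client wrapper usually turns this envelope into an exception; a minimal sketch assuming the error shape above (the exception class is our own):

```python
import json

class TogetherAPIError(Exception):
    """Raised when a response body carries an `error` object (shape as above)."""

def raise_for_error(body):
    """Parse a response body and raise if it contains an error envelope."""
    payload = json.loads(body)
    if "error" in payload:
        err = payload["error"]
        raise TogetherAPIError(f'{err.get("type")}: {err.get("message")}')
    return payload

try:
    raise_for_error('{"error": {"message": "Internal server error", "type": "server_error"}}')
except TogetherAPIError as e:
    print(e)  # server_error: Internal server error
```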