REST API

This tutorial shows how to run models through the REST API. We will query the RedPajama-INCITE-7B-Instruct model for the capital of France. For complete details, see the API Reference.

Prerequisites

Ensure you have curl installed on your machine. Then open a terminal and define your Together API key:

TOGETHER_API_KEY="YOUR_API_KEY"

Find your API key in your account settings.

Send the `curl` Request

We will send a POST request to https://api.together.xyz/v1/chat/completions with a JSON object that contains model (the model to query), messages (the content to send to the model), and additional parameters such as temperature (the randomness of the result) and max_tokens (the maximum number of output tokens).

curl -X POST "https://api.together.xyz/v1/chat/completions" \
     -H "Authorization: Bearer $TOGETHER_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
         "model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
         "messages": [{"role": "user", "content": "Q: The capital of France is?\nA:"}],
         "temperature": 0.8, "max_tokens": 1
        }'

See the API Reference for all the parameters we can include. You can also find a full list of the models offered here.
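
The request body can also be assembled from shell variables instead of being typed inline, which helps avoid quoting mistakes when the prompt contains quotes or newlines. The following is a minimal sketch, assuming the jq command-line JSON processor is installed (it is not one of the prerequisites above). build_payload is a hypothetical helper name, not part of the Together API.

```shell
# Hypothetical helper: prints a chat-completions request body as compact JSON.
# Assumes jq is installed; jq takes care of JSON string escaping.
# Usage: build_payload MODEL PROMPT MAX_TOKENS
build_payload() {
  jq -n -c \
    --arg model "$1" \
    --arg prompt "$2" \
    --argjson max_tokens "$3" \
    '{model: $model,
      messages: [{role: "user", content: $prompt}],
      temperature: 0.8,
      max_tokens: $max_tokens}'
}

# Example (sends a real request, so it is left commented out).
# bash $'...' quoting turns \n into a real newline before jq escapes it.
# build_payload "togethercomputer/RedPajama-INCITE-7B-Instruct" \
#     $'Q: The capital of France is?\nA:' 1 |
#   curl -X POST "https://api.together.xyz/v1/chat/completions" \
#        -H "Authorization: Bearer $TOGETHER_API_KEY" \
#        -H "Content-Type: application/json" \
#        -d @-
```

With -d @-, curl reads the request body from standard input, so the payload never has to be embedded in a quoted string.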

Output

{
  "id": "85fd85280d4a8c54-EWR",
  "object": "chat.completion",
  "created": 1709677508,
  "model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
  "prompt": [],
  "choices": [
    {
      "finish_reason": "length",
      "logprobs": null,
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 1,
    "total_tokens": 13
  }
}

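The choices array contains the model's responses, and each choice's message.content field holds the generated text. In this example, the API returned one choice whose content is "A". Its finish_reason of "length" indicates that generation stopped at the max_tokens limit of 1, so raise max_tokens to allow a longer answer.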
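
To pull only the generated text out of a response like the one above, the JSON can be filtered on the command line. This is a minimal sketch, assuming jq is installed. extract_content is a hypothetical helper name, and the field path follows the sample output shown in this tutorial.

```shell
# Hypothetical helper: reads a chat-completions response on stdin and
# prints the text of the first choice. Assumes jq is installed.
extract_content() {
  jq -r '.choices[0].message.content'
}

# Offline demonstration with a trimmed copy of the sample response.
sample='{"choices":[{"finish_reason":"length","index":0,"message":{"role":"assistant","content":"A"}}]}'
printf '%s\n' "$sample" | extract_content   # prints: A
```

With a live request, pipe the curl command into extract_content. Adding -s to curl hides the progress meter that it otherwise prints when its output is piped.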

Streaming tokens

To stream the response back as it is generated, add "stream_tokens": true to the request body.

curl -X POST "https://api.together.xyz/v1/chat/completions" \
     -H "Authorization: Bearer $TOGETHER_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
         "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
         "messages": [{"role": "user", "content": "Q: Tell me fun things to do in NYC"}],
         "temperature": 0.8, "max_tokens": 25, "stream_tokens": true
        }'

Streaming output

Instead of a single JSON response, the API returns a sequence of Server-Sent Events, each with a JSON-encoded payload. For example, the curl command above might produce the following events:

data: {"choices":[{"index":0,"delta":{"content":" A"}}],"id":"85ffbb8a6d2c4340-EWR","token":{"id":330,"text":" A","logprob":1,"special":false},"finish_reason":null,"generated_text":null,"stats":null,"usage":null,"created":1709700707,"object":"chat.completion.chunk"}
data: {"choices":[{"index":0,"delta":{"content":":"}}],"id":"85ffbb8a6d2c4340-EWR","token":{"id":28747,"text":":","logprob":0,"special":false},"finish_reason":null,"generated_text":null,"stats":null,"usage":null,"created":1709700707,"object":"chat.completion.chunk"}
data: {"choices":[{"index":0,"delta":{"content":" Sure"}}],"id":"85ffbb8a6d2c4340-EWR","token":{"id":12875,"text":" Sure","logprob":-0.00724411,"special":false},"finish_reason":null,"generated_text":null,"stats":null,"usage":null,"created":1709700707,"object":"chat.completion.chunk"}
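
A client usually wants the concatenated text rather than the raw events. The loop below is a sketch of one way to do that in the shell: it strips the data: prefix from each event and prints the delta content. It assumes jq is installed, and stream_text is a hypothetical helper name. Lines that do not carry a JSON object are skipped, so the sketch makes no assumption about how the stream is terminated.

```shell
# Hypothetical helper: reads Server-Sent Events on stdin and prints the
# streamed text as it arrives. Assumes jq is installed.
stream_text() {
  while IFS= read -r line; do
    case "$line" in
      'data: {'*)
        # Drop the "data: " prefix, then print delta.content without a newline.
        printf '%s\n' "${line#data: }" |
          jq -rj '.choices[0].delta.content // empty'
        ;;
      *) ;;  # skip blank separator lines and anything that is not a JSON object
    esac
  done
  printf '\n'  # finish the output with a newline
}

# Offline demonstration with two events shaped like the sample above.
printf '%s\n' \
  'data: {"choices":[{"index":0,"delta":{"content":" A"}}]}' \
  '' \
  'data: {"choices":[{"index":0,"delta":{"content":":"}}]}' |
  stream_text
```

With a live request, add -N (--no-buffer) to the streaming curl command and pipe it into stream_text so that text is printed as each event arrives.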
