REST API

Learn how to use the REST API for inference.

This tutorial covers how to use the REST API to run inference on a model. We'll query the RedPajama-INCITE-7B-Instruct model to find the capital of France. For the full API reference, see the API Reference.

Prerequisites

Ensure you have curl installed on your machine. Then open your terminal and set your Together API key as a shell variable:

TOGETHER_API_KEY="YOUR_API_KEY"

Find your API token in your account settings.

Send the curl Request

We're going to send a POST request to api.together.xyz/v1/chat/completions with a JSON body that contains model (the model we want to query), messages (the conversation to send to the model), and optional parameters such as temperature (which controls the randomness of the output) and max_tokens (the maximum number of output tokens to generate).

curl -X POST "https://api.together.xyz/v1/chat/completions" \
     -H "Authorization: Bearer $TOGETHER_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
         "model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
         "messages": [{"role": "user", "content": "Q: The capital of France is?\nA:"}],
         "temperature": 0.8,
         "max_tokens": 1
        }'
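If you prefer Python over curl, the same request can be sketched with the third-party requests library. This is a minimal sketch: build_chat_request is a helper name of our own for illustration, not part of any SDK.

```python
import os


def build_chat_request(model, prompt, temperature=0.8, max_tokens=1):
    """Build the same JSON body as the curl command above (illustrative helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }


payload = build_chat_request(
    "togethercomputer/RedPajama-INCITE-7B-Instruct",
    "Q: The capital of France is?\nA:",
)

if __name__ == "__main__":
    # Sending the request requires the `requests` package and a valid API key.
    import requests

    resp = requests.post(
        "https://api.together.xyz/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"},
        json=payload,  # requests serializes the body and sets Content-Type for us
    )
    print(resp.json())
```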

See the API Reference for all the parameters you can include. You can also find the full list of available models here.

Output

{
  "id": "85fd85280d4a8c54-EWR",
  "object": "chat.completion",
  "created": 1709677508,
  "model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
  "prompt": [],
  "choices": [
    {
      "finish_reason": "length",
      "logprobs": null,
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 1,
    "total_tokens": 13
  }
}

The choices array contains the N-best responses from the model. In this example, the API returned a single choice whose message content is "A": because we set max_tokens to 1, generation was cut off after one token, which is why finish_reason is "length".
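To pull the completion text out of a response like the one above, index into choices. For example, in Python (a minimal sketch using an abridged copy of the sample response):

```python
import json

# Abridged response body from the example above.
response_body = """
{
  "id": "85fd85280d4a8c54-EWR",
  "object": "chat.completion",
  "model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "message": {"role": "assistant", "content": "A"}
    }
  ],
  "usage": {"prompt_tokens": 12, "completion_tokens": 1, "total_tokens": 13}
}
"""

response = json.loads(response_body)

# Each entry in `choices` holds one candidate completion.
completion = response["choices"][0]["message"]["content"]
print(completion)  # prints "A"
```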

Streaming tokens

If you want to stream the response back token by token, add "stream_tokens": true to the request body.

curl -X POST "https://api.together.xyz/v1/chat/completions" \
     -H "Authorization: Bearer $TOGETHER_API_KEY" \
     -H "Content-Type: application/json" \
     -d '{
         "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
         "messages": [{"role": "user", "content": "Q: Tell me fun things to do in NYC"}],
         "temperature": 0.8, "max_tokens": 25, "stream_tokens": true
        }'

Streaming output

Instead of a single JSON response, each event is encoded as a Server-Sent Event (SSE) with a JSON payload. For example, the curl command above might produce the following events:

data: {"choices":[{"index":0,"delta":{"content":" A"}}],"id":"85ffbb8a6d2c4340-EWR","token":{"id":330,"text":" A","logprob":1,"special":false},"finish_reason":null,"generated_text":null,"stats":null,"usage":null,"created":1709700707,"object":"chat.completion.chunk"}
data: {"choices":[{"index":0,"delta":{"content":":"}}],"id":"85ffbb8a6d2c4340-EWR","token":{"id":28747,"text":":","logprob":0,"special":false},"finish_reason":null,"generated_text":null,"stats":null,"usage":null,"created":1709700707,"object":"chat.completion.chunk"}
data: {"choices":[{"index":0,"delta":{"content":" Sure"}}],"id":"85ffbb8a6d2c4340-EWR","token":{"id":12875,"text":" Sure","logprob":-0.00724411,"special":false},"finish_reason":null,"generated_text":null,"stats":null,"usage":null,"created":1709700707,"object":"chat.completion.chunk"}
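A client consumes this stream by reading each data: line, decoding the JSON payload, and concatenating the delta contents. The sketch below uses trimmed versions of the chunks shown above; the data: [DONE] sentinel follows the common OpenAI-style SSE convention and is an assumption here.

```python
import json

# Trimmed SSE lines modeled on the streaming output above.
raw_events = [
    'data: {"choices":[{"index":0,"delta":{"content":" A"}}],"object":"chat.completion.chunk"}',
    'data: {"choices":[{"index":0,"delta":{"content":":"}}],"object":"chat.completion.chunk"}',
    'data: {"choices":[{"index":0,"delta":{"content":" Sure"}}],"object":"chat.completion.chunk"}',
    'data: [DONE]',  # assumed end-of-stream sentinel (OpenAI-style convention)
]


def collect_stream(lines):
    """Concatenate the delta contents of a chat.completion.chunk stream."""
    pieces = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        pieces.append(chunk["choices"][0]["delta"].get("content", ""))
    return "".join(pieces)


print(collect_stream(raw_events))  # prints " A: Sure"
```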