REST API

Use this guide to learn how to run inference with the REST API.

In this tutorial, you will learn how to use the REST API to run models. You will query the RedPajama-INCITE-7B-Instruct model to find the capital of France.



For the full API reference, see the API Reference.



Prerequisites

  • Ensure you have curl installed on your machine.
  • You will need to create a free account with together.ai to obtain a Together API key.

Define the Endpoint URL and API Key

Launch your terminal and define the endpoint URL and the API key used for authentication:

ENDPOINT_URL="https://api.together.xyz/inference"
API_KEY="YOUR_API_KEY"
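If you would rather not paste your key into each session, one option is to export it once (for example, in your shell profile) and read it from there. The sketch below assumes a variable named TOGETHER_API_KEY; that name is our own choice for illustration, not something the API requires.

```shell
# Assumption: you exported your key earlier, e.g. in ~/.bashrc:
#   export TOGETHER_API_KEY="..."
# TOGETHER_API_KEY is an illustrative name; any variable name works.
ENDPOINT_URL="https://api.together.xyz/inference"
API_KEY="${TOGETHER_API_KEY:-}"

# Warn early instead of sending a request that will fail authentication.
if [ -z "$API_KEY" ]; then
  echo "warning: TOGETHER_API_KEY is not set; API_KEY is empty" >&2
fi
```

Reading the key from your profile means you do not retype it, and it does not land in your command history each time you run the tutorial.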

You can find your API key in your account settings. Replace YOUR_API_KEY with that value.

Create your JSON-Formatted Object

The input to the API is a JSON-formatted object with all the request parameters.

  • The model field in the request object specifies the model that you’d like to query. Specify togethercomputer/RedPajama-INCITE-7B-Instruct for this request.
  • The prompt field contains the text for the model to complete. In this tutorial, you will use the prompt “Q: The capital of France is?\nA:” and ask the model for the token that follows it.
  • The request object may also contain parameters that control the output (see the API Reference). Set temperature to 0.8 and max_tokens to 1; with max_tokens set to 1, the model stops generating after a single token. The object below also sets top_p, top_k, and repetition_penalty.
{
  "model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
  "prompt": "Q: The capital of France is?\nA:",
  "temperature": 0.8,
  "top_p": 0.7,
  "top_k": 50,
  "max_tokens": 1,
  "repetition_penalty": 1
}
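Long inline JSON in a -d argument is easy to mistype. As an alternative, you can save the request object to a file and have curl read it from there. The filename request.json is arbitrary, and the validity check assumes python3 is installed.

```shell
# Write the request object from this step to request.json.
# The quoted 'EOF' delimiter stops the shell from expanding anything inside,
# so the JSON escape "\n" in the prompt is written literally.
cat > request.json <<'EOF'
{
  "model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
  "prompt": "Q: The capital of France is?\nA:",
  "temperature": 0.8,
  "top_p": 0.7,
  "top_k": 50,
  "max_tokens": 1,
  "repetition_penalty": 1
}
EOF

# Optional: confirm the file parses as JSON before sending it.
python3 -m json.tool request.json > /dev/null && echo "request.json is valid JSON"
```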

Create the curl Request

To retrieve the capital of France, run the following curl command, passing your JSON-formatted object to the -d option:

curl -X POST "$ENDPOINT_URL" \
     -H "Authorization: Bearer $API_KEY" \
     -H "Content-Type: application/json" \
     -d '{"model": "togethercomputer/RedPajama-INCITE-7B-Instruct", "prompt": "Q: The capital of France is?\nA:", "temperature": 0.8, "top_p": 0.7, "top_k": 50, "max_tokens": 1, "repetition_penalty": 1}'
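If you keep the request object in a file such as request.json, curl can read it with -d @request.json. The helper below is a hypothetical wrapper (the name run_inference is ours, not part of the API) that sends a request file, saves the JSON response to another file, and makes curl exit with an error status when the server returns an HTTP error. It assumes ENDPOINT_URL and API_KEY are set as in the earlier step.

```shell
# Hypothetical helper: POST a saved request file and save the JSON response.
# Usage: run_inference <request-file> <response-file>
run_inference() {
  local request_file="$1"
  local response_file="$2"
  # -sS: hide the progress meter but still print errors
  # --fail: exit non-zero on HTTP errors (e.g. an invalid API key)
  curl -sS --fail -X POST "$ENDPOINT_URL" \
       -H "Authorization: Bearer $API_KEY" \
       -H "Content-Type: application/json" \
       -d @"$request_file" \
       -o "$response_file"
}
```

For example, run_inference request.json response.json writes the response to response.json; a non-zero exit status signals that the request failed.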

Output

The response contains the input prompt, the request arguments, and the model output:

{
  "status": "finished",
  "prompt": [
    "Q: The capital of France is?\nA:"
  ],
  "model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
  "model_owner": "",
  "tags": {},
  "num_returns": 1,
  "args": {
    "model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
    "prompt": "Q: The capital of France is?\nA:",
    "temperature": 0.8,
    "top_p": 0.7,
    "top_k": 50,
    "max_tokens": 1,
    "repetition_penalty": 1
  },
  "subjobs": [],
  "output": {
    "choices": [
      {
        "finish_reason": "length",
        "index": 0,
        "text": " Paris"
      }
    ],
    "raw_compute_time": 0.03950854716822505,
    "result_type": "language-model-inference"
  }
}

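The response’s output key holds the model’s results. The choices array contains one entry per completion the model returned. In this example, the API returned a single choice whose text is “ Paris” (note the leading space). Its finish_reason of “length” indicates that generation stopped because it reached the max_tokens limit.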
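To extract only the generated text from a saved response, index into output.choices. The sketch below writes an abbreviated copy of the sample response above to response.json and reads choices[0].text with python3 (assumed to be installed). If you have jq, jq -r '.output.choices[0].text' response.json does the same.

```shell
# Abbreviated copy of the sample response shown above.
# In practice, response.json would come from curl's -o option.
cat > response.json <<'EOF'
{
  "status": "finished",
  "output": {
    "choices": [
      {"finish_reason": "length", "index": 0, "text": " Paris"}
    ],
    "result_type": "language-model-inference"
  }
}
EOF

# Read the text of the first choice.
TEXT=$(python3 -c 'import json, sys; print(json.load(sys.stdin)["output"]["choices"][0]["text"])' < response.json)
printf '[%s]\n' "$TEXT"   # brackets make the leading space visible
```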