REST API
Reference this guide to learn how to run inference using the REST API.
In this tutorial, you will use the REST API to run models by querying the RedPajama-INCITE-7B-Instruct model for the capital of France.
For the full list of request parameters, see the API Reference.
Pre-requisites
- Ensure you have curl installed on your machine.
- Create a free account with together.ai to obtain a Together API key.
Define the Endpoint URL and API Key
Launch your terminal. Define the endpoint URL and the API key used for authentication:
ENDPOINT_URL="https://api.together.xyz/inference"
API_KEY="YOUR_API_KEY"
You can find your API key in your account settings.
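It is easy to forget to replace the placeholder value. As a minimal sketch, the shell can refuse to continue while the key is empty or still set to YOUR_API_KEY. The function name require_api_key is hypothetical and is not part of the Together API.

```shell
# Hypothetical helper: succeed only if the key passed as $1 looks usable.
require_api_key() {
  if [ -z "$1" ] || [ "$1" = "YOUR_API_KEY" ]; then
    # Report the problem on stderr and signal failure to the caller.
    echo "API_KEY is empty or still the placeholder" >&2
    return 1
  fi
  return 0
}
```

Calling require_api_key "$API_KEY" before sending a request prints an error and returns a non-zero status if the key has not been filled in.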
Create your JSON-Formatted Object
The input to the API is a JSON-formatted object with all the request parameters.
- The model field in the request object specifies the model that you’d like to query. Specify togethercomputer/RedPajama-INCITE-7B-Instruct for this request.
- The prompt field contains a prompt for the model to complete. In this tutorial, you will use the prompt “Q: The capital of France is?\nA:” and will query the model for the most likely token that follows the prompt.
- The request object may contain a number of parameters to control its output (see the API reference). Set the temperature to 0.8 and max_tokens to 1. With max_tokens set to 1, the model will stop generating text after only one token.
{
"model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
"prompt": "Q: The capital of France is?\nA:",
"temperature": 0.8,
"top_p": 0.7,
"top_k": 50,
"max_tokens": 1,
"repetition_penalty": 1
}
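Quoting a long JSON string inline on the command line is error-prone, so one option is to keep the request object in a file. The sketch below writes the object to a file with a here-document. The function name write_payload and the file name request.json used in the note after it are illustrative assumptions, not names defined by the API.

```shell
# Hypothetical helper: write the request object to the file named in $1.
write_payload() {
  # The quoted 'EOF' delimiter stops the shell from expanding anything,
  # so the \n in the prompt reaches the file as a backslash followed by n,
  # which is the JSON escape for a newline.
  cat > "$1" <<'EOF'
{
  "model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
  "prompt": "Q: The capital of France is?\nA:",
  "temperature": 0.8,
  "top_p": 0.7,
  "top_k": 50,
  "max_tokens": 1,
  "repetition_penalty": 1
}
EOF
}
```

After running write_payload request.json, curl can read the request body from the file with -d @request.json instead of an inline string.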
Create the curl Request
To retrieve the capital of France, issue the following curl command, inserting your JSON-formatted object after -d below.
curl -X POST "$ENDPOINT_URL" \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d '{"model": "togethercomputer/RedPajama-INCITE-7B-Instruct", "prompt": "Q: The capital of France is?\nA:", "temperature": 0.8, "top_p": 0.7, "top_k": 50, "max_tokens": 1, "repetition_penalty": 1}'
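When the request is scripted rather than typed by hand, it helps to check whether it succeeded before reading the body. The following is a rough sketch only: the function name post_inference is hypothetical, it assumes the request object has been saved to a file, and it assumes a successful call returns HTTP status 200, which this guide does not state explicitly. The -sS, -o, -w and -d @file options are standard curl options.

```shell
# Hypothetical helper: POST the JSON file $1 and save the response body to $2.
# Assumes ENDPOINT_URL and API_KEY are already set in the shell.
post_inference() {
  local http_code
  # -sS: hide the progress meter but still print errors
  # -o:  write the response body to the file $2
  # -w:  print only the HTTP status code on stdout
  http_code=$(curl -sS -X POST "$ENDPOINT_URL" \
    -H "Authorization: Bearer $API_KEY" \
    -H "Content-Type: application/json" \
    -d @"$1" -o "$2" -w '%{http_code}') || return 1
  # Assumption: success is reported as HTTP 200.
  if [ "$http_code" != "200" ]; then
    echo "Request failed with HTTP status $http_code" >&2
    return 1
  fi
}
```

For example, post_inference request.json response.json would leave the response body in response.json and return a non-zero status if curl fails or the status code is not 200 (both file names are illustrative).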
Output
Your output should contain the input prompt, arguments, and model output:
{
"status": "finished",
"prompt": [
"Q: The capital of France is?\nA:"
],
"model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
"model_owner": "",
"tags": {},
"num_returns": 1,
"args": {
"model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
"prompt": "Q: The capital of France is?\nA:",
"temperature": 0.8,
"top_p": 0.7,
"top_k": 50,
"max_tokens": 1,
"repetition_penalty": 1
},
"subjobs": [],
"output": {
"choices": [
{
"finish_reason": "length",
"index": 0,
"text": " Paris"
}
],
"raw_compute_time": 0.03950854716822505,
"result_type": "language-model-inference"
}
}
The response’s output key contains the output of the model. The choices array contains the N-best responses from the model. In this example, the API returned one choice with the completion “Paris”.
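To pull just the completion out of a saved response, a JSON-aware tool is more reliable than text matching. The sketch below assumes jq is installed, which is not among this tutorial's prerequisites, and the function name extract_completion is hypothetical. The path .output.choices[0].text follows the response structure shown above.

```shell
# Hypothetical helper: print the text of the first choice in the response file $1.
# Assumes jq is available on the PATH.
extract_completion() {
  # -r prints the raw string rather than a JSON-quoted one.
  jq -r '.output.choices[0].text' "$1"
}
```

Run against the example response above, this prints the completion text, including the leading space the model emitted before Paris.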