REST API
Learn how to use the REST API for inference.
This tutorial covers how to use the REST API to run models. We will query the RedPajama-INCITE-7B-Instruct
model to find the capital of France. For the full API reference, see the API Reference.
Prerequisites
Ensure you have curl installed on your machine. Then launch your terminal and define your Together API key:
TOGETHER_API_KEY="YOUR_API_KEY"
Find your API token in your account settings.
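To check that the key is set up correctly, you can send a simple authenticated request. This is a minimal sketch, assuming a models listing endpoint at v1/models and that jq is installed for pretty-printing:
# List available models; a JSON response (rather than an auth error) confirms the key works.
curl -s "https://api.together.xyz/v1/models" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" | jq .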
Send the curl Request
We're going to send a POST request to api.together.xyz/v1/chat/completions with a JSON-formatted object that contains model (the model we want to query), messages (the content to send to the model), and additional parameters such as temperature (the randomness of the result) and max_tokens (the maximum number of output tokens).
curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
    "messages": [{"role": "user", "content": "Q: The capital of France is?\nA:"}],
    "temperature": 0.8,
    "max_tokens": 1
  }'
See the API Reference for all the possible parameters you can include. You can also find a full list of all the models offered here.
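If a request fails (for example, because of an invalid API key or model name), the API returns an error response instead of a completion. One way to see what happened is to print the HTTP status code alongside the body using curl's -w flag:
# Print the response body followed by the HTTP status code.
curl -s -w "\nHTTP status: %{http_code}\n" \
  -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
    "messages": [{"role": "user", "content": "Q: The capital of France is?\nA:"}],
    "max_tokens": 1
  }'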
Output
{
  "id": "85fd85280d4a8c54-EWR",
  "object": "chat.completion",
  "created": 1709677508,
  "model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
  "prompt": [],
  "choices": [
    {
      "finish_reason": "length",
      "logprobs": null,
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 1,
    "total_tokens": 13
  }
}
The response's choices array contains the N-best responses from the model, and each choice's message holds the generated content. In this example, because we set max_tokens to 1, the model returned a single token ("A") and stopped with a finish_reason of "length"; with a larger max_tokens it would complete the answer ("Paris").
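To pull just the generated text out of the response, you can pipe the JSON through jq (assuming jq is installed). Here max_tokens is raised to 32 so the answer is not truncated:
# Extract the assistant's message from the first choice.
curl -s -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
    "messages": [{"role": "user", "content": "Q: The capital of France is?\nA:"}],
    "temperature": 0.8,
    "max_tokens": 32
  }' | jq -r '.choices[0].message.content'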
Streaming tokens
If you want to stream the response back, add "stream_tokens": true to the request body.
curl -X POST "https://api.together.xyz/v1/chat/completions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "messages": [{"role": "user", "content": "Q: Tell me fun things to do in NYC"}],
    "temperature": 0.8,
    "max_tokens": 25,
    "stream_tokens": true
  }'
Streaming output
Instead of a single JSON response, the API returns a stream of Server-Sent Events, each with a JSON-encoded payload. For example, the curl command above might produce events like the following (a sketch for consuming the stream appears after the sample events):
data: {"choices":[{"index":0,"delta":{"content":" A"}}],"id":"85ffbb8a6d2c4340-EWR","token":{"id":330,"text":" A","logprob":1,"special":false},"finish_reason":null,"generated_text":null,"stats":null,"usage":null,"created":1709700707,"object":"chat.completion.chunk"}
data: {"choices":[{"index":0,"delta":{"content":":"}}],"id":"85ffbb8a6d2c4340-EWR","token":{"id":28747,"text":":","logprob":0,"special":false},"finish_reason":null,"generated_text":null,"stats":null,"usage":null,"created":1709700707,"object":"chat.completion.chunk"}
data: {"choices":[{"index":0,"delta":{"content":" Sure"}}],"id":"85ffbb8a6d2c4340-EWR","token":{"id":12875,"text":" Sure","logprob":-0.00724411,"special":false},"finish_reason":null,"generated_text":null,"stats":null,"usage":null,"created":1709700707,"object":"chat.completion.chunk"}