Streaming tokens
This guide shows how to enable streaming tokens in the Together API. Displaying tokens as the model generates them often provides a better user experience, because users start receiving feedback much sooner instead of waiting for the full response.
Enable streaming responses
Use the `stream_tokens` parameter to enable streaming responses. When `stream_tokens` is true, the API returns events as it generates the response instead of waiting for the entire response to finish.

For example, the curl command below sets `"stream_tokens"` to true in the request payload.
```shell
curl 'https://api.together.xyz/inference' -X POST \
  -H 'Authorization: Bearer API_KEY' \
  -H 'Content-Type: application/json' \
  -d @- <<'EOF'
{
  "model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
  "prompt": "Alan Turing was",
  "max_tokens": 128,
  "stop": ["\n\n"],
  "temperature": 0.7,
  "top_p": 0.7,
  "top_k": 50,
  "repetition_penalty": 1,
  "stream_tokens": true
}
EOF
```
- Replace `API_KEY` with your API key, which you can find in your user settings.
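The same request body can be built in Python before sending it with an HTTP client; the only field that switches between a regular response and a streamed one is `stream_tokens`. A minimal sketch (the `build_payload` helper is a hypothetical name, and the field values mirror the curl example above):

```python
def build_payload(prompt, stream=True):
    """Request body for the Together inference endpoint.

    Mirrors the curl example above; pass stream=False to request a
    single JSON response instead of server-sent events.
    """
    return {
        "model": "togethercomputer/RedPajama-INCITE-7B-Instruct",
        "prompt": prompt,
        "max_tokens": 128,
        "stop": ["\n\n"],
        "temperature": 0.7,
        "top_p": 0.7,
        "top_k": 50,
        "repetition_penalty": 1,
        "stream_tokens": stream,
    }
```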
Parse the Server-Sent Events from the API
Instead of a single JSON response, each event is encoded as a Server-Sent Event with a JSON payload. For example, the curl command above might produce the following events:
```
data: {"choices":[{"text":" a"}],"result_type":"language-model-inference","id":"e235a724408a86a5f408f437ca26239ddadf509e9dfdf359a645db08ee9a8682"}

data: {"choices":[{"text":" brilliant"}],"result_type":"language-model-inference","id":"e235a724408a86a5f408f437ca26239ddadf509e9dfdf359a645db08ee9a8682"}

data: {"choices":[{"text":" mathematic"}],"result_type":"language-model-inference","id":"e235a724408a86a5f408f437ca26239ddadf509e9dfdf359a645db08ee9a8682"}
```
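Each `data:` payload is plain JSON, so a single event can be decoded with the standard library; the generated text lives at `choices[0].text`. A minimal sketch using the first event above:

```python
import json

# The part after "data: " in the first event shown above.
event_data = (
    '{"choices":[{"text":" a"}],'
    '"result_type":"language-model-inference",'
    '"id":"e235a724408a86a5f408f437ca26239ddadf509e9dfdf359a645db08ee9a8682"}'
)

partial_result = json.loads(event_data)
token = partial_result["choices"][0]["text"]
print(repr(token))  # ' a'
```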
Parse the final message
The final message is the string `[DONE]`. Its payload is not JSON-encoded, so check for it explicitly instead of trying to decode it like the other events.
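If you read the stream without an SSE library, each message arrives on a line with a `data: ` prefix, and the `[DONE]` sentinel must be checked before decoding JSON. A sketch under that assumption, using a recorded list of lines in place of a live response (the payloads are shortened for illustration; real events also carry `result_type` and `id` fields):

```python
import json

# Stand-in for the decoded lines of a streaming response body;
# a real client would read these from the HTTP response as they arrive.
lines = [
    'data: {"choices":[{"text":" a"}]}',
    'data: {"choices":[{"text":" brilliant"}]}',
    'data: [DONE]',
]

tokens = []
for line in lines:
    if not line.startswith("data: "):
        continue  # ignore blank keep-alive lines and comments
    data = line[len("data: "):]
    if data == "[DONE]":
        break  # final message: a plain string, not JSON
    tokens.append(json.loads(data)["choices"][0]["text"])

print("".join(tokens))  # prints " a brilliant"
```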
Sample code
Putting it all together, the following Python script demonstrates how you might process streaming results using the Python `requests` and `sseclient-py` packages.
- Install dependencies:

```shell
pip install requests sseclient-py
```
- Run the following sample code:

```python
import json
import os

import requests
import sseclient

url = "https://api.together.xyz/inference"
model = "togethercomputer/RedPajama-INCITE-7B-Chat"
prompt = "Tell me a story\n\n"

print(f"Model: {model}")
print(f"Prompt: {repr(prompt)}")
print("Response:")
print()

payload = {
    "model": model,
    "prompt": prompt,
    "max_tokens": 512,
    "temperature": 0.7,
    "top_p": 0.7,
    "top_k": 50,
    "repetition_penalty": 1,
    "stream_tokens": True,
}
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
}

response = requests.post(url, json=payload, headers=headers, stream=True)
response.raise_for_status()

client = sseclient.SSEClient(response)
for event in client.events():
    if event.data == "[DONE]":
        break

    partial_result = json.loads(event.data)
    token = partial_result["choices"][0]["text"]
    print(token, end="", flush=True)
```
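The per-event logic in the loop above can be factored into a small generator so other code can consume tokens as they arrive. This is a sketch, not part of the official API: `token_stream` is a hypothetical helper name, and it only assumes each event object exposes the `.data` attribute that `sseclient-py` events provide.

```python
import json

def token_stream(events):
    """Yield token text from an iterable of SSE events, stopping at [DONE].

    Works with sseclient.SSEClient(response).events(), or any iterable of
    objects whose .data attribute holds the JSON payload of one event.
    """
    for event in events:
        if event.data == "[DONE]":
            return  # final message: stop iteration, nothing to decode
        partial_result = json.loads(event.data)
        yield partial_result["choices"][0]["text"]
```

With the client from the script above, the printing loop then reduces to `for token in token_stream(client.events()): print(token, end="", flush=True)`.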