For real-time applications where time-to-first-byte (TTFB) is critical, use streaming mode. Streaming returns a sequence of server-sent events containing base64-encoded audio chunks, so playback can start before generation finishes.
For the lowest possible interactive latency (and bidirectional text input), see the WebSocket API.
Streaming audio
from together import Together

client = Together()

# Save the streamed audio to a file
with client.audio.speech.with_streaming_response.create(
    model="canopylabs/orpheus-3b-0.1-ft",
    input="The quick brown fox jumps over the lazy dog",
    voice="tara",
    stream=True,
    response_format="raw",  # Required for streaming
    response_encoding="pcm_s16le",  # 16-bit PCM for clean audio
) as response:
    response.stream_to_file("speech_streaming.pcm")
When stream is true, the API returns a stream of server-sent events, each carrying a JSON payload on a data: line.
Audio chunk:
data: {"type":"conversation.item.audio_output.delta","item_id":"tts_1","delta":"<base64_encoded_audio>"}
Word timestamps (when alignment=word):
data: {"type":"conversation.item.word_timestamps","words":["Hello","world"],"start_seconds":[0.0,0.4],"end_seconds":[0.4,0.8]}
Stream end:
When streaming is enabled, only raw (PCM) format is supported. For non-streaming requests, you can use mp3, wav, or raw.
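The events above can be decoded with a few lines of Python. The following is a minimal sketch, not the SDK's own implementation: the function name decode_sse_events and the SAMPLE_LINES data are hypothetical, and the code assumes each event arrives as a single data: line whose payload is JSON shaped like the examples shown. Payloads that are not JSON objects (for example, whatever marker ends the stream) are skipped rather than interpreted.

```python
import base64
import json
from typing import Iterable, List, Tuple

# Event type names copied from the examples in this guide.
AUDIO_DELTA = "conversation.item.audio_output.delta"
WORD_TIMESTAMPS = "conversation.item.word_timestamps"

WordTiming = Tuple[str, float, float]  # (word, start_seconds, end_seconds)


def decode_sse_events(lines: Iterable[str]) -> Tuple[bytes, List[WordTiming]]:
    """Collect PCM bytes and word timings from SSE `data:` lines (hypothetical helper)."""
    audio = bytearray()
    words: List[WordTiming] = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # assumption: non-JSON payloads carry no audio
        if not isinstance(event, dict):
            continue
        kind = event.get("type")
        if kind == AUDIO_DELTA:
            # Each delta is a base64-encoded chunk of raw PCM audio.
            audio.extend(base64.b64decode(event["delta"]))
        elif kind == WORD_TIMESTAMPS:
            # The three arrays are parallel: one entry per word.
            words.extend(
                zip(event["words"], event["start_seconds"], event["end_seconds"])
            )
    return bytes(audio), words


# Made-up sample events for illustration; "AAABAA==" is 4 bytes of PCM.
SAMPLE_LINES = [
    'data: {"type":"conversation.item.audio_output.delta","item_id":"tts_1","delta":"AAABAA=="}',
    "",
    'data: {"type":"conversation.item.word_timestamps","words":["Hello","world"],'
    '"start_seconds":[0.0,0.4],"end_seconds":[0.4,0.8]}',
]

if __name__ == "__main__":
    pcm, timings = decode_sse_events(SAMPLE_LINES)
    print(len(pcm), timings)
```

With the requests library, the lines could come from response.iter_lines(decode_unicode=True) on a request made with stream=True. That wiring is also an assumption and should be checked against the API reference.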
Output raw bytes
If you want to extract raw audio bytes (for example, to feed into a custom audio pipeline), use the settings below.
import os

import requests

url = "https://api.together.ai/v1/audio/speech"
api_key = os.environ.get("TOGETHER_API_KEY")
headers = {"Authorization": f"Bearer {api_key}"}
data = {
    "input": "This is a test of raw PCM audio output.",
    "voice": "tara",
    "response_format": "raw",
    "response_encoding": "pcm_s16le",
    "sample_rate": 24000,
    "stream": False,
    "model": "canopylabs/orpheus-3b-0.1-ft",
}

response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # Fail early instead of writing an error body to disk

# Write the raw PCM bytes exactly as returned
with open("output_raw.pcm", "wb") as f:
    f.write(response.content)

print("Raw PCM audio saved to output_raw.pcm")
print(f" Size: {len(response.content)} bytes")
This writes the raw bytes to an output_raw.pcm file.
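A raw PCM file has no header, so most audio players cannot open it directly. One option is to wrap the bytes in a WAV container with Python's standard wave module. The sketch below assumes 16-bit little-endian samples (pcm_s16le) at the 24000 Hz sample rate requested above, and it assumes mono output, which this guide does not state. The helper name pcm_to_wav is hypothetical.

```python
import wave


def pcm_to_wav(
    pcm_bytes: bytes,
    wav_path: str,
    sample_rate: int = 24000,
    channels: int = 1,  # assumption: mono audio
) -> float:
    """Wrap raw pcm_s16le bytes in a WAV header and return the duration in seconds."""
    sample_width = 2  # pcm_s16le uses 2 bytes per sample
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    # Duration follows from byte count / (bytes per frame * frames per second).
    return len(pcm_bytes) / (sample_width * channels * sample_rate)


if __name__ == "__main__":
    with open("output_raw.pcm", "rb") as f:
        seconds = pcm_to_wav(f.read(), "output_raw.wav")
    print(f"Wrote output_raw.wav ({seconds:.2f} s)")
```

If the resulting audio plays too fast, too slow, or as noise, the sample rate or channel count assumed here probably does not match what the API returned.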
See also