Text-to-speech streaming - Together AI docs

For real-time applications where time-to-first-byte (TTFB) is critical, use streaming mode. Streaming returns a sequence of server-sent events containing base64-encoded audio chunks, so playback can start before generation finishes. For the lowest possible interactive latency (and bidirectional text input), see the WebSocket API.

Streaming audio

from together import Together

client = Together()

# Save the streamed audio to a file
with client.audio.speech.with_streaming_response.create(
    model="canopylabs/orpheus-3b-0.1-ft",
    input="The quick brown fox jumps over the lazy dog",
    voice="tara",
    stream=True,
    response_format="raw",  # Required for streaming
    response_encoding="pcm_s16le",  # 16-bit PCM for clean audio
) as response:
    response.stream_to_file("speech_streaming.pcm")
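The file written above is headerless PCM, which most audio players will not open directly. A minimal sketch of wrapping it in a WAV container with Python's standard `wave` module follows. The helper name `pcm_to_wav` is hypothetical, and the defaults of 24000 Hz and one channel are assumptions; set them to whatever your request actually produced.

```python
import wave


def pcm_to_wav(pcm_bytes, wav_path, sample_rate=24000, channels=1):
    """Wrap headerless 16-bit little-endian PCM in a WAV container.

    sample_rate and channels are assumptions; they must match the audio
    returned by the API or playback will be too fast or too slow.
    """
    with wave.open(wav_path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(2)  # pcm_s16le uses 2 bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
```

For example, reading `speech_streaming.pcm` in binary mode and passing its contents to `pcm_to_wav(data, "speech_streaming.wav")` should yield a file that standard players can open, provided the assumed sample rate is correct.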

Streaming response format

When stream is set to true, the API returns a stream of server-sent events. Each event is a data: line carrying a JSON payload, except for the final [DONE] marker.

Audio chunk:

data: {"type":"conversation.item.audio_output.delta","item_id":"tts_1","delta":"<base64_encoded_audio>"}

Word timestamps (when alignment=word):

data: {"type":"conversation.item.word_timestamps","words":["Hello","world"],"start_seconds":[0.0,0.4],"end_seconds":[0.4,0.8]}

Stream end:

data: [DONE]

When streaming is enabled, only raw (PCM) format is supported. For non-streaming requests, you can use mp3, wav, or raw.
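If you consume the event stream yourself rather than through the SDK, the events shown above can be decoded along these lines. This is a sketch, not the SDK's implementation: the function name `decode_sse_audio` is hypothetical, and it assumes each event arrives as a single `data:` line in the shapes listed above.

```python
import base64
import json


def decode_sse_audio(lines):
    """Collect PCM bytes and word timestamps from an iterable of SSE lines.

    Assumes one JSON payload per "data:" line, as in the examples above.
    """
    audio = bytearray()
    words = []
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        event = json.loads(payload)
        event_type = event.get("type")
        if event_type == "conversation.item.audio_output.delta":
            # Each delta is a base64-encoded chunk of raw audio
            audio.extend(base64.b64decode(event["delta"]))
        elif event_type == "conversation.item.word_timestamps":
            words.extend(
                zip(event["words"], event["start_seconds"], event["end_seconds"])
            )
    return bytes(audio), words
```

With the `requests` library, one way to feed this function is to send the request with `stream=True` and pass `response.iter_lines()` as `lines`. For real-time playback you would hand each decoded chunk to the audio device as it arrives instead of accumulating it.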

Output raw bytes

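If you want to extract raw audio bytes (for example, to feed into a custom audio pipeline), send a non-streaming request with response_format set to raw and specify the encoding and sample rate explicitly, as below.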

import requests
import os

url = "https://api.together.ai/v1/audio/speech"
api_key = os.environ.get("TOGETHER_API_KEY")

headers = {"Authorization": f"Bearer {api_key}"}

data = {
    "input": "This is a test of raw PCM audio output.",
    "voice": "tara",
    "response_format": "raw",
    "response_encoding": "pcm_s16le",
    "sample_rate": 24000,
    "stream": False,
    "model": "canopylabs/orpheus-3b-0.1-ft",
}

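response = requests.post(url, headers=headers, json=data)
response.raise_for_status()  # Fail early instead of saving an error body as audio

# The response body is the raw PCM audio itself
with open("output_raw.pcm", "wb") as f:
    f.write(response.content)

print("Raw PCM audio saved to output_raw.pcm")
print(f"   Size: {len(response.content)} bytes")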

This writes the raw bytes to a file named output_raw.pcm.
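Before handing the bytes to a custom pipeline, it is often useful to turn them into integer samples and check the clip length. The sketch below does that for `pcm_s16le` data using only the standard library. The helper name `decode_pcm_s16le` is hypothetical, and mono output is an assumption; the 24000 Hz default mirrors the `sample_rate` in the request above.

```python
import array
import sys


def decode_pcm_s16le(pcm_bytes, sample_rate=24000, channels=1):
    """Return (samples, duration_seconds) for 16-bit little-endian PCM.

    channels=1 is an assumption; sample_rate should match the request.
    """
    usable = len(pcm_bytes) - (len(pcm_bytes) % 2)  # ignore a trailing odd byte
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(pcm_bytes[:usable])
    if sys.byteorder == "big":
        samples.byteswap()  # the data is little-endian
    duration = len(samples) / (sample_rate * channels)
    return samples, duration
```

Each second of mono audio at 24000 Hz takes 48000 bytes, so the byte count printed above gives a quick sanity check on the returned duration.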
