For real-time applications where time-to-first-byte (TTFB) is critical, use streaming mode. Streaming returns a sequence of server-sent events containing base64-encoded audio chunks, so playback can start before generation finishes. For the lowest possible interactive latency (and bidirectional text input), see the WebSocket API.
Streaming audio
Streaming response format
When stream is set to true, the API returns a stream of server-sent events. The stream carries audio chunk events and, when alignment=word is requested, additional alignment events.
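A minimal sketch of consuming such a stream in Python is shown here, since the event handling is easier to see in code. It takes already-received event lines, keeps only the data lines, and base64-decodes the audio payload. The field name b64, the [DONE] end marker, and the helper name iter_audio_chunks are assumptions made for illustration; check the actual event schema before relying on them.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_audio_chunks(lines: Iterable[str], audio_field: str = "b64") -> Iterator[bytes]:
    """Yield decoded audio bytes from server-sent event lines.

    `audio_field` is a hypothetical name for the JSON key that holds the
    base64-encoded audio; adjust it to match the real event schema.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream marker
            break
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore payloads that are not JSON
        chunk = event.get(audio_field) if isinstance(event, dict) else None
        if chunk:  # events without audio (e.g. alignment data) are skipped
            yield base64.b64decode(chunk)
```

With an HTTP client that exposes the response line by line, the decoded chunks can be passed to an audio player or appended to a buffer as they arrive, which is what lets playback begin before generation finishes.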
When streaming is enabled, only raw (PCM) format is supported. For non-streaming requests, you can use mp3, wav, or raw.
Output raw bytes
If you want to extract raw audio bytes (for example, to feed into a custom audio pipeline), request the raw (PCM) format and save the returned bytes, for example to a test2.pcm file.
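A raw PCM file has no header, so most players cannot open it directly. The sketch below wraps a file such as test2.pcm in a WAV container using Python's standard wave module. The audio parameters (16-bit samples, mono, 24000 Hz) and the helper name pcm_to_wav are assumptions for illustration; they must match the encoding and sample rate actually requested.

```python
import wave


def pcm_to_wav(pcm_path: str, wav_path: str, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> int:
    """Wrap headerless PCM data in a WAV container and return the frame count.

    The defaults (24000 Hz, mono, 2 bytes per sample) are assumed values,
    not confirmed API defaults.
    """
    with open(pcm_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)  # bytes per sample
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return len(pcm) // (channels * sample_width)
```

For example, pcm_to_wav("test2.pcm", "test2.wav") produces a file that standard audio tools can play. If the audio sounds too fast, too slow, or like noise, the assumed sample rate or sample width most likely does not match the stream.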
See also
- Text-to-speech overview for parameters, response formats, voices, and pricing.
- WebSocket API for the lowest-latency, bidirectional streaming option.