Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
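As a minimal sketch, the header described above could be assembled in Python as follows. The helper name `build_auth_headers` and the environment variable `TOGETHER_API_KEY` are assumptions made for illustration; this page only specifies the `Bearer <token>` form.

```python
import os


def build_auth_headers(token: str) -> dict[str, str]:
    """Return request headers carrying the token in the documented Bearer form."""
    if not token:
        raise ValueError("an auth token is required")
    return {"Authorization": f"Bearer {token}"}


# Assumption: the token is stored in an environment variable with this name.
# The fallback value is a placeholder so the snippet runs without configuration.
headers = build_auth_headers(os.environ.get("TOGETHER_API_KEY", "example-token"))
```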
Body
The name of the model to query. See all of Together AI's chat models. The currently supported TTS models are:
- cartesia/sonic
- hexgrad/Kokoro-82M
- canopylabs/orpheus-3b-0.1-ft
Example: "canopylabs/orpheus-3b-0.1-ft"
Input text to generate the audio for
The voice to use for generating the audio. The supported voices differ by model. For example, canopylabs/orpheus-3b-0.1-ft supports tara, hexgrad/Kokoro-82M supports af_alloy, and cartesia/sonic supports "friendly sidekick", among others. To list the voices a model supports, call the /v1/voices endpoint with the model name as the query parameter.
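The voice lookup described above might be prepared like this. It is a sketch under stated assumptions: the base URL and the query parameter name `model` are guesses for illustration, since this page names only the /v1/voices path and says the model name is sent as the query parameter.

```python
from urllib.parse import urlencode

# Assumption: illustrative base URL; this page does not state the host.
BASE_URL = "https://api.together.xyz"


def voices_url(model: str, base_url: str = BASE_URL) -> str:
    """Build the URL that lists the voices supported by a given model."""
    # Assumption: the query parameter is called "model".
    # urlencode percent-encodes the "/" in model names such as hexgrad/Kokoro-82M.
    query = urlencode({"model": model})
    return f"{base_url}/v1/voices?{query}"


print(voices_url("hexgrad/Kokoro-82M"))
```

The resulting URL would be requested with the same Authorization header used for the main endpoint.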
The format of the audio output. When streaming is false, the supported formats are mp3, wav, and raw. When streaming is true, the only supported format is raw.
Available options: mp3, wav, raw
Language of the input text.
Available options: en, de, fr, es, hi, it, ja, ko, nl, pl, pt, ru, sv, tr, zh
Audio encoding of the response.
Available options: pcm_f32le, pcm_s16le, pcm_mulaw, pcm_alaw
Sampling rate to use for the output audio. The default sampling rate is 24000 Hz for canopylabs/orpheus-3b-0.1-ft and hexgrad/Kokoro-82M, and 44100 Hz for cartesia/sonic.
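To make the per-model defaults concrete, the sketch below resolves the effective sampling rate and uses it to estimate the duration of raw audio. The function names are hypothetical, and the duration estimate assumes single-channel (mono) output, which this page does not state.

```python
from typing import Optional

# Default sampling rates as listed above.
DEFAULT_SAMPLE_RATES = {
    "canopylabs/orpheus-3b-0.1-ft": 24000,
    "hexgrad/Kokoro-82M": 24000,
    "cartesia/sonic": 44100,
}

# Bytes per sample implied by each encoding name.
BYTES_PER_SAMPLE = {
    "pcm_f32le": 4,  # 32-bit float, little-endian
    "pcm_s16le": 2,  # 16-bit signed integer, little-endian
    "pcm_mulaw": 1,  # 8-bit mu-law
    "pcm_alaw": 1,   # 8-bit A-law
}


def resolve_sample_rate(model: str, requested: Optional[int] = None) -> int:
    """Return the requested rate, or the model's documented default."""
    if requested is not None:
        return requested
    return DEFAULT_SAMPLE_RATES[model]


def raw_duration_seconds(num_bytes: int, model: str, encoding: str,
                         sample_rate: Optional[int] = None) -> float:
    """Estimate the duration of raw audio. Assumption: one channel (mono)."""
    rate = resolve_sample_rate(model, sample_rate)
    return num_bytes / (rate * BYTES_PER_SAMPLE[encoding])
```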
If true, the output is streamed a few characters at a time instead of being returned after the full response is ready. The stream terminates with data: [DONE]. If false, the encoded audio is returned as an octet stream.
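The streaming rules above can be sketched as two small helpers: one that checks the format constraint before a request is sent, and one that reads stream lines until the terminator. This assumes the stream arrives as server-sent-event style lines prefixed with `data:`; the contents of each payload are not described on this page, so they are passed through as opaque strings. Both function names are hypothetical.

```python
from typing import Iterable, Iterator

DONE_SENTINEL = "[DONE]"
NON_STREAMING_FORMATS = {"mp3", "wav", "raw"}


def check_format(response_format: str, stream: bool) -> None:
    """Raise ValueError if the format is not allowed for the chosen mode."""
    if stream and response_format != "raw":
        raise ValueError("streaming supports only the raw format")
    if not stream and response_format not in NON_STREAMING_FORMATS:
        raise ValueError(f"unsupported format: {response_format}")


def iter_stream_payloads(lines: Iterable[str]) -> Iterator[str]:
    """Yield each data payload until the [DONE] terminator is seen."""
    for line in lines:
        line = line.strip()
        # Assumption: payload lines start with "data:"; other lines are skipped.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == DONE_SENTINEL:
            return
        yield payload
```

In practice `lines` would be the decoded lines of the HTTP response body; any iterable of strings works for testing.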
Response
OK
The response is of type file.
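Putting the pieces together, a non-streaming request might be prepared and its file response saved roughly as follows. This is a sketch, not the documented client: the endpoint URL and the JSON field names (`model`, `input`, `voice`, `response_format`, `stream`) are assumptions, because this page lists the parameter descriptions without their names or the request path.

```python
import json
import urllib.request
from pathlib import Path

# Assumption: illustrative endpoint URL; not stated on this page.
SPEECH_URL = "https://api.together.xyz/v1/audio/speech"


def build_speech_request(token: str, model: str, text: str, voice: str,
                         response_format: str = "mp3") -> urllib.request.Request:
    """Prepare (but do not send) a POST request for non-streaming audio."""
    # Assumption: these JSON field names are hypothetical.
    body = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "stream": False,
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_audio(data: bytes, path: str) -> int:
    """Write the returned audio bytes to a file and return the byte count."""
    return Path(path).write_bytes(data)


# Usage (performs a network call, so it is left commented out):
# req = build_speech_request("<token>", "canopylabs/orpheus-3b-0.1-ft", "Hello", "tara")
# with urllib.request.urlopen(req) as resp:
#     save_audio(resp.read(), "speech.mp3")
```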