POST /audio/transcriptions
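from together import Together
import os

# Read the API key from the environment rather than hard-coding it.
client = Together(
    api_key=os.environ.get("TOGETHER_API_KEY"),
)

# Open the audio in binary mode; the context manager closes it after the request.
with open("audio.wav", "rb") as file:
    response = client.audio.transcriptions.create(
        model="openai/whisper-large-v3",
        file=file,
    )

print(response.text)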
{
  "text": "Hello, world!"
}

Authorizations

Authorization
string
header
default:default
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
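
For callers who build raw HTTP requests instead of using the SDK, the header described above can be sketched as follows. The helper name `bearer_headers` is hypothetical, and reading the token from the `TOGETHER_API_KEY` environment variable is an assumption carried over from the SDK example on this page.

```python
import os
from typing import Dict


def bearer_headers(env_var: str = "TOGETHER_API_KEY") -> Dict[str, str]:
    """Build the Authorization header in the form 'Bearer <token>'.

    Hypothetical helper: the environment variable name is an assumption.
    """
    token = os.environ.get(env_var)
    if not token:
        # Fail early instead of sending a request with an empty token.
        raise RuntimeError(f"{env_var} is not set")
    return {"Authorization": f"Bearer {token}"}
```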

Body

multipart/form-data
file
required

Audio file to transcribe, provided as a file upload or a public HTTP/HTTPS URL. Supported formats: .wav, .mp3, .m4a, .webm, .flac.
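
A local check before upload can catch an unsupported format early. This is a minimal sketch that assumes the extension list above is exhaustive and that the extension reflects the actual encoding; `is_supported_audio` is a hypothetical helper, not part of the Together SDK.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats listed in the `file` parameter description.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".webm", ".flac"}


def is_supported_audio(source: str) -> bool:
    """Return True if a local path or HTTP/HTTPS URL ends in a listed extension."""
    parsed = urlparse(source)
    # For URLs, inspect only the path so query strings are ignored.
    path = parsed.path if parsed.scheme in ("http", "https") else source
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

The check is case-insensitive, so a name such as `clip.MP3` passes.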

model
enum<string>
default:openai/whisper-large-v3

Model to use for transcription

Available options:
openai/whisper-large-v3
language
string
default:en

Optional ISO 639-1 language code. If "auto" is provided, the language is auto-detected.

Example:

"en"

prompt
string

Optional text to bias decoding.

response_format
enum<string>
default:json

The format of the response

Available options:
json,
verbose_json
temperature
number
default:0

Sampling temperature between 0.0 and 1.0

Required range: 0 <= x <= 1
timestamp_granularities
default:segment

Controls the level of timestamp detail in verbose_json responses. Only used when response_format is verbose_json. Accepts a single granularity or an array to return multiple levels.

Available options:
segment,
word
Example:
["word", "segment"]
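
The optional parameters above interact: timestamp_granularities only takes effect with verbose_json, and temperature must stay within its range. The sketch below collects and validates these options before a request. `build_transcription_options` is a hypothetical helper, and dropping the granularities for non-verbose formats is a design choice made here, not documented API behavior.

```python
from typing import Any, Dict, List, Optional, Union

RESPONSE_FORMATS = {"json", "verbose_json"}
GRANULARITIES = {"segment", "word"}


def build_transcription_options(
    language: str = "en",
    response_format: str = "json",
    temperature: float = 0.0,
    timestamp_granularities: Optional[Union[str, List[str]]] = None,
) -> Dict[str, Any]:
    """Validate optional request fields against the documented constraints."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(RESPONSE_FORMATS)}")
    if not 0 <= temperature <= 1:
        raise ValueError("temperature must satisfy 0 <= x <= 1")

    options: Dict[str, Any] = {
        "language": language,
        "response_format": response_format,
        "temperature": temperature,
    }

    # Granularities are only used with verbose_json, so omit them otherwise.
    if timestamp_granularities is not None and response_format == "verbose_json":
        requested = (
            [timestamp_granularities]
            if isinstance(timestamp_granularities, str)
            else list(timestamp_granularities)
        )
        unknown = set(requested) - GRANULARITIES
        if unknown:
            raise ValueError(f"unknown granularities: {sorted(unknown)}")
        options["timestamp_granularities"] = requested

    return options
```

Assuming the SDK accepts keyword arguments with the same names as the request body fields, the result could be passed as `client.audio.transcriptions.create(model=..., file=..., **options)`.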
diarize
boolean
default:false

Whether to enable speaker diarization. When enabled, each entry in the words array of the response includes a speaker_id. The response also includes a speaker_segments array, in which each segment carries its speaker_id, its start and end times, and the words spoken in that segment.

For example:

...
"speaker_segments": [
  {
    "speaker_id": "SPEAKER_00",
    "start": 0,
    "end": 30.02,
    "words": [
      {
        "id": 0,
        "word": "Tijana",
        "start": 0,
        "end": 11.475,
        "speaker_id": "SPEAKER_00"
      },
      ...
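
To show how the diarization output might be consumed, the sketch below turns a speaker_segments array into one line per speaker turn. It assumes the segments have already been parsed into plain Python dictionaries with the field names shown in the example above; `speaker_turns` is a hypothetical helper, and the sample data contains only the single word visible in that excerpt.

```python
from typing import Any, Dict, List


def speaker_turns(speaker_segments: List[Dict[str, Any]]) -> List[str]:
    """Format each speaker segment as 'SPEAKER [start-end]: words'."""
    turns = []
    for segment in speaker_segments:
        # Join the words of the segment in the order they were returned.
        text = " ".join(w["word"].strip() for w in segment.get("words", []))
        turns.append(
            f'{segment["speaker_id"]} '
            f'[{segment["start"]:.2f}-{segment["end"]:.2f}]: {text}'
        )
    return turns


# Sample trimmed from the documented excerpt (elided entries left out).
sample_segments = [
    {
        "speaker_id": "SPEAKER_00",
        "start": 0,
        "end": 30.02,
        "words": [
            {"id": 0, "word": "Tijana", "start": 0, "end": 11.475,
             "speaker_id": "SPEAKER_00"},
        ],
    }
]

for line in speaker_turns(sample_segments):
    print(line)  # SPEAKER_00 [0.00-30.02]: Tijana
```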

Response

OK

text
string
required

The transcribed text

Example:

"Hello, world!"