POST /audio/transcriptions
from together import Together
import os

# Create a client authenticated with your Together API key.
client = Together(
    api_key=os.environ.get("TOGETHER_API_KEY"),
)

# Open the audio file in binary mode and request a transcription.
with open("audio.wav", "rb") as file:
    response = client.audio.transcriptions.create(
        model="openai/whisper-large-v3",
        file=file,
    )

print(response.text)
Example response:

{
  "text": "Hello, world!"
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
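For reference, here is a minimal sketch of sending this header directly over HTTP with the requests library; the base URL https://api.together.xyz/v1 is an assumption, not stated in this section:

import os
import requests

# Assumed endpoint URL; adjust if your deployment uses a different base.
url = "https://api.together.xyz/v1/audio/transcriptions"

# Bearer authentication header of the form "Bearer <token>".
headers = {"Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}"}

with open("audio.wav", "rb") as f:
    resp = requests.post(
        url,
        headers=headers,
        files={"file": f},                          # multipart/form-data file part
        data={"model": "openai/whisper-large-v3"},  # other form fields
    )

print(resp.json()["text"])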

Body

multipart/form-data
file
required

Audio file to transcribe, provided either as a file upload or as a public HTTP/HTTPS URL. Supported formats: .wav, .mp3, .m4a, .webm, .flac.
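A sketch of the URL form, assuming the SDK accepts a plain string here (an assumption based on the parameter description above; the URL is hypothetical):

# Hypothetical public URL; replace with a real, reachable audio file.
response = client.audio.transcriptions.create(
    model="openai/whisper-large-v3",
    file="https://example.com/audio.wav",
)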

model
enum<string>
default:openai/whisper-large-v3

Model to use for transcription

Available options:
openai/whisper-large-v3

language
string
default:en

Optional ISO 639-1 language code. If auto is provided, the language is auto-detected.

Example:

"en"

prompt
string

Optional text to bias decoding, e.g. to guide the model toward particular spellings or style.

response_format
enum<string>
default:json

The format of the response

Available options:
json,
verbose_json

temperature
number
default:0

Sampling temperature between 0.0 and 1.0

Required range: 0 <= x <= 1

timestamp_granularities
default:segment

Controls the level of timestamp detail in verbose_json output. Only used when response_format is verbose_json. Can be a single granularity or an array to get multiple levels.

Available options:
segment,
word

Example:
["word", "segment"]

Response

OK

text
string
required

The transcribed text

Example:

"Hello, world!"