Learn how to transcribe and translate audio into text!
Together AI provides comprehensive audio transcription and translation capabilities powered by OpenAI’s Whisper models. This guide covers everything you need to know to integrate speech-to-text functionality into your applications.
First, register for an account to get an API key. New accounts come with $1 to get started. Once you’ve registered, set your account’s API key to an environment variable named TOGETHER_API_KEY:
Shell
export TOGETHER_API_KEY=xxxxx
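Before creating a client, it can help to confirm that the variable is actually visible to your Python process. The following is a minimal sketch using only the standard library; the helper name `require_api_key` is hypothetical and is not part of the Together SDK.

```python
import os


def require_api_key(env=None, name="TOGETHER_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper for illustration; not part of the Together SDK.
    """
    # Default to the real process environment; a dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; run `export {name}=...` in your shell first."
        )
    return key
```

Calling `require_api_key()` at startup surfaces a missing key immediately, rather than on the first API request.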
Install your preferred library
Together provides official libraries for Python and TypeScript:
pip install together
Run your first transcription
Here’s how to get started with basic transcription and translation:
from together import Together

# Initialize the client
client = Together()

# Basic transcription
response = client.audio.transcriptions.create(
    file="path/to/audio.mp3",
    model="openai/whisper-large-v3",
    language="en",
)
print(response.text)

# Basic translation
response = client.audio.translations.create(
    file="path/to/foreign_audio.mp3",
    model="openai/whisper-large-v3",
)
print(response.text)
Use prompts to improve transcription accuracy for specific contexts:
# Transcription with a context prompt
response = client.audio.transcriptions.create(
    file="medical_consultation.mp3",
    model="openai/whisper-large-v3",
    language="en",
    prompt="This is a medical consultation discussing patient symptoms, diagnosis, and treatment options.",
)

# Translation with a context prompt
response = client.audio.translations.create(
    file="business_meeting_spanish.mp3",
    model="openai/whisper-large-v3",
    prompt="This is a business meeting discussing quarterly sales results.",
)
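If you reuse prompts across many files, one option is to assemble them from a context sentence plus a list of domain terms. This is only a sketch: `build_prompt` is a hypothetical helper, the `max_chars` cap is an arbitrary illustrative value rather than a documented limit, and whether listing terms improves accuracy is an assumption worth testing on your own audio.

```python
def build_prompt(context, terms=(), max_chars=500):
    """Combine a context sentence with optional domain terms.

    Hypothetical helper: `max_chars` is an arbitrary illustrative cap,
    not a documented API limit.
    """
    seen = set()
    unique_terms = []
    for term in terms:
        cleaned = term.strip()
        # Skip blanks and case-insensitive duplicates, keeping first spelling.
        if cleaned and cleaned.lower() not in seen:
            seen.add(cleaned.lower())
            unique_terms.append(cleaned)

    prompt = context.strip()
    if unique_terms:
        prompt = f"{prompt} Terms that may appear: {', '.join(unique_terms)}."
    # Truncate so an overly long term list cannot grow the prompt unbounded.
    return prompt[:max_chars]
```

The returned string can then be passed as the `prompt` argument in the calls shown above.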
JSON Format (Default)
Returns only the transcribed/translated text:
Python
response = client.audio.transcriptions.create(
    file="audio.mp3",
    model="openai/whisper-large-v3",
    response_format="json",
)
print(response.text)  # "Hello, this is a test recording."
Verbose JSON Format
Returns detailed information including timestamps:
response = client.audio.transcriptions.create(
    file="audio.mp3",
    model="openai/whisper-large-v3",
    response_format="verbose_json",
    timestamp_granularities="segment",
)

# Access segments with timestamps
for segment in response.segments:
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s]: {segment.text}")
Example Output:
Text
[0.11s - 10.85s]: Call is now being recorded. Parker Scarves, how may I help you? Online for my wife, and it turns out they shipped the wrong... Oh, I am so sorry, sir. I got it for her birthday, which is tonight, and now I'm not 100% sure what I need to do. Okay, let me see if I can help. Do you have the item number of the Parker Scarves? I don't think so. Call the New Yorker, I... Excellent. What color do...
[10.88s - 21.73s]: Blue. The one they shipped was light blue. I wanted the darker one. What's the difference? The royal blue is a bit brighter. What zip code are you located in? One nine.
[22.04s - 32.62s]: Karen's Boutique, Termall. Is that close? I'm in my office. Okay, um, what is your name, sir? Charlie. Charlie Johnson. Is that J-O-H-N-S-O-N? And Mr. Johnson, do you have the Parker scarf in light blue with you now? I do. They shipped it to my office. It came in not that long ago. What I will do is make arrangements with Karen's Boutique for...
[32.62s - 41.03s]: you to Parker Scarf at no additional cost. And in addition, I was able to look up your order in our system, and I'm going to send out a special gift to you to make up for the inconvenience. Thank you. You're welcome. And thank you for calling Parker Scarf, and I hope your wife enjoys her birthday gift. Thank you. You're very welcome. Goodbye.
[43.50s - 44.20s]: you
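Segment timestamps like these map naturally onto subtitle cues. As a rough sketch, the code below converts objects exposing `.start`, `.end`, and `.text` into SubRip (SRT) text. The helpers `format_srt_time` and `segments_to_srt` are hypothetical names, and the demo uses stand-in objects rather than a real API response.

```python
from types import SimpleNamespace


def format_srt_time(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render objects with .start, .end, and .text as numbered SRT cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_srt_time(seg.start)} --> {format_srt_time(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)


# Stand-in data shaped like the segments above; with a real response,
# pass response.segments instead.
demo = [
    SimpleNamespace(start=0.11, end=10.85, text=" Call is now being recorded."),
    SimpleNamespace(start=10.88, end=21.73, text=" Blue."),
]
print(segments_to_srt(demo))
```

Long segments such as the first one in the example output would usually need to be split further before being shown as on-screen captions.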
# Request word-level timestamps instead of segment-level ones
response = client.audio.transcriptions.create(
    file="audio.mp3",
    model="openai/whisper-large-v3",
    response_format="verbose_json",
    timestamp_granularities="word",
)

print(f"Text: {response.text}")
print(f"Language: {response.language}")
print(f"Duration: {response.duration}s")
print(f"Task: {response.task}")

# Access individual words with timestamps
if response.words:
    for word in response.words:
        print(f"'{word.word}' [{word.start:.2f}s - {word.end:.2f}s]")
Example Output:
Text
Text: It is certain that Jack Pumpkinhead might have had a much finer house to live in.
Language: en
Duration: 7.2562358276643995s
Task: None
'It' [0.00s - 0.36s]
'is' [0.42s - 0.47s]
'certain' [0.51s - 0.74s]
'that' [0.79s - 0.86s]
'Jack' [0.90s - 1.11s]
'Pumpkinhead' [1.15s - 1.66s]
'might' [1.81s - 2.00s]
'have' [2.04s - 2.13s]
'had' [2.16s - 2.26s]
'a' [2.30s - 2.32s]
'much' [2.36s - 2.48s]
'finer' [2.54s - 2.74s]
'house' [2.78s - 2.93s]
'to' [2.96s - 3.03s]
'live' [3.07s - 3.21s]
'in.' [3.26s - 7.27s]
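Word-level timestamps make it possible to break a transcript into phrases wherever the speaker pauses. The sketch below groups consecutive words and starts a new group when the silence between them exceeds a threshold. `group_words_by_pause` is a hypothetical helper, the default `max_gap` is an arbitrary illustrative value, and the demo words are stand-ins copied from the output above.

```python
from types import SimpleNamespace


def group_words_by_pause(words, max_gap=0.3):
    """Group consecutive words into phrases separated by pauses.

    A new group starts when the silence between one word's end and the
    next word's start exceeds `max_gap` seconds. Hypothetical helper;
    the default threshold is an arbitrary illustrative value.
    """
    groups = []
    current = []
    for word in words:
        if current and word.start - current[-1].end > max_gap:
            groups.append(current)
            current = []
        current.append(word)
    if current:
        groups.append(current)
    # Each result is (phrase start, phrase end, joined text).
    return [
        (group[0].start, group[-1].end, " ".join(w.word for w in group))
        for group in groups
    ]


# Stand-in data taken from the example output; with a real response,
# pass response.words instead.
demo_words = [
    SimpleNamespace(word="Jack", start=0.90, end=1.11),
    SimpleNamespace(word="Pumpkinhead", start=1.15, end=1.66),
    SimpleNamespace(word="might", start=1.81, end=2.00),
    SimpleNamespace(word="have", start=2.04, end=2.13),
]
for start, end, text in group_words_by_pause(demo_words, max_gap=0.1):
    print(f"[{start:.2f}s - {end:.2f}s]: {text}")
```

Note that the final word in the example output ends at 7.27s, well after it starts, so the end time of the last word may include trailing silence.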