Using a coding agent? Install the together-audio skill to let your agent write correct speech-to-text code automatically. See agent skills for details.
Together AI hosts speech recognition models including OpenAI’s Whisper and Voxtral for batch transcription and real-time streaming.
Want to hear it in action? Call (847) 851-4323 to talk to a live voice agent powered by Together AI’s real-time STT and TTS pipeline. Then read the end-to-end guide to build your own.

Quickstart

Basic transcription and translation:
from pathlib import Path

from together import Together

client = Together()

# Basic transcription

response = client.audio.transcriptions.create(
    file=Path("audio.mp3"),
    model="openai/whisper-large-v3",
    language="en",
)
print(response.text)

# Basic translation

response = client.audio.translations.create(
    file=Path("foreign_audio.mp3"),
    model="openai/whisper-large-v3",
)
print(response.text)

Available models

For the current list of speech-to-text models, see the serverless catalog or the dedicated endpoint model catalog.

Audio transcription

Audio transcription is speech-to-text in the same language as the source audio.
from pathlib import Path

from together import Together

client = Together()

response = client.audio.transcriptions.create(
    file=Path("meeting_recording.mp3"),
    model="openai/whisper-large-v3",
    language="en",
    response_format="json",
)

print(f"Transcription: {response.text}")
The API supports the following audio formats:
  • .wav (audio/wav)
  • .mp3 (audio/mpeg)
  • .m4a (audio/mp4)
  • .webm (audio/webm)
  • .flac (audio/flac)
  • .ogg (audio/ogg)
  • .opus (audio/opus)
  • .aac (audio/aac)
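Checking a file's extension against this list client-side lets you fail fast before uploading. A minimal sketch; the `SUPPORTED_EXTENSIONS` set and helper below are illustrative, not part of the SDK:

```python
from pathlib import Path

# Extensions accepted by the transcription API (from the list above).
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".webm", ".flac", ".ogg", ".opus", ".aac"}

def is_supported_audio(path: Path) -> bool:
    """Return True if the file extension is one the API accepts."""
    return path.suffix.lower() in SUPPORTED_EXTENSIONS

print(is_supported_audio(Path("meeting_recording.mp3")))  # True
print(is_supported_audio(Path("video.mkv")))              # False
```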

Input methods

Path object

Python
from pathlib import Path

response = client.audio.transcriptions.create(
    file=Path("recordings/interview.wav"),
    model="openai/whisper-large-v3",
)

File-like object

Python
with open("audio.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        file=audio_file,
        model="openai/whisper-large-v3",
    )

Remote URL

The Python SDK doesn’t accept a string URL on file=. To transcribe a remote file, download it first or use the CLI:
Shell
together audio transcribe https://example.com/audio.mp3 \
  --model openai/whisper-large-v3
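If you'd rather stay in Python, one approach is to download the remote file to a temporary local path first and then pass that path to the SDK. A sketch using only the standard library; the URL and helper name are placeholders, not SDK features:

```python
import tempfile
import urllib.parse
import urllib.request
from pathlib import Path

def download_audio(url: str) -> Path:
    """Download a remote audio file to a temporary local path, keeping its extension."""
    suffix = Path(urllib.parse.urlparse(url).path).suffix or ".mp3"
    fd, name = tempfile.mkstemp(suffix=suffix)
    with urllib.request.urlopen(url) as src, open(fd, "wb") as dst:
        dst.write(src.read())
    return Path(name)

# local_path = download_audio("https://example.com/audio.mp3")
# response = client.audio.transcriptions.create(
#     file=local_path,
#     model="openai/whisper-large-v3",
# )
```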

Language support

Specify the audio language using ISO 639-1 language codes:
from pathlib import Path

response = client.audio.transcriptions.create(
    file=Path("spanish_audio.mp3"),
    model="openai/whisper-large-v3",
    language="es",  # Spanish
)
Common language codes:
  • "en": English.
  • "es": Spanish.
  • "fr": French.
  • "de": German.
  • "ja": Japanese.
  • "zh": Chinese.
  • "auto": Auto-detect (default).
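If your application lets users pick a language by name, a small lookup keeps the API call tidy and falls back to auto-detection for anything unrecognized. An illustrative sketch; the dictionary below is our own and not exhaustive:

```python
# Map human-readable names to the ISO 639-1 codes listed above (illustrative subset).
LANGUAGE_CODES = {
    "english": "en",
    "spanish": "es",
    "french": "fr",
    "german": "de",
    "japanese": "ja",
    "chinese": "zh",
}

def to_iso_code(name: str) -> str:
    """Return the ISO 639-1 code for a language name, falling back to auto-detect."""
    return LANGUAGE_CODES.get(name.strip().lower(), "auto")

print(to_iso_code("Spanish"))  # es
print(to_iso_code("Klingon"))  # auto
```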

Custom prompts

Use prompts to improve transcription accuracy for specific contexts.
Prompts are supported only on Whisper-family models (for example, openai/whisper-large-v3). Other STT models (for example, nvidia/parakeet-tdt-0.6b-v3) accept the field for API compatibility but ignore it.
from pathlib import Path

response = client.audio.transcriptions.create(
    file=Path("medical_consultation.mp3"),
    model="openai/whisper-large-v3",
    language="en",
    prompt="This is a medical consultation discussing patient symptoms, diagnosis, and treatment options.",
)
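A common pattern is to assemble the prompt from known domain vocabulary so the model is biased toward the correct spellings of jargon and proper nouns. A hedged sketch; the helper name and wording are our own, not part of the SDK:

```python
def build_domain_prompt(topic: str, terms: list[str]) -> str:
    """Compose a short context prompt that lists expected domain vocabulary."""
    return f"This is a {topic} recording. Expected terms: {', '.join(terms)}."

prompt = build_domain_prompt(
    "medical consultation",
    ["hypertension", "metformin", "A1C"],
)
print(prompt)
# Pass the result as prompt= in client.audio.transcriptions.create(...)
```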

Next steps