# Transcribe audio

> Transcribe and translate audio into text.

<Tip>
  Using a coding agent? Install the [together-audio](https://github.com/togethercomputer/skills/tree/main/skills/together-audio) skill to let your agent write correct speech-to-text code automatically. See [agent skills](/docs/agent-skills) for details.
</Tip>

Together AI hosts speech recognition models, including OpenAI's Whisper and Mistral's Voxtral, for batch transcription and real-time streaming.

<Tip>
  Want to hear it in action? Call **(847) 851-4323** to talk to a live voice agent powered by Together AI's real-time STT and TTS pipeline. Then read the [end-to-end guide](/docs/how-to-build-phone-voice-agent) to build your own.
</Tip>

## Quickstart

Basic transcription and translation:

<CodeGroup>
  ```python Python theme={null}
  from pathlib import Path

  from together import Together

  client = Together()

  # Basic transcription
  response = client.audio.transcriptions.create(
      file=Path("audio.mp3"),
      model="openai/whisper-large-v3",
      language="en",
  )
  print(response.text)

  # Basic translation
  response = client.audio.translations.create(
      file=Path("foreign_audio.mp3"),
      model="openai/whisper-large-v3",
  )
  print(response.text)
  ```

  ```typescript TypeScript theme={null}
  import { createReadStream } from 'fs';
  import Together from 'together-ai';

  const together = new Together();

  // Basic transcription
  const transcription = await together.audio.transcriptions.create({
    file: createReadStream('audio.mp3'),
    model: 'openai/whisper-large-v3',
    language: 'en',
  });
  console.log(transcription.text);

  // Basic translation
  const translation = await together.audio.translations.create({
    file: createReadStream('foreign_audio.mp3'),
    model: 'openai/whisper-large-v3',
  });
  console.log(translation.text);
  ```

  ```bash cURL theme={null}
  # Use -F for each field. Append ;type=<format> to the file field so the
  # server knows the audio format. Common values:
  #   audio/mpeg  → .mp3
  #   audio/wav   → .wav
  #   audio/mp4   → .m4a
  #   audio/webm  → .webm
  #   audio/flac  → .flac

  # Transcription (MP3)
  curl -X POST "https://api.together.ai/v1/audio/transcriptions" \
       -H "Authorization: Bearer $TOGETHER_API_KEY" \
       -F "file=@audio.mp3;type=audio/mpeg" \
       -F "model=openai/whisper-large-v3" \
       -F "language=en" \
       -F "response_format=json"

  # Translation (MP3)
  curl -X POST "https://api.together.ai/v1/audio/translations" \
       -H "Authorization: Bearer $TOGETHER_API_KEY" \
       -F "file=@foreign_audio.mp3;type=audio/mpeg" \
       -F "model=openai/whisper-large-v3"

  # Transcription (WAV)
  curl -X POST "https://api.together.ai/v1/audio/transcriptions" \
       -H "Authorization: Bearer $TOGETHER_API_KEY" \
       -F "file=@audio.wav;type=audio/wav" \
       -F "model=openai/whisper-large-v3"
  ```

  ```bash Shell theme={null}
  # Transcription
  together audio transcribe audio.mp3 \
    --model openai/whisper-large-v3 \
    --language en

  # Translation
  together audio translate foreign_audio.mp3 \
    --model openai/whisper-large-v3
  ```
</CodeGroup>

## Available models

For the current list of speech-to-text models, see the [serverless catalog](/docs/serverless/models) or the [dedicated endpoint model catalog](/docs/dedicated-endpoints/models).

## Audio transcription

Audio transcription converts speech to text in the same language as the source audio.

<CodeGroup>
  ```python Python theme={null}
  from pathlib import Path

  from together import Together

  client = Together()

  response = client.audio.transcriptions.create(
      file=Path("meeting_recording.mp3"),
      model="openai/whisper-large-v3",
      language="en",
      response_format="json",
  )

  print(f"Transcription: {response.text}")
  ```

  ```typescript TypeScript theme={null}
  import { createReadStream } from 'fs';
  import Together from 'together-ai';

  const together = new Together();

  const response = await together.audio.transcriptions.create({
    file: createReadStream('meeting_recording.mp3'),
    model: 'openai/whisper-large-v3',
    language: 'en',
    response_format: 'json',
  });

  console.log(`Transcription: ${response.text}`);
  ```

  ```bash Shell theme={null}
  together audio transcribe meeting_recording.mp3 \
    --model openai/whisper-large-v3 \
    --language en \
    --response-format json
  ```
</CodeGroup>

The API supports the following audio formats:

* `.wav` (audio/wav)
* `.mp3` (audio/mpeg)
* `.m4a` (audio/mp4)
* `.webm` (audio/webm)
* `.flac` (audio/flac)
* `.ogg` (audio/ogg)
* `.opus` (audio/opus)
* `.aac` (audio/aac)

### Input methods

#### Path object

```python Python theme={null}
from pathlib import Path

from together import Together

client = Together()

response = client.audio.transcriptions.create(
    file=Path("recordings/interview.wav"),
    model="openai/whisper-large-v3",
)
```

#### File-like object

```python Python theme={null}
from together import Together

client = Together()

with open("audio.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        file=audio_file,
        model="openai/whisper-large-v3",
    )
```

#### Remote URL

The Python SDK does not accept a URL string for the `file` parameter. To transcribe a remote file, download it first or use the CLI:

```bash Shell theme={null}
together audio transcribe https://example.com/audio.mp3 \
  --model openai/whisper-large-v3
```

### Language support

Specify the audio language using ISO 639-1 language codes:

<CodeGroup>
  ```python Python theme={null}
  from pathlib import Path

  from together import Together

  client = Together()

  response = client.audio.transcriptions.create(
      file=Path("spanish_audio.mp3"),
      model="openai/whisper-large-v3",
      language="es",  # Spanish
  )
  ```
</CodeGroup>

Common language codes:

* `"en"`: English.
* `"es"`: Spanish.
* `"fr"`: French.
* `"de"`: German.
* `"ja"`: Japanese.
* `"zh"`: Chinese.
* `"auto"`: Auto-detect (default).
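
If language values come from user input, a small client-side check can catch obvious mistakes before the request is sent. The sketch below is an assumption about what is useful, not a description of the API's own validation: `normalize_language` is a hypothetical helper, and it only checks that the value has the two-letter shape of an ISO 639-1 code, not that the code is assigned or supported by a given model.

```python
import re
from typing import Optional

# ISO 639-1 codes are exactly two lowercase ASCII letters.
_ISO_639_1_SHAPE = re.compile(r"[a-z]{2}")


def normalize_language(code: Optional[str] = None) -> str:
    """Return a cleaned language value, falling back to "auto" when none is given."""
    if code is None:
        return "auto"
    cleaned = code.strip().lower()
    if cleaned == "auto" or _ISO_639_1_SHAPE.fullmatch(cleaned):
        return cleaned
    raise ValueError(
        f"Expected a two-letter ISO 639-1 code or 'auto', got {code!r}"
    )


print(normalize_language(" ES "))  # es
```

The result can be passed as `language=normalize_language(user_value)` in the transcription call.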

### Custom prompts

Use prompts to improve transcription accuracy for specific contexts.

<Note>
  Prompts are supported only on Whisper-family models (for example, `openai/whisper-large-v3`). Other STT models (for example, `nvidia/parakeet-tdt-0.6b-v3`) accept the field for API compatibility but ignore it.
</Note>

<CodeGroup>
  ```python Python theme={null}
  from pathlib import Path

  from together import Together

  client = Together()

  response = client.audio.transcriptions.create(
      file=Path("medical_consultation.mp3"),
      model="openai/whisper-large-v3",
      language="en",
      prompt="This is a medical consultation discussing patient symptoms, diagnosis, and treatment options.",
  )
  ```

  ```bash Shell theme={null}
  together audio transcribe medical_consultation.mp3 \
    --model openai/whisper-large-v3 \
    --language en \
    --prompt "This is a medical consultation discussing patient symptoms, diagnosis, and treatment options."
  ```
</CodeGroup>
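
When the same domain vocabulary recurs across many recordings, the prompt can be assembled from a glossary rather than written by hand. This is a rough sketch under stated assumptions: `build_prompt` is a hypothetical helper, the "Key terms" phrasing is one possible style, and the `max_words` cap is an arbitrary illustrative limit, not a documented one.

```python
def build_prompt(context: str, terms: list[str], max_words: int = 150) -> str:
    """Combine a context sentence with a de-duplicated glossary, capped at max_words words."""
    # dict.fromkeys preserves order while dropping repeated terms.
    unique_terms = list(dict.fromkeys(t.strip() for t in terms if t.strip()))
    prompt = context.strip()
    if unique_terms:
        prompt += " Key terms: " + ", ".join(unique_terms) + "."
    # Truncate by whitespace-separated words; the cap is an assumption, see lead-in.
    return " ".join(prompt.split()[:max_words])


prompt = build_prompt(
    "This is a cardiology consultation.",
    ["atrial fibrillation", "metoprolol", "metoprolol"],
)
print(prompt)
# This is a cardiology consultation. Key terms: atrial fibrillation, metoprolol.
```

The returned string can be passed as the `prompt` argument shown in the example above. Whether listing terms improves accuracy depends on the model and the audio, so it is worth comparing transcripts with and without the glossary.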

## Next steps

* [Streaming transcription](/docs/inference/transcription/streaming): real-time WebSocket transcription for low-latency applications.
* [Audio translation](/docs/inference/transcription/translation): translate speech in any language to English text.
* [Transcription features](/docs/inference/transcription/features): speaker diarization, word-level timestamps, response formats, async support, and best practices.
