Inference billing works differently depending on your deployment mode:
- Serverless models bill based on usage, with no minimums and no provisioning cost.
- Dedicated endpoints bill per minute, at rates that depend on the hardware you reserve.
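As a rough illustration of the per-minute model, the cost of a dedicated endpoint can be estimated from how long it stays reserved. This is a minimal sketch; the rate used here is hypothetical, since actual pricing depends on the hardware you choose:

```python
# Hypothetical per-minute rate, for illustration only; real dedicated-endpoint
# rates depend on the reserved hardware.
RATE_PER_MINUTE = 0.05  # USD per minute (assumed)

def dedicated_cost(minutes_reserved: float) -> float:
    """Estimate the cost of a dedicated endpoint billed per minute."""
    return minutes_reserved * RATE_PER_MINUTE

# e.g. an endpoint kept up for an 8-hour workday:
print(f"${dedicated_cost(8 * 60):.2f}")
```

Because billing is per minute of reservation rather than per request, the cost is the same whether the endpoint is busy or idle.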
Serverless models
You pay per unit of work, with units determined by model type:
- Chat, language, embedding, and rerank: Per input and output token.
- Image generation: Per megapixel of output.
- Video generation: Per second of output.
- Speech-to-text and text-to-speech: Per second of audio.
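The serverless unit types above can be sketched as a single usage-to-cost calculation. The unit prices below are placeholders, not real rates; consult the pricing page for each model's actual per-unit cost:

```python
# Hypothetical unit prices (USD), for illustration only.
PRICE = {
    "input_token": 0.20 / 1_000_000,   # chat/language/embedding/rerank input
    "output_token": 0.60 / 1_000_000,  # chat/language output
    "megapixel": 0.01,                 # image generation, per megapixel
    "video_second": 0.05,              # video generation, per second
    "audio_second": 0.0005,            # speech-to-text / text-to-speech, per second
}

def serverless_cost(usage: dict) -> float:
    """Sum cost across billed units, e.g. {"input_token": 5000, "output_token": 800}."""
    return sum(PRICE[unit] * qty for unit, qty in usage.items())

# A chat request that consumed 5,000 input tokens and produced 800 output tokens:
chat_cost = serverless_cost({"input_token": 5_000, "output_token": 800})
```

Since serverless billing has no minimums, a month with zero usage costs nothing.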