Log probabilities (logprobs) are the per-token log probabilities the model assigns while generating a response. Use them to measure how confident the model is in each token, gate low-confidence outputs, or inspect the alternatives the model considered. Common applications include classification, autocomplete ranking, retrieval evaluation, and content moderation.
Enable logprobs
Pass logprobs: 1 on a chat completion request:
Python
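A minimal request sketch using the Together Python SDK; the model name and prompt are illustrative, and the client reads TOGETHER_API_KEY from the environment:

```python
from together import Together

client = Together()  # uses the TOGETHER_API_KEY environment variable

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",  # illustrative model
    messages=[
        {"role": "user", "content": "The Statue of Liberty is in New"},
    ],
    logprobs=1,  # request per-token log probabilities
)
```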
The response includes a logprobs object on each choice. Its content field is a list with one entry per output token, each holding the chosen token, its raw bytes, and a logprob. The top_logprobs field on each entry surfaces the alternatives the model considered:
JSON
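A sketch of the response shape (values illustrative, following the fields described above):

```json
{
  "choices": [
    {
      "logprobs": {
        "content": [
          {
            "token": "New",
            "bytes": [78, 101, 119],
            "logprob": -0.1133,
            "top_logprobs": [
              {"token": "New", "logprob": -0.1133},
              {"token": "The", "logprob": -2.41}
            ]
          },
          {
            "token": " York",
            "bytes": [32, 89, 111, 114, 107],
            "logprob": -2.026558e-6,
            "top_logprobs": [
              {"token": " York", "logprob": -2.026558e-6}
            ]
          }
        ]
      }
    }
  ]
}
```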
Convert logprobs to probabilities
To get a probability between 0 and 1, take the exponential of the logprob:
Python
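For instance, with an illustrative logprob of -0.1133:

```python
import math

logprob = -0.1133  # illustrative per-token logprob
probability = math.exp(logprob)
print(round(probability, 4))  # ≈ 0.8929
```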
" York") has a logprob of -2.026558e-6, which converts to roughly 0.999998. The model was effectively certain about " York" once it had committed to "New".
Read the per-token logprob from completion.choices[0].logprobs.content[i]["logprob"].
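A sketch of iterating those entries; the content list below is a hand-built stand-in for completion.choices[0].logprobs.content:

```python
import math

# Hand-built stand-in for completion.choices[0].logprobs.content
content = [
    {"token": "New", "logprob": -0.1133},
    {"token": " York", "logprob": -2.026558e-6},
]

for entry in content:
    probability = math.exp(entry["logprob"])
    print(f"{entry['token']!r}: {probability:.6f}")
```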
Route by confidence
A common pattern is to run a fast, cheap model first, then escalate to a larger model only when the cheap one isn’t confident. Logprobs let you measure that confidence per response. The example below classifies an email into one of four categories. If the cheap model’s confidence falls below a threshold, the application can re-run the request on a stronger model.
Python
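A sketch of the decision logic, with the model calls stubbed out; the threshold is illustrative, and confidence is computed as the joint probability of the generated tokens (exponential of the summed logprobs):

```python
import math

CONFIDENCE_THRESHOLD = 0.9  # illustrative cutoff


def confidence_from_logprobs(logprobs: list[float]) -> float:
    """Joint probability of the generated tokens: exp of the summed logprobs."""
    return math.exp(sum(logprobs))


def should_escalate(logprobs: list[float],
                    threshold: float = CONFIDENCE_THRESHOLD) -> bool:
    """True when the cheap model's answer is not confident enough to keep."""
    return confidence_from_logprobs(logprobs) < threshold


# Clear-cut email: the category token came back with a near-zero logprob.
print(should_escalate([-0.01]))           # → False, keep the cheap answer
# Ambiguous email: the category token had probability ~0.55.
print(should_escalate([math.log(0.55)]))  # → True, escalate to a stronger model
```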
In this example the cheap model classifies the email as work with confidence ≈ 0.99, well above the threshold. For ambiguous emails the same model often returns confidence in the 0.5 to 0.7 range, which is the signal to escalate.
When not to use logprobs
- Open-ended generation: Logprobs measure token-level certainty, not whether the response is correct. A confident wrong answer is still wrong.
- Long outputs: The first few tokens often dominate the meaning of a classification or routing decision. Logprobs deeper in a long response are noisier and less actionable.
- Cross-model comparison: Logprob magnitudes aren’t directly comparable across model families. A 0.7 confidence from one model isn’t the same as 0.7 from another.