Skip to main content

Documentation Index

Fetch the complete documentation index at: https://docs.together.ai/llms.txt

Use this file to discover all available pages before exploring further.

Create

Create a new model evaluation job. For the full list of supported models, see Supported Models.
tg evals create

Parameters

FlagDescription
--type [classify|score|compare]Type of evaluation to create.
required
--judge-model [string]Name or URL of the judge model to use for evaluation.
required
--judge-model-source [serverless|dedicated|external]Source of the judge model.
required
--judge-system-template [string]System template for the judge model.
required
--input-data-file-path [string]Path to the input data file.
required
--judge-external-api-token [string]API token for an external judge model. Pass an empty string ("") when --judge-model-source is serverless or dedicated.
required
--judge-external-base-url [string]Base URL for an external judge model. Pass an empty string ("") when --judge-model-source is serverless or dedicated.
required
--model-field [string]Name of the field in the input file containing text generated by the model. Mutually exclusive with --model-to-evaluate and the other detailed-config flags below.
--model-to-evaluate [string]Model name when using the detailed config.
--model-to-evaluate-source [serverless|dedicated|external]Source of the model to evaluate.
--model-to-evaluate-external-api-token [string]Optional external API token for the model to evaluate.
--model-to-evaluate-external-base-url [string]Optional external base URL for the model to evaluate.
--model-to-evaluate-max-tokens [integer]Max tokens for the model to evaluate.
--model-to-evaluate-temperature [float]Temperature for the model to evaluate.
--model-to-evaluate-system-template [string]System template for the model to evaluate.
--model-to-evaluate-input-template [string]Input template for the model to evaluate.
--labels [string]Comma-separated list of classification labels.
--pass-labels [string]Comma-separated list of labels considered as passing. Required for the classify type.
--min-score [float]Minimum score value. Required for the score type.
--max-score [float]Maximum score value. Required for the score type.
--pass-threshold [float]Threshold score for passing. Required for the score type.
--model-a-field [string]Name of the field in the input file containing text generated by model A. Mutually exclusive with --model-a and the other model-A flags below.
--model-a [string]Model name or URL for model A when using the detailed config.
--model-a-source [serverless|dedicated|external]Source of model A.
--model-a-external-api-token [string]Optional external API token for model A.
--model-a-external-base-url [string]Optional external base URL for model A.
--model-a-max-tokens [integer]Max tokens for model A.
--model-a-temperature [float]Temperature for model A.
--model-a-system-template [string]System template for model A.
--model-a-input-template [string]Input template for model A.
--model-b-field [string]Name of the field in the input file containing text generated by model B. Mutually exclusive with --model-b and the other model-B flags below.
--model-b [string]Model name or URL for model B when using the detailed config.
--model-b-source [serverless|dedicated|external]Source of model B.
--model-b-external-api-token [string]Optional external API token for model B.
--model-b-external-base-url [string]Optional external base URL for model B.
--model-b-max-tokens [integer]Max tokens for model B.
--model-b-temperature [float]Temperature for model B.
--model-b-system-template [string]System template for model B.
--model-b-input-template [string]Input template for model B.
--disable-position-bias-correctionSkip the flipped-order judge pass and run only a single judge pass (original order). Halves judge cost and latency at the expense of position-bias correction. Default: off (two-pass mode).

List

List all eval jobs.
tg evals list

Parameters

FlagDescription
--status [pending|queued|running|completed|error|user_error]Filter by job status.
--limit [integer]Limit number of results (max 100).
--after [string]Pagination cursor.

Retrieve

Get the details for a specific evaluation job.
tg evals retrieve [EVALUATION_ID]

Status

Get the status and results of a specific evaluation job.
tg evals status [EVALUATION_ID]