Setup

See our Getting Started guide for initial setup.

Create

The Together AI Evaluations service is a framework for using an LLM as a judge to evaluate other LLMs and their outputs. It supports three evaluation types: classify, score, and compare.
Shell
together evals create [OPTIONS]

Options

--type [classify|score|compare]
    Type of evaluation to create. [required]
--judge-model TEXT
    Name or URL of the judge model to use for evaluation. [required]
--judge-model-source [serverless|dedicated|external]
    Source of the judge model. [required]
--judge-external-api-token TEXT
    Optional external API token for the judge model.
--judge-external-base-url TEXT
    Optional external base URL for the judge model.
--judge-system-template TEXT
    System template for the judge model. [required]
--input-data-file-path TEXT
    Path to the input data file. [required]
--model-field TEXT
    Name of the field in the input file containing text generated by the model. Cannot be used when --model-to-evaluate and other model config parameters are specified.
--model-to-evaluate TEXT
    Model name when using the detailed config.
--model-to-evaluate-source [serverless|dedicated|external]
    Source of the model to evaluate.
--model-to-evaluate-external-api-token TEXT
    Optional external API token for the model to evaluate.
--model-to-evaluate-external-base-url TEXT
    Optional external base URL for the model to evaluate.
--model-to-evaluate-max-tokens INTEGER
    Max tokens for the model to evaluate.
--model-to-evaluate-temperature FLOAT
    Temperature for the model to evaluate.
--model-to-evaluate-system-template TEXT
    System template for the model to evaluate.
--model-to-evaluate-input-template TEXT
    Input template for the model to evaluate.
--labels TEXT
    Classification labels, as a comma-separated list.
--pass-labels TEXT
    Comma-separated list of labels considered as passing (required for classify type).
--min-score FLOAT
    Minimum score value (required for score type).
--max-score FLOAT
    Maximum score value (required for score type).
--pass-threshold FLOAT
    Threshold score for passing (required for score type).
--model-a-field TEXT
    Name of the field in the input file containing text generated by model A. Cannot be used when --model-a and other model config parameters are specified.
--model-a TEXT
    Model name or URL for model A when using the detailed config.
--model-a-source [serverless|dedicated|external]
    Source of model A.
--model-a-external-api-token TEXT
    Optional external API token for model A.
--model-a-external-base-url TEXT
    Optional external base URL for model A.
--model-a-max-tokens INTEGER
    Max tokens for model A.
--model-a-temperature FLOAT
    Temperature for model A.
--model-a-system-template TEXT
    System template for model A.
--model-a-input-template TEXT
    Input template for model A.
--model-b-field TEXT
    Name of the field in the input file containing text generated by model B. Cannot be used when --model-b and other model config parameters are specified.
--model-b TEXT
    Model name or URL for model B when using the detailed config.
--model-b-source [serverless|dedicated|external]
    Source of model B.
--model-b-external-api-token TEXT
    Optional external API token for model B.
--model-b-external-base-url TEXT
    Optional external base URL for model B.
--model-b-max-tokens INTEGER
    Max tokens for model B.
--model-b-temperature FLOAT
    Temperature for model B.
--model-b-system-template TEXT
    System template for model B.
--model-b-input-template TEXT
    Input template for model B.
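As a sketch of how these options combine, a classify-type evaluation with a serverless judge might look like the following. The judge model name, system template, labels, data path, and field name are illustrative placeholders, not values taken from this reference.

```shell
# Hypothetical classify evaluation: the judge labels each row's
# "response" field as helpful or unhelpful, and only "helpful"
# counts as passing. All values below are placeholders.
together evals create \
  --type classify \
  --judge-model meta-llama/Llama-3.3-70B-Instruct-Turbo \
  --judge-model-source serverless \
  --judge-system-template "Label the response as helpful or unhelpful." \
  --input-data-file-path ./eval_input.jsonl \
  --model-field response \
  --labels "helpful,unhelpful" \
  --pass-labels "helpful"
```

Note that this uses --model-field to evaluate pre-generated text from the input file; to have the service generate responses instead, you would drop --model-field and supply the --model-to-evaluate options.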

List

Shell
together evals list [OPTIONS]

Options

--status [pending|queued|running|completed|error|user_error]
    Filter by job status.
--limit INTEGER
    Limit number of results (max 100).
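For example, to list only finished jobs (the status value and limit here are arbitrary choices, not defaults):

```shell
# Show up to ten evaluation jobs that have completed.
together evals list --status completed --limit 10
```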

Retrieve

Get the details of a specific evaluation job.
Shell
together evals retrieve EVALUATION_ID

Status

Get the status and results of a specific evaluation job.
Shell
together evals status EVALUATION_ID
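Since jobs pass through statuses such as pending, queued, and running before completing, one pattern is to poll the status command until the job settles, then retrieve the details. This is a sketch only: eval-1234 is a placeholder ID, and grepping the command's text output for status keywords is an assumption about its output format.

```shell
# Hypothetical polling loop; eval-1234 is a placeholder evaluation ID.
# Re-check every 30 seconds while the job is still in flight.
while together evals status eval-1234 | grep -qE 'pending|queued|running'; do
  sleep 30
done

# Once the job has settled, fetch its full details.
together evals retrieve eval-1234
```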