Setup

See our Getting Started guide for initial setup.

Create

The Together AI Evaluations service is a framework for using an LLM as a judge to evaluate other LLMs and their outputs. It supports three evaluation types: classify, score, and compare.
Shell
together evals create [OPTIONS]

Options

--type [classify|score|compare]
    Type of evaluation to create. [required]
--judge-model TEXT
    Name or URL of the judge model to use for evaluation. [required]
--judge-model-source [serverless|dedicated|external]
    Source of the judge model. [required]
--judge-external-api-token TEXT
    Optional external API token for the judge model.
--judge-external-base-url TEXT
    Optional external base URL for the judge model.
--judge-system-template TEXT
    System template for the judge model. [required]
--input-data-file-path TEXT
    Path to the input data file. [required]
--model-field TEXT
    Name of the field in the input file containing text generated by the model. Cannot be used when --model-to-evaluate and other model config parameters are specified.
--model-to-evaluate TEXT
    Model name when using the detailed config.
--model-to-evaluate-source [serverless|dedicated|external]
    Source of the model to evaluate.
--model-to-evaluate-external-api-token TEXT
    Optional external API token for the model to evaluate.
--model-to-evaluate-external-base-url TEXT
    Optional external base URL for the model to evaluate.
--model-to-evaluate-max-tokens INTEGER
    Max tokens for the model to evaluate.
--model-to-evaluate-temperature FLOAT
    Temperature for the model to evaluate.
--model-to-evaluate-system-template TEXT
    System template for the model to evaluate.
--model-to-evaluate-input-template TEXT
    Input template for the model to evaluate.
--labels TEXT
    Classification labels, as a comma-separated list.
--pass-labels TEXT
    Comma-separated list of labels considered as passing (required for classify type).
--min-score FLOAT
    Minimum score value (required for score type).
--max-score FLOAT
    Maximum score value (required for score type).
--pass-threshold FLOAT
    Threshold score for passing (required for score type).
--model-a-field TEXT
    Name of the field in the input file containing text generated by model A. Cannot be used when --model-a and other model config parameters are specified.
--model-a TEXT
    Model name or URL for model A when using the detailed config.
--model-a-source [serverless|dedicated|external]
    Source of model A.
--model-a-external-api-token TEXT
    Optional external API token for model A.
--model-a-external-base-url TEXT
    Optional external base URL for model A.
--model-a-max-tokens INTEGER
    Max tokens for model A.
--model-a-temperature FLOAT
    Temperature for model A.
--model-a-system-template TEXT
    System template for model A.
--model-a-input-template TEXT
    Input template for model A.
--model-b-field TEXT
    Name of the field in the input file containing text generated by model B. Cannot be used when --model-b and other model config parameters are specified.
--model-b TEXT
    Model name or URL for model B when using the detailed config.
--model-b-source [serverless|dedicated|external]
    Source of model B.
--model-b-external-api-token TEXT
    Optional external API token for model B.
--model-b-external-base-url TEXT
    Optional external base URL for model B.
--model-b-max-tokens INTEGER
    Max tokens for model B.
--model-b-temperature FLOAT
    Temperature for model B.
--model-b-system-template TEXT
    System template for model B.
--model-b-input-template TEXT
    Input template for model B.
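As a sketch of how these options combine, a classify-type evaluation with a serverless judge might look like the following. The judge model name, system template, labels, data path, and field name are illustrative placeholders, not values taken from this reference.

```shell
# Hypothetical classify evaluation: the judge labels each row's
# "response" field as helpful or unhelpful, and only "helpful"
# counts as passing. All values below are placeholders.
together evals create \
  --type classify \
  --judge-model meta-llama/Llama-3.3-70B-Instruct-Turbo \
  --judge-model-source serverless \
  --judge-system-template "Label the response as helpful or unhelpful." \
  --input-data-file-path ./eval_input.jsonl \
  --model-field response \
  --labels "helpful,unhelpful" \
  --pass-labels "helpful"
```

Note that this uses --model-field to evaluate pre-generated text from the input file; to have the service generate responses instead, you would drop --model-field and supply the --model-to-evaluate options.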

List

Shell
together evals list [OPTIONS]

Options

--status [pending|queued|running|completed|error|user_error]
    Filter by job status.
--limit INTEGER
    Limit number of results (max 100).
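For example, to list only finished jobs (the status value and limit here are arbitrary choices, not defaults):

```shell
# Show up to ten evaluation jobs that have completed.
together evals list --status completed --limit 10
```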

Retrieve

Get the details of a specific evaluation job.
Shell
together evals retrieve EVALUATION_ID

Status

Get the status and results of a specific evaluation job.
Shell
together evals status EVALUATION_ID
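Since jobs pass through statuses such as pending, queued, and running before completing, one pattern is to poll the status command until the job settles, then retrieve the details. This is a sketch only: eval-1234 is a placeholder ID, and grepping the command's text output for status keywords is an assumption about its output format.

```shell
# Hypothetical polling loop; eval-1234 is a placeholder evaluation ID.
# Re-check every 30 seconds while the job is still in flight.
while together evals status eval-1234 | grep -qE 'pending|queued|running'; do
  sleep 30
done

# Once the job has settled, fetch its full details.
together evals retrieve eval-1234
```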