# Evals

> Create and manage model-evaluation jobs from your terminal, including classify, score, and compare evals.

## Create

Create a new [model evaluation](/docs/ai-evaluations) job. For the full list of supported models, see [Supported Models](/docs/evaluations-supported-models).

```bash
tg evals create
```

### Parameters

| Flag                                                           | Description                                                                                                                                                                                  |
| -------------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `--type [classify\|score\|compare]`                            | Type of evaluation to create.<br />**required**                                                                                                                                              |
| `--judge-model [string]`                                       | Name or URL of the judge model to use for evaluation.<br />**required**                                                                                                                      |
| `--judge-model-source [serverless\|dedicated\|external]`       | Source of the judge model.<br />**required**                                                                                                                                                 |
| `--judge-system-template [string]`                             | System template for the judge model.<br />**required**                                                                                                                                       |
| `--input-data-file-path [string]`                              | Path to the input data file.<br />**required**                                                                                                                                               |
| `--judge-external-api-token [string]`                          | API token for an external judge model. Pass an empty string (`""`) when `--judge-model-source` is `serverless` or `dedicated`.<br />**required**                                             |
| `--judge-external-base-url [string]`                           | Base URL for an external judge model. Pass an empty string (`""`) when `--judge-model-source` is `serverless` or `dedicated`.<br />**required**                                              |
| `--model-field [string]`                                       | Name of the field in the input file containing text generated by the model. Mutually exclusive with `--model-to-evaluate` and the other detailed-config flags below.                         |
| `--model-to-evaluate [string]`                                 | Model name when using the detailed config.                                                                                                                                                   |
| `--model-to-evaluate-source [serverless\|dedicated\|external]` | Source of the model to evaluate.                                                                                                                                                             |
| `--model-to-evaluate-external-api-token [string]`              | Optional external API token for the model to evaluate.                                                                                                                                       |
| `--model-to-evaluate-external-base-url [string]`               | Optional external base URL for the model to evaluate.                                                                                                                                        |
| `--model-to-evaluate-max-tokens [integer]`                     | Max tokens for the model to evaluate.                                                                                                                                                        |
| `--model-to-evaluate-temperature [float]`                      | Temperature for the model to evaluate.                                                                                                                                                       |
| `--model-to-evaluate-system-template [string]`                 | System template for the model to evaluate.                                                                                                                                                   |
| `--model-to-evaluate-input-template [string]`                  | Input template for the model to evaluate.                                                                                                                                                    |
| `--labels [string]`                                            | Comma-separated list of classification labels.                                                                                                                                               |
| `--pass-labels [string]`                                       | Comma-separated list of labels considered as passing. Required for the `classify` type.                                                                                                      |
| `--min-score [float]`                                          | Minimum score value. Required for the `score` type.                                                                                                                                          |
| `--max-score [float]`                                          | Maximum score value. Required for the `score` type.                                                                                                                                          |
| `--pass-threshold [float]`                                     | Threshold score for passing. Required for the `score` type.                                                                                                                                  |
| `--model-a-field [string]`                                     | Name of the field in the input file containing text generated by model A. Mutually exclusive with `--model-a` and the other model-A flags below.                                             |
| `--model-a [string]`                                           | Model name or URL for model A when using the detailed config.                                                                                                                                |
| `--model-a-source [serverless\|dedicated\|external]`           | Source of model A.                                                                                                                                                                           |
| `--model-a-external-api-token [string]`                        | Optional external API token for model A.                                                                                                                                                     |
| `--model-a-external-base-url [string]`                         | Optional external base URL for model A.                                                                                                                                                      |
| `--model-a-max-tokens [integer]`                               | Max tokens for model A.                                                                                                                                                                      |
| `--model-a-temperature [float]`                                | Temperature for model A.                                                                                                                                                                     |
| `--model-a-system-template [string]`                           | System template for model A.                                                                                                                                                                 |
| `--model-a-input-template [string]`                            | Input template for model A.                                                                                                                                                                  |
| `--model-b-field [string]`                                     | Name of the field in the input file containing text generated by model B. Mutually exclusive with `--model-b` and the other model-B flags below.                                             |
| `--model-b [string]`                                           | Model name or URL for model B when using the detailed config.                                                                                                                                |
| `--model-b-source [serverless\|dedicated\|external]`           | Source of model B.                                                                                                                                                                           |
| `--model-b-external-api-token [string]`                        | Optional external API token for model B.                                                                                                                                                     |
| `--model-b-external-base-url [string]`                         | Optional external base URL for model B.                                                                                                                                                      |
| `--model-b-max-tokens [integer]`                               | Max tokens for model B.                                                                                                                                                                      |
| `--model-b-temperature [float]`                                | Temperature for model B.                                                                                                                                                                     |
| `--model-b-system-template [string]`                           | System template for model B.                                                                                                                                                                 |
| `--model-b-input-template [string]`                            | Input template for model B.                                                                                                                                                                  |
| `--disable-position-bias-correction`                           | Skip the flipped-order judge pass and run only a single judge pass (original order). Halves judge cost and latency at the expense of position-bias correction. Default: off (two-pass mode). |
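
As a rough illustration of how the flags above combine, the sketch below creates a `classify` job whose input file already contains the model output, and shows `score` and `compare` variants as commented-out alternatives. It uses only flags from the table. The model name, input-file value, judge template, field names (`response`, `response_a`, `response_b`), labels, and score range are placeholders, and `run_tg` and `create_eval` are hypothetical helpers written for this example, not part of the CLI.

```bash
# Placeholders only: substitute your own judge model and input file.
JUDGE_MODEL="your-judge-model"
INPUT_FILE="your-input-data-file"
JUDGE_TEMPLATE="You are a strict grader. Judge the response for accuracy."

# Hypothetical helper: runs tg when it is installed, otherwise prints the
# command it would have run (a dry run).
run_tg() {
  if command -v tg >/dev/null 2>&1; then
    tg "$@"
  else
    echo "tg $*"
  fi
}

# Hypothetical wrapper: appends the flags the table marks as required for
# every evaluation type. The judge is serverless here, so the two
# external-judge flags are passed as empty strings.
create_eval() {
  run_tg evals create "$@" \
    --judge-model "$JUDGE_MODEL" \
    --judge-model-source serverless \
    --judge-system-template "$JUDGE_TEMPLATE" \
    --input-data-file-path "$INPUT_FILE" \
    --judge-external-api-token "" \
    --judge-external-base-url ""
}

# classify: read pre-generated text from the (assumed) "response" field,
# with "good" as the only passing label.
create_eval --type classify \
  --model-field response \
  --labels "good,bad" \
  --pass-labels "good"

# score: same input field, judged on an assumed 1-10 scale.
# create_eval --type score --model-field response \
#   --min-score 1 --max-score 10 --pass-threshold 7

# compare: two pre-generated fields, one per model.
# create_eval --type compare \
#   --model-a-field response_a --model-b-field response_b
```

The `score` and `compare` calls are commented out so that pasting the sketch creates at most one job. To have the CLI generate the responses itself, the `--model-field` style flags would be swapped for the corresponding detailed-config flags, since the table lists the two as mutually exclusive.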

## List

List all evaluation jobs.

```bash
tg evals list
```

### Parameters

| Flag                                                                | Description                        |
| ------------------------------------------------------------------- | ---------------------------------- |
| `--status [pending\|queued\|running\|completed\|error\|user_error]` | Filter by job status.              |
| `--limit [integer]`                                                 | Max results to return (up to 100). |
| `--after [string]`                                                  | Pagination cursor.                 |
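
The filters above can be combined in one call. The following is a minimal sketch, assuming you want completed jobs in pages of 20. `run_tg` and `list_evals` are hypothetical helpers for this example, and the cursor value is whatever the previous page reported; where the CLI prints that cursor is not specified here.

```bash
# Hypothetical helper: runs tg when it is installed, otherwise prints the
# command it would have run (a dry run).
run_tg() {
  if command -v tg >/dev/null 2>&1; then
    tg "$@"
  else
    echo "tg $*"
  fi
}

# Hypothetical wrapper: list_evals STATUS LIMIT [CURSOR]
# Adds --after only when a cursor is supplied.
list_evals() {
  status=$1
  limit=$2
  cursor=${3:-}
  if [ -n "$cursor" ]; then
    run_tg evals list --status "$status" --limit "$limit" --after "$cursor"
  else
    run_tg evals list --status "$status" --limit "$limit"
  fi
}

# First page of completed jobs, 20 per page (the table caps --limit at 100).
list_evals completed 20
```

For a later page, pass the cursor as the third argument, for example `list_evals completed 20 "$CURSOR"`.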

## Retrieve

Get the details for a specific evaluation job.

```bash
tg evals retrieve [EVALUATION_ID]
```

## Status

Get the status and results of a specific evaluation job.

```bash
tg evals status [EVALUATION_ID]
```