Setup
See our Getting Started guide for initial setup.

Create

The Together AI Evaluations service is a framework for using LLM-as-a-Judge to evaluate other LLMs and arbitrary text inputs.
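
The command synopsis is sketched below. The `together evals create` form is an assumption based on this page's section names, not a confirmed command; check `together --help` in your installed CLI for the exact subcommand group.

```shell
# Assumed command form; the flags are documented in the Options table below.
together evals create [OPTIONS]
```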
Options
| Name | Description |
|---|---|
| `--type [classify\|score\|compare]` | Type of evaluation to create. [required] |
| `--judge-model TEXT` | Name or URL of the judge model to use for evaluation. [required] |
| `--judge-model-source [serverless\|dedicated\|external]` | Source of the judge model. [required] |
| `--judge-external-api-token TEXT` | Optional external API token for the judge model. |
| `--judge-external-base-url TEXT` | Optional external base URL for the judge model. |
| `--judge-system-template TEXT` | System template for the judge model. [required] |
| `--input-data-file-path TEXT` | Path to the input data file. [required] |
| `--model-field TEXT` | Name of the field in the input file containing text generated by the model. Cannot be used when `--model-to-evaluate` and other model config parameters are specified. |
| `--model-to-evaluate TEXT` | Model name or URL when using the detailed config. |
| `--model-to-evaluate-source [serverless\|dedicated\|external]` | Source of the model to evaluate. |
| `--model-to-evaluate-external-api-token TEXT` | Optional external API token for the model to evaluate. |
| `--model-to-evaluate-external-base-url TEXT` | Optional external base URL for the model to evaluate. |
| `--model-to-evaluate-max-tokens INTEGER` | Max tokens for the model to evaluate. |
| `--model-to-evaluate-temperature FLOAT` | Temperature for the model to evaluate. |
| `--model-to-evaluate-system-template TEXT` | System template for the model to evaluate. |
| `--model-to-evaluate-input-template TEXT` | Input template for the model to evaluate. |
| `--labels TEXT` | Classification labels. A comma-separated list. |
| `--pass-labels TEXT` | Labels considered as passing (required for classify type). A comma-separated list. |
| `--min-score FLOAT` | Minimum score value (required for score type). |
| `--max-score FLOAT` | Maximum score value (required for score type). |
| `--pass-threshold FLOAT` | Threshold score for passing (required for score type). |
| `--model-a-field TEXT` | Name of the field in the input file containing text generated by Model A. Cannot be used when `--model-a` and other model config parameters are specified. |
| `--model-a TEXT` | Model name or URL for Model A when using the detailed config. |
| `--model-a-source [serverless\|dedicated\|external]` | Source of Model A. |
| `--model-a-external-api-token TEXT` | Optional external API token for Model A. |
| `--model-a-external-base-url TEXT` | Optional external base URL for Model A. |
| `--model-a-max-tokens INTEGER` | Max tokens for Model A. |
| `--model-a-temperature FLOAT` | Temperature for Model A. |
| `--model-a-system-template TEXT` | System template for Model A. |
| `--model-a-input-template TEXT` | Input template for Model A. |
| `--model-b-field TEXT` | Name of the field in the input file containing text generated by Model B. Cannot be used when `--model-b` and other model config parameters are specified. |
| `--model-b TEXT` | Model name or URL for Model B when using the detailed config. |
| `--model-b-source [serverless\|dedicated\|external]` | Source of Model B. |
| `--model-b-external-api-token TEXT` | Optional external API token for Model B. |
| `--model-b-external-base-url TEXT` | Optional external base URL for Model B. |
| `--model-b-max-tokens INTEGER` | Max tokens for Model B. |
| `--model-b-temperature FLOAT` | Temperature for Model B. |
| `--model-b-system-template TEXT` | System template for Model B. |
| `--model-b-input-template TEXT` | Input template for Model B. |
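
As a concrete illustration, a classify-type evaluation could be created as sketched below. The `together evals create` form, the judge model name, and the file and field names are placeholder assumptions; the flags themselves come from the table above.

```shell
# Hypothetical classify evaluation: the judge labels each response in the
# input file as "helpful" or "unhelpful", and "helpful" counts as passing.
# Model name, file path, and field name are placeholders.
together evals create \
  --type classify \
  --judge-model meta-llama/Llama-3.3-70B-Instruct-Turbo \
  --judge-model-source serverless \
  --judge-system-template "Label the response as helpful or unhelpful." \
  --input-data-file-path ./responses.jsonl \
  --model-field response \
  --labels "helpful,unhelpful" \
  --pass-labels "helpful"
```

For a score-type job you would instead supply `--min-score`, `--max-score`, and `--pass-threshold`; for compare, the Model A and Model B options.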
List
List your evaluation jobs.
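
Assuming the same `together evals` command group as above, a minimal sketch:

```shell
together evals list [OPTIONS]
```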
Options
| Name | Args | Description |
|---|---|---|
| `--status` | `pending`, `queued`, `running`, `completed`, `error`, or `user_error` | Filter by job status. |
| `--limit` | number | Limit the number of results (max 100). |
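
For example, to show up to ten completed jobs (same command-form assumption as above):

```shell
# Filter to completed jobs and cap the output at 10 results.
together evals list --status completed --limit 10
```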
Retrieve
Get details of a specific evaluation job.
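
A sketch, assuming the job ID is passed as a positional argument; `<EVAL_JOB_ID>` is a placeholder for the ID returned by `create` or shown by `list`.

```shell
together evals retrieve <EVAL_JOB_ID>
```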
Status
Get the status and results of a specific evaluation job.
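
Same assumptions as for retrieve; `<EVAL_JOB_ID>` is a placeholder for an existing job ID.

```shell
together evals status <EVAL_JOB_ID>
```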