Introduction

This guide explains how to perform evaluations using the Together AI UI.
For a comprehensive guide with detailed parameter descriptions and API examples, see AI Evaluations.

Step 1: Upload Your Dataset

Navigate to https://api.together.ai/evaluations and click “Create Evaluation”.
[Screenshot: Create Evaluation button]
Upload your dataset or select one from your library. Preview your dataset content in the “Dataset Preview” section.
[Screenshot: Dataset upload interface]
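
If you are building a dataset from scratch, each row is a JSON object whose columns your templates can reference. Below is a minimal sketch in Python, assuming a JSONL upload format; the column names ("prompt", "model_response") are illustrative, not required.

```python
import json

# Illustrative rows -- any columns your templates reference will work.
rows = [
    {"prompt": "Summarize: ...", "model_response": "..."},
    {"prompt": "Translate to French: Hello", "model_response": "Bonjour"},
]

# Write one JSON object per line (JSONL).
with open("eval_dataset.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
```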

Step 2: Customize Your Evaluation Job

Evaluation Types

| Type | Description |
| --- | --- |
| Classify | Categorizes input into one of the provided categories |
| Score | Evaluates input and produces a score within a specified range |
| Compare | Compares responses from two models to determine which performs better |

Judge Configuration

Configure the judge model that will evaluate your inputs:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| judge model | string | Yes | The model used for evaluation |
| system template | Jinja2 template | Yes | Instructions for the judge to assess the data |
[Screenshot: Judge configuration interface]
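
As a sketch, a judge system template might look like the Python string below. The {{prompt}} variable is an assumed dataset column, not a required name; reference whatever columns your dataset contains.

```python
# A sketch of a judge system template (Jinja2 syntax).
# "prompt" is an assumed dataset column name.
judge_system_template = """\
You are a strict grader. Read the user prompt and the model's response,
then decide whether the response fully and accurately answers the prompt.

Prompt: {{prompt}}
"""
```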

Evaluation Type Parameters

Classify parameters:
| Field | Type | Description |
| --- | --- | --- |
| labels | list of strings | Categories for classification. Mark each as ‘pass’ or ‘fail’ for statistics |
| model_to_evaluate | object or string | Model configuration or dataset column name |
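
Put together, a Classify job's parameters might look like the following sketch. The label values and model name are illustrative; the nested object is described under Model Evaluation Configuration below.

```python
# Sketch of Classify parameters; values are illustrative.
classify_params = {
    "labels": ["helpful", "unhelpful"],  # mark "helpful" as pass in the UI
    "model_to_evaluate": {
        "model_name": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
        "model_source": "serverless",
        "system_template": "You are a helpful assistant.",
        "input_template": "{{prompt}}",  # "prompt" is an assumed column name
    },
}
```
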
Score parameters:
| Field | Type | Description |
| --- | --- | --- |
| min_score | float | Minimum score the judge can assign |
| max_score | float | Maximum score the judge can assign |
| pass_threshold | float | Score at or above which is considered “passing” (optional) |
| model_to_evaluate | object or string | Model configuration or dataset column name |
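
A Score sketch, here passing a column reference so the judge scores responses already present in the dataset (the column name is an assumption):

```python
# Sketch of Score parameters; the "model_response" column name is assumed.
score_params = {
    "min_score": 1.0,
    "max_score": 10.0,
    "pass_threshold": 7.0,  # optional: scores at or above 7 count as passing
    "model_to_evaluate": "model_response",
}
```
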
Compare parameters:
| Field | Type | Description |
| --- | --- | --- |
| model_a | object or string | First model configuration or dataset column name |
| model_b | object or string | Second model configuration or dataset column name |
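
A Compare sketch; either side can be a dataset column name (as here, with assumed names) or a model configuration object instead:

```python
# Sketch of Compare parameters; both column names are assumed.
compare_params = {
    "model_a": "candidate_response",
    "model_b": "baseline_response",
}
```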

Model Evaluation Configuration

Choose how to provide responses for evaluation:
  • Configure – Generate new responses using a model
  • Field name – Use existing responses from your dataset

Option 1: Model Configuration Object

Use when generating new responses for evaluation:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model_name | string | Yes | One of our supported models |
| model_source | string | Yes | "serverless", "dedicated", or "external" |
| system_template | Jinja2 template | Yes | Generation instructions (see Templates) |
| input_template | Jinja2 template | Yes | Input format, e.g., "{{prompt}}" |
| max_tokens | integer | No | Maximum tokens for generation |
| temperature | float | No | Temperature setting for generation |
| external_api_token | string | When external | API bearer token for external providers |
| external_base_url | string | No | Custom base URL for external APIs |
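
As a sketch, a complete configuration object might look like this; the model name is one of Together's serverless models (swap in your own), and the templates may reference any dataset column:

```python
# Sketch of a full model configuration object; values are illustrative.
model_to_evaluate = {
    "model_name": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "model_source": "serverless",
    "system_template": "Answer concisely and show your reasoning.",
    "input_template": "{{prompt}}",  # "prompt" is an assumed column name
    "max_tokens": 512,
    "temperature": 0.7,
}
```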

Option 2: Column Reference

Use when evaluating pre-existing data from your dataset. Specify the name of the dataset column that contains the responses to evaluate.
[Screenshot: Model configuration interface]

Using External Models

When using model_source = "external":
  • Enter a supported shortcut (e.g., openai/gpt-5). See Supported External Models.
  • Provide your external_api_token for the provider.
  • Optionally set external_base_url for custom OpenAI chat/completions-compatible endpoints.
For dedicated endpoints, set model_source = "dedicated" and paste your endpoint ID into the model field. See Dedicated Inference.
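
A sketch of an external configuration, reading the provider token from an environment variable purely for illustration:

```python
import os

# Sketch of an external model configuration. "openai/gpt-5" is the
# shortcut from the docs above; the environment variable name is an
# illustrative choice, not a requirement.
external_model = {
    "model_name": "openai/gpt-5",
    "model_source": "external",
    "system_template": "You are a helpful assistant.",
    "input_template": "{{prompt}}",  # "prompt" is an assumed column name
    "external_api_token": os.environ["OPENAI_API_KEY"],
    # Optional: only needed for a custom OpenAI-compatible endpoint.
    # "external_base_url": "https://my-gateway.example.com/v1",
}
```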

Step 3: Monitor Job Progress

Wait for your evaluation job to complete. The UI will show the current status of your job.
[Screenshot: Job progress monitoring]

Step 4: Review Results

Once the job is complete, you can:
  • Preview statistics and responses in the Dataset Preview
  • Download the result file using the “Download” button
[Screenshot: Results preview]