CLI

Learn how to run inference using the command-line interface.

In this tutorial, we will show you how to use the command-line interface to run models. We will query chat, language, and image models.

For the full API reference, see the API Reference.

Prerequisites

  • Make sure you have Python installed on your machine.
  • Sign up for a Together account to obtain an API key for the Inference API.

Install the Library

Open your terminal and install or update the together library, which includes the CLI, by running:

pip install --upgrade together

Authenticate

Configure your API key by setting the TOGETHER_API_KEY environment variable:

export TOGETHER_API_KEY=xxxxx

You can find your API key in your account settings.
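If you are scripting against the API directly, the same environment variable can be read with Python's standard library. A minimal sketch — the helper name below is our own, not part of the together package:

```python
import os

def get_api_key() -> str:
    # Read the same key the CLI uses; fail early if it is missing.
    key = os.environ.get("TOGETHER_API_KEY")
    if not key:
        raise RuntimeError("TOGETHER_API_KEY is not set")
    return key
```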

Query a chat model

Query a chat model by running the following command:

together chat.completions \
  --message "system" "You are a helpful assistant named Together" \
  --message "user" "What is your name?" \
  --model mistralai/Mixtral-8x7B-Instruct-v0.1

The Chat Completions CLI enables streaming tokens to stdout by default. To disable streaming, use --no-stream.
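Under the hood, the chat command corresponds to a JSON request to Together's OpenAI-compatible chat completions endpoint. Here is a hedged sketch of the equivalent request body (field names follow the public HTTP API; nothing is sent):

```python
import json

# Request body equivalent to the CLI invocation above (constructed only, not sent).
payload = {
    "model": "mistralai/Mixtral-8x7B-Instruct-v0.1",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant named Together"},
        {"role": "user", "content": "What is your name?"},
    ],
    "stream": True,  # the CLI streams tokens to stdout by default
}
print(json.dumps(payload, indent=2))
```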

Query a language or code model

To run a completion against a language or code model, use the completions subcommand:

together completions \
  "Large language models are " \
  --model mistralai/Mixtral-8x7B-v0.1 \
  --max-tokens 512 \
  --stop "."

The Completions CLI enables streaming tokens to stdout by default. To disable streaming, use --no-stream.
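As with chat, each flag maps onto a field in the completions request body. A sketch of the equivalent payload (field names follow the public HTTP API; nothing is sent):

```python
import json

# Request body equivalent to the completions CLI call above (constructed only, not sent).
payload = {
    "model": "mistralai/Mixtral-8x7B-v0.1",
    "prompt": "Large language models are ",  # trailing space is intentional
    "max_tokens": 512,                       # --max-tokens
    "stop": ["."],                           # --stop
    "stream": True,                          # streaming is on by default
}
print(json.dumps(payload, indent=2))
```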

Query an image model

To query an image model, use the images generate subcommand:

together images generate \
  "space robots" \
  --model stabilityai/stable-diffusion-xl-base-1.0 \
  --n 4

By default, the generated image opens in your system's image viewer. To disable this, use --no-show.
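The image command follows the same pattern: the prompt, model, and --n flag become fields in an image-generation request body. A sketch (field names follow the public HTTP API; nothing is sent):

```python
import json

# Request body equivalent to the images CLI call above (constructed only, not sent).
payload = {
    "model": "stabilityai/stable-diffusion-xl-base-1.0",
    "prompt": "space robots",
    "n": 4,  # --n: number of images to generate
}
print(json.dumps(payload, indent=2))
```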

List all models

To list all the available models, run the following:

# Help
together models --help

# List models
together models list
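The models list command is a simple authenticated GET. The sketch below builds — but does not send — the corresponding request with the standard library, assuming the public api.together.xyz base URL:

```python
import os
import urllib.request

# Build (but don't send) the authenticated GET behind `together models list`.
req = urllib.request.Request(
    "https://api.together.xyz/v1/models",
    headers={"Authorization": f"Bearer {os.environ.get('TOGETHER_API_KEY', '')}"},
)
print(req.full_url)
```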