Frequently asked questions

We’ve collected questions frequently asked by our customers on this page. If you have a question that is not answered anywhere in our documentation, contact our support team. We will respond to you promptly!

I encountered an error while using your API. What should I do?

Refer to the Inference Error Codes page for more information.

Will my data be used to train other models? What's your privacy policy?

No, data sent to the endpoint will not be used to train models, unless you grant us permission to.

We allow you to set your privacy settings to control whether or not Together AI retains training data, prompts or model responses, or uses any of your data for training our own models. You can adjust your privacy settings or request your data be deleted by going to Settings > Profile.

You can also find our detailed privacy policy here: https://www.together.ai/privacy

What models are available to run inference on?

See the Models page for a list of models that can be queried. You can also see these models by navigating to the Playground.

  • See 100+ models hosted for inference.
  • For technical details, see the API Reference.
  • Check out our web-based Chat, Language, Code, and Image Playgrounds.
  • Pricing for inference is distinct for each model based on tokens used. See inference pricing.
  • Learn best practices and prompt engineering techniques through examples.
  • Learn other ways to run inference with the REST API or the Python API.

What does pricing look like for Serverless Endpoints vs. Dedicated Instances of my fine-tuned models?

For Serverless Endpoints, you pay per 1K tokens used by your requests to these models. Our latest pricing is described here: https://together.ai/pricing.

If you host your own fine-tuned model, you pay an hourly fee for the dedicated instance that serves it. The start and stop instance APIs are useful for managing instances of your own fine-tuned models, so that you are only charged for the hours your model is running.

My response is getting truncated when running inference. How do I fix this?

This is probably an issue with max_tokens, which caps the length of the generated response. Set max_tokens to a higher value to avoid truncation.
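As a rough sketch of where max_tokens sits in a completion request body (the helper function below is illustrative, not part of the official SDK):

```python
import json

def build_completion_payload(prompt, max_tokens=1024):
    """Build a request body for a completion call (hypothetical helper)."""
    return {
        "model": "togethercomputer/RedPajama-INCITE-7B-Chat",
        "prompt": prompt,
        # Raise this value if responses are being cut off mid-sentence.
        "max_tokens": max_tokens,
    }

payload = build_completion_payload("List the best restaurants in SF", max_tokens=2048)
print(json.dumps(payload, indent=2))
```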

What is the difference between Chat and Complete for inference?

Complete can be made to behave similarly to Chat, but requires formatting the prompts manually. Here's an example of Chat and Complete commands that will produce the same results:

$ together chat
Loading togethercomputer/RedPajama-INCITE-7B-Chat
Type /quit to quit, /help, or /? to list commands.

>>> List the best restaurants in SF
$ together complete "<human>: List the best restaurants in SF\n<bot>: "

The key difference is that Chat is a purely interactive experience with features like chat history, while Complete offers a more customizable and manual experience useful for implementations in custom use-cases.
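The manual formatting that Complete requires can be sketched in Python. The `<human>`/`<bot>` turn markers below match the RedPajama-INCITE-Chat convention shown above; other models use different prompt templates:

```python
def format_chat_prompt(turns):
    """Format (speaker, text) turns into a RedPajama-style completion prompt.

    The trailing '<bot>: ' cues the model to generate the assistant's reply.
    """
    lines = [f"<{speaker}>: {text}" for speaker, text in turns]
    return "\n".join(lines) + "\n<bot>: "

prompt = format_chat_prompt([("human", "List the best restaurants in SF")])
print(prompt)
```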

I can't run inference with my model. What is going on?

If you want to run inference:

  1. Choose from the available models list.
  2. For Serverless Endpoints models, you can run inference directly without starting a virtual machine (VM).
  3. If you're trying to run inference on a model you fine-tuned, start its VM instance either:
    • Directly from the model's page on api.together.ai, or
    • Using our start and stop instance APIs.
  4. If your desired model isn't listed, feel free to request a model.


What happens if the training data (jsonl), has some examples with token counts much smaller (or longer) than the model context length?

We use dataset packing: sequences shorter than the maximum sequence length are concatenated, with a special token separating them. If an example is longer than the maximum sequence length, it is split into non-overlapping chunks so that the entire example is still used.
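A toy sketch of this packing scheme (not the actual training code; the separator token id and tiny context length are placeholders for illustration):

```python
SEP = -1          # stand-in separator token id (assumption)
MAX_SEQ_LEN = 8   # tiny context length for illustration

def pack(examples):
    """Concatenate tokenized examples with a separator, then cut the
    packed stream into non-overlapping chunks of MAX_SEQ_LEN."""
    stream = []
    for example in examples:
        stream.extend(example)
        stream.append(SEP)
    return [stream[i:i + MAX_SEQ_LEN] for i in range(0, len(stream), MAX_SEQ_LEN)]

# Two short examples and one longer than the context length.
chunks = pack([[1, 2, 3], [4, 5], [6, 7, 8, 9, 10, 11, 12, 13, 14]])
```

Note that no tokens are discarded: short examples share a chunk, and the long example is split across chunks rather than truncated.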

How long will it take for my job to start?

It depends. Factors that affect waiting time include the number of pending jobs from other customers, the number of jobs currently running, and available hardware. If there are no other pending jobs and there is available hardware, your job should start within a minute of submission. Typically jobs will start within an hour of submission. However, there is no guarantee on waiting time.

How long will my job take to run?

It depends. Factors that impact your job run time are model size, training data size, and network conditions when downloading/uploading model/training files. You can estimate how long your job will take to complete training by multiplying the number of epochs by the time to complete the first epoch.
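The back-of-the-envelope estimate above can be written out as:

```python
def estimate_total_hours(first_epoch_hours, num_epochs):
    """Estimate total training time, assuming each epoch takes roughly
    as long as the first (download/upload time not included)."""
    return first_epoch_hours * num_epochs

# If the first epoch took 30 minutes and you are training for 4 epochs:
est = estimate_total_hours(0.5, 4)
print(f"~{est} hours")
```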

Why am I getting an error when uploading a training file?

There are two common issues you may encounter:

  1. Your API key may be incorrect. A 403 status code indicates an incorrect API key.
  2. Your balance may be less than the job minimum. We verify that your account balance is at least the minimum job charge ($5). If you do not have sufficient balance, you can increase your account limit by adding a credit card to your account, adjusting your spending limit if you already have a credit card, or paying your outstanding account balance. If you have sufficient balance on your account and still see the error, contact support for assistance.
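The two checks above can be sketched as a small helper that maps an upload failure to its likely cause (hypothetical; the CLI and web interface surface these messages for you):

```python
def explain_upload_error(status_code, balance, job_minimum=5.00):
    """Map an upload failure to the likely cause (illustrative sketch)."""
    if status_code == 403:
        return "Incorrect API key: check the key in your account settings."
    if balance < job_minimum:
        return (f"Insufficient balance: ${balance:.2f} is below the "
                f"${job_minimum:.2f} job minimum.")
    return "Unknown error: contact support."

print(explain_upload_error(403, 10.0))
print(explain_upload_error(400, 2.0))
```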

Why was my job cancelled?

There are two reasons that a job may be automatically cancelled.

  1. You do not have sufficient balance on your account to cover the cost of the job.
  2. You have entered an incorrect WandB API key.

You can determine why your job was cancelled by checking the events list for your job via the together CLI tool:

$ together list-events <job-fine-tune-id>

Or via the web interface https://api.together.ai > Jobs > cancelled job > events list

The following is an example of a job that was cancelled due to an incorrect WandB key (see events 4 and 5):

$ together list-events ft-392ef45d-a4f4-4a4d-b50c-c5b551d852c9
|    | Message                                                                                                | Type                            | Hash                 |
|  0 | Fine tune request created                                                                              | JOB_PENDING                     |                      |
|  1 | Job started at Tue Jan 23 07:20:10 PST 2024                                                            | JOB_START                       | 8275378180435023547  |
|  2 | Model data downloaded for togethercomputer/RedPajama-INCITE-7B-Chat at Tue Jan 23 07:22:34 PST 2024    | MODEL_DOWNLOAD_COMPLETE         | -988165705840572841  |
|  3 | Training data downloaded for togethercomputer/RedPajama-INCITE-7B-Chat at Tue Jan 23 07:22:36 PST 2024 | TRAINING_DATA_DOWNLOAD_COMPLETE | -1605514659064971718 |
|  4 | WandB login or init failed: API key must be 40 characters long, yours was 17                           | WANDB_INIT                      | -748217494451531697  |
|  5 | Job cancelled due to error in WandB login/init                                                         | CANCEL_REQUESTED                |                      |
|  6 | Training started for model /work/ft-392ef45d-a4f4-4a4d-b50c-c5b551d852c9/model                         | TRAINING_START                  | 1731272555435848274  |
|  7 | Job stopped due to cancel request                                                                      | JOB_STOPPED                     |                      |

The event log in the web Jobs tab will likewise show a message when a job was cancelled because the billing limit was reached.

What should I do if my job is cancelled due to billing limits?

You can add a credit card to your account to increase your spending limit. If you already have a credit card on your account, you can make a payment or adjust your spending limit. Contact support if you need assistance with your account balance.

Why was there an error while running my job?

If your job fails after downloading the training file but before training starts, the most likely source of the error is the training data itself.

You can verify the formatting of your input file with the Together CLI tool with the following command:

$ together files check ~/Downloads/unified_joke_explanations.jsonl
{
    "is_check_passed": true,
    "model_special_tokens": "we are not yet checking end of sentence tokens for this model",
    "file_present": "File found",
    "file_size": "File size 0.0 GB",
    "num_samples": 356
}

Despite our best efforts, the file checker does not catch all errors. Please contact support if your training data file passes the checks but you still see errors when your job runs.

If you see an error during other steps in your training job, this may be due to internal errors in our training stack (e.g. hardware failure or bugs). We actively monitor job failures, and work as quickly as we can to resolve these issues. Once the issue has been resolved by our engineers, your job will be automatically or manually restarted. Charges for the restarted job will be refunded.

How do I know if my job was restarted?

A job will be automatically or manually restarted if it fails to complete due to an internal error. You can view the event log to see whether the job was restarted, determine the new fine-tune ID of the restarted job, and check the refund amount (if applicable). Any charges from the failed job will be refunded when your job is restarted.