Rate limits

Rate limits restrict how often a user or client can access our API within a set timeframe.

When a request exceeds a rate limit, the API responds with HTTP status code 429. The rate limit tiers are described below. There are two ways to increase your limits:

  • If you have a high volume of steady traffic and a good payment history for that traffic, you can request a higher limit.
  • If you are interested in our Scale or Enterprise packages, which include custom requests per minute (RPM) and unlimited tokens per minute (TPM), please reach out to our sales team.

What is the purpose of rate limits?

Rate limits are a standard practice for APIs. They safeguard against abuse or misuse and help ensure equitable access with consistent performance for all users.

How are our rate limits implemented?

Our rate limits are currently measured in requests per second (RPS) and tokens per second (TPS) for each model type. If you exceed any of these limits, the API returns a 429 error. The tables below list the limits per minute, since that is the industry standard.
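Since a 429 response means a limit was exceeded, a client will usually want to wait and retry rather than fail outright. The following is a minimal sketch of retrying with exponential backoff and jitter. It is not tied to any HTTP library: `send` is a hypothetical callable that performs one request and returns a `(status_code, body)` pair, and the retry count and delay values are illustrative assumptions, not values recommended by the API.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],  # hypothetical: performs one request
    max_retries: int = 5,                 # assumed value, tune for your workload
    base_delay: float = 0.5,              # assumed starting delay in seconds
    max_delay: float = 30.0,              # assumed upper bound on a single wait
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() and retry with exponential backoff while it returns 429."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break
        # Double the delay on each attempt, cap it, then add up to 50% jitter
        # so that many clients do not all retry at the same instant.
        delay = min(max_delay, base_delay * (2 ** attempt))
        sleep(delay + random.uniform(0, delay / 2))
    # Retries exhausted: hand the last 429 response back to the caller.
    return status, body
```

The `sleep` parameter exists only so the wait can be replaced in tests. In normal use you would pass just `send`.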

Rate limit tiers

You can view your current rate limits under Settings > Billing. As your usage of and spend on the Together API increase, we automatically raise your rate limits.

Chat, language & code models

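| Tier | Qualification criteria | RPM | TPM |
| --- | --- | --- | --- |
| Free | User must be in an allowed geography | 60 | 60,000 |
| Tier 1 | Credit card added | 600 | 180,000 |
| Tier 2 | $50 paid | 1,800 | 250,000 |
| Tier 3 | $100 paid | 3,000 | 500,000 |
| Tier 4 | $250 paid | 4,500 | 1,000,000 |
| Tier 5 | $1,000 paid | 6,000 | 2,000,000 |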
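Because limits are enforced per second but listed per minute, it can help to convert the listed values and pace requests on the client side. For example, the Tier 1 chat limits of 600 RPM and 180,000 TPM correspond to 10 RPS and 3,000 TPS. The sketch below shows the conversion and a simple pacer that spaces requests evenly. The names `per_second` and `RequestPacer` are hypothetical helpers, and even spacing is one possible client-side strategy, not a description of how the API enforces limits.

```python
import time
from typing import Callable


def per_second(per_minute: float) -> float:
    """Convert a per-minute limit (RPM or TPM) to its per-second equivalent."""
    return per_minute / 60.0


class RequestPacer:
    """Space out calls so they stay at or below a requests-per-minute budget."""

    def __init__(
        self,
        rpm: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = 60.0 / rpm  # seconds between consecutive requests
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()

    def wait(self) -> float:
        """Block until the next request may be sent; return the time waited."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the following slot one interval after this request's slot.
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay
```

Calling `pacer.wait()` before each request with `RequestPacer(rpm=600)` keeps requests at least 0.1 seconds apart. This does not account for token limits or for throttling caused by other traffic.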

Embedding models

| Tier | Qualification criteria | RPM | TPM |
| --- | --- | --- | --- |
| Free | User must be in an allowed geography | 3,000 | 1,000,000 |
| Tier 1 | Credit card added | 3,000 | 2,000,000 |
| Tier 2 | $50 paid | 5,000 | 2,000,000 |
| Tier 3 | $100 paid | 5,000 | 10,000,000 |
| Tier 4 | $250 paid | 10,000 | 10,000,000 |
| Tier 5 | $1,000 paid | 10,000 | 20,000,000 |

Re-rank models

| Tier | Qualification criteria | RPM | TPM |
| --- | --- | --- | --- |
| Free | User must be in an allowed geography | 1,000 | 150,000 |
| Tier 1 | Credit card added | 2,500 | 500,000 |
| Tier 2 | $50 paid | 3,500 | 1,500,000 |
| Tier 3 | $100 paid | 4,000 | 2,000,000 |
| Tier 4 | $250 paid | 7,500 | 3,000,000 |
| Tier 5 | $1,000 paid | 9,000 | 5,000,000 |

Image models

| Tier | Qualification criteria | Img/min |
| --- | --- | --- |
| Free | User must be in an allowed geography | 60 |
| Tier 1 | Credit card added | 240 |
| Tier 2 | $50 paid | 480 |
| Tier 3 | $100 paid | 600 |
| Tier 4 | $250 paid | 960 |
| Tier 5 | $1,000 paid | 1,200 |

Note: FLUX.1 [schnell] Free has a model-specific rate limit of 10 img/min.

Depending on traffic from other users, you may experience congestion and be throttled below your tier's limits. If you need committed capacity, contact our sales team about our Scale and Enterprise plans, which include custom RPM and unlimited TPM.
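The qualification criteria in the tier tables can be read as a simple lookup. The sketch below is an illustration only: it assumes that "$N paid" means cumulative payments of at least N dollars and that the thresholds are inclusive, neither of which is stated explicitly above. The function `tier_for` and the `CHAT_LIMITS` mapping are hypothetical names, and the Free tier's geography requirement is not modeled.

```python
from typing import Dict, Tuple

# (RPM, TPM) for chat, language & code models, copied from the tier table.
CHAT_LIMITS: Dict[str, Tuple[int, int]] = {
    "Free": (60, 60_000),
    "Tier 1": (600, 180_000),
    "Tier 2": (1_800, 250_000),
    "Tier 3": (3_000, 500_000),
    "Tier 4": (4_500, 1_000_000),
    "Tier 5": (6_000, 2_000_000),
}

# Spend thresholds in USD, highest first. Assumption: cumulative and inclusive.
_SPEND_THRESHOLDS = [
    (1_000, "Tier 5"),
    (250, "Tier 4"),
    (100, "Tier 3"),
    (50, "Tier 2"),
]


def tier_for(total_paid_usd: float, has_credit_card: bool) -> str:
    """Return the tier name implied by the qualification criteria."""
    for minimum, name in _SPEND_THRESHOLDS:
        if total_paid_usd >= minimum:
            return name
    # Below $50 paid: Tier 1 requires a credit card on file, otherwise Free.
    return "Tier 1" if has_credit_card else "Free"
```

For example, `CHAT_LIMITS[tier_for(100, True)]` looks up the Tier 3 chat limits. The limits shown under Settings > Billing remain the authoritative values for your account.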

Rate limits in headers

Each API response includes headers that report the enforced rate limit, your current usage, and when the limit will reset. We enforce token limits per second and per minute, and request limits per second, but the headers report per-second limits only.

| Field | Description |
| --- | --- |
| x-ratelimit-limit | The maximum number of requests per second permitted before the rate limit is exhausted. |
| x-ratelimit-remaining | The number of requests per second remaining before the rate limit is exhausted. |
| x-ratelimit-reset | The time until the rate limit (based on requests per second) resets to its initial state. |
| x-tokenlimit-limit | The maximum number of tokens per second permitted before the rate limit is exhausted. |
| x-tokenlimit-remaining | The number of tokens per second remaining before the rate limit is exhausted. |
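A client can read these headers to decide how aggressively to send the next request. The following is a minimal sketch of collecting them into one object. It assumes the header values are plain numeric strings and that `x-ratelimit-reset` is expressed in seconds; the value formats are not specified above, so unparseable or missing values are returned as `None`. The names `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    request_limit: Optional[float]      # x-ratelimit-limit
    request_remaining: Optional[float]  # x-ratelimit-remaining
    request_reset: Optional[float]      # x-ratelimit-reset (assumed seconds)
    token_limit: Optional[float]        # x-tokenlimit-limit
    token_remaining: Optional[float]    # x-tokenlimit-remaining


def _to_float(value: Optional[str]) -> Optional[float]:
    """Parse a header value as a number, or return None if absent or invalid."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Extract the rate limit headers from a response header mapping."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitInfo(
        request_limit=_to_float(lowered.get("x-ratelimit-limit")),
        request_remaining=_to_float(lowered.get("x-ratelimit-remaining")),
        request_reset=_to_float(lowered.get("x-ratelimit-reset")),
        token_limit=_to_float(lowered.get("x-tokenlimit-limit")),
        token_remaining=_to_float(lowered.get("x-tokenlimit-remaining")),
    )
```

The function accepts any mapping of header names to string values, so it can be used with the header object of whichever HTTP client you prefer after converting it to a `dict` if needed.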