Rate limits
Rate limits restrict how often a user or client can access our API within a set timeframe.
Requests that exceed a rate limit receive an HTTP 429 (Too Many Requests) response. Read more about our rate limit tiers below, and find out how you can increase your limits:
- If you have a high volume of steady traffic and good payment history for this traffic, you can request a higher limit here.
- If you are interested in our Scale or Enterprise packages, with custom requests per minute (RPM) and unlimited tokens per minute (TPM), please reach out to sales here.
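Because every request over a limit comes back as a 429, clients usually retry those requests after a delay. The sketch below shows one common pattern, exponential backoff with random jitter. It is not an official Together client: `send` is a hypothetical zero-argument function standing in for whatever HTTP call you make, returning a `(status_code, body)` pair, and the retry count and delays are arbitrary defaults you should tune.

```python
import random
import time
from typing import Callable, Tuple

# Hypothetical response shape for this sketch: (status_code, body).
Response = Tuple[int, str]


def call_with_backoff(
    send: Callable[[], Response],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send`, retrying with exponential backoff while it returns HTTP 429."""
    attempt = 0
    while True:
        status, body = send()
        if status != 429 or attempt >= max_retries:
            # Success, a non-rate-limit error, or out of retries: return as-is.
            return status, body
        # Backoff window doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
        # Sleeping a random amount within the window ("full jitter") keeps many
        # clients from retrying in lockstep.
        delay = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, delay))
        attempt += 1
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use, leave it at `time.sleep`.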
What is the purpose of rate limits?
Rate limits are a standard practice for APIs. They protect against abuse or misuse of the API and help ensure that all users get fair access with consistent performance.
How are our rate limits implemented?
Our rate limits are currently measured in requests per second (RPS) and tokens per second (TPS) for each model type. If you exceed any of these limits, you will receive a 429 error. The tables below show the equivalent per-minute values, as that is the industry standard.
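Since enforcement is per second while the tiers are listed per minute, a client that wants to stay under its limits proactively can divide each table value by 60; for example, 600 RPM divided by 60 gives 10 requests per second. The sketch below assumes that simple conversion and keeps two token buckets, one for requests and one for tokens, each allowing roughly one second of burst. It is a client-side approximation, not a description of how the server implements its limits; the bucket capacity and the per-request token estimate you pass in are assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Holds up to `capacity` units and refills continuously at `rate` units/second."""

    def __init__(self, rate: float, capacity: float, clock: Callable[[], float]) -> None:
        self.rate = rate
        self.capacity = capacity
        self.level = capacity  # start full
        self.clock = clock
        self.last = clock()

    def _refill(self) -> None:
        now = self.clock()
        self.level = min(self.capacity, self.level + (now - self.last) * self.rate)
        self.last = now

    def wait_time(self, amount: float) -> float:
        """Seconds until `amount` units can be taken (0.0 means "now")."""
        self._refill()
        # A request larger than the whole bucket is admitted once the bucket is
        # full; `take` then drives the level negative so later requests wait longer.
        need = min(amount, self.capacity)
        if self.level >= need:
            return 0.0
        return (need - self.level) / self.rate

    def take(self, amount: float) -> None:
        self._refill()
        self.level -= amount


class ClientRateLimiter:
    """Keeps a client under both a requests/second and a tokens/second budget."""

    def __init__(self, rpm: int, tpm: int, clock: Callable[[], float] = time.monotonic) -> None:
        # Assumption: per-second budget = per-minute table value / 60,
        # with about one second of burst allowed.
        rps, tps = rpm / 60, tpm / 60
        self.requests = TokenBucket(rps, rps, clock)
        self.tokens = TokenBucket(tps, tps, clock)

    def delay_for(self, num_tokens: int) -> float:
        """Seconds to wait before a request using `num_tokens` fits both budgets."""
        return max(self.requests.wait_time(1), self.tokens.wait_time(num_tokens))

    def record(self, num_tokens: int) -> None:
        """Charge one request and `num_tokens` tokens against the budgets."""
        self.requests.take(1)
        self.tokens.take(num_tokens)
```

To use it, call `delay_for` with an estimate of a request's tokens, sleep for the returned number of seconds, then call `record` just before sending. The server remains the authority, so keep handling 429 responses even with a limiter in place.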
Important: When we launch support for a brand-new model, we may temporarily disable automatic rate limit increases for that model to keep our service levels stable. We aim to re-enable automatic increases as soon as possible.
Rate limit tiers
You can view your current rate limits under Settings > Billing. As your usage of and spend on the Together API increase, we automatically raise your rate limits.
Chat, language & code models
Tier | Qualification criteria | RPM | TPM |
---|---|---|---|
Free | User must be in an allowed geography | 60 | 60,000 |
Tier 1 | Credit card added, $25 paid | 600 | 180,000 |
Tier 2 | $50 paid | 1,800 | 250,000 |
Tier 3 | $100 paid | 3,000 | 500,000 |
Tier 4 | $250 paid | 4,500 | 1,000,000 |
Tier 5 | $1,000 paid | 6,000 | 2,000,000 |
Note: Due to high demand, DeepSeek R1 has a model-specific rate limit of 6 RPM for Free users and 12 RPM for Build Tier 1 users.
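The chat tier table and the DeepSeek R1 note can be combined into a small lookup, sketched below. Several details are assumptions rather than documented behavior: `paid_usd` is treated as the cumulative amount paid, every paid tier is assumed to require a card (the table only states it for Tier 1), "Build Tier 1" is assumed to mean the Tier 1 row, and the key "DeepSeek R1" is a display name used for illustration, not necessarily the model ID the API expects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    min_paid_usd: float  # threshold from the "Qualification criteria" column
    rpm: int
    tpm: int


# Chat, language & code model tiers, transcribed from the table above.
CHAT_TIERS = [
    Tier("Free", 0, 60, 60_000),
    Tier("Tier 1", 25, 600, 180_000),
    Tier("Tier 2", 50, 1_800, 250_000),
    Tier("Tier 3", 100, 3_000, 500_000),
    Tier("Tier 4", 250, 4_500, 1_000_000),
    Tier("Tier 5", 1_000, 6_000, 2_000_000),
]

# Model-specific RPM caps from the note above. Hypothetical key: the display
# name, assuming "Build Tier 1" corresponds to the "Tier 1" row.
MODEL_RPM_CAPS = {
    ("DeepSeek R1", "Free"): 6,
    ("DeepSeek R1", "Tier 1"): 12,
}


def chat_tier(paid_usd: float, card_added: bool) -> Tier:
    """Return the highest chat tier whose thresholds are met.

    Assumes the Free geography requirement is already satisfied and that
    every paid tier needs a card on file.
    """
    best = CHAT_TIERS[0]
    for tier in CHAT_TIERS[1:]:
        if card_added and paid_usd >= tier.min_paid_usd:
            best = tier
    return best


def effective_rpm(model: str, tier: Tier) -> int:
    """Tier RPM, lowered by any model-specific cap."""
    cap = MODEL_RPM_CAPS.get((model, tier.name))
    return tier.rpm if cap is None else min(tier.rpm, cap)


tier = chat_tier(paid_usd=120, card_added=True)
print(tier.name, tier.tpm, effective_rpm("DeepSeek R1", tier))  # Tier 3 500000 3000
```

Your actual tier is whatever Settings > Billing shows; this lookup only mirrors the published thresholds.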
Embedding models
Tier | Qualification criteria | RPM | TPM |
---|---|---|---|
Free | User must be in an allowed geography | 3,000 | 1,000,000 |
Tier 1 | Credit card added, $25 paid | 3,000 | 2,000,000 |
Tier 2 | $50 paid | 5,000 | 2,000,000 |
Tier 3 | $100 paid | 5,000 | 10,000,000 |
Tier 4 | $250 paid | 10,000 | 10,000,000 |
Tier 5 | $1,000 paid | 10,000 | 20,000,000 |
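The RPM and TPM columns limit a bulk job independently, so whichever budget runs out first sets the pace. The short calculation below estimates a best-case duration for a hypothetical embedding workload. It ignores per-second burst behavior and congestion from other users, so treat the result as a lower bound rather than a prediction.

```python
def min_job_minutes(num_requests: int, total_tokens: int, rpm: int, tpm: int) -> float:
    """Lower bound on a bulk job's duration in minutes.

    Requests and tokens are limited independently, so the job can finish no
    sooner than the slower of the two budgets allows.
    """
    return max(num_requests / rpm, total_tokens / tpm)


# Hypothetical job: 30,000 requests averaging 500 tokens each (15,000,000 tokens).
# Tier 1 embedding limits from the table above: 3,000 RPM and 2,000,000 TPM.
print(min_job_minutes(30_000, 30_000 * 500, rpm=3_000, tpm=2_000_000))  # 10.0
# Tier 3 embedding limits: 5,000 RPM and 10,000,000 TPM.
print(min_job_minutes(30_000, 30_000 * 500, rpm=5_000, tpm=10_000_000))  # 6.0
```

At Tier 1 the request limit is the binding one (10 minutes, versus 7.5 minutes for tokens), so the higher RPM of later tiers is what shortens this particular job.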
Re-rank models
Tier | Qualification criteria | RPM | TPM |
---|---|---|---|
Free | User must be in an allowed geography | 1,000 | 150,000 |
Tier 1 | Credit card added, $25 paid | 2,500 | 500,000 |
Tier 2 | $50 paid | 3,500 | 1,500,000 |
Tier 3 | $100 paid | 4,000 | 2,000,000 |
Tier 4 | $250 paid | 7,500 | 3,000,000 |
Tier 5 | $1,000 paid | 9,000 | 5,000,000 |
Image models
Tier | Qualification criteria | Img/min |
---|---|---|
Free | User must be in an allowed geography | 60 |
Tier 1 | Credit card added, $25 paid | 240 |
Tier 2 | $50 paid | 480 |
Tier 3 | $100 paid | 600 |
Tier 4 | $250 paid | 960 |
Tier 5 | $1,000 paid | 1,200 |
Note: Due to high demand, FLUX.1 [schnell] Free has a model-specific rate limit of 10 img/min.
You may experience congestion caused by traffic from other users and may be throttled to a lower rate as a result. If you want committed capacity, contact our sales team about our Scale and Enterprise plans, which include custom RPM and unlimited TPM.
Rate limits in headers
API responses include headers that show the enforced rate limits, your current usage, and when the limit resets. We enforce token limits per second and per minute and request limits per second, but the headers report per-second limits only.
Field | Description |
---|---|
x-ratelimit-limit | The maximum number of requests per second permitted before the rate limit is exhausted. |
x-ratelimit-remaining | The number of requests per second still permitted before the rate limit is exhausted. |
x-ratelimit-reset | The time until the request rate limit (requests per second) resets to its initial state. |
x-tokenlimit-limit | The maximum number of tokens per second permitted before the rate limit is exhausted. |
x-tokenlimit-remaining | The number of tokens per second still permitted before the rate limit is exhausted. |
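A client can read these headers after each response to decide whether to pause before the next call. The sketch below assumes each header value is a plain number (the table does not state the format or unit of `x-ratelimit-reset`, so a non-numeric value is returned as `None`), and `near_limit` is a hypothetical helper, not part of any SDK. It accepts any mapping, such as a plain `dict` or the `headers` attribute of a `requests.Response`.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitSnapshot:
    request_limit: Optional[float]
    requests_remaining: Optional[float]
    reset: Optional[float]  # unit not documented in the table; interpret with care
    token_limit: Optional[float]
    tokens_remaining: Optional[float]


def _get(headers: Mapping[str, str], name: str) -> Optional[float]:
    # HTTP header names are case-insensitive, so compare lowercased keys.
    for key, value in headers.items():
        if key.lower() == name:
            try:
                return float(value)
            except ValueError:
                return None  # present but not a plain number
    return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitSnapshot:
    return RateLimitSnapshot(
        request_limit=_get(headers, "x-ratelimit-limit"),
        requests_remaining=_get(headers, "x-ratelimit-remaining"),
        reset=_get(headers, "x-ratelimit-reset"),
        token_limit=_get(headers, "x-tokenlimit-limit"),
        tokens_remaining=_get(headers, "x-tokenlimit-remaining"),
    )


def near_limit(snap: RateLimitSnapshot, next_tokens: int) -> bool:
    """True if the per-second budget in the headers can't cover the next request."""
    if snap.requests_remaining is not None and snap.requests_remaining < 1:
        return True
    if snap.tokens_remaining is not None and snap.tokens_remaining < next_tokens:
        return True
    return False
```

Because the headers report per-second limits only, a request that passes this check can still hit the per-minute token limit, so 429 handling remains necessary.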