Rate limits
1 query per second (QPS) for free users, 10 QPS for users who add a credit card.
Rate limiting refers to the constraints our API enforces on how often a user or client can access our services within a given timeframe. Requests that exceed the rate limit receive an HTTP 429 (Too Many Requests) status code.
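Because a rate-limited request comes back with status 429, a client can wait and try again instead of failing outright. The following is a minimal sketch of that idea, not an official client: `call_with_backoff` and its parameters are hypothetical names, and `send` stands in for whatever function issues your request and returns an object with a `status_code` attribute.

```python
import random
import time


def call_with_backoff(send, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or retries run out.

    send        -- hypothetical zero-argument callable that performs one request
                   and returns an object exposing .status_code
    max_retries -- number of retries after the first attempt
    base_delay  -- first wait in seconds; doubled on every further retry
    sleep       -- injectable so the function can be tested without waiting
    """
    response = None
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code != 429:
            return response  # success or a non-rate-limit error: hand it back
        if attempt == max_retries:
            break  # out of retries: return the last 429 to the caller
        # Exponential backoff (1s, 2s, 4s, ...) plus up to 10% random jitter
        # so that many clients do not retry at the same instant.
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.1))
    return response
```

The starting delay of one second is an assumption chosen to match the 1 QPS free tier; adjust it to your own tier.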
What is the purpose of rate limits?
Rate limits are a standard practice for APIs. They safeguard against abuse or misuse and help ensure equitable access with consistent performance.
Tier-based rate limits
Tier | Rate limit |
---|---|
Free | 1 QPS |
Paid | 10 QPS |
More tiers based on consistent usage coming soon!
Exceptions
Model | Rate limit |
---|---|
Llama 3-405B | 20 QPS for paid users |
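One way to stay under the limits above is to pace requests on the client side. The sketch below spaces calls at least `1 / qps` seconds apart. `Throttle` and `QPS_BY_TIER` are hypothetical names introduced for illustration; only the QPS values come from the tier table, and how the server actually measures the rate is not specified here.

```python
import time

# QPS values taken from the tier table; the dictionary itself is illustrative.
QPS_BY_TIER = {"free": 1, "paid": 10}


class Throttle:
    """Space calls at least 1/qps seconds apart (simple client-side pacing)."""

    def __init__(self, qps, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / qps
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._next_allowed = None

    def wait(self):
        """Block until the next request may be sent."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            # Re-read the clock in case sleep() overshot the target time.
            now = max(self._clock(), self._next_allowed)
        self._next_allowed = now + self.interval
```

A caller would create `Throttle(QPS_BY_TIER["paid"])` once and call `wait()` immediately before each request. This paces a single process only; several workers sharing one API key would need to share the throttle.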
Rate limits in headers
Field | Description |
---|---|
x-ratelimit-limit | The maximum number of requests that are permitted before exhausting the rate limit. |
x-ratelimit-remaining | The remaining number of requests that are permitted before exhausting the rate limit. |
x-ratelimit-reset | The time until the rate limit (based on requests) resets to its initial state. |
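The header fields above can be read from any response to see how much of the limit is left. The sketch below assumes that `x-ratelimit-limit` and `x-ratelimit-remaining` carry integer values; the format and unit of `x-ratelimit-reset` are not stated here, so it is kept as a raw string. `RateLimitInfo` and `read_rate_limit` are hypothetical helper names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: Optional[int]       # x-ratelimit-limit
    remaining: Optional[int]   # x-ratelimit-remaining
    reset: Optional[str]       # x-ratelimit-reset, left unparsed (unit unknown)


def _to_int(value):
    """Return value as an int, or None if it is missing or not numeric."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def read_rate_limit(headers: Mapping[str, str]) -> RateLimitInfo:
    """Extract the rate-limit fields from a mapping of response headers."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return RateLimitInfo(
        limit=_to_int(lowered.get("x-ratelimit-limit")),
        remaining=_to_int(lowered.get("x-ratelimit-remaining")),
        reset=lowered.get("x-ratelimit-reset"),
    )
```

A client could check `remaining` after each response and pause when it reaches zero, rather than waiting for a 429.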
If you're interested in a higher rate limit, contact our sales team!