Rate Limits
Understand rate limits per tier: RPM, TPM, and concurrent request limits.
Rate limits protect your deployment from excessive load and ensure fair usage. Limits vary by pricing tier.
Limits by Tier
| Tier | RPM (Requests/min) | TPM (Tokens/min) | Concurrent Requests |
|---|---|---|---|
| Starter | 60 | 40,000 | 5 |
| Standard | 120 | 100,000 | 10 |
| Advanced | 300 | 300,000 | 25 |
| Pro | 600 | 600,000 | 50 |
| Max | 1,200 | 1,000,000 | 100 |
| Enterprise | Custom | Custom | Custom |
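As a rough illustration of how the table translates into client-side pacing, the sketch below derives two numbers per tier: the spacing between requests that stays at the RPM limit, and the average tokens per request at which RPM and TPM run out together. The names `TIER_LIMITS`, `min_request_interval`, and `avg_token_budget` are hypothetical, and pacing requests evenly is an assumption of this example; the table does not say how the one-minute window is measured.

```python
# Limits copied from the table above. Enterprise is omitted because its limits are custom.
TIER_LIMITS = {
    "Starter": {"rpm": 60, "tpm": 40_000, "concurrent": 5},
    "Standard": {"rpm": 120, "tpm": 100_000, "concurrent": 10},
    "Advanced": {"rpm": 300, "tpm": 300_000, "concurrent": 25},
    "Pro": {"rpm": 600, "tpm": 600_000, "concurrent": 50},
    "Max": {"rpm": 1_200, "tpm": 1_000_000, "concurrent": 100},
}


def min_request_interval(tier: str) -> float:
    """Seconds between requests when pacing evenly at the tier's RPM limit."""
    return 60.0 / TIER_LIMITS[tier]["rpm"]


def avg_token_budget(tier: str) -> float:
    """Average tokens per request at which RPM and TPM are exhausted together."""
    limits = TIER_LIMITS[tier]
    return limits["tpm"] / limits["rpm"]


if __name__ == "__main__":
    for name in TIER_LIMITS:
        print(name, min_request_interval(name), round(avg_token_budget(name)))
```

If your typical request uses more tokens than the average budget for your tier, TPM is likely to be the limit you hit first; if it uses fewer, RPM is.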
Rate Limit Headers
Every API response includes headers reporting your limits and remaining quota. The example values below correspond to the Standard tier:
X-RateLimit-Limit-Requests: 120
X-RateLimit-Remaining-Requests: 118
X-RateLimit-Limit-Tokens: 100000
X-RateLimit-Remaining-Tokens: 98500
X-RateLimit-Reset: 2026-04-22T10:01:00Z
Exceeding Limits
When you exceed a rate limit, the API returns an HTTP 429 (Too Many Requests) status with a Retry-After header indicating how many seconds to wait. The response body describes the error:
{
  "error": {
    "message": "Rate limit exceeded. Please retry after 2 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
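One possible way to act on the rate limit headers and the 429 response is sketched below. It is a minimal example using only the standard library, not an official client: `parse_rate_limit_headers`, `retry_delay`, and `call_with_retry` are hypothetical helper names, `send` stands in for whatever function performs your HTTP request and returns `(status, headers, body)`, and the one-second fallback and five-attempt cap are assumptions of this sketch. Header lookup here is case-sensitive, so adapt it if your HTTP library normalizes header names.

```python
import time
from datetime import datetime
from typing import Callable, Dict, Mapping, Tuple

# Assumed response shape for this sketch: (status code, headers, body text).
Response = Tuple[int, Mapping[str, str], str]


def parse_rate_limit_headers(headers: Mapping[str, str]) -> Dict[str, object]:
    """Convert the documented X-RateLimit-* headers into typed values."""
    return {
        "limit_requests": int(headers["X-RateLimit-Limit-Requests"]),
        "remaining_requests": int(headers["X-RateLimit-Remaining-Requests"]),
        "limit_tokens": int(headers["X-RateLimit-Limit-Tokens"]),
        "remaining_tokens": int(headers["X-RateLimit-Remaining-Tokens"]),
        # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it.
        "reset": datetime.fromisoformat(
            headers["X-RateLimit-Reset"].replace("Z", "+00:00")
        ),
    }


def retry_delay(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Seconds to wait, taken from Retry-After; falls back to an assumed default."""
    try:
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        return default


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-429 status or attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            break
        if attempt < max_attempts - 1:
            sleep(retry_delay(headers))
    # Either a non-429 response or the last 429 after exhausting attempts.
    return status, headers, body
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays. In production code you may also want to add jitter to the wait so that many clients do not retry at the same instant.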