Rate Limits

Understand the rate limits for each tier: requests per minute (RPM), tokens per minute (TPM), and concurrent requests.

Rate limits protect your deployment from excessive load and ensure fair usage. Limits vary by pricing tier.

Limits by Tier

Tier        RPM (Requests/min)  TPM (Tokens/min)  Concurrent Requests
Starter     60                  40,000            5
Standard    120                 100,000           10
Advanced    300                 300,000           25
Pro         600                 600,000           50
Max         1,200               1,000,000         100
Enterprise  Custom              Custom            Custom
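
To compare a planned workload against the tiers above, the limits can be kept in a small lookup table. The following is a minimal sketch, not part of the API: the names TierLimits, TIERS, and smallest_tier are hypothetical, and the numbers are copied from the table. Enterprise is left out because its limits are custom.

```python
from typing import NamedTuple, Optional


class TierLimits(NamedTuple):
    rpm: int         # requests per minute
    tpm: int         # tokens per minute
    concurrent: int  # concurrent requests


# Values copied from the tier table, in ascending order of capacity.
# Enterprise is omitted because its limits are custom.
TIERS = {
    "Starter": TierLimits(60, 40_000, 5),
    "Standard": TierLimits(120, 100_000, 10),
    "Advanced": TierLimits(300, 300_000, 25),
    "Pro": TierLimits(600, 600_000, 50),
    "Max": TierLimits(1_200, 1_000_000, 100),
}


def smallest_tier(rpm: int, tpm: int, concurrent: int) -> Optional[str]:
    """Return the first tier whose three limits all cover the workload."""
    for name, limits in TIERS.items():
        if (rpm <= limits.rpm
                and tpm <= limits.tpm
                and concurrent <= limits.concurrent):
            return name
    return None  # exceeds Max; custom (Enterprise) limits would be needed


print(smallest_tier(rpm=100, tpm=250_000, concurrent=8))  # prints "Advanced"
```

A workload has to fit all three limits at once, which is why 100 requests per minute with 250,000 tokens per minute lands on Advanced rather than Standard.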

Rate Limit Headers

Every API response includes headers that report the current limits, the remaining quota, and the reset time:

X-RateLimit-Limit-Requests: 120
X-RateLimit-Remaining-Requests: 118
X-RateLimit-Limit-Tokens: 100000
X-RateLimit-Remaining-Tokens: 98500
X-RateLimit-Reset: 2026-04-22T10:01:00Z
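
A client can read these headers to throttle itself before it reaches a limit. The sketch below parses them into a typed record. It assumes the headers arrive as a plain string mapping and that X-RateLimit-Reset is always an ISO 8601 UTC timestamp with a trailing "Z", as in the sample. RateLimitStatus and parse_rate_limit_headers are hypothetical names, not part of any SDK.

```python
from datetime import datetime, timezone
from typing import Mapping, NamedTuple


class RateLimitStatus(NamedTuple):
    limit_requests: int
    remaining_requests: int
    limit_tokens: int
    remaining_tokens: int
    reset_at: datetime


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitStatus:
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    # Assumption: the reset value is ISO 8601 in UTC with a trailing "Z".
    reset_at = datetime.fromisoformat(
        h["x-ratelimit-reset"].replace("Z", "+00:00")
    )
    return RateLimitStatus(
        limit_requests=int(h["x-ratelimit-limit-requests"]),
        remaining_requests=int(h["x-ratelimit-remaining-requests"]),
        limit_tokens=int(h["x-ratelimit-limit-tokens"]),
        remaining_tokens=int(h["x-ratelimit-remaining-tokens"]),
        reset_at=reset_at,
    )


# The sample headers shown above.
sample = {
    "X-RateLimit-Limit-Requests": "120",
    "X-RateLimit-Remaining-Requests": "118",
    "X-RateLimit-Limit-Tokens": "100000",
    "X-RateLimit-Remaining-Tokens": "98500",
    "X-RateLimit-Reset": "2026-04-22T10:01:00Z",
}
status = parse_rate_limit_headers(sample)
print(status.remaining_requests, status.remaining_tokens)  # prints "118 98500"
```

The time left until the reset can then be computed as (status.reset_at - datetime.now(timezone.utc)).total_seconds().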

Exceeding Limits

When you exceed a rate limit, the API returns an HTTP 429 (Too Many Requests) status with a Retry-After header indicating how many seconds to wait before retrying. The response body describes the error:

{
  "error": {
    "message": "Rate limit exceeded. Please retry after 2 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
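
One way to handle a 429 is to wait for the number of seconds given in Retry-After and then resend the request. The sketch below is transport-agnostic and rests on a few assumptions: send is a hypothetical callable that performs one request and returns a (status, headers, body) tuple, and the one-second fallback for a missing or malformed header and the default of five attempts are illustrative choices, not documented behavior.

```python
import time
from typing import Callable, Mapping, Tuple

# (status code, response headers, response body)
Response = Tuple[int, Mapping[str, str], str]


def retry_delay(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Read Retry-After as seconds; fall back to `default` if absent or invalid."""
    for name, value in headers.items():
        if name.lower() == "retry-after":  # header names are case-insensitive
            try:
                return max(float(value), 0.0)
            except ValueError:
                return default
    return default


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call `send` until it returns a non-429 status or attempts run out."""
    response = send()
    for _ in range(max_attempts - 1):
        status, headers, _body = response
        if status != 429:
            break
        # Rate limited: wait as long as the server asked, then try again.
        sleep(retry_delay(headers))
        response = send()
    return response
```

If every attempt is rate limited, the last 429 response is returned unchanged so the caller can inspect the error body. The sleep parameter is injectable so the waiting can be replaced in tests.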