Inference rate limits apply only to BYOK (Bring Your Own Key) requests and scale with your usage tier. Platform keys have no inference rate limits. Management endpoints have separate per-user limits.

Prerequisites

Inference rate limits

Your rate limit tier is determined by rolling 30-day inference spend and recalculates every 60 minutes:
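| Tier | 30-day spend | BYOK RPM | BYOK monthly cap | Platform fee |
| --- | --- | --- | --- | --- |
| Starter | $0 – $500 | 30 | 1,000 | 2.0% |
| Growth | $500 – $10,000 | 120 | 50,000 | 1.0% |
| Scale | $10,000+ | 600 | Unlimited | 0.5% |
| Enterprise | Custom | 1,200 | Unlimited | Custom |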
The Enterprise tier is assigned manually; it is not auto-detected from spend.
The limits above apply only to BYOK requests. See BYOK for details.
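To make the tier table concrete, the sketch below maps a rolling 30-day spend to the limits listed above. The names `Tier`, `SPEND_TIERS`, and `tier_for_spend` are hypothetical and not part of any Auriko SDK. The table lists $500 and $10,000 in two rows each, so the sketch assumes a spend exactly on a boundary falls into the higher tier; the actual service may decide differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    min_spend: float            # lower bound of rolling 30-day spend, in USD
    byok_rpm: int               # BYOK requests per minute
    monthly_cap: Optional[int]  # BYOK monthly cap; None means unlimited
    platform_fee_pct: float


# Spend-based tiers from the table, highest threshold first.
# Enterprise is omitted because it is assigned manually, not derived from spend.
SPEND_TIERS = (
    Tier("Scale", 10_000, 600, None, 0.5),
    Tier("Growth", 500, 120, 50_000, 1.0),
    Tier("Starter", 0, 30, 1_000, 2.0),
)


def tier_for_spend(spend_30d: float) -> Tier:
    """Return the highest tier whose lower bound the spend meets."""
    if spend_30d < 0:
        raise ValueError("spend must be non-negative")
    for tier in SPEND_TIERS:
        # Assumption: a spend exactly on a boundary belongs to the higher tier.
        if spend_30d >= tier.min_spend:
            return tier
    raise AssertionError("unreachable: Starter starts at 0")
```

Because the tier recalculates every 60 minutes, a client that caches the result of a lookup like this would want to refresh it at least that often.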

Rate limit headers

Every response carries OpenAI-compatible rate limit headers:
| Header | Description |
| --- | --- |
| Retry-After | Seconds until rate limit resets (RFC 7231) |
| X-RateLimit-Limit-Requests | Requests allowed per window |
| X-RateLimit-Remaining-Requests | Requests remaining in current window |
| X-RateLimit-Reset-Requests | ISO 8601 timestamp when the window resets |
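As an illustration of reading these headers on the client side, the following sketch turns a response's header mapping into typed values. `RateLimitInfo` and `parse_rate_limit_headers` are hypothetical helper names, and the sketch assumes that a reset timestamp without a UTC offset is in UTC and that `Retry-After` is given in whole seconds, as the table states.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Mapping, Optional


@dataclass(frozen=True)
class RateLimitInfo:
    limit: Optional[int]              # X-RateLimit-Limit-Requests
    remaining: Optional[int]          # X-RateLimit-Remaining-Requests
    reset_at: Optional[datetime]      # X-RateLimit-Reset-Requests
    retry_after_s: Optional[float]    # Retry-After


def _to_int(value: Optional[str]) -> Optional[int]:
    """Parse a non-negative integer header value; None if absent or malformed."""
    if value is None or not value.strip().isdigit():
        return None
    return int(value.strip())


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}

    reset_at = None
    raw_reset = h.get("x-ratelimit-reset-requests")
    if raw_reset:
        # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
        reset_at = datetime.fromisoformat(raw_reset.strip().replace("Z", "+00:00"))
        if reset_at.tzinfo is None:
            # Assumption: a timestamp without an offset is UTC.
            reset_at = reset_at.replace(tzinfo=timezone.utc)

    retry_after = _to_int(h.get("retry-after"))
    return RateLimitInfo(
        limit=_to_int(h.get("x-ratelimit-limit-requests")),
        remaining=_to_int(h.get("x-ratelimit-remaining-requests")),
        reset_at=reset_at,
        retry_after_s=float(retry_after) if retry_after is not None else None,
    )
```

A caller could check `remaining` before sending a burst of requests and pause until `reset_at` when it reaches zero.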

Management API rate limits

Management endpoints have separate rate limits, applied per user unless marked per IP:
| Endpoint | Limit |
| --- | --- |
| API key creation | 10/min |
| Billing checkout | 5/min |
| Billing portal | 5/min |
| Team invites | 20/min |
| BYOK operations | 20/min |
| Workspace creation | 5/min |
| Account deletion | 2/min |
| Budget writes | 10/min |
| Management reads (API key) | 60/min (per IP) |
| Public registry | 60/min (per IP) |
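Scripts that call management endpoints in bulk can stay under these limits by throttling on the client side. The sketch below is one possible approach, a sliding-window limiter; `SlidingWindowLimiter` is a hypothetical helper, and it assumes the server counts requests over a sliding 60-second window, which this page does not specify.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `window_s`-second span."""

    def __init__(self, max_calls: int, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_calls = max_calls
        self.window_s = window_s
        self._clock = clock
        self._calls: Deque[float] = deque()  # timestamps of recent calls

    def wait_time(self) -> float:
        """Seconds until another call fits in the window (0.0 if it fits now)."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_s:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            return 0.0
        return self.window_s - (now - self._calls[0])

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits; otherwise return False."""
        if self.wait_time() > 0:
            return False
        self._calls.append(self._clock())
        return True
```

For example, `SlidingWindowLimiter(max_calls=5)` would match the 5/min limit on workspace creation: call `try_acquire()` before each request and sleep for `wait_time()` seconds when it returns `False`.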

Handle 429 responses

When you exceed a rate limit, the API returns a 429 Too Many Requests response with a Retry-After header indicating when to retry.
{
  "error": {
    "message": "Rate limit exceeded. Retry after 12 seconds.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
The Auriko SDK handles retries automatically with exponential backoff (up to 2 retries by default). For manual handling, see Error handling — Retry manually.
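For readers not using the SDK, a manual retry loop might look like the sketch below. It is not the SDK's implementation: `call_with_retries` and `backoff_delay` are hypothetical names, `send` stands in for whatever function performs the HTTP request, and the `(status, headers, body)` response shape is an assumption made to keep the example independent of any HTTP library. The default of 2 retries mirrors the SDK default mentioned above.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape for this sketch: (status_code, headers, body).
Response = Tuple[int, Mapping[str, str], str]


def backoff_delay(headers: Mapping[str, str], attempt: int,
                  base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    for name, value in headers.items():
        # Prefer the server's hint when Retry-After holds whole seconds.
        if name.lower() == "retry-after" and value.strip().isdigit():
            return float(value.strip())
    # No usable hint: exponential backoff with full jitter, capped at `cap`.
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retries(send: Callable[[], Response], max_retries: int = 2,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send`, retrying on 429 up to `max_retries` times."""
    attempt = 0
    while True:
        response = send()
        status, headers, _body = response
        if status != 429 or attempt >= max_retries:
            return response  # success, a non-429 error, or retries exhausted
        sleep(backoff_delay(headers, attempt))
        attempt += 1
```

When retries are exhausted, the last 429 response is returned rather than raised, so the caller decides how to surface the failure.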