Prerequisites
Inference rate limits
Your rate limit tier is determined by your rolling 30-day inference spend and is recalculated every 60 minutes:

| Tier | 30-day spend | BYOK RPM | BYOK monthly cap | Platform fee |
|---|---|---|---|---|
| Starter | < $500 | 30 | 1,000 | 2.0% |
| Growth | < $10,000 | 120 | 50,000 | 1.0% |
| Scale | $10,000+ | 600 | Unlimited | 0.5% |
| Enterprise | Custom | 1,200 | Unlimited | Custom |
The limits above apply only to BYOK requests. See BYOK for details.
Rate limit headers
Every response carries OpenAI-compatible rate limit headers:

| Header | Description |
|---|---|
| `Retry-After` | Seconds until the rate limit resets (RFC 7231) |
| `X-RateLimit-Limit-Requests` | Requests allowed per window |
| `X-RateLimit-Remaining-Requests` | Requests remaining in the current window |
| `X-RateLimit-Reset-Requests` | ISO 8601 timestamp when the window resets |
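
As a rough illustration of how a client might read these headers, here is a minimal Python sketch. The function name `parse_rate_limit_headers` and the shape of the returned dictionary are hypothetical, not part of the API. The sketch assumes the headers arrive as a plain string-to-string mapping and that a reset timestamp without a UTC offset is meant as UTC.

```python
from datetime import datetime, timezone


def parse_rate_limit_headers(headers):
    """Extract the rate limit fields from a response-header mapping.

    Hypothetical helper: fields missing from the response come back as None.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def as_int(name):
        value = lowered.get(name)
        return int(value) if value is not None else None

    reset_at = None
    reset_raw = lowered.get("x-ratelimit-reset-requests")
    if reset_raw is not None:
        # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11.
        reset_at = datetime.fromisoformat(reset_raw.replace("Z", "+00:00"))
        if reset_at.tzinfo is None:
            # Assumption: a timestamp without an offset is in UTC.
            reset_at = reset_at.replace(tzinfo=timezone.utc)

    return {
        "limit": as_int("x-ratelimit-limit-requests"),
        "remaining": as_int("x-ratelimit-remaining-requests"),
        "reset_at": reset_at,
        "retry_after": as_int("retry-after"),
    }
```

A client could, for example, pause until `reset_at` once `remaining` reaches zero instead of waiting for a rejected request.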
Management API rate limits
Management endpoints have separate per-user rate limits:

| Endpoint | Limit |
|---|---|
| API key creation | 10/min |
| Billing checkout | 5/min |
| Billing portal | 5/min |
| Team invites | 20/min |
| BYOK operations | 20/min |
| Workspace creation | 5/min |
| Account deletion | 2/min |
| Budget writes | 10/min |
| Management reads (API key) | 60/min (per IP) |
| Public registry | 60/min (per IP) |
Handle 429 responses
When you exceed a rate limit, the API returns a `429 Too Many Requests` response with a `Retry-After` header indicating how many seconds to wait before retrying.
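
One way to handle this on the client side is to wait for the advertised delay and then retry. The sketch below uses only the Python standard library. The helper names (`retry_delay`, `request_with_retry`), the attempt limit, and the exponential-backoff fallback for a missing or unparseable `Retry-After` value are illustrative assumptions, not behavior prescribed by the API.

```python
import time
import urllib.error
import urllib.request


def retry_delay(retry_after, attempt, fallback_cap=60.0):
    """Return the number of seconds to wait before the next attempt."""
    try:
        # Honor the server-provided delay when it parses as a number.
        return max(0.0, float(retry_after))
    except (TypeError, ValueError):
        # Assumption: fall back to capped exponential backoff (1s, 2s, 4s, ...).
        return min(float(2 ** attempt), fallback_cap)


def _send(req):
    """Send the request with urllib and return the response body as bytes."""
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def request_with_retry(req, max_attempts=5, send=_send, sleep=time.sleep):
    """Call send(req), retrying only on HTTP 429 responses."""
    for attempt in range(max_attempts):
        try:
            return send(req)
        except urllib.error.HTTPError as err:
            # Re-raise other errors, and give up after the final attempt.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(retry_delay(err.headers.get("Retry-After"), attempt))
```

The `send` and `sleep` parameters exist only so the retry loop can be exercised without network access or real waiting. In normal use, pass a `urllib.request.Request` (or a URL string) and leave the defaults.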