A few months ago, a deploy script I wrote hammered a third-party API with 500 requests in 10 seconds. The API returned 429 Too Many Requests on every request after the first 100, and the script’s error handling was `catch(e) { retry() }` — which made things exponentially worse. Within 30 seconds, the script had been IP-banned for 24 hours right before a critical deployment window.
Rate limiting is the API world’s traffic management system. Whether you’re building an API or consuming one, understanding how it works saves you from outages, bans, and very stressful on-call incidents.
## What Rate Limiting Actually Does
Rate limiting restricts how many API requests a client can make within a time window. If you exceed the limit, the server responds with HTTP 429 Too Many Requests instead of processing your request.
From the server side, rate limiting serves three purposes:
- Prevents abuse — stops malicious scripts from overwhelming the API
- Ensures fair usage — one heavy user shouldn’t degrade performance for everyone else
- Controls infrastructure costs — every API call costs compute, memory, and bandwidth
## The Common Rate Limit Algorithms

### Token Bucket
The token bucket is the most widely used algorithm because it allows controlled bursts while enforcing an average rate.
Imagine a bucket that holds 100 tokens. Each API request takes one token. Tokens are added back at a steady rate (say, 10 per second). When the bucket is empty, requests are rejected until more tokens accumulate.
```text
Bucket capacity: 100 tokens
Refill rate:     10 tokens/second

→ You can burst 100 requests instantly
→ After that, sustain 10 requests/second
→ If idle for 10 seconds, bucket refills to 100
```
```javascript
class TokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;
    this.tokens = capacity;
    this.refillRate = refillRate; // tokens per millisecond
    this.lastRefill = Date.now();
  }

  consume() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // Request allowed
    }
    return false; // Rate limited
  }

  refill() {
    const now = Date.now();
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsed * this.refillRate
    );
    this.lastRefill = now;
  }
}
```
### Sliding Window
The sliding window algorithm counts requests within a rolling time window. Unlike fixed windows (which can allow 2x the limit at window boundaries), sliding windows provide consistent enforcement.
```text
Window: 60 seconds
Limit:  100 requests

At any point in time, the server counts requests
from the last 60 seconds. If count >= 100, reject.
```
Most production APIs use a weighted sliding window (combining the current and previous window counts) for efficiency — it avoids storing every individual request timestamp.
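As a rough sketch of how that weighted variant can work (the class and field names here are illustrative, not taken from any particular library):

```javascript
// Weighted sliding-window counter: keeps one counter per fixed window and
// estimates the rolling count by weighting the previous window's count by
// how much of it still overlaps the rolling window.
class SlidingWindowCounter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.current = { start: Date.now(), count: 0 };
    this.previousCount = 0;
  }

  allow(now = Date.now()) {
    // Roll the window forward if the current one has expired
    if (now - this.current.start >= this.windowMs) {
      const windowsElapsed = Math.floor((now - this.current.start) / this.windowMs);
      // Only the immediately preceding window still overlaps; older ones count as 0
      this.previousCount = windowsElapsed === 1 ? this.current.count : 0;
      this.current = {
        start: this.current.start + windowsElapsed * this.windowMs,
        count: 0,
      };
    }
    // Fraction of the previous window that still falls inside the rolling window
    const overlap = 1 - (now - this.current.start) / this.windowMs;
    const estimated = this.previousCount * overlap + this.current.count;
    if (estimated >= this.limit) {
      return false; // Rate limited
    }
    this.current.count += 1;
    return true;
  }
}
```

The trade-off: this stores two integers per client instead of a timestamp per request, at the cost of the estimate being slightly approximate near window boundaries.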
### Fixed Window
The simplest algorithm: reset the counter every N seconds.
```text
Window: 60 seconds
Limit:  100 requests

Request at 59s: allowed (99/100 used)
Request at 61s: allowed (1/100 — new window)
```
The catch: a burst of 100 requests at second 59 followed by 100 at second 61 means 200 requests in 2 seconds, even though the “limit” is 100 per minute. This boundary problem is why most serious APIs use sliding windows instead.
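A minimal fixed-window counter makes the boundary problem concrete (this is an illustrative sketch; `allow` takes a timestamp parameter only so the boundary case is easy to demonstrate):

```javascript
// Minimal fixed-window counter: simple, but allows 2x bursts at window boundaries.
class FixedWindowCounter {
  constructor(limit, windowMs) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.windowStart = 0;
    this.count = 0;
  }

  allow(now = Date.now()) {
    // Snap to the start of the current fixed window (e.g. top of the minute)
    const windowStart = now - (now % this.windowMs);
    if (windowStart !== this.windowStart) {
      this.windowStart = windowStart; // New window: reset the counter
      this.count = 0;
    }
    if (this.count >= this.limit) {
      return false; // Rate limited
    }
    this.count += 1;
    return true;
  }
}

// Boundary problem: 100 requests at t=59s all pass, the counter resets at
// t=60s, and 100 more at t=61s also pass — 200 requests in 2 seconds.
```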
## Reading Rate Limit Headers
Well-designed APIs tell you exactly where you stand with response headers:
```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 73
X-RateLimit-Reset: 1700000060
Retry-After: 30
```
| Header | Meaning |
|---|---|
| `X-RateLimit-Limit` | Maximum requests allowed per window |
| `X-RateLimit-Remaining` | How many requests you have left |
| `X-RateLimit-Reset` | Unix timestamp when the window resets |
| `Retry-After` | Seconds to wait before retrying (on 429 responses) |
Need to convert timestamps? The `X-RateLimit-Reset` header is a Unix timestamp. Use our Unix Timestamp Converter to quickly see when your rate limit window resets in human-readable format.
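In code, the conversion is one line (using the sample reset value from the response above):

```javascript
// Decode an X-RateLimit-Reset value (Unix seconds) into a date and a wait time
const reset = 1700000060; // sample value from a response header
const resetDate = new Date(reset * 1000); // JS dates use milliseconds
const waitMs = Math.max(0, resetDate.getTime() - Date.now());
console.log(`Window resets at ${resetDate.toISOString()}, ${waitMs}ms from now`);
```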
## Handling Rate Limits as an API Consumer

### Exponential Backoff with Jitter
When you hit a 429, the worst thing you can do is immediately retry. The second worst thing is retrying after a fixed delay — because every other rate-limited client will also retry at the same moment, creating a “thundering herd.”
The solution is exponential backoff with jitter:
```javascript
async function fetchWithRetry(url, options = {}, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);
    if (response.status !== 429) {
      return response;
    }

    // Check the Retry-After header first
    const retryAfter = response.headers.get('Retry-After');
    let delay;
    if (retryAfter) {
      delay = parseInt(retryAfter, 10) * 1000;
    } else {
      // Exponential backoff: 1s, 2s, 4s, 8s, 16s
      const baseDelay = Math.pow(2, attempt) * 1000;
      // Add jitter: random value between 0 and baseDelay
      delay = baseDelay + Math.random() * baseDelay;
    }

    console.log(`Rate limited. Retrying in ${Math.round(delay)}ms...`);
    await new Promise(resolve => setTimeout(resolve, delay));
  }
  throw new Error(`Failed after ${maxRetries} retries`);
}
```
The jitter is crucial — it spreads retry attempts across time so they don’t all hit the server simultaneously.
### Client-Side Rate Limiting
The best strategy is to never hit the limit in the first place. If you know the API allows 100 requests per minute, throttle your client:
```javascript
class RateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.queue = [];
    this.timestamps = [];
  }

  async execute(fn) {
    return new Promise((resolve, reject) => {
      this.queue.push({ fn, resolve, reject });
      this.processQueue();
    });
  }

  processQueue() {
    const now = Date.now();

    // Remove timestamps outside the window
    this.timestamps = this.timestamps.filter(
      t => now - t < this.windowMs
    );

    while (this.queue.length > 0 &&
           this.timestamps.length < this.maxRequests) {
      const { fn, resolve, reject } = this.queue.shift();
      this.timestamps.push(now);
      fn().then(resolve).catch(reject);
    }

    // Schedule the next check if the queue still has items
    if (this.queue.length > 0) {
      const oldestTimestamp = this.timestamps[0];
      const waitTime = this.windowMs - (now - oldestTimestamp);
      setTimeout(() => this.processQueue(), waitTime);
    }
  }
}

// Usage
const limiter = new RateLimiter(100, 60000); // 100 req/min
await limiter.execute(() => fetch('/api/users'));
```
## Implementing Rate Limiting on Your Server

### Express.js with Redis
For production APIs, use Redis as the rate limit store — it handles concurrent requests correctly and works across multiple server instances:
```javascript
const rateLimit = require('express-rate-limit');
const { RedisStore } = require('rate-limit-redis');
const Redis = require('ioredis');

const redis = new Redis(process.env.REDIS_URL);

const apiLimiter = rateLimit({
  store: new RedisStore({ sendCommand: (...args) => redis.call(...args) }),
  windowMs: 60 * 1000,   // 1 minute
  max: 100,              // 100 requests per window
  standardHeaders: true, // Send RateLimit-* headers
  legacyHeaders: false,  // Disable X-RateLimit-* headers
  message: {
    error: 'Too many requests',
    retryAfter: 60,
  },
});

app.use('/api/', apiLimiter);
### Per-User vs Per-IP Limiting
IP-based limiting is easy but inaccurate — multiple users behind a corporate NAT share one IP. API key or token-based limiting is more precise:
```javascript
const userLimiter = rateLimit({
  keyGenerator: (req) => {
    // Use the API key if available, fall back to IP
    return req.headers['x-api-key'] || req.ip;
  },
  max: (req) => {
    // Different limits for different tiers
    if (req.user?.plan === 'premium') return 1000;
    if (req.user?.plan === 'basic') return 100;
    return 20; // Anonymous / free tier
  },
  windowMs: 60 * 1000,
});
```
## Common Mistakes

### Mistake 1: Retrying Immediately on 429
```javascript
// WRONG: creates a retry storm
// (note: fetch doesn't reject on a 429, so try/catch never even sees it)
let res = await fetch(url);
if (res.status === 429) {
  res = await fetch(url); // Instantly retries
}
```
Always use exponential backoff. And always check the `Retry-After` header — the server is telling you exactly how long to wait.
### Mistake 2: Rate Limiting After Authentication Only
Public endpoints (login, registration, password reset) need rate limiting too — arguably more than authenticated endpoints. Without it, attackers can brute-force credentials or enumerate email addresses.
```javascript
// Rate limit login attempts aggressively
const loginLimiter = rateLimit({
  windowMs: 15 * 60 * 1000,     // 15 minutes
  max: 10,                      // 10 attempts per window
  skipSuccessfulRequests: true, // Don't count successful logins
});

app.post('/auth/login', loginLimiter, loginHandler);
```
### Mistake 3: Not Distinguishing Read vs Write Limits
GET requests are typically cheaper than POST/PUT/DELETE. Apply different limits:
```javascript
const readLimiter = rateLimit({ windowMs: 60000, max: 200 });
const writeLimiter = rateLimit({ windowMs: 60000, max: 50 });

app.get('/api/*', readLimiter);
app.post('/api/*', writeLimiter);
app.put('/api/*', writeLimiter);
app.delete('/api/*', writeLimiter);
```
### Mistake 4: Ignoring Rate Limit Headers in Your Client
I’ve reviewed codebases where developers parse API responses carefully but completely ignore the rate limit headers. Those headers are free information — use them to preemptively pause requests before hitting the limit.
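One lightweight way to use them (a sketch — the helper name and the remaining-quota threshold are made up for illustration) is to compute a pause from the standard headers before firing the next request:

```javascript
// Decide how long to pause before the next request, based on the
// X-RateLimit-Remaining / X-RateLimit-Reset headers (if present).
// Assumes `headers` is a plain object with lowercase keys, as in
// Node's http client or Express's req.headers.
function pauseForQuota(headers, now = Date.now()) {
  const remaining = Number(headers['x-ratelimit-remaining']);
  const reset = Number(headers['x-ratelimit-reset']); // Unix seconds
  if (!Number.isFinite(remaining) || !Number.isFinite(reset)) {
    return 0; // Headers missing: nothing to go on
  }
  if (remaining > 1) {
    return 0; // Plenty of quota left
  }
  return Math.max(0, reset * 1000 - now); // Nearly out: wait for the reset
}

// Usage sketch:
// const waitMs = pauseForQuota(res.headers);
// if (waitMs > 0) await new Promise(r => setTimeout(r, waitMs));
```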
## Quick Checklist

As an API provider:

- Implement sliding window or token bucket rate limiting
- Use Redis for distributed rate limiting across instances
- Return standard `RateLimit-*` headers and `Retry-After` on 429
- Rate limit public endpoints (login, signup, password reset)
- Apply different limits for read vs write operations
- Log rate limit events for monitoring abuse patterns
As an API consumer:

- Implement exponential backoff with jitter on 429
- Read and respect `Retry-After` headers
- Throttle requests client-side to stay under limits
- Monitor `X-RateLimit-Remaining` to preemptively slow down
- Cache responses to reduce unnecessary API calls
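That last item is easy to sketch. Here is a minimal TTL cache (the names and the default 30-second TTL are arbitrary, and `fetchImpl` is injectable so the sketch stays testable):

```javascript
// Minimal TTL cache: serve repeat requests for the same URL from memory
// instead of spending rate limit quota on them.
const cache = new Map();

async function cachedFetchJson(url, ttlMs = 30000, fetchImpl = fetch) {
  const hit = cache.get(url);
  if (hit && Date.now() - hit.at < ttlMs) {
    return hit.data; // Fresh enough: no API call
  }
  const response = await fetchImpl(url);
  const data = await response.json();
  cache.set(url, { at: Date.now(), data });
  return data;
}
```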
## Further Reading

Need to debug API responses? Our JSON Formatter helps you inspect rate limit error payloads, and the Unix Timestamp Converter decodes those `X-RateLimit-Reset` timestamps into readable dates.