Without rate limiting, your API is vulnerable to abuse, accidental overload, and deliberate attacks. A single misconfigured client can bring down your entire infrastructure. A malicious actor can scrape your data or execute denial-of-service attacks with ease.
This guide provides production-ready implementations of every major rate limiting algorithm, discusses trade-offs, and shows you how to choose the right approach for your use case.
Why Rate Limiting Matters
Rate limiting serves multiple critical purposes:
- Prevent resource exhaustion: Protect your infrastructure from being overwhelmed
- Ensure fair access: Prevent one client from monopolizing resources
- Mitigate DDoS attacks: Limit damage from distributed denial-of-service attempts
- Prevent scraping: Make systematic data harvesting economically infeasible
- Control costs: Limit downstream API costs (cloud services, third-party APIs)
- Enforce pricing tiers: Implement different limits for free vs paid users
Rate Limiting Algorithms
Different algorithms offer different trade-offs between simplicity, precision, and resource usage. Here are the four most important algorithms you need to understand.
1. Fixed Window Counter
The simplest algorithm: count requests in fixed time windows (e.g., per minute, per hour).
// Fixed window counter implementation
// (these examples assume a shared ioredis client named `redis`)
class FixedWindowRateLimiter {
  constructor(limit, windowSize) {
    this.limit = limit; // Max requests per window
    this.windowSize = windowSize; // Window size in milliseconds
  }

  async isAllowed(key) {
    const now = Date.now();
    const windowStart = Math.floor(now / this.windowSize) * this.windowSize;
    const windowKey = `${key}:${windowStart}`;

    // Redis returns strings (or null), so parse before doing arithmetic
    const count = parseInt(await redis.get(windowKey), 10) || 0;

    if (count >= this.limit) {
      return {
        allowed: false,
        limit: this.limit,
        remaining: 0,
        resetAt: windowStart + this.windowSize
      };
    }

    // Note: GET followed by INCR is not atomic; under heavy concurrency,
    // prefer INCR-then-check or a Lua script (see the distributed section)
    await redis.incr(windowKey);
    await redis.pexpire(windowKey, this.windowSize);

    return {
      allowed: true,
      limit: this.limit,
      remaining: this.limit - count - 1,
      resetAt: windowStart + this.windowSize
    };
  }
}
// Usage: 100 requests per minute
const limiter = new FixedWindowRateLimiter(100, 60000);
Pros: Simple to implement, memory-efficient, precise window resets
Cons: Vulnerable to bursts at window boundaries. With a 100 requests/minute limit, a client can send 100 requests at 12:00:59 and another 100 at 12:01:00, landing 200 requests in roughly one second, double the intended rate. Use a sliding window or token bucket algorithm when boundary bursts matter.
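The boundary effect is easy to demonstrate with a pure in-memory simulation (no Redis needed; the timestamps are hypothetical millisecond offsets):

```javascript
// Count how many of the given requests a fixed-window limiter would accept
function simulateFixedWindow(limit, windowSize, timestamps) {
  const counts = new Map();
  let accepted = 0;
  for (const t of timestamps) {
    const windowStart = Math.floor(t / windowSize) * windowSize;
    const count = counts.get(windowStart) || 0;
    if (count < limit) {
      counts.set(windowStart, count + 1);
      accepted += 1;
    }
  }
  return accepted;
}

// 100 requests at t=59.5s and 100 more at t=60.5s: only one second apart,
// yet they land in different windows, so all 200 are accepted
const burst = [
  ...Array(100).fill(59500),
  ...Array(100).fill(60500)
];
console.log(simulateFixedWindow(100, 60000, burst)); // 200
```

Two hundred requests pass in one second despite the nominal 100/minute limit.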
2. Sliding Window Log
Store timestamps of each request and count only those within the current window. Provides precise rate limiting without boundary issues.
// Sliding window log implementation
class SlidingWindowLogRateLimiter {
  constructor(limit, windowSize) {
    this.limit = limit;
    this.windowSize = windowSize;
  }

  async isAllowed(key) {
    const now = Date.now();
    const windowStart = now - this.windowSize;

    // Remove requests older than the window
    await redis.zremrangebyscore(key, 0, windowStart);

    // Count requests in the current window
    const count = await redis.zcard(key);

    if (count >= this.limit) {
      // ZRANGE WITHSCORES returns [member, score] as strings, so parse
      const oldest = await redis.zrange(key, 0, 0, 'WITHSCORES');
      const resetAt = parseFloat(oldest[1]) + this.windowSize;
      return {
        allowed: false,
        limit: this.limit,
        remaining: 0,
        resetAt: resetAt
      };
    }

    // Add the current request (random suffix keeps members unique)
    await redis.zadd(key, now, `${now}-${Math.random()}`);
    await redis.pexpire(key, this.windowSize);

    return {
      allowed: true,
      limit: this.limit,
      remaining: this.limit - count - 1
    };
  }
}
// Usage: 100 requests per 60 seconds
const limiter = new SlidingWindowLogRateLimiter(100, 60000);
Pros: No boundary burst issues, precise rate limiting
Cons: High memory usage (stores every timestamp), more complex to implement
3. Sliding Window Counter
Hybrid approach combining fixed windows with weighted counts from previous window. Balances accuracy with efficiency.
// Sliding window counter implementation
class SlidingWindowCounterRateLimiter {
  constructor(limit, windowSize) {
    this.limit = limit;
    this.windowSize = windowSize;
  }

  async isAllowed(key) {
    const now = Date.now();
    const currentWindow = Math.floor(now / this.windowSize);
    const previousWindow = currentWindow - 1;
    const currentKey = `${key}:${currentWindow}`;
    const previousKey = `${key}:${previousWindow}`;

    // Parse: Redis returns strings, and string + number would concatenate
    const currentCount = parseInt(await redis.get(currentKey), 10) || 0;
    const previousCount = parseInt(await redis.get(previousKey), 10) || 0;

    // Position in the current window (0 to 1)
    const percentageIntoCurrentWindow =
      (now % this.windowSize) / this.windowSize;

    // Weighted count: (previous window * remaining %) + current window
    const estimatedCount =
      previousCount * (1 - percentageIntoCurrentWindow) + currentCount;

    if (estimatedCount >= this.limit) {
      return { allowed: false, limit: this.limit, remaining: 0 };
    }

    await redis.incr(currentKey);
    // Keep the key alive long enough to serve as the next window's "previous"
    await redis.pexpire(currentKey, this.windowSize * 2);

    return {
      allowed: true,
      limit: this.limit,
      remaining: Math.floor(this.limit - estimatedCount - 1)
    };
  }
}
// Usage: 100 requests per minute
const limiter = new SlidingWindowCounterRateLimiter(100, 60000);
Pros: Memory-efficient, smoother than fixed windows, no boundary bursts
Cons: Approximate (not exact), slightly more complex than fixed window
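The weighted estimate is worth seeing with concrete (hypothetical) numbers: 80 requests in the previous window, 30 so far in the current one, 30% of the way through:

```javascript
// Weighted estimate = previous * (1 - fraction elapsed) + current
function estimatedCount(previousCount, currentCount, fractionElapsed) {
  return previousCount * (1 - fractionElapsed) + currentCount;
}

// 30% into the current window, 70% of the previous window still "counts":
// 80 * 0.7 + 30 ≈ 86
console.log(estimatedCount(80, 30, 0.3));
```

With a limit of 100, this request would still be allowed; the estimate smoothly decays the previous window's weight as the current window progresses.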
4. Token Bucket
The most flexible algorithm. Tokens are added to a bucket at a steady rate. Each request consumes a token. Allows controlled bursts while maintaining long-term rate.
// Token bucket implementation
class TokenBucketRateLimiter {
  constructor(capacity, refillRate) {
    this.capacity = capacity; // Max tokens (burst capacity)
    this.refillRate = refillRate; // Tokens added per second
  }

  async isAllowed(key, tokens = 1) {
    const now = Date.now() / 1000; // Work in seconds
    const bucket = await redis.hmget(key, 'tokens', 'lastRefill');

    // Careful: `parseFloat(x) || default` would treat a stored 0 as missing,
    // so check for null explicitly
    const storedTokens =
      bucket[0] !== null ? parseFloat(bucket[0]) : this.capacity;
    const lastRefill = bucket[1] !== null ? parseFloat(bucket[1]) : now;

    // Refill based on time elapsed since the last request
    const timePassed = now - lastRefill;
    let availableTokens = Math.min(
      this.capacity,
      storedTokens + timePassed * this.refillRate
    );

    if (availableTokens < tokens) {
      const waitTime = (tokens - availableTokens) / this.refillRate;
      return {
        allowed: false,
        tokens: availableTokens,
        capacity: this.capacity,
        retryAfter: waitTime
      };
    }

    availableTokens -= tokens;
    await redis.hmset(key, {
      tokens: availableTokens,
      lastRefill: now
    });
    // Expire once a full refill from empty would have completed
    await redis.expire(key, Math.ceil(this.capacity / this.refillRate));

    return {
      allowed: true,
      tokens: availableTokens,
      capacity: this.capacity
    };
  }
}
// Usage: 100 token capacity, refill at 10 tokens/second
// Allows bursts of 100, but sustained rate of 10/sec
const limiter = new TokenBucketRateLimiter(100, 10);
Pros: Allows controlled bursts, flexible, smooth long-term rate control
Cons: More complex to implement and understand, requires floating-point math
Token bucket is ideal when you want to allow short bursts (e.g., uploading multiple files) while controlling the long-term average rate. It's the most user-friendly algorithm because it doesn't penalize occasional burst activity.
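The burst-then-throttle behavior is easy to see with a minimal in-memory bucket (a hypothetical single-process sketch; the clock is injected so the example is deterministic):

```javascript
// Minimal in-memory token bucket: capacity 5 tokens, refill 1 token/second
class LocalTokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;
    this.refillRate = refillRate;
    this.tokens = capacity;
    this.lastRefill = 0;
  }

  // `now` is a timestamp in seconds, passed in for determinism
  tryConsume(now, tokens = 1) {
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
    if (this.tokens < tokens) return false;
    this.tokens -= tokens;
    return true;
  }
}

const bucket = new LocalTokenBucket(5, 1);
// A burst of 5 requests at t=0 is fully allowed
const burst = [0, 0, 0, 0, 0].map(t => bucket.tryConsume(t));
console.log(burst.every(Boolean));  // true
// The 6th immediate request is rejected...
console.log(bucket.tryConsume(0));  // false
// ...but 2 seconds later, 2 tokens have refilled
console.log(bucket.tryConsume(2));  // true
```

The bucket absorbs the burst, then throttles the client down to the sustained refill rate.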
Multi-Dimensional Rate Limiting
Production APIs need multiple rate limiting dimensions to defend against sophisticated attacks:
// Multi-dimensional rate limiting
async function checkRateLimits(req) {
  // (In production, construct limiters once at startup, not per request)
  const checks = [
    // Per-user limits
    {
      key: `user:${req.userId}`,
      limiter: new TokenBucketRateLimiter(1000, 10), // 1000 burst, 10/sec sustained
      priority: 1
    },
    // Per-IP limits (broader, for shared IPs)
    {
      key: `ip:${req.ip}`,
      limiter: new TokenBucketRateLimiter(5000, 50), // More lenient for shared IPs
      priority: 2
    },
    // Per-endpoint limits
    {
      key: `endpoint:${req.userId}:${req.endpoint}`,
      limiter: getEndpointLimiter(req.endpoint), // Different limits per endpoint
      priority: 1
    },
    // Global limits (protect infrastructure)
    {
      key: 'global',
      limiter: new TokenBucketRateLimiter(100000, 1000), // Infrastructure limit
      priority: 3
    }
  ];

  for (const check of checks) {
    const result = await check.limiter.isAllowed(check.key);
    if (!result.allowed) {
      return {
        allowed: false,
        limitType: check.key.split(':')[0],
        retryAfter: result.retryAfter || 60,
        ...result
      };
    }
  }

  return { allowed: true };
}

function getEndpointLimiter(endpoint) {
  const limits = {
    '/api/search': new TokenBucketRateLimiter(100, 5), // Expensive
    '/api/users/:id': new TokenBucketRateLimiter(1000, 20), // Moderate
    '/api/health': new TokenBucketRateLimiter(10000, 100) // Cheap
  };
  return limits[endpoint] || new TokenBucketRateLimiter(500, 10);
}
Distributed Rate Limiting
For multi-server deployments, you need distributed rate limiting with shared state:
Redis-Based Implementation
// Distributed rate limiting with Redis
const Redis = require('ioredis');

const redis = new Redis({
  host: 'redis-cluster.example.com',
  port: 6379,
  enableReadyCheck: true,
  maxRetriesPerRequest: 3
});

// Atomic token bucket using Redis + Lua script
const tokenBucketScript = `
  local capacity = tonumber(ARGV[1])
  local refill_rate = tonumber(ARGV[2])
  local tokens_requested = tonumber(ARGV[3])
  local now = tonumber(ARGV[4])

  local tokens = tonumber(redis.call('HGET', KEYS[1], 'tokens')) or capacity
  local last_refill = tonumber(redis.call('HGET', KEYS[1], 'lastRefill')) or now

  local tokens_to_add = (now - last_refill) * refill_rate
  tokens = math.min(capacity, tokens + tokens_to_add)

  if tokens < tokens_requested then
    return {0, tokens}
  end

  tokens = tokens - tokens_requested
  redis.call('HMSET', KEYS[1], 'tokens', tokens, 'lastRefill', now)
  redis.call('EXPIRE', KEYS[1], math.ceil(capacity / refill_rate))
  return {1, tokens}
`;

async function checkDistributedRateLimit(key, capacity, refillRate, tokensRequested = 1) {
  const now = Date.now() / 1000;
  const result = await redis.eval(
    tokenBucketScript,
    1, // number of KEYS
    key,
    capacity,
    refillRate,
    tokensRequested,
    now
  );
  // Note: Redis truncates Lua numbers to integers in replies, so
  // `remaining` is a whole number of tokens
  return {
    allowed: result[0] === 1,
    remaining: result[1]
  };
}
Handling Redis Failures
When Redis is unavailable, you must choose: fail-open (allow requests, risk overload) or fail-closed (deny requests, impact availability). Most systems fail-open with local in-memory fallback limits to balance risk.
// Graceful degradation when Redis fails
async function rateLimitWithFallback(key, limiter) {
  try {
    return await limiter.isAllowed(key);
  } catch (redisError) {
    console.error('Redis failure, falling back to local limit', redisError);
    // Fall back to local in-memory rate limiting; make it more permissive,
    // since each server now enforces its limit independently
    return localMemoryLimiter.isAllowed(key);
  }
}
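The `localMemoryLimiter` above is assumed rather than defined. A minimal per-process sketch, using a fixed window for simplicity (names and limits are illustrative), could look like this:

```javascript
// Per-process fixed-window limiter used only as a Redis-outage fallback
class LocalMemoryLimiter {
  constructor(limit, windowSize) {
    this.limit = limit;
    this.windowSize = windowSize;
    this.windows = new Map(); // key -> { windowStart, count }
  }

  isAllowed(key) {
    const windowStart =
      Math.floor(Date.now() / this.windowSize) * this.windowSize;
    let entry = this.windows.get(key);
    // Reset the counter when a new window begins
    if (!entry || entry.windowStart !== windowStart) {
      entry = { windowStart, count: 0 };
      this.windows.set(key, entry);
    }
    entry.count += 1;
    return {
      allowed: entry.count <= this.limit,
      limit: this.limit,
      remaining: Math.max(0, this.limit - entry.count)
    };
  }
}

// More permissive than the shared limit, since each server enforces it alone
const localMemoryLimiter = new LocalMemoryLimiter(2000, 60000);
```

Because the fallback state lives in one process, the effective fleet-wide limit is roughly this limit multiplied by the number of servers, which is why it should be tuned looser than the Redis-backed limit.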
Response Headers and User Experience
Communicate rate limit status to clients using standard headers:
// Rate limit response headers
app.use(async (req, res, next) => {
  const limitResult = await checkRateLimit(req.userId);

  // Conventional rate limit headers. X-RateLimit-Reset is usually
  // expressed in unix epoch seconds; resetAt here is in milliseconds
  res.setHeader('X-RateLimit-Limit', limitResult.limit);
  res.setHeader('X-RateLimit-Remaining', limitResult.remaining);
  res.setHeader('X-RateLimit-Reset', Math.ceil(limitResult.resetAt / 1000));

  if (!limitResult.allowed) {
    res.setHeader('Retry-After', limitResult.retryAfter || 60);
    return res.status(429).json({
      error: 'Rate limit exceeded',
      message: 'Too many requests. Please try again later.',
      retryAfter: limitResult.retryAfter,
      limit: limitResult.limit,
      resetAt: new Date(limitResult.resetAt).toISOString()
    });
  }

  next();
});
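Well-behaved clients should honor these headers rather than hammering the API. A hypothetical client-side helper (the `fetchFn` parameter is injected so it works with any fetch-compatible HTTP client) might look like:

```javascript
// Retry on 429, waiting the server-specified Retry-After between attempts
async function fetchWithRetry(fetchFn, url, options = {}, maxAttempts = 3) {
  let res;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    res = await fetchFn(url, options);
    if (res.status !== 429 || attempt === maxAttempts) break;
    // Retry-After is in seconds; fall back to 1 s if absent or unparsable
    const retryAfter = Number(res.headers.get('Retry-After')) || 1;
    await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
  }
  return res;
}
```

Pairing this with exponential backoff and jitter on the client side further reduces the thundering-herd effect when many clients hit a limit at once.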
Advanced Strategies
Adaptive Rate Limiting
Adjust limits dynamically based on user behavior and threat level:
// Adaptive rate limiting
async function getAdaptiveLimit(userId) {
  const baseLimit = 1000; // Base limit for normal users
  const suspiciousScore = await calculateSuspiciousScore(userId);

  if (suspiciousScore > 0.8) {
    return baseLimit * 0.2; // Reduce to 20% for highly suspicious users
  } else if (suspiciousScore > 0.5) {
    return baseLimit * 0.5; // Reduce to 50% for moderately suspicious users
  }

  // Premium users get a higher ceiling
  const isPremium = await checkPremiumStatus(userId);
  if (isPremium) {
    return baseLimit * 5; // 5x limit for premium users
  }

  return baseLimit;
}
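`calculateSuspiciousScore` is left abstract above. One hypothetical way to build such a score is a weighted sum of normalized traffic signals; the signal names, thresholds, and weights below are all assumptions you would tune against real traffic:

```javascript
// Combine normalized signals into a suspicion score in [0, 1]
function suspiciousScore({ errorRate, uniqueEndpointsPerMin, requestsPerMin }) {
  const score =
    0.4 * Math.min(1, errorRate / 0.5) +            // lots of 4xx errors
    0.3 * Math.min(1, uniqueEndpointsPerMin / 50) + // endpoint scanning
    0.3 * Math.min(1, requestsPerMin / 600);        // raw request volume
  return Math.min(1, score); // clamp to [0, 1]
}

// A client with a 60% error rate, scanning 60 endpoints at 700 req/min,
// maxes out every signal and scores ~1.0
console.log(suspiciousScore({
  errorRate: 0.6,
  uniqueEndpointsPerMin: 60,
  requestsPerMin: 700
}));
```

In practice these signals would be computed over a rolling window from request logs, and the score cached per user to keep the rate limit check cheap.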
Quota Systems
Implement monthly/daily quotas in addition to per-second rate limits:
// Quota system with rate limiting
async function checkQuotaAndRate(userId) {
  // Check the monthly quota first
  const monthlyQuota = await getMonthlyQuota(userId);
  const monthlyUsage =
    parseInt(await redis.get(`quota:monthly:${userId}`), 10) || 0;

  if (monthlyUsage >= monthlyQuota) {
    return {
      allowed: false,
      reason: 'Monthly quota exceeded',
      quotaReset: getStartOfNextMonth()
    };
  }

  // Then the per-second rate limit
  const rateLimit = await checkRateLimit(userId);
  if (!rateLimit.allowed) {
    return rateLimit;
  }

  // Increment the quota counter and expire it at the month boundary
  // (EXPIREAT takes a unix timestamp in seconds)
  await redis.incr(`quota:monthly:${userId}`);
  await redis.expireat(`quota:monthly:${userId}`, getStartOfNextMonth());

  return { allowed: true };
}
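`getStartOfNextMonth` is assumed above. Since EXPIREAT expects a unix timestamp in seconds, a UTC implementation might be:

```javascript
// Unix timestamp (seconds) for 00:00:00 UTC on the first of next month
function getStartOfNextMonth(now = new Date()) {
  // Date.UTC handles month overflow: month 11 + 1 rolls into January next year
  const nextMonthMs = Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 1, 1);
  return Math.floor(nextMonthMs / 1000);
}
```

Using UTC avoids quota windows shifting with server timezone or daylight-saving changes.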
Choosing the Right Algorithm
Here's a decision framework:
- Fixed Window: Use for simple quotas (e.g., "1000 requests per day") where precision isn't critical
- Sliding Window Log: Use when you need precise rate limiting and can afford memory overhead
- Sliding Window Counter: Best all-around choice for most APIs—good balance of accuracy and efficiency
- Token Bucket: Use when you want to allow bursts while controlling long-term rate. Best user experience.
For most APIs, start with Token Bucket for user-facing endpoints (better UX with burst tolerance) and Sliding Window Counter for internal/backend rate limiting (more predictable resource usage).
How KnoxCall Implements Intelligent Rate Limiting
KnoxCall provides production-ready rate limiting out of the box:
- Adaptive algorithms: Automatically chooses optimal algorithm per endpoint
- Multi-dimensional: Per-user, per-IP, per-endpoint, and global limits
- Distributed by default: Works across multiple servers with Redis backend
- Threat-aware: Reduces limits automatically when scraping detected
- Graceful degradation: Fallback to local limits if Redis fails
- Clear feedback: Standard headers and error messages for clients
- Quota management: Built-in daily/monthly quota tracking
Key Takeaways
- Rate limiting is essential for API security, stability, and fair resource allocation
- Token bucket offers the best user experience by allowing controlled bursts
- Sliding window counter provides the best balance of accuracy and efficiency
- Multi-dimensional rate limiting (user + IP + endpoint) is necessary for modern APIs
- Always implement graceful degradation for distributed rate limiting systems
- Communicate limits clearly to clients using standard headers
- Consider adaptive rate limiting that adjusts based on user behavior