Without rate limiting, your API is vulnerable to abuse, accidental overload, and deliberate attacks. A single misconfigured client can bring down your entire infrastructure. A malicious actor can scrape your data or execute denial-of-service attacks with ease.
This guide provides production-ready implementations of every major rate limiting algorithm, discusses trade-offs, and shows you how to choose the right approach for your use case.
Why Rate Limiting Matters
Rate limiting serves multiple critical purposes:
- Prevent resource exhaustion: Protect your infrastructure from being overwhelmed
- Ensure fair access: Prevent one client from monopolizing resources
- Mitigate DDoS attacks: Limit damage from distributed denial-of-service attempts
- Prevent scraping: Make systematic data harvesting economically infeasible
- Control costs: Limit downstream API costs (cloud services, third-party APIs)
- Enforce pricing tiers: Implement different limits for free vs paid users
Rate Limiting Algorithms
Different algorithms offer different trade-offs between simplicity, precision, and resource usage. Here are the four most important algorithms you need to understand.
1. Fixed Window Counter
The simplest algorithm: count requests in fixed time windows (e.g., per minute, per hour).
// Fixed window counter implementation
// (these examples assume a shared ioredis client named `redis`)
class FixedWindowRateLimiter {
  constructor(limit, windowSize) {
    this.limit = limit; // Max requests per window
    this.windowSize = windowSize; // Window size in milliseconds
  }

  async isAllowed(key) {
    const now = Date.now();
    const windowStart = Math.floor(now / this.windowSize) * this.windowSize;
    const windowKey = `${key}:${windowStart}`;

    // Redis returns strings (or null), so parse before doing arithmetic
    const count = parseInt(await redis.get(windowKey), 10) || 0;

    if (count >= this.limit) {
      return {
        allowed: false,
        limit: this.limit,
        remaining: 0,
        resetAt: windowStart + this.windowSize
      };
    }

    // Note: GET followed by INCR is not atomic; under heavy concurrency,
    // prefer INCR-then-check or a Lua script (see the distributed section)
    await redis.incr(windowKey);
    await redis.pexpire(windowKey, this.windowSize);

    return {
      allowed: true,
      limit: this.limit,
      remaining: this.limit - count - 1,
      resetAt: windowStart + this.windowSize
    };
  }
}
// Usage: 100 requests per minute
const limiter = new FixedWindowRateLimiter(100, 60000);
Pros: Simple to implement, memory-efficient, precise window resets
Cons: Vulnerable to bursts at window boundaries. With a 100 requests/minute limit, a client can send 100 requests at 12:00:59 and another 100 at 12:01:00, landing 200 requests in roughly one second, double the intended rate. Use a sliding window or token bucket algorithm when boundary bursts matter.
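The boundary effect is easy to demonstrate with a pure in-memory simulation (no Redis needed; the timestamps are hypothetical millisecond offsets):

```javascript
// Count how many of the given requests a fixed-window limiter would accept
function simulateFixedWindow(limit, windowSize, timestamps) {
  const counts = new Map();
  let accepted = 0;
  for (const t of timestamps) {
    const windowStart = Math.floor(t / windowSize) * windowSize;
    const count = counts.get(windowStart) || 0;
    if (count < limit) {
      counts.set(windowStart, count + 1);
      accepted += 1;
    }
  }
  return accepted;
}

// 100 requests at t=59.5s and 100 more at t=60.5s: only one second apart,
// yet they land in different windows, so all 200 are accepted
const burst = [
  ...Array(100).fill(59500),
  ...Array(100).fill(60500)
];
console.log(simulateFixedWindow(100, 60000, burst)); // 200
```

Two hundred requests pass in one second despite the nominal 100/minute limit.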
2. Sliding Window Log
Store timestamps of each request and count only those within the current window. Provides precise rate limiting without boundary issues.
// Sliding window log implementation
class SlidingWindowLogRateLimiter {
  constructor(limit, windowSize) {
    this.limit = limit;
    this.windowSize = windowSize;
  }

  async isAllowed(key) {
    const now = Date.now();
    const windowStart = now - this.windowSize;

    // Remove requests older than the window
    await redis.zremrangebyscore(key, 0, windowStart);

    // Count requests in the current window
    const count = await redis.zcard(key);

    if (count >= this.limit) {
      // ZRANGE WITHSCORES returns [member, score] as strings, so parse
      const oldest = await redis.zrange(key, 0, 0, 'WITHSCORES');
      const resetAt = parseFloat(oldest[1]) + this.windowSize;
      return {
        allowed: false,
        limit: this.limit,
        remaining: 0,
        resetAt: resetAt
      };
    }

    // Add the current request (random suffix keeps members unique)
    await redis.zadd(key, now, `${now}-${Math.random()}`);
    await redis.pexpire(key, this.windowSize);

    return {
      allowed: true,
      limit: this.limit,
      remaining: this.limit - count - 1
    };
  }
}
// Usage: 100 requests per 60 seconds
const limiter = new SlidingWindowLogRateLimiter(100, 60000);
Pros: No boundary burst issues, precise rate limiting
Cons: High memory usage (stores every timestamp), more complex to implement
3. Sliding Window Counter
Hybrid approach combining fixed windows with weighted counts from previous window. Balances accuracy with efficiency.
// Sliding window counter implementation
class SlidingWindowCounterRateLimiter {
  constructor(limit, windowSize) {
    this.limit = limit;
    this.windowSize = windowSize;
  }

  async isAllowed(key) {
    const now = Date.now();
    const currentWindow = Math.floor(now / this.windowSize);
    const previousWindow = currentWindow - 1;
    const currentKey = `${key}:${currentWindow}`;
    const previousKey = `${key}:${previousWindow}`;

    // Parse: Redis returns strings, and string + number would concatenate
    const currentCount = parseInt(await redis.get(currentKey), 10) || 0;
    const previousCount = parseInt(await redis.get(previousKey), 10) || 0;

    // Position in the current window (0 to 1)
    const percentageIntoCurrentWindow =
      (now % this.windowSize) / this.windowSize;

    // Weighted count: (previous window * remaining %) + current window
    const estimatedCount =
      previousCount * (1 - percentageIntoCurrentWindow) + currentCount;

    if (estimatedCount >= this.limit) {
      return { allowed: false, limit: this.limit, remaining: 0 };
    }

    await redis.incr(currentKey);
    // Keep the key alive long enough to serve as the next window's "previous"
    await redis.pexpire(currentKey, this.windowSize * 2);

    return {
      allowed: true,
      limit: this.limit,
      remaining: Math.floor(this.limit - estimatedCount - 1)
    };
  }
}
// Usage: 100 requests per minute
const limiter = new SlidingWindowCounterRateLimiter(100, 60000);
Pros: Memory-efficient, smoother than fixed windows, no boundary bursts
Cons: Approximate (not exact), slightly more complex than fixed window
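The weighted estimate is worth seeing with concrete (hypothetical) numbers: 80 requests in the previous window, 30 so far in the current one, 30% of the way through:

```javascript
// Weighted estimate = previous * (1 - fraction elapsed) + current
function estimatedCount(previousCount, currentCount, fractionElapsed) {
  return previousCount * (1 - fractionElapsed) + currentCount;
}

// 30% into the current window, 70% of the previous window still "counts":
// 80 * 0.7 + 30 ≈ 86
console.log(estimatedCount(80, 30, 0.3));
```

With a limit of 100, this request would still be allowed; the estimate smoothly decays the previous window's weight as the current window progresses.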
4. Token Bucket
The most flexible algorithm. Tokens are added to a bucket at a steady rate. Each request consumes a token. Allows controlled bursts while maintaining long-term rate.
// Token bucket implementation
class TokenBucketRateLimiter {
  constructor(capacity, refillRate) {
    this.capacity = capacity; // Max tokens (burst capacity)
    this.refillRate = refillRate; // Tokens added per second
  }

  async isAllowed(key, tokens = 1) {
    const now = Date.now() / 1000; // Work in seconds
    const bucket = await redis.hmget(key, 'tokens', 'lastRefill');

    // Careful: `parseFloat(x) || default` would treat a stored 0 as missing,
    // so check for null explicitly
    const storedTokens =
      bucket[0] !== null ? parseFloat(bucket[0]) : this.capacity;
    const lastRefill = bucket[1] !== null ? parseFloat(bucket[1]) : now;

    // Refill based on time elapsed since the last request
    const timePassed = now - lastRefill;
    let availableTokens = Math.min(
      this.capacity,
      storedTokens + timePassed * this.refillRate
    );

    if (availableTokens < tokens) {
      const waitTime = (tokens - availableTokens) / this.refillRate;
      return {
        allowed: false,
        tokens: availableTokens,
        capacity: this.capacity,
        retryAfter: waitTime
      };
    }

    availableTokens -= tokens;
    await redis.hmset(key, {
      tokens: availableTokens,
      lastRefill: now
    });
    // Expire once a full refill from empty would have completed
    await redis.expire(key, Math.ceil(this.capacity / this.refillRate));

    return {
      allowed: true,
      tokens: availableTokens,
      capacity: this.capacity
    };
  }
}
// Usage: 100 token capacity, refill at 10 tokens/second
// Allows bursts of 100, but sustained rate of 10/sec
const limiter = new TokenBucketRateLimiter(100, 10);
Pros: Allows controlled bursts, flexible, smooth long-term rate control
Cons: More complex to implement and understand, requires floating-point math
Token bucket is ideal when you want to allow short bursts (e.g., uploading multiple files) while controlling the long-term average rate. It's the most user-friendly algorithm because it doesn't penalize occasional burst activity.
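The burst-then-throttle behavior is easy to see with a minimal in-memory bucket (a hypothetical single-process sketch; the clock is injected so the example is deterministic):

```javascript
// Minimal in-memory token bucket: capacity 5 tokens, refill 1 token/second
class LocalTokenBucket {
  constructor(capacity, refillRate) {
    this.capacity = capacity;
    this.refillRate = refillRate;
    this.tokens = capacity;
    this.lastRefill = 0;
  }

  // `now` is a timestamp in seconds, passed in for determinism
  tryConsume(now, tokens = 1) {
    const elapsed = now - this.lastRefill;
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillRate);
    this.lastRefill = now;
    if (this.tokens < tokens) return false;
    this.tokens -= tokens;
    return true;
  }
}

const bucket = new LocalTokenBucket(5, 1);
// A burst of 5 requests at t=0 is fully allowed
const burst = [0, 0, 0, 0, 0].map(t => bucket.tryConsume(t));
console.log(burst.every(Boolean));  // true
// The 6th immediate request is rejected...
console.log(bucket.tryConsume(0));  // false
// ...but 2 seconds later, 2 tokens have refilled
console.log(bucket.tryConsume(2));  // true
```

The bucket absorbs the burst, then throttles the client down to the sustained refill rate.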
Multi-Dimensional Rate Limiting
Production APIs need multiple rate limiting dimensions to defend against sophisticated attacks:
// Multi-dimensional rate limiting
async function checkRateLimits(req) {
  // (In production, construct limiters once at startup, not per request)
  const checks = [
    // Per-user limits
    {
      key: `user:${req.userId}`,
      limiter: new TokenBucketRateLimiter(1000, 10), // 1000 burst, 10/sec sustained
      priority: 1
    },
    // Per-IP limits (broader, for shared IPs)
    {
      key: `ip:${req.ip}`,
      limiter: new TokenBucketRateLimiter(5000, 50), // More lenient for shared IPs
      priority: 2
    },
    // Per-endpoint limits
    {
      key: `endpoint:${req.userId}:${req.endpoint}`,
      limiter: getEndpointLimiter(req.endpoint), // Different limits per endpoint
      priority: 1
    },
    // Global limits (protect infrastructure)
    {
      key: 'global',
      limiter: new TokenBucketRateLimiter(100000, 1000), // Infrastructure limit
      priority: 3
    }
  ];

  for (const check of checks) {
    const result = await check.limiter.isAllowed(check.key);
    if (!result.allowed) {
      return {
        allowed: false,
        limitType: check.key.split(':')[0],
        retryAfter: result.retryAfter || 60,
        ...result
      };
    }
  }

  return { allowed: true };
}

function getEndpointLimiter(endpoint) {
  const limits = {
    '/api/search': new TokenBucketRateLimiter(100, 5), // Expensive
    '/api/users/:id': new TokenBucketRateLimiter(1000, 20), // Moderate
    '/api/health': new TokenBucketRateLimiter(10000, 100) // Cheap
  };
  return limits[endpoint] || new TokenBucketRateLimiter(500, 10);
}
Distributed Rate Limiting
For multi-server deployments, you need distributed rate limiting with shared state:
Redis-Based Implementation
// Distributed rate limiting with Redis
const Redis = require('ioredis');

const redis = new Redis({
  host: 'redis-cluster.example.com',
  port: 6379,
  enableReadyCheck: true,
  maxRetriesPerRequest: 3
});

// Atomic token bucket using Redis + Lua script
const tokenBucketScript = `
  local capacity = tonumber(ARGV[1])
  local refill_rate = tonumber(ARGV[2])
  local tokens_requested = tonumber(ARGV[3])
  local now = tonumber(ARGV[4])

  local tokens = tonumber(redis.call('HGET', KEYS[1], 'tokens')) or capacity
  local last_refill = tonumber(redis.call('HGET', KEYS[1], 'lastRefill')) or now

  local tokens_to_add = (now - last_refill) * refill_rate
  tokens = math.min(capacity, tokens + tokens_to_add)

  if tokens < tokens_requested then
    return {0, tokens}
  end

  tokens = tokens - tokens_requested
  redis.call('HMSET', KEYS[1], 'tokens', tokens, 'lastRefill', now)
  redis.call('EXPIRE', KEYS[1], math.ceil(capacity / refill_rate))
  return {1, tokens}
`;

async function checkDistributedRateLimit(key, capacity, refillRate, tokensRequested = 1) {
  const now = Date.now() / 1000;
  const result = await redis.eval(
    tokenBucketScript,
    1, // number of KEYS
    key,
    capacity,
    refillRate,
    tokensRequested,
    now
  );
  // Note: Redis truncates Lua numbers to integers in replies, so
  // `remaining` is a whole number of tokens
  return {
    allowed: result[0] === 1,
    remaining: result[1]
  };
}
Handling Redis Failures
When Redis is unavailable, you must choose: fail-open (allow requests, risk overload) or fail-closed (deny requests, impact availability). Most systems fail-open with local in-memory fallback limits to balance risk.
// Graceful degradation when Redis fails
async function rateLimitWithFallback(key, limiter) {
  try {
    return await limiter.isAllowed(key);
  } catch (redisError) {
    console.error('Redis failure, falling back to local limit', redisError);
    // Fall back to local in-memory rate limiting; make it more permissive,
    // since each server now enforces its limit independently
    return localMemoryLimiter.isAllowed(key);
  }
}
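The `localMemoryLimiter` above is assumed rather than defined. A minimal per-process sketch, using a fixed window for simplicity (names and limits are illustrative), could look like this:

```javascript
// Per-process fixed-window limiter used only as a Redis-outage fallback
class LocalMemoryLimiter {
  constructor(limit, windowSize) {
    this.limit = limit;
    this.windowSize = windowSize;
    this.windows = new Map(); // key -> { windowStart, count }
  }

  isAllowed(key) {
    const windowStart =
      Math.floor(Date.now() / this.windowSize) * this.windowSize;
    let entry = this.windows.get(key);
    // Reset the counter when a new window begins
    if (!entry || entry.windowStart !== windowStart) {
      entry = { windowStart, count: 0 };
      this.windows.set(key, entry);
    }
    entry.count += 1;
    return {
      allowed: entry.count <= this.limit,
      limit: this.limit,
      remaining: Math.max(0, this.limit - entry.count)
    };
  }
}

// More permissive than the shared limit, since each server enforces it alone
const localMemoryLimiter = new LocalMemoryLimiter(2000, 60000);
```

Because the fallback state lives in one process, the effective fleet-wide limit is roughly this limit multiplied by the number of servers, which is why it should be tuned looser than the Redis-backed limit.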
Response Headers and User Experience
Communicate rate limit status to clients using standard headers:
// Rate limit response headers
app.use(async (req, res, next) => {
  const limitResult = await checkRateLimit(req.userId);

  // Conventional rate limit headers. X-RateLimit-Reset is usually
  // expressed in unix epoch seconds; resetAt here is in milliseconds
  res.setHeader('X-RateLimit-Limit', limitResult.limit);
  res.setHeader('X-RateLimit-Remaining', limitResult.remaining);
  res.setHeader('X-RateLimit-Reset', Math.ceil(limitResult.resetAt / 1000));

  if (!limitResult.allowed) {
    res.setHeader('Retry-After', limitResult.retryAfter || 60);
    return res.status(429).json({
      error: 'Rate limit exceeded',
      message: 'Too many requests. Please try again later.',
      retryAfter: limitResult.retryAfter,
      limit: limitResult.limit,
      resetAt: new Date(limitResult.resetAt).toISOString()
    });
  }

  next();
});
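Well-behaved clients should honor these headers rather than hammering the API. A hypothetical client-side helper (the `fetchFn` parameter is injected so it works with any fetch-compatible HTTP client) might look like:

```javascript
// Retry on 429, waiting the server-specified Retry-After between attempts
async function fetchWithRetry(fetchFn, url, options = {}, maxAttempts = 3) {
  let res;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    res = await fetchFn(url, options);
    if (res.status !== 429 || attempt === maxAttempts) break;
    // Retry-After is in seconds; fall back to 1 s if absent or unparsable
    const retryAfter = Number(res.headers.get('Retry-After')) || 1;
    await new Promise(resolve => setTimeout(resolve, retryAfter * 1000));
  }
  return res;
}
```

Pairing this with exponential backoff and jitter on the client side further reduces the thundering-herd effect when many clients hit a limit at once.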
Advanced Strategies
Adaptive Rate Limiting
Adjust limits dynamically based on user behavior and threat level:
// Adaptive rate limiting
async function getAdaptiveLimit(userId) {
  const baseLimit = 1000; // Base limit for normal users
  const suspiciousScore = await calculateSuspiciousScore(userId);

  if (suspiciousScore > 0.8) {
    return baseLimit * 0.2; // Reduce to 20% for highly suspicious users
  } else if (suspiciousScore > 0.5) {
    return baseLimit * 0.5; // Reduce to 50% for moderately suspicious users
  }

  // Premium users get a higher ceiling
  const isPremium = await checkPremiumStatus(userId);
  if (isPremium) {
    return baseLimit * 5; // 5x limit for premium users
  }

  return baseLimit;
}
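`calculateSuspiciousScore` is left abstract above. One hypothetical way to build such a score is a weighted sum of normalized traffic signals; the signal names, thresholds, and weights below are all assumptions you would tune against real traffic:

```javascript
// Combine normalized signals into a suspicion score in [0, 1]
function suspiciousScore({ errorRate, uniqueEndpointsPerMin, requestsPerMin }) {
  const score =
    0.4 * Math.min(1, errorRate / 0.5) +            // lots of 4xx errors
    0.3 * Math.min(1, uniqueEndpointsPerMin / 50) + // endpoint scanning
    0.3 * Math.min(1, requestsPerMin / 600);        // raw request volume
  return Math.min(1, score); // clamp to [0, 1]
}

// A client with a 60% error rate, scanning 60 endpoints at 700 req/min,
// maxes out every signal and scores ~1.0
console.log(suspiciousScore({
  errorRate: 0.6,
  uniqueEndpointsPerMin: 60,
  requestsPerMin: 700
}));
```

In practice these signals would be computed over a rolling window from request logs, and the score cached per user to keep the rate limit check cheap.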
Quota Systems
Implement monthly/daily quotas in addition to per-second rate limits:
// Quota system with rate limiting
async function checkQuotaAndRate(userId) {
  // Check the monthly quota first
  const monthlyQuota = await getMonthlyQuota(userId);
  const monthlyUsage =
    parseInt(await redis.get(`quota:monthly:${userId}`), 10) || 0;

  if (monthlyUsage >= monthlyQuota) {
    return {
      allowed: false,
      reason: 'Monthly quota exceeded',
      quotaReset: getStartOfNextMonth()
    };
  }

  // Then the per-second rate limit
  const rateLimit = await checkRateLimit(userId);
  if (!rateLimit.allowed) {
    return rateLimit;
  }

  // Increment the quota counter and expire it at the month boundary
  // (EXPIREAT takes a unix timestamp in seconds)
  await redis.incr(`quota:monthly:${userId}`);
  await redis.expireat(`quota:monthly:${userId}`, getStartOfNextMonth());

  return { allowed: true };
}
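`getStartOfNextMonth` is assumed above. Since EXPIREAT expects a unix timestamp in seconds, a UTC implementation might be:

```javascript
// Unix timestamp (seconds) for 00:00:00 UTC on the first of next month
function getStartOfNextMonth(now = new Date()) {
  // Date.UTC handles month overflow: month 11 + 1 rolls into January next year
  const nextMonthMs = Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 1, 1);
  return Math.floor(nextMonthMs / 1000);
}
```

Using UTC avoids quota windows shifting with server timezone or daylight-saving changes.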
Choosing the Right Algorithm
Here's a decision framework:
- Fixed Window: Use for simple quotas (e.g., "1000 requests per day") where precision isn't critical
- Sliding Window Log: Use when you need precise rate limiting and can afford memory overhead
- Sliding Window Counter: Best all-around choice for most APIs—good balance of accuracy and efficiency
- Token Bucket: Use when you want to allow bursts while controlling long-term rate. Best user experience.
For most APIs, start with Token Bucket for user-facing endpoints (better UX with burst tolerance) and Sliding Window Counter for internal/backend rate limiting (more predictable resource usage).
How KnoxCall Implements Intelligent Rate Limiting
KnoxCall provides production-ready rate limiting out of the box:
- Adaptive algorithms: Automatically chooses optimal algorithm per endpoint
- Multi-dimensional: Per-user, per-IP, per-endpoint, and global limits
- Distributed by default: Works across multiple servers with Redis backend
- Threat-aware: Reduces limits automatically when scraping detected
- Graceful degradation: Fallback to local limits if Redis fails
- Clear feedback: Standard headers and error messages for clients
- Quota management: Built-in daily/monthly quota tracking
Key Takeaways
- Rate limiting is essential for API security, stability, and fair resource allocation
- Token bucket offers the best user experience by allowing controlled bursts
- Sliding window counter provides the best balance of accuracy and efficiency
- Multi-dimensional rate limiting (user + IP + endpoint) is necessary for modern APIs
- Always implement graceful degradation for distributed rate limiting systems
- Communicate limits clearly to clients using standard headers
- Consider adaptive rate limiting that adjusts based on user behavior