Skip to main content
SDMastery

Rate Limiting Algorithms Explained

2025-01-159 min read
Rate Limiting Algorithms Explained system design overview showing key components and metrics
High-level overview of Rate Limiting Algorithms Explained

Rate Limiting Algorithms Explained

Rate limiting controls how many requests a client can make in a given time period. It protects servers from overload, ensures fair usage, and prevents abuse. The choice of algorithm determines how strictly the limit is enforced, how burst-friendly the system is, and how much memory is required.

Algorithm 1: Token Bucket

A bucket holds tokens, up to a maximum capacity. Each request consumes one token. Tokens are added to the bucket at a fixed rate (e.g., 10 tokens per second). If the bucket is empty, the request is rejected.

How it works: Initialize the bucket with a capacity of N tokens. On each request, check if tokens are available. If yes, decrement and allow. If no, reject. Refill tokens at the configured rate.

Rate Limiting Algorithms Explained system architecture with service components and data flow
System architecture for Rate Limiting Algorithms Explained

Implementation detail: You do not need a background thread to add tokens. Calculate the elapsed time since the last request and add the appropriate number of tokens (up to the capacity) at request time. This is called "lazy refill."

Behavior: Allows bursts up to the bucket capacity, then enforces the steady-state rate. A bucket with capacity 100 and rate 10/sec allows a burst of 100 requests, then 10/sec thereafter.

Pros: Simple, memory efficient (one counter + one timestamp per client), allows controlled bursts. Cons: Burst size is fixed by bucket capacity; tuning capacity and rate independently can be confusing.

Used by: AWS API Gateway, Stripe, many CDNs.

Algorithm 2: Leaky Bucket

Step-by-step diagram showing how Rate Limiting Algorithms Explained works in practice
How Rate Limiting Algorithms Explained works step by step

Requests enter a queue (the "bucket") and are processed at a fixed rate. If the queue is full, new requests are rejected.

How it works: Maintain a FIFO queue with a fixed capacity. Requests are added to the queue on arrival. A processor dequeues and processes requests at a constant rate.

Behavior: Smooths out bursts — even if 100 requests arrive at once, they are processed at the fixed rate. The queue absorbs bursts, but if the burst exceeds the queue capacity, excess requests are dropped.

Pros: Produces a perfectly smooth output rate, useful when downstream systems cannot handle bursts. Cons: Does not allow legitimate bursts (a user who has been idle should be able to make a few fast requests). Requires a queue data structure.

Used by: Network traffic shaping, some API gateways.

Comparison table for Rate Limiting Algorithms Explained showing key metrics and tradeoffs
Comparing key aspects of Rate Limiting Algorithms Explained

Algorithm 3: Fixed Window Counter

Divide time into fixed windows (e.g., each minute). Maintain a counter per window per client. If the counter exceeds the limit within the current window, reject the request.

How it works: Key = client_id + window_start_time. Increment the counter on each request. If counter > limit, reject.

Behavior: Simple to implement and memory-efficient. However, it has a boundary problem: a client can make the maximum requests at the end of one window and the beginning of the next, effectively doubling their rate for a brief period.

Example: Limit is 100 requests per minute. Client makes 100 requests at 11:00:59, then 100 more at 11:01:00. That is 200 requests in 2 seconds, despite a "100 per minute" limit.

Data flow diagram for Rate Limiting Algorithms Explained showing request and response paths
Data flow through Rate Limiting Algorithms Explained

Pros: Very simple, uses minimal memory (one counter per client per window). Cons: Boundary burst problem allows up to 2x the intended rate at window edges.

Algorithm 4: Sliding Window Log

Record the timestamp of every request in a sorted set. On each new request, remove timestamps older than the window and count the remaining entries. If the count exceeds the limit, reject.

How it works: Store timestamps in a sorted set (e.g., Redis ZSET). On request: remove entries older than (now - window_size), count remaining, allow if count < limit, add current timestamp.

Behavior: Perfectly accurate — no boundary burst problem. The window slides with each request.

Key components of Rate Limiting Algorithms Explained with roles and responsibilities
Key components of Rate Limiting Algorithms Explained

Pros: Precise rate limiting with no edge cases. Cons: Memory-expensive — stores one timestamp per request per client. For a limit of 10,000 requests per hour, that is 10,000 entries per client.

Algorithm 5: Sliding Window Counter (Hybrid)

Combine fixed window and sliding window. Maintain counters for the current and previous windows. Estimate the rate by weighting the previous window's count based on how far into the current window we are.

How it works: Count for current window + (count for previous window * overlap percentage). If this weighted sum exceeds the limit, reject.

Example: Limit is 100/min. Previous window had 80 requests. Current window (30 seconds in) has 40 requests. Estimated count = 40 + 80 * 0.5 = 80. Under the limit, so allow.

Interview tips for Rate Limiting Algorithms Explained system design questions
Interview tips for Rate Limiting Algorithms Explained

Pros: Nearly as accurate as sliding window log, but uses only two counters per client (same as fixed window). Cons: An approximation — not perfectly precise, but good enough for almost all use cases.

Comparison Table

AlgorithmMemoryAccuracyBurst HandlingComplexity
Token BucketLowGoodAllows controlled burstsLow
Leaky BucketMediumGoodSmooths all burstsMedium
Fixed WindowLowPoor (boundary issue)Allows 2x burst at edgesVery low
Sliding Window LogHighPerfectNo bursts beyond limitMedium
Sliding Window CounterLowGood (approximate)Minimal boundary issueLow

Which One Should You Use?

  • Default choice: Token bucket. Simple, allows reasonable bursts, low memory.
  • Smooth output needed: Leaky bucket. When downstream cannot handle any bursts.
  • Memory-constrained, low accuracy OK: Fixed window. Simplest implementation.
  • Precision critical: Sliding window log. When you cannot tolerate any over-limit requests.
  • Best overall tradeoff: Sliding window counter. Near-perfect accuracy with minimal memory.

Practical Implementation for .NET Developers

Decision guide showing when to use Rate Limiting Algorithms Explained and when to avoid
When to use Rate Limiting Algorithms Explained

In a .NET application, you would typically implement this pattern using the following approach:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Pros and cons analysis of Rate Limiting Algorithms Explained for system design decisions
Advantages and disadvantages of Rate Limiting Algorithms Explained

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

text
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.

What Most Articles Get Wrong

Most articles present rate limiting algorithms in isolation and miss the crucial point: in production, you almost always use multiple algorithms simultaneously. Your CDN uses a fixed window counter for DDoS protection (simple, fast, handles millions of IPs). Your API gateway uses a token bucket per API key (allows bursts for legitimate traffic). Your internal services use sliding window counters for precise throttling. The choice is not "which algorithm" but "which algorithm at which layer."

Real-world companies using Rate Limiting Algorithms Explained in production systems
Real-world examples of Rate Limiting Algorithms Explained

Another common mistake is treating rate limiting as purely defensive. Smart rate limiting is also a product feature. GitHub's rate limit headers (X-RateLimit-Remaining) let developers build self-throttling clients. Stripe's rate limits differ by endpoint (payment creation is more limited than payment retrieval) because the impact of overload differs. Rate limiting shapes behavior — it tells developers which API patterns are cheap and which are expensive.

The Numbers That Matter

  • Token bucket allows bursts of up to bucket capacity N, then sustains at rate R. Ideal for APIs where occasional bursts are acceptable.
  • Sliding window with 60-second window and 100-request limit uses approximately 100 * 16 bytes = 1.6 KB per client in Redis (storing timestamps)
  • Fixed window uses a single counter per client: 8 bytes per client. 10 million clients = 80 MB (extremely memory efficient)
  • Redis INCR operation: ~0.1ms latency — rate limiting adds negligible overhead per request
  • GitHub: 5,000 requests/hour authenticated, 60/hour unauthenticated
  • Stripe: 100 reads/sec and 100 writes/sec per API key in live mode

Sources