medium9 min readUpdated 2026-06-08

Design Rate Limiter

System design interview solution for Design Rate Limiter. Includes requirements, API design, data model, architecture, scaling strategy, and tradeoffs.

Problem Statement

Design a system similar to Rate Limiter. The system should handle millions of users and provide a reliable, scalable experience.

Step 1: Clarifying Questions

Before diving into the design, ask these clarifying questions:

What is the expected scale (users, requests per second)?
What are the most critical features to support?
What are the latency requirements?
Do we need to support real-time features?
What consistency guarantees are needed?

Step 2: Functional Requirements

System architecture diagram for Design Rate Limiter showing how services, databases, and caches connect — System architecture for Design Rate Limiter

Core feature set for Rate Limiter
User-facing APIs and interactions
Data storage and retrieval
Search and discovery (if applicable)
Notifications (if applicable)

Step 3: Non-Functional Requirements

Scalability: Handle millions of concurrent users
Availability: 99.99% uptime (four nines)
Latency: Sub-200ms for read operations
Consistency: Eventually consistent where acceptable, strongly consistent for critical paths
Durability: No data loss

Step 4: Back-of-the-Envelope Estimation

Metric	Estimate
Daily Active Users	10M
Read:Write Ratio	10:1
Average Request Size	1 KB
Storage per year	~10 TB
Peak QPS	100K

Step 5: API Design

text

POST /api/v1/resource
GET  /api/v1/resource/{id}
PUT  /api/v1/resource/{id}
DELETE /api/v1/resource/{id}

Step-by-step diagram showing how Design Rate Limiter processes a request from start to finish — How Design Rate Limiter works step by step

Step 6: Data Model

Define the core entities and their relationships. Consider the access patterns when choosing between SQL and NoSQL.

Step 7: High-Level Architecture

The system consists of these major components:

Client Layer — Web/mobile clients
API Gateway — Rate limiting, authentication, routing
Application Servers — Business logic
Database Layer — Primary storage
Cache Layer — Redis/Memcached for hot data
Message Queue — Async processing

Step 8: Detailed Component Design

Data flow diagram for Design Rate Limiter showing how requests and responses move through the system — Data flow through Design Rate Limiter

Write Path

How data flows from client to persistent storage.

Read Path

How data is retrieved, including cache interactions.

Step 9: Scaling Strategy

Horizontal scaling of application servers behind a load balancer
Database sharding by user ID or geographic region
Read replicas for read-heavy workloads
CDN for static content delivery
Auto-scaling based on traffic patterns

Step 10: Reliability and Fault Tolerance

Interview preparation checklist for Design Rate Limiter with key points to mention and mistakes to avoid — Interview tips for Design Rate Limiter

Data replication across availability zones
Circuit breakers for dependent services
Graceful degradation under high load
Health checks and automated failover

Step 11: Monitoring and Observability

Request latency (p50, p95, p99)
Error rates by endpoint
Database query performance
Cache hit/miss ratios
Queue depth and processing lag

Key Tradeoffs

Decision	Option A	Option B	Chosen
Database	SQL	NoSQL	Depends on access patterns
Consistency	Strong	Eventual	Eventual for most reads
Communication	Sync	Async	Async for non-critical paths

How to Present This in an Interview

Decision guide for when to choose Design Rate Limiter and when alternative approaches are better — When to use Design Rate Limiter

Start with clarifying questions (2 min)
Define requirements (3 min)
Do estimation (2 min)
Design API and data model (5 min)
Draw high-level architecture (10 min)
Deep dive into critical components (10 min)
Discuss tradeoffs and bottlenecks (5 min)

Practical Implementation for .NET Developers

In a .NET application, you would typically implement this pattern using the following approach:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Tradeoff analysis for Design Rate Limiter listing advantages, disadvantages, and real-world considerations — Advantages and disadvantages of Design Rate Limiter

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

text

Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.

Production deployment examples of Design Rate Limiter at companies like Netflix, Google, and Amazon — Real-world examples of Design Rate Limiter

Deep-Dive: Clarifying Questions for Rate Limiter Design

Where in the stack? Client-side, API gateway level, application level, or all three? Each layer has different tradeoffs.
What counting granularity? Per user (API key), per IP, per endpoint, or a combination? Corporate NAT can put thousands of users behind one IP.
Distributed or single-node? If you have 50 API servers, rate limits must be coordinated. A user hitting different servers should still be rate limited globally.
What happens when rate limited? Drop the request? Queue it? Return HTTP 429 with Retry-After header?
What algorithms should we support? Token bucket (allows bursts), sliding window (precise), fixed window (simple), leaky bucket (smooth rate)?
Soft or hard limits? Should limits be strict (exactly 100 requests/minute) or approximate (roughly 100, with brief bursts allowed)?

Specific Functional Requirements

Configurable Rate Rules: Define rate limits per API key, per endpoint, per IP, or any combination (e.g., "100 requests/minute for /api/search per API key")
Multiple Algorithms: Support token bucket (burst-friendly), sliding window (precise), and fixed window (simple)
Distributed Counting: Rate limits are consistent across all API servers using centralized counting
Informative Responses: Return HTTP 429 with X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset, and Retry-After headers
Low Latency: Rate limit check must add under 1ms of latency to each request
Tiered Limits: Different rate limits for free, basic, pro, and enterprise API tiers
Monitoring: Dashboard showing rate limit hit rates per rule, per API key, per endpoint

Specific Internal API (Rate Limiter as a Service)

text

POST /ratelimit/check
  Body: &#123; "key": "apikey_123:/api/search", "limit": 100, "window_seconds": 60, "algorithm": "sliding_window" &#125;
  Response: &#123; "allowed": true, "remaining": 72, "reset_at": 1234567890, "retry_after": null &#125;

POST /ratelimit/check (rate limited response)
  Response: &#123; "allowed": false, "remaining": 0, "reset_at": 1234567920, "retry_after": 28 &#125;

GET /ratelimit/rules
  Response: &#123; "rules": [&#123; "pattern": "/api/search", "limit": 100, "window": 60, "tier": "free" &#125;, ...] &#125;

PUT /ratelimit/rules/:rule_id
  Body: &#123; "limit": 200, "window": 60 &#125;
  Response: &#123; "rule_id": "r_123", "updated": true &#125;

Comparison table for Design Rate Limiter contrasting approaches, tradeoffs, and when to use each — Comparing key aspects of Design Rate Limiter

Redis implementation (sliding window counter):

text

-- Lua script for atomic sliding window rate limiting
local key = KEYS[1]
local window = tonumber(ARGV[1])
local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
local count = redis.call('ZCARD', key)
if count < limit then
  redis.call('ZADD', key, now, now .. math.random())
  redis.call('EXPIRE', key, window)
  return {1, limit - count - 1}
end
return {0, 0}

Specific Data Model

Rate Limit Rules (PostgreSQL)

Column	Type	Notes
rule_id	UUID	Primary key
endpoint_pattern	VARCHAR	e.g., "/api/v1/search", "/api/v1/*"
tier	ENUM	free, basic, pro, enterprise
limit_count	INT	Max requests allowed
window_seconds	INT	Time window
algorithm	ENUM	token_bucket, sliding_window, fixed_window

Rate Limit Counters (Redis)

Token Bucket: Key = "tb:API_KEY:ENDPOINT", Value = (tokens: N, last_refill: timestamp)
Sliding Window: Key = "sw:API_KEY:ENDPOINT", Value = Sorted Set of request timestamps
Fixed Window: Key = "fw:API_KEY:ENDPOINT:WINDOW_START", Value = counter (integer)

Rate Limit Metrics (Time-series DB / InfluxDB)

Field	Type	Notes
api_key	TAG	For per-key analysis
endpoint	TAG	For per-endpoint analysis
allowed_count	FIELD	Requests that passed
rejected_count	FIELD	Requests that were rate limited
timestamp	TIME	1-minute granularity

Component diagram for Design Rate Limiter showing each building block and its responsibility — Key components of Design Rate Limiter

Specific Back-of-the-Envelope Numbers

Traffic:

API receives 100,000 requests/second at peak
Each request requires one rate limit check = 100K Redis operations/second
Redis single instance handles ~100K operations/second — need sharding or clustering for higher throughput

Memory (Redis):

Token bucket: 1 key per (API key, endpoint) combination. 1M API keys * 10 endpoints = 10M keys * 100 bytes = 1 GB
Sliding window: stores each request timestamp. 1M keys * 100 timestamps * 16 bytes = 1.6 GB. More memory-intensive but more precise.
Fixed window: 1 key per (API key, endpoint, window). Same as token bucket memory but keys auto-expire.

Latency:

Redis lookup: 0.1-0.5ms (local network)
Lua script execution: 0.05ms
Total rate limit overhead: under 1ms per request (target: under 0.5ms)

Availability:

Rate limiter must be as available as the API itself
If Redis is down: fail OPEN (allow all requests) for availability-focused systems, fail CLOSED (reject all) for security-focused systems
Use Redis Sentinel or Redis Cluster for high availability

Sources

Design Rate Limiter — Reference
Source: System-Design-Overview

Reference

Reference Solutionvideo

Problem Statement

Step 1: Clarifying Questions

Step 2: Functional Requirements

Step 3: Non-Functional Requirements

Step 4: Back-of-the-Envelope Estimation

Step 5: API Design

Step 6: Data Model

Step 7: High-Level Architecture

Step 8: Detailed Component Design

Write Path

Read Path

Step 9: Scaling Strategy

Step 10: Reliability and Fault Tolerance

Step 11: Monitoring and Observability

Key Tradeoffs

How to Present This in an Interview

Practical Implementation for .NET Developers

Deep-Dive: Clarifying Questions for Rate Limiter Design

Specific Functional Requirements

Specific Internal API (Rate Limiter as a Service)

Specific Data Model

Specific Back-of-the-Envelope Numbers

Sources

Reference

Related Topics