Intermediate | 11 min read | Updated 2026-06-08

Cache Stampede

A cache stampede (thundering herd) occurs when many requests simultaneously miss the cache and hit the database, causing a load spike that can bring down the database.

A cache stampede (also called a thundering herd) happens when a popular cache entry expires and every concurrent request for it misses the cache and falls through to the database at once. Suppose the key receives 10,000 requests per second and the database is sized for about 100 QPS because the cache normally absorbs the other 9,900. When the entry expires, the database suddenly receives all 10,000 queries for the same key. If it slows down or crashes, the cache is never repopulated and the stampede continues. Failures of this kind have been reported at large sites such as Facebook, Reddit, and Twitter. The three main defenses are locking (only one request fetches, others wait), probabilistic early expiration, and stale-while-revalidate.

Aspect	Details
What it is	A surge of simultaneous cache misses for the same key that overwhelms the database when a popular cache entry expires
When it matters	Any system with high-traffic cached data and TTL-based expiration: product pages, user profiles, API responses
When it matters less	Low-traffic systems where cache misses are rare, or systems that can tolerate brief latency spikes on cache expiration
Real-world example	Facebook's memcache layer handles billions of operations; its engineers described lease-based stampede prevention to protect hot keys
Interview tip	Name the three defenses: locking, probabilistic early refresh, and stale-while-revalidate
Common mistake	Setting the same TTL on all cache entries: entries written together expire together, so many keys stampede at the same TTL boundary
Key tradeoff	Freshness vs availability: serving stale cached data prevents stampedes, but users may briefly see outdated information

Why This Matters

Cache stampedes are a common cause of database overload in high-traffic systems. A single popular key expiring can trigger enough concurrent database queries to take down the backend. A site's front page is the classic case, and Reddit's is the example usually cited: the front-page cache expires, thousands of requests hit the database, the database falls over, and users see errors. Understanding stampede prevention is essential for any caching design in a system design interview.

Figure: System architecture for Cache Stampede (how services, databases, and caches connect)

The Building Blocks

Mutex/Lock Pattern: When a cache miss occurs, the first request acquires a lock (Redis SETNX, or SET with the NX and EX options so the lock also expires) and fetches from the database. Subsequent requests see the lock and wait or return stale data instead of hitting the database.
Probabilistic Early Expiration: Each request has a small probability of refreshing the cache before it expires. As the expiry time approaches, the probability increases, so in most cases a single request refreshes the entry before the rest would miss.
Stale-While-Revalidate: Serve the expired cache entry while asynchronously refreshing it in the background. Users get fast (possibly stale) responses, and the cache is eventually updated without a stampede.
TTL Jitter: Add random variation to cache TTLs. Instead of all entries expiring at exactly 300s, expire at 270-330s. This prevents synchronized mass expiration that causes stampedes across many keys.
External Refresh: A background job proactively refreshes popular cache entries before they expire. The cache never goes empty, so stampedes cannot occur. Requires knowing which keys are hot.
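Of the building blocks above, TTL jitter is the simplest to show in code. The following is a minimal sketch, assuming a 300-second base TTL and a 10% spread to reproduce the 270-330 second window; the function name and constants are illustrative, not taken from any library.

```python
import random

BASE_TTL_SECONDS = 300.0  # nominal TTL used in the text
JITTER_FRACTION = 0.10    # +/-10% yields the 270-330 s window


def jittered_ttl(base=BASE_TTL_SECONDS, jitter=JITTER_FRACTION):
    """Return the base TTL scaled by a random factor in [1 - jitter, 1 + jitter]."""
    return base * random.uniform(1.0 - jitter, 1.0 + jitter)


if __name__ == "__main__":
    # Entries written at the same moment now expire at different times.
    print([round(jittered_ttl(), 1) for _ in range(5)])
```

The value returned here would be passed as the expiry whenever an entry is written, so keys populated in the same burst no longer expire in the same second.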

Under the Hood

The lock-based approach works like this: Request A arrives and finds the cache empty. It sets a Redis lock with an expiry (for example SET lock:product:123 <token> NX EX 5, since plain SETNX does not take a TTL) and queries the database. Requests B through Z arrive, find the lock active, and either wait (polling with backoff) or return a stale value if one has been kept. When Request A completes, it writes to the cache and releases the lock, ideally only if the lock still holds its own token, so that it cannot delete a lock that expired and was taken over by another request. Requests B-Z then read from the fresh cache.
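The flow above can be sketched without a Redis server. This is an illustration under stated assumptions: `InMemoryLocks` is a hypothetical in-process stand-in for a Redis lock with an expiry and an owner token, and `cache`, `get_with_lock`, and the timing values are made up for the example.

```python
import threading
import time
import uuid


class InMemoryLocks:
    """Hypothetical stand-in for a Redis lock with a TTL and an owner token."""

    def __init__(self):
        self._guard = threading.Lock()
        self._held = {}  # lock name -> (owner token, expiry timestamp)

    def acquire(self, name, token, ttl):
        now = time.monotonic()
        with self._guard:
            current = self._held.get(name)
            if current is not None and current[1] > now:
                return False  # someone else holds an unexpired lock
            self._held[name] = (token, now + ttl)
            return True

    def release(self, name, token):
        with self._guard:
            current = self._held.get(name)
            if current is not None and current[0] == token:
                del self._held[name]  # only the owner may release


cache = {}  # assumes cached values are never None
locks = InMemoryLocks()


def get_with_lock(key, load_from_db, lock_ttl=5.0, max_wait=2.0):
    """Return the cached value, letting only one caller per key hit the database."""
    deadline = time.monotonic() + max_wait
    delay = 0.005
    while True:
        value = cache.get(key)
        if value is not None:
            return value
        token = uuid.uuid4().hex
        if locks.acquire("lock:" + key, token, lock_ttl):
            try:
                value = cache.get(key)  # re-check: another caller may have just filled it
                if value is None:
                    value = load_from_db(key)
                    cache[key] = value
                return value
            finally:
                locks.release("lock:" + key, token)
        if time.monotonic() >= deadline:
            raise TimeoutError("gave up waiting for " + key)
        time.sleep(delay)  # back off, then look in the cache again
        delay = min(delay * 2, 0.1)
```

The re-check after acquiring the lock matters: without it, a waiter that saw a miss just before the first request finished would acquire the freed lock and query the database a second time.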

Figure: How Cache Stampede works step by step (how a request is processed from start to finish)

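Probabilistic early expiration (also called XFetch) lets each reader decide independently whether to refresh early. A request recomputes the value when now - delta * beta * ln(rand()) >= expiry, where delta is how long the last recomputation took, beta is a tuning factor (1.0 by default; larger values refresh earlier), and rand() is a uniform random number between 0 and 1. Because -ln(rand()) is usually small and only occasionally large, the chance of refreshing is negligible while the entry is young and rises sharply as expiry approaches. With 1,000 requests per second, a 300-second TTL, and a recomputation time of about one second, the first early refresh typically fires several seconds before expiration. The key insight: one request (or at most a handful) refreshes, not all 1,000.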
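A minimal sketch of probabilistic early refresh follows, using the commonly published XFetch rule (refresh when now - delta * beta * ln(rand) >= expiry). The function names, the tuple layout stored in the cache, and the injectable `now` and `rng` parameters are assumptions made for illustration and testing.

```python
import math
import random
import time


def should_refresh_early(expiry, delta, beta=1.0, now=None, rng=random.random):
    """Return True if this request should recompute the value before it expires.

    delta is the duration of the last recomputation in seconds; beta > 1
    refreshes earlier, beta < 1 later.
    """
    now = time.time() if now is None else now
    u = 1.0 - rng()  # uniform in (0, 1], so log(u) is always defined
    return now - delta * beta * math.log(u) >= expiry


def xfetch_get(cache, key, recompute, ttl=300.0, beta=1.0, now=None):
    """Read through the cache, occasionally refreshing shortly before expiry."""
    now = time.time() if now is None else now
    entry = cache.get(key)
    if entry is not None:
        value, delta, expiry = entry
        if not should_refresh_early(expiry, delta, beta, now):
            return value
    started = time.perf_counter()
    value = recompute(key)
    delta = time.perf_counter() - started  # remember how expensive the refresh was
    cache[key] = (value, delta, now + ttl)
    return value
```

More than one request can still pass the test during the window in which the first refresh is running, so this rule is often paired with a lock when the recomputation is expensive.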

Facebook's lease mechanism moves the lock into the cache server itself: when a cache miss occurs, memcached issues a "lease" (a token) to the first requester, who must present it when writing the value back. Other requesters are told to wait briefly and retry. By default the server hands out at most one token per key every 10 seconds, so if the lease holder never populates the cache, a later requester gets a new lease once that interval has passed. This prevents both stampedes and stuck locks.
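The lease idea can be modelled with a toy, single-threaded cache server. This is a sketch of the concept only, not memcached's protocol or API: the class, method names, and return tuples are invented for illustration, and the 10-second interval is the figure quoted above.

```python
import itertools
import time


class LeaseCache:
    """Toy model of a cache server that hands out leases on a miss."""

    def __init__(self, lease_interval=10.0, clock=time.monotonic):
        self._data = {}
        self._leases = {}  # key -> (token, issued_at)
        self._tokens = itertools.count(1)
        self._interval = lease_interval
        self._clock = clock

    def get(self, key):
        """Return ("hit", value), ("lease", token) or ("wait", None)."""
        if key in self._data:
            return ("hit", self._data[key])
        now = self._clock()
        lease = self._leases.get(key)
        if lease is not None and now - lease[1] < self._interval:
            return ("wait", None)  # someone else is already fetching
        token = next(self._tokens)
        self._leases[key] = (token, now)
        return ("lease", token)

    def set_with_lease(self, key, value, token):
        """Store the value only if the caller still holds the current lease."""
        lease = self._leases.get(key)
        if lease is None or lease[0] != token:
            return False  # lease was replaced or invalidated: reject the write
        self._data[key] = value
        del self._leases[key]
        return True

    def delete(self, key):
        """Invalidate the value and any outstanding lease for the key."""
        self._data.pop(key, None)
        self._leases.pop(key, None)
```

A client that receives "wait" would sleep for a few milliseconds and call `get` again; by then the lease holder has usually written the value. Rejecting writes that carry an outdated token is what keeps a slow or crashed holder from blocking the key or overwriting newer data.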

How Companies Actually Do This

Facebook described the lease mechanism for its memcache infrastructure in the paper "Scaling Memcache at Facebook" (NSDI 2013). TAO, its cache for the social graph (profile data, friend lists, and news feed entries), is designed to absorb the same thundering herds inside the caching layer.

Figure: Comparing key aspects of Cache Stampede (approaches, tradeoffs, and when to use each)

Reddit is often cited for front-page stampedes that occurred when the homepage cache expired. The remedy attributed to Reddit is stale-while-revalidate: users always see cached content while the background refresh occurs.
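Stale-while-revalidate can be sketched as an in-process cache that returns the old value immediately and refreshes in a background thread. This is a minimal illustration, assuming a hypothetical `SwrCache` class and a caller-supplied `loader`; it is not how any particular site implements it.

```python
import threading
import time


class SwrCache:
    """Serve stale entries immediately and refresh them in the background."""

    def __init__(self, loader, fresh_for=300.0, clock=time.monotonic):
        self._loader = loader
        self._fresh_for = fresh_for
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}        # key -> (value, fresh_until)
        self._refreshing = set()  # keys with a background refresh in flight

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, fresh_until = entry
                stale = self._clock() >= fresh_until
                if stale and key not in self._refreshing:
                    # Exactly one background refresh per key at a time.
                    self._refreshing.add(key)
                    threading.Thread(
                        target=self._refresh, args=(key,), daemon=True
                    ).start()
                return value  # possibly stale, but returned without waiting
        # Cold miss: nothing to serve, so load synchronously.
        return self._refresh(key)

    def _refresh(self, key):
        try:
            value = self._loader(key)  # runs outside the lock
            with self._lock:
                self._entries[key] = (value, self._clock() + self._fresh_for)
            return value
        finally:
            with self._lock:
                self._refreshing.discard(key)
```

This sketch does not protect the cold-miss path: concurrent first requests for a key each call the loader, so a cold start would still need a lock or cache warming.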

Cloudflare uses request coalescing at the CDN edge. When multiple users request the same uncached resource, only one request goes to the origin. Others wait for that single response and are served from the edge cache.
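Request coalescing can be illustrated with a small in-process helper in which concurrent callers for the same key share one in-flight fetch. This is a sketch of the general idea, not Cloudflare's implementation; `Coalescer` and `fetch_origin` are hypothetical names.

```python
import threading
from concurrent.futures import Future


class Coalescer:
    """Collapse concurrent fetches for the same key into one origin call."""

    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}  # key -> Future shared by all concurrent callers

    def fetch(self, key, fetch_origin):
        with self._lock:
            future = self._inflight.get(key)
            leader = future is None
            if leader:
                future = Future()
                self._inflight[key] = future
        if leader:
            try:
                future.set_result(fetch_origin(key))
            except Exception as exc:
                future.set_exception(exc)  # followers see the same failure
            finally:
                with self._lock:
                    del self._inflight[key]
        # The leader returns its own result; followers block until it is set.
        return future.result()
```

Unlike the lock pattern, followers never re-read the cache: they receive the leader's response directly. A caller that arrives after the fetch completes starts a new one, so this is normally combined with a cache in front.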

Common Pitfalls

Using distributed locks without TTL — if the lock holder crashes, the lock is never released and no request can repopulate the cache
Setting identical TTLs across all cache entries — everything expires at the same time, causing a stampede for every key, not just hot ones
Not warming the cache after a cold start or deployment — the first traffic wave hits an empty cache and every request goes to the database
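For the cold-start pitfall, one option is to preload known hot keys before taking traffic, with a cap on how many database loads run at once. The sketch below assumes the hot-key list is already known; `warm_cache` and its parameters are illustrative names.

```python
from concurrent.futures import ThreadPoolExecutor


def warm_cache(cache, hot_keys, load_from_db, max_workers=4):
    """Preload hot keys with bounded concurrency; return the keys that failed."""
    failed = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # max_workers caps concurrent database loads during warm-up.
        futures = [(key, pool.submit(load_from_db, key)) for key in hot_keys]
        for key, future in futures:
            try:
                cache[key] = future.result()
            except Exception:
                failed.append(key)  # leave the key cold; the caller can retry
    return failed
```

Bounding the worker count keeps the warm-up itself from becoming the load spike it is meant to prevent.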

Figure: Data flow through Cache Stampede (how requests and responses move through the system)

Interview Questions Worth Practicing

How would you prevent a cache stampede for a product page viewed 10,000 times per second?
What are the tradeoffs between lock-based and probabilistic stampede prevention?
How does Facebook's lease mechanism work, and why is it better than a simple mutex?

The Tradeoffs

Freshness vs Availability: Stale-while-revalidate serves possibly outdated data to prevent database overload. If freshness is critical (pricing, inventory), you need lock-based refresh instead.
Simplicity vs Robustness: Lock-based prevention is simple but introduces a single point of contention. Probabilistic approaches are more resilient but harder to tune and reason about.
Proactive vs Reactive: Background refresh prevents stampedes entirely but requires knowing hot keys and wastes resources refreshing rarely-accessed data. Reactive approaches only refresh on demand.
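The proactive side of the last tradeoff can be sketched as a background thread that rewrites a fixed set of hot keys on a schedule. This is a minimal illustration: the class name is hypothetical, and the 270-second default assumes a 300-second TTL refreshed 30 seconds early.

```python
import threading


class BackgroundRefresher:
    """Periodically reload a fixed list of hot keys so they never expire."""

    def __init__(self, cache, hot_keys, load_from_db, interval=270.0):
        self._cache = cache
        self._hot_keys = list(hot_keys)
        self._load = load_from_db
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def refresh_once(self):
        for key in self._hot_keys:
            try:
                self._cache[key] = self._load(key)
            except Exception:
                # Keep serving the previous value; a real job would log and alert.
                pass

    def _run(self):
        while not self._stop.is_set():
            self.refresh_once()
            self._stop.wait(self._interval)  # sleeps, but wakes early on stop()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Every key in the list is reloaded on every cycle whether or not anyone reads it, which is the wasted work the tradeoff refers to.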

Figure: Key components of Cache Stampede (each building block and its responsibility)

How to Explain This in an Interview

Here is how I would explain Cache Stampede in a system design interview:

A cache stampede happens when a popular cache key expires and the concurrent requests for it all miss the cache and hit the database at once. I would prevent this with three techniques depending on the scenario. First, a Redis-based lock (SET with NX and a TTL): the first request acquires the lock and fetches from the database, while the others wait or return stale data. Second, TTL jitter: instead of all keys expiring at exactly 5 minutes, I add random variation (270-330 seconds) to prevent synchronized expiration. Third, for the most critical keys (homepage, popular products), I use background refresh: a scheduled job repopulates the cache 30 seconds before expiration so it never goes empty. The key tradeoff is freshness vs availability: where brief staleness is acceptable, serving stale data while a single request refreshes (stale-while-revalidate) is the safest default.

Figure: Interview tips for Cache Stampede (key points to mention and mistakes to avoid)

The Real-World Incident That Made This Famous

Cache stampedes became a widely discussed topic after production incidents at large sites. When systems serve millions of users, even a small oversight in expiration or refresh logic can cascade into failures that cost revenue and erode user trust. Large operators such as Netflix, Google, Amazon, and Meta invest in stampede protection for exactly this reason.

The key lesson from these incidents: stampede prevention is not just a theoretical concept. It is a practical skill that separates engineers who build resilient systems from those who build fragile ones. Post-incident reviews of cache-related outages often trace back to an expiration or refresh decision that was implemented incorrectly or overlooked during the initial architecture review.

Figure: When to use Cache Stampede (decision guide for when to choose it and when alternative approaches are better)

How Senior Engineers Think About This

Senior engineers approach cache stampedes differently from textbook definitions. Instead of memorizing rules, they build mental models. They ask: "What problem does stampede prevention solve? When does it fail? What are the alternatives?" This problem-first thinking leads to better design decisions because every system has unique constraints.

When evaluating stampede prevention in a system design context, experienced engineers consider the failure modes first. What happens when the cache or the lock service goes down? How does the system degrade? Is the degradation graceful or catastrophic? These questions reveal more about your understanding than any textbook definition.

The key difference between junior and senior engineers when it comes to cache stampedes: juniors focus on the happy path, while seniors design for what happens when things go wrong. They consider operational cost, team expertise, monitoring requirements, and how the decision will look six months from now when traffic has grown 10x.

Figure: Advantages and disadvantages of Cache Stampede (tradeoff analysis with real-world considerations)

Common Interview Mistakes

Mistake 1: Giving a textbook definition without context. Interviewers want to see you connect cache stampedes to real systems and real problems. Instead of reciting definitions, explain where a stampede could occur in the system you are designing and which defense you would apply.

Mistake 2: Not discussing trade-offs. Every stampede defense has trade-offs. Discuss what you gain and what you give up. Acknowledge the downsides and explain why the benefits outweigh them for your specific use case.

Mistake 3: Overcomplicating the solution. Start with the simplest stampede defense that meets the requirements, then add complexity only when justified. Many candidates jump to complex implementations when a simpler solution would work perfectly.

Figure: Real-world examples of Cache Stampede (production deployments at companies like Netflix, Google, and Amazon)

Production Checklist

Define clear metrics for measuring the effectiveness of your stampede prevention, such as cache hit ratio and concurrent database queries per key
Set up monitoring and alerting that specifically tracks stampede-related failures
Document your stampede-prevention design decisions in Architecture Decision Records (ADRs)
Test stampede scenarios (hot-key expiry, cold start, lock holder crash) in staging before production deployment
Review and update your stampede-prevention implementation quarterly as system requirements evolve
Train new team members on the specific stampede-prevention patterns used in your system
Establish runbooks for common stampede-related incidents and recovery procedures

Practical Implementation for .NET Developers

In .NET, use SemaphoreSlim for in-process locking or Redis SETNX for distributed locking to prevent stampedes. Libraries like FusionCache (ZiggyCreatures.FusionCache) provide built-in stampede prevention with fail-safe stale data, adaptive caching, and distributed locking. For probabilistic refresh, implement an IDistributedCache wrapper that checks TTL remaining and triggers background refresh via IHostedService. Use IMemoryCache with CacheItemPriority and size limits for in-process caching.

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

Log.Information("Processing {Operation} for {ResourceId}", operation, resourceId);

This gives you searchable, structured logs in Azure Monitor or Seq.