Cache Warming
Pre-populating caches before traffic arrives to avoid cold-start latency spikes: warming strategies, real-world implementations, and when warming matters.
The Cold Start Problem
Every cache starts empty. When you deploy a new cache cluster, restart a node, or scale out to handle increased traffic, the new cache instance has zero data in it. Every request becomes a cache miss, which means every request hits the database directly. For a system that normally serves 95% of reads from cache, a cold cache turns a smooth 5ms response into a 200ms database query. Because the database goes from serving 5% of reads to serving all of them, its read load also rises 20x.
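The 20x figure follows directly from the miss ratios. A minimal sketch of the arithmetic, where the function name `db_load_multiplier` is purely illustrative:

```python
def db_load_multiplier(warm_hit_ratio: float, cold_hit_ratio: float = 0.0) -> float:
    """Return how many times more reads reach the database when the cache is cold."""
    if not 0.0 <= cold_hit_ratio <= warm_hit_ratio < 1.0:
        raise ValueError("expected 0 <= cold_hit_ratio <= warm_hit_ratio < 1")
    warm_miss = 1.0 - warm_hit_ratio  # share of reads that reach the database when warm
    cold_miss = 1.0 - cold_hit_ratio  # share of reads that reach the database when cold
    return cold_miss / warm_miss


print(round(db_load_multiplier(0.95), 2))  # 20.0
```

The same calculation shows why higher hit ratios make cold starts more dangerous: at a 99% hit ratio the multiplier is about 100.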
This is not a theoretical problem. In September 2010, Facebook suffered a cascading failure after a configuration change caused clients to treat a cached value as invalid and query the database cluster directly. The flood of queries overwhelmed the database servers, which caused errors, which caused clients to delete more cache keys and retry, which generated even more load. The incident lasted about two and a half hours and affected hundreds of millions of users.
Cache warming is the practice of pre-populating a cache with data before it receives production traffic, so that the first real user request finds the data already in memory.
When Cache Warming Matters
Not every system needs cache warming. If your cache receives only a few hundred requests per second, even if it holds millions of keys, the database can usually absorb the temporary spike in direct reads while the cache fills on its own.
Cache warming becomes critical when:
- Your cache hit ratio is above 90% and the database is sized only for the cache-miss traffic. If 95% of reads normally hit the cache, a cold cache means 20x the normal database load.
- You frequently deploy or restart cache nodes. In a Kubernetes environment where pods are ephemeral, cache instances may restart multiple times per day. Each restart is a cold start.
- You have predictable traffic spikes. Black Friday, a product launch, a sports event — you know traffic will surge at a specific time and you need the cache ready.
- You are migrating between cache clusters. Switching from one Redis cluster to another without warming means an instant cold start for the new cluster.
Warming Strategies
Preloading from Database
The most straightforward approach: before the cache receives traffic, run a batch job that reads frequently accessed data from the database and writes it into the cache.
At Amazon, product catalog pages for the top 10,000 products are pre-loaded into the cache before every deployment. The warming job queries the database for these products and populates the cache in bulk. By the time the load balancer routes traffic to the new instance, the cache already contains the hottest data.
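A minimal sketch of such a warming job, assuming the cache behaves like a dictionary and the database exposes a bulk read. The names `preload` and `fetch_many` and the batch size are illustrative assumptions, not any particular library's API:

```python
from typing import Callable, Dict, Iterable, List, MutableMapping


def preload(
    cache: MutableMapping[str, str],
    hot_keys: Iterable[str],
    fetch_many: Callable[[List[str]], Dict[str, str]],
    batch_size: int = 500,
) -> int:
    """Copy the given keys from the database into the cache, one batch at a time."""
    keys = list(hot_keys)
    loaded = 0
    for start in range(0, len(keys), batch_size):
        batch = keys[start:start + batch_size]
        rows = fetch_many(batch)  # one database round trip per batch (hypothetical bulk read)
        cache.update(rows)        # keys missing from the database are simply skipped
        loaded += len(rows)
    return loaded
```

Batching keeps the number of database round trips proportional to the number of batches rather than the number of keys.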
The challenge is knowing which keys to preload. You need a way to identify the hot set — the subset of keys that receive the most traffic. Common approaches:
- Access logs analysis. Parse recent access logs to find the most frequently requested keys. This works well for content-heavy systems like news sites or product catalogs.
- Popularity tracking. Maintain a separate data structure (like a Redis Sorted Set) that tracks access frequency. Use it to identify the top-N keys.
- Full preload. For small datasets (under a few million keys), load everything. This is the simplest approach and avoids the complexity of hot-set identification.
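As a rough illustration of the access-log approach, the sketch below counts key frequency and returns the top N. It assumes a made-up log format in which the cache key is the last whitespace-separated field on each line; a real log would need its own parser.

```python
from collections import Counter
from typing import Iterable, List


def top_keys(access_log_lines: Iterable[str], n: int) -> List[str]:
    """Return the n most frequently requested keys, most popular first."""
    counts = Counter(
        line.split()[-1]          # assumed format: the key is the last field
        for line in access_log_lines
        if line.strip()           # ignore blank lines
    )
    return [key for key, _ in counts.most_common(n)]
```

The resulting list can be fed straight into a preload job as its hot set.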
Shadow Traffic Warming
Route a copy of production traffic to the new cache instance without serving responses from it. The shadow instance processes the requests (populating its cache) but the responses come from the existing cache or database. Once the shadow cache reaches a target hit ratio, swap it into production.
This approach is used by Facebook's TAO (The Associations and Objects) cache. When a new TAO cache server is added to the cluster, it receives shadow reads from production traffic. The server processes these reads, populating its local cache. After the shadow period (typically 10-30 minutes depending on the dataset), the server is added to the active pool.
Shadow warming produces the most realistic cache state because it uses actual traffic patterns. But it requires infrastructure to duplicate traffic, and it temporarily consumes extra resources.
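The shadow flow can be sketched as follows. This is an illustration under simplifying assumptions (an in-process dictionary as the shadow cache, a sliding window of recent lookups to estimate hit ratio), not a description of how TAO implements it. The class name `ShadowWarmer` and its parameters are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict


class ShadowWarmer:
    """Replays mirrored reads against a shadow cache and reports when it is warm."""

    def __init__(self, load: Callable[[str], str],
                 target_hit_ratio: float = 0.9, window: int = 1000) -> None:
        self.shadow_cache: Dict[str, str] = {}
        self._load = load                                 # where shadow misses are filled from
        self._target = target_hit_ratio
        self._recent: Deque[bool] = deque(maxlen=window)  # True = hit, False = miss

    def mirror(self, key: str) -> None:
        """Process a copy of one production read; the result is never served."""
        hit = key in self.shadow_cache
        self._recent.append(hit)
        if not hit:
            self.shadow_cache[key] = self._load(key)

    def hit_ratio(self) -> float:
        return sum(self._recent) / len(self._recent) if self._recent else 0.0

    def ready(self) -> bool:
        """True once a full window of mirrored reads meets the target hit ratio."""
        return len(self._recent) == self._recent.maxlen and self.hit_ratio() >= self._target
```

Requiring a full window before reporting ready avoids declaring the cache warm after a handful of lucky hits.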
Lazy Warming with Request Coalescing
Instead of pre-populating, let the cache warm naturally — but protect the database during the warming period. When multiple requests arrive for the same uncached key simultaneously, only one request goes to the database. The others wait for the first request to complete and then share the result.
This technique, called request coalescing or request collapsing, dramatically reduces database load during cold starts. Discord uses this pattern in their data services layer: when a popular Discord server channel is accessed after a cache restart, hundreds of users may request the same channel data simultaneously. Without coalescing, that is hundreds of identical database queries. With coalescing, it is one.
Nginx supports request coalescing natively with the proxy_cache_lock directive. If multiple requests arrive for the same cache key while the cache is being populated, only the first request is forwarded to the backend. The rest wait (up to a configurable timeout) for the cache entry to be filled.
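A minimal thread-based sketch of request coalescing is shown below. It illustrates the idea only and is not how Discord or Nginx implement it. The class name `SingleFlight` and the `do` method are assumptions made for this example.

```python
import threading
from typing import Any, Callable, Dict, Optional


class _Call:
    """State shared between the leader request and the requests waiting on it."""

    def __init__(self) -> None:
        self.done = threading.Event()
        self.result: Any = None
        self.error: Optional[BaseException] = None


class SingleFlight:
    """Ensures that concurrent loads of the same key run the loader only once."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._in_flight: Dict[str, _Call] = {}

    def do(self, key: str, loader: Callable[[], Any]) -> Any:
        with self._lock:
            call = self._in_flight.get(key)
            leader = call is None
            if leader:
                call = _Call()
                self._in_flight[key] = call
        if leader:
            try:
                call.result = loader()      # the only database query for this key
            except BaseException as exc:    # propagate the failure to waiters too
                call.error = exc
            finally:
                with self._lock:
                    del self._in_flight[key]
                call.done.set()
        else:
            call.done.wait()                # followers block until the leader finishes
        if call.error is not None:
            raise call.error
        return call.result
```

A caller would wrap its cache-miss path, for example `flight.do(key, lambda: query_database(key))`, where `query_database` stands in for the real loader. A production version would also want a wait timeout, similar in spirit to the configurable timeout Nginx offers.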
Warm-Through from a Peer Cache
In a distributed cache cluster, when one node goes cold, it can warm from a neighboring node that still has the data, rather than going to the database. This is faster (cache-to-cache reads are faster than database reads) and avoids putting additional load on the database.
Facebook's memcache deployment uses a form of this, described in the "Scaling Memcache at Facebook" paper as Cold Cluster Warmup: clients of a cold cluster fetch missing keys from a warm cluster rather than from the database. The trade-off is that it increases inter-cache traffic during the warming period.
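The lookup order can be sketched as local cache, then peers, then database. In this illustration the caches are plain dictionaries and `load_from_db` is a hypothetical loader; a real system would make network calls to peers and would need to guard against copying stale values.

```python
from typing import Callable, Dict, Sequence, Tuple


def get_with_peer_fallback(
    key: str,
    local: Dict[str, str],
    peers: Sequence[Dict[str, str]],
    load_from_db: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (value, source), filling the local cache from a peer when possible."""
    if key in local:
        return local[key], "local"
    for peer in peers:
        if key in peer:
            local[key] = peer[key]  # warm from a neighbor instead of the database
            return local[key], "peer"
    value = load_from_db(key)       # last resort: nobody in the cluster has the key
    local[key] = value
    return value, "database"
```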
Implementation Patterns
Gradual Traffic Shift
Instead of instantly sending 100% of traffic to a cold cache, gradually increase the traffic percentage:
- Start by sending 5% of traffic to the new cache
- Monitor hit ratio and database load
- Increase to 25%, 50%, 75%, 100% over the course of an hour
- If the hit ratio drops below a threshold, pause and let the cache accumulate more data
Load balancers like Envoy support weighted routing, making this pattern straightforward to implement.
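The ramp logic itself is small. The sketch below is independent of any particular load balancer. The 80% hit-ratio floor and the function names are assumptions chosen for illustration, and the step values mirror the list above.

```python
import random
from typing import Callable, Sequence

RAMP_STEPS = (5, 25, 50, 75, 100)  # traffic percentages from the list above


def next_weight(current: int, hit_ratio: float,
                min_hit_ratio: float = 0.8,
                steps: Sequence[int] = RAMP_STEPS) -> int:
    """Return the traffic percentage for the next interval."""
    if hit_ratio < min_hit_ratio:
        return current              # pause: let the cache accumulate more data
    for step in steps:
        if step > current:
            return step             # advance to the next step of the ramp
    return current                  # already at the final step


def route(weight_percent: int, rng: Callable[[], float] = random.random) -> str:
    """Pick a backend for one request according to the current weight."""
    return "new" if rng() * 100 < weight_percent else "old"
```

A controller would call `next_weight` on a timer, for example every few minutes, and push the result to the load balancer's weighted route.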
Two-Tier Warming
Use a fast warming mechanism for the hottest data and lazy warming for the long tail:
- Pre-load the top 1,000 keys from a precomputed hot-keys list (takes seconds)
- Route traffic through the cache with request coalescing (handles the next tier)
- Background warm the next 100,000 keys from a database dump (takes minutes)
This two-tier approach gives you immediate coverage for the highest-traffic keys while the rest of the cache fills in progressively.
Cache Warming as Part of Deployment
In a blue-green deployment, the green (new) environment is warmed before the traffic switch. The CI/CD pipeline includes a warming step:
- Deploy the green environment
- Run the cache warming job against the green cache
- Verify cache hit ratio meets the threshold (e.g., above 80%)
- Switch the load balancer to the green environment
- Drain the blue environment
This ensures users never experience cold-cache latency during deployments.
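The verification step can be approximated by probing the green cache with a sample of keys taken from recent production traffic. A minimal sketch, in which `cache_contains` is a hypothetical callable wrapping whatever existence check the cache client provides:

```python
from typing import Callable, Iterable


def sampled_hit_ratio(cache_contains: Callable[[str], bool],
                      sample_keys: Iterable[str]) -> float:
    """Fraction of sampled keys that are already present in the cache."""
    keys = list(sample_keys)
    if not keys:
        return 0.0
    hits = sum(1 for key in keys if cache_contains(key))
    return hits / len(keys)


def safe_to_switch(cache_contains: Callable[[str], bool],
                   sample_keys: Iterable[str],
                   threshold: float = 0.8) -> bool:
    """Gate for the pipeline: only switch traffic once the threshold is met."""
    return sampled_hit_ratio(cache_contains, sample_keys) >= threshold
```

The pipeline would fail or retry the warming step when `safe_to_switch` returns False rather than proceeding to the load balancer switch.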
Pitfalls
Warming the wrong data. If your access pattern changes (a viral post shifts traffic to new keys), pre-loading based on historical patterns warms data nobody will request. Always complement preloading with lazy warming.
Warming too aggressively. Loading 100 million keys into a cache takes time and network bandwidth. If the warming job runs at full speed, it can overload the database during the loading phase. Throttle the warming job to use only a fraction of available database capacity.
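One simple way to throttle is to cap the number of keys loaded per second. The sketch below spaces loads evenly. The function and parameter names are illustrative, and the clock and sleep functions are injectable only to make the behavior easy to test.

```python
import time
from typing import Callable, Iterable


def throttled_warm(
    keys: Iterable[str],
    load_and_store: Callable[[str], None],
    max_keys_per_second: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Warm keys one at a time without exceeding the given rate."""
    if max_keys_per_second <= 0:
        raise ValueError("max_keys_per_second must be positive")
    interval = 1.0 / max_keys_per_second
    next_allowed = clock()
    warmed = 0
    for key in keys:
        now = clock()
        if now < next_allowed:
            sleep(next_allowed - now)  # wait out the rest of the interval
            now = next_allowed
        load_and_store(key)            # read from the database, write to the cache
        warmed += 1
        next_allowed = now + interval
    return warmed
```

Choosing `max_keys_per_second` as a fraction of the database's measured read capacity leaves headroom for production queries.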
Stale warming data. If the warming source is a database snapshot taken an hour ago, the cache starts with stale data. For systems where freshness matters, combine preloading with a change data capture (CDC) stream that applies recent updates after the initial load.
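The snapshot-plus-CDC combination can be sketched as: load the snapshot, then replay only the change events that are newer than it. The `ChangeEvent` shape and the use of a sequence number to mark the snapshot position are assumptions for this example; real CDC streams expose their own position markers.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class ChangeEvent(NamedTuple):
    seq: int              # position in the change stream (hypothetical)
    key: str
    value: Optional[str]  # None represents a delete


def warm_from_snapshot_then_cdc(
    cache: Dict[str, str],
    snapshot: Dict[str, str],
    snapshot_seq: int,
    change_events: Iterable[ChangeEvent],
) -> int:
    """Load the snapshot, then apply changes made after it. Returns changes applied."""
    cache.update(snapshot)
    applied = 0
    for event in change_events:          # assumed to arrive in sequence order
        if event.seq <= snapshot_seq:
            continue                     # already reflected in the snapshot
        if event.value is None:
            cache.pop(event.key, None)   # row was deleted after the snapshot
        else:
            cache[event.key] = event.value
        applied += 1
    return applied
```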
Neglecting TTL alignment. If you warm a cache with entries that have a 5-minute TTL, the entire cache expires simultaneously 5 minutes later, creating another cold-start event. Jitter the TTL values during warming (e.g., 4-6 minutes randomly) to spread out expiration.
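A minimal sketch of TTL jitter, using a 20% spread so that a 5-minute base TTL lands between 4 and 6 minutes as in the example above. The function name and the default spread are illustrative.

```python
import random


def jittered_ttl(base_ttl_seconds: float, jitter_fraction: float = 0.2) -> float:
    """Return a TTL drawn uniformly from base * (1 - jitter) to base * (1 + jitter)."""
    if not 0.0 <= jitter_fraction < 1.0:
        raise ValueError("jitter_fraction must be in [0, 1)")
    low = base_ttl_seconds * (1.0 - jitter_fraction)
    high = base_ttl_seconds * (1.0 + jitter_fraction)
    return random.uniform(low, high)
```

The warming job would call `jittered_ttl(300)` once per key when writing entries, so expirations spread across a two-minute window instead of landing at the same instant.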
Cache warming is not glamorous infrastructure work, but it is the difference between a smooth deployment and a cascading failure during your busiest hour.
Real-World Production Example
When Cloudflare deploys updates to their edge servers across 300+ data centers worldwide, every deployment creates a cold cache scenario. Cloudflare's CDN caches billions of objects, and each server restart means that server has an empty cache. Without warming, the first visitors after a deployment would experience significantly higher latency as every request becomes a cache miss that must fetch from the origin server.
Cloudflare addresses this with a multi-layered warming strategy. Their tiered caching architecture means that even when an edge server's cache is cold, the request hits a regional "upper tier" cache before reaching the origin. This upper tier absorbs the majority of cache misses from cold edge servers. For the remaining misses, they use request coalescing to ensure that multiple requests for the same resource result in only one origin fetch.
The most interesting aspect of Cloudflare's approach is their use of stale-while-revalidate semantics. When a cached object's TTL expires, Cloudflare can serve the stale version to the requesting client while asynchronously fetching a fresh copy from the origin. This means that even cache "expiration" events do not cause latency spikes for users. The combination of tiered caching, request coalescing, and stale-while-revalidate means that origin servers experience smooth, predictable load regardless of edge cache state — which is the entire point of a CDN.
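Stale-while-revalidate can be illustrated with a small in-process cache that returns an expired entry immediately and refreshes it on a background thread. This is a sketch of the semantics only, not Cloudflare's implementation. The class name `SWRCache` and its parameters are assumptions.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class SWRCache:
    """Serves stale entries immediately while refreshing them in the background."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries: Dict[str, Tuple[str, float]] = {}  # key -> (value, expires_at)
        self._refreshing: Dict[str, threading.Thread] = {}

    def get(self, key: str) -> str:
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, expires_at = entry
                if self._clock() >= expires_at and key not in self._refreshing:
                    worker = threading.Thread(target=self._refresh, args=(key,), daemon=True)
                    self._refreshing[key] = worker  # at most one refresh per key
                    worker.start()
                return value                        # fresh or stale, never waits on origin
        value = self._fetch(key)                    # first request: nothing stale to serve
        self._store(key, value)
        return value

    def _refresh(self, key: str) -> None:
        try:
            self._store(key, self._fetch(key))
        finally:
            with self._lock:
                self._refreshing.pop(key, None)

    def _store(self, key: str, value: str) -> None:
        with self._lock:
            self._entries[key] = (value, self._clock() + self._ttl)
```

Only the very first request for a key pays the origin latency; every later request is answered from memory, even after the TTL has passed.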
Common Interview Mistakes
- Not distinguishing planned from unplanned cold starts: Deployments and scaling events are planned — you can warm before switching traffic. Crashes and failovers are unplanned — you need a strategy that works without preparation. Discuss both scenarios.
- Proposing to warm the entire dataset: If your cache holds 10TB of data, you cannot preload everything before serving traffic. Discuss how to identify the "hot set" (top 1-5% of keys by access frequency) and warm only that, relying on lazy loading for the long tail.
- Ignoring the warming job's impact on the database: A warming job that reads millions of keys from the database can itself degrade database performance. Discuss throttling the warming rate and scheduling warming during low-traffic periods.
- Not connecting warming to deployment strategy: Cache warming should be part of the deployment pipeline, not an afterthought. In blue-green deployments, the green environment should be warmed before the traffic switch. Candidates who mention this integration show operational experience.
Practical Implementation for .NET Developers
In a .NET application, a cache warming job is typically built from the following pieces:
ASP.NET Core setup: Create a service class that encapsulates the warming logic, register it with dependency injection, and inject it wherever it is needed, such as a hosted service that runs at startup or your controllers and minimal API endpoints. The built-in DI container handles lifecycle management.
Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.
Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.
Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.
Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:
```csharp
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);
```
This gives you searchable, structured logs in Azure Monitor or Seq.
The Real-World Incident That Made This Famous
Understanding cache warming became critical after multiple high-profile production incidents at major tech companies. When systems handle millions of users, even small misunderstandings about cache warming can lead to cascading failures that cost millions in lost revenue and erode user trust. Companies like Netflix, Google, Amazon, and Meta have all invested heavily in cache warming because they learned the hard way that ignoring it leads to outages.
The key lesson from these incidents: cache warming is not just a theoretical concept. It is a practical skill that separates engineers who build resilient systems from those who build fragile ones.
How Senior Engineers Think About This
Senior engineers approach cache warming differently from textbook definitions. Instead of memorizing rules, they build mental models. They ask: "What problem does cache warming solve? When does it fail? What are the alternatives?" This problem-first thinking leads to better design decisions because every system has unique constraints.
When evaluating cache warming in a system design context, experienced engineers consider the failure modes first. What happens when the cache goes down? How does the system degrade? Is the degradation graceful or catastrophic? These questions reveal more about your understanding than any textbook definition.
Common Interview Mistakes
Mistake 1: Giving a textbook definition without context. Interviewers want to see you connect cache warming to real systems and real problems.
Mistake 2: Not discussing trade-offs. Every design decision involving cache warming has trade-offs. Discuss what you gain and what you give up.
Mistake 3: Overcomplicating the solution. Start with the simplest approach to cache warming that meets the requirements, then add complexity only when justified.
Production Checklist
- Define clear metrics for measuring the effectiveness of your cache warming implementation, such as hit ratio and database load after a restart
- Set up monitoring and alerting that specifically tracks failures related to cache warming
- Document your cache warming design decisions in Architecture Decision Records (ADRs)
- Test cold-start failure scenarios in staging before production deployment
- Review and update your cache warming implementation quarterly as system requirements evolve
- Train new team members on the specific cache warming patterns used in your system