SDMastery
Intermediate · 10 min read · Updated 2026-06-03

Caching Strategies

Choosing the wrong caching strategy leads to stale data, cache misses, or unnecessary database load.

Figure: High-level overview of Caching Strategies

When You Need Caching Strategies

A caching strategy has to match your system's read/write patterns; a mismatch shows up as stale data, a low hit rate, or unnecessary database load. In interviews, you must justify which strategy fits and why.

What It Is

Caching strategies define how data flows between the cache and the database. The main patterns are: Cache-Aside (Lazy Loading), Read-Through, Write-Through, Write-Behind (Write-Back), and Write-Around. Each has different tradeoffs for consistency, latency, and complexity.

Figure: System architecture for Caching Strategies

How It Works

For a read-heavy social media feed: Use Cache-Aside. When a user loads their feed, check Redis. If cached, return immediately. If not, query the database, store in Redis with 5-minute TTL, return. When a new post is created, invalidate the cached feed.
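
The feed flow above can be sketched as follows. This is a minimal illustration, not a production client: `TTLCache` is a hypothetical in-memory stand-in for Redis, and `query_db` / `save_post` are assumed callables supplied by the application.

```python
import time

class TTLCache:
    """Tiny in-memory stand-in for Redis (hypothetical helper, not a Redis client)."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def delete(self, key):
        self._data.pop(key, None)


FEED_TTL_SECONDS = 300  # the 5-minute TTL from the example above


def load_feed(user_id, cache, query_db):
    """Cache-aside read: check the cache, fall back to the database on a miss."""
    key = f"feed:{user_id}"
    feed = cache.get(key)
    if feed is not None:
        return feed
    feed = query_db(user_id)  # assumed callable that runs the feed query
    cache.set(key, feed, FEED_TTL_SECONDS)
    return feed


def create_post(user_id, post, save_post, cache):
    """Write path: persist first, then invalidate the cached feed."""
    save_post(user_id, post)  # assumed callable that writes to the database
    cache.delete(f"feed:{user_id}")
```

Deleting the key on write, rather than updating it, keeps the write path simple: the next read repopulates the cache from the database.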

For a write-heavy IoT system: Use Write-Behind. Sensor data is written to Redis (fast), then a background process flushes to the database in batches (efficient). But if Redis crashes, unflushed data is lost.
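
A rough sketch of the write-behind idea, under simplifying assumptions: the buffer lives in process memory instead of Redis, `flush_batch` is a hypothetical callable that performs the bulk database insert, and flushing is triggered by batch size (a real system would also flush on a timer from a background worker).

```python
class WriteBehindBuffer:
    """Accepts writes immediately and pushes them to the database in batches."""

    def __init__(self, flush_batch, batch_size=100):
        self._flush_batch = flush_batch  # hypothetical bulk-insert callable
        self._batch_size = batch_size
        self._pending = []  # held only in memory: lost if the process crashes

    def write(self, reading):
        """Fast path: buffer the reading, flush once a full batch has built up."""
        self._pending.append(reading)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        """Send everything buffered so far; keep it buffered if the flush fails."""
        if not self._pending:
            return
        batch = list(self._pending)
        self._flush_batch(batch)  # may raise; pending data is then retained
        del self._pending[:len(batch)]

    def pending_count(self):
        return len(self._pending)
```

Whatever `pending_count()` reports at the moment of a crash is exactly the data that would be lost, which is the risk described above.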

The Decision Framework

  • Cache-Aside (Lazy Loading): Application checks cache first. On miss, reads from DB, writes to cache. Most common strategy. Simple, but first request always misses.
  • Read-Through: Cache itself fetches from DB on miss. Application only talks to cache. Simplifies application code.
  • Write-Through: Every write goes to cache AND database synchronously. Keeps the cache in step with the database for any data written through it. But adds write latency.
  • Write-Behind (Write-Back): Writes go to cache first, then asynchronously to database. Fast writes but risk data loss if cache crashes.
  • Write-Around: Writes go directly to database, bypassing cache. Avoids cache pollution for write-heavy data. But the first read after a write always misses.
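
To make the differences concrete, here is a small sketch of three of the patterns listed above. Plain dictionaries stand in for the cache and the database, and all names are illustrative. Dropping the cached copy in `write_around` is an assumption added here so that a later read does not return a stale value.

```python
class ReadThroughCache:
    """Cache that owns the loader, so callers never query the database directly."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # assumed callable: key -> value
        self._data = {}

    def get(self, key):
        if key not in self._data:
            self._data[key] = self._load(key)  # miss: the cache itself fetches
        return self._data[key]


def write_through(key, value, cache, db):
    """Write-Through: database and cache are updated in the same operation."""
    db[key] = value
    cache[key] = value


def write_around(key, value, cache, db):
    """Write-Around: only the database is written; the cache is bypassed."""
    db[key] = value
    cache.pop(key, None)  # assumption: drop any stale copy so the next read reloads
```

With `write_through` the next read is a hit; with `write_around` it is a miss, which is the tradeoff named in the list.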

Figure: How Caching Strategies works step by step

What the Industry Uses

Facebook uses Cache-Aside with Memcached — the application manages cache reads and writes explicitly.

Amazon DynamoDB Accelerator (DAX) is a Read-Through and Write-Through cache: the application queries DAX, which handles cache misses transparently and passes writes through to DynamoDB.

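MySQL InnoDB Buffer Pool behaves much like a Write-Behind cache: changes are applied to pages in memory and flushed to the data files asynchronously. Unlike a plain write-behind cache, InnoDB records changes in its redo log first, so committed changes can be recovered after a crash.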

Performance and Tradeoffs

Figure: Comparing key aspects of Caching Strategies

  • Cache-Aside: Simple, flexible. But application must manage cache explicitly. Cold start problem.
  • Write-Through: Cache always fresh. But slower writes.
  • Write-Behind: Fastest writes. But risk of data loss. Complex.
  • Read-Through: Simpler application code. But less control over caching logic.

Mistakes Engineers Make

  1. Not invalidating cache after writes — leads to stale data
  2. Using Write-Behind without understanding the data loss risk
  3. Not handling the thundering herd on cache miss

Practice These Interview Questions

  1. What are the main caching strategies?
  2. When would you use Write-Through vs Write-Behind?
  3. How do you handle cache invalidation in Cache-Aside?
  4. What is the risk of Write-Behind caching?

Figure: Data flow through Caching Strategies

Practical Implementation for .NET Developers

In a .NET application, you would typically implement this pattern using the following approach:

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Figure: Key components of Caching Strategies

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

```csharp
// Named placeholders become searchable properties on the log event.
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);
```

This gives you searchable, structured logs in Azure Monitor or Seq.

Further Reading

Figure: Interview tips for Caching Strategies

The Real-World Incident That Made This Famous

In 2013, Facebook published a paper at NSDI about their Memcached infrastructure that became legendary in the systems community. The paper, titled "Scaling Memcache at Facebook," describes a deployment that handles billions of cache requests per second and stores trillions of items across thousands of servers, and it revealed engineering challenges that no textbook had covered.

The most fascinating problem they encountered was the "thundering herd" or "cache stampede." When a popular cache key expired, hundreds of web servers would simultaneously try to regenerate it by querying the database. For a celebrity's profile page, this meant hundreds of identical complex database queries hitting at the exact same moment. The database would buckle under the load, causing a cascading failure.

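Facebook's solution was a lease mechanism: when a cache miss occurs, Memcached gives the requesting server a "lease" (a token) that authorizes it to set the value. Memcached limits how often it hands out a token for a given key (once every 10 seconds by default in the paper), and other servers asking for the same key in the meantime get a "wait and retry" response. Only the lease holder regenerates the cache entry. For a set of keys especially prone to thundering herds, the paper reports that leases cut the peak database query rate from about 17K/s to 1.3K/s.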
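
The lease idea can be sketched as a toy, single-process model. This is not Memcached's actual protocol or API: the class, method names, and return values are invented for illustration, and lease expiry and rate limiting are left out. The `delete` method is an extra that shows how invalidating a key can also cancel an outstanding lease, so a late write of an old value is rejected.

```python
import itertools

class LeaseCache:
    """Toy cache that hands out one regeneration lease per missing key."""

    def __init__(self):
        self._data = {}
        self._leases = {}  # key -> token of the outstanding lease
        self._tokens = itertools.count(1)

    def get(self, key):
        """Return ("hit", value), ("lease", token) or ("wait", None)."""
        if key in self._data:
            return ("hit", self._data[key])
        if key in self._leases:
            return ("wait", None)  # another caller is already regenerating
        token = next(self._tokens)
        self._leases[key] = token
        return ("lease", token)

    def set(self, key, value, token):
        """Store the value only if the caller still holds the current lease."""
        if self._leases.get(key) != token:
            return False
        del self._leases[key]
        self._data[key] = value
        return True

    def delete(self, key):
        """Invalidate the value and any outstanding lease for the key."""
        self._data.pop(key, None)
        self._leases.pop(key, None)
```

A caller that receives `"wait"` would sleep briefly and call `get` again; by then the lease holder has usually filled the entry, so the database sees one query instead of hundreds.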

Figure: When to use Caching Strategies

Another critical lesson from Facebook's deployment: network bandwidth between web servers and cache servers became the bottleneck before CPU or memory did. They solved this by co-locating cache servers with web servers in the same rack and using UDP for cache gets (TCP overhead was too high for small, frequent requests). This architecture decision is counterintuitive because UDP is unreliable, but a dropped or out-of-order response to a cache read can simply be treated as a miss that falls back to the database, so the tradeoff was worth it.

How Senior Engineers Think About This

The first mental model: caching is not a feature, it is a performance optimization with a consistency tax. Every cache introduces a window where stale data can be served. The question is not "should I cache?" but "how much staleness can this use case tolerate?"

Senior engineers categorize caching strategies by the write pattern. Cache-aside (lazy loading) is the most common: read from cache, on miss read from DB and populate cache. Simple and safe, but every cache miss is slow. Write-through writes to cache and DB simultaneously: always fresh, but writes are slower and you cache data that might never be read. Write-behind (write-back) writes to cache immediately and asynchronously flushes to DB: fastest writes, but you can lose data if the cache crashes before flushing. Read-through is like cache-aside but the cache itself fetches from the DB, simplifying application code.

The mental model that separates senior from junior: think about cache invalidation as a distributed consistency problem. When you update a record in the database, you need to invalidate or update the cache. But what if the invalidation message is lost? What if there is a race condition where a stale read populates the cache right after you invalidated it? Facebook's approach is to delete cache entries rather than update them, since deletes are idempotent, and to rely on leases: a delete invalidates any outstanding lease token for that key, so a server that read the old value before the delete cannot write it back afterwards. This prevents the stale-set race condition.

Figure: Advantages and disadvantages of Caching Strategies

Always ask: "What happens when the cache is completely empty?" This is the cold start problem. If your entire Redis cluster restarts, can your database handle the full load while the cache warms up? If not, you need a cache warming strategy.

Common Interview Mistakes

Mistake 1: Saying "just add Redis" without discussing the invalidation strategy. Caching is easy. Cache invalidation is the hard part. Always explain how you will keep the cache consistent with the source of truth.

Mistake 2: Not sizing the cache. Candidates often forget to estimate how much memory the cache needs. If your working set is 500 GB and your Redis instance has 64 GB, you need an eviction policy and you need to know the hit rate implications.
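
When the working set is larger than the cache, the eviction policy decides which keys survive. The sketch below is a minimal LRU cache with hit and miss counters, so the effect of capacity on hit rate can be measured against a sample of real traffic. The class and its method names are illustrative, not a Redis API.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._data = OrderedDict()  # oldest entry first, newest last
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        return None

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self._capacity:
            self._data.popitem(last=False)  # evict the least recently used

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Replaying a recorded key trace through caches of different capacities gives a rough estimate of the hit rate to expect before committing to a memory size.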

Mistake 3: Ignoring cache penetration. This is when requests for non-existent data bypass the cache and hit the database every time. Solution: cache negative results (cache the fact that a key does not exist) with a short TTL.
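
A minimal sketch of negative caching, assuming a `query_db` callable that returns `None` for keys that do not exist. The class name, the 30-second default, and the in-memory dictionary are illustrative choices; positive entries never expire here, purely to keep the example short.

```python
import time

_MISSING = object()  # sentinel: "we looked, and the key does not exist"

class NegativeCachingLookup:
    """Caches both found rows and 'not found' results."""

    def __init__(self, query_db, negative_ttl=30.0, clock=time.monotonic):
        self._query_db = query_db  # assumed callable: key -> row or None
        self._negative_ttl = negative_ttl
        self._clock = clock
        self._cache = {}  # key -> (row or _MISSING, expires_at or None)

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None:
            value, expires_at = entry
            if expires_at is None or self._clock() < expires_at:
                return None if value is _MISSING else value
        row = self._query_db(key)
        if row is None:
            # Remember the miss briefly so repeated lookups skip the database.
            self._cache[key] = (_MISSING, self._clock() + self._negative_ttl)
        else:
            self._cache[key] = (row, None)  # no expiry in this sketch
        return row
```

The short TTL matters: if the key is created later, the negative entry ages out quickly instead of hiding the new row.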

Figure: Real-world examples of Caching Strategies

Mistake 4: Caching everything with the same TTL. Different data has different staleness tolerance. User profile data can be cached for 5 minutes. Inventory counts for a flash sale should be cached for 0 seconds (or use write-through). Price data might tolerate 30 seconds.

Mistake 5: Not discussing multi-layer caching. Production systems cache at multiple layers: browser cache, CDN cache, application-level cache (Redis), and database query cache. Each layer serves a different purpose and has different invalidation characteristics.
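
As a rough illustration of layered lookups, the sketch below checks a fast local layer, then a shared layer, then the database, filling the faster layers on the way back. The two dictionaries are stand-ins: the local in-process layer is an assumption added for the example, and the shared layer plays the role of Redis. Invalidation is deliberately left out.

```python
def layered_get(key, local_cache, shared_cache, query_db):
    """Return (value, source), where source names the layer that answered."""
    if key in local_cache:
        return local_cache[key], "local"
    if key in shared_cache:
        local_cache[key] = shared_cache[key]  # promote to the faster layer
        return local_cache[key], "shared"
    value = query_db(key)  # assumed callable: key -> value
    shared_cache[key] = value
    local_cache[key] = value
    return value, "database"
```

Each extra layer is another copy that can go stale, so every layer needs its own TTL or invalidation rule.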

Production Checklist

  • Define a cache invalidation strategy for every cached entity BEFORE caching it — cache-aside with delete-on-write is the safest default
  • Set appropriate TTLs per data type: user sessions (30 min), product catalog (5 min), search results (1 min), real-time inventory (no cache or 5 sec)
  • Implement cache stampede protection using distributed locks or request coalescing (only one request regenerates the cache entry)
  • Cache negative results (empty responses) with a short TTL (30-60 seconds) to prevent cache penetration attacks
  • Monitor cache hit rate: as a rule of thumb, a rate below about 80% suggests the cache is not effective and you should investigate access patterns
  • Plan for cold start: implement cache warming scripts that pre-populate the cache from the database after deployment or restart
  • Use consistent hashing for cache sharding so adding or removing cache nodes only remaps (and therefore invalidates) a small portion of keys
  • Set memory limits with an explicit eviction policy (LRU is the usual choice; Redis defaults to noeviction, which rejects writes once maxmemory is reached) so Redis never runs out of memory
  • Separate cache clusters by use case: session cache, data cache, and rate limiting cache should be independent so a problem in one does not affect others
  • Test cache failure scenarios: what happens when Redis is completely unavailable? Your application should degrade gracefully, not crash
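
The consistent hashing item in the checklist can be sketched as a small hash ring. This is an illustrative implementation: node names, the number of virtual nodes, and the use of SHA-256 are assumptions, not taken from any particular client library.

```python
import bisect
import hashlib

def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Consistent hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node gets many positions so load spreads evenly.
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        """Return the first node clockwise from the key's position."""
        idx = bisect.bisect_left(self._ring, (_hash(key), ""))
        if idx == len(self._ring):
            idx = 0  # wrap around the ring
        return self._ring[idx][1]
```

When a node is added, the only keys that change owner are the ones that now map to the new node; every other key stays on its existing cache server and remains a hit.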

Source: System-Design-Overview
