easy7 min readUpdated 2026-06-08

Design a Distributed Cache

Design a distributed cache with consistent hashing, eviction policies, replication, and cache invalidation. Covers Redis-like architecture at scale.

Problem Statement

Design a distributed in-memory cache (similar to Redis or Memcached) that provides sub-millisecond read/write latency across a cluster of nodes. The system must handle node failures gracefully, distribute data evenly, and support configurable eviction policies and TTLs.

Requirements

System architecture diagram for Design a Distributed Cache showing how services, databases, and caches connect — System architecture for Design a Distributed Cache

Functional

GET/SET/DELETE operations with sub-millisecond latency
Support TTL (time-to-live) per key for automatic expiration
Configurable eviction policies: LRU, LFU, or random
Cache invalidation via explicit delete or pub/sub broadcast

Non-Functional

Latency: <1ms for cache hits, <5ms for cache misses (passthrough to DB)
Throughput: 500K ops/second per node
Availability: Cluster survives individual node failures without data loss for replicated keys
Scale: Horizontally scalable to 100+ nodes, petabytes of cached data

Core Architecture

Step-by-step diagram showing how Design a Distributed Cache processes a request from start to finish — How Design a Distributed Cache works step by step

Cache Node -- Each node runs a single-threaded event loop (like Redis) for lock-free execution. Stores data in a hash table with a doubly-linked list for LRU ordering. Supports multiple data types: strings, lists, sets, sorted sets, and hashes.
Consistent Hash Ring -- Maps keys to nodes using a hash ring with 150+ virtual nodes per physical node to ensure even distribution. When a node joins or leaves, only 1/N of keys are redistributed. Client libraries compute the target node locally, avoiding a central router.
Replication Manager -- Each primary node asynchronously replicates writes to 1-2 replica nodes. On primary failure, a replica is promoted. Uses a replication log with sequence numbers so replicas can catch up after brief network partitions.

Comparison table for Design a Distributed Cache contrasting approaches, tradeoffs, and when to use each — Comparing key aspects of Design a Distributed Cache

Cluster Coordinator -- A lightweight gossip protocol (like Redis Cluster) where nodes exchange heartbeats every second. Detects node failures within 5-10 seconds. Maintains and propagates the hash ring topology. No single point of failure.

Database Choice

The cache itself is purely in-memory with optional append-only file (AOF) persistence for durability across restarts. The AOF logs every write command and is fsync'd every second (configurable). For cluster metadata, each node maintains its own copy of the hash ring propagated via gossip protocol -- no external database needed. Monitoring metrics are exported to Prometheus.

Interview preparation checklist for Design a Distributed Cache with key points to mention and mistakes to avoid — Interview tips for Design a Distributed Cache

Key API Endpoints

text

GET /cache/\{key\}
  -> Returns: \{ value: "...", ttl_remaining_ms: 4500 \}

PUT /cache/\{key\}
  -> Body: \{ value: "...", ttl_ms: 60000 \}

DELETE /cache/\{key\}
  -> Returns: \{ deleted: true \}

Scaling Insight

The most important scaling decision is client-side routing via consistent hashing. Instead of routing all requests through a proxy (which becomes a bottleneck), each client library maintains a local copy of the hash ring and routes directly to the correct cache node. This eliminates the proxy hop, halves latency, and doubles throughput. The hash ring is updated via cluster topology change events pushed to clients.

Decision guide for when to choose Design a Distributed Cache and when alternative approaches are better — When to use Design a Distributed Cache

Key Tradeoffs

Decision	Option A	Option B	Chosen
Eviction	LRU (recency-based)	LFU (frequency-based)	LRU default -- simpler, works well for most workloads; LFU as option for skewed access patterns
Replication	Synchronous (strong consistency)	Asynchronous (eventual)	Async -- sub-ms latency preserved; accept small window of data loss on failure
Threading	Single-threaded event loop	Multi-threaded with locks	Single-threaded -- eliminates lock overhead, simpler reasoning, scale out with more nodes

Practical Implementation for .NET Developers

In a .NET application, you would typically implement this pattern using the following approach:

Tradeoff analysis for Design a Distributed Cache listing advantages, disadvantages, and real-world considerations — Advantages and disadvantages of Design a Distributed Cache

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Production deployment examples of Design a Distributed Cache at companies like Netflix, Google, and Amazon — Real-world examples of Design a Distributed Cache

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

text

Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.

Data flow diagram for Design a Distributed Cache showing how requests and responses move through the system — Data flow through Design a Distributed Cache

System-Specific Clarifying Questions

Before designing Design Distributed Cache, ask questions specific to THIS system:

Component diagram for Design a Distributed Cache showing each building block and its responsibility — Key components of Design a Distributed Cache

Who are the primary users? Understanding the user base shapes every technical decision — consumer apps have different requirements than enterprise B2B systems.
What is the read-to-write ratio? This determines whether you optimize for fast reads (caching, denormalization) or fast writes (write-ahead logs, async processing).
What is the geographic distribution? Users in one country vs. global users fundamentally changes your data replication and CDN strategy.
What is the acceptable latency? Some features need sub-100ms responses, others can tolerate seconds. This determines your caching and architecture strategy.
What is the consistency requirement? Some data (payments, inventory) needs strong consistency. Other data (social feeds, recommendations) can be eventually consistent.

Architecture Deep Dive

The architecture for Design Distributed Cache should be designed around the specific access patterns of the system. Do not apply generic templates — every system has unique hotspots, bottlenecks, and scaling challenges.

Write Path: How does data enter the system? Is it bursty (event-driven, flash sales) or steady (sensor data, logs)? Bursty writes need queuing and backpressure. Steady writes can go directly to the database.

Read Path: How is data consumed? Is it fan-out (one write, many reads like social feeds) or point lookups (one read for specific data like user profiles)? Fan-out reads benefit from pre-computation and caching. Point lookups benefit from efficient indexing.

Hot Spots: Where are the bottlenecks? For Design Distributed Cache, identify the component that will fail first under load and design mitigation strategies: caching, sharding, rate limiting, or async processing.

Sources

Reference

Reference Solutionvideo