Skip to main content
SDMastery
easy7 min readUpdated 2026-06-03

Design a Distributed Cache

Design a distributed cache with consistent hashing, eviction policies, replication, and cache invalidation. Covers Redis-like architecture at scale.

Design a Distributed Cache system design overview showing key components and metrics
High-level overview of Design a Distributed Cache

Problem Statement

Design a distributed in-memory cache (similar to Redis or Memcached) that provides sub-millisecond read/write latency across a cluster of nodes. The system must handle node failures gracefully, distribute data evenly, and support configurable eviction policies and TTLs.

Requirements

Design a Distributed Cache system architecture with service components and data flow
System architecture for Design a Distributed Cache

Functional

  • GET/SET/DELETE operations with sub-millisecond latency
  • Support TTL (time-to-live) per key for automatic expiration
  • Configurable eviction policies: LRU, LFU, or random
  • Cache invalidation via explicit delete or pub/sub broadcast

Non-Functional

  • Latency: <1ms for cache hits, <5ms for cache misses (passthrough to DB)
  • Throughput: 500K ops/second per node
  • Availability: Cluster survives individual node failures without data loss for replicated keys
  • Scale: Horizontally scalable to 100+ nodes, petabytes of cached data

Core Architecture

Step-by-step diagram showing how Design a Distributed Cache works in practice
How Design a Distributed Cache works step by step
  1. Cache Node -- Each node runs a single-threaded event loop (like Redis) for lock-free execution. Stores data in a hash table with a doubly-linked list for LRU ordering. Supports multiple data types: strings, lists, sets, sorted sets, and hashes.

  2. Consistent Hash Ring -- Maps keys to nodes using a hash ring with 150+ virtual nodes per physical node to ensure even distribution. When a node joins or leaves, only 1/N of keys are redistributed. Client libraries compute the target node locally, avoiding a central router.

  3. Replication Manager -- Each primary node asynchronously replicates writes to 1-2 replica nodes. On primary failure, a replica is promoted. Uses a replication log with sequence numbers so replicas can catch up after brief network partitions.

Comparison table for Design a Distributed Cache showing key metrics and tradeoffs
Comparing key aspects of Design a Distributed Cache
  1. Cluster Coordinator -- A lightweight gossip protocol (like Redis Cluster) where nodes exchange heartbeats every second. Detects node failures within 5-10 seconds. Maintains and propagates the hash ring topology. No single point of failure.

Database Choice

The cache itself is purely in-memory with optional append-only file (AOF) persistence for durability across restarts. The AOF logs every write command and is fsync'd every second (configurable). For cluster metadata, each node maintains its own copy of the hash ring propagated via gossip protocol -- no external database needed. Monitoring metrics are exported to Prometheus.

Interview tips for Design a Distributed Cache system design questions
Interview tips for Design a Distributed Cache

Key API Endpoints

text
GET /cache/\{key\}
  -> Returns: \{ value: "...", ttl_remaining_ms: 4500 \}

PUT /cache/\{key\}
  -> Body: \{ value: "...", ttl_ms: 60000 \}

DELETE /cache/\{key\}
  -> Returns: \{ deleted: true \}

Scaling Insight

The most important scaling decision is client-side routing via consistent hashing. Instead of routing all requests through a proxy (which becomes a bottleneck), each client library maintains a local copy of the hash ring and routes directly to the correct cache node. This eliminates the proxy hop, halves latency, and doubles throughput. The hash ring is updated via cluster topology change events pushed to clients.

Decision guide showing when to use Design a Distributed Cache and when to avoid
When to use Design a Distributed Cache

Key Tradeoffs

DecisionOption AOption BChosen
EvictionLRU (recency-based)LFU (frequency-based)LRU default -- simpler, works well for most workloads; LFU as option for skewed access patterns
ReplicationSynchronous (strong consistency)Asynchronous (eventual)Async -- sub-ms latency preserved; accept small window of data loss on failure
ThreadingSingle-threaded event loopMulti-threaded with locksSingle-threaded -- eliminates lock overhead, simpler reasoning, scale out with more nodes

Practical Implementation for .NET Developers

In a .NET application, you would typically implement this pattern using the following approach:

Pros and cons analysis of Design a Distributed Cache for system design decisions
Advantages and disadvantages of Design a Distributed Cache

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Real-world companies using Design a Distributed Cache in production systems
Real-world examples of Design a Distributed Cache

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

text
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.

Data flow diagram for Design a Distributed Cache showing request and response paths
Data flow through Design a Distributed Cache

System-Specific Clarifying Questions

Before designing Design Distributed Cache, ask questions specific to THIS system:

Key components of Design a Distributed Cache with roles and responsibilities
Key components of Design a Distributed Cache
  1. Who are the primary users? Understanding the user base shapes every technical decision — consumer apps have different requirements than enterprise B2B systems.
  2. What is the read-to-write ratio? This determines whether you optimize for fast reads (caching, denormalization) or fast writes (write-ahead logs, async processing).
  3. What is the geographic distribution? Users in one country vs. global users fundamentally changes your data replication and CDN strategy.
  4. What is the acceptable latency? Some features need sub-100ms responses, others can tolerate seconds. This determines your caching and architecture strategy.
  5. What is the consistency requirement? Some data (payments, inventory) needs strong consistency. Other data (social feeds, recommendations) can be eventually consistent.

Architecture Deep Dive

The architecture for Design Distributed Cache should be designed around the specific access patterns of the system. Do not apply generic templates — every system has unique hotspots, bottlenecks, and scaling challenges.

Write Path: How does data enter the system? Is it bursty (event-driven, flash sales) or steady (sensor data, logs)? Bursty writes need queuing and backpressure. Steady writes can go directly to the database.

Read Path: How is data consumed? Is it fan-out (one write, many reads like social feeds) or point lookups (one read for specific data like user profiles)? Fan-out reads benefit from pre-computation and caching. Point lookups benefit from efficient indexing.

Hot Spots: Where are the bottlenecks? For Design Distributed Cache, identify the component that will fail first under load and design mitigation strategies: caching, sharding, rate limiting, or async processing.

Sources