Design a Distributed Cache
Design a distributed cache with consistent hashing, eviction policies, replication, and cache invalidation. Covers Redis-like architecture at scale.
Problem Statement
Design a distributed in-memory cache (similar to Redis or Memcached) that provides sub-millisecond read/write latency across a cluster of nodes. The system must handle node failures gracefully, distribute data evenly, and support configurable eviction policies and TTLs.
Requirements
Functional
- GET/SET/DELETE operations with sub-millisecond latency
- Support TTL (time-to-live) per key for automatic expiration
- Configurable eviction policies: LRU, LFU, or random
- Cache invalidation via explicit delete or pub/sub broadcast
Non-Functional
- Latency: <1ms for cache hits, <5ms for cache misses (passthrough to DB)
- Throughput: 500K ops/second per node
- Availability: Cluster survives individual node failures without data loss for replicated keys
- Scale: Horizontally scalable to 100+ nodes, petabytes of cached data
Core Architecture
-
Cache Node -- Each node runs a single-threaded event loop (like Redis) for lock-free execution. Stores data in a hash table with a doubly-linked list for LRU ordering. Supports multiple data types: strings, lists, sets, sorted sets, and hashes.
-
Consistent Hash Ring -- Maps keys to nodes using a hash ring with 150+ virtual nodes per physical node to ensure even distribution. When a node joins or leaves, only 1/N of keys are redistributed. Client libraries compute the target node locally, avoiding a central router.
-
Replication Manager -- Each primary node asynchronously replicates writes to 1-2 replica nodes. On primary failure, a replica is promoted. Uses a replication log with sequence numbers so replicas can catch up after brief network partitions.
- Cluster Coordinator -- A lightweight gossip protocol (like Redis Cluster) where nodes exchange heartbeats every second. Detects node failures within 5-10 seconds. Maintains and propagates the hash ring topology. No single point of failure.
Database Choice
The cache itself is purely in-memory with optional append-only file (AOF) persistence for durability across restarts. The AOF logs every write command and is fsync'd every second (configurable). For cluster metadata, each node maintains its own copy of the hash ring propagated via gossip protocol -- no external database needed. Monitoring metrics are exported to Prometheus.
Key API Endpoints
GET /cache/\{key\}
-> Returns: \{ value: "...", ttl_remaining_ms: 4500 \}
PUT /cache/\{key\}
-> Body: \{ value: "...", ttl_ms: 60000 \}
DELETE /cache/\{key\}
-> Returns: \{ deleted: true \}
Scaling Insight
The most important scaling decision is client-side routing via consistent hashing. Instead of routing all requests through a proxy (which becomes a bottleneck), each client library maintains a local copy of the hash ring and routes directly to the correct cache node. This eliminates the proxy hop, halves latency, and doubles throughput. The hash ring is updated via cluster topology change events pushed to clients.
Key Tradeoffs
| Decision | Option A | Option B | Chosen |
|---|---|---|---|
| Eviction | LRU (recency-based) | LFU (frequency-based) | LRU default -- simpler, works well for most workloads; LFU as option for skewed access patterns |
| Replication | Synchronous (strong consistency) | Asynchronous (eventual) | Async -- sub-ms latency preserved; accept small window of data loss on failure |
| Threading | Single-threaded event loop | Multi-threaded with locks | Single-threaded -- eliminates lock overhead, simpler reasoning, scale out with more nodes |
Practical Implementation for .NET Developers
In a .NET application, you would typically implement this pattern using the following approach:
ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.
Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.
Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.
Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.
Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);
This gives you searchable, structured logs in Azure Monitor or Seq.
System-Specific Clarifying Questions
Before designing Design Distributed Cache, ask questions specific to THIS system:
- Who are the primary users? Understanding the user base shapes every technical decision — consumer apps have different requirements than enterprise B2B systems.
- What is the read-to-write ratio? This determines whether you optimize for fast reads (caching, denormalization) or fast writes (write-ahead logs, async processing).
- What is the geographic distribution? Users in one country vs. global users fundamentally changes your data replication and CDN strategy.
- What is the acceptable latency? Some features need sub-100ms responses, others can tolerate seconds. This determines your caching and architecture strategy.
- What is the consistency requirement? Some data (payments, inventory) needs strong consistency. Other data (social feeds, recommendations) can be eventually consistent.
Architecture Deep Dive
The architecture for Design Distributed Cache should be designed around the specific access patterns of the system. Do not apply generic templates — every system has unique hotspots, bottlenecks, and scaling challenges.
Write Path: How does data enter the system? Is it bursty (event-driven, flash sales) or steady (sensor data, logs)? Bursty writes need queuing and backpressure. Steady writes can go directly to the database.
Read Path: How is data consumed? Is it fan-out (one write, many reads like social feeds) or point lookups (one read for specific data like user profiles)? Fan-out reads benefit from pre-computation and caching. Point lookups benefit from efficient indexing.
Hot Spots: Where are the bottlenecks? For Design Distributed Cache, identify the component that will fail first under load and design mitigation strategies: caching, sharding, rate limiting, or async processing.