Consistent Hashing Implementation
Working Java and Python implementations of consistent hashing with virtual nodes, hash ring, and key distribution.
Consistent Hashing — Code Lab
Consistent hashing is a technique that minimizes key redistribution when nodes are added or removed from a distributed system. This implementation demonstrates the core algorithm with virtual nodes for balanced distribution.
How the Algorithm Works
- Hash Ring: Both servers and keys are mapped onto a circular hash space (0 to 2{}32 - 1)
- Virtual Nodes: Each physical server is mapped to multiple positions on the ring for even distribution
- Key Assignment: A key is assigned to the first server found by walking clockwise from the key's hash position
- Node Changes: When a server is added/removed, only K/N keys need to be remapped (K = total keys, N = total nodes)
Complexity Analysis
| Operation | Time Complexity | Space Complexity |
|---|---|---|
| Add Node | O(K log N) | O(N * V) |
| Remove Node | O(K log N) | O(N * V) |
| Lookup Key | O(log N * V) | O(1) |
Where N = number of nodes, V = virtual nodes per server, K = total keys.
Java Implementation
package implementations.java.consistent_hashing;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;
public class ConsistentHashing {
private final int numReplicas; // Number of virtual nodes per server
private final TreeMap<Long, String> ring; // Hash ring storing virtual nodes
private final Set<String> servers; // Set of physical servers
public ConsistentHashing(List<String> servers, int numReplicas) {
this.numReplicas = numReplicas;
this.ring = new TreeMap<>();
this.servers = new HashSet<>();
// Add each server to the hash ring
for (String server : servers) {
addServer(server);
}
}
private long hash(String key) {
try {
MessageDigest md = MessageDigest.getInstance("MD5");
md.update(key.getBytes());
byte[] digest = md.digest();
return ((long) (digest[0] & 0xFF) << 24) |
((long) (digest[1] & 0xFF) << 16) |
((long) (digest[2] & 0xFF) << 8) |
((long) (digest[3] & 0xFF));
} catch (NoSuchAlgorithmException e) {
throw new RuntimeException("MD5 algorithm not found", e);
}
}
public void addServer(String server) {
servers.add(server);
for (int i = 0; i < numReplicas; i++) {
long hash = hash(server + "-" + i); // Unique hash for each virtual node
ring.put(hash, server);
}
}
public void removeServer(String server) {
if (servers.remove(server)) {
for (int i = 0; i < numReplicas; i++) {
long hash = hash(server + "-" + i);
ring.remove(hash);
}
}
}
public String getServer(String key) {
if (ring.isEmpty()) {
return null; // No servers available
}
long hash = hash(key);
// Find the closest server in a clockwise direction
Map.Entry<Long, String> entry = ring.ceilingEntry(hash);
if (entry == null) {
// If we exceed the highest node, wrap around to the first node
entry = ring.firstEntry();
}
return entry.getValue();
}
public static void main(String[] args) {
List<String> servers = Arrays.asList("S0", "S1", "S2", "S3", "S4", "S5");
ConsistentHashing ch = new ConsistentHashing(servers, 3);
// Step 2: Assign requests (keys) to servers
System.out.println("UserA is assigned to: " + ch.getServer("UserA"));
System.out.println("UserB is assigned to: " + ch.getServer("UserB"));
// Step 3: Add a new server dynamically
ch.addServer("S6");
System.out.println("UserA is now assigned to: " + ch.getServer("UserA"));
// Step 4: Remove a server dynamically
ch.removeServer("S2");
System.out.println("UserB is now assigned to: " + ch.getServer("UserB"));
}
}
Python Implementation
import hashlib
import bisect
class ConsistentHashing:
def __init__(self, servers, num_replicas=3):
"""
Initializes the consistent hashing ring.
- servers: List of initial server names (e.g., ["S0", "S1", "S2"])
- num_replicas: Number of virtual nodes per server for better load balancing
"""
self.num_replicas = num_replicas # Number of virtual nodes per server
self.ring = {} # Hash ring storing virtual node mappings
self.sorted_keys = [] # Sorted list of hash values (positions) on the ring
self.servers = set() # Set of physical servers (used for tracking)
# Add each server to the hash ring
for server in servers:
self.add_server(server)
def _hash(self, key):
"""Computes a hash value for a given key using MD5."""
return int(hashlib.md5(key.encode()).hexdigest(), 16)
def add_server(self, server):
"""
Adds a server to the hash ring along with its virtual nodes.
- Each virtual node is a different hash of the server ID to distribute load.
- The server is hashed multiple times and placed at different positions.
"""
self.servers.add(server)
for i in range(self.num_replicas): # Creating multiple virtual nodes
hash_val = self._hash(f"{server}-{i}") # Unique hash for each virtual node
self.ring[hash_val] = server # Map hash to the server
bisect.insort(self.sorted_keys, hash_val) # Maintain a sorted list for efficient lookup
def remove_server(self, server):
"""
Removes a server and all its virtual nodes from the hash ring.
"""
if server in self.servers:
self.servers.remove(server)
for i in range(self.num_replicas):
hash_val = self._hash(f"{server}-{i}") # Remove each virtual node's hash
self.ring.pop(hash_val, None) # Delete from hash ring
self.sorted_keys.remove(hash_val) # Remove from sorted key list
def get_server(self, key):
"""
Finds the closest server for a given key.
- Hash the key to get its position on the ring.
- Move clockwise to find the nearest server.
- If it exceeds the last node, wrap around to the first node.
"""
if not self.ring:
return None # No servers available
hash_val = self._hash(key) # Hash the key
index = bisect.bisect(self.sorted_keys, hash_val) % len(self.sorted_keys) # Locate nearest server
return self.ring[self.sorted_keys[index]] # Return the assigned server
# ----------------- Usage Example -------------------
# Step 1: Initialize Consistent Hashing with servers
servers = ["S0", "S1", "S2", "S3", "S4", "S5"]
ch = ConsistentHashing(servers)
# Step 2: Assign requests (keys) to servers
print(ch.get_server("UserA")) # Maps UserA to a server
print(ch.get_server("UserB")) # Maps UserB to a server
# Step 3: Add a new server dynamically
ch.add_server("S6")
print(ch.get_server("UserA")) # Might be reassigned if affected
# Step 4: Remove a server dynamically
ch.remove_server("S2")
print(ch.get_server("UserB")) # Might be reassigned if affected
Key Design Decisions
- MD5 Hashing: Both implementations use MD5 to distribute keys uniformly across the ring. MD5 is not cryptographically secure, but its distribution properties are excellent for hashing.
- Virtual Nodes: Without virtual nodes, 3 physical servers might hash to adjacent positions, leaving large gaps. With 100-200 virtual nodes per server, distribution is nearly uniform.
- TreeMap / Sorted List: Java uses TreeMap (red-black tree) for O(log n) ceiling lookups. Python uses bisect on a sorted list for the same complexity.
Production Considerations
- Thread Safety: The Java implementation is not thread-safe. In production, wrap mutations in synchronized blocks or use ConcurrentSkipListMap.
- Hash Function: For production, consider xxHash or MurmurHash3 — they are faster than MD5 while maintaining good distribution.
- Monitoring: Track the number of keys per node to detect imbalance. Alert if any node holds more than 2x its fair share.
- Rebalancing: When adding a node, only keys between the new node and its predecessor need to migrate. Implement this as a background process.
Related Interview Questions
- How does consistent hashing differ from modular hashing (key % N)?
- What problem do virtual nodes solve?
- How would you implement rebalancing when a new node is added?
- How is consistent hashing used in distributed caching (Memcached)?
Source
Code from System-Design-Overview by Ashish Pratap Singh. Explanations are original.
Practical Implementation for .NET Developers
In a .NET application, you would typically implement this pattern using the following approach:
ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.
Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.
Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.
Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.
Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);
This gives you searchable, structured logs in Azure Monitor or Seq.
Key Takeaways for Interviews
- Understand the core problem this resource addresses and be able to explain it in 2-3 sentences without jargon
- Know the key trade-offs: what does this approach optimize for, and what does it sacrifice?
- Be ready to compare this with alternative approaches and explain when each is appropriate
- Connect the concepts to real-world systems you have worked with or studied
- Demonstrate depth by discussing failure modes and how they are handled
How This Applies to Modern .NET Systems
The concepts from this resource translate to .NET through several established libraries and patterns:
Azure managed services often abstract away the underlying distributed systems complexity, but understanding the fundamentals helps you configure them correctly, debug issues, and make informed architectural decisions.
NuGet packages in the .NET ecosystem provide production-ready implementations of many patterns described in this resource. Before building custom solutions, check if a well-maintained package already exists.
ASP.NET Core middleware pipeline is where many of these patterns are implemented in practice: caching, rate limiting, health checks, and circuit breaking all fit naturally into the middleware model.