Skip to main content
SDMastery

Consistent Hashing Implementation

Working Java and Python implementations of consistent hashing with virtual nodes, hash ring, and key distribution.

Consistent Hashing Implementation system design overview showing key components and metrics
High-level overview of Consistent Hashing Implementation

Consistent Hashing — Code Lab

Consistent hashing is a technique that minimizes key redistribution when nodes are added or removed from a distributed system. This implementation demonstrates the core algorithm with virtual nodes for balanced distribution.

How the Algorithm Works

Consistent Hashing Implementation system architecture with service components and data flow
System architecture for Consistent Hashing Implementation
  1. Hash Ring: Both servers and keys are mapped onto a circular hash space (0 to 2{}32 - 1)
  2. Virtual Nodes: Each physical server is mapped to multiple positions on the ring for even distribution
  3. Key Assignment: A key is assigned to the first server found by walking clockwise from the key's hash position
  4. Node Changes: When a server is added/removed, only K/N keys need to be remapped (K = total keys, N = total nodes)

Complexity Analysis

OperationTime ComplexitySpace Complexity
Add NodeO(K log N)O(N * V)
Remove NodeO(K log N)O(N * V)
Lookup KeyO(log N * V)O(1)

Where N = number of nodes, V = virtual nodes per server, K = total keys.

Step-by-step diagram showing how Consistent Hashing Implementation works in practice
How Consistent Hashing Implementation works step by step

Java Implementation

java
package implementations.java.consistent_hashing;

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.*;

public class ConsistentHashing {
    private final int numReplicas; // Number of virtual nodes per server
    private final TreeMap<Long, String> ring; // Hash ring storing virtual nodes
    private final Set<String> servers; // Set of physical servers

    public ConsistentHashing(List<String> servers, int numReplicas) {
        this.numReplicas = numReplicas;
        this.ring = new TreeMap<>();
        this.servers = new HashSet<>();

        // Add each server to the hash ring
        for (String server : servers) {
            addServer(server);
        }
    }

    private long hash(String key) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(key.getBytes());
            byte[] digest = md.digest();
            return ((long) (digest[0] & 0xFF) << 24) |
                   ((long) (digest[1] & 0xFF) << 16) |
                   ((long) (digest[2] & 0xFF) << 8) |
                   ((long) (digest[3] & 0xFF));
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException("MD5 algorithm not found", e);
        }
    }

    public void addServer(String server) {
        servers.add(server);
        for (int i = 0; i < numReplicas; i++) {
            long hash = hash(server + "-" + i); // Unique hash for each virtual node
            ring.put(hash, server);
        }
    }

    public void removeServer(String server) {
        if (servers.remove(server)) {
            for (int i = 0; i < numReplicas; i++) {
                long hash = hash(server + "-" + i);
                ring.remove(hash);
            }
        }
    }

    public String getServer(String key) {
        if (ring.isEmpty()) {
            return null; // No servers available
        }

        long hash = hash(key);
        // Find the closest server in a clockwise direction
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash);
        if (entry == null) {
            // If we exceed the highest node, wrap around to the first node
            entry = ring.firstEntry();
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList("S0", "S1", "S2", "S3", "S4", "S5");
        ConsistentHashing ch = new ConsistentHashing(servers, 3);

        // Step 2: Assign requests (keys) to servers
        System.out.println("UserA is assigned to: " + ch.getServer("UserA"));
        System.out.println("UserB is assigned to: " + ch.getServer("UserB"));

        // Step 3: Add a new server dynamically
        ch.addServer("S6");
        System.out.println("UserA is now assigned to: " + ch.getServer("UserA"));

        // Step 4: Remove a server dynamically
        ch.removeServer("S2");
        System.out.println("UserB is now assigned to: " + ch.getServer("UserB"));
    }
}

Python Implementation

python
import hashlib
import bisect

class ConsistentHashing:
    def __init__(self, servers, num_replicas=3):
        """
        Initializes the consistent hashing ring.

        - servers: List of initial server names (e.g., ["S0", "S1", "S2"])
        - num_replicas: Number of virtual nodes per server for better load balancing
        """
        self.num_replicas = num_replicas  # Number of virtual nodes per server
        self.ring = {}  # Hash ring storing virtual node mappings
        self.sorted_keys = []  # Sorted list of hash values (positions) on the ring
        self.servers = set()  # Set of physical servers (used for tracking)

        # Add each server to the hash ring
        for server in servers:
            self.add_server(server)

    def _hash(self, key):
        """Computes a hash value for a given key using MD5."""
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_server(self, server):
        """
        Adds a server to the hash ring along with its virtual nodes.

        - Each virtual node is a different hash of the server ID to distribute load.
        - The server is hashed multiple times and placed at different positions.
        """
        self.servers.add(server)
        for i in range(self.num_replicas):  # Creating multiple virtual nodes
            hash_val = self._hash(f"{server}-{i}")  # Unique hash for each virtual node
            self.ring[hash_val] = server  # Map hash to the server
            bisect.insort(self.sorted_keys, hash_val)  # Maintain a sorted list for efficient lookup

    def remove_server(self, server):
        """
        Removes a server and all its virtual nodes from the hash ring.
        """
        if server in self.servers:
            self.servers.remove(server)
            for i in range(self.num_replicas):
                hash_val = self._hash(f"{server}-{i}")  # Remove each virtual node's hash
                self.ring.pop(hash_val, None)  # Delete from hash ring
                self.sorted_keys.remove(hash_val)  # Remove from sorted key list

    def get_server(self, key):
        """
        Finds the closest server for a given key.

        - Hash the key to get its position on the ring.
        - Move clockwise to find the nearest server.
        - If it exceeds the last node, wrap around to the first node.
        """
        if not self.ring:
            return None  # No servers available

        hash_val = self._hash(key)  # Hash the key
        index = bisect.bisect(self.sorted_keys, hash_val) % len(self.sorted_keys)  # Locate nearest server
        return self.ring[self.sorted_keys[index]]  # Return the assigned server

# ----------------- Usage Example -------------------

# Step 1: Initialize Consistent Hashing with servers
servers = ["S0", "S1", "S2", "S3", "S4", "S5"]
ch = ConsistentHashing(servers)

# Step 2: Assign requests (keys) to servers
print(ch.get_server("UserA"))  # Maps UserA to a server
print(ch.get_server("UserB"))  # Maps UserB to a server

# Step 3: Add a new server dynamically
ch.add_server("S6")
print(ch.get_server("UserA"))  # Might be reassigned if affected

# Step 4: Remove a server dynamically
ch.remove_server("S2")
print(ch.get_server("UserB"))  # Might be reassigned if affected

Key Design Decisions

Comparison table for Consistent Hashing Implementation showing key metrics and tradeoffs
Comparing key aspects of Consistent Hashing Implementation
  • MD5 Hashing: Both implementations use MD5 to distribute keys uniformly across the ring. MD5 is not cryptographically secure, but its distribution properties are excellent for hashing.
  • Virtual Nodes: Without virtual nodes, 3 physical servers might hash to adjacent positions, leaving large gaps. With 100-200 virtual nodes per server, distribution is nearly uniform.
  • TreeMap / Sorted List: Java uses TreeMap (red-black tree) for O(log n) ceiling lookups. Python uses bisect on a sorted list for the same complexity.

Production Considerations

  • Thread Safety: The Java implementation is not thread-safe. In production, wrap mutations in synchronized blocks or use ConcurrentSkipListMap.
  • Hash Function: For production, consider xxHash or MurmurHash3 — they are faster than MD5 while maintaining good distribution.
  • Monitoring: Track the number of keys per node to detect imbalance. Alert if any node holds more than 2x its fair share.
  • Rebalancing: When adding a node, only keys between the new node and its predecessor need to migrate. Implement this as a background process.
Data flow diagram for Consistent Hashing Implementation showing request and response paths
Data flow through Consistent Hashing Implementation
  1. How does consistent hashing differ from modular hashing (key % N)?
  2. What problem do virtual nodes solve?
  3. How would you implement rebalancing when a new node is added?
  4. How is consistent hashing used in distributed caching (Memcached)?

Source

Key components of Consistent Hashing Implementation with roles and responsibilities
Key components of Consistent Hashing Implementation

Code from System-Design-Overview by Ashish Pratap Singh. Explanations are original.

Practical Implementation for .NET Developers

In a .NET application, you would typically implement this pattern using the following approach:

Interview tips for Consistent Hashing Implementation system design questions
Interview tips for Consistent Hashing Implementation

ASP.NET Core setup: Create a service class that encapsulates the logic, register it with dependency injection, and inject it into your controllers or minimal API endpoints. The built-in DI container handles lifecycle management.

Entity Framework Core: For database interactions, EF Core provides the ORM layer. Use migrations for schema management and raw SQL for performance-critical queries. Consider Dapper for read-heavy paths where EF Core's overhead matters.

Azure integration: If deploying to Azure, leverage managed services — Azure Cache for Redis, Azure SQL, Azure Service Bus, Azure Cosmos DB. These eliminate operational overhead and provide built-in monitoring through Application Insights.

Decision guide showing when to use Consistent Hashing Implementation and when to avoid
When to use Consistent Hashing Implementation

Testing: Use xUnit with Testcontainers for integration tests that spin up real databases in Docker. Mock external dependencies with NSubstitute. The WebApplicationFactory class lets you test your entire HTTP pipeline in-process.

Monitoring: Add Application Insights telemetry to track request latency, dependency calls, and custom metrics. Use structured logging with Serilog to make production debugging possible:

text
Log.Information("Processing order {OrderId} for {CustomerId}", orderId, customerId);

This gives you searchable, structured logs in Azure Monitor or Seq.

Pros and cons analysis of Consistent Hashing Implementation for system design decisions
Advantages and disadvantages of Consistent Hashing Implementation

Key Takeaways for Interviews

  • Understand the core problem this resource addresses and be able to explain it in 2-3 sentences without jargon
  • Know the key trade-offs: what does this approach optimize for, and what does it sacrifice?
  • Be ready to compare this with alternative approaches and explain when each is appropriate
  • Connect the concepts to real-world systems you have worked with or studied
  • Demonstrate depth by discussing failure modes and how they are handled

How This Applies to Modern .NET Systems

Real-world companies using Consistent Hashing Implementation in production systems
Real-world examples of Consistent Hashing Implementation

The concepts from this resource translate to .NET through several established libraries and patterns:

Azure managed services often abstract away the underlying distributed systems complexity, but understanding the fundamentals helps you configure them correctly, debug issues, and make informed architectural decisions.

NuGet packages in the .NET ecosystem provide production-ready implementations of many patterns described in this resource. Before building custom solutions, check if a well-maintained package already exists.

ASP.NET Core middleware pipeline is where many of these patterns are implemented in practice: caching, rate limiting, health checks, and circuit breaking all fit naturally into the middleware model.