Consistent Hashing: Why Every Distributed System Uses It
From theory to production — how consistent hashing solves the rebalancing problem
The Problem with Naive Hashing
You have N servers and need to distribute requests. The obvious approach:
```python
server = hash(key) % N
```

This works until you add or remove a server. When N changes, almost every key maps to a different server. If you have a cache layer, this means a thundering herd: nearly every cached item becomes a cache miss simultaneously.
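The breakage is easy to quantify. Here is a sketch with a made-up key set, using md5 rather than Python's built-in `hash()` (which is salted per process) so the result is stable:

```python
import hashlib

def bucket(key: str, n: int) -> int:
    # Stable hash of the key, reduced modulo the server count.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % n

keys = [f"user:{i}" for i in range(10_000)]
before = {k: bucket(k, 4) for k in keys}   # 4 servers
after = {k: bucket(k, 5) for k in keys}    # scale out to 5 servers

moved = sum(1 for k in keys if before[k] != after[k])
print(f"{moved / len(keys):.0%} of keys changed servers")
```

A key keeps its server only when `h % 4 == h % 5`, i.e. when `h % 20 < 4`, so roughly 80% of keys move after adding just one server.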
Consistent Hashing: The Key Insight
Instead of mapping keys to servers directly, map both keys and servers onto a ring: a hash space (commonly 0 to 2³²−1, or the full output range of the hash function) that wraps around.
```mermaid
graph TB
    subgraph "Hash Ring"
        direction TB
        A["Server A (pos: 1000)"]
        B["Server B (pos: 5000)"]
        C["Server C (pos: 9000)"]
    end
    K1["Key 'user:42' → hash: 3200"] --> B
    K2["Key 'session:99' → hash: 7500"] --> C
    K3["Key 'cache:abc' → hash: 800"] --> A
```

Each key is handled by the first server encountered when walking clockwise from the key's position on the ring.
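The clockwise walk is just a binary search over sorted ring positions. A minimal sketch using the toy positions from the diagram above (these are illustrative, not real hash values):

```python
from bisect import bisect_left

RING_SIZE = 10_000
ring = {1000: "A", 5000: "B", 9000: "C"}  # position -> server
positions = sorted(ring)

def owner(key_hash: int) -> str:
    # First server at or after the key's position, wrapping past the top.
    idx = bisect_left(positions, key_hash % RING_SIZE) % len(positions)
    return ring[positions[idx]]

print(owner(3200))  # B: first server clockwise from 3200 sits at 5000
print(owner(7500))  # C
print(owner(800))   # A
print(owner(9500))  # A: past the last server, so we wrap around the ring
```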
Why It's Better
When a server is added or removed, only keys between the affected server and its predecessor need to move. On average, only K/N keys are remapped (where K is total keys, N is total servers).
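This property can be checked directly on a minimal ring (one point per server, made-up server and key names): after removing a server, the only keys that change owners are the ones that server held.

```python
import hashlib
from bisect import bisect_right

def pos(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build(nodes):
    ring = {pos(n): n for n in nodes}  # one ring point per server
    return ring, sorted(ring)

def lookup(ring, points, key):
    # First ring point clockwise from the key, wrapping around.
    idx = bisect_right(points, pos(key)) % len(points)
    return ring[points[idx]]

keys = [f"item:{i}" for i in range(3000)]
ring3, p3 = build(["server-a", "server-b", "server-c"])
ring2, p2 = build(["server-a", "server-c"])  # server-b removed

before = {k: lookup(ring3, p3, k) for k in keys}
after = {k: lookup(ring2, p2, k) for k in keys}
moved = {k for k in keys if before[k] != after[k]}

# Exactly server-b's former keys are remapped; everything else stays put.
assert moved == {k for k in keys if before[k] == "server-b"}
print(f"{len(moved)} of {len(keys)} keys moved")
```

Removing a ring point cannot change the clockwise-first server for any key that wasn't pointing at it, which is the whole trick.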
```python
import hashlib
from bisect import bisect_right

class ConsistentHashRing:
    def __init__(self, nodes=None, virtual_nodes=150):
        self.virtual_nodes = virtual_nodes
        self.ring = {}          # hash position -> physical node
        self.sorted_keys = []   # sorted ring positions
        if nodes:
            for node in nodes:
                self.add_node(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        # Place virtual_nodes points on the ring for this physical node.
        for i in range(self.virtual_nodes):
            vnode_key = f"{node}:vn{i}"
            hash_val = self._hash(vnode_key)
            self.ring[hash_val] = node
            self.sorted_keys.append(hash_val)
        self.sorted_keys.sort()

    def remove_node(self, node):
        for i in range(self.virtual_nodes):
            vnode_key = f"{node}:vn{i}"
            hash_val = self._hash(vnode_key)
            del self.ring[hash_val]
            self.sorted_keys.remove(hash_val)

    def get_node(self, key):
        if not self.ring:
            return None
        hash_val = self._hash(key)
        # First ring position strictly clockwise of the key, wrapping around.
        idx = bisect_right(self.sorted_keys, hash_val) % len(self.sorted_keys)
        return self.ring[self.sorted_keys[idx]]
```

Virtual Nodes: Solving Hotspots
With only a few physical servers, their positions on the ring can cluster, leaving one server responsible for a disproportionately large arc. Virtual nodes solve this: each physical server is represented by 100–200 points on the ring, which evens out the arcs and therefore the load.
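The effect is easy to measure. A sketch with made-up node names, comparing the share of 10,000 keys each node receives with 1 versus 150 virtual nodes (the fair share for 3 nodes is ~33% each):

```python
import hashlib
from bisect import bisect_right
from collections import Counter

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def load_shares(nodes, vnodes, n_keys=10_000):
    # Build a ring with `vnodes` points per physical node.
    ring = {h(f"{n}:vn{i}"): n for n in nodes for i in range(vnodes)}
    points = sorted(ring)
    counts = Counter(
        ring[points[bisect_right(points, h(f"key:{k}")) % len(points)]]
        for k in range(n_keys)
    )
    return {n: counts[n] / n_keys for n in nodes}

nodes = ["node-a", "node-b", "node-c"]
print(load_shares(nodes, vnodes=1))    # often badly skewed
print(load_shares(nodes, vnodes=150))  # close to 1/3 each
```

With a single point per node, one node can easily end up with well over half the keys; with 150 virtual nodes the shares land within a few percent of each other.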
Where You'll See This
- DynamoDB: Partition key distribution
- Cassandra: Token ring for data partitioning
- Memcached: Client-side consistent hashing for cache distribution
- CDNs: Request routing to edge servers
The next time you're designing a system that needs to scale horizontally, consistent hashing should be your default mental model for key distribution.
Ayoush Chourasia · 2026-03-15