一致性哈希的Java实现

发表于12月 10, 2016由lucas

一致性哈希

借用维基百科的定义:

consistent hashing is a special kind of hashing such that when a hash table is resized.
In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped because the mapping between the keys and the slots is defined by a modular operation. Consistent hashing is a particular case of rendezvous hashing, which has a conceptually simpler algorithm, and was first described in 1996.

实现

Java里面的实现，主要依赖了TreeMap来实现。
hash算法，可以用字符的原生hashCode方法.

public class ConsistentHash<T> {
    private final int numberOfReplicas;

    private final SortedMap<Integer, T> circle = new TreeMap<>();// 存储虚拟节点的hash值到真实节点的映射

    public ConsistentHash(int numberOfReplicas, Collection<T> nodes) {
        this.numberOfReplicas = numberOfReplicas;
        for (T node : nodes) {
            add(node);
        }
    }

    /**
     * 不同的虚拟节点(i不同)有不同的hash值,但都对应同一个实际机器node
     * 虚拟node一般是均衡分布在环上的,数据存储在顺时针方向的虚拟node上
     * <p>
     * 对于一个实际机器节点 node, 对应 numberOfReplicas 个虚拟节点
     */
    public void add(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            String nodeName = node.toString() + i;
            int hashcode = nodeName.hashCode();
            circle.put(hashcode, node);

        }
    }

    public void remove(T node) {
        for (int i = 0; i < numberOfReplicas; i++) {
            circle.remove((node.toString() + i).hashCode());
        }
    }

    /**
     * 获得一个最近的顺时针节点,根据给定的key 取Hash
     * 然后再取得顺时针方向上最近的一个虚拟节点对应的实际节点
     * 再从实际节点中取得数据
     */
    public T get(Object key) {
        if (circle.isEmpty()) {
            return null;
        }

        int hash = key.hashCode();
        if (!circle.containsKey(hash)) {//数据映射在两台虚拟机器所在环之间,就需要按顺时针方向寻找节点
            SortedMap<Integer, T> tailMap = circle.tailMap(hash);
            hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
        }
        return circle.get(hash);
    }

    public long size() {
        return circle.size();
    }
}

关键是理解treeMap的tailMap(hash)方法。

使用了一致性哈希的知名系统

Couchbase automated data partitioning
OpenStack’s Object Storage Service Swift
Partitioning component of Amazon’s storage system Dynamo
Data partitioning in Apache Cassandra
Data partitioning in Voldemort
Akka’s consistent hashing router
Riak, a distributed key-value database
Gluster, a network-attached storage file system
Akamai content delivery network
Discord chat application
Maglev network load balancer
Data partitioning in Azure Cosmos DB

(完)

发表评论取消回复