Entire website open-sourced at: https://github.com/grokit/website_grokit_ca/. Feel free to make pull requests if you find mistakes!

This is the famous paper by David K. et al. which introduces two distributed computing concepts: **distributed hashing** and **random trees**.

**Link to paper**: ACM website for citations, full pdf.

**Keywords**: hash function, consistent hashing, random trees, distributed systems.

**Random trees are introduced to solve the problem of 'hot spots' on the internet.**

Let's say you have P web-pages distributed on a server in C' (the set of all servers). Each page p in P maps to a server c in C'. In other words, you use this simple allocation scheme to map p to c:

```
h: P -> C'
```

... using a simple linear congruential generator:

$$X_{n+1} = \left( a X_n + c \right)~~\bmod~~m$$

This works fine if the load across all pages p in P is uniform. Unfortunately, on the Internet a minority of pages are likely to receive most of the load. Which mean that with the above allocation of direct page-to-server, most servers will sit idle while a few servers will be overloaded.

The idea behind *random trees* is to avoid 'hot-spotting' by 1.) assigning an *abstract d-ary tree* for **each page** with an owner root and 2.) picking a random leaf-to-root path in that tree for every client-side request.

Each page p in P has is associated with a tree (can be generated by a predictable random function seeded by the page name). For example, a web-page is associated with the following server tree (all nodes being servers):

```
root
/ \
a b
/ \ / \
c d e f
```

The node 'root' owns the web-page. If a client wants the web-page, it issues a request to *a random leaf node of the tree*; i.e. it picks a random node in set [c, d, e, f]. If that node does not have the page, it forwards the request to its parent, all the way to root. Since root owns the page, you are guaranteed to eventually hit a server that owns the page.

Every-time a node receives a request for a page, it increments a counter. If that counter reaches a high-enough number, then that node *caches the page p*.

This relieves the hot-spot problem since a page with low load will only be owned by root and serviced by root. If the load increases, [root, a, b] will start serving that page. If it increases even more, [root, a, b, c, d, e, f] will start serving that page. Since every page has a different random tree, the load is evenly distributed on all *leaf* servers (some root servers might have higher load, but this is greatly mitigated by the exponential number of nodes downwards that start serving the page as the load increases).

**Consistent hashing addresses the problem of partial membership view of servers.**

If you have a server farm with a constant number of servers that never went down, you can use a normal hash function in order to map a resource to a server. However, since on the Internet servers are added and removed (or go down) all the time, this is not a pragmatic approach.

If a typical hash function (see how a linear congruential generator can be used to create a hash function) is used, adding or removing one server to your server set would remap all resources (with a (C-1)/C probability), which would be disastrous. *What we are looking for is a hash function that minimally changes the membership mapping of resource to server when servers are added or removed*. We will call this the *smoothness* property.

Note that if you accept that the *views* of servers is not consistent on all servers, an item can always be mapped to more than one server and your client code has to support that. The good thing about consistent hashing, is that the number of servers to which a single resource can be assigned to is low (called the *spread* property) assuming that the inconsistency in the views is only a few servers. So let's say you have 100 servers, and each of those servers sees 95 other servers properly (with uniform probability of having bad information about a server). A web-page p will still be mapped to a small finite set of servers. If you ask every server is the cloud 'where is page p?', you will get a few different answers (let's say c7, c8, c9). If you used normal hashing, the servers will completely disagree on where the information should be (you could end-up with the complete set [c001...c100] as answers).

If a server caches and returns a page on request, then with a normal hash function all servers will end up having to hold all pages in memory (obviously bad). This is called the *load* property. Using a consistent hash function, all servers only need to know a subset of all pages p.

So, which hash functions have good 'consistent hashing properties' (smoothness, spread and load)?

One way that is easy to understand is to define a hash function that outputs a float between 0 and 1. Then servers from 1...C are spread (randomly or evenly) over that interval. The server closest to the hash output is the server who owns the resource.

Let's say you have 4 (c1, c2, ..., c4) servers and 12 resources (r1, r2, ..., r14) spread on those servers. The view is inconsistent. Some people view this:

```
r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
c1 c2 c3 c4
c1: [r1, r2, r3]
c2: [r4, r5, r6]
c3: [r7, r8, r9, r10]
c4: [r11, r12]
```

... and some people never got the heads up that c3 went online, so they view this:

```
r1 r2 r3 r4 r5 r6 r7 r8 r9 r10 r11 r12
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0
c1 c2 c4
c1: [r1, r2, r3, r4]
c2: [r5, r6, r7, r8, r9]
c4: [r10, r11, r12]
```

So people disagree on who holds r4, r7, r8, r9, r10; 5/12 keys (42%) for a disagreement of 1/4 (25%) of the servers. The theoretical disagreement rate is (R/C)/R = 1/C, which is be 25% (our example uses a silly version of consistent hashing, hence the higher disagreement rate). Using a normal hashing function the disagreement would be (C-1)/C (75%). As C increases, the advantage of consistent hashing over normal hashing will only increase.

A better solution is to have a 0...1 ring instead of a line, but the simple line is enough to get an intuitive understanding of the principle.

**Note: this is only an example, never use this as an implementation.** For more information on the actual implementation of a hash function that has consistent hashing properties, refer to the following:

- Paper from the same author that actually goes over the implementation of consistent hashing "Web Caching with Consistent Hashing ".
- 3 minutes, extremely simple video overview.
- Wikipedia entry.
- Example of ring implementation with python code.

For a great overview of hash function notation, see section 2.1 of this paper.

The basic thing to understand is that:

```
h: P x [1..C] -> C'
```

... means that for each webpage p in the set of all webpages P AND each total number of caches from 1 to C, the hash function h maps p to a set C' of servers.

The 'x [1..C]' part is a bit confusing. The basic notation A x B means for each pair of each member a of set A and b of set B. For simplicity's sake you can think of h: P x [1..C] -> C' as "each webpage is mapped to a unique set of servers C', and this mapping changes if the number of servers in C' changes".

date: ['2015-01-08']

category: ['auto']

tags: ['viewA', 'academia', 'papers', 'distributed systems']

category: ['auto']

tags: ['viewA', 'academia', 'papers', 'distributed systems']