Optimizing Data Availability and Load Balancing in Scalable Content-Addressable Networks

A Scalable Content-Addressable Network
S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker
Proceedings of ACM SIGCOMM '01
Sections: 3.5 & 3.7
Presented by: Τσώτσος Θοδωρής

3.5 Multiple hash functions (1/3)
CAN basic design
• To store a pair (K1, V1), the key K1 is mapped onto a point P in the coordinate space using a uniform hash function.
• The corresponding (key, value) pair is then stored at the node that owns the zone within which the point P lies.
Improvement (data availability)
• Use k different hash functions to map a single key onto k points in the coordinate space.
• Replicate a single (key, value) pair at k distinct nodes in the system.
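The mapping of one key onto k points can be sketched as follows. This is only an illustration: the slides do not say which hash functions are used, so the sketch assumes the k functions are obtained by salting SHA-256 with a replica index, and it assumes a unit coordinate space with `dims` dimensions. The name `key_to_points` is hypothetical.

```python
import hashlib


def key_to_points(key: str, k: int, dims: int = 2) -> list[tuple[float, ...]]:
    """Map one key onto k points in the unit coordinate space [0, 1)^dims.

    Assumption: the k "different hash functions" are simulated by salting
    SHA-256 with the replica index i and the dimension index d.
    """
    points = []
    for i in range(k):
        coords = []
        for d in range(dims):
            digest = hashlib.sha256(f"{i}:{d}:{key}".encode()).digest()
            # Keep the top 53 bits so the scaled value is exactly in [0, 1).
            top_bits = int.from_bytes(digest[:8], "big") >> 11
            coords.append(top_bits / 2**53)
        points.append(tuple(coords))
    return points


# The pair (K1, V1) would be stored at the owner of each of these points.
replica_points = key_to_points("K1", k=3)
```

Distinct points do not by themselves guarantee distinct nodes: two points may fall into the same zone, in which case fewer than k nodes hold the pair.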

3.5 Multiple hash functions (2/3)
Advantages
• A (key, value) pair is unavailable only when all k replicas are simultaneously unavailable.
• Queries for a particular hash entry can be sent to all k nodes in parallel, reducing the average query latency (Figure 7).
Disadvantages
• The size of the (key, value) database, and the query traffic in the case of parallel queries, increase by a factor of k.
Note (!!!)
• Instead of querying all k nodes, a node can choose to retrieve the entry from whichever of the k nodes is closest to it in the coordinate space.
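Choosing the closest replica instead of querying all k can be sketched as below. The sketch assumes the coordinate space wraps around in every dimension (a torus), as in the CAN design, and measures closeness with Euclidean distance between the querying node's coordinates and each replica point. The function names and sample coordinates are hypothetical.

```python
import math


def torus_distance(a, b, size=1.0):
    """Euclidean distance between two points when each axis wraps at `size`."""
    total = 0.0
    for x, y in zip(a, b):
        delta = abs(x - y)
        # Going "around the edge" may be shorter than the direct path.
        delta = min(delta, size - delta)
        total += delta * delta
    return math.sqrt(total)


def closest_replica(origin, replica_points):
    """Return the replica point nearest to the querying node's coordinates."""
    return min(replica_points, key=lambda p: torus_distance(origin, p))


# Hypothetical coordinates: the replica at (0.9, 0.9) wins because of wrap-around.
target = closest_replica((0.05, 0.05), [(0.9, 0.9), (0.5, 0.5), (0.3, 0.1)])
```

This trades the latency benefit of parallel queries for k times less query traffic.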

3.5 Multiple hash functions (3/3)

3.7 More Uniform Partitioning (1/7)
CAN basic design
• When a new node joins the system, a JOIN message is sent to the owner of a random point in the space.
• The current occupant node then splits its zone in half and assigns one half to the new node.
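The basic split can be illustrated with a minimal sketch. It assumes zones are axis-aligned boxes described by their lower and upper corners, and it leaves the choice of split dimension to the caller, since the slides do not say how that dimension is picked. `Zone` and `split_in_half` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Zone:
    lo: tuple[float, ...]  # lower corner (inclusive)
    hi: tuple[float, ...]  # upper corner (exclusive)

    def volume(self) -> float:
        v = 1.0
        for a, b in zip(self.lo, self.hi):
            v *= b - a
        return v

    def contains(self, point) -> bool:
        return all(a <= x < b for a, x, b in zip(self.lo, point, self.hi))


def split_in_half(zone: Zone, dim: int) -> tuple[Zone, Zone]:
    """Split a zone at its midpoint along dimension `dim`."""
    mid = (zone.lo[dim] + zone.hi[dim]) / 2
    lower_hi = zone.hi[:dim] + (mid,) + zone.hi[dim + 1:]
    upper_lo = zone.lo[:dim] + (mid,) + zone.lo[dim + 1:]
    return Zone(zone.lo, lower_hi), Zone(upper_lo, zone.hi)


# The occupant keeps one half and hands the other to the joining node.
whole = Zone((0.0, 0.0), (1.0, 1.0))
kept, given = split_in_half(whole, dim=0)
```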

3.7 More Uniform Partitioning (2/7)
Improvement
• The node that owns the random point in the space knows not only its own zone coordinates but also those of its neighbors.
• Instead of directly splitting its own zone, it first compares the volume of its zone with the volumes of its neighbors' zones in the coordinate space.
• The zone that is split to accommodate the new node is the one with the largest volume.
Advantages
• This volume balancing check tries to achieve a more uniform partitioning of the space over all the nodes.
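The volume balancing check amounts to taking a maximum over the owner and its neighbors. The sketch below assumes each node is identified by a string and that zone volumes and neighbor lists are available as plain dictionaries. The tie-breaking rule (prefer the owner) is an assumption of this sketch and is not stated in the slides.

```python
def choose_node_to_split(owner: str,
                         volumes: dict[str, float],
                         neighbors: dict[str, list[str]]) -> str:
    """Pick the node whose zone is split to accommodate a joining node.

    Candidates are the owner of the random point and its neighbors; the one
    with the largest zone volume is chosen. max() returns the first maximal
    element, so listing the owner first makes ties favor the owner.
    """
    candidates = [owner] + neighbors[owner]
    return max(candidates, key=lambda node: volumes[node])


# Hypothetical example: the JOIN lands on A, but neighbor B owns a larger zone.
volumes = {"A": 0.25, "B": 0.5, "C": 0.125, "D": 0.125}
neighbors = {"A": ["B", "C"]}
chosen = choose_node_to_split("A", volumes, neighbors)
```

Only local information is needed: the owner already knows its neighbors' zone coordinates, so no extra lookup is required to make this choice.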

3.7 More Uniform Partitioning (3/7)
Why is uniform partitioning desirable?
• (Key, value) pairs are spread across the coordinate space using a uniform hash function.
• Thus the volume of a node's zone is indicative of the size of the (key, value) database the node will have to store, and hence of the load placed on the node.
• A uniform partitioning is therefore desirable to achieve load balancing.
• It is not sufficient for true load balancing, because some (key, value) pairs will be more popular than others, so the nodes hosting those pairs will have a higher load.

3.7 More Uniform Partitioning (4/7)
Consider:
• V_T: the total volume of the entire coordinate space
• n: the total number of nodes in the system
• A perfect partitioning of the space among the n nodes would assign a zone of volume V_T/n to each node. (We use V to denote V_T/n.)

3.7 More Uniform Partitioning (5/7)
Experiment (Figure 9)
• The authors ran simulations with 2^16 nodes, both with and without the uniform partitioning feature.
• They plot:
  • on the X axis, the different possible zone volumes in terms of V;
  • on the Y axis, the percentage of the total number of nodes that were assigned zones of a particular volume.
Results
• Without the uniform partitioning feature, a little over 40% of the nodes are assigned zones of volume V.
• With the uniform partitioning feature, almost 90% of the nodes are assigned zones of volume V.
• The largest zone volume is 8V without uniform partitioning, compared with 2V with it.
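The quantities on the two axes can be computed from a list of zone volumes as in the sketch below. The input volumes are made up purely to show the calculation and are not the paper's data; `volume_distribution` is a hypothetical helper.

```python
from collections import Counter


def volume_distribution(zone_volumes, total_volume=1.0):
    """Return {volume in units of V: percentage of nodes with that volume}.

    V is the ideal per-node volume, total_volume / n.
    """
    n = len(zone_volumes)
    ideal = total_volume / n  # V
    # Round the ratios so equal volumes land in the same bucket.
    counts = Counter(round(v / ideal, 6) for v in zone_volumes)
    return {ratio: 100.0 * c / n for ratio, c in sorted(counts.items())}


# Made-up volumes for 4 nodes sharing a unit space (V = 0.25).
dist = volume_distribution([0.5, 0.25, 0.125, 0.125])
# dist == {0.5: 50.0, 1.0: 25.0, 2.0: 25.0}
```

A perfectly uniform partitioning would put 100% of the nodes at a ratio of 1.0.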

3.7 More Uniform Partitioning (6/7)

3.7 More Uniform Partitioning (7/7)
Note (!!!)
• The partitioning of the space improves further as the number of dimensions increases.

The End
