A Scalable Content-Addressable Network Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker Presented by Greg Nims
Introduction • The objective is to create a scalable indexing mechanism for large-scale peer-to-peer systems • Content-Addressable Networks (CANs) are presented as a scalable, fault-tolerant, and completely self-organizing peer-to-peer overlay network • Indexing is done with a distributed hash table that maps keys to values
Design • Virtual d-dimensional coordinate space on a d-torus (coordinates wrap around) • Each node owns a zone in the space • A zone is a section of the hash table • So each node stores a section of the table
Distributed Hash Table • A uniform hash function maps key K to a point P in the coordinate space • This creates a table of key-value pairs (K, V) • For any point P, the corresponding (K, V) is stored at the node N that owns the zone containing P • Entries are retrieved by using the same hash function to map K to P and fetching the entry from the node that owns the zone containing P
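The key-to-point mapping above can be sketched as follows. This is a minimal illustration, not the scheme used in the paper: SHA-256, the unit coordinate space [0, 1)^d, and the names `key_to_point` and `owner` are all assumptions made for the example.

```python
import hashlib

def key_to_point(key, d=2):
    """Hash a key to a point in the unit space [0, 1)^d (assumed layout)."""
    if d > 8:
        raise ValueError("this sketch derives at most 8 coordinates per digest")
    digest = hashlib.sha256(key.encode("utf-8")).digest()  # 32 bytes
    coords = []
    for i in range(d):
        chunk = digest[4 * i : 4 * i + 4]  # 4 bytes per coordinate
        coords.append(int.from_bytes(chunk, "big") / 2**32)
    return tuple(coords)

def owner(point, zones):
    """Return the id of the node whose zone contains the point.

    zones maps node id -> list of half-open (lo, hi) intervals, one per axis.
    """
    for node, zone in zones.items():
        if all(lo <= x < hi for x, (lo, hi) in zip(point, zone)):
            return node
    raise LookupError("no zone contains the point")
```

Storing and retrieving both call `key_to_point` with the same key, so they arrive at the same owner as long as zone ownership has not changed in between.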
Routing • Each node stores the IP address and coordinate zone of each adjoining (neighboring) node • This data makes up the node’s routing table • Greedy algorithm: if P is within the zone of the current node, return (K, V); else forward the query to the neighbor with coordinates closest to P
More Routing • Draw a straight line from a point in the local zone to P • Follow the straight line via neighbors • In a d-dimensional space, each node maintains 2d neighbors • Nodes are self-organizing, making routing decisions dynamically
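The greedy forwarding rule can be sketched as below. This is a simplified illustration under stated assumptions: zones are axis-aligned boxes in the unit torus, "closest" is measured from each neighbor's zone center, and the function names are hypothetical rather than taken from the paper.

```python
def torus_distance(a, b):
    """Euclidean distance on the unit torus: every axis wraps around at 1.0."""
    total = 0.0
    for x, y in zip(a, b):
        delta = abs(x - y)
        delta = min(delta, 1.0 - delta)  # going around the edge may be shorter
        total += delta * delta
    return total ** 0.5

def contains(zone, point):
    return all(lo <= x < hi for x, (lo, hi) in zip(point, zone))

def center(zone):
    return tuple((lo + hi) / 2 for lo, hi in zone)

def route(start, point, zones, neighbors):
    """Forward greedily until the owner of `point` is reached; return the path."""
    path = [start]
    current = start
    while not contains(zones[current], point):
        # Pick the neighbor whose zone center is closest to the target point.
        current = min(neighbors[current],
                      key=lambda n: torus_distance(center(zones[n]), point))
        if current in path:
            raise RuntimeError("greedy routing made no progress")
        path.append(current)
    return path
```

With four equal strips A, B, C, D along the x-axis, a query from A for a point in D takes one hop, because the torus wraps and D is A's neighbor on the other side.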
Node Joining the CAN • New node N1 locates a node N2 already in the CAN, typically using the IP address of a bootstrap node • N1 picks a random point P in the space • N1 sends a JOIN message for P via N2; CAN routing delivers it to the node N3 that owns the zone containing P • N3 splits its zone in half and assigns one half to N1, sending N1 that half's (K, V) pairs along with neighbor information • N3 informs its neighbors of the space reallocation
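The zone split performed by the current owner can be sketched as follows. The representation (a zone as a list of half-open intervals, a store keyed by point) and the names `split_zone` and `handle_join` are assumptions for illustration; the split axis is passed in explicitly rather than chosen by any particular ordering rule.

```python
def split_zone(zone, axis):
    """Split a zone in half along one axis; return (kept_half, given_half)."""
    lo, hi = zone[axis]
    mid = (lo + hi) / 2
    kept, given = list(zone), list(zone)
    kept[axis] = (lo, mid)
    given[axis] = (mid, hi)
    return kept, given

def handle_join(zone, store, axis=0):
    """Split the owner's zone and hand over the entries that fall in the new half.

    store maps point (tuple of coordinates) -> (key, value).
    Returns (kept_zone, kept_store, new_zone, new_store).
    """
    kept, given = split_zone(zone, axis)
    lo, hi = given[axis]
    new_store = {p: kv for p, kv in store.items() if lo <= p[axis] < hi}
    kept_store = {p: kv for p, kv in store.items() if p not in new_store}
    return kept, kept_store, given, new_store
```

After the split, the joining node would also need the subset of the old neighbor list that still borders its half; that bookkeeping is left out of this sketch.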
Node Departure • Explicit departure: the node hands its zone and (K, V) pairs to a neighbor • If the two zones can be merged into a single valid zone, they are combined • Otherwise the zone goes to the neighbor with the smallest zone, which temporarily handles both zones
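Whether two zones combine into a single valid zone can be checked as in the sketch below. It assumes zones are axis-aligned boxes given as lists of (lo, hi) intervals and ignores adjacency across the torus wrap-around; `try_merge` is a hypothetical helper name.

```python
def try_merge(a, b):
    """Return the merged zone if a and b form one box, otherwise None.

    Two boxes merge cleanly only if they are identical on every axis
    except one, and touch end to end on that axis.
    """
    differing = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(differing) != 1:
        return None
    i = differing[0]
    (alo, ahi), (blo, bhi) = a[i], b[i]
    if ahi == blo:
        span = (alo, bhi)
    elif bhi == alo:
        span = (blo, ahi)
    else:
        return None  # same shape but not adjacent
    merged = list(a)
    merged[i] = span
    return merged
```

When `try_merge` returns None for every neighbor, the fallback described above applies: the smallest neighbor holds both zones until they can be reassigned.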
Failures • Each node sends periodic update messages to each of its neighbors • Neighbors detect a crashed node by the absence of its periodic update messages • Each neighbor starts a takeover timer proportional to the volume of its own zone • When its timer expires, a neighbor sends a takeover message to all of the failed node’s neighbors • The neighboring nodes thereby agree on the node with the smallest volume • That smallest node takes over the crashed node’s zone
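The outcome of the takeover procedure can be sketched as below, assuming each neighbor's timer is proportional to its own zone volume so that the smallest zone fires first. The `base_seconds` constant, the tie-break by node id, and the function names are assumptions for the example.

```python
from math import prod

def zone_volume(zone):
    """Volume of an axis-aligned zone given as (lo, hi) intervals."""
    return prod(hi - lo for lo, hi in zone)

def takeover_delays(neighbor_zones, base_seconds=10.0):
    """Give each neighbor a timer proportional to its own zone volume."""
    return {n: base_seconds * zone_volume(z) for n, z in neighbor_zones.items()}

def takeover_winner(neighbor_zones):
    """The neighbor whose timer expires first claims the failed zone."""
    delays = takeover_delays(neighbor_zones)
    return min(delays, key=lambda n: (delays[n], n))  # ties broken by id (assumption)
```

A real implementation would exchange messages and cancel the losing timers; this sketch only computes which neighbor would win.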
Design Improvements • Multiple dimensions • Multiple realities • Multiple hash functions • Overloaded coordinate zones • Round-trip time (RTT) ratio • Topologically sensitive construction (landmarking) • Uniform partitioning
Multiple Dimensions • Increasing the number of dimensions reduces the average path length • Shorter paths in turn reduce path latency • The cost is a larger routing table, because each node has more neighbors
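The trade-off can be made concrete with the figures from the CAN paper's analysis: for a d-dimensional space divided into n equal zones, the average path length is (d/4) * n^(1/d) hops and each node keeps 2d neighbors. The sketch below simply evaluates those expressions; the function names are hypothetical, and real CANs with unequal zones will deviate from the idealized numbers.

```python
def average_path_hops(n, d):
    """Idealized average path length for n equal zones in d dimensions."""
    return (d / 4) * n ** (1 / d)

def neighbors_per_node(d):
    """Routing table size: two neighbors along each of the d axes."""
    return 2 * d
```

For n = 2**18 this gives about 256 hops with d = 2 and about 12 hops with d = 6, while the neighbor count only grows from 4 to 12.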
Multiple Realities • Maintain multiple coordinate spaces at the same time; each space is called a reality • Each node is assigned a different zone in each reality • Shorter paths, higher fault tolerance • With three realities, a (K, V) pair mapping to P at (x, y, z) may be stored at three different nodes
Dimensions v. Realities • Two improvements with greatest impact • Dimensions have a larger effect on reducing path length • Realities provide stronger fault-tolerance and data availability
Multiple Hash Functions • Multiple hash functions increase data availability and reduce query latency • Availability improves because k hash functions map a single key to k points in the coordinate space • (K, V) is unavailable only when all k nodes holding it are unavailable at the same time • Querying the k nodes in parallel can reduce lookup latency
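One simple way to obtain k hash functions is to salt a single hash with the replica index, as in the sketch below. This construction, the use of SHA-256, and the name `key_to_points` are assumptions for illustration; the slides do not say how the k functions are built.

```python
import hashlib

def key_to_points(key, k=3, d=2):
    """Map one key to k points in [0, 1)^d using k salted hashes (assumed scheme)."""
    if d > 8:
        raise ValueError("this sketch derives at most 8 coordinates per digest")
    points = []
    for i in range(k):
        # Salting with the replica index i acts as the i-th hash function.
        digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
        point = tuple(
            int.from_bytes(digest[4 * j : 4 * j + 4], "big") / 2**32
            for j in range(d)
        )
        points.append(point)
    return points
```

A writer stores (K, V) at the owners of all k points; a reader can query all k owners in parallel and use whichever reply arrives first.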
Overload Coordinate Zones • Assign more than one node to share the same zone • Reduces the average path length and improves fault tolerance • Requires no additional neighbors
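RTT Ratio • Take round-trip time (RTT) into account when routing • Each node measures the RTT to each of its neighbors • Forward to the neighbor with the best ratio of progress toward the destination to RTT, favoring lower-latency paths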
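RTT-aware next-hop selection can be sketched as follows: among the neighbors that move the query closer to the target, pick the one with the largest progress-to-RTT ratio. The candidate tuple layout and the name `pick_next_hop` are assumptions for the example.

```python
def pick_next_hop(current_dist, candidates):
    """Choose the neighbor with the best progress-per-RTT ratio.

    current_dist: distance from the current node to the target point.
    candidates: list of (node_id, distance_to_target, rtt_ms) tuples.
    Returns the chosen node id, or None if no neighbor makes progress.
    """
    best, best_ratio = None, 0.0
    for node, dist, rtt_ms in candidates:
        progress = current_dist - dist
        if progress <= 0:
            continue  # never forward away from the target
        ratio = progress / rtt_ms
        if ratio > best_ratio:
            best, best_ratio = node, ratio
    return best
```

Note that the neighbor making the most progress does not necessarily win: a slightly less direct neighbor with a much lower RTT can have the better ratio.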
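Topologically Sensitive Construction • Use a well-known set of landmark hosts to guide construction • Each node measures its RTT to each landmark • Nodes that order the landmarks the same way (by increasing RTT) join the same portion of the coordinate space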
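The landmark measurement step can be sketched as below: a node sorts the landmarks by measured RTT, and the resulting ordering selects which portion of the space it joins. With m landmarks there are m! possible orderings. The landmark names and function names are hypothetical.

```python
import math

def landmark_ordering(rtts):
    """Order landmarks by increasing RTT (name breaks ties deterministically).

    rtts maps landmark name -> measured round-trip time in milliseconds.
    """
    return tuple(sorted(rtts, key=lambda name: (rtts[name], name)))

def ordering_count(m):
    """Number of distinct orderings, and thus portions, for m landmarks."""
    return math.factorial(m)
```

Two nodes that are close in the underlying network tend to see similar RTTs, produce the same ordering, and so end up as neighbors in the coordinate space.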
Uniform Partitioning • A form of volume balancing • When a node receives a JOIN, it compares its own zone volume with those of its neighbors before accepting • The node with the largest zone accepts the JOIN and splits • Achieves a more even load balance among the nodes
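The volume comparison can be sketched as below: the node that receives the JOIN considers itself and its neighbors and redirects the split to whichever has the largest zone. The data layout and the name `choose_zone_to_split` are assumptions for illustration.

```python
from math import prod

def choose_zone_to_split(own_id, zones, neighbor_ids):
    """Return the id of the node (self or a neighbor) with the largest zone.

    zones maps node id -> list of (lo, hi) intervals, one per axis.
    On a tie, the receiving node itself is preferred because it is listed first.
    """
    candidates = [own_id] + list(neighbor_ids)
    return max(candidates, key=lambda n: prod(hi - lo for lo, hi in zones[n]))
```

Because each join splits a locally largest zone, zone volumes stay closer to uniform than when the zone owning the random point is always the one that splits.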
Design Review • Ran two simulations using 2^18 nodes • “Bare bones” CAN without improvements • “Knobs-on-full” CAN using all features except landmarks and multiple hashes • Biggest gain came from the number of dimensions (path length reduced from 198 to 5)