A Scalable Content-Addressable Network. Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker. Pirammanayagam Manickavasagam.
Overview • Introduction • Design • Design Improvements • Design Review • Related works • Discussion
Introduction • Hash table functionality: maps a ‘key’ to a ‘value’. • Content-Addressable Network (CAN): a distributed infrastructure that provides hash-table-like functionality at Internet-like scale. • Characteristics: scalable, fault-tolerant, and completely self-organizing.
Introduction (cont.) • Napster: locating a file is centralized. • Gnutella: floods the request for a file, so it does not scale. • CAN provides a solution: • Scalable: each node maintains only a small amount of control state. • Distributed: the hash table is spread across all peers.
Design • Each node stores a chunk (a zone) of the hash table, plus information about the adjacent zones. • Requests are forwarded toward the CAN node whose zone contains the key. • Indexing uses a virtual d-dimensional Cartesian coordinate space. • The coordinates are purely logical.
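Coordinate Space • Each node randomly picks a coordinate. • The coordinate space is dynamically partitioned among the nodes. • Each node owns its own individual zone. • [Figure: 2-d coordinate space with corners (0,0), (1,0), and (0,1), partitioned into zones owned by nodes A, B, C, and D]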
Design (cont.) • Inserting a pair (key K1, value V1): • A hash function maps K1 to a point P1 in the coordinate space. • The pair is then stored at the node that owns the zone containing P1. • Retrieving a value: • The requester applies the same hash function to the key to find P1, and so the node that holds the value. • Each node learns and maintains a table of its adjacent nodes.
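The insert and retrieve steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `Can` class, `key_to_point`, and the SHA-256 based coordinate scheme are all assumptions made for the example, and the owner is found by a global scan rather than by routing.

```python
import hashlib

def key_to_point(key, d=2):
    """Hash a key to a point in the unit d-cube (hypothetical scheme, d <= 8)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use 4 bytes of the digest per coordinate, scaled into [0, 1).
    return tuple(
        int.from_bytes(digest[4 * i: 4 * i + 4], "big") / 2**32
        for i in range(d)
    )

def owns(zone, point):
    """A zone is a tuple of (low, high) intervals, one per dimension."""
    return all(lo <= x < hi for (lo, hi), x in zip(zone, point))

class Can:
    """Toy CAN: zones maps a node name to its zone (assumed to tile the space)."""

    def __init__(self, zones):
        self.zones = zones
        self.store = {name: {} for name in zones}

    def owner(self, point):
        # Global scan for brevity; a real CAN would route to the owner.
        return next(n for n, z in self.zones.items() if owns(z, point))

    def insert(self, key, value):
        self.store[self.owner(key_to_point(key))][key] = value

    def retrieve(self, key):
        return self.store[self.owner(key_to_point(key))].get(key)
```

Because insert and retrieve apply the same hash function, both arrive at the same zone without any central index.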
Routing • Information needed for routing: • Each CAN node holds a routing table containing the IP address and virtual coordinate zone of each of its neighbors. • Two nodes are neighbors if their zones overlap along d-1 dimensions and abut along one dimension. • In a d-dimensional coordinate space partitioned into equal zones, each node maintains 2d neighbors.
In the figure, nodes 5 and 1 are neighbors: 5's zone overlaps 1's along the Y axis and abuts it along the X axis.
Routing (cont.) • Each CAN message carries the destination coordinates. • A node routes by simple greedy forwarding to the neighbor closest to the destination. • Average path length = $(d/4)\,n^{1/d}$ hops, where n is the number of zones. • Because many paths exist between two points, routing survives the failure of some nodes.
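The neighbor rule can be written as a small predicate. This is a sketch under stated assumptions: zones are axis-aligned boxes given as tuples of (low, high) intervals, the function names are invented for illustration, and any wrap-around at the edges of the coordinate space is ignored.

```python
def overlaps(a, b):
    """True if intervals a and b share more than a single point."""
    return a[0] < b[1] and b[0] < a[1]

def abuts(a, b):
    """True if one interval ends exactly where the other begins."""
    return a[1] == b[0] or b[1] == a[0]

def are_neighbors(z1, z2):
    """Neighbors overlap along d-1 dimensions and abut along exactly one."""
    d = len(z1)
    n_overlap = sum(overlaps(a, b) for a, b in zip(z1, z2))
    n_abut = sum(abuts(a, b) for a, b in zip(z1, z2))
    return n_abut == 1 and n_overlap == d - 1
```

Two zones that touch only at a corner abut in two dimensions and overlap in none, so they are not neighbors under this rule.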
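Greedy forwarding can be sketched on a toy overlay. The setup is an assumption made for the example: a k x k grid of equal zones in the unit square, no wrap-around at the edges, and "closest" measured from the center of each neighbor's zone. The helper names are hypothetical.

```python
import math

def make_grid(k):
    """k x k equal zones in the unit square; node id = (column, row)."""
    return {(i, j): ((i / k, (i + 1) / k), (j / k, (j + 1) / k))
            for i in range(k) for j in range(k)}

def center(zone):
    return tuple((lo + hi) / 2 for lo, hi in zone)

def contains(zone, point):
    return all(lo <= x < hi for (lo, hi), x in zip(zone, point))

def neighbors(node, zones):
    """Grid neighbors: one step along exactly one dimension."""
    i, j = node
    candidates = [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
    return [c for c in candidates if c in zones]

def route(start, dest, zones):
    """Forward greedily until the current zone contains dest; return the path."""
    path = [start]
    node = start
    while not contains(zones[node], dest):
        # Pick the neighbor whose zone center is closest to the destination.
        node = min(neighbors(node, zones),
                   key=lambda n: math.dist(center(zones[n]), dest))
        path.append(node)
    return path
```

On this uniform grid every hop moves one cell closer to the destination, so the loop always terminates; each node needs only its own neighbor list to make the decision.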
Construction • 1. First the new node must find a node already in the CAN. • 2. Next, using the CAN routing mechanisms, it must find a node whose zone will be split. • 3. Finally, the neighbors of the split zone must be notified so that routing can include the new node.
Bootstrap • One or more bootstrap nodes are found through a DNS domain name. • A bootstrap node maintains a partial list of the CAN nodes it believes are currently in the system. • To join a CAN, a new node looks up the CAN domain name in DNS to retrieve a bootstrap node's IP address. • The bootstrap node then supplies the IP addresses of several randomly chosen nodes currently in the system.
Finding a zone • The new node randomly chooses a point P in the space. • It sends a JOIN request destined for P. • The request enters the CAN through any existing CAN node and is routed to the node whose zone contains P. • That occupant node then splits its zone in half and assigns one half to the new node. • Splits follow a fixed ordering of the dimensions. • E.g., in 2-d, a zone is split first along the X dimension and then along Y.
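The zone split on a join can be sketched as below. Several details are assumptions for illustration: each zone tracks how many times it has been split (`depths`) so the split axis cycles X, Y, X, ..., the new node always receives the upper half, and the occupant is found by a scan instead of by routing the JOIN request.

```python
def split_zone(zone, depth):
    """Halve a zone along dimension depth % d; returns (kept, given)."""
    axis = depth % len(zone)
    lo, hi = zone[axis]
    mid = (lo + hi) / 2
    kept = tuple((lo, mid) if i == axis else iv for i, iv in enumerate(zone))
    given = tuple((mid, hi) if i == axis else iv for i, iv in enumerate(zone))
    return kept, given

def join(zones, depths, new_node, point):
    """Split the zone containing point and give one half to new_node."""
    owner = next(n for n, z in zones.items()
                 if all(lo <= x < hi for (lo, hi), x in zip(z, point)))
    kept, given = split_zone(zones[owner], depths[owner])
    zones[owner], zones[new_node] = kept, given
    depths[owner] += 1
    # The new zone has been through the same number of splits as its sibling.
    depths[new_node] = depths[owner]
    return owner
```

Starting from a single node that owns the whole unit square, the first join splits along X and a second join landing in the same zone splits along Y.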
Maintenance • Departure of a Node • Single Node Failure • Multiple Failure
Departure of a Node • A departing node hands over its zone and its stored (key, value) pairs to one of its neighbors. • If the zone of one of the neighbors can be merged with the departing node’s zone to produce a valid single zone, then this is done. • If not, the zone is handed to the neighbor whose current zone is smallest, and that node temporarily handles both zones.
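The handover decision can be sketched as follows, assuming a "valid single zone" means an axis-aligned box: two zones merge only if they are identical in every dimension but one and abut along that one. The function names and the zone representation are illustrative choices, not part of CAN itself.

```python
from math import prod

def volume(zone):
    return prod(hi - lo for lo, hi in zone)

def try_merge(z1, z2):
    """Return the merged box if z1 and z2 form one valid zone, else None."""
    diff = [i for i, (a, b) in enumerate(zip(z1, z2)) if a != b]
    if len(diff) != 1:
        return None
    i = diff[0]
    (alo, ahi), (blo, bhi) = z1[i], z2[i]
    if ahi == blo:
        span = (alo, bhi)
    elif bhi == alo:
        span = (blo, ahi)
    else:
        return None
    return z1[:i] + (span,) + z1[i + 1:]

def depart(leaving, neighbors):
    """neighbors: name -> zone. Returns (taker, zones the taker now handles)."""
    for name, zone in neighbors.items():
        merged = try_merge(leaving, zone)
        if merged is not None:
            return name, [merged]
    # No valid merge: the neighbor with the smallest zone holds both for now.
    name = min(neighbors, key=lambda n: volume(neighbors[n]))
    return name, [neighbors[name], leaving]
```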
Departure of a Node (cont.) • When node F fails, E's zone is merged with F's. • [Figure: coordinate space with corners (0,0), (1,0), and (0,1), partitioned into zones A, B, C, D, E, and F]
Failures • A prolonged absence of update messages from a neighbor indicates that the node has failed. • Each neighbor of the failed node then starts a takeover timer, set in proportion to its own zone volume. • When its timer expires, a node sends a TAKEOVER message carrying its own zone volume to all of the failed node’s neighbors. • A node that receives a TAKEOVER message cancels its own timer if the zone volume in the message is smaller than its own zone volume. • Otherwise it replies with its own TAKEOVER message.
Multiple Failures • A node first performs an expanding ring search to find nodes beyond the failure region. • It then rebuilds enough neighbor state to take over safely.
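The takeover exchange can be simulated in a few lines. This sketch assumes each timer is set in proportion to the node's own zone volume, so timers fire in order of increasing volume, and it ignores message delay. `run_takeover` is a hypothetical name.

```python
def run_takeover(volumes):
    """volumes: neighbor name -> zone volume. Returns (winner, senders)."""
    # Timers fire in order of increasing zone volume (assumption).
    order = sorted(volumes, key=volumes.get)
    cancelled, sent = set(), []
    for node in order:
        if node in cancelled:
            continue
        sent.append(node)  # node's timer expired: it sends TAKEOVER
        for other, vol in volumes.items():
            # A receiver with a larger zone cancels its own timer.
            if volumes[node] < vol:
                cancelled.add(other)
    return sent[0], sent
```

With distinct volumes only the neighbor with the smallest zone ever sends a message, which is the node that ends up taking over the failed zone.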
Design Improvements • Multi-dimensional coordinate spaces • Increasing the dimensions of the CAN coordinate space reduces the routing path length, and hence the path latency. • More dimensions => more neighbors per node => more candidate next hops => better routing fault tolerance.
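The trade-off can be made concrete with the average path length $(d/4)\,n^{1/d}$ and the 2d neighbors per node that hold for equal-sized zones. The function names below are illustrative.

```python
def avg_path_length(n, d):
    """Average hops for n equal zones in d dimensions: (d/4) * n**(1/d)."""
    return (d / 4) * n ** (1 / d)

def neighbor_count(d):
    """Neighbors per node when the space is split into equal zones."""
    return 2 * d

if __name__ == "__main__":
    n = 2 ** 18
    for d in (2, 3, 6):
        print(d, neighbor_count(d), round(avg_path_length(n, d), 1))
```

For n = 2^18 zones the average path drops from about 256 hops at d = 2 to about 48 at d = 3 and about 12 at d = 6, while the neighbor state grows only from 4 to 6 to 12 entries.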
Design Improvements • Realities: multiple coordinate spaces • The system maintains multiple independent coordinate spaces, and each node is assigned a different zone in each one. Each such coordinate space is a “reality”. • The hash table is replicated in every reality, so a lookup can be forwarded in whichever reality brings it closest to the target point. • This reduces the average path length. • Multiple dimensions vs. multiple realities • For the same number of neighbors, extra dimensions shorten paths more, but multiple realities give better fault tolerance and data availability.
Design Improvements • Overloading coordinate zones • Multiple nodes may share the same zone; nodes that share a zone are termed peers. • MAXPEERS is the maximum number of peers allowed per zone. • Benefits: reduced path length (number of hops), and hence reduced path latency; improved fault tolerance. • Multiple hash functions • Similar in effect to multiple realities: k hash functions map a key to k points, and the (key, value) pair is stored at each.
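One way to picture multiple hash functions is to derive k points from a single key. The salting scheme below (prefixing the key with the function index before hashing with SHA-256) is purely an assumption for illustration, as is the name `replica_points`.

```python
import hashlib

def replica_points(key, k, d=2):
    """Map one key to k points in the unit d-cube, one per hash function."""
    points = []
    for i in range(k):
        # Hypothetical family of hash functions: salt the key with index i.
        digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
        points.append(tuple(
            int.from_bytes(digest[4 * j: 4 * j + 4], "big") / 2**32
            for j in range(d)
        ))
    return points
```

A (key, value) pair stored at all k points stays available as long as the owner of at least one of those points is reachable, and a query can go to whichever point is closest.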
Design Improvements • Topologically-sensitive construction of the CAN overlay network • Each node measures its round-trip time to each of m landmarks and orders the landmarks by increasing RTT. • With m landmarks, m! such orderings are possible. • The coordinate space is divided into m! equal portions, one per landmark ordering. • A new node joins the CAN at a random point in the portion of the coordinate space associated with its landmark ordering.
Design Improvements • More uniform partitioning • On a join, the occupant node compares the volume of its zone with those of its immediate neighbors in the coordinate space. • The zone with the largest volume is split. • Without uniform partitioning, a little over 40% of the nodes are assigned zones of volume V (the volume of a zone under a perfectly even split), compared with almost 90% with it, and the largest zone volume drops from 8V to 2V. • Not surprisingly, the partitioning of the space improves further with increasing dimensions. • Caching and replication techniques
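The landmark scheme can be sketched as follows. The mapping from an ordering to a region is a simplification assumed here: the m! portions are taken to be equal-width strips along the first axis, indexed by the lexicographic rank of the ordering. All function names are hypothetical.

```python
from itertools import permutations

def landmark_ordering(rtts):
    """Landmark indices sorted by increasing round-trip time."""
    return tuple(sorted(range(len(rtts)), key=lambda i: rtts[i]))

def portion_index(rtts):
    """Rank of this node's ordering among all m! orderings, and m! itself."""
    all_orderings = list(permutations(range(len(rtts))))
    return all_orderings.index(landmark_ordering(rtts)), len(all_orderings)

def portion_interval(rtts):
    """Strip of the first axis assigned to this ordering (assumed layout)."""
    idx, total = portion_index(rtts)
    return (idx / total, (idx + 1) / total)
```

Nodes that see the landmarks in the same RTT order land in the same portion, so nodes that are close in the underlying network tend to become neighbors in the coordinate space.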
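The volume comparison can be sketched in a few lines. The zone representation (tuples of (low, high) intervals) and the name `choose_zone_to_split` are assumptions for the example; ties go to the occupant.

```python
from math import prod

def volume(zone):
    return prod(hi - lo for lo, hi in zone)

def choose_zone_to_split(occupant, neighbors, zones):
    """Among the occupant and its neighbors, pick the largest zone."""
    candidates = [occupant] + list(neighbors)
    # max() returns the first maximum, so the occupant wins ties.
    return max(candidates, key=lambda n: volume(zones[n]))
```

Redirecting the split to the largest nearby zone is what evens out zone volumes over many joins.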
Design Review • The following metrics were used to evaluate system performance: • Path length: the number of (application-level) hops required to route between two points in the coordinate space. • Neighbor state: the number of CAN nodes for which an individual node must retain state. • Latency: both the end-to-end latency of the total routing path between two points in the coordinate space and the per-hop latency, i.e., the latency of individual application-level hops, obtained by dividing the end-to-end latency by the path length. • Volume: the volume of the zone to which a node is assigned, which indicates the request and storage load the node must handle. • Routing fault tolerance: the availability of multiple paths between two points in the CAN. • Hash table availability: adequate replication of a (key, value) entry to withstand the loss of one or more replicas.
Design Review • The key design parameters affecting system performance are: • dimensionality of the virtual coordinate space: d • number of realities: r • number of peer nodes per zone: p • number of hash functions (i.e., number of points per reality at which a (key, value) pair is stored): k • use of the RTT-weighted routing metric • use of uniform partitioning • Test system specification: • A system size of n = 2^18 nodes on a Transit-Stub topology with delays of 100ms on intra-transit links, 10ms on stub-transit links, and 1ms on intra-stub links (i.e., 100ms on links that connect two transit nodes, 10ms on links that connect a transit node to a stub node, and so forth). • Transit-stub models explicitly group vertices into domains and reflect that grouping in the connectivity between vertices.
Bare bones: CAN that does not utilize most of the additional design features. • Knobs-on-full: CAN making full use of the added features (without the landmark ordering feature).
Related Work • Related algorithms • Distance vector and link state algorithms • These need widespread topological information. • CAN, on the other hand, requires each node to store only a small amount of state. • Plaxton algorithm • Each node has an n-bit label divided into l levels. • Each level has width w = n/l bits. • Each node forwards a packet to a neighbor whose label matches the destination label in more digits.
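The digit-matching step can be sketched as below. The details are assumptions for illustration: labels are equal-length digit strings (one character per level), matching is counted from the leading digit, and the best-matching neighbor is chosen. The actual Plaxton scheme fixes its own digit order and neighbor tables.

```python
def matching_digits(a, b):
    """Number of leading digits shared by two equal-length labels."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count

def next_hop(current, dest, neighbors):
    """Pick a neighbor matching dest in more digits than current, if any."""
    here = matching_digits(current, dest)
    better = [n for n in neighbors if matching_digits(n, dest) > here]
    if not better:
        return None
    return max(better, key=lambda n: matching_digits(n, dest))
```

Each hop fixes at least one more digit of the destination label, so the number of hops is bounded by the number of levels.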
Related Work • Algorithms with geographic routing • ‘Space’ in these algorithms refers to physical space. • There is no neighbor discovery problem. • Correctly mimicking the space is a trivial problem. • They do not extend to multiple dimensions.
Related Systems • Domain Name System • Stores (domain name, IP address) pairs. • OceanStore • Provides continuous access to persistent information. • Uses Plaxton's algorithm. • Peer-to-peer file sharing systems • Freenet • Each node stores keys (analogous to URLs), the addresses of other nodes, and the data corresponding to each key.
Discussion • CAN addresses two key problems in the design of content-addressable networks: scalable routing and indexing. • Simulation results validate the scalability of the overall design: for a CAN with over 260,000 nodes, messages can be routed with a latency that is less than twice the IP path latency. • Future work • Secure CAN • Keyword searching