Structured P2P Networks Guo Shuqiao Yao Zhen Rakesh Kumar Gupta CS6203 Advanced Topics in Database Systems
Introduction-P2P Network • A peer-to-peer (P2P) network is a distributed system in which peers employ distributed resources to perform a critical function in a decentralized fashion [LW2004] • Classification of P2P networks • Unstructured and Structured • Centralized and Decentralized • Hierarchical and Non-Hierarchical
Structured P2P network • Distributed hash table (DHT) • A DHT is a structured overlay that offers extreme scalability and a hash-table-like lookup interface • CAN, Chord, Pastry • Other techniques • Skip list • Skip graph, SkipNet
Outline • Hash-based techniques in P2P • Hash-based structured P2P systems • Pastry • P-Grid • Two important issues • Load balancing • Neighbor-table consistency preserving • Comparison of DHT techniques • Skip-list based systems • SkipNet • Conclusion
Pastry[RD2001] • Pastry is a P2P object location and routing scheme • Hash-based • Properties • Completely decentralized • Scalable • Self-organized • Fault-resilient • Efficient search
Design of Pastry • nodeID: each node has a unique numeric identifier (128 bits) • Assigned randomly • Nodes with adjacent nodeIDs are diverse in geography, ownership, etc. • Assumption: nodeIDs are uniform in the ID space • Represented as a sequence of base-2^b digits • b is a configuration parameter (typically 4)
Design of Pastry (cont’) • Each message/query has a numeric key of the same length as nodeIDs • The key is represented as a sequence of base-2^b digits • Route: a message is routed to the node whose nodeID is numerically closest to the key
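As a small illustration (our own sketch, not code from the paper), an ID can be split into base-2^b digits; with b = 2, each digit is one of 0-3:

```python
def to_digits(value, b, num_digits):
    """Split an integer ID into num_digits base-2**b digits, most significant first."""
    base = 2 ** b
    digits = []
    for _ in range(num_digits):
        digits.append(value % base)
        value //= base
    return digits[::-1]

# With b = 2, a 16-bit ID yields 8 digits in {0, 1, 2, 3}.
print(to_digits(0b1001110011010010, b=2, num_digits=8))  # [2, 1, 3, 0, 3, 1, 0, 2]
```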
Destination of Routing [Figure: a message with key 10 travels among nodes 20, 31, 23, 03 and 12; node 12, numerically closest to the key, is the destination node]
Pastry Schema • Given a message with key k, node A forwards it to the node whose nodeID is numerically closest to k among all nodes known to A • Each node maintains some routing state
Pastry Node State • A leaf set L • A routing table • A neighborhood set M [Figure: the full state of the node with nodeID 10233102 — its leaf set, routing table and neighborhood set — examined in detail on the following slides]
Meanings of ‘Close’ • Closest according to the proximity metric (real distance): the nearest neighbor • Closest according to numerical meaning: the node with the closest nodeID [Figure: among nodes 20, 31, 23, 03 and 12, the nearest neighbor by proximity can differ from the node with the numerically closest nodeID]
Pastry Node State • A leaf set • |L| nodes with closest nodeIDs • |L|/2 larger ones and |L|/2 smaller ones • Useful in message routing • A neighborhood set • |M| nearest neighbors • Useful in maintaining locality properties
Leaf Set and Neighborhood Set • In this example b = 2, l = 8 • |L| = 2 × 2^b = 8: the 4 numerically smaller and 4 numerically larger closest nodeIDs • |M| = 2 × 2^b = 8 [Figure: the leaf set and neighborhood set of node 10233102]
Routing Table • l rows and 2^b columns • Row i: entries whose nodeIDs share a prefix of length i with the local nodeID • Column j: the next digit after the shared prefix is j • b = 2, l = 8 → 8 rows and 4 columns [Figure: the routing table of node 10233102; e.g. row 2 holds 10-0-31203 (j = 0), 10-1-32102 (j = 1) and 10-3-23302 (j = 3)]
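A sketch (our own helper names, with IDs as digit strings) of how a node locates the routing-table entry for a key:

```python
def shared_prefix_len(a, b):
    """Number of leading digits two equal-length IDs have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def routing_table_entry(my_id, key):
    """Return (row, column) of the entry used to route key: the row is the
    shared-prefix length, the column is the key's next digit."""
    row = shared_prefix_len(my_id, key)
    return row, key[row]

# Node 10233102 routes key 10223220 via row 3, column 2 (entry 102-2-2302).
print(routing_table_entry("10233102", "10223220"))  # (3, '2')
```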
Routing • Step 1: If k falls within the range of nodeIDs covered by A’s leaf set, forward it to the node in the leaf set whose nodeID is closest to k • E.g. k = 10233022 falls in the range (10233000, 10233232), so forward it to node 10233021 • If k is not covered by the leaf set, go to Step 2
Routing • Step 2: Use the routing table and forward the message to a node whose nodeID shares a longer prefix with k than A’s nodeID does • E.g. k = 10223220: forward it to node 10222302 (entry 102-2-2302) • If the appropriate routing-table entry is empty, go to Step 3
Routing • Step 3: Forward the message to a node in the leaf set whose nodeID shares a prefix with k at least as long as A’s but is numerically closer to k than A • E.g. k = 10233320: forward it to node 10233232 • If no such node exists, A itself is the destination node
Routing • The routing procedure always converges, since each step chooses a node that either • Shares a longer prefix with the key, or • Shares the same prefix length but is numerically closer • Routing performance • The expected number of routing steps is log_{2^b} N • Assumption: accurate routing tables and no recent node failures
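The three routing steps above can be sketched as follows (a simplified illustration of ours, not the paper's code; IDs are base-4 digit strings and the routing table is assumed to be a dict keyed by (row, digit)):

```python
def shared_prefix(a, b):
    """Length of the common leading-digit prefix of two equal-length IDs."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(my_id, key, leaf_set, routing_table, base=4):
    """Pick the next hop for key at node my_id, following Pastry's three steps."""
    val = lambda s: int(s, base)
    k = val(key)
    # Step 1: key within the leaf-set range -> numerically closest known node.
    if leaf_set and min(map(val, leaf_set)) <= k <= max(map(val, leaf_set)):
        return min(leaf_set + [my_id], key=lambda i: abs(val(i) - k))
    # Step 2: routing-table entry sharing a longer prefix with the key.
    row = shared_prefix(my_id, key)
    entry = routing_table.get((row, key[row]))
    if entry is not None:
        return entry
    # Step 3 (rare case): a known node with an equally long prefix that is
    # numerically closer to the key; if none exists, my_id is the destination.
    better = [i for i in leaf_set
              if shared_prefix(i, key) >= row and abs(val(i) - k) < abs(val(my_id) - k)]
    return min(better, key=lambda i: abs(val(i) - k)) if better else my_id

# The leaf set of node 10233102 from the example slides.
leaf = ["10233000", "10233001", "10233021", "10233033",
        "10233120", "10233122", "10233230", "10233232"]
print(next_hop("10233102", "10233022", leaf, {}))  # '10233021' (Step 1)
```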
Performance: average number of routing hops versus number of Pastry nodes (b = 4, |L| = 16, |M| = 32, 200,000 lookups)
Discussion of Pastry • The parameters make Pastry flexible • b is the most important parameter: it determines the power of the system • Trade-off between routing efficiency (log_{2^b} N hops) and routing-table size (log_{2^b} N rows of up to 2^b − 1 entries) • Each node can choose its own |L| and |M| based on its situation
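The trade-off can be made concrete with a quick back-of-the-envelope calculation (our own illustration; each routing-table row has up to 2^b − 1 filled entries, since the row's own digit needs no entry):

```python
import math

def pastry_costs(n, b):
    """(expected routing hops, routing-table entries) for n nodes and parameter b."""
    rows = math.ceil(math.log(n, 2 ** b))   # log_{2^b}(N) hops, one per row
    return rows, rows * (2 ** b - 1)        # up to 2^b - 1 entries per row

# Larger b: fewer hops, but bigger tables.
for b in (1, 2, 4):
    print(b, pastry_costs(10 ** 6, b))
```

For N = 10^6, b = 1 gives 20 hops with 20 entries, while b = 4 gives 5 hops with 75 entries.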
Discussion of Pastry – routing schema • Is routing locally optimal? Not always • E.g. at node A = 10233102 with k = 10233200: X’s nodeID = 10233232, Y’s nodeID = 10233133 • Dis(k, X’s ID) = Dis(10233200, 10233232) = 32; Dis(k, Y’s ID) = Dis(10233200, 10233133) = 1 • The locally optimal node is Y, yet Pastry forwards to node X, which shares the longer prefix
P-Grid [Aberer2001] • P-Grid is a scalable access structure for P2P systems • Hash-based, built on a virtual binary search tree • Randomized algorithms are used for constructing the access structure [Figure: a virtual binary tree with paths 00, 01, 10, 11 partitioned among six peers; each peer stores one reference per level into the opposite subtree (e.g. peer 1, at path 00, stores 1:3 and 01:2); a query for k = 100 is answered by peer 4]
P-Grid (cont’) • Properties • Completely decentralized • Scalable with the total number of nodes and data items • Fault-resilient: search is robust against node failures • Efficient search
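A minimal sketch (our own data structures, following the P-Grid idea and the slide's figure) of prefix routing on the virtual binary trie: each peer keeps, for level i, a reference to some peer whose path differs at bit i, and a query is forwarded across the trie at the first mismatching level:

```python
def pgrid_route(peers, start, query):
    """Route query (a bit string at least as long as any path) from peer `start`;
    peers map an ID to its trie path and its per-level references."""
    current, hops = start, [start]
    while True:
        path = peers[current]["path"]
        i = 0
        while i < len(path) and path[i] == query[i]:
            i += 1
        if i == len(path):                      # query lies under this peer's path
            return current, hops
        current = peers[current]["refs"][i]     # jump into the opposite subtree
        hops.append(current)

# The six peers from the slide's figure, with the references shown there.
peers = {
    1: {"path": "00", "refs": {0: 3, 1: 2}},
    2: {"path": "01", "refs": {0: 4, 1: 6}},
    3: {"path": "10", "refs": {0: 2, 1: 5}},
    4: {"path": "10", "refs": {0: 6, 1: 5}},
    5: {"path": "11", "refs": {0: 6, 1: 4}},
    6: {"path": "00", "refs": {0: 5, 1: 2}},
}
print(pgrid_route(peers, 5, "100"))  # (4, [5, 4]): peer 4 answers k = 100
```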
Discussion of Pastry and P-Grid • Both systems make uniformity assumptions • Pastry: uniform ID space • P-Grid: uniform data distribution and peer behavior • If the data/message/query distribution is skewed, Pastry and P-Grid are unable to balance the load
Outline • Hash-based techniques in P2P • Hash-based structured P2P systems • Pastry • P-Grid • Two important issues • Load balancing • Neighbor-table consistency preserving • Comparison of DHT techniques • Skip-list based systems • SkipNet • Conclusion
Load Balancing • Consider a DHT P2P system with N nodes • Θ(log N) imbalance factor even if item IDs are uniformly distributed [SMKKB2001] • Even worse if applications associate semantics with the item IDs • IDs would no longer be uniformly distributed • How to • Minimize the load imbalance? • Minimize the amount of load moved?
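A seeded toy simulation (our own illustration, not from the cited papers) shows the imbalance: with N nodes and N uniformly hashed items on a ring, the most loaded node holds noticeably more than the average of one item:

```python
import bisect
import random

def chord_loads(n_nodes, n_items, seed=1):
    """Nodes and items hashed uniformly onto a ring; each item goes to the
    first node clockwise from its position (Chord-style successor mapping)."""
    rng = random.Random(seed)
    space = 2 ** 32
    nodes = sorted(rng.randrange(space) for _ in range(n_nodes))
    counts = [0] * n_nodes
    for _ in range(n_items):
        idx = bisect.bisect_left(nodes, rng.randrange(space)) % n_nodes
        counts[idx] += 1
    return counts

loads = chord_loads(1024, 1024)
print(max(loads), sum(loads) / len(loads))  # max load is well above the mean of 1.0
```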
Load Balancing • Challenges • Data items are continuously inserted/deleted • Nodes join and depart continuously • The distribution of data-item IDs and item sizes can be skewed • Solution: [GLSKS2004]
Load Balancing • Virtual server • Represents a peer in the DHT, rather than a physical node • A physical node hosts one or more virtual servers • Total load of its virtual servers = load of the node [Figure: a Chord ring with IDs 0-7; one physical node hosts the virtual servers at positions 1 and 3, each with its own finger table (FT1, FT3)]
Load Balancing • Basic idea • Directories store load information of the peer nodes and periodically schedule reassignments of virtual servers • The distributed load-balancing problem is thereby reduced to a centralized problem at each directory
Load Balancing • Load-balancing algorithm (directory IDs are known to all nodes) • Node side: randomly choose a directory and send it (1) the loads of all virtual servers the node is responsible for and (2) the node’s capacity • This happens either periodically, after a delay of T, or immediately when utilization exceeds Ke (emergency load balancing) • Directory side: receive this information from the nodes contacting it and compute a schedule of virtual-server transfers among them that reduces their maximal utilization
Load Balancing • Load-balancing algorithm (cont.) • Computing an optimal reassignment is NP-complete • A greedy O(m log m) algorithm is used instead • For each heavily loaded node, move its least-loaded virtual server to a pool • For each virtual server in the pool, from heaviest to lightest, assign it to the node n that minimizes the resulting load
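The greedy reassignment can be sketched as follows (a simplified version of ours; the node structure and `threshold` parameter are assumptions, not the paper's exact formulation):

```python
def greedy_rebalance(nodes, threshold=1.0):
    """nodes: {name: {"cap": capacity, "vs": [virtual-server loads]}}.
    Pool the lightest virtual servers off overloaded nodes, then place pooled
    servers, heaviest first, where the resulting utilization is smallest."""
    pool = []
    for info in nodes.values():
        # Shed the least-loaded virtual server while the node is overloaded.
        while info["vs"] and sum(info["vs"]) / info["cap"] > threshold:
            lightest = min(info["vs"])
            info["vs"].remove(lightest)
            pool.append(lightest)
    for load in sorted(pool, reverse=True):
        # Assign to the node minimizing the resulting utilization.
        target = min(nodes, key=lambda n: (sum(nodes[n]["vs"]) + load) / nodes[n]["cap"])
        nodes[target]["vs"].append(load)
    return nodes

demo = {"a": {"cap": 10, "vs": [9, 4]}, "b": {"cap": 10, "vs": [1]}}
print(greedy_rebalance(demo))  # the load-4 server moves from a to b
```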
Load Balancing • Performance • Trade-off: load movement vs. load balance (measured as maximum node utilization) • When T decreases • Max node utilization decreases • Load movement increases • Effective in achieving load balance for system utilization as high as 90%, transferring only 8% of the load that arrives in the system • Emergency load balancing is necessary
Consistency Preserving • Neighbor table • A table of neighbor pointers • For efficient routing in a P2P system • Challenge • How to maintain consistent neighbor tables in a dynamic network where nodes may join, leave and fail concurrently and frequently?
Consistency Preserving • Consistent network • For every entry in the neighbor tables: if there exists at least one qualified node in the network, the entry stores at least one qualified node; otherwise, the entry is empty • Qualified node for an entry of a node’s neighbor table: a node whose ID has the suffix required by that entry
Consistency Preserving • K-consistent network • For every entry in neighbor tables, if there exist H qualified nodes in the network, then the entry stores at least min(K,H) qualified nodes • Otherwise, the entry is empty • For K>0, K-consistency => consistency • 1-consistency = consistency
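The min(K, H) condition can be stated as a small predicate (our own formulation of the definition above, with assumed argument names):

```python
def entry_is_k_consistent(stored, qualified, k):
    """Check one neighbor-table entry: with H qualified nodes present in the
    network, the entry must store at least min(K, H) of them; with H = 0,
    the entry must be empty."""
    h = len(qualified)
    if h == 0:
        return len(stored) == 0
    return sum(1 for s in stored if s in qualified) >= min(k, h)

print(entry_is_k_consistent(["a", "b"], {"a", "b", "c"}, k=2))  # True
print(entry_is_k_consistent([], {"a"}, k=1))                    # False
```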
Consistency Preserving • General strategy • Identify a consistent subnet as large as possible • Only replace a neighbor with a closer one if both of them belong to the subnet • Expand the consistent subnet after new nodes join • Maintain consistency of the subnet when nodes fail
Consistency Preserving • Approach of [LL2004b] • Design a join protocol such that • An initially K-consistent network remains K-consistent after a set of join processes terminate • Termination of a join implies the joined node belongs to the consistent subnet • Design a failure-recovery protocol that • Recovers K-consistency of the subnet by repairing the holes left by failed neighbors with qualified nodes from the subnet • The recovery protocol is presented in [LL2004a], but is integrated with join in this paper’s experiments
Consistency Preserving • Join protocol • Each node has a status • copying, waiting, notifying, cset_waiting, in_system • S-node: node in status in_system • T-node: otherwise • All S-nodes form a consistent subnet
Consistency Preserving • Join protocol of node x, status by status • copying: copy neighbor info from S-nodes to fill most entries of x’s table, level by level • waiting: entered when no qualified S-node can be found for a level i ≥ 1; try to find an S-node that shares at least the rightmost i−1 digits with x and stores x as a neighbor • notifying: entered when such a node y is found; seek and notify nodes that share the rightmost j digits with x, where j is the lowest level at which x is stored in y’s table • cset_waiting: entered when notifying finishes; wait for the nodes that are joining concurrently and are likely to be in the same consistent subnet • in_system: entered once all such nodes are confirmed to have exited the notifying status
Consistency Preserving • Performance • p-ratio: if the primary neighbor of an entry in x’s table is y while the true primary neighbor is z, then p-ratio = delay(x, y) / delay(x, z) • K-consistency is maintained in all experiments • As K increases, the p-ratio decreases • More neighbor info is stored => more messages • Even with massive joins and failures, the tables are still greatly optimized
Outline • Hash-based techniques in P2P • Hash-based structured P2P systems • Pastry • P-Grid • Two important issues • Load balancing • Neighbor-table consistency preserving • Comparison of DHT techniques • Skip-list based systems • SkipNet • Conclusion
Comparing DHTs [DGPR2003] • Each DHT algorithm has many details, making direct comparison difficult; we use a component-based analysis approach • Break the DHT design into independent components • Analyze the impact of each component choice separately • Two types of components • Routing-level: neighbor & route selection • System-level: caching, replication, querying policy, latency
Metrics Used • Metrics used in the comparison • Flexibility – options in choosing neighbors and routes • Resilience – does routing still work when nodes go down? • Load balancing – is the content evenly distributed? • Proximity & latency – is the content stored nearby? • Aspects of a DHT • Geometry – the structure that inspires a DHT design • Distance function – the distance between two nodes • Algorithm – the rules for selecting neighbors and routes using the distance function
Algorithm & Geometry • What are the routing algorithm and the geometry? • Routing algorithm – the exact rules for selecting neighbors and routes (e.g. Chord, CAN, PRR, Tapestry, Pastry) • Geometry – the algorithm’s underlying structure, derived from the way neighbors and routes are chosen (e.g. Chord routes on a ring) • Why is geometry important? Geometry captures the flexibility in the selection of neighbors and routes • Neighbor selection – choosing neighbors based on proximity leads to shorter paths • Route selection – more options for the next hop lead to shorter, more reliable paths
DHT Algorithms Analysis • The table summarizes the geometries & algorithms • We will examine the flexibility metric in these two aspects • Flexibility in neighbor selection • Flexibility in route selection [Figure: ring, hypercube and tree geometries]
Tree Geometry • PRR uses the tree geometry • The distance between two nodes is the height of the smallest common subtree containing them (log N in a well-balanced tree) • Neighbor-selection flexibility: 2^(i−1) options for choosing the neighbor at distance i • No route-selection flexibility [Figure: a binary tree of height 2 with leaf set 00, 01, 10, 11]
Hypercube Geometry • CAN uses a d-torus, which for d = log n behaves like a hypercube • Each node has log n neighbors • Routing proceeds greedily by correcting the differing bits in any order • Neighbors differ by exactly one bit, so there is no flexibility in choosing neighbors • Routing from a source to a destination log n bits away: the first hop has log n choices, the second hop (log n) − 1 choices, hence (log n)! possible routes [Figure: a 3-cube with nodes 000 through 111]
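A small sketch (our own) of greedy bit-correction on the hypercube: any differing bit may be fixed at each hop, so a source/destination pair differing in d bits has d! distinct shortest routes:

```python
from itertools import permutations

def greedy_routes(src, dst):
    """All shortest hypercube routes from src to dst (ints), as node lists."""
    xor = src ^ dst
    diff = [i for i in range(xor.bit_length()) if xor >> i & 1]
    routes = []
    for order in permutations(diff):    # every order of correcting the bits
        node, path = src, [src]
        for bit in order:
            node ^= 1 << bit            # correct one differing bit per hop
            path.append(node)
        routes.append(path)
    return routes

print(len(greedy_routes(0b000, 0b111)))  # 3 differing bits -> 3! = 6 routes
```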
Butterfly Geometry • Viceroy uses the butterfly geometry • Nodes are organized in a series of log n “stages”, where the nodes at stage i are capable of correcting the ith bit • Routing consists of 3 phases and takes O(log n) hops • No flexibility in route selection or neighbor selection
Ring Geometry • Chord uses the ring • Each node maintains log n neighbors and routes to an arbitrary destination in O(log n) hops • Flexibility in neighbor selection: 2^(i−1) possible options for the ith neighbor, giving approximately n^(log n / 2) possible routing tables per node • Yields (log n)! possible routes from a source to a destination at distance log n [Figure: a ring of nodes 0 through 7]
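The neighbor-selection flexibility can be illustrated directly (our own sketch): the ith finger of node n may point anywhere in the span [n + 2^(i−1), n + 2^i), i.e. 2^(i−1) candidate positions:

```python
def finger_candidates(node, i, ring_bits):
    """ID positions on a 2**ring_bits ring eligible to serve as node's ith
    finger (1-indexed), i.e. the span [node + 2^(i-1), node + 2^i)."""
    size = 2 ** ring_bits
    return [(node + d) % size for d in range(2 ** (i - 1), 2 ** i)]

# On an 8-ID ring, node 0's 3rd finger has 2^(3-1) = 4 candidate positions.
print(finger_candidates(0, 3, 3))  # [4, 5, 6, 7]
```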