Pastry Peter Druschel, Rice University Antony Rowstron, Microsoft Research UK Some slides are borrowed from the original presentation by the authors
Outline • Background • Pastry • Pastry proximity routing • PAST • SCRIBE • Conclusions
Common issues • Organize, maintain overlay network • Resource allocation / load balancing • Object / resource location • Network proximity routing. Pastry provides a generic p2p substrate for these.
Architecture • P2P application layer: event notification, network storage, ... • P2P substrate (self-organizing overlay network): Pastry • TCP/IP (Internet) [Figure: layered architecture diagram]
Pastry: Object distribution • Consistent hashing [Karger et al. '97] • 128-bit circular id space (0 ... 2^128 - 1) • nodeIds: uniform random • objIds: uniform random • Invariant: the node with the numerically closest nodeId maintains the object (recall Chord) [Figure: objId and nodeIds on the id circle]
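A minimal sketch (not from the slides) of the id assignment and the "numerically closest nodeId" invariant; deriving ids from SHA-256 and holding all nodeIds in a plain Python list are assumptions of this sketch.

import hashlib

ID_BITS = 128
ID_SPACE = 1 << ID_BITS            # ids live on a circle of size 2^128

def make_id(key: str) -> int:
    # Derive a 128-bit id from an arbitrary key (e.g. a node's address or a file name).
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:16], "big")

def circular_distance(a: int, b: int) -> int:
    # Numerical distance between two ids on the circular id space.
    d = abs(a - b)
    return min(d, ID_SPACE - d)

def closest_node(obj_id: int, node_ids: list[int]) -> int:
    # Invariant: the node with the numerically closest nodeId maintains the object.
    return min(node_ids, key=lambda n: circular_distance(n, obj_id))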
Pastry: Routing table for node 65a1fc (b=4) • about log_16 N populated rows, one column per hex digit • plus the leaf set [Figure: routing table contents]
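A sketch of the routing-table shape this figure describes; the digit and prefix helpers below are assumptions, reused by the later sketches.

B = 4                              # digits are base 2^B = 16 (hex), as in the slides
DIGITS = 128 // B                  # a 128-bit id has 32 hex digits

def digit(node_id: int, i: int) -> int:
    # i-th base-16 digit of a 128-bit id, counting from the most significant end.
    shift = (DIGITS - 1 - i) * B
    return (node_id >> shift) & ((1 << B) - 1)

def shared_prefix_len(a: int, b: int) -> int:
    # Number of leading hex digits two ids have in common.
    l = 0
    while l < DIGITS and digit(a, l) == digit(b, l):
        l += 1
    return l

# routing_table[l][d]: some node whose id shares the first l digits with the local id
# and whose digit at position l is d; None if no such node is known. Only about
# log_16 N rows end up populated in a network of N nodes.
routing_table = [[None] * (1 << B) for _ in range(DIGITS)]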
Pastry: Leaf sets • Each node maintains IP addresses of the nodes with the L/2 numerically closest larger and smaller nodeIds, respectively. • routing efficiency/robustness • fault detection (keep-alive) • application-specific local coordination
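A small sketch of leaf-set maintenance under these rules; the value of L and the helper name are assumptions, and wrap-around on the circular id space is ignored for brevity.

L = 16   # leaf set size (assumed for this sketch)

def update_leaf_set(local_id: int, known_ids: set[int]) -> list[int]:
    # Keep the L/2 numerically closest nodeIds below and above the local id.
    smaller = sorted(i for i in known_ids if i < local_id)[-L // 2:]
    larger = sorted(i for i in known_ids if i > local_id)[:L // 2]
    return smaller + larger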
Pastry: Routing • Properties: prefix routing, log_16 N steps, O(log N) state per node [Figure: example Route(d46a1c) from node 65a1fc on the nodeId circle; each hop shares a longer prefix with the key; nodes shown: d13da3, d4213f, d462ba, d467c4, d471f1]
Pastry: Routing procedure
if destination D is within the range of the leaf set then
    forward to the numerically closest leaf set member
else
    let l = length of the prefix shared with D
    let d = value of the l-th digit of D's address
    if R[l][d] exists (the routing table entry at row l, column d) then
        forward to R[l][d]
    else  (rare case)
        forward to a known node that
            (a) shares at least as long a prefix with D, and
            (b) is numerically closer to D than this node
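A hedged Python rendering of the same procedure, reusing digit and shared_prefix_len from the routing-table sketch above; the data representations are assumptions of this sketch, not the authors' implementation.

def route(key: int, local_id: int, leaf_set: list[int], table) -> int | None:
    # Returns the nodeId of the next hop, or None if this node should deliver the message.
    # Case 1: the key falls within the leaf set range -> numerically closest node wins.
    if leaf_set and min(leaf_set) <= key <= max(leaf_set):
        closest = min(leaf_set + [local_id], key=lambda n: abs(n - key))
        return None if closest == local_id else closest

    # Case 2: use the routing table entry that extends the shared prefix by one digit.
    l = shared_prefix_len(local_id, key)
    if l == DIGITS:
        return None                    # the key equals the local id
    d = digit(key, l)
    if table[l][d] is not None:
        return table[l][d]

    # Rare case: forward to any known node that shares at least as long a prefix
    # with the key and is numerically closer to it than the local node.
    known = [n for row in table for n in row if n is not None] + leaf_set
    for n in known:
        if shared_prefix_len(n, key) >= l and abs(n - key) < abs(local_id - key):
            return n
    return None                        # no better node is known; deliver locally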
Pastry: Performance • Integrity of overlay / message delivery: guaranteed unless L/2 nodes with adjacent nodeIds fail simultaneously • Number of routing hops, no failures: < log_16 N expected, 128/b + 1 max • During failure recovery: O(N) worst case (a loose upper bound); the average case is much better
Self-organization How are the routing tables and leaf sets initialized and maintained? • Node addition • Node departure (failure)
Pastry: Node addition • The new node X (nodeId d46a1c) asks node 65a1fc to route a message to its own nodeId, Route(d46a1c) • Nodes along the route share their routing tables with X [Figure: join route on the nodeId circle; nodes shown: d13da3, d4213f, d462ba, d467c4, d471f1]
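A sketch of how X might seed its routing table from the tables collected along the join route (row i taken from the i-th node); the function and argument names are assumptions.

def initialize_routing_table(new_id: int, tables_along_route: list) -> list:
    # tables_along_route[i] is a copy of the routing table of the i-th node on the
    # join route toward new_id. That node shares (at least) the first i digits with
    # new_id, so its row i is a reasonable initial row i for the new node; the last
    # node on the route also supplies its leaf set (not shown here).
    table = [[None] * (1 << B) for _ in range(DIGITS)]
    for i, peer_table in enumerate(tables_along_route[:DIGITS]):
        table[i] = list(peer_table[i])
    return table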
Node departure (failure) • Leaf set members exchange heartbeat messages • Leaf set repair (eager): request the leaf set from the farthest live node on the failed node's side • Routing table repair (lazy): fetch a replacement entry from peers in the same row, then from higher rows
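A possible sketch of the eager leaf-set repair, reusing update_leaf_set from the earlier sketch; fetch_leaf_set stands in for a remote call and is an assumption.

def repair_leaf_set(local_id: int, leaf_set: list[int], failed_id: int, fetch_leaf_set):
    # Drop the failed node, then ask the live leaf-set member farthest away on the
    # failed node's side for its leaf set and merge the results.
    live = [n for n in leaf_set if n != failed_id]
    same_side = [n for n in live if (n < local_id) == (failed_id < local_id)]
    if not same_side:
        return live
    farthest = max(same_side, key=lambda n: abs(n - local_id))
    merged = (set(live) | set(fetch_leaf_set(farthest))) - {failed_id, local_id}
    return update_leaf_set(local_id, merged)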
Pastry: Average number of routing hops (L=16, 100k random queries) [Figure: results plot]
Pastry: Proximity routing • Proximity metric: time delay, estimated by ping • A node can probe its distance to any other node • Each routing table entry uses a node close to the local node (in the proximity space), among all nodes with the appropriate nodeId prefix
Pastry: Routes in proximity space [Figure: the route Route(d46a1c) from 65a1fc shown both in the nodeId space and in the proximity space; nodes shown: d13da3, d4213f, d462ba, d467c4, d471f1]
Pastry: Proximity routing • Assumption: scalar proximity metric (e.g. ping delay, # IP hops); a node can probe its distance to any other node • Proximity invariant: each routing table entry refers to a node close to the local node (in the proximity space), among all nodes with the appropriate nodeId prefix
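A tiny sketch of how the proximity invariant might be applied when filling one routing table slot; probe is an assumed measurement function (e.g. ping delay).

def best_entry(candidates: list[int], probe):
    # Among all known nodes that qualify for a given routing table slot (i.e. have
    # the required nodeId prefix), keep the one closest in the proximity metric.
    return min(candidates, key=probe) if candidates else None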
Pastry: Distance traveled (L=16, 100k random queries, Euclidean proximity space) [Figure: results plot]
PAST: File storage • Storage invariant: file "replicas" are stored on the k nodes with nodeIds numerically closest to the fileId (k is bounded by the leaf set size) [Figure: Insert(fileId) with k=4 replicas]
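A sketch of the storage invariant, reusing circular_distance from the first sketch; selecting from all known nodeIds (rather than from the root node's leaf set, as in the real system) is a simplification.

K = 4   # replication factor used in the figure; in PAST, k is bounded by the leaf set size

def past_insert_targets(file_id: int, node_ids: list[int]) -> list[int]:
    # Replicas of a file go to the k nodes whose nodeIds are numerically closest
    # to the fileId.
    return sorted(node_ids, key=lambda n: circular_distance(n, file_id))[:K]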
PAST: File retrieval • A lookup locates the file in log_16 N steps (expected) • It usually locates the replica nearest to the client C [Figure: client C routes a lookup for fileId toward the k replicas]
SCRIBE: Large-scale, decentralized multicast • Infrastructure to support topic-based publish-subscribe applications • Scalable: large numbers of topics and subscribers, a wide range of subscribers per topic • Efficient: low delay, low link stress, low node overhead
SCRIBE: Large-scale multicast [Figure: a topic's multicast tree; Subscribe(topicId) and Publish(topicId) messages are routed toward the topicId]
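A rough sketch, under assumed message names, of how Scribe's per-topic trees are built from subscribe messages routed toward the topicId and then used for publishes; next_hop stands in for Pastry routing and is an assumption.

# Per-node Scribe state in this sketch: for each topic this node forwards,
# the set of children in that topic's multicast tree.
children: dict[int, set[int]] = {}

def on_subscribe(topic_id: int, from_node: int, local_id: int, next_hop):
    # Build the tree from the reverse path of subscribe messages: record the sender
    # as a child; keep forwarding toward the topic's root (the node with nodeId
    # closest to the topicId) only if this node was not already in the tree.
    already_forwarder = topic_id in children
    children.setdefault(topic_id, set()).add(from_node)
    if not already_forwarder:
        nxt = next_hop(topic_id, local_id)
        if nxt is not None:                 # None means this node is the root
            return ("SUBSCRIBE", topic_id, nxt)
    return None

def on_publish(topic_id: int, payload) -> list:
    # Publishes are routed to the root, which disseminates them down the tree;
    # every forwarder relays the payload to its recorded children.
    return [("FORWARD", child, topic_id, payload) for child in children.get(topic_id, set())]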
Scribe: Results • Simulation results: comparison with IP multicast on delay, node stress, and link stress • Experimental setup: 100,000 nodes randomly selected out of 0.5M; Zipf-like subscription distribution, 1500 topics
Summary • Self-configuring P2P framework for topic-based publish-subscribe • Scribe achieves reasonable performance compared to IP multicast • Scales to a large number of subscribers • Scales to a large number of topics • Good distribution of load