
Scalable peer-to-peer substrates: A new foundation for distributed applications?

Peter Druschel, Rice University; Antony Rowstron, Microsoft Research Cambridge, UK. Collaborators: Miguel Castro, Anne-Marie Kermarrec, MSR Cambridge


Presentation Transcript


  1. Scalable peer-to-peer substrates: A new foundation for distributed applications? Peter Druschel, Rice University; Antony Rowstron, Microsoft Research Cambridge, UK. Collaborators: Miguel Castro, Anne-Marie Kermarrec, MSR Cambridge; Y. Charlie Hu, Sitaram Iyer, Animesh Nandi, Atul Singh, Dan Wallach, Rice University

  2. Outline • Background • Pastry • Pastry proximity routing • PAST • SCRIBE • Conclusions

  3. Background: peer-to-peer systems • distribution • decentralized control • self-organization • symmetry (communication, node roles)

  4. Peer-to-peer applications • Pioneers: Napster, Gnutella, Freenet • File sharing: CFS, PAST [SOSP’01] • Network storage: FarSite [Sigmetrics’00], OceanStore [ASPLOS’00], PAST [SOSP’01] • Web caching: Squirrel [PODC’02] • Event notification/multicast: Herald [HotOS’01], Bayeux [NOSSDAV’01], CAN-multicast [NGC’01], SCRIBE [NGC’01], SplitStream [submitted] • Anonymity: Crowds [CACM’99], Onion routing [JSAC’98] • Censorship resistance: Tangler [CCS’02]

  5. Common issues • Organize, maintain overlay network • node arrivals • node failures • Resource allocation/load balancing • Resource location • Network proximity routing. Idea: provide a generic p2p substrate

  6. Architecture (layers, top to bottom) • p2p application layer: event notification, network storage, ? • p2p substrate: self-organizing overlay network (Pastry) • TCP/IP • Internet

  7. Structured p2p overlays • One primitive: route(M, X) routes message M to the live node whose nodeId is numerically closest to key X • nodeIds and keys are drawn from a large, sparse id space

  8. Distributed Hash Tables (DHT) • Operations: insert(k,v), lookup(k) • p2p overlay maps keys to nodes • completely decentralized and self-organizing • robust, scalable [Figure: key/value pairs (k1,v1) to (k6,v6) spread across the nodes of a P2P overlay network]
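To make the route/insert/lookup semantics concrete, here is a minimal, centralized Python simulation. It assumes keys are hashed into the 128-bit id space by taking the first 128 bits of SHA-1, which is an illustrative choice. `ToyDHT`, `to_id` and `ring_distance` are hypothetical names. The sketch computes the destination directly instead of routing hop by hop, so it shows only where a key lands, not how Pastry gets there.

```python
import hashlib

ID_SPACE = 1 << 128  # 128-bit circular id space


def to_id(name):
    """Map a string into the id space (assumption: first 128 bits of SHA-1)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest()[:16], "big")


def ring_distance(a, b):
    """Numerical distance between two ids on the circular id space."""
    d = abs(a - b) % ID_SPACE
    return min(d, ID_SPACE - d)


class ToyDHT:
    """Centralized simulation: each key lives on the node whose nodeId is
    numerically closest to the key. No network, routing, or failures."""

    def __init__(self, node_ids):
        self.node_ids = sorted(node_ids)
        self.store = {n: {} for n in self.node_ids}

    def route(self, key_id):
        # Destination of route(M, X); ties broken by the smaller nodeId.
        return min(self.node_ids, key=lambda n: (ring_distance(n, key_id), n))

    def insert(self, key, value):
        node = self.route(to_id(key))
        self.store[node][key] = value
        return node

    def lookup(self, key):
        return self.store[self.route(to_id(key))].get(key)


dht = ToyDHT(to_id(f"node-{i}") for i in range(8))
dht.insert("k1", "v1")
print(dht.lookup("k1"))  # v1
```

Because distance wraps around the circle, a key near 0 can land on a node just below 2^128.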

  9. Why structured p2p overlays? • Leverage pooled resources (storage, bandwidth, CPU) • Leverage resource diversity (geographic, ownership) • Leverage existing shared infrastructure • Scalability • Robustness • Self-organization

  10. Outline • Background • Pastry • Pastry proximity routing • PAST • SCRIBE • Conclusions

  11. Pastry: Related work • Chord [Sigcomm’01] • CAN [Sigcomm’01] • Tapestry [TR UCB/CSD-01-1141] • PNRP [unpub.] • Viceroy [PODC’02] • Kademlia [IPTPS’02] • Small World [Kleinberg ’99, ’00] • Plaxton Trees [Plaxton et al. ’97]

  12. Pastry: Object distribution • Consistent hashing [Karger et al. ’97] • 128-bit circular id space (0 to 2^128 - 1) • nodeIds (uniform random) • objIds (uniform random) • Invariant: the node with the numerically closest nodeId maintains the object [Figure: nodeIds and an objId O placed on the circular id space]

  13. Pastry: Object insertion/lookup • A message with key X is routed to the live node whose nodeId is closest to X • Problem: a complete routing table is not feasible [Figure: Route(X) on the circular id space 0 to 2^128 - 1]

  14. Pastry: Routing Tradeoff • O(log N) routing table size • O(log N) message forwarding steps
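As a worked example of this tradeoff, assume each nodeId digit has b = 4 bits (base 16, matching the log16 N on the following slides), one routing table entry per other digit value (2^b - 1 per row), and an illustrative N = 10^6 nodes (a value not taken from the slides):

```latex
\lceil \log_{2^b} N \rceil = \lceil \log_{16} 10^6 \rceil
  = \left\lceil \frac{6}{\log_{10} 16} \right\rceil
  = \lceil 4.98 \rceil = 5
\qquad
(2^b - 1)\,\lceil \log_{2^b} N \rceil = 15 \times 5 = 75
```

Under these assumptions a million-node overlay needs about 5 forwarding steps and at most 75 routing table entries per node, plus the leaf set.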

  15. Pastry: Routing table of node 65a1fcx • log16 N rows [Figure: rows 0 to 3 of the table]
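A sketch of how such a table could be filled from a set of known nodes, assuming hexadecimal nodeIds shortened to 6 digits as on the slides (full nodeIds have 32). Row l holds nodes that share the first l digits with the local node; column d holds one whose next digit is d. The helper names are hypothetical, and "first candidate wins" stands in for the proximity-based choice described later in the talk.

```python
def shared_prefix_len(a, b):
    """Number of leading hex digits two nodeIds have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n


def rows_needed(n_nodes):
    """Smallest r with 16**r >= N, i.e. ceil(log16 N), computed without floats."""
    r = 0
    while 16 ** r < n_nodes:
        r += 1
    return r


def build_routing_table(local, known, rows):
    """table[l][d] = a node sharing exactly l leading digits with `local` whose
    digit l is d. The slot for local's own digit in each row stays None."""
    table = [[None] * 16 for _ in range(rows)]
    for node in known:
        l = shared_prefix_len(local, node)
        if node == local or l >= rows:
            continue
        d = int(node[l], 16)
        if table[l][d] is None:  # first candidate wins; Pastry prefers a nearby one
            table[l][d] = node
    return table


known = ["d13da3", "d4213f", "d462ba", "651234", "65b000"]
table = build_routing_table("65a1fc", known, rows=4)
print(table[0][13], table[2][1], table[2][11])  # d13da3 651234 65b000
```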

  16. Pastry: Routing properties • log16 N steps • O(log N) state [Figure: Route(d46a1c) issued at node 65a1fc; nodes shown: d13da3, d4213f, d462ba, d467c4, d471f1, d46a1c]

  17. Pastry: Leaf sets • Each node maintains IP addresses of the nodes with the L/2 numerically closest larger and smaller nodeIds, respectively. • routing efficiency/robustness • fault detection (keep-alive) • application-specific local coordination
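A sketch of leaf-set membership on a toy ring, assuming global knowledge of all nodeIds (a real node learns its leaf set while joining). `leaf_set` and the ring size of 100 are illustrative only; distances wrap around the circular id space.

```python
def leaf_set(local, all_ids, L, space):
    """Return (smaller, larger): the L/2 nodeIds nearest to `local` going
    counter-clockwise and clockwise on a ring of size `space`, nearest first."""
    others = [n for n in set(all_ids) if n != local]
    half = L // 2
    larger = sorted(others, key=lambda n: (n - local) % space)[:half]
    smaller = sorted(others, key=lambda n: (local - n) % space)[:half]
    return smaller, larger


print(leaf_set(10, [5, 10, 20, 30, 90, 95], L=4, space=100))
# ([5, 95], [20, 30])
```

Node 95 counts as a smaller neighbor of node 10 because the id space is circular. In a network with fewer than L other nodes the two halves can overlap.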

  18. Pastry: Routing procedure (message with key D)
      if (D is within range of our leaf set)
          forward to the numerically closest member
      else
          let l = length of the prefix shared with D
          let d = value of the l-th digit of D
          if (R[l][d] exists)
              forward to R[l][d]
          else
              forward to a known node that
                  (a) shares at least as long a prefix with D, and
                  (b) is numerically closer to D than this node
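The procedure can be turned into a small runnable Python simulation. Assumptions: 6-digit hex nodeIds on a ring of size 16^6; every node's leaf set and routing table are built from global knowledge, with the first matching node filling each slot (no proximity); and the leaf-set range test treats the set as the clockwise arc from its farthest smaller member to its farthest larger member. `next_hop`, `build_state` and `route` are hypothetical names.

```python
SPACE = 16 ** 6  # toy id space: 6 hex digits, like the shortened nodeIds on the slides


def prefix_len(a, b):
    """Number of leading hex digits shared by two equal-length ids."""
    n = 0
    while n < len(a) and a[n] == b[n]:
        n += 1
    return n


def ring_dist(a, b):
    """Numerical distance between two hex ids on the circular id space."""
    d = abs(int(a, 16) - int(b, 16))
    return min(d, SPACE - d)


def next_hop(local, key, smaller, larger, table):
    """One forwarding decision at `local`; returns `local` if it is the destination."""
    # Case 1: key within the leaf set's range -> numerically closest member (or self).
    if smaller and larger:
        lo, hi = int(smaller[-1], 16), int(larger[-1], 16)  # farthest member on each side
        if (int(key, 16) - lo) % SPACE <= (hi - lo) % SPACE:
            return min(smaller + larger + [local], key=lambda n: ring_dist(n, key))
    # Case 2: routing table entry R[l][d] shares one more digit with the key.
    l = prefix_len(local, key)
    if l == len(key):
        return local  # the key equals our own nodeId
    d = int(key[l], 16)
    if (l, d) in table:
        return table[(l, d)]
    # Case 3 (rare): a known node with at least as long a prefix that is numerically closer.
    known = smaller + larger + list(table.values())
    closer = [n for n in known
              if prefix_len(n, key) >= l and ring_dist(n, key) < ring_dist(local, key)]
    return min(closer, key=lambda n: ring_dist(n, key)) if closer else local


def build_state(local, nodes, L):
    """Simulation-only setup from global knowledge: (smaller, larger, table)."""
    others = [n for n in nodes if n != local]
    me = int(local, 16)
    larger = sorted(others, key=lambda n: (int(n, 16) - me) % SPACE)[:L // 2]
    smaller = sorted(others, key=lambda n: (me - int(n, 16)) % SPACE)[:L // 2]
    table = {}
    for n in others:
        l = prefix_len(local, n)
        table.setdefault((l, int(n[l], 16)), n)  # first match fills R[l][d]
    return smaller, larger, table


def route(src, key, nodes, L=2, max_hops=32):
    """Follow next_hop from `src` until a node delivers to itself; returns the path."""
    state = {n: build_state(n, nodes, L) for n in nodes}
    path = [src]
    while len(path) <= max_hops:
        nxt = next_hop(path[-1], key, *state[path[-1]])
        if nxt == path[-1]:
            return path
        path.append(nxt)
    raise RuntimeError("route did not converge")


NODES = ["65a1fc", "d13da3", "d4213f", "d462ba", "d467c4", "d471f1"]
print(route("65a1fc", "d46a1c", NODES))
# ['65a1fc', 'd13da3', 'd4213f', 'd462ba', 'd467c4']
```

With L = 2 and the six nodes from the routing figure, each of the first three hops extends the prefix shared with d46a1c by one digit (d, d4, d46). Node d462ba has no R[3][a] entry, so the rare case forwards to d467c4, which finds the key within its leaf set's range and is itself the numerically closest node.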

  19. Pastry: Performance • Integrity of overlay/message delivery: guaranteed unless L/2 nodes with adjacent nodeIds fail simultaneously • Number of routing hops: • No failures: < log16 N expected, 128/b + 1 max (b = 4 bits per digit for base 16, i.e. at most 33) • During failure recovery: O(N) worst case, average case much better

  20. Pastry: Self-organization Initializing and maintaining routing tables and leaf sets • Node addition • Node departure (failure)

  21. Pastry: Node addition [Figure: new node d46a1c joins via Route(d46a1c) issued at 65a1fc; existing nodes shown: d13da3, d4213f, d462ba, d467c4, d471f1]

  22. Pastry: Node departure (failure) • Leaf set members exchange keep-alive messages • Leaf set repair (eager): request the leaf set from the farthest live node in the set • Routing table repair (lazy): get the table from peers in the same row, then higher rows
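A sketch of the eager repair rule for one side of the leaf set, assuming a toy ring of size 100 and that the farthest live member answers with its full leaf set. `repair_larger_side` is a hypothetical helper; the smaller side is symmetric, and the lazy routing table repair is not shown.

```python
SPACE = 100  # toy ring size


def cw(a, b):
    """Clockwise distance from a to b on the toy ring."""
    return (b - a) % SPACE


def repair_larger_side(local, larger, failed, leaf_sets, half):
    """Drop `failed` from the clockwise half of the leaf set and refill it
    from the leaf set reported by the farthest live member on that side."""
    live = [n for n in larger if n != failed]
    if not live:
        return live  # nobody left on this side to ask
    farthest = max(live, key=lambda n: cw(local, n))
    # Merge surviving members with the farthest member's own leaf set.
    candidates = (set(live) | set(leaf_sets[farthest])) - {local, failed}
    return sorted(candidates, key=lambda n: cw(local, n))[:half]


leaf_sets = {30: [10, 20, 40, 50]}  # leaf set reported by node 30 (L = 4)
print(repair_larger_side(10, [20, 30], failed=20, leaf_sets=leaf_sets, half=2))
# [30, 40]
```

Node 30 may still list the failed node 20, so the sketch filters it out explicitly.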

  23. Pastry: Experimental results • Prototype implemented in Java • emulated network • deployed testbed (currently ~25 sites worldwide)

  24. Pastry: Average # of hops L=16, 100k random queries

  25. Pastry: # of hops (100k nodes) L=16, 100k random queries

  26. Pastry: # routing hops (failures) L=16, 100k random queries, 5k nodes, 500 failures • Average hops per lookup: no failure 2.73 • failure 2.96 • after routing table repair 2.74

  27. Outline • Background • Pastry • Pastry proximity routing • PAST • SCRIBE • Conclusions

  28. Pastry: Proximity routing Assumption: scalar proximity metric • e.g. ping delay, # IP hops • a node can probe distance to any other node Proximity invariant: Each routing table entry refers to a node close to the local node (in the proximity space), among all nodes with the appropriate nodeId prefix. Locality-related route qualities: • Distance traveled • Likelihood of locating the nearest replica
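The proximity invariant amounts to a selection rule for each routing table slot. The sketch below assumes the local node already knows every candidate with the required prefix and has a measured delay for each (`rtt_ms` is hypothetical probe data, `fill_slot` a hypothetical helper). Real Pastry can only approximate this choice from the nodes it happens to learn about.

```python
def fill_slot(local, row, digit, nodes, rtt_ms):
    """Choose the entry for R[row][digit]: among nodes that share the first `row`
    digits with `local` and have `digit` next, take the one with the smallest delay."""
    prefix, want = local[:row], format(digit, "x")
    candidates = [n for n in nodes
                  if n != local and n.startswith(prefix) and n[row] == want]
    return min(candidates, key=lambda n: rtt_ms[n]) if candidates else None


rtt_ms = {"d13da3": 80, "d4213f": 15, "d462ba": 40}  # hypothetical ping delays
nodes = ["65a1fc", *rtt_ms]
print(fill_slot("65a1fc", 0, 13, nodes, rtt_ms))  # d4213f
```

All three candidates satisfy the prefix constraint for row 0, digit d, so the nearest one in the proximity space fills the slot.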

  29. Pastry: Routes in proximity space [Figure: Route(d46a1c) shown side by side in nodeId space and in proximity space; nodes 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1, d46a1c]

  30. Pastry: Distance traveled L=16, 100k random queries, Euclidean proximity space

  31. Pastry: Locality properties 1) The expected distance traveled by a message in the proximity space is within a small constant of the minimum 2) Routes of messages sent by nearby nodes with the same key converge at a node near the source nodes 3) Among the k nodes with nodeIds closest to the key, the message is likely to reach the node closest to the source node first

  32. Pastry: Node addition [Figure: new node d46a1c joins via Route(d46a1c), shown in both nodeId space and proximity space; nodes 65a1fc, d13da3, d4213f, d462ba, d467c4, d471f1]

  33. Pastry delay vs IP delay: GATech topology, 0.5M hosts, 60K nodes, 20K random messages

  34. Pastry: API • route(M, X): route message M to the node with nodeId numerically closest to X • deliver(M): deliver message M to the application • forwarding(M, X): message M is being forwarded towards key X • newLeaf(L): report a change in leaf set L to the application
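The upcalls can be written down as an abstract Python interface. Names follow the slide, with newLeaf spelled `new_leaf` in Python style. `route_along` is a hypothetical stand-in that replays an already-computed route, calling `forwarding` at each node that passes the message on and `deliver` at the last one. This is our reading of the slide; return values and exact arguments of the real API are not specified here.

```python
from abc import ABC, abstractmethod


class PastryApplication(ABC):
    """Upcalls the substrate makes into an application (names from the slide)."""

    @abstractmethod
    def deliver(self, msg):
        """msg has arrived at the node numerically closest to its key."""

    @abstractmethod
    def forwarding(self, msg, key):
        """msg is being forwarded through this node towards key."""

    @abstractmethod
    def new_leaf(self, leaf_set):
        """The node's leaf set has changed."""


def route_along(path, msg, key):
    """Replay a precomputed route (one application per node on the path):
    forwarding() at every hop that passes msg on, deliver() at the end."""
    for app in path[:-1]:
        app.forwarding(msg, key)
    path[-1].deliver(msg)


class LoggingApp(PastryApplication):
    def __init__(self):
        self.log = []

    def deliver(self, msg):
        self.log.append(("deliver", msg))

    def forwarding(self, msg, key):
        self.log.append(("forwarding", msg, key))

    def new_leaf(self, leaf_set):
        self.log.append(("newLeaf", tuple(leaf_set)))


apps = [LoggingApp() for _ in range(3)]
route_along(apps, "hello", "d46a1c")
print(apps[-1].log)  # [('deliver', 'hello')]
```

Keeping these upcalls separate from route() is what lets PAST and SCRIBE sit on the same substrate: each application decides what to do when a message passes through or arrives.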

  35. Pastry: Security • Secure nodeId assignment • Secure node join protocols • Randomized routing • Byzantine fault-tolerant leaf set membership protocol

  36. Pastry: Summary • Generic p2p overlay network • Scalable, fault resilient, self-organizing, secure • O(log N) routing steps (expected) • O(log N) routing table size • Network proximity routing

  37. Outline • Background • Pastry • Pastry proximity routing • PAST • SCRIBE • Conclusions

  38. PAST: Cooperative, archival file storage and distribution • Layered on top of Pastry • Strong persistence • High availability • Scalability • Reduced cost (no backup) • Efficient use of pooled resources

  39. PAST API • Insert - store replica of a file at k diverse storage nodes • Lookup - retrieve file from a nearby live storage node that holds a copy • Reclaim - free storage associated with a file Files are immutable

  40. PAST: File storage [Figure: Insert(fileId) routed towards fileId]

  41. PAST: File storage (k=4) • Storage invariant: file “replicas” are stored on the k nodes with nodeIds closest to fileId (k is bounded by the leaf set size) [Figure: Insert(fileId)]
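A minimal in-memory sketch of the PAST API and storage invariant, assuming an idealized, failure-free overlay where the k closest nodeIds can be computed directly. `ToyPAST` and its methods are hypothetical. Treating a second insert with the same fileId as an error is our reading of "files are immutable", and reclaim here simply drops the replicas.

```python
SPACE = 1 << 128  # Pastry's 128-bit circular id space


def ring_dist(a, b):
    d = abs(a - b) % SPACE
    return min(d, SPACE - d)


class ToyPAST:
    """Idealized PAST: no routing, caching, failures, or storage limits."""

    def __init__(self, node_ids, k, leaf_set_size):
        if k > leaf_set_size:
            raise ValueError("k is bounded by the leaf set size")
        self.k = k
        self.disks = {n: {} for n in node_ids}

    def replica_set(self, file_id):
        # Storage invariant: the k nodes whose nodeIds are numerically closest to fileId.
        return sorted(self.disks, key=lambda n: (ring_dist(n, file_id), n))[: self.k]

    def insert(self, file_id, data):
        if any(file_id in disk for disk in self.disks.values()):
            raise KeyError("fileId already in use; files are immutable")
        for n in self.replica_set(file_id):
            self.disks[n][file_id] = data

    def lookup(self, file_id):
        for n in self.replica_set(file_id):  # any replica holder can answer
            if file_id in self.disks[n]:
                return self.disks[n][file_id]
        return None

    def reclaim(self, file_id):
        for n in self.replica_set(file_id):
            self.disks[n].pop(file_id, None)


step = SPACE // 10
past = ToyPAST([i * step for i in range(10)], k=4, leaf_set_size=16)
fid = 3 * step + 1
past.insert(fid, b"contents")
print([n // step for n in past.replica_set(fid)])  # [3, 4, 2, 5]
```

Because the k replica holders are the nodeIds adjacent to fileId, they all appear in the leaf set of the node closest to fileId, which is why k cannot exceed the leaf set size.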

  42. PAST: File retrieval • Lookup: file located in log16 N steps (expected) • usually locates the replica nearest the client C [Figure: client C looks up fileId, which has k replicas]

  43. PAST: Exploiting Pastry • Random, uniformly distributed nodeIds • replicas stored on diverse nodes • Uniformly distributed fileIds • e.g. SHA-1(filename, public key, salt) • approximate load balance • Pastry routes to the closest live nodeId • availability, fault tolerance
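A sketch of computing a fileId as SHA-1(filename, public key, salt). Two details are assumptions not stated on the slide: each field is length-prefixed before hashing, and the 128 most significant bits of the 160-bit digest serve as the Pastry routing key. `file_id`, `routing_key`, the placeholder key bytes and the 20-byte salt are hypothetical.

```python
import hashlib
import os


def file_id(filename, public_key, salt):
    """160-bit fileId = SHA-1 over (filename, public key, salt).
    Assumption: each field is length-prefixed so that different splits of the
    same bytes (e.g. "ab"+"c" vs "a"+"bc") cannot produce the same input."""
    h = hashlib.sha1()
    for field in (filename.encode("utf-8"), public_key, salt):
        h.update(len(field).to_bytes(4, "big"))
        h.update(field)
    return int.from_bytes(h.digest(), "big")


def routing_key(fid):
    """Assumption: use the 128 most significant bits as the Pastry key."""
    return fid >> (160 - 128)


owner_key = b"hypothetical-owner-public-key"
fid = file_id("report.pdf", owner_key, os.urandom(20))
print(f"{routing_key(fid):032x}")
```

Since the random salt changes the digest unpredictably, reinserting under a fresh salt moves a file to a different, effectively random region of the id space.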

  44. PAST: Storage management • Maintain storage invariant • Balance free space when global utilization is high • statistical variation in assignment of files to nodes (fileId/nodeId) • file size variations • node storage capacity variations • Local coordination only (leaf sets)

  45. Experimental setup • Web proxy traces from NLANR • 18.7 Gbytes, 10.5K mean, 1.4K median, 0 min, 138 MB max • Filesystem • 166.6 Gbytes, 88K mean, 4.5K median, 0 min, 2.7 GB max • 2250 PAST nodes (k = 5) • truncated normal distributions of node storage sizes, mean = 27/270 MB

  46. Need for storage management • No diversion (tpri = 1, tdiv = 0): • max utilization 60.8% • 51.1% of inserts failed • Replica/file diversion (tpri = 0.1, tdiv = 0.05): • max utilization > 98% • < 1% of inserts failed
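The slide gives only the threshold values. One reading, assumed here rather than taken from the slide, is that a node rejects a replica of size S when S divided by its free space F exceeds the threshold, with tpri applied to ordinary replicas and the stricter tdiv to diverted ones. Under that reading tdiv = 0 rejects every non-empty diverted replica, matching the "no diversion" label. `accepts` is a hypothetical helper.

```python
def accepts(file_size, free_space, t):
    """Assumed test: reject when file_size / free_space exceeds the threshold t."""
    if free_space <= 0:
        return False
    return file_size / free_space <= t


T_PRI, T_DIV = 0.1, 0.05  # the "replica/file diversion" setting from the slide

# An 8 MB replica arriving at a node with 100 MB free:
print(accepts(8, 100, T_PRI))  # True  (0.08 <= 0.1, ordinary replica accepted)
print(accepts(8, 100, T_DIV))  # False (0.08 > 0.05, diverted replica rejected)
```

A lower threshold for diverted replicas makes a node more selective about storing replicas on behalf of its neighbors.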

  47. PAST: File insertion failures

  48. PAST: Caching • Nodes cache files in the unused portion of their allocated disk space • Files are cached on nodes along the route of lookup and insert messages Goals: • maximize query throughput for popular documents • balance query load • improve client latency
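A sketch of a per-node cache that lives only in unused disk space and yields to replicas. The replacement policy is not given on the slide; LRU is a stand-in here, and `NodeCache` with its methods are hypothetical names. Sizes are abstract units.

```python
from collections import OrderedDict


class NodeCache:
    def __init__(self, capacity):
        self.capacity = capacity    # space this node contributes to PAST
        self.replica_bytes = 0      # space used by replicas it must keep
        self.cache = OrderedDict()  # fileId -> size, least recently used first
        self.cache_bytes = 0

    def free(self):
        return self.capacity - self.replica_bytes - self.cache_bytes

    def _evict_until(self, needed):
        # Cached copies are expendable: drop the least recently used until `needed` fits.
        while self.free() < needed and self.cache:
            _, size = self.cache.popitem(last=False)
            self.cache_bytes -= size
        return self.free() >= needed

    def store_replica(self, size):
        # Replicas take priority over cached copies.
        if not self._evict_until(size):
            return False
        self.replica_bytes += size
        return True

    def on_route(self, file_id, size):
        # Called when an insert or lookup for file_id passes through this node.
        if file_id in self.cache:
            self.cache.move_to_end(file_id)  # refresh recency on a hit
            return True
        if size > self.capacity - self.replica_bytes:
            return False  # could never fit in the unused space
        self._evict_until(size)
        self.cache[file_id] = size
        self.cache_bytes += size
        return True


node = NodeCache(100)
node.store_replica(60)
node.on_route("a", 30)
node.on_route("b", 10)
node.on_route("a", 30)  # hit: "a" becomes most recently used
node.on_route("c", 10)  # no space left, so "b" is evicted
print(list(node.cache))  # ['a', 'c']
```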

  49. PAST: Caching [Figure: Lookup(fileId)]

  50. PAST: Caching
