1 / 46

An Overview of Peer-to-Peer

An Overview of Peer-to-Peer. Yingwu Zhu. Outline. P2P Overview What is a peer? Example applications Benefits of P2P P2P Content Sharing Challenges Group management/data placement approaches Measurement studies. What is Peer-to-Peer (P2P)?. Napster? Gnutella?

rose-ramsey
Download Presentation

An Overview of Peer-to-Peer

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. An Overview of Peer-to-Peer Yingwu Zhu

  2. Outline • P2P Overview • What is a peer? • Example applications • Benefits of P2P • P2P Content Sharing • Challenges • Group management/data placement approaches • Measurement studies

  3. What is Peer-to-Peer (P2P)? • Napster? • Gnutella? • Most people think of P2P as music sharing

  4. What is a peer? • Contrasted with Client-Server model • Servers are centrally maintained and administered • Client has fewer resources than a server

  5. What is a peer? • A peer’s resources are similar to the resources of the other participants • P2P – peers communicating directly with other peers and sharing resources

  6. Levels of P2P-ness • P2P as a mindset • Slashdot • P2P as a model • Gnutella • P2P as an implementation choice • Application-layer multicast • P2P as an inherent property • Ad-hoc networks

  7. P2P Application Taxonomy P2P Systems Distributed Computing SETI@home File Sharing Gnutella Collaboration Jabber Platforms JXTA

  8. P2P Goals/Benefits • Cost sharing • Resource aggregation • Improved scalability/reliability • Increased autonomy • Anonymity/privacy • Dynamism • Ad-hoc communication

  9. P2P File Sharing • Content exchange • Gnutella • File systems • Oceanstore • Application-level multicast • SplitStream

  10. P2P File Sharing Benefits • Cost sharing • Resource aggregation • Improved scalability/reliability • Anonymity/privacy • Dynamism

  11. Research Areas • Peer discovery and group management • Data location and placement • Reliable and efficient file exchange • Security/privacy/anonymity/trust

  12. Current Research • Group management and data placement • Chord, CAN, Tapestry, Pastry • Anonymity • Publius • Performance studies • Gnutella measurement study

  13. Management/Placement Challenges • Per-node state • Bandwidth usage • Search time • Fault tolerance/resiliency

  14. Approaches • Centralized • Flooding • Document Routing

  15. Centralized Bob Alice • Napster model • Benefits: • Efficient search • Limited bandwidth usage • No per-node state • Drawbacks: • Central point of failure • Limited scale Judy Jane

  16. Flooding Carl Jane • Gnutella model • Benefits: • No central point of failure • Limited per-node state • Drawbacks: • Slow searches • Bandwidth intensive Bob Alice Judy

  17. Document Routing 001 012 • FreeNet, Chord, CAN, Tapestry, Pastry model • Benefits: • More efficient searching • Limited per-node state • Drawbacks: • Limited fault-tolerance vs redundancy 212 ? 212 ? 332 212 305

  18. Document Routing – CAN • Associate to each node and item a unique id in an d-dimensional space • Goals • Scales to hundreds of thousands of nodes • Handles rapid arrival and failure of nodes • Properties • Routing table size O(d) • Guarantees that a file is found in at most d*n1/d steps, where n is the total number of nodes Slide modified from another presentation

  19. CAN Example: Two Dimensional Space • Space divided between nodes • All nodes cover the entire space • Each node covers either a square or a rectangular area of ratios 1:2 or 2:1 • Example: • Node n1:(1, 2) first node that joins  cover the entire space 7 6 5 4 3 n1 2 1 0 0 2 3 4 6 7 5 1 Slide modified from another presentation

  20. CAN Example: Two Dimensional Space • Node n2:(4, 2) joins  space is divided between n1 and n2 7 6 5 4 3 n2 n1 2 1 0 0 2 3 4 6 7 5 1 Slide modified from another presentation

  21. CAN Example: Two Dimensional Space • Node n2:(4, 2) joins  space is divided between n1 and n2 7 6 n3 5 4 3 n2 n1 2 1 0 0 2 3 4 6 7 5 1 Slide modified from another presentation

  22. CAN Example: Two Dimensional Space • Nodes n4:(5, 5) and n5:(6,6) join 7 6 n5 n4 n3 5 4 3 n2 n1 2 1 0 0 2 3 4 6 7 5 1 Slide modified from another presentation

  23. CAN Example: Two Dimensional Space • Nodes: n1:(1, 2); n2:(4,2); n3:(3, 5); n4:(5,5);n5:(6,6) • Items: f1:(2,3); f2:(5,1); f3:(2,1); f4:(7,5); 7 6 n5 n4 n3 f4 5 4 f1 3 n2 n1 2 f3 1 f2 0 0 2 3 4 6 7 5 1 Slide modified from another presentation

  24. CAN Example: Two Dimensional Space • Each item is stored by the node who owns its mapping in the space 7 6 n5 n4 n3 f4 5 4 f1 3 n2 n1 2 f3 1 f2 0 0 2 3 4 6 7 5 1 Slide modified from another presentation

  25. CAN: Query Example • Each node knows its neighbors in the d-space • Forward query to the neighbor that is closest to the query id • Example: assume n1 queries f4 • Can route around some failures • some failures require local flooding 7 6 n5 n4 n3 f4 5 4 f1 3 n2 n1 2 f3 1 f2 0 0 2 3 4 6 7 5 1 Slide modified from another presentation

  26. CAN: Query Example • Each node knows its neighbors in the d-space • Forward query to the neighbor that is closest to the query id • Example: assume n1 queries f4 • Can route around some failures • some failures require local flooding 7 6 n5 n4 n3 f4 5 4 f1 3 n2 n1 2 f3 1 f2 0 0 2 3 4 6 7 5 1 Slide modified from another presentation

  27. CAN: Query Example • Each node knows its neighbors in the d-space • Forward query to the neighbor that is closest to the query id • Example: assume n1 queries f4 • Can route around some failures • some failures require local flooding 7 6 n5 n4 n3 f4 5 4 f1 3 n2 n1 2 f3 1 f2 0 0 2 3 4 6 7 5 1 Slide modified from another presentation

  28. CAN: Query Example • Each node knows its neighbors in the d-space • Forward query to the neighbor that is closest to the query id • Example: assume n1 queries f4 • Can route around some failures • some failures require local flooding 7 6 n5 n4 n3 f4 5 4 f1 3 n2 n1 2 f3 1 f2 0 0 2 3 4 6 7 5 1 Slide modified from another presentation

  29. Node Failure Recovery • Simple failures • know your neighbor’s neighbors • when a node fails, one of its neighbors takes over its zone • More complex failure modes • simultaneous failure of multiple adjacent nodes • scoped flooding to discover neighbors • hopefully, a rare event Slide modified from another presentation

  30. N5 N10 N110 K19 N20 N99 N32 N80 N60 Document Routing – Chord • MIT project • Uni-dimensional ID space • Keep track of log N nodes • Search through log N nodes to find desired key

  31. Doc Routing – Tapestry/Pastry 43FE 993E 13FE • Global mesh • Suffix-based routing • Uses underlying network distance in constructing mesh 73FE F990 04FE 9990 ABFE 239E 1290

  32. Comparing Guarantees Model Search State Chord Uni-dimensional log N log N Multi-dimensional CAN dN1/d 2d b logbN Tapestry Global Mesh logbN Pastry Neighbor map logbN b logbN + b

  33. Remaining Problems? • Hard to handle highly dynamic environments • Usable services • Methods don’t consider peer characteristics

  34. Measurement Studies • “Free Riding on Gnutella” • Most studies focus on Gnutella • Want to determine how users behave • Recommendations for the best way to design systems

  35. Free Riding Results • Who is sharing what? • August 2000

  36. Saroiu et al Study • How many peers are server-like…client-like? • Bandwidth, latency • Connectivity • Who is sharing what?

  37. Saroiu et al Study • May 2001 • Napster crawl • query index server and keep track of results • query about returned peers • don’t capture users sharing unpopular content • Gnutella crawl • send out ping messages with large TTL

  38. Results Overview • Lots of heterogeneity between peers • Systems should consider peer capabilities • Peers lie • Systems must be able to verify reported peer capabilities or measure true capabilities

  39. Measured Bandwidth

  40. Reported Bandwidth

  41. Measured Latency

  42. Measured Uptime

  43. Number of Shared Files

  44. Connectivity

  45. Points of Discussion • Is it all hype? • Should P2P be a research area? • Do P2P applications/systems have common research questions? • What are the “killer apps” for P2P systems?

  46. Conclusion • P2P is an interesting and useful model • There are lots of technical challenges to be solved

More Related