1 / 41

Brief Overview of Academic Research on P2P

Brief Overview of Academic Research on P2P. Pei Cao. Relevant Conferences. IPTPS (International Workshop on Peer-to-Peer Systems) ICDCS (IEEE Conference on Distributed Computer Systems) NSDI (USENIX Symposium on Network System Design and Implementation)

march
Download Presentation

Brief Overview of Academic Research on P2P

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Brief Overview of Academic Research on P2P Pei Cao

  2. Relevant Conferences • IPTPS (International Workshop on Peer-to-Peer Systems) • ICDCS (IEEE Conference on Distributed Computer Systems) • NSDI (USENIX Symposium on Network System Design and Implementation) • PODC (ACM Symposium on Principles of Distributed Computing) • SIGCOMM

  3. Areas of Research Focus • Gnutella-Inspired • The “Directory Service” Problem • BitTorrent-Inspired • The “File Distribution” Problem • P2P Live Streaming • P2P and Net Neutrality

  4. Gnutella-Inspired Research Studies

  5. The Applications and The Problems • Napster, Gnutella, KaZaa/FastTrak, Skype • Look for a particular content/object, and find which peer has it  the “directory service” problem • Challenge: how to offer a scalable directory service in a fully decentralized fashion • Arrange direct transfer from the peer  the “punch a hole in the firewall” problem

  6. Decentralized Directory Services • Structured Networks • DHT (Distributed Hash Tables) • Very active research areas from 2001 to 2004 • Limitation: lookup by keys only • Multi-Attribute DHT • Limited support for query-based lookup • Unstructured Networks • Various improvements to basic flooding based schemes

  7. What Is a DHT? • Single-node hash table: key = Hash(name) put(key, value) get(key) -> value • How do I do this across millions of hosts on the Internet? • Distributed Hash Table

  8. Distributed Hash Tables • Chord • CAN • Pastry • Tapastry • Symphony • Koodle • etc.

  9. The Problem N2 N1 N3 Put (Key=“title” Value=file data…) Internet ? Client Publisher Get(key=“title”) N4 N6 N5 • Key Placement • Routing to find key

  10. Key Placement • Traditional hashing • Nodes numbered from 1 to N • Key is placed at node (hash(key) % N) • Why Traditional Hashing have problems

  11. Consistent Hashing: IDs • Key identifier = SHA-1(key) • Node identifier = SHA-1(IP address) • SHA-1 distributes both uniformly • How to map key IDs to node IDs?

  12. Key 5 K5 Node 105 N105 K20 Circular 7-bit ID space N32 N90 K80 Consistent Hashing: Placement A key is stored at its successor: node with next higher ID

  13. N120 N10 “Where is key 80?” N105 N32 “N90 has K80” N90 K80 N60 Basic Lookup

  14. “Finger Table” Allows log(N)-time Lookups ½ ¼ 1/8 1/16 1/32 1/64 1/128 N80

  15. Finger i Points to Successor of n+2i N120 112 ½ ¼ 1/8 1/16 1/32 1/64 1/128 N80

  16. Lookups Take O(log(N)) Hops N5 N10 N110 K19 N20 N99 N32 Lookup(K19) N80 N60

  17. Chord Lookup Algorithm Properties • Interface: lookup(key)  IP address • Efficient: O(log N) messages per lookup • N is the total number of servers • Scalable: O(log N) state per node • Robust: survives massive failures • Simple to analyze

  18. Related Studies on DHTs • Many variations of DHTs • Different ways to choose the fingers • Ways to make it more robust • Ways to make it more network efficient • Studies of different DHTs • What happens when peers leave aka churns? • Applications built using DHTs • Tracker-less BitTorrent • Beehive --- a P2P based DNS system

  19. Directory Lookups: Unstructured Networks • Example: Gnutella • Support more flexible queries • Typically, precise “name” search is a small portion of all queries • Simplicity • High resilience against node failures • Problems: Scalability • Flooding  # of messages ~ O(N*E)

  20. Flooding-Based Searches • Duplication increases as TTL increases in flooding • Worst case: a node A is interrupted by N * q * degree(A) messages 1 3 2 4 6 5 7 8 . . . . . . . . . . . .

  21. Problems with Simple TTL-Based Flooding • Hard to choose TTL: • For objects that are widely present in the network, small TTLs suffice • For objects that are rare in the network, large TTLs are necessary • Number of query messages grow exponentially as TTL grows

  22. Idea #1: Adaptively Adjust TTL • “Expanding Ring” • Multiple floods: start with TTL=1; increment TTL by 2 each time until search succeeds • Success varies by network topology

  23. Idea #2: Random Walk • Simple random walk • takes too long to find anything! • Multiple-walker random walk • N agents after each walking T steps visits as many nodes as 1 agent walking N*T steps • When to terminate the search: check back with the query originator once every C steps

  24. Flexible Replication • In unstructured systems, search success is essentially about coverage: visiting enough nodes to probabilistically find the object => replication density matters • Limited node storage => what’s the optimal replication density distribution? • In Gnutella, only nodes who query an object store it => ri pi • What if we have different replication strategies?

  25. Optimal ri Distribution • Goal: minimize ( pi/ ri ), where  ri =R • Calculation: • introduce Lagrange multiplier , find ri and  that minimize: ( pi/ ri ) +  * ( ri - R) =>  - pi/ ri2 = 0 for all i => ri  pi

  26. Square-Root Distribution • General principle: to minimize ( pi/ ri ) under constraint  ri =R, make ri proportional to square root of pi • Other application examples: • Bandwidth allocation to minimize expected download times • Server load balancing to minimize expected request latency

  27. Achieving Square-Root Distribution • Suggestions from some heuristics • Store an object at a number of nodes that is proportional to the number of node visited in order to find the object • Each node uses random replacement • Two implementations: • Path replication: store the object along the path of a successful “walk” • Random replication: store the object randomly among nodes visited by the agents

  28. KaZaa • Use Supernodes • Regular Nodes : Supernodes = 100 : 1 • Simple way to scale the system by a factor of 100

  29. BitTorrent-Inspired Research Studies

  30. Modeling and Understanding BitTorrent • Analysis based on modeling • View it as a type of Gossip Algorithm • Usually do not model the Tit-for-Tat aspects • Assume perfectly connected networks • Statistical modeling techniques • Mostly published in PODC or SIGMETRICS • Simulation Studies • Different assumption of bottlenecks • Varying details of the modeling of the data transfer • Published in ICDCS and SIGCOMM

  31. Studies on Effect of BitTorrent on ISPs • Observation: P2P contributes to cross-ISP traffic • SIGCOMM 2006 publication on studies in Japan backbone traffic • Attempts to improve network locality of BitTorrent-like applications • ICDCS 2006 publicatoin • Academic P2P file sharing systems • Bullet, Julia, etc.

  32. Techniques to Alleviate the “Last Missing Piece” Problem • Apply Network Coding to pieces exchanged between peers • Pablo Rodriguez Rodriguez, Microsoft Research (recently moved to Telefonica Research) • Use a different piece-replication strategy • Dahlia Makhi, Microsoft Research • “On Collaborative Content Distribution Using Multi-Message Gossip” • Associate “age” with file segments

  33. Network Coding • Main Feature • Allowing intermediate nodes to encode packets • Making optimal use of the available network resources • Similar Technique: Erasure Codes • Reconstructing the original content of size n from roughly a subset of any n symbols from a large universe of encoded symbols

  34. Network Coding in P2P: The Model • Server • Dividing the file into k blocks • Uploading blocks at random to different clients • Clients (Users) • Collaborating with each other to assemble the blocks and reconstruct the original file • Exchanging information and data with only a small subset of others (neighbors) • Symmetric neighborhood and links

  35. Network Coding in P2P • Assume a node with blocks B1, B2, …, Bk • Pick random numbers C1, C2, …, Ck • Construct new block E = C1 * B1 + C2 * B2 + … + Ck * Bk • Send E and (C1, C2, …, Ck) to neighbor • Decoding: collect enough linearly independent E’s, solve the linear system • If all nodes pick vector C randomly, chances are high that after receiving ~K blocks, can recover B1 through Bk

  36. P2P Live Streaming

  37. Motivations • Internet Applications: • PPLive, PPStream, etc. • Challenge: QoS Issues • Raw bandwidth constraints • Example: PPLive utilizes the significant bandwidth disparity between “Univeristy nodes” and “Residential nodes” • Satisfying demand of content publishers

  38. P2P Live Streaming Can’t Stand on Its Own • P2P as a complement to IP-Multicast • Used where IP-Multicast isn’t enabled • P2P as a way to reduce server load • By sourcing parts of streams from peers, server load might be reduced by 10% • P2P as a way to reduce backbone bandwidth requirements • When core network bandwidth isn’t sufficient

  39. P2P and Net-Neutrality

  40. It’s All TCP’s Fault • TCP: per-flow fairness • Browsers • 2-4 TCP flows per web server • Contact a few web servers at a time • Short flows • P2P applications: • Much higher number of TCP connections • Many more endpoints • Long flows

  41. When and How to Apply Traffic Shaping • Current practice: application recognition • Needs: • An application ignostic way to trigger the traffic shaping • A clear statement to users on what happens

More Related