
Search and Replication in Unstructured Peer-to-Peer Networks

Search and Replication in Unstructured Peer-to-Peer Networks. Pei Cao, Christine Lv., Edith Cohen, Kai Li and Scott Shenker ICS 2002. Outline. Brief survey of P2P architectures Evaluation Methodology Search Methods Replication Conclusions. Peer-to-Peer Networks.


Presentation Transcript


  1. Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao, Christine Lv., Edith Cohen, Kai Li and Scott Shenker ICS 2002

  2. Outline • Brief survey of P2P architectures • Evaluation Methodology • Search Methods • Replication • Conclusions

  3. Peer-to-Peer Networks • Peers are connected by an overlay network. • Users cooperate to share files (e.g., music, videos, etc.) • Dynamic: nodes join or leave frequently

  4. P2P Network Architectures I • Centralized: • Use of a central directory server (CDS) • Peers query the CDS to find other peers that hold the desired object • Pros: very efficient • Cons: scales poorly; single point of failure

  5. P2P Network Architectures II • Decentralized: no central directory server • Structured: • P2P network topology is tightly controlled • Files are placed at specified locations • Unstructured: • No control over network topology or file placement

  6. P2P Network Architectures III Decentralized but Structured • “Loosely structured” • Placement of files is based on hints • “Tightly structured” • The structure of the P2P network and the placement of files are precisely specified • Use of a distributed hash table • Pros: queries are satisfied efficiently; scales well • Cons: no proof that it works

  7. P2P Network Architectures IV Decentralized and Unstructured • Placement of files is not based on knowledge of the topology • Finding files • A node queries its neighbors (usually by flooding) • Pros: extremely resilient to network changes • Cons: scales very poorly; generates large loads

  8. Evaluation Methodology I Terminology • Network Topology: the graph formed by the nodes in the network at a given instant • Query Distribution: frequency of lookups for each file • Replication Distribution: percentage of nodes that have a particular file

  9. Evaluation Methodology II • Network Topologies • Power-Law Random Graph (PLRG) • Max node degree: 1746, median: 1, average: 4.46 • Normal Random Graph (Random) • Average and median node degree are both 4 • Gnutella graph (Gnutella) • Oct 2000 snapshot • Max degree: 136, median: 2, average: 5.5 • Two-dimensional Grid • 100x100 = 10,000 nodes

  10. Evaluation Methodology III • Object query distribution qi • Uniform • Zipf-like • Object replication density distribution ri • Uniform • Proportional: ri ∝ qi • Square-Root: ri ∝ √qi
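
The distributions named on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the Zipf-like exponent `alpha`, the object count, and the function names are hypothetical, and ri is treated here as each object's normalized share of the total replica storage rather than a literal percentage of nodes.

```python
import math

def zipf_like(m, alpha=1.0):
    """Query rates q_i proportional to 1 / i**alpha, normalized to sum to 1.

    alpha is an assumed parameter; the slides only say "Zipf-like".
    """
    weights = [1.0 / (i ** alpha) for i in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def replication(q, strategy):
    """Share of total replica storage r_i given to each object."""
    if strategy == "uniform":
        weights = [1.0] * len(q)
    elif strategy == "proportional":
        weights = list(q)
    elif strategy == "square-root":
        weights = [math.sqrt(x) for x in q]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    total = sum(weights)
    return [w / total for w in weights]

q = zipf_like(100)
for name in ("uniform", "proportional", "square-root"):
    r = replication(q, name)
    print(f"{name:12s} most popular: {r[0]:.4f}  least popular: {r[-1]:.4f}")
```

Printing the shares of the most and least popular objects shows how square-root replication sits between the other two strategies.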

  11. Evaluation Methodology IV • Metrics • User aspects • Pr(success) • #hops • Load aspects • Average #messages per node • #nodes visited • Peak #messages

  12. Limitation of Flooding I • Gnutella uses a TTL to limit the number of hops a query travels • Problem: • Hard to choose the TTL: • For objects that are widely present in the network, small TTLs suffice • For objects that are rare in the network, large TTLs are necessary • The number of query messages grows exponentially as the TTL grows

  13. Limitation of Flooding II • A node may receive the same message more than once • Duplicate-detection mechanisms are needed • Even so, duplication increases as the TTL increases in flooding
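
To make the two flooding slides concrete, here is a minimal sketch of TTL-limited flooding with duplicate detection on a small grid. It is not Gnutella's actual protocol code: the `flood` and `grid` helpers, the adjacency-dict graph representation, and the grid size are all assumptions made for illustration.

```python
def grid(n):
    """n x n grid; each node links to its horizontal and vertical neighbors."""
    g = {}
    for x in range(n):
        for y in range(n):
            g[(x, y)] = [(x + dx, y + dy)
                         for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                         if 0 <= x + dx < n and 0 <= y + dy < n]
    return g

def flood(graph, source, ttl):
    """Flood a query from source for ttl hops.

    Returns (nodes_reached, messages_sent, duplicate_messages).
    """
    seen = {source}
    frontier = [(source, None)]  # (node, neighbor the query arrived from)
    messages = 0
    duplicates = 0
    for _ in range(ttl):
        next_frontier = []
        for node, parent in frontier:
            for nbr in graph[node]:
                if nbr == parent:
                    continue  # never send the query back where it came from
                messages += 1
                if nbr in seen:
                    duplicates += 1  # receiver detects and drops the repeat
                else:
                    seen.add(nbr)
                    next_frontier.append((nbr, node))
        frontier = next_frontier
    return len(seen), messages, duplicates

g = grid(21)
for ttl in range(1, 8):
    reached, messages, duplicates = flood(g, (10, 10), ttl)
    print(f"TTL={ttl}: reached={reached} messages={messages} duplicates={duplicates}")
```

Even with duplicate detection, each repeated query still costs a message before the receiver drops it, which is why the duplicate count keeps rising with the TTL.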

  14. Limitation of Flooding Conclusion • Flooding increases per-node overhead • Need for more scalable search methods: • Expanding Ring • Random Walks

  15. Expanding Ring • Adaptively adjust the TTL • Multiple floods: start with TTL=1; increment the TTL by 2 each time until the search succeeds • Duplicate messages still occur
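
The expanding-ring idea can be sketched as a loop of successive floods. This is a simplified illustration, not the authors' simulator: `flood_search`, the `holders` set of nodes storing the object, and the `max_ttl` cutoff are hypothetical, and for brevity every forwarded copy is counted as a message, including copies sent back toward the sender.

```python
def flood_search(graph, source, holders, ttl):
    """Flood up to ttl hops; return (found, messages_sent)."""
    seen = {source}
    frontier = [source]
    messages = 0
    found = source in holders
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            for nbr in graph[node]:
                messages += 1  # simplification: count every forwarded copy
                if nbr not in seen:
                    seen.add(nbr)
                    next_frontier.append(nbr)
                    if nbr in holders:
                        found = True
        frontier = next_frontier
    return found, messages

def expanding_ring(graph, source, holders, start_ttl=1, step=2, max_ttl=9):
    """Repeat floods with TTL = 1, 3, 5, ... until the object is found.

    Returns (ttl_that_succeeded_or_None, total_messages_over_all_floods).
    max_ttl is an assumed cutoff so the loop ends when no copy exists.
    """
    total = 0
    ttl = start_ttl
    while ttl <= max_ttl:
        found, messages = flood_search(graph, source, holders, ttl)
        total += messages
        if found:
            return ttl, total
        ttl += step
    return None, total

line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(expanding_ring(line, 0, {1}))  # object one hop away
print(expanding_ring(line, 0, {3}))  # object three hops away
```

Each new flood restarts from the source, so nodes near the originator see the query again on every round; that repeated work is the price paid for not having to guess the TTL in advance.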

  16. Random Walk • Simple random walk • Takes too long to find anything • Multiple-walker random walk • K walkers, each taking T steps, visit about as many nodes as one walker taking K*T steps • More messages → more overhead • When to terminate the search: • TTL • Checking: walkers check back with the query originator once every C steps
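
A multiple-walker random walk with both termination rules might look like the sketch below. It rests on several assumptions: the function name and defaults are hypothetical, every node has at least one neighbor, each checkpoint costs one message per walker, and a walker that finds the object simply keeps walking until the next checkpoint instead of stopping at once.

```python
import random

def k_walker_search(graph, source, holders, k=32, check_every=4, ttl=1024, rng=None):
    """Run k random walkers from source; return (found, messages_sent).

    Walkers stop at the first checkpoint after any of them has found the
    object, or when the TTL expires.
    """
    rng = rng or random.Random()
    if source in holders:
        return True, 0
    walkers = [source] * k
    messages = 0
    found = False
    for step in range(1, ttl + 1):
        for i, node in enumerate(walkers):
            walkers[i] = rng.choice(graph[node])  # one hop to a random neighbor
            messages += 1
            if walkers[i] in holders:
                found = True
        if step % check_every == 0:
            messages += len(walkers)  # each walker checks back with the originator
            if found:
                return True, messages
    return found, messages

# Ring of 200 nodes with a single copy of the object on the far side.
ring = {i: [(i - 1) % 200, (i + 1) % 200] for i in range(200)}
found, messages = k_walker_search(ring, 0, {100}, rng=random.Random(1))
print(found, messages)
```

Raising `check_every` lowers the checking overhead but lets walkers run longer after the object has already been found.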

  17. Search Traffic Comparison

  18. Search Delay Comparison

  19. Lessons Learned about Search Methods • Key: cover the right number of nodes as quickly as possible and with as little overhead as possible • Pay attention to: • Adaptive termination • Minimizing message duplication • Small expansion in each step

  20. Replication • In unstructured P2P systems, search success is essentially about coverage: visiting enough nodes to find the object => replication density matters • Goal: minimize the average search size (number of probes until the query is satisfied) • Theoretical optimum: copy everything everywhere • But node storage is limited

  21. Replication Strategies • Uniform Replication • pi = 1/m • Simple, resources are divided equally • Proportional Replication • pi = qi • “Fair”, resources per item proportional to demand • Reflects current P2P practices

  22. Square-Root Replication • pi is proportional to square-root(qi) • Lies “In-between” Uniform and Proportional
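
The three strategies can be compared numerically with a rough model. Assume each probe lands on a uniformly random node and object i is held by a fraction of nodes proportional to pi; the expected number of probes for object i is then proportional to 1/pi, and the average search size is proportional to the sum of qi/pi. The helper names, the 100 objects, and the Zipf exponent of 1 below are illustrative assumptions.

```python
import math

def zipf_like(m, alpha=1.0):
    """Query rates q_i proportional to 1 / i**alpha, normalized to sum to 1."""
    weights = [1.0 / (i ** alpha) for i in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def normalize(weights):
    """Scale weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def avg_search_size(q, p):
    """Sum of q_i / p_i: average search size up to a constant factor."""
    return sum(qi / pi for qi, pi in zip(q, p))

m = 100
q = zipf_like(m)
strategies = {
    "uniform": normalize([1.0] * m),
    "proportional": normalize(q),
    "square-root": normalize([math.sqrt(x) for x in q]),
}
for name, p in strategies.items():
    print(f"{name:12s} {avg_search_size(q, p):8.2f}")
```

Under this model, uniform and proportional replication both give a value of m, while square-root replication gives the square of the sum of √qi, which is never larger than m.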

  23. Achieving Square-Root Replication I • Assume that each query keeps track of the number of probes it needed • Store the object at a number of nodes proportional to the number of probes • Two implementations: • Path replication: store the object along the path of a successful “walk” • Random replication: store the object randomly among the nodes visited by the agents
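
The two implementations differ only in where the new copies go. The sketch below shows one possible reading: the `store` dict (node to set of objects), the function names, and the choice of the path length as the number of copies for random replication are assumptions, not details given on the slide.

```python
import random

def path_replication(store, obj, path):
    """Store obj at every node on the successful walker's path."""
    for node in path:
        store.setdefault(node, set()).add(obj)

def random_replication(store, obj, visited, count, rng=None):
    """Store obj at `count` nodes drawn at random from all visited nodes."""
    rng = rng or random.Random()
    candidates = list(visited)
    for node in rng.sample(candidates, min(count, len(candidates))):
        store.setdefault(node, set()).add(obj)

# One successful walk of 3 hops; all walkers together visited 8 nodes.
successful_path = [4, 9, 2]
all_visited = {1, 2, 3, 4, 5, 7, 8, 9}

store_a = {}
path_replication(store_a, "song.mp3", successful_path)

store_b = {}
random_replication(store_b, "song.mp3", all_visited,
                   count=len(successful_path), rng=random.Random(0))

print(sorted(store_a))  # copies placed along the path
print(sorted(store_b))  # same number of copies, spread over visited nodes
```

Both variants create the same number of copies per query, so both tie the replica count to the number of probes.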

  24. Achieving Square-Root Replication II

  25. Evaluation of Replication Methods I • Metrics • Overall message traffic • Search delay • Dynamic simulation • Assume Zipf-like object query probability • Poisson query arrivals at 5 queries/sec • Results are reported for the interval from 5000 sec to 9000 sec • Search method: 32-walker random walk with state keeping, checking every 4 steps

  26. Evaluation of Replication Methods II Square-Root Replication reduces search traffic

  27. Evaluation of Replication Methods III

  28. Conclusions • Multi-walker random walk scales much better than flooding • Can find data more quickly • Reduces the traffic overload • Square-root replication distribution is desirable • Minimizes search delay • Minimizes the overall search traffic
