Search and Replication in Unstructured Peer-to-Peer Networks • Pei Cao, Christine Lv, Edith Cohen, Kai Li, and Scott Shenker • ICS 2002
Outline • Brief survey of P2P architectures • Evaluation Methodology • Search Methods • Replication • Conclusions
Peer-to-Peer Networks • Peers are connected by an overlay network • Users cooperate to share files (e.g., music and videos) • Dynamic: nodes join and leave frequently
P2P Network Architectures I • Centralized: • Uses a central directory server (CDS) • Peers query the CDS to find other peers that hold the desired object • Pros: very efficient • Cons: scales poorly; single point of failure
P2P Network Architectures II • Decentralized: no central directory server • Structured: • P2P network topology is tightly controlled • Files are placed at specified locations • Unstructured: • No control over network topology or file placement
P2P Network Architectures III Decentralized but Structured • “Loosely structured”: • Placement of files is based on hints • “Tightly structured”: • Both the structure of the P2P network and the placement of files are precisely specified • Uses a distributed hash table • Pros: queries are satisfied efficiently; scales well • Cons: not yet proven to work in practice
P2P Network Architectures IV Decentralized and Unstructured • File placement is not based on any knowledge of the topology • Finding files: • A node queries its neighbors (usually by flooding) • Pros: extremely resilient to network changes • Cons: extremely unscalable; generates large loads
Evaluation Methodology I Terminology • Network topology: the graph formed by the nodes in the network at a given instant • Query distribution: how frequently each file is looked up • Replication distribution: the percentage of nodes that hold a particular file
Evaluation Methodology II • Network Topologies • Power-Law Random Graph (PLRG) • Max node degree: 1746, median: 1, average: 4.46 • Normal Random Graph (Random) • Average and median node degree: 4 • Gnutella graph (Gnutella) • Oct 2000 snapshot • Max degree: 136, median: 2, average: 5.5 • Two-dimensional Grid • 100x100 (10,000 nodes)
Evaluation Methodology III • Object query distribution $q_i$ • Uniform • Zipf-like • Object replication density distribution $r_i$ • Uniform • Proportional: $r_i \propto q_i$ • Square-root: $r_i \propto \sqrt{q_i}$
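The query and replication distributions above can be sketched in a few lines. This is a minimal illustration, not the authors' simulator: the function names `zipf_like` and `replication` and the exponent `alpha` are assumptions made for the example, and `replication` returns each object's normalized share of copies rather than an absolute density.

```python
def zipf_like(m, alpha=1.0):
    """Query rates q_i proportional to 1 / i**alpha, normalized to sum to 1.

    alpha is an assumed parameter; the slides only say "Zipf-like".
    """
    weights = [1.0 / (i ** alpha) for i in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def replication(q, strategy):
    """Share of all copies given to each object under one strategy."""
    if strategy == "uniform":
        weights = [1.0] * len(q)
    elif strategy == "proportional":
        weights = list(q)
    elif strategy == "square-root":
        weights = [qi ** 0.5 for qi in q]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    q = zipf_like(5)
    for name in ("uniform", "proportional", "square-root"):
        print(name, [round(r, 3) for r in replication(q, name)])
```

Normalizing the shares makes the three strategies directly comparable: each one spends the same total storage and only divides it differently.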
Evaluation Methodology IV • Metrics • User aspects • Pr(success) • #hops • Load aspects • Average #messages per node • #nodes visited • Peak #messages
Limitation of Flooding I • Gnutella uses a TTL (time-to-live) to limit the number of hops a query travels • Problem: • Hard to choose the TTL: • For objects that are widely present in the network, small TTLs suffice • For objects that are rare in the network, large TTLs are necessary • The number of query messages grows exponentially as the TTL grows
Limitation of Flooding II • A node may receive the same message more than once • Duplicate detection mechanisms are needed • Even so, the number of duplicate messages increases as the TTL increases
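The growth in messages and duplicates can be made concrete with a small sketch of TTL-limited flooding. It assumes the overlay is given as an adjacency dict, that a node forwards a query only on first receipt (duplicate detection), and that it never sends the query back to the neighbor it came from; the function name `flood` is hypothetical.

```python
def flood(graph, source, ttl):
    """Flood a query from source; return (nodes_reached, messages, duplicates).

    graph maps each node to a list of its neighbors (an assumed representation).
    A message counts as a duplicate when it arrives at a node that has
    already seen the query.
    """
    seen = {source}
    frontier = [(source, None)]  # (node, neighbor the query arrived from)
    messages = 0
    duplicates = 0
    for _ in range(ttl):
        next_frontier = []
        for node, sender in frontier:
            for nbr in graph[node]:
                if nbr == sender:
                    continue  # do not echo the query back to the sender
                messages += 1
                if nbr in seen:
                    duplicates += 1  # received, detected, and dropped
                else:
                    seen.add(nbr)
                    next_frontier.append((nbr, node))
        frontier = next_frontier
    return len(seen), messages, duplicates


if __name__ == "__main__":
    ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    for ttl in (1, 2, 3):
        print(ttl, flood(ring, 0, ttl))
```

On the four-node ring in the usage example, raising the TTL past 2 reaches no new nodes and only adds duplicate messages.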
Limitation of Flooding Conclusion • Flooding increases per-node overhead • Need for more scalable search methods: • Expanding Ring • Random Walks
Expanding Ring • Adaptively adjusts the TTL • Multiple floods: start with TTL=1; increment the TTL by 2 each time until the search succeeds • Still produces duplicate messages
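A minimal sketch of the expanding-ring idea follows, assuming an adjacency-dict graph and a set `holders` of nodes that store the object. The names `nodes_within` and `expanding_ring` and the `max_ttl` cutoff are assumptions for illustration; each flood is modeled simply as "all nodes within TTL hops".

```python
from collections import deque


def nodes_within(graph, source, ttl):
    """Return the set of nodes at most ttl hops from source (one flood)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == ttl:
            continue  # TTL exhausted on this branch
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


def expanding_ring(graph, source, holders, start_ttl=1, step=2, max_ttl=9):
    """Repeat floods with a growing TTL; return (ttl_that_succeeded, floods).

    Returns (None, floods) if no holder is found within max_ttl (assumed cutoff).
    """
    ttl = start_ttl
    floods = 0
    while ttl <= max_ttl:
        floods += 1
        if nodes_within(graph, source, ttl) & holders:
            return ttl, floods
        ttl += step
    return None, floods


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    print(expanding_ring(path, 0, {4}))
```

Each new flood restarts from the source, so nodes close to the originator see the query again on every round, which is one reason duplicates remain.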
Random Walk • Simple random walk • Takes too long to find anything • Multiple-walker random walk • K walkers, each taking T steps, visit about as many nodes as 1 walker taking K*T steps • More messages, more overhead • When to terminate the search: • TTL • Checking: each walker checks back with the query originator once every C steps
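The multiple-walker search with both termination rules can be sketched as below. This is a simplified model under stated assumptions: all walkers move in lockstep, a walker that finds the object keeps walking until the next checkpoint, and the check-back messages themselves are not counted. The function name and the default `ttl` are hypothetical.

```python
import random


def k_walker_search(graph, source, holders, k=32, check_every=4,
                    ttl=1024, rng=None):
    """Run k random walkers from source; return (step_found, messages).

    step_found is None if no walker reached a holder within ttl steps.
    Walkers learn that the query is satisfied only at checkpoints, i.e.
    at steps that are multiples of check_every.
    """
    rng = rng or random.Random()
    walkers = [source] * k
    messages = 0
    found_at = None
    for step in range(1, ttl + 1):
        for i, node in enumerate(walkers):
            nxt = rng.choice(graph[node])  # forward to one random neighbor
            walkers[i] = nxt
            messages += 1
            if found_at is None and nxt in holders:
                found_at = step
        if found_at is not None and step % check_every == 0:
            break  # checkpoint: the originator reports success, walkers stop
    return found_at, messages


if __name__ == "__main__":
    ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
    print(k_walker_search(ring, 0, {5}, k=4, rng=random.Random(1)))
```

The gap between `found_at` and the step at which the loop exits shows the cost of checking: a larger C means fewer check-back messages but more wasted steps after the object has been found.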
Lessons Learned about Search Methods • Key: cover the right number of nodes as quickly as possible and with as little overhead as possible • Pay attention to: • Adaptive termination • Minimizing message duplication • Small expansion in each step
Replication • In unstructured P2P systems, search success is essentially about coverage: visiting enough nodes to find the object => replication density matters • Goal: minimize the average search size (number of probes until the query is satisfied) • Theoretical optimum: copy everything everywhere • Not feasible in practice: node storage is limited
Replication Strategies • Uniform replication • $p_i = 1/m$ • Simple; resources are divided equally • Proportional replication • $p_i = q_i$ • “Fair”: resources per item are proportional to demand • Reflects current P2P practice
Square-Root Replication • $p_i$ is proportional to $\sqrt{q_i}$ • Lies “in between” uniform and proportional replication
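A short calculation suggests why the square-root allocation is attractive. It rests on an assumption that is not stated on the slides: probes hit nodes at random, so the expected search size for object $i$ is inversely proportional to its share of copies $p_i$. Here $m$ is taken to be the number of objects and $A$ denotes the average search size up to a constant factor.

```latex
\[
A(p) \;=\; \sum_{i=1}^{m} \frac{q_i}{p_i},
\qquad \sum_{i=1}^{m} p_i = 1, \quad \sum_{i=1}^{m} q_i = 1 .
\]
\[
\text{Uniform: } p_i = \tfrac{1}{m} \;\Rightarrow\; A = m \sum_{i} q_i = m,
\qquad
\text{Proportional: } p_i = q_i \;\Rightarrow\; A = \sum_{i} 1 = m .
\]
\[
\text{Square-root: } p_i = \frac{\sqrt{q_i}}{\sum_{j} \sqrt{q_j}}
\;\Rightarrow\;
A = \Bigl(\sum_{i} \sqrt{q_i}\Bigr)^{2} \;\le\; m .
\]
```

The final inequality is Cauchy-Schwarz, with equality only when all $q_i$ are equal. Under this model, uniform and proportional replication give the same average search size, and the square-root allocation does no worse than either.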
Achieving Square-Root Replication I • Assume that each query keeps track of the number of probes it needed • Store the object at a number of nodes proportional to the number of probes • Two implementations: • Path replication: store the object along the path of a successful “walk” • Random replication: store the object at randomly chosen nodes among those visited by the agents
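The two implementations can be sketched as follows. The sketch assumes node storage is a dict mapping node ids to sets of objects, that `path` is the list of nodes on the successful walk, and that `visited` is the set of all nodes touched by the walkers; these representations and both function names are hypothetical, and storage limits and eviction are ignored.

```python
import random


def path_replication(storage, path, obj):
    """Store obj at every node on the successful walk's path.

    Returns the number of nodes on the path (the number of copies placed).
    """
    for node in path:
        storage.setdefault(node, set()).add(obj)
    return len(path)


def random_replication(storage, visited, count, obj, rng=None):
    """Store obj at `count` nodes sampled from all nodes the walkers visited.

    `count` would typically be the length of the successful path, so both
    methods place the same number of copies per query.
    """
    rng = rng or random.Random()
    candidates = sorted(visited)  # random.sample needs a sequence, not a set
    chosen = rng.sample(candidates, min(count, len(candidates)))
    for node in chosen:
        storage.setdefault(node, set()).add(obj)
    return chosen


if __name__ == "__main__":
    store = {}
    n = path_replication(store, [0, 3, 5], "song")
    print(n, store)
    print(random_replication({}, {1, 2, 3, 4, 5}, n, "song", random.Random(0)))
```

In both cases the number of new copies tracks the search length, so objects that take longer to find gain copies faster.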
Evaluation of Replication Methods I • Metrics • Overall message traffic • Search delay • Dynamic simulation • Assume a Zipf-like object query probability • Queries arrive as a Poisson process at 5 queries/sec • Results are collected from 5000 sec to 9000 sec • Search method: 32-walker random walk with state keeping, checking every 4 steps
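A query workload of the kind described above can be generated with the standard library. This is a sketch only: the function name `query_workload`, the Zipf exponent `alpha`, and the fixed seed are assumptions for illustration, and objects are identified by their popularity rank starting at 0.

```python
import random


def query_workload(m, rate=5.0, duration=10.0, alpha=1.2, seed=0):
    """Return a list of (arrival_time, object_rank) query events.

    Arrivals form a Poisson process with `rate` queries per second, drawn
    as exponential inter-arrival gaps. Objects follow a Zipf-like law with
    weight 1 / rank**alpha (alpha is an assumed value).
    """
    rng = random.Random(seed)
    weights = [1.0 / (i ** alpha) for i in range(1, m + 1)]
    events = []
    t = 0.0
    while True:
        t += rng.expovariate(rate)  # exponential gap => Poisson arrivals
        if t >= duration:
            break
        obj = rng.choices(range(m), weights=weights)[0]
        events.append((t, obj))
    return events


if __name__ == "__main__":
    events = query_workload(100, duration=20.0)
    print(len(events), events[:3])
```

Measuring only over a late window, such as 5000 to 9000 seconds of simulated time, lets the replica distribution settle before results are collected.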
Evaluation of Replication Methods II • Square-root replication reduces search traffic
Conclusions • Multi-walker random walk scales much better than flooding • Can find data more quickly • Reduces the traffic load • A square-root replication distribution is desirable • Minimizes search delay • Minimizes the overall search traffic