Improving Search in P2P Networks By Shadi Lahham
Purpose of This Lecture • General understanding of P2P systems • Appreciating the need for efficient search • Applying different search techniques to different scenarios Improving P2P Search
Table of Contents • P2P Basics: What Is P2P, Advantages of P2P, Types of P2P Systems, Shortcomings • Search Methods: The Search Problem, Current Methods, Suggested Methods • Experimental Setup: Metrics, Data Collection, Calculating Costs • Analysis of Results • Conclusions
Introduction P2P Basics
What is P2P • Distributed system • Peers (nodes) are servers and clients simultaneously • Peers have equal roles • Resources are shared across peers • No central server needed • Examples of P2P systems
P2P Overview [figure: table mapping keys to files: f1 → file1, f2 → file2, f3 → file3]
Advantages of P2P • P2P vs. Centralized Servers • Distributes disk space / bandwidth • Inexpensively scalable • Self-organized (autonomous) • Load balancing • Adaptive / fault tolerant • Less susceptible to attacks • Allows for redundancy
Types of P2P Systems • Hybrid (Napster) • Pure (Gnutella) • Super Peers (KaZaA)
Hybrid (Napster)
Pure (Gnutella)
Super Peers (KaZaA) • Make use of heterogeneity • Powerful peers serve as super-peers • Weaker peers act as clients • Super-peers index their clients’ files • Requires index updates on join / leave / update • Queries are handled at the super-peer level • Saves query costs
Super Peers (KaZaA)
Hybrid - Shortcomings • High cost on the centralized index • Performance & scalability bottleneck • Needs maintenance • Vulnerable: a highly visible target
Pure - Shortcomings • Inefficient search (flooding) • Heterogeneity of peers not considered • Bottlenecks (limited peers) • Fragmentation
Super Peers - Shortcomings • Super-peers might become bottlenecks for their clients • Requires redundancy • A bad selection of super-peers might cause even worse problems
The Search Problem • Connected graph • Might contain cycles • An individual node doesn’t know the network structure • It only knows its neighbors • It has no idea where data can be found
The Search Problem • Goal: find as many occurrences of the data as possible using minimal time and resources • Solution: • BFS? • Bounded BFS? • (naive approaches)
Bounded BFS Search [figure: query propagating hop by hop, labeled TTL=2, TTL=1, TTL=0]
Bounded BFS Search • Messages get a global TTL (time to live) • Algorithm: • Source broadcasts a message to a subset of neighbors • Neighbors search locally; results, if found, are sent to the source • TTL = TTL – 1 • As long as TTL > 0, nodes forward the message to their neighbors • Downside: wastes bandwidth and processing
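The TTL rule above can be sketched in a few lines of Python. This is a minimal illustration rather than Gnutella's actual protocol: the adjacency-dict `graph`, the `holders` set of nodes that have the data, and the choice to forward to every neighbor except the sender are all assumptions made for the example. Duplicate arrivals through cycles are dropped but still counted, which is where the wasted bandwidth shows up.

```python
from collections import deque

# Hypothetical overlay network (adjacency lists); note the cycle A-B-D-C-A.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def bounded_bfs(graph, source, holders, ttl):
    """Flood a query from `source`; return (nodes with hits, messages sent)."""
    seen = {source}
    hits = []
    messages = 0
    # Each queue entry is one message in flight: (receiver, sender, TTL).
    queue = deque((nbr, source, ttl) for nbr in graph[source])
    while queue:
        node, sender, t = queue.popleft()
        messages += 1              # every transmission costs bandwidth
        if node in seen:
            continue               # duplicate arriving via a cycle: dropped
        seen.add(node)
        if node in holders:        # local search; a hit is reported to the source
            hits.append(node)
        t -= 1                     # TTL = TTL - 1
        if t > 0:                  # forward only while TTL > 0
            for nbr in graph[node]:
                if nbr != sender:
                    queue.append((nbr, node, t))
    return hits, messages
```

With `holders = {"D", "E"}`, a TTL of 2 finds only `D` at a cost of 4 messages, while a TTL of 3 finds both at a cost of 6, two of which are duplicates.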
Current Methods • Gnutella - BFS • High cost • Gets complete results (for depth D) • Relatively short time • Freenet - DFS • Poor response time • Minimizes BW costs
Suggested Methods • Iterative deepening • Directed BFS • Local Indices
Iterative Deepening • Idea: • Search at a small depth and increase it if required • Aims to minimize the cost of BFS without detracting from its ability to satisfy queries • Note that, given enough iterations, this method returns 100% of the results of BFS
Iterative Deepening (cont…) • Elements: • Policies P={a,b,c,..} define the deepening behavior • BFS is run to depth a and frozen • If the source is satisfied, it stops the process • Otherwise it asks BFS to resume to depth b • The process is repeated until the source is satisfied or the last policy item is reached
Iterative Deepening (cont…) • Elements: • We can specify how long to wait between iterations • We need a system-wide message ID to identify individual messages
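As a rough sketch of the freeze-and-resume behavior described above, the following Python function keeps the BFS frontier between iterations instead of restarting the search. The adjacency-dict `GRAPH`, the `holders` set, and the `needed` satisfaction threshold are assumptions for illustration; the waiting time between iterations and the message IDs are left out.

```python
# Hypothetical overlay network used only for this example.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

def iterative_deepening(graph, source, holders, policy, needed):
    """Run BFS to each depth in `policy`; stop once `needed` hits are found.

    Returns (hits, depth at which the search stopped).
    """
    seen = {source}
    frontier = [source]   # the "frozen" edge of the search
    depth = 0
    hits = []
    for limit in policy:
        # Resume from the frozen frontier instead of starting over.
        while depth < limit and frontier:
            next_frontier = []
            for node in frontier:
                for nbr in graph[node]:
                    if nbr not in seen:
                        seen.add(nbr)
                        next_frontier.append(nbr)
                        if nbr in holders:
                            hits.append(nbr)
            frontier = next_frontier
            depth += 1
        if len(hits) >= needed:   # source is satisfied: stop the process
            return hits, limit
    return hits, policy[-1]       # last policy item reached
```

For example, with `holders = {"D", "E"}` and policy `[1, 2, 3]`, asking for one result stops at depth 2, while asking for two results continues to depth 3.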
Example P={1,3,4} W=1
Directed BFS • Idea: • Choose a subset of neighbors to query • Those neighbors will BFS as usual • Aims to balance good response time against result quality • Minimizes the costs of a full BFS • Note that only a subset of the possible results is returned, so we might fail to satisfy the query
Directed BFS Example [figure: query propagating hop by hop, labeled TTL=2, TTL=1, TTL=0]
Directed BFS (cont…) • But which neighbors to pick? • Maintain simple statistics on neighbors to derive heuristics • Highest past results • Lowest average hops • (close to nodes containing useful data) • High message count • (stable - can handle large flow) • Shortest message queue • (a long queue implies saturation) • More to come …
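One way to picture the heuristics listed above is as sort keys over per-neighbor statistics. The sketch below is illustrative only: the `NeighborStats` fields, the heuristic names, and the sample numbers are all hypothetical, not taken from any real client.

```python
from dataclasses import dataclass

@dataclass
class NeighborStats:
    """Hypothetical per-neighbor counters a node might keep."""
    name: str
    past_results: int    # results returned via this neighbor so far
    avg_hops: float      # average hops travelled by those results
    message_count: int   # messages received from this neighbor
    queue_length: int    # current length of its message queue

# Each heuristic is a sort key; smaller key means better neighbor.
HEURISTICS = {
    "most_results": lambda s: -s.past_results,
    "fewest_hops": lambda s: s.avg_hops,
    "most_messages": lambda s: -s.message_count,
    "shortest_queue": lambda s: s.queue_length,
}

def pick_neighbors(stats, heuristic, k):
    """Return the names of the k best neighbors under one heuristic."""
    ranked = sorted(stats, key=HEURISTICS[heuristic])
    return [s.name for s in ranked[:k]]

STATS = [
    NeighborStats("n1", past_results=40, avg_hops=3.0, message_count=900, queue_length=12),
    NeighborStats("n2", past_results=75, avg_hops=2.1, message_count=300, queue_length=4),
    NeighborStats("n3", past_results=10, avg_hops=1.5, message_count=1500, queue_length=30),
]
```

Different heuristics can disagree: with the sample data, "most_results" prefers `n2`, while "fewest_hops" and "most_messages" both prefer `n3`.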
Local Indices • Idea: • Each node holds metadata for all nodes within radius r • A query can be processed at only a few nodes yet return the same number of results • Aims to balance satisfaction against costs
Local Indices • Elements: • Policies P={a,b,c,..} define the depths at which we search • Example P={1,5,6} • Nodes at depth 1 process the query • Nodes at depths 2, 3, 4 forward without processing • Policy ends at depth 6 • System-wide radius r (small: ~50K metadata)
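The example policy above can be expanded mechanically into a per-depth role table. The sketch below also estimates which depths end up being searched for a given radius, under the assumption (not stated on the slide) that every node at a processing depth is reached and holds a complete index of the nodes within r hops; both function names are hypothetical.

```python
def depth_roles(policy):
    """Map each depth up to the last policy depth to 'process' or 'forward'."""
    last = max(policy)
    return {
        d: ("process" if d in policy else "forward")
        for d in range(1, last + 1)
    }

def covered_depths(policy, r):
    """Depths whose data is searched, assuming complete radius-r indices.

    A node at depth p that processes the query answers for nodes within
    r hops of itself, i.e. for depths p - r through p + r.
    """
    covered = set()
    for p in policy:
        covered.update(range(max(1, p - r), p + r + 1))
    return sorted(covered)
```

For P={1,5,6}, `depth_roles` marks depths 1, 5 and 6 as processing and depths 2 to 4 as forwarding. Under the stated assumption, `covered_depths` suggests that r = 1 would leave depth 3 unsearched, whereas r = 2 would cover depths 1 through 8.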
Example P={1,4} [figure legend: Process / Don’t process; r = ?]
Local Indices (cont…) • Note that there is now a maintenance overhead • On join • Send a join message with TTL = r • Direct exchange of metadata • On leave / timeout • Remove metadata of departed / dead nodes • On update • Send an update message with TTL = r
Metrics • How to compare methods? • Costs • Results • Time
Metrics 1. Costs • We do not base cost on a specific query but rather calculate the average cost over Q_rep, a representative set of real submitted queries • It makes sense to discuss costs in aggregate (i.e., over all the nodes in the network) • Therefore our two cost metrics are • Average aggregate bandwidth • Average aggregate processing cost
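To make "average aggregate" concrete: sum a query's cost over all nodes, then average those sums over the representative query set. The sketch below assumes a hypothetical input layout (one list of per-node costs per query) with made-up numbers.

```python
def average_aggregate_cost(per_query_node_costs):
    """Average, over queries, of the cost summed over all nodes.

    `per_query_node_costs[i][j]` is the cost query i incurs at node j
    (bandwidth or processing, in whatever unit is being measured).
    """
    aggregates = [sum(node_costs) for node_costs in per_query_node_costs]
    return sum(aggregates) / len(aggregates)

# Two queries measured on a three-node network (illustrative values only).
SAMPLE = [
    [1, 2, 3],   # aggregate cost 6
    [4, 5, 6],   # aggregate cost 15
]
```

Here `average_aggregate_cost(SAMPLE)` evaluates to 10.5, the mean of the two aggregate costs.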
Metrics 2. Results Quality • Number of results • Satisfaction 3. Time to satisfaction
Data Collection • Data gathered from the Gnutella network • Directly measured: • Iterative deepening • Directed BFS • Performance data & analysis: • Local indices
Data Collection Collected Data
Data Collection Extracted Data
Calculating Costs • We’ve seen two types of costs • Bandwidth (BW) costs • Processing costs • Calculations should take into account • Costs of sending a query • Costs of sending replies • An example of calculating BW costs
Calculating Costs
$$BW_{bfs}(Q) = \sum_{n=1}^{D} \Big( a(Q) \cdot \big( N(Q,n) + C(Q,n) \big) + n \cdot \big( c \cdot R(Q,n) + d \cdot M(Q,n) \big) \Big)$$
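The sum can be evaluated directly once the per-depth quantities are known. The following sketch simply transcribes the formula; the slide does not define the symbols, so the arguments are treated as opaque numbers and the sample values are invented purely to show the arithmetic.

```python
def bw_bfs(a, c, d, N, C, R, M):
    """Evaluate BW_bfs(Q) for one query.

    `a`, `c`, `d` are the scalar coefficients from the formula;
    `N`, `C`, `R`, `M` are lists indexed by depth, where position 0
    holds the value for depth n = 1 and the list length is D.
    """
    D = len(N)
    return sum(
        a * (N[n - 1] + C[n - 1]) + n * (c * R[n - 1] + d * M[n - 1])
        for n in range(1, D + 1)
    )

# Invented values for a search of depth D = 2.
example = bw_bfs(a=10, c=2, d=3, N=[2, 4], C=[0, 1], R=[1, 2], M=[1, 3])
```

With these numbers, depth 1 contributes 10·(2+0) + 1·(2·1 + 3·1) = 25 and depth 2 contributes 10·(4+1) + 2·(2·2 + 3·3) = 76, for a total of 101. The factor n on the reply term means replies from deeper nodes weigh more.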
Analysis of Results Iterative Deepening
Symbols Used
Results – Iterative Deepening • Recall that iterative deepening policies P={a,b,c,..} define the deepening behavior • To reach the same level of satisfaction as BFS, a policy must have D as its last depth • Also note the degenerate policy {D}, which is the bounded BFS we presented earlier
Results – Iterative Deepening • Variables • Define: P_d = {d, d+1, …, D} • P = {P_d for d = 1, 2, …, D} = {{1, 2, …, D}, {2, 3, …, D}, …, {D-1, D}, {D}} • W (waiting time) can take the values 1, 2, 4, 6, 150 (seconds)
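The family of policies defined above is easy to enumerate. A minimal sketch, with the function name chosen only for this example:

```python
def deepening_policies(D):
    """Return {d: P_d} where P_d = [d, d+1, ..., D] for d = 1..D."""
    return {d: list(range(d, D + 1)) for d in range(1, D + 1)}

policies = deepening_policies(3)
```

For D = 3 this yields P_1 = [1, 2, 3], P_2 = [2, 3] and P_3 = [3]; the last entry is the single-depth policy {D}.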
Results – Iterative Deepening • Fixed values: Z = 50, Ng = 8 • Increasing Z • Lower probability of satisfaction • Higher costs • More results • Decreasing Ng • Slightly lower probability of satisfaction • Significantly lower costs
Results – Iterative Deepening
Results – Iterative Deepening • BW costs are the same for P7 for all values of W • As d increases, costs increase: the larger d is, the more likely the policy will “overshoot” • As W decreases, costs increase: with a small W, a premature determination that the query is unsatisfied again leads to overshooting
Results – Iterative Deepening