Improving Search in Peer-to-Peer Networks
Beverly Yang, Hector Garcia-Molina
Presented by Shreeram Sahasrabudhe (sas4@lehigh.edu)
Goals
• Reduce the number of nodes that have to handle a query.
• Three search techniques:
  • Iterative Deepening
  • Directed BFS
  • Local Indices
• Evaluation and extensive measurements of these techniques on the Gnutella network.
• Ready-to-use results and recommendations.
Current Techniques
• Gnutella: Breadth First Search (BFS) with depth limit D (typically 7)
  • Disadvantages
    • Wastes resources
    • Inefficient
• Freenet: Depth First Search (DFS)
  • Disadvantages
    • Poor response time
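Iterative Deepening
• Required
  • System-wide policy P = {a, b, c}
  • Time between successive iterations, W
• [Figure: with P = {a, b, c}, source S sends the query to depth a, where it is frozen. S waits for time W, then sends a Resend message (TTL a, plus the query_id). Nodes holding the frozen query forward it with TTL b-a, and so on.]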
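The iteration scheme can be sketched as a small simulation. This is a minimal illustration, not the presenters' or the paper's code: the graph representation, the `has_result` predicate, and both function names are assumptions, and the wait W and the Freeze/Resend messages are modeled only by counting each node once across iterations.

```python
from collections import deque


def nodes_within(graph, source, depth):
    """Return {node: hops} for every node within `depth` hops of `source`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == depth:
            continue  # do not expand past the depth limit
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def iterative_deepening(graph, source, has_result, policy, wanted):
    """Search at the successive depths in `policy` until `wanted` results are found.

    Returns (results, depth_reached, nodes_processed).
    """
    processed = set()
    results = []
    depth = 0
    for depth in policy:
        reached = nodes_within(graph, source, depth)
        # A node that handled the query in an earlier iteration is not
        # charged again: the frozen query is resumed, not restarted.
        for node in sorted(reached):
            if node != source and node not in processed:
                processed.add(node)
                if has_result(node):
                    results.append(node)
        if len(results) >= wanted:
            break  # query satisfied, no deeper iteration needed
        # A real client would wait for time W here before sending Resend.
    return results, depth, len(processed)
```

With a policy such as `[1, 2, 3]`, a query that is satisfied at depth 2 never reaches the nodes at depth 3, which is where the savings over a fixed-depth BFS come from.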
Directed BFS
• Send queries to a subset of neighbors only
• The subset is selected by heuristics such as: select the neighbor …
  • That has returned the highest number of results for previous queries
  • Whose response messages have taken the lowest average number of hops
  • That has forwarded the most messages to our client
  • That has the shortest message queue
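The selection step can be sketched as a ranking over per-neighbor statistics. The `NeighborStats` fields, the heuristic names, and `select_neighbors` are hypothetical names chosen for this illustration; they only mirror the four heuristics listed on the slide.

```python
from dataclasses import dataclass


@dataclass
class NeighborStats:
    """Statistics a client might keep per neighbor (illustrative fields)."""
    name: str
    results_returned: int = 0    # results received through this neighbor
    total_hops: int = 0          # summed hop counts of its responses
    responses_seen: int = 0      # number of responses received
    messages_forwarded: int = 0  # messages it has forwarded to us
    queue_length: int = 0        # current length of its message queue

    def avg_hops(self):
        if self.responses_seen == 0:
            return float("inf")  # no data yet: rank last
        return self.total_hops / self.responses_seen


# Each heuristic maps a neighbor to a sort key; smaller keys rank first.
HEURISTICS = {
    "most_results": lambda s: -s.results_returned,
    "fewest_hops": lambda s: s.avg_hops(),
    "most_forwarded": lambda s: -s.messages_forwarded,
    "shortest_queue": lambda s: s.queue_length,
}


def select_neighbors(stats, heuristic, k=1):
    """Return the names of the k neighbors ranked best by `heuristic`."""
    ranked = sorted(stats, key=HEURISTICS[heuristic])
    return [s.name for s in ranked[:k]]
```

For example, `select_neighbors(stats, "fewest_hops")` returns the single neighbor whose past responses arrived over the fewest hops on average; the query is then sent only to that neighbor.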
Local Indices
• Each node n maintains an index of the data held by the nodes within r hops
• So a node can process a query on behalf of every node within r hops
• Small r = less storage (e.g. 70 KB for r = 1)
• [Figure: with policy P = {1, 5}, only the nodes at depths 1 and 5 from source S process the query; nodes at depths 2, 3 and 4 do not.]
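A local index of radius r can be sketched as a bounded BFS that collects the file lists of nearby nodes. The dictionary-based `graph` and `files` structures and both function names are assumptions made for this sketch.

```python
from collections import deque


def build_local_index(graph, files, node, r):
    """Map each file name to the owners within r hops of `node` (itself included)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        if dist[cur] == r:
            continue  # stop expanding at the index radius
        for nbr in graph[cur]:
            if nbr not in dist:
                dist[nbr] = dist[cur] + 1
                queue.append(nbr)
    index = {}
    for owner in dist:
        for name in files.get(owner, ()):
            index.setdefault(name, set()).add(owner)
    return index


def answer_from_index(index, name):
    """Answer a query locally on behalf of every indexed node."""
    return sorted(index.get(name, ()))
```

A node holding such an index can return every match within r hops without forwarding the query to those nodes, which is why only a few depths need to process it.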
More work
• Node Join
  • The joining node sends a join message with a TTL of r, containing metadata over its collection
  • A node receiving a join message sends back a return join message with its own metadata
• Periodic refreshes
• Cost ??
  • QueryJoinRatio = average ratio of queries to join messages
  • QueryUpdateRatio = average ratio of queries to update messages
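The join exchange can be sketched as follows. This is an illustration under assumed data structures: `indices` maps each node to a dictionary of `{owner: set of files}`, and `peers_within` and `handle_join` are hypothetical helper names. Message forwarding and refreshes are not modeled.

```python
from collections import deque


def peers_within(graph, start, r):
    """Return the set of nodes reachable from `start` in at most r hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if dist[cur] == r:
            continue  # the join message's TTL is exhausted here
        for nbr in graph[cur]:
            if nbr not in dist:
                dist[nbr] = dist[cur] + 1
                queue.append(nbr)
    return set(dist) - {start}


def handle_join(graph, files, indices, new_node, r):
    """Exchange metadata between `new_node` and every peer within r hops."""
    indices.setdefault(new_node, {})
    peers = peers_within(graph, new_node, r)
    for peer in peers:
        # Join message: the peer indexes the newcomer's collection.
        indices.setdefault(peer, {})[new_node] = set(files.get(new_node, ()))
        # Return join message: the newcomer indexes the peer's collection.
        indices[new_node][peer] = set(files.get(peer, ()))
    return peers
```

Each join touches every node within r hops, so the maintenance cost grows with r and with how often nodes join relative to how often they query.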
Experiment
• Data Collection
  • Observed Gnutella network traffic for 1 month
  • Determined general statistics such as the average number of files shared per user, query strings, etc.
• Iterative Deepening
  • For each query Q sent: logged the response messages arriving within 2 minutes
  • Sent Ping messages to all neighbors: logged hops and IP address
  • Same data used for Local Indices
• Directed BFS
  • Same as above, but each query was sent to a single node
Cost
• Bandwidth cost in BFS, in terms of:
  • Nodes at depth n
  • Size of the query message
  • Redundant edges between depths n-1 and n
  • Response messages from nodes at depth n
  • Total records
  • Size of a record
  • Size of the header
• Processing cost
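The formula itself is not present in the slide text. One plausible way to combine the quantities named on this slide is sketched below. The symbols are assumptions introduced here: N_n for nodes at depth n, C_n for redundant edges between depths n-1 and n, M_n for response messages from depth n, L_n for total records in those responses, a for the query message size, b for the record size, c for the header size, and D for the depth limit. The factor n assumes each response travels n hops back to the source.

```latex
\mathrm{BW}_{\mathrm{BFS}}
  = \sum_{n=1}^{D} \Big[ \underbrace{a\,(N_n + C_n)}_{\text{query messages}}
  + \underbrace{n\,(c\,M_n + b\,L_n)}_{\text{responses returned over } n \text{ hops}} \Big]
```

Under this reading, redundant edges add query traffic without adding any new nodes, and responses from deep nodes are the most expensive because they cross the most hops.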
Results
• Iterative Deepening
  • Neighbors = 8
  • Desired number of results Z = 50
  • Policies: P_d = {d, d+1, …, D} for d = 1, 2, 3, …, D
  • Cost depends on d and on W (“overshooting”)
  • Time depends on W and on d
Directed BFS
• Studied 8 heuristics
• ‘Random neighbor’ is the baseline for comparison
• [Figure: cost comparison]
Conclusions
• Three new search techniques specified and tested.
• Recommendation: Local Indices with r = 1. Savings: 61% bandwidth, 49% processing.