Improving Search in P2P Networks

Improving Search in P2P Networks By Shadi Lahham

Purpose of This Lecture
• General understanding of P2P systems
• Appreciating the need for efficient search
• Applying different search techniques to different scenarios
Improving P2P Search

Table Of Contents
• P2P Basics
  • What Is P2P
  • Advantages of P2P
  • Types of P2P Systems
  • Shortcomings
• Search Methods
  • The Search Problem
  • Current Methods
  • Suggested Methods
• Experimental Setup
  • Metrics
  • Data Collection
  • Calculating Costs
• Analysis of Results
• Conclusions

Introduction: P2P Basics

What is P2P
• Distributed system
• Peers (nodes) are servers and clients simultaneously
• Peers have equal roles
• Resources are shared across peers
• No central server needed
• Examples of P2P systems

P2P Overview
Key   File
f1    file1
f2    file2
f3    file3

Advantages of P2P
• P2P vs. centralized servers
• Distributes disk space / bandwidth
• Inexpensively scalable
• Self-organized (autonomous)
• Load balancing
• Adaptive / fault tolerant
• Less susceptible to attacks
• Allows for redundancy

Types of P2P Systems
• Hybrid (Napster)
• Pure (Gnutella)
• Super Peers (KaZaA)

Hybrid (Napster)

Pure (Gnutella)

Super Peers (KaZaA)
• Make use of heterogeneity
• Powerful peers serve as super-peers
• Weaker peers act as clients
• Super-peers index their clients' files
• Requires index updates on join / leave / update
• Queries are handled at the super-peer level
• Saves query costs

Super Peers (KaZaA)

Hybrid - Shortcomings
• High cost on the centralized index
• Performance and scalability bottleneck
• Needs maintenance
• Vulnerable: a highly visible target

Pure - Shortcomings
• Inefficient search (flooding)
• Heterogeneity of peers not considered
• Bottlenecks (limited peers)
• Fragmentation

Super Peers - Shortcomings
• Super-peers might become bottlenecks for their clients
• Requires redundancy
• A bad selection of super-peers might cause even worse problems

Search Methods

The Search Problem
• Connected graph
• Might contain cycles
• An individual node doesn't know the structure
• It only knows its neighbors
• It has no idea where data can be found

The Search Problem
• Goal: find as many occurrences of the data as possible using minimal time and resources
• Solution:
  • BFS?
  • Bounded BFS?
  • (naive approaches)

Bounded BFS Search (figure: nodes labeled TTL=2, TTL=1, TTL=0)

Bounded BFS Search
• Messages get a global TTL (time to live)
• Algorithm
  • Source broadcasts a message to a subset of neighbors
  • Neighbors search locally; results, if found, are sent to the source
  • TTL = TTL - 1
  • As long as TTL > 0, nodes forward the message to their neighbors
• Downside: wastes bandwidth / processing
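The TTL-bounded flood described on this slide can be sketched roughly as follows. This is a minimal illustration, not the Gnutella implementation: the adjacency-dict `graph`, the `has_data` predicate, and the choice to count every forwarded copy (including duplicates) as one message are all assumptions made for the example, and the source broadcasts to all neighbors rather than a subset.

```python
from collections import deque

def bounded_bfs(graph, source, has_data, ttl):
    """Flood a query up to `ttl` hops from `source`.

    Returns (hits, messages): the nodes that hold the data, and the
    number of query messages sent, duplicates included.
    """
    seen = {source}                  # nodes that already handled this query
    hits = []
    messages = 0
    queue = deque([(source, ttl)])
    while queue:
        node, remaining = queue.popleft()
        if node != source and has_data(node):
            hits.append(node)        # local search succeeded; reply to source
        if remaining == 0:
            continue                 # TTL exhausted: do not forward
        for neighbor in graph[node]:
            messages += 1            # every forward costs bandwidth
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return hits, messages

# Hypothetical five-node overlay containing a cycle (A-B-D-C-A).
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
hits, messages = bounded_bfs(graph, "A", lambda n: n in {"D", "E"}, ttl=2)
```

Raising `ttl` from 2 to 3 in this toy overlay finds one more hit but sends more messages, several of them duplicates that arrive at nodes which have already seen the query. That is the waste the slide refers to.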

Current Methods
• Gnutella - BFS
  • High cost
  • Gets complete results (for depth D)
  • Relatively short time
• Freenet - DFS
  • Poor response time
  • Minimizes BW costs

Suggested Methods
• Iterative deepening
• Directed BFS
• Local Indices

Iterative Deepening
• Idea:
  • Search at a small depth and increase it if required
  • Aims to minimize the cost of BFS without detracting from its ability to satisfy queries
  • Given enough iterations, this method returns 100% of the results of BFS

Iterative Deepening (cont…)
• Elements:
  • Policies P = {a, b, c, ...} define the deepening behavior
  • BFS is run to depth a and then frozen
  • If the source is satisfied, it stops the process
  • Otherwise it asks the BFS to resume to depth b
  • The process is repeated until the source is satisfied or the last policy item is reached

Iterative Deepening (cont…)
• Elements:
  • We can specify how long to wait between iterations
  • We need a system-wide message ID to identify individual messages
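The freeze-and-resume behavior can be sketched as below. This is a simplified, single-process illustration under stated assumptions: the `wanted` result count stands in for "the source is satisfied", the waiting time between iterations and the system-wide message ID are omitted, and `graph` and `has_data` are hypothetical inputs.

```python
def iterative_deepening(graph, source, has_data, policy, wanted):
    """Run BFS to each depth in `policy` in turn, stopping once satisfied.

    Returns (hits, depth): results found and the depth actually reached.
    """
    seen = {source}
    frontier = [source]   # nodes at which the query is currently frozen
    hits = []
    depth = 0
    for limit in policy:
        # Resume the frozen BFS until it reaches the next policy depth.
        while depth < limit and frontier:
            next_frontier = []
            for node in frontier:
                for neighbor in graph[node]:
                    if neighbor not in seen:
                        seen.add(neighbor)
                        next_frontier.append(neighbor)
                        if has_data(neighbor):
                            hits.append(neighbor)
            frontier = next_frontier
            depth += 1
        if len(hits) >= wanted:
            break         # source is satisfied: no further deepening
    return hits, depth

# Hypothetical chain overlay A-B-C-D-E, with the data held by C and E.
chain = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "E"], "E": ["D"]}
result = iterative_deepening(chain, "A", lambda n: n in {"C", "E"}, [1, 3, 4], wanted=1)
```

With `wanted=1` the search stops after the second policy depth and never contacts node E, which is where the savings over a single BFS to the full depth come from.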

Example: P = {1, 3, 4}, W = 1

Directed BFS
• Idea:
  • Choose a subset of neighbors to query
  • Those neighbors continue with BFS as usual
  • Aims to balance good response time against the number of results
  • Minimizes the costs of a full BFS
  • Only a subset of the possible results is returned, so the query might not be satisfied

Directed BFS Example (figure: nodes labeled TTL=2, TTL=1, TTL=0)

Directed BFS (cont…)
• But which neighbors to pick?
• Maintain simple statistics on neighbors to derive heuristics
  • Highest past results
  • Lowest average hops (close to nodes containing useful data)
  • High message count (stable, can handle a large flow)
  • Shortest message queue (a long queue implies saturation)
  • More to come …
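One way to picture the neighbor-selection step is a small table of per-neighbor statistics and one sort key per heuristic. The sketch below is illustrative only: the `NeighborStats` fields, the heuristic names, and the sample numbers are assumptions made for the example, not measurements or an existing API.

```python
from dataclasses import dataclass

@dataclass
class NeighborStats:
    name: str
    past_results: int    # results previously received through this neighbor
    avg_hops: float      # average hops travelled by those results
    messages_seen: int   # total messages received from this neighbor
    queue_length: int    # current length of the neighbor's message queue

# Each heuristic maps to a sort key; smaller key means "better neighbor".
HEURISTICS = {
    "most_results": lambda s: -s.past_results,
    "fewest_hops": lambda s: s.avg_hops,
    "most_messages": lambda s: -s.messages_seen,
    "shortest_queue": lambda s: s.queue_length,
}

def pick_neighbors(stats, heuristic, k):
    """Return the names of the k best neighbors under the given heuristic."""
    ranked = sorted(stats, key=HEURISTICS[heuristic])
    return [s.name for s in ranked[:k]]

# Hypothetical statistics for three neighbors.
stats = [
    NeighborStats("n1", past_results=40, avg_hops=3.0, messages_seen=900, queue_length=12),
    NeighborStats("n2", past_results=75, avg_hops=2.1, messages_seen=400, queue_length=3),
    NeighborStats("n3", past_results=10, avg_hops=1.5, messages_seen=1200, queue_length=7),
]
best_by_results = pick_neighbors(stats, "most_results", k=1)
```

The heuristics can disagree: in this made-up table `n2` wins on past results and queue length, while `n3` wins on hops and message count, so the choice of heuristic determines where the query goes.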

Local Indices
• Idea:
  • Each node holds the metadata of all nodes within radius r
  • A query can be processed at only a few nodes and still get the same number of results
  • Aims to balance satisfaction against costs

Local Indices
• Elements:
  • Policies P = {a, b, c, ...} define the depths at which we search
  • Example: P = {1, 5, 6}
    • Nodes at depth 1 process the query
    • Nodes at depths 2, 3, 4 forward without processing
    • The policy ends at depth 6
  • System-wide radius r (small, ~50K metadata)
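The policy example can be read as a per-depth decision rule. The sketch below encodes one such reading; the function name and the action labels are hypothetical, and treating the last policy depth as "process and stop" is an interpretation of "the policy ends at depth 6" rather than something the slide states outright.

```python
def action_at_depth(depth, policy):
    """Decide what a node at `depth` hops from the source does with a query."""
    last = max(policy)
    if depth > last:
        return "not reached"          # the query was terminated earlier
    if depth == last:
        return "process and stop"     # last policy depth: query ends here
    if depth in policy:
        return "process and forward"  # answer from the local index, then pass on
    return "forward only"             # relay the query without processing it

policy = [1, 5, 6]
plan = {depth: action_at_depth(depth, policy) for depth in range(1, 8)}
```

Only three of the six depths reached do any query processing here; the nodes in between rely on the indices held by the processing nodes.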

Example: P = {1, 4}, r = ? (figure legend: Process / Don't process)

Local Indices (cont…)
• Note that there is now a maintenance overhead
• On join
  • Send a join message with TTL = r
  • Direct exchange of metadata
• On leave / timeout
  • Remove the metadata of gone / dead nodes
• On update
  • Send an update message with TTL = r
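The three maintenance events can be sketched as operations on a per-node index. This is a toy model under stated assumptions: the `LocalIndex` class, its method names, and the explicit `hops` argument are hypothetical. In a real overlay a join or update message sent with TTL = r would simply never arrive at nodes farther than r hops away, which the `hops` check imitates.

```python
class LocalIndex:
    """Metadata one node keeps about the nodes within `radius` hops."""

    def __init__(self, radius):
        self.radius = radius
        self.metadata = {}   # node id -> set of file names

    def on_join(self, node_id, files, hops):
        # A join message with TTL = r only reaches nodes within r hops.
        if hops <= self.radius:
            self.metadata[node_id] = set(files)

    def on_update(self, node_id, files, hops):
        # An update replaces whatever was stored for that node.
        if hops <= self.radius:
            self.metadata[node_id] = set(files)

    def on_leave(self, node_id):
        # Also used when a node times out without saying goodbye.
        self.metadata.pop(node_id, None)

    def answer(self, file_name):
        """Nodes within the radius known to hold `file_name`."""
        return sorted(n for n, files in self.metadata.items() if file_name in files)

index = LocalIndex(radius=2)
index.on_join("n1", ["a.mp3", "b.mp3"], hops=1)
index.on_join("n2", ["a.mp3"], hops=3)      # too far away: never indexed
holders = index.answer("a.mp3")
```

Every join, leave, and update touches the indices of all nodes within r hops, so a larger radius buys cheaper queries at the price of more maintenance traffic.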

Experimental Setup

Metrics
• How to compare methods?
  • Costs
  • Results
  • Time

Metrics
1. Costs
• We do not base cost on a specific query; instead we calculate the average cost over Q_rep, a representative set of real submitted queries
• It makes sense to discuss costs in aggregate (i.e., over all the nodes in the network)
• Therefore our two cost metrics are
  • Average aggregate bandwidth
  • Average aggregate processing cost

Metrics
2. Results Quality
• Number of results
• Satisfaction
3. Time to satisfaction

Data Collection
• Data gathered from the Gnutella network
• Directly measured
  • Iterative deepening
  • Directed BFS
• Performance data & analysis
  • Local indices

Data Collection: Collected Data

Data Collection: Extracted Data

Calculating Costs
• We have seen two types of costs
  • Bandwidth (BW) costs
  • Processing costs
• Calculations should take into account
  • Costs of sending a query
  • Costs of sending replies
• An example of calculating BW costs

Calculating Costs

$$BW_{bfs}(Q) = \sum_{n=1}^{D} \Big( a(Q) \cdot \big( N(Q,n) + C(Q,n) \big) + n \cdot \big( c \cdot R(Q,n) + d \cdot M(Q,n) \big) \Big)$$
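The sum can be evaluated mechanically once the per-depth quantities are known. The sketch below only mirrors the structure of the formula; the surviving slide text does not define the symbols, so the parameter comments (query size, nodes and redundant edges per depth, result records and reply messages per depth, per-record and per-message sizes) are an assumed reading, and the numbers are made up for illustration.

```python
def bfs_bandwidth_cost(a, N, C, R, M, c, d):
    """Evaluate the BW cost sum for one query.

    N, C, R, M are lists indexed by depth, with position 0 holding n = 1.
    Assumed reading of the symbols (not confirmed by the slide text):
      a     size of the query message
      N[n]  nodes at depth n,     C[n]  redundant edges at depth n
      R[n]  result records from depth n, M[n] reply messages from depth n
      c     size of one result record,   d  size of one reply header
    """
    total = 0
    for n, (nodes, redundant, records, replies) in enumerate(zip(N, C, R, M), start=1):
        query_part = a * (nodes + redundant)          # sending the query
        reply_part = n * (c * records + d * replies)  # replies travel n hops back
        total += query_part + reply_part
    return total

# Made-up values for a search of depth D = 2.
cost = bfs_bandwidth_cost(a=10, N=[2, 3], C=[0, 1], R=[1, 2], M=[1, 1], c=5, d=2)
```

The factor n on the reply term means that replies from deeper nodes cost more, because each one is carried over n hops on the way back to the source.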

Analysis of Results: Iterative Deepening

Symbols Used

Results – Iterative Deepening
• Recall that iterative deepening policies P = {a, b, c, ...} define the deepening behavior
• To reach the same level of satisfaction as BFS, a policy must have D as its last depth
• Also note the degenerate policy {D}, which is the bounded BFS we presented earlier

Results – Iterative Deepening
• Variables
• Define:
  • P_d = {d, d+1, …, D}
  • P = {P_d for d = 1, 2, …, D} = {{1, 2, …, D}, {2, 3, …, D}, …, {D-1, D}, {D}}
  • W (waiting time) can take the values 1, 2, 4, 6, 150 (seconds)
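The family of policies defined above is easy to enumerate. A minimal sketch, with the function name chosen for illustration:

```python
def policy_family(D):
    """Return [P_1, P_2, ..., P_D], where P_d = [d, d+1, ..., D]."""
    return [list(range(d, D + 1)) for d in range(1, D + 1)]

family = policy_family(3)
# family == [[1, 2, 3], [2, 3], [3]]
```

The last entry, [D], is the degenerate policy that searches straight to the maximum depth.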

Results – Iterative Deepening
• Fixed values: Z = 50, Ng = 8
• Increasing Z
  • Lower probability of satisfaction
  • Higher costs
  • More results
• Decreasing Ng
  • Slightly lower probability of satisfaction
  • Significantly lower costs

Results – Iterative Deepening

Results – Iterative Deepening
• BW costs for P7 are the same for all values of W
• As d increases, costs increase: the larger d is, the more likely the policy is to "overshoot"
• As W decreases, costs increase: with a small W, the source decides prematurely that it is unsatisfied, which again leads to overshooting

Results – Iterative Deepening
