
Christos Gkantsidis, Milena Mihail, Amin Saberi Presented by Paul Bogdan February 28 th , 2007

“Hybrid Search Schemes for Unstructured Peer-to-Peer Networks” and “Random Walks in Peer-to-Peer Networks”


Presentation Transcript


  1. “Hybrid Search Schemes for Unstructured Peer-to-Peer Networks” and “Random Walks in Peer-to-Peer Networks” Christos Gkantsidis, Milena Mihail, Amin Saberi Presented by Paul Bogdan February 28th, 2007

  2. “Hybrid Search Schemes for Unstructured Peer-to-Peer Networks” Christos Gkantsidis, Milena Mihail, Amin Saberi

  3. Outline • Random Graph Models • Flooding and Normalization • Random Walks and Replication • Generalized Search Schemes • Experimental evaluation

  4. Motivation • Flooding with a small time-to-live (TTL) performs well in regular graphs • Performance metric: number of exchanged messages per distinct response • Flooding performance degrades when the TTL increases or the network is irregular • Random Walk performs better than flooding • scalability, granularity • Hybrid + Generalized search schemes: • Random Walks with lookahead, Random Walks with 1-step replication

  5. Contribution • Random walks (RW) with shallow flooding offer good performance (analytic justification) R1: In a random graph model with O(n) nodes of constant degree and O(n^(1/2)) nodes of degree O(n^(1/2)), the expected time to discover Ω(n) nodes is O(n^(1/2)). R2: Random Walks with look-ahead 1 or 1-step replication perform better when there is discrepancy in the degrees of the underlying topology. • Normalized Flooding (NF) solution R3: NF achieves performance comparable to flooding in regular graphs. R4: NF with 1-step replication achieves performance comparable to RW with 1-step replication. R5: Local information about the network (node degrees) offers global benefit. • Generalized Search Schemes

  6. Random Graph Models • Random Regular Graphs – G_{n,d}: G_{n,d} represents a graph with n nodes in which each node has degree d. G_{n,d} has degree sum D = nd. • Random Graphs with super-nodes – G_{n,d,α,β}: Given constants α and β, G_{n,d,α,β} denotes a graph with αn^(1/2) nodes of degree βn^(1/2) (i.e. large vertices) and the remaining nodes of degree d (i.e. small vertices). G_{n,d,α,β} has degree sum D ≈ (αβ+d)n.
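
The degree sum of the supernode model can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name and the rounding of αn^(1/2) and βn^(1/2) to integers are assumptions, not part of the slides. It shows that the exact sum differs from (αβ+d)n only by the lower-order term αd·n^(1/2).

```python
import math


def supernode_degree_sum(n, d, alpha, beta):
    """Return (number of large vertices, degree of each large vertex, degree sum D).

    Hypothetical helper: rounds alpha*sqrt(n) and beta*sqrt(n) to integers.
    """
    root = math.sqrt(n)
    num_large = round(alpha * root)      # about alpha * n^(1/2) large vertices
    large_degree = round(beta * root)    # each has degree about beta * n^(1/2)
    num_small = n - num_large            # all remaining vertices have degree d
    degree_sum = num_large * large_degree + num_small * d
    return num_large, large_degree, degree_sum


if __name__ == "__main__":
    n, d, alpha, beta = 10_000, 3, 1.0, 2.0
    num_large, large_degree, degree_sum = supernode_degree_sum(n, d, alpha, beta)
    approx = (alpha * beta + d) * n
    print(num_large, large_degree, degree_sum, approx)
```

The large vertices contribute αβn to the sum even though there are only αn^(1/2) of them, which is why they dominate the search behaviour discussed on the following slides.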

  7. Flooding and Normalization • Theorem 3.1.: Consider a random regular graph G_{n,d} and flooding from node v with time-to-live τ. Let S be the set of distinct nodes queried by the flooding, with |S| ≤ |V| / 2. Claims: (1) (2) (3)

  8. (1) • Proof:

  9. (2) • Proof:

  10. (3) • Proof:

  11. Flooding and Normalization • Theorem 3.2.: Let G_{n,d,α,β} be a random graph with supernodes and consider flooding from a node v of degree d with time-to-live τ. Claim: For some τ = O(log log n), the number of distinct responses is Ω(n). Proof: Consider flooding with τ = c log_{d-1}(log n) + 1 and the set S_{τ-1}(v) of vertices visited with TTL τ-1. Assumption: this set of visited nodes does not contain a large-degree vertex. From d-regular graphs we know that this set contains at least (d-1)^(τ-1) edges. The probability that no vertex in Γ(S_{τ-1}(v)) is a large vertex is bounded by (d/(d+αβ))^((d-1)^(τ-1)) = (d/(d+αβ))^(c log n), so within the first O(log log n) steps we see a large vertex.

  12. Flooding and Normalization • Theorem 3.3.: Let G_{n,d,α,β} be a random graph with supernodes and consider normalized flooding from node v with TTL τ. Then the number of distinct responses is Ω((d-1)^(τ-1)) and the number of messages per response is O(1). Proof: From Theorem 3.1, the number of minigroups seen is (d-1)^(τ-1). The expected number of small vertices is Q = d(d-1)^(τ-1)/(d+αβ). Let X_i, i = 1,…,N, be random variables with P[X_i=1] = p_i and P[X_i=0] = 1-p_i. By the Chernoff bound for these variables, the probability that fewer than Q/2 are seen is vanishingly small.
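
The difference between standard and normalized flooding can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: it assumes that normalized flooding means each node forwards to at most a fixed number of randomly chosen neighbors (the `fanout` parameter, a hypothetical name), and that a node forwards a query only the first time it receives it. The function returns the set of distinct nodes reached and the number of messages sent, the two quantities behind the "messages per distinct response" metric.

```python
import random


def flood(graph, source, ttl, fanout=None, rng=None):
    """Flood a query from `source` for `ttl` hops.

    graph  : dict mapping node -> list of neighbor nodes
    fanout : None for standard flooding; an integer caps how many neighbors
             each node forwards to (assumed form of normalized flooding)
    Returns (set of distinct nodes reached, number of messages sent).
    """
    rng = rng or random.Random(0)
    seen = {source}
    frontier = [(source, None)]          # (node, neighbor it heard the query from)
    messages = 0
    for _ in range(ttl):
        next_frontier = []
        for node, parent in frontier:
            # Never send the query straight back to the sender.
            targets = [v for v in graph[node] if v != parent]
            if fanout is not None and len(targets) > fanout:
                targets = rng.sample(targets, fanout)
            for v in targets:
                messages += 1
                if v not in seen:        # duplicates are counted but not re-forwarded
                    seen.add(v)
                    next_frontier.append((v, node))
        frontier = next_frontier
    return seen, messages


if __name__ == "__main__":
    # A star: node 0 is a high-degree hub with 50 leaves.
    star = {0: list(range(1, 51))}
    star.update({leaf: [0] for leaf in range(1, 51)})
    print(len(flood(star, 0, ttl=1)[0]), flood(star, 0, ttl=1)[1])
    print(len(flood(star, 0, ttl=1, fanout=3)[0]), flood(star, 0, ttl=1, fanout=3)[1])
```

With `fanout=None` the hub sends one message per neighbor; with a small `fanout` the message count at a high-degree node is capped, which is the effect the theorem relies on to keep the number of messages per response constant.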

  13. Random Walks and Replication • Random Walk with Look-Ahead: • a random walk with shallow flooding on each step of the walk • RW with lookahead 1 visits Ω(n) nodes with response time O(n^(1/2)) • Theorem 4.2.: Let G_{n,d,α,β} be a random graph with supernodes and consider a random walk from a node v. Then, in the 1-step replication scenario, the expected number of messages and response time to obtain distinct responses is
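
A random walk with look-ahead 1 can be sketched as follows. This is an illustrative reading of the slide, under the assumption that "look-ahead 1" means the walker queries every neighbor of its current node before taking a step; the function name and the message accounting (one message per neighbor query plus one per step) are hypothetical choices.

```python
import random


def random_walk_lookahead(graph, source, steps, rng=None):
    """Random walk that queries all neighbors of the current node at each step.

    graph : dict mapping node -> list of neighbor nodes
    Returns (set of distinct nodes discovered, number of messages sent).
    """
    rng = rng or random.Random(0)
    discovered = {source}
    messages = 0
    current = source
    for _ in range(steps):
        # Look-ahead 1: a one-hop flood around the current position.
        for v in graph[current]:
            messages += 1
            discovered.add(v)
        # Then move the walker to a uniformly random neighbor.
        current = rng.choice(graph[current])
        messages += 1
    return discovered, messages


if __name__ == "__main__":
    # Hub 0 with five leaves: the walk starts at a leaf.
    star = {0: [1, 2, 3, 4, 5]}
    star.update({leaf: [0] for leaf in range(1, 6)})
    found, cost = random_walk_lookahead(star, source=1, steps=2)
    print(sorted(found), cost)
```

On the star example the walker reaches the hub after one step and discovers every node on the next, which mirrors the intuition that a walk only needs to hit a large vertex for look-ahead to pay off.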

  14. Theorem 4.3.: Let G_{n,d,α,β} be a random graph with supernodes and consider normalized flooding from v with TTL τ ≈ (log n)/(2 log(d-1)). Then, in the 1-step replication scenario, the number of distinct responses is at least and the number of messages is at most Proof: The number of minigroups seen is (d-1)^(τ-1) and, using the Chernoff bounds, there will be minigroups corresponding to large vertices.

  15. Generalized Search Schemes • Searching procedure: • A node of degree d initiates a search based on a budget k (budget = number of messages that are propagated in the network) • Among its d neighbors the node picks quantities k1, k2, …, kd such that k1 + k2 + … + kd = k • For every neighbor i the master node forwards the message with budget ki (for ki = 0 the message is not transmitted) • Each neighbor i reduces the budget by 1 unit and repeats the process as long as the remaining budget is greater than 0 • Every node that receives the message for the second time from another neighbor forwards the message with the corresponding budget • Random Walks + Flooding
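
The budgeted procedure above can be sketched directly. In this minimal sketch the splitting policies `even_split` and `single_neighbor` are hypothetical names chosen for illustration: the first spreads the budget across all neighbors (flooding-like behaviour), the second hands the whole budget to one random neighbor (a random walk). Nodes forward repeated copies as well, as the slide describes.

```python
import random


def even_split(budget, degree, rng):
    """Spread the budget as evenly as possible over all neighbors."""
    base, extra = divmod(budget, degree)
    return [base + (1 if i < extra else 0) for i in range(degree)]


def single_neighbor(budget, degree, rng):
    """Give the entire budget to one uniformly random neighbor."""
    shares = [0] * degree
    shares[rng.randrange(degree)] = budget
    return shares


def generalized_search(graph, source, budget, split, rng=None):
    """Run a budgeted search; returns (visited nodes, messages sent).

    graph : dict mapping node -> non-empty list of neighbor nodes
    split : policy returning shares k_1..k_d that sum to the node's budget
    """
    rng = rng or random.Random(0)
    visited = {source}
    messages = 0
    pending = [(source, budget)]
    while pending:
        node, k = pending.pop()
        if k <= 0:
            continue
        neighbors = graph[node]
        shares = split(k, len(neighbors), rng)
        for v, share in zip(neighbors, shares):
            if share == 0:
                continue                      # k_i = 0: nothing is transmitted
            messages += 1
            visited.add(v)
            pending.append((v, share - 1))    # receiver spends one unit
    return visited, messages


if __name__ == "__main__":
    ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
    print(generalized_search(ring, 0, 6, even_split))
    print(generalized_search(ring, 0, 6, single_neighbor))
```

Because every transmitted message consumes exactly one unit, the total number of messages equals the initial budget k regardless of the splitting policy, which gives fine-grained control over search cost.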

  16. Experimental Evaluation • Methodology • Performance Metrics • Median and Mean number of distinct peers discovered (hits) • Minimum, Maximum, Standard Deviation of the number of hits • Number of messages • Granularity of number of messages • Response time • Topologies • Random d-Regular Graphs • Power Law Graphs • Bimodal topologies • Clustered topologies

  17. Normalized Flooding (NF) • Mean number of unique peers discovered as a function of the initial TTL • NF and Standard Flooding behave similarly in Regular Graphs • NF controls the number of messages and provides higher efficiency

  18. Normalized Flooding (NF) • The number of unique peers increases exponentially with TTL in NF case • The number of peers increases faster than exponentially with TTL in topologies with high degrees

  19. Random Walk with 1-step replication

  20. Random Walk with LookAhead (RWLA) • RWLA performance is similar to long RW without lookahead (in terms of unique peers discovered) • RWLA response time is much smaller compared to standard RW

  21. Edge Criticality & Searching with weights • Generalized Searching performs similarly to Standard Flooding in regular graphs • Generalized Searching behaves similarly to Standard Flooding in other topologies if normalized edge criticality is used.

  22. Conclusions • Normalized Flooding (NF) could substitute the Standard Flooding in irregular graphs • RW with 1-step replication performs better than RW and NF in irregular graphs • Open for improvements: • Generalized schemes (analytic investigation) • Quantifying Directional flooding

  23. “Random Walks in Peer-to-Peer (P2P) Networks” Christos Gkantsidis, Milena Mihail, Amin Saberi

  24. Outline • Motivation • Statistical Estimation and Random Walks (RW) • Searching • Methodology and Topologies importance • Construction and Summary

  25. Motivation • Random Walks (RW) were proposed for constructing searching and topology maintenance protocols in P2P networks • RW improve searching performance as compared to flooding (Cao et al., 2002) • A RW approach to constructing and maintaining unstructured topologies provides good connectivity properties (i.e. constant degree, constant expansion) • Claim: RW approach is a good candidate • to simulate uniform sampling • the number of simulation steps required can be as low as the number of samples in independent uniform sampling • Searching and Overlay Topology Construction • RW searching performs better than flooding for the same number of messages in clustered and slowly changing topologies • Construction of P2P networks by random walks

  26. Statistical Estimation & Random Walks • Coupon collection and Chernoff bounds • n types of coupons; each time one is drawn uniformly at random • T_n: time by which coupons of all n types have been drawn • T_{αn}: time by which αn distinct types have been encountered, 0 < α < 1 • X_1,…,X_k independent Bernoulli trials, P[X_i=1] = p_i and P[X_i=0] = 1-p_i • p: probability that a randomly drawn object has a particular property • the probability that the property is found in substantially fewer draws than its frequency in the search space and the quality of the estimator X/k are bounded by
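
The gap between collecting all n coupon types and collecting only a constant fraction of them can be made concrete. The sketch below uses the standard coupon-collector expectations, E[T_n] = n·H_n and E[T_m] = Σ_{i=0}^{m-1} n/(n-i) for m distinct types, which are textbook facts rather than formulas taken from the slides; the function names are illustrative.

```python
import random


def harmonic(n):
    """n-th harmonic number H_n = 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / i for i in range(1, n + 1))


def expected_time_all(n):
    """Expected number of uniform draws to see all n types: n * H_n."""
    return n * harmonic(n)


def expected_time_partial(n, m):
    """Expected number of uniform draws to see m distinct types out of n."""
    return sum(n / (n - i) for i in range(m))


def simulate(n, m, rng):
    """Draw uniformly until m distinct types have appeared; return the draw count."""
    seen = set()
    draws = 0
    while len(seen) < m:
        seen.add(rng.randrange(n))
        draws += 1
    return draws


if __name__ == "__main__":
    n = 1000
    rng = random.Random(42)
    print("all types  :", expected_time_all(n), simulate(n, n, rng))
    print("half (α=.5):", expected_time_partial(n, n // 2), simulate(n, n // 2, rng))
```

Seeing all n types costs about n ln n draws, while seeing αn types for a fixed α < 1 costs only about n ln(1/(1-α)) draws, i.e. linear in n. This is why discovering a constant fraction of the peers is so much cheaper than covering the whole network.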

  27. Statistical Estimation & Random Walks • Random Walks (RW), Convergence and Cover Time • G = (V,E) undirected graph, |V| = n, and d_i the degree of vertex i • A_ij: adjacency matrix, P: transition matrix which satisfies • f: V→{0,1} which satisfies • Convergence rate metric: the rate at which the RW approaches the stationary distribution • Cover time metric: the time by which all nodes were visited • Trajectory sample average: the rate at which the value of f averaged over successive vertices of the RW trajectory approaches p
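
The transition matrix and stationary distribution can be illustrated on a tiny graph. The formulas on the slide are missing from the transcript, so the sketch below assumes the standard simple random walk, P_ij = A_ij / d_i, whose stationary distribution on a connected undirected graph is π_i = d_i / Σ_j d_j. Function names are illustrative.

```python
def transition_matrix(adj):
    """Row-normalize an adjacency matrix: P[i][j] = A[i][j] / d_i."""
    matrix = []
    for row in adj:
        degree = sum(row)
        matrix.append([a / degree for a in row])
    return matrix


def stationary(adj):
    """Stationary distribution of the simple random walk: pi_i = d_i / sum of degrees."""
    total = sum(sum(row) for row in adj)
    return [sum(row) / total for row in adj]


def step(dist, matrix):
    """Advance a distribution over vertices by one step of the walk."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


if __name__ == "__main__":
    # Path on three vertices: 0 - 1 - 2
    adj = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
    P = transition_matrix(adj)
    pi = stationary(adj)
    print(pi, step(pi, P))
```

Under this assumption the walk visits a vertex in proportion to its degree, so high-degree peers are sampled more often than under uniform sampling.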

  28. Statistical Estimation & Random Walks • Convergence rate is related to the second eigenvalue of P (1) • y_t: the vertex that the RW visited at time t • Cover time (2) • Trajectory sample average (3) (1): [11], (2): [12, 13], (3): [3, 4, 5, 6]

  29. Statistical Estimation & Random Walks • Second Eigenvalue, Expansion and Conductance • S subset of V, C(S) cutset of S (i.e. edges with one endpoint in S and the other one in V\S), vol(S) (i.e. the sum of degrees of vertices in S) • Expansion • Conductance • Known bound [11, 14, 15, 16, 17, 18, 19]
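
The quantities C(S) and vol(S) are easy to compute for a given subset. The slide's own formulas for expansion and conductance are not in the transcript, so the sketch below assumes the common per-set definitions |C(S)| / |S| and |C(S)| / vol(S); normalizations vary between papers, and the graph-level values are minima over suitably small sets S. All names are illustrative.

```python
def cut_size(graph, subset):
    """Number of edges with exactly one endpoint in `subset`."""
    subset = set(subset)
    return sum(1 for u in subset for v in graph[u] if v not in subset)


def volume(graph, subset):
    """Sum of the degrees of the vertices in `subset`."""
    return sum(len(graph[u]) for u in subset)


def expansion(graph, subset):
    """Assumed definition: |C(S)| / |S|."""
    return cut_size(graph, subset) / len(set(subset))


def conductance(graph, subset):
    """Assumed definition: |C(S)| / vol(S)."""
    return cut_size(graph, subset) / volume(graph, subset)


if __name__ == "__main__":
    ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
    half = {0, 1, 2}
    print(cut_size(ring, half), volume(ring, half))
    print(expansion(ring, half), conductance(ring, half))
```

A ring is a poor expander: only two edges leave any contiguous arc no matter how long it is, so both ratios shrink as the arc grows.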

  30. Searching • Performance metrics for Flooding and RW • average number of distinct copies of an item located in the search • number of messages used by the searching algorithm • RW performs better than flooding in case of • multiple search requests for the same item with slow-changing topology • peer clustering (see [20, 21, 22, 23, 24, 25] for details) • Searching analysis • Methodology • Flat topologies with Uniformly Distributed Content • Topologies with Peer Clustering • Re-issuing the Same Query • Real topologies

  31. Searching - Methodology • Performance Metrics • mean of the number of distinct copies (i.e. Mean) • discrepancy around the mean (i.e. Std) and the failure probability • Cost • number of messages or queries performed during search • Peer-to-peer topologies ( ≈ 1 million nodes) • Flat regular expanders, Two tier topologies with clustering, Power law graphs, Samples from real topologies • Dynamic topologies • rewiring • Content placement • Content clustering affects the performance of searching

  32. Searching – Flat Topologies • Experiment: • one request in a network of 500K peers • Mean hits, minimum number of hits, and Std are similar for Flooding and RW • the entire distribution of hits is similar for Flooding and RW

  33. Searching – Topologies with Peer Clustering • Cluster topology consists of • 5 flat regular graphs of size 40K; from each one, 1000 nodes are picked at random to construct another flat regular graph • Number of hits for RW is more concentrated around the mean compared to Flooding

  34. Searching – Reissuing the Same Query • Experiment setup: repeat the procedure below 4 times • each peer sends a request and waits for response • between requests 2% of the links are rewired • each peer initiates a new search • RW have better performance than Flooding • Mean Hits and Failure Probability

  35. Searching - Reissuing the Same Query • Performance of successive searches depends • on the number of topology changes considered between consecutive searches • Performance of Flooding increases as the rate of topological changes increases • RW Performance remains the same for small variations

  36. Searching – Real Topologies • The number of hits for RW is more concentrated around the mean than in Flooding • P2P networks have good expansion properties

  37. Construction • P2P network construction is concerned with: • peers arriving and leaving the network dynamically • strong and weak decentralization • low network overhead per addition or deletion

  38. Baseline Construction of Expander Graphs • A_BASE (undirected graph) consists of: • n vertices, each of which chooses d vertices at random • total number of edges = nd and expected vertex degree = 2d • Theorem 4.1. Let G(V,E) be a graph constructed by A_BASE. Then, G is an expander with high probability and for positive constant α < 1
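
The baseline construction can be sketched in a few lines. The slide does not say whether a vertex may pick itself or pick the same vertex twice, so the sketch below assumes each vertex picks d distinct other vertices uniformly at random; parallel edges (u picks v and v picks u) are kept, so the result is a multigraph. Function names are illustrative.

```python
import random


def build_baseline(n, d, rng=None):
    """Each of n vertices picks d distinct other vertices; returns the edge list."""
    rng = rng or random.Random(0)
    edges = []
    for u in range(n):
        others = [v for v in range(n) if v != u]   # assumption: no self-loops
        for v in rng.sample(others, d):
            edges.append((u, v))
    return edges


def degrees(n, edges):
    """Undirected degree of every vertex, counting parallel edges."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


if __name__ == "__main__":
    n, d = 1000, 3
    edges = build_baseline(n, d)
    deg = degrees(n, edges)
    print(len(edges), sum(deg) / n, min(deg), max(deg))
```

Every vertex contributes d edges of its own and receives d more on average from the choices of others, which gives the nd edges and the expected degree 2d stated on the slide.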

  39. Baseline Construction of Expander Graphs with Constant Overhead in Random Bits • A'_BASE construction algorithm: • start a RW at a random vertex on H (constant degree expander graph) • when A_BASE needs a random number, it is taken from the RW on H • Theorem 4.2. Let G(V,E) be a graph constructed by A'_BASE. There are positive constants α, 0 < β < 0.5 such that any subset S of at least β|V| and at most 0.5|V| vertices has cutset expansion α almost surely.

  40. Distributed Construction of Expanders with Constant Overhead on Network Resources • A'_H construction • d daemons, one for each Hamilton cycle • a newly arriving node contacts the daemon associated with the i-th Hamilton cycle • after c steps, it attaches between the peer that currently hosts daemon i and one of its neighbors in cycle i

  41. Distributed Construction of Expanders with Constant Overhead on Network Resources • A'_M construction • d daemons, one for each Hamilton cycle • a newly arriving node consists of two nodes, X and Y; X and Y contact the central server to discover the location of the d daemons • X becomes the neighbor of daemon i and Y the neighbor of the initial daemon’s neighbor

  42. Summary • For Searching • Random Walks (RW) are superior to Flooding • For Construction • RW add new peers with constant overhead • Open Problems • Strongly decentralized construction algorithm • Can we better handle deletions and expansions of small sets? • How do the P2P network parameters (e.g. capacities) affect the performance of RW?
