
Gossip Algorithms and Mixing Times



  1. Gossip Algorithms and Mixing Times. Alex Dimakis, based on collaborations with Florence Benezit, Kannan Ramchandran, Anand Sarwate, Patrick Thiran, Martin Vetterli, and Martin Wainwright.

  2. problem: distributed averaging • Every node has a number (e.g. a temperature reading) • Every node wants access to the global average • Want a randomized, distributed, localized algorithm to compute averages. (Figure: example network with node values 2, 2, 3, 5, 12.)

  3. gossip algorithms for aggregation • Start with the initial measurement as an estimate of the average and update • Each node interacts with a random neighbor and both compute the pairwise average • Converges to the true average • Fundamental building block for numerous problems (distributed detection, localization, randomized projections for compressive sensing) • How many messages? (Figure: nodes holding 2, 2, 3, 5, 12 repeatedly replace pairs of values by their pairwise averages, e.g. 2.5, 3.75, 7.87.) (Related work: Tsitsiklis, Boyd et al., Alanyali et al., Spanos et al., Sundaram et al., Nedich et al.)
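
A minimal sketch of this pairwise-gossip update (the adjacency list, variable names, and number of rounds are illustrative assumptions, not taken from the talk):

```python
import random

def pairwise_gossip(values, neighbors, num_rounds, seed=0):
    """Randomized pairwise gossip: each round a uniformly random node wakes
    up, picks a random neighbor, and both replace their values with the
    pairwise average."""
    rng = random.Random(seed)
    x = dict(values)
    nodes = list(x)
    for _ in range(num_rounds):
        i = rng.choice(nodes)
        j = rng.choice(neighbors[i])
        x[i] = x[j] = (x[i] + x[j]) / 2.0
    return x

# The 5 example measurements from the slide; the graph itself is an illustrative choice.
vals = {0: 2, 1: 2, 2: 3, 3: 5, 4: 12}
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
print(pairwise_gossip(vals, nbrs, num_rounds=500))   # every entry approaches 4.8
```

Every entry converges to the true average 4.8; the question raised above is how many such pairwise exchanges (messages) are needed.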

  4. Standard gossip • x(t) = W(t) x(t-1) = W(t) W(t-1) ··· W(1) x(0), where the W(t) are i.i.d. random matrices. (Figure: one gossip step on the example network, e.g. x(0) = (2, 2, 3, 5, 12) becomes x(1) = (2.5, 2, 2.5, 5, 12) after the nodes holding 2 and 3 average.)

  5. How many messages • ε-averaging time: the first time where x(t) is ε-close to the normalized true average with probability greater than 1-ε. • x(t) = W(t) x(t-1) = W(t) ··· W(1) x(0). • Define W = E[W(t)] • Theorem: the ε-averaging time can be bounded using the spectral gap of W: Tave(ε) = Θ(log(1/ε) / log(1/λ₂(W))), where λ₂(W) is the second-largest eigenvalue of W. • Scaling laws: how does the number of messages scale if I double my network? (Boyd, Ghosh, Prabhakar and Shah, IEEE Trans. on Information Theory, June 2006)
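
A small numerical illustration of this bound (a sketch: the asynchronous activation model, in which one uniformly random node wakes up and averages with a uniformly chosen neighbor, and the example graph are assumptions for illustration):

```python
import numpy as np

def expected_gossip_matrix(neighbors):
    """W = E[W(t)] when a uniformly random node wakes up (prob 1/n) and
    averages with a uniformly chosen neighbor (prob 1/deg).  The realized
    W(t) is the identity except for the 2x2 averaging block on rows/cols i, j."""
    n = len(neighbors)
    W = np.zeros((n, n))
    for i in range(n):
        for j in neighbors[i]:
            p = (1.0 / n) * (1.0 / len(neighbors[i]))
            Wij = np.eye(n)
            Wij[i, i] = Wij[j, j] = 0.5
            Wij[i, j] = Wij[j, i] = 0.5
            W += p * Wij
    return W

nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
W = expected_gossip_matrix(nbrs)
lam2 = np.sort(np.linalg.eigvalsh(W))[-2]   # second-largest eigenvalue of W
print("lambda_2 =", lam2, " spectral gap =", 1 - lam2)
```

A larger spectral gap 1 - λ₂(W) means fewer rounds to average; the graph-dependent costs on the next slide come from how this gap scales with n.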

  6. cost of standard gossip • Depends on the graph and the gossip probabilities: • Complete graph: Tave = Θ(n log(1/ε)) (Kempe et al., FOCS'03) • Small world / expander: Tave = Θ(n log n) • Random geometric graphs: Tave = Θ(n²) • Note that flooding everything everywhere costs Θ(n²) messages

  7. random geometric graphs • Realistic sensor network model (Gupta & Kumar, Penrose): • Random geometric graph G(n, r): n random points in the unit square, connected if their distance is < r • Connectivity threshold: r = Θ(√(log n / n))
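
A sketch of sampling such a graph (the constant 2 in the radius is an illustrative choice slightly above the connectivity threshold):

```python
import numpy as np

def random_geometric_graph(n, r, seed=0):
    """G(n, r): n uniform points in the unit square, two points connected
    when their Euclidean distance is below r."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n, 2))
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    adj = (dist < r) & ~np.eye(n, dtype=bool)
    return pts, adj

n = 500
r = np.sqrt(2 * np.log(n) / n)    # a bit above the Theta(sqrt(log n / n)) threshold
pts, adj = random_geometric_graph(n, r)
print("average degree:", adj.sum(axis=1).mean())
```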

  8. cost of standard gossip • Standard gossip algorithms require a lot of energy (for realistic sensor network topologies). • Why: useful information performs random walks and diffuses slowly. • Can we save energy with extra information? • Idea: gossip in random directions to diffuse faster. • Assume each node knows its own location and the locations of its 1-hop neighbors.

  9. Random target routing: how to find an (almost) random node • The node picks a random location (the "target") • Greedy routing towards the target • Probability of receiving the packet ~ Voronoi cell area
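
A sketch of this random-target routing step (greedy geographic forwarding; the graph construction, parameters, and function names are illustrative assumptions):

```python
import numpy as np

def greedy_route(pts, adj, source, target_xy):
    """Greedy geographic routing: forward to the neighbor closest to the
    target location; stop at a node with no closer neighbor (roughly, the
    node whose Voronoi cell contains the target)."""
    current, path = source, [source]
    while True:
        nbrs = np.flatnonzero(adj[current])
        dists = np.linalg.norm(pts[nbrs] - target_xy, axis=1)
        here = np.linalg.norm(pts[current] - target_xy)
        if nbrs.size == 0 or dists.min() >= here:
            return path
        current = int(nbrs[np.argmin(dists)])
        path.append(current)

# Toy usage on a small random geometric graph (parameters illustrative).
rng = np.random.default_rng(1)
n, r = 200, 0.15
pts = rng.random((n, 2))
adj = (np.linalg.norm(pts[:, None] - pts[None, :], axis=-1) < r) & ~np.eye(n, dtype=bool)
target = rng.random(2)            # the node picks a uniformly random target location
print(greedy_route(pts, adj, source=0, target_xy=target))
```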

  10. Geographic gossip • Nodes use random routing to gossip with nodes far away in the network • Each interaction costs on the order of √(n / log n) messages (the typical routed path length), but mixing is faster • Number of messages: O(n^1.5), up to logarithmic factors (D., Sarwate, and Wainwright, IPSN'06, IEEE Trans. on Signal Processing)

  11. Recent related work • Wenjun, Huaiyu: O(n^1.5 log(1/ε)) using partial location lifting (L.A.D.A., ISIT 2007) • Shah, Jung, Shin: O(n^1.5 log(1/ε)) by expander liftings • (related to liftings of Markov chains for faster mixing) • Different target distribution (say ~ 1/r²)? Rabbat (Statistical Signal Processing '07) showed it still scales like O(n^1.5 log(1/ε)) • Can n^1.5 be improved?

  12. Why not average on the path? • Averaging on the routed path: the routed packet computes the sum of all the nodes it visits, plus a hop count. The average is then propagated backwards to all the nodes on the path. (Figure: a routed path whose nodes all end up holding the path average.)
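
A sketch of the path-averaging step described above (node values and the routed path are illustrative):

```python
def path_average(values, path):
    """Averaging on the routed path: the packet accumulates the running sum
    and a hop count on the way out; the resulting average is then written
    back to every node on the path (the backward pass)."""
    total = sum(values[node] for node in path)   # forward pass
    avg = total / len(path)
    for node in path:                            # backward pass
        values[node] = avg
    return values

# Illustrative values and routed path.
vals = {0: 2.0, 1: 2.0, 2: 4.0, 3: 1.0, 4: 2.0}
print(path_average(vals, path=[0, 1, 2, 3]))     # nodes 0-3 now hold 2.25
```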

  13. Path averaging is optimal • Theorem: geographic gossip with path averaging on G(n, r) requires an expected number of messages Tave = Θ(n log(1/ε)) (with high probability over the random graph) • This is the optimal number of messages (since we are averaging n numbers). • The analysis bounds the eigenvalues of a random matrix through a Poincaré inequality. (Benezit, D., Thiran, and Vetterli, Allerton 2007)

  14. Proof ingredients: appetizers • ε-averaging time: the first time where x(t) is ε-close to the normalized true average with probability greater than 1-ε. • x(t) = W(t) x(t-1) = W(t) ··· W(1) x(0). • Define W = E[W(t)] • The ε-averaging time can be bounded using the spectral gap of W: Tave(ε) = Θ(log(1/ε) / log(1/λ₂(W))).

  15. Proof ingredients: Poincaré inequality • Capacity of an edge e = (i, j): C(e) = π(i) P_ij • Each pair of states x, y has demand D(x, y) = π(x) π(y) • A flow is a way to route D(x, y) units from x to y, for all pairs. • The cost of a flow is ρ(f) = max_e f(e) / C(e). The length ℓ(f) of a flow is the length of the longest flow-carrying path. • Poincaré inequality: 1 / (1 - λ₂) ≤ ρ(f) · ℓ(f). (Diaconis, Stroock, Geometric bounds for eigenvalues of Markov chains, Ann. of Applied Probability, 1991. Sinclair, Improved bounds for mixing rates of Markov chains and multicommodity flow, STOC'91; Combinatorics, Probability and Computing, volume 1, 1992.)

  16. Proof ingredients: counting paths • W(i, j) = Prob(i and j averaged together) × 1/(length of the routed path) • Key lemma: typical pairs of nodes i, j are averaged together by Ω(n) of the possible routed paths. • Hence W(i, j) = Ω(n^-1.5) when d(i, j) ≤ const · n^-1/2.

  17. Mixing ingredients • For each pair of nodes (i, j) we construct a flow that does not overload any edge. • This can be done by routing through one intermediate node for each pair (i, j). • D(x, y) = 1/n², C(e) ≈ (1/n) · (1/n^1.5) = 1/n^2.5 (for a typical edge e) • Cost ρ(f) = n^1/2, length ℓ(f) = 2 • Poincaré coefficient O(n^1/2) • Averaging time O(n^1/2) rounds, but each averaging round costs O(n^1/2) messages (= the typical path length).

  18. gossip and mobility

  19. gossip and mobility • Refresh model: every node randomly and independently chooses a new site.

  20. gossip and mobility • Grid with no mobility: Tave = Θ(n² log(1/ε)) • Grid with full mobility: Tave = Θ(n log(1/ε)) • What if there is only horizontal mobility?

  21. gossip and mobility • Grid with only horizontal mobility: Tave = Θ(n² log(1/ε)) (useless) • Bidirectional mobility: Tave = Θ(n log(1/ε)) (as good as full mobility) • Adding m mobile agents cuts the averaging time by a factor of m. (Sarwate, D.: The impact of mobility on gossip algorithms, Infocom 2009)
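
A simulation sketch of gossip under the refresh mobility model on a torus grid (a sketch only: modelling the refresh step as a uniform permutation of the values is a simplification of independent repositioning, and the grid size and round count are illustrative):

```python
import numpy as np

def grid_gossip_with_refresh(side, num_rounds, seed=0):
    """Pairwise gossip on a side x side torus grid where, before every gossip
    step, all values are shuffled ("refresh": every node jumps to a random
    site, modelled here as a uniform permutation of the values)."""
    rng = np.random.default_rng(seed)
    n = side * side
    x = rng.random(n)                          # initial measurements
    true_avg = x.mean()
    for _ in range(num_rounds):
        x = x[rng.permutation(n)]              # refresh step (mobility)
        i = int(rng.integers(n))               # a random site wakes up ...
        r, c = divmod(i, side)
        dr, dc = ((-1, 0), (1, 0), (0, -1), (0, 1))[rng.integers(4)]
        j = ((r + dr) % side) * side + (c + dc) % side
        x[i] = x[j] = (x[i] + x[j]) / 2.0      # ... and gossips with a grid neighbour
    return np.abs(x - true_avg).max()          # worst-case deviation from the true average

print(grid_gossip_with_refresh(side=10, num_rounds=5000))
```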

  22. gossip and mobility

  23. Open problem: gossip and mobility • Random walk model: every node takes one random-walk step on the grid.

  24. Open problem: gossip and mobility

  25. Open problem: gossip and mobility • The random-walk model tends to the refresh model if enough moving steps m are allowed between gossip steps • Conjecture: the convergence time is monotonically decreasing in the number of moves m • What is the scaling of messages for m = 1? • More realistic mobility models (e.g. random waypoint)? • Gossip algorithms are especially interesting for scenarios with mobility and unknown topology

  26. Research directions • We showed that random path averaging is optimal. • The random-path idea can be useful for message-passing / distributed-optimization algorithms more generally. • This scheduling avoids the diffusive nature of random walks and spreads spatial information fast. • Impact of mobility? • Exploiting the physical layer? • Impact of quantization and message errors? • Scaling laws for distributed optimization problems?

  27. fin

  28. Poincaré inequality: an example • Chain with n states, uniform stationary distribution π(i) = 1/n. • Capacity: C(e) = π(i) P_ij = (1/n) · (1/2) = 1/(2n) • Demand: D(x, y) = π(x) π(y) = 1/n² • A flow is a way to route D(x, y) units from x to y, for all pairs (here there is no choice of route!). • Cost of the flow: ρ(f) = max_e f(e) / C(e) ≤ 1 / (1/(2n)) = 2n • Length of the flow (longest flow-carrying path): ℓ(f) = n • Poincaré inequality: 1 / (1 - λ₂) ≤ ρ(f) · ℓ(f) = 2n², consistent with the Θ(n²) relaxation time of the chain.
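
A numerical check of this machinery on a chain (a sketch: the chain length and the canonical single-path flow are illustrative choices; the inequality 1/(1 - λ₂) ≤ ρ(f)·ℓ(f) is what the slide's arithmetic instantiates):

```python
import numpy as np

n = 20                                     # chain with n states
P = np.zeros((n, n))
for i in range(n - 1):                     # nearest-neighbour moves with prob 1/2 each way
    P[i, i + 1] = P[i + 1, i] = 0.5
P[0, 0] = P[n - 1, n - 1] = 0.5            # endpoints hold with prob 1/2
pi = np.full(n, 1.0 / n)                   # uniform stationary distribution

# The only flow: route D(x, y) = pi[x] * pi[y] along the unique path x -> ... -> y.
f = np.zeros(n - 1)                        # load on edge e = (i, i+1)
for x in range(n):
    for y in range(x + 1, n):
        f[x:y] += 2 * pi[x] * pi[y]        # demand counted in both directions
C = pi[:-1] * P[np.arange(n - 1), np.arange(1, n)]   # capacities C(e) = pi(i) P(i, i+1)
rho = (f / C).max()                        # congestion of the flow
ell = n - 1                                # longest flow-carrying path

lam2 = np.sort(np.linalg.eigvalsh(P))[-2]  # P is symmetric here, eigenvalues are real
print("1/(1 - lambda_2) =", 1 / (1 - lam2), " <=  rho * ell =", rho * ell)
```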

  29. Statistical inference • Consider an undirected graphical model with node potentials ψ_i and edge potentials ψ_ij. • The probability of an observation (x1, ..., x5) is proportional to the product of the potentials: p(x1, ..., x5) ∝ Π_i ψ_i(x_i) · Π_(i,j) ψ_ij(x_i, x_j). • What is the distribution of x5 given some (noisy) observations? (Figure: a graphical model on variables x1, ..., x5.)

  30. Inference in sensor networks through message passing (Figure: a sensor network graphical model with hidden variables x_i and noisy observations y_i.)

  31. Message passing for sensor networks • Optimal mappings from graphical-model nodes to motes are NP-hard to compute. • There exist graphical models which require routing. • Theorem: any graphical model can be modified so that no routing is required. • Reweighted belief propagation is very robust and practical (packet drops, node failures, dynamic changes). (Schiff, Antonelli, D., Chu, and Wainwright, IPSN'07)
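
For concreteness, here is a plain sum-product message-passing sketch on a discrete pairwise model (the deployment described here uses a reweighted variant; the potentials, graph, and function names below are illustrative assumptions):

```python
import numpy as np

def sum_product(node_pot, edge_pot, edges, num_iters=50):
    """Vanilla sum-product belief propagation on a discrete pairwise MRF.

    node_pot: dict node -> psi_i (length-k array)
    edge_pot: dict (i, j) -> psi_ij (k x k array), keyed as listed in `edges`
    Returns approximate marginals (exact on trees)."""
    msgs = {(i, j): np.ones(len(node_pot[j]))
            for (a, b) in edges for (i, j) in ((a, b), (b, a))}
    for _ in range(num_iters):
        new = {}
        for (i, j) in msgs:
            pot = edge_pot[(i, j)] if (i, j) in edge_pot else edge_pot[(j, i)].T
            incoming = node_pot[i].copy()
            for (k, l) in msgs:                  # product of messages into i, except from j
                if l == i and k != j:
                    incoming *= msgs[(k, l)]
            m = pot.T @ incoming                 # sum over x_i of psi_ij(x_i, x_j) * incoming(x_i)
            new[(i, j)] = m / m.sum()
        msgs = new
    beliefs = {}
    for i in node_pot:
        b = node_pot[i].copy()
        for (k, l) in msgs:
            if l == i:
                b *= msgs[(k, l)]
        beliefs[i] = b / b.sum()
    return beliefs

# Tiny example: a 3-node chain with binary variables and a smoothness potential.
psi = np.array([[2.0, 1.0], [1.0, 2.0]])
node_pot = {0: np.array([0.9, 0.1]), 1: np.array([0.5, 0.5]), 2: np.array([0.2, 0.8])}
edge_pot = {(0, 1): psi, (1, 2): psi}
print(sum_product(node_pot, edge_pot, edges=[(0, 1), (1, 2)]))
```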

  32. Deployment • Two space heaters create a temperature gradient • Empirical model used to generate the potential functions • Sample at the red dots, predict at the blue dots (Figure: grid of motes labelled A-I.)

  33. Deployment results • Estimate the temperature as the mean of the marginal • Good accuracy: within 3 degrees! (Figure: example predictions at held-out locations.)

  34. Simulation results • Errors in Sensor Readings

  35. Resilience to Communication Failures (simulation results) • Dead Motes

  36. Resilience to Communication Failure II (Figure: comparison of undirected vs. directed links.)

