
Gossip Algorithms and Mixing Times



  1. Gossip Algorithms and Mixing Times. Alex Dimakis, based on collaborations with Florence Benezit, Kannan Ramchandran, Anand Sarwate, Patrick Thiran, Martin Vetterli, and Martin Wainwright.

  2. problem: distributed averaging • Every node has a number (e.g. a temperature reading) • Every node wants access to the global average • Want a randomized, distributed, localized algorithm to compute averages. (Figure: example network with node values 2, 2, 3, 5, 12.)

  3. gossip algorithms for aggregation • Start with the initial measurement as an estimate of the average and update • Each node interacts with a random neighbor and both compute the pairwise average • Converges to the true average • Fundamental building block for numerous problems (distributed detection, localization, randomized projections for compressive sensing) • How many messages? (Figure: nodes holding 2, 2, 3, 5, 12 repeatedly replace pairs of values by their pairwise averages, e.g. 2.5, 3.75, 7.87.) (Related work: Tsitsiklis, Boyd et al., Alanyali et al., Spanos et al., Sundaram et al., Nedich et al.)
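
A minimal sketch of this pairwise-gossip update (the adjacency list, variable names, and number of rounds are illustrative assumptions, not taken from the talk):

```python
import random

def pairwise_gossip(values, neighbors, num_rounds, seed=0):
    """Randomized pairwise gossip: each round a uniformly random node wakes
    up, picks a random neighbor, and both replace their values with the
    pairwise average."""
    rng = random.Random(seed)
    x = dict(values)
    nodes = list(x)
    for _ in range(num_rounds):
        i = rng.choice(nodes)
        j = rng.choice(neighbors[i])
        x[i] = x[j] = (x[i] + x[j]) / 2.0
    return x

# The 5 example measurements from the slide; the graph itself is an illustrative choice.
vals = {0: 2, 1: 2, 2: 3, 3: 5, 4: 12}
nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
print(pairwise_gossip(vals, nbrs, num_rounds=500))   # every entry approaches 4.8
```

Every entry converges to the true average 4.8; the question raised above is how many such pairwise exchanges (messages) are needed.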

  4. Standard gossip • x(t) = W(t) x(t-1) = W(t) W(t-1) ··· W(1) x(0), where the W(t) are i.i.d. random matrices. (Figure: one gossip step on the example network, e.g. x(0) = (2, 2, 3, 5, 12) becomes x(1) = (2.5, 2, 2.5, 5, 12) after the nodes holding 2 and 3 average.)

  5. How many messages • ε-averaging time: the first time where x(t) is ε-close to the normalized true average with probability greater than 1-ε. • x(t) = W(t) x(t-1) = W(t) ··· W(1) x(0). • Define W = E[W(t)] • Theorem: the ε-averaging time can be bounded using the spectral gap of W: Tave(ε) = Θ(log(1/ε) / log(1/λ₂(W))), where λ₂(W) is the second-largest eigenvalue of W. • Scaling laws: how does the number of messages scale if I double my network? (Boyd, Ghosh, Prabhakar and Shah, IEEE Trans. on Information Theory, June 2006)
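
A small numerical illustration of this bound (a sketch: the asynchronous activation model, in which one uniformly random node wakes up and averages with a uniformly chosen neighbor, and the example graph are assumptions for illustration):

```python
import numpy as np

def expected_gossip_matrix(neighbors):
    """W = E[W(t)] when a uniformly random node wakes up (prob 1/n) and
    averages with a uniformly chosen neighbor (prob 1/deg).  The realized
    W(t) is the identity except for the 2x2 averaging block on rows/cols i, j."""
    n = len(neighbors)
    W = np.zeros((n, n))
    for i in range(n):
        for j in neighbors[i]:
            p = (1.0 / n) * (1.0 / len(neighbors[i]))
            Wij = np.eye(n)
            Wij[i, i] = Wij[j, j] = 0.5
            Wij[i, j] = Wij[j, i] = 0.5
            W += p * Wij
    return W

nbrs = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}
W = expected_gossip_matrix(nbrs)
lam2 = np.sort(np.linalg.eigvalsh(W))[-2]   # second-largest eigenvalue of W
print("lambda_2 =", lam2, " spectral gap =", 1 - lam2)
```

A larger spectral gap 1 - λ₂(W) means fewer rounds to average; the graph-dependent costs on the next slide come from how this gap scales with n.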

  6. cost of standard gossip • Depends on the graph and the gossip probabilities: • Complete graph: Tave = Θ(n log(1/ε)) (Kempe et al., FOCS'03) • Small world / expander: Tave = Θ(n log n) • Random geometric graphs: Tave = Θ(n²) • Note that flooding everything everywhere costs Θ(n²) messages

  7. random geometric graphs • Realistic sensor network model (Gupta & Kumar, Penrose): • Random geometric graph G(n, r): n random points in the unit square, connected if their distance is < r • Connectivity threshold: r = Θ(√(log n / n))
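
A sketch of sampling such a graph (the constant 2 in the radius is an illustrative choice slightly above the connectivity threshold):

```python
import numpy as np

def random_geometric_graph(n, r, seed=0):
    """G(n, r): n uniform points in the unit square, two points connected
    when their Euclidean distance is below r."""
    rng = np.random.default_rng(seed)
    pts = rng.random((n, 2))
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    adj = (dist < r) & ~np.eye(n, dtype=bool)
    return pts, adj

n = 500
r = np.sqrt(2 * np.log(n) / n)    # a bit above the Theta(sqrt(log n / n)) threshold
pts, adj = random_geometric_graph(n, r)
print("average degree:", adj.sum(axis=1).mean())
```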

  8. cost of standard gossip • Standard gossip algorithms require a lot of energy (for realistic sensor network topologies). • Why: useful information performs random walks and diffuses slowly. • Can we save energy with extra information? • Idea: gossip in random directions to diffuse faster. • Assume each node knows its own location and the locations of its 1-hop neighbors.

  9. Random target routing: how to find an (almost) random node • The node picks a random location (the "target") • Greedy routing towards the target • Probability of receiving the packet ~ Voronoi cell area
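
A sketch of this random-target routing step (greedy geographic forwarding; the graph construction, parameters, and function names are illustrative assumptions):

```python
import numpy as np

def greedy_route(pts, adj, source, target_xy):
    """Greedy geographic routing: forward to the neighbor closest to the
    target location; stop at a node with no closer neighbor (roughly, the
    node whose Voronoi cell contains the target)."""
    current, path = source, [source]
    while True:
        nbrs = np.flatnonzero(adj[current])
        dists = np.linalg.norm(pts[nbrs] - target_xy, axis=1)
        here = np.linalg.norm(pts[current] - target_xy)
        if nbrs.size == 0 or dists.min() >= here:
            return path
        current = int(nbrs[np.argmin(dists)])
        path.append(current)

# Toy usage on a small random geometric graph (parameters illustrative).
rng = np.random.default_rng(1)
n, r = 200, 0.15
pts = rng.random((n, 2))
adj = (np.linalg.norm(pts[:, None] - pts[None, :], axis=-1) < r) & ~np.eye(n, dtype=bool)
target = rng.random(2)            # the node picks a uniformly random target location
print(greedy_route(pts, adj, source=0, target_xy=target))
```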

  10. Geographic gossip • Nodes use random routing to gossip with nodes far away in the network • Each interaction costs on the order of √(n / log n) messages (the typical routed path length), but mixing is faster • Number of messages: O(n^1.5), up to logarithmic factors (D., Sarwate, and Wainwright, IPSN'06, IEEE Trans. on Signal Processing)

  11. Recent related work • Wenjun, Huaiyu: O(n^1.5 log(1/ε)) using partial location lifting (L.A.D.A., ISIT 2007) • Shah, Jung, Shin: O(n^1.5 log(1/ε)) by expander liftings • (related to liftings of Markov chains for faster mixing) • Different target distribution (say ~ 1/r²)? Rabbat (Statistical Signal Processing '07) showed it still scales like O(n^1.5 log(1/ε)) • Can n^1.5 be improved?

  12. Why not average on the path? • Averaging on the routed path: the routed packet computes the sum of all the nodes it visits, plus a hop count. The average is then propagated backwards to all the nodes on the path. (Figure: a routed path whose nodes all end up holding the path average.)
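
A sketch of the path-averaging step described above (node values and the routed path are illustrative):

```python
def path_average(values, path):
    """Averaging on the routed path: the packet accumulates the running sum
    and a hop count on the way out; the resulting average is then written
    back to every node on the path (the backward pass)."""
    total = sum(values[node] for node in path)   # forward pass
    avg = total / len(path)
    for node in path:                            # backward pass
        values[node] = avg
    return values

# Illustrative values and routed path.
vals = {0: 2.0, 1: 2.0, 2: 4.0, 3: 1.0, 4: 2.0}
print(path_average(vals, path=[0, 1, 2, 3]))     # nodes 0-3 now hold 2.25
```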

  13. Path averaging is optimal • Theorem: geographic gossip with path averaging on G(n, r) requires an expected number of messages Tave = Θ(n log(1/ε)) (with high probability over the random graph) • This is the optimal number of messages (since we are averaging n numbers). • The analysis bounds the eigenvalues of a random matrix through a Poincaré inequality. (Benezit, D., Thiran, and Vetterli, Allerton 2007)

  14. Proof ingredients: appetizers • ε-averaging time: the first time where x(t) is ε-close to the normalized true average with probability greater than 1-ε. • x(t) = W(t) x(t-1) = W(t) ··· W(1) x(0). • Define W = E[W(t)] • The ε-averaging time can be bounded using the spectral gap of W: Tave(ε) = Θ(log(1/ε) / log(1/λ₂(W))).

  15. Proof ingredients: Poincaré inequality • Capacity of an edge e = (i, j): C(e) = π(i) P_ij • Each pair of states x, y has demand D(x, y) = π(x) π(y) • A flow is a way to route D(x, y) units from x to y, for all pairs. • The cost of a flow is ρ(f) = max_e f(e) / C(e). The length ℓ(f) of a flow is the length of the longest flow-carrying path. • Poincaré inequality: 1 / (1 - λ₂) ≤ ρ(f) · ℓ(f). (Diaconis, Stroock, Geometric bounds for eigenvalues of Markov chains, Ann. of Applied Probability, 1991. Sinclair, Improved bounds for mixing rates of Markov chains and multicommodity flow, STOC'91; Combinatorics, Probability and Computing, volume 1, 1992.)

  16. Proof ingredients: counting paths • W(i, j) = Prob(i and j averaged together) × 1/(length of the routed path) • Key lemma: typical pairs of nodes i, j are averaged together by Ω(n) of the possible routed paths. • Hence W(i, j) = Ω(n^-1.5) when d(i, j) ≤ const · n^-1/2.

  17. Mixing ingredients • For each pair of nodes (i, j) we construct a flow that does not overload any edge. • This can be done by routing through one intermediate node for each pair (i, j). • D(x, y) = 1/n², C(e) ≈ (1/n) · (1/n^1.5) = 1/n^2.5 (for a typical edge e) • Cost ρ(f) = n^1/2, length ℓ(f) = 2 • Poincaré coefficient O(n^1/2) • Averaging time O(n^1/2) rounds, but each averaging round costs O(n^1/2) messages (= the typical path length).

  18. gossip and mobility

  19. gossip and mobility • Refresh model: every node randomly and independently chooses a new site.

  20. gossip and mobility • Grid with no mobility: Tave = Θ(n² log(1/ε)) • Grid with full mobility: Tave = Θ(n log(1/ε)) • What if there is only horizontal mobility?

  21. gossip and mobility • Grid with only horizontal mobility: Tave = Θ(n² log(1/ε)) (useless) • Bidirectional mobility: Tave = Θ(n log(1/ε)) (as good as full mobility) • Adding m mobile agents cuts the averaging time by a factor of m. (Sarwate, D.: The impact of mobility on gossip algorithms, Infocom 2009)
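
A simulation sketch of gossip under the refresh mobility model on a torus grid (a sketch only: modelling the refresh step as a uniform permutation of the values is a simplification of independent repositioning, and the grid size and round count are illustrative):

```python
import numpy as np

def grid_gossip_with_refresh(side, num_rounds, seed=0):
    """Pairwise gossip on a side x side torus grid where, before every gossip
    step, all values are shuffled ("refresh": every node jumps to a random
    site, modelled here as a uniform permutation of the values)."""
    rng = np.random.default_rng(seed)
    n = side * side
    x = rng.random(n)                          # initial measurements
    true_avg = x.mean()
    for _ in range(num_rounds):
        x = x[rng.permutation(n)]              # refresh step (mobility)
        i = int(rng.integers(n))               # a random site wakes up ...
        r, c = divmod(i, side)
        dr, dc = ((-1, 0), (1, 0), (0, -1), (0, 1))[rng.integers(4)]
        j = ((r + dr) % side) * side + (c + dc) % side
        x[i] = x[j] = (x[i] + x[j]) / 2.0      # ... and gossips with a grid neighbour
    return np.abs(x - true_avg).max()          # worst-case deviation from the true average

print(grid_gossip_with_refresh(side=10, num_rounds=5000))
```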

  22. gossip and mobility

  23. Open problem: gossip and mobility • Random walk model: every node takes one random-walk step on the grid.

  24. Open problem: gossip and mobility

  25. Open problem: gossip and mobility • The random-walk model tends to the refresh model if enough moving steps m are allowed between gossip steps • Conjecture: the convergence time is monotonically decreasing in the number of moves m • What is the scaling of messages for m = 1? • More realistic mobility models (e.g. random waypoint)? • Gossip algorithms are especially interesting for scenarios with mobility and unknown topology

  26. Research directions • We showed that random path averaging is optimal. • The random-path idea can be useful for message-passing / distributed-optimization algorithms more generally. • This scheduling avoids the diffusive nature of random walks and spreads spatial information fast. • Impact of mobility? • Exploiting the physical layer? • Impact of quantization and message errors? • Scaling laws for distributed optimization problems?

  27. fin

  28. Poincaré inequality: an example • Chain with n states, uniform stationary distribution π(i) = 1/n. • Capacity: C(e) = π(i) P_ij = (1/n) · (1/2) = 1/(2n) • Demand: D(x, y) = π(x) π(y) = 1/n² • A flow is a way to route D(x, y) units from x to y, for all pairs (here there is no choice of route!). • Cost of the flow: ρ(f) = max_e f(e) / C(e) ≤ 1 / (1/(2n)) = 2n • Length of the flow (longest flow-carrying path): ℓ(f) = n • Poincaré inequality: 1 / (1 - λ₂) ≤ ρ(f) · ℓ(f) = 2n², consistent with the Θ(n²) relaxation time of the chain.
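
A numerical check of this machinery on a chain (a sketch: the chain length and the canonical single-path flow are illustrative choices; the inequality 1/(1 - λ₂) ≤ ρ(f)·ℓ(f) is what the slide's arithmetic instantiates):

```python
import numpy as np

n = 20                                     # chain with n states
P = np.zeros((n, n))
for i in range(n - 1):                     # nearest-neighbour moves with prob 1/2 each way
    P[i, i + 1] = P[i + 1, i] = 0.5
P[0, 0] = P[n - 1, n - 1] = 0.5            # endpoints hold with prob 1/2
pi = np.full(n, 1.0 / n)                   # uniform stationary distribution

# The only flow: route D(x, y) = pi[x] * pi[y] along the unique path x -> ... -> y.
f = np.zeros(n - 1)                        # load on edge e = (i, i+1)
for x in range(n):
    for y in range(x + 1, n):
        f[x:y] += 2 * pi[x] * pi[y]        # demand counted in both directions
C = pi[:-1] * P[np.arange(n - 1), np.arange(1, n)]   # capacities C(e) = pi(i) P(i, i+1)
rho = (f / C).max()                        # congestion of the flow
ell = n - 1                                # longest flow-carrying path

lam2 = np.sort(np.linalg.eigvalsh(P))[-2]  # P is symmetric here, eigenvalues are real
print("1/(1 - lambda_2) =", 1 / (1 - lam2), " <=  rho * ell =", rho * ell)
```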

  29. Statistical inference • Consider an undirected graphical model with node potentials ψ_i and edge potentials ψ_ij. • The probability of an observation (x1, ..., x5) is proportional to the product of the potentials: p(x1, ..., x5) ∝ Π_i ψ_i(x_i) · Π_(i,j) ψ_ij(x_i, x_j). • What is the distribution of x5 given some (noisy) observations? (Figure: a graphical model on variables x1, ..., x5.)

  30. Inference in sensor networks through message passing (Figure: a sensor network graphical model with hidden variables x_i and noisy observations y_i.)

  31. Message passing for sensor networks • Optimal mappings from graphical-model nodes to motes are NP-hard to compute. • There exist graphical models which require routing. • Theorem: any graphical model can be modified so that no routing is required. • Reweighted belief propagation is very robust and practical (packet drops, node failures, dynamic changes). (Schiff, Antonelli, D., Chu, and Wainwright, IPSN'07)
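
For concreteness, here is a plain sum-product message-passing sketch on a discrete pairwise model (the deployment described here uses a reweighted variant; the potentials, graph, and function names below are illustrative assumptions):

```python
import numpy as np

def sum_product(node_pot, edge_pot, edges, num_iters=50):
    """Vanilla sum-product belief propagation on a discrete pairwise MRF.

    node_pot: dict node -> psi_i (length-k array)
    edge_pot: dict (i, j) -> psi_ij (k x k array), keyed as listed in `edges`
    Returns approximate marginals (exact on trees)."""
    msgs = {(i, j): np.ones(len(node_pot[j]))
            for (a, b) in edges for (i, j) in ((a, b), (b, a))}
    for _ in range(num_iters):
        new = {}
        for (i, j) in msgs:
            pot = edge_pot[(i, j)] if (i, j) in edge_pot else edge_pot[(j, i)].T
            incoming = node_pot[i].copy()
            for (k, l) in msgs:                  # product of messages into i, except from j
                if l == i and k != j:
                    incoming *= msgs[(k, l)]
            m = pot.T @ incoming                 # sum over x_i of psi_ij(x_i, x_j) * incoming(x_i)
            new[(i, j)] = m / m.sum()
        msgs = new
    beliefs = {}
    for i in node_pot:
        b = node_pot[i].copy()
        for (k, l) in msgs:
            if l == i:
                b *= msgs[(k, l)]
        beliefs[i] = b / b.sum()
    return beliefs

# Tiny example: a 3-node chain with binary variables and a smoothness potential.
psi = np.array([[2.0, 1.0], [1.0, 2.0]])
node_pot = {0: np.array([0.9, 0.1]), 1: np.array([0.5, 0.5]), 2: np.array([0.2, 0.8])}
edge_pot = {(0, 1): psi, (1, 2): psi}
print(sum_product(node_pot, edge_pot, edges=[(0, 1), (1, 2)]))
```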

  32. Deployment • Two space heaters create a temperature gradient • Empirical model used to generate the potential functions • Sample at the red dots, predict at the blue dots (Figure: grid of motes labelled A-I.)

  33. Deployment results • Estimate the temperature as the mean of the marginal • Good accuracy: within 3 degrees! (Figure: example predictions at held-out locations.)

  34. Simulation results • Errors in Sensor Readings

  35. Resilience to Communication Failures (simulation results) • Dead Motes

  36. Resilience to Communication Failure II (Figure: comparison of undirected vs. directed links.)

