Fast Regression Algorithms Using Spectral Graph Theory Richard Peng
Outline • Regression: why and how • Spectra: fast solvers • Graphs: tree embeddings
Learning / Inference • Find a (hidden) pattern in (noisy) data • Input: signal s • Output: recovered pattern
Regression • p ≥ 1: convex • Convex constraints e.g. linear equalities minimize Mininimize: |x|pSubject to: constraints on x
Application 0: LASSO [Tibshirani `96]: min |x|_1 s.t. Ax = s • Widely used in practice: • Structured output • Robust to noise
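The |x|_1 objective with linear equality constraints can be posed as a linear program by introducing auxiliary variables t with |x_i| ≤ t_i. Below is a minimal SciPy sketch on a made-up random instance; the instance and variable names are illustrative, not from the talk.

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative instance (made up): recover a sparse x from Ax = s.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
x_true = np.zeros(50)
x_true[[3, 17, 42]] = [1.0, -2.0, 0.5]
s = A @ x_true

n = A.shape[1]
# Variables [x, t]: minimize sum(t) with Ax = s and |x_i| <= t_i,
# encoded as the two linear constraints x - t <= 0 and -x - t <= 0.
c = np.concatenate([np.zeros(n), np.ones(n)])
A_eq = np.hstack([A, np.zeros((A.shape[0], n))])
A_ub = np.vstack([np.hstack([np.eye(n), -np.eye(n)]),
                  np.hstack([-np.eye(n), -np.eye(n)])])
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * n), A_eq=A_eq, b_eq=s,
              bounds=[(None, None)] * n + [(0, None)] * n)
x = res.x[:n]  # sparse solution; typically the three spikes of x_true reappear
```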
Application 1: Images • Min Σ_{i~j∈E} (x_i - x_j - s_ij)^2 • Poisson image processing • No bears were harmed in the making of these slides
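This quadratic objective leads to the normal equations (D^T D)x = D^T s, where D is the edge-vertex incidence matrix, and D^T D is exactly the graph Laplacian. A minimal 1-D sketch (path graph, unit weights; the instance is made up):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

# Minimal 1-D sketch of min sum_{i~j} (x_i - x_j - s_ij)^2 on a path graph.
n = 100
rows = np.repeat(np.arange(n - 1), 2)
cols = np.ravel(np.column_stack([np.arange(n - 1), np.arange(1, n)]))
vals = np.tile([1.0, -1.0], n - 1)
D = sp.csr_matrix((vals, (rows, cols)), shape=(n - 1, n))  # incidence matrix

rng = np.random.default_rng(1)
x_true = np.sin(np.linspace(0, 3, n))                 # hidden signal
s = D @ x_true + 0.01 * rng.standard_normal(n - 1)    # noisy differences

L = (D.T @ D).tocsr()   # this is the graph Laplacian of the path
b = D.T @ s
# L is singular (constant nullspace): pin x_0 = 0 to get a unique solution.
x = np.concatenate([[0.0], spsolve(L[1:, 1:], b[1:])])
```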
Application 2: Min cut • Min Σ_{ij∈E} |x_i - x_j| s.t. x_s = 0, x_t = 1 • Fractional solution = integral solution • Remove fewest edges to separate vertices s and t
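The |x_i - x_j| terms yield to the same auxiliary-variable trick as the LASSO sketch above. A minimal sketch on a made-up 4-vertex graph (instance and names are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

# Min cut on a toy unit-weight graph as an L1 problem.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (1, 2)]
n, m, s, t = 4, len(edges), 0, 3
c = np.concatenate([np.zeros(n), np.ones(m)])          # minimize sum t_e
A_ub = np.zeros((2 * m, n + m))
for k, (i, j) in enumerate(edges):
    A_ub[2 * k, [i, j, n + k]] = [1, -1, -1]           #  x_i - x_j <= t_e
    A_ub[2 * k + 1, [i, j, n + k]] = [-1, 1, -1]       # -(x_i - x_j) <= t_e
A_eq = np.zeros((2, n + m)); A_eq[0, s] = 1; A_eq[1, t] = 1
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * m), A_eq=A_eq,
              b_eq=[0.0, 1.0], bounds=[(None, None)] * n + [(0, None)] * m)
print(res.fun)  # cut value: 2.0, i.e. removing edges (0,1) and (0,2)
```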
Regression Algorithms Convex optimization: • 1940~1960: simplex, tractable • 1960~1980: ellipsoid, poly time • 1980~2000: interior point, efficient • Õ(m^{1/2}) interior point steps, where m = # non-zeros and Õ hides log factors
Efficiency Matters • m > 10^6 for most images • Even bigger (10^9): • Videos • 3D medical data
Key Subroutine • Each of the Õ(m^{1/2}) steps of an interior point algorithm finds a step direction by solving a linear system Ax = b
More Reasons for Fast Solvers [Boyd-Vandenberghe `04], Figure 11.20: “The growth in the average number of Newton iterations (on randomly generated SDPs)… is very small”
Linear System Solvers • [1st century CE] Gaussian Elimination: O(m^3) • [Strassen `69] O(m^2.8) • [Coppersmith-Winograd `90] O(m^2.3755) • [Stothers `10] O(m^2.3737) • [Vassilevska Williams `11] O(m^2.3727) • Total: > m^2
Not fast, not used: • Preferred in practice: coordinate descent, subgradient methods • Solution quality traded for time
Fast Graph-Based L2 Regression [Spielman-Teng `04] (more in 12 slides) Input: linear system Ax = b where A is related to a graph Output: solution to Ax = b Runtime: nearly linear, Õ(m)
Graphs Using Algebra Fast convergence + low cost per step = state-of-the-art algorithms
Laplacian Paradigm • [Daitch-Spielman `08]: min-cost flow • [Christiano-Kelner-Mądry-Spielman-Teng `11]: approx maximum flow / min cut
Extension 1 [Chin-Mądry-Miller-P `12]: regression, image processing, grouped L2
Extension 2 [Kelner-Miller-P `12]: k-commodity flow • Dual: k-variate labeling of graphs
Extension 3 [Miller-P `13]: faster for structured images / separable graphs
Need: Fast Linear System Solvers Implications of fast solvers: • Fast regression routines • Parallel, work-efficient graph algorithms
Other Applications • [Tutte `66]: planar embedding • [Boman-Hendrickson-Vavasis `04]: PDEs • [Orecchia-Sachdeva-Vishnoi `12]: balanced cut / graph separator
Outline • Regression: why and how • Spectra: Linear system solvers • Graphs: tree embeddings
Problem Given: matrix A, vector b; solve Ax = b • Size of A: n-by-n, m non-zeros
Special Structure of A • A = Deg − Adj • Deg: diag(degree) • Adj: adjacency matrix • A_ij = deg(i) if i = j, −w(i,j) otherwise • [Gremban-Miller `96]: extensions to SDD matrices
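A minimal sketch of building A = Deg − Adj from a weighted edge list; the function name is mine, not from the talk.

```python
import numpy as np
import scipy.sparse as sp

def graph_laplacian(n, edges):
    """Build A = Deg - Adj from a weighted edge list [(u, v, w), ...]."""
    u, v, w = (np.array(t) for t in zip(*edges))
    adj = sp.csr_matrix((w, (u, v)), shape=(n, n))
    adj = adj + adj.T                                   # symmetric adjacency
    deg = sp.diags(np.asarray(adj.sum(axis=1)).ravel()) # diag(degree)
    return (deg - adj).tocsr()

# Triangle with unit weights: diagonal = degrees, off-diagonal = -w(i,j).
L = graph_laplacian(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)])
```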
Unstructured Graphs • Social network • Intermediate systems of other algorithms are almost adversarial
Nearly Linear Time Solvers [Spielman-Teng `04] Input: n-by-n graph Laplacian A with m non-zeros, vector b where b = Ax for some x Output: approximate solution x' s.t. |x − x'|_A < ε|x|_A Runtime: nearly linear, O(m log^c n log(1/ε)) expected • Runtime is cost per bit of accuracy • Error in the A-norm: |y|_A = √(y^T A y)
How Many Logs? Runtime: O(m log^c n log(1/ε)) Value of c: I don’t know • [Spielman]: c ≤ 70 • [Miller]: c ≤ 32 • [Koutis]: c ≤ 15 • [Teng]: c ≤ 12 • [Orecchia]: c ≤ 6 When n = 10^6, log^6 n > 10^6
Practical Nearly Linear Time Solvers [Koutis-Miller-P `10] Input: n-by-n graph Laplacian A with m non-zeros, vector b where b = Ax for some x Output: approximate solution x' s.t. |x − x'|_A < ε|x|_A Runtime: O(m log^2 n log(1/ε)) • Runtime is cost per bit of accuracy • Error in the A-norm: |y|_A = √(y^T A y)
Practical Nearly Linear Time Solvers [Koutis-Miller-P `11] Input: n-by-n graph Laplacian A with m non-zeros, vector b where b = Ax for some x Output: approximate solution x' s.t. |x − x'|_A < ε|x|_A Runtime: O(m log n log(1/ε)) • Runtime is cost per bit of accuracy • Error in the A-norm: |y|_A = √(y^T A y)
Stages of The Solver • Iterative Methods • Spectral Sparsifiers • Low Stretch Spanning Trees
Iterative Methods Numerical analysis: Can solve systems in A by iteratively solving spectrally similar, but easier, B
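One concrete instance of this idea is preconditioned Richardson iteration: repeatedly correct x by B^{-1} applied to the residual. This is only a toy sketch (the actual solvers use accelerated variants and recursion); the diagonal preconditioner below is an assumption for illustration, not the B used in these solvers.

```python
import numpy as np

def preconditioned_richardson(A, solve_B, b, iters=100):
    """Solve Ax = b given a routine solve_B(r) ~ B^{-1} r for a
    spectrally similar B: each step corrects x by B^{-1}(b - Ax)."""
    x = np.zeros_like(b)
    for _ in range(iters):
        x = x + solve_B(b - A @ x)
    return x

# Toy check: precondition a diagonally dominant A by its diagonal B.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
d = np.diag(A)
x = preconditioned_richardson(A, lambda r: r / d, np.array([1.0, 2.0]))
```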
What is Spectrally Similar? A ≺ B ≺ kA for some small k • Ideas from scalars hold! • A ≺ B: for any vector x, |x|_A^2 ≤ |x|_B^2 • [Vaidya `91]: since the matrix A is a graph G, the preconditioner B should be a graph H too!
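To make the definition concrete: A ≺ B ≺ kA says the generalized eigenvalues of (B, A) lie in [1, k]. A dense-matrix sketch for checking this numerically; it assumes both Laplacians are connected (shared all-ones nullspace), and the function name is mine.

```python
import numpy as np

def similarity_bounds(A, B, tol=1e-8):
    """Return (lo, hi) with lo*A ≺ B ≺ hi*A, from the generalized
    eigenvalues of (B, A) restricted to A's range."""
    lam = np.linalg.eigvals(np.linalg.pinv(A) @ B).real
    lam = np.sort(lam[lam > tol])   # drop the shared nullspace
    return lam[0], lam[-1]

# Triangle G vs its spanning path H: H ≺ G ≺ 3H.
G = np.array([[2., -1, -1], [-1, 2, -1], [-1, -1, 2]])
H = np.array([[1., -1, 0], [-1, 2, -1], [0, -1, 1]])
print(similarity_bounds(H, G))  # ~(1.0, 3.0)
```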
‘Easier’ H • Two ways to be easier: fewer vertices, fewer edges • Can reduce vertex count if edge count is small • Goal: H with fewer edges that’s similar to G
Graph Sparsifiers Sparse equivalents of graphs that preserve something • Spanners: distance, diameter. • Cut sparsifier: all cuts. • What we need: spectrum
What We Need: Ultrasparsifiers [Spielman-Teng `04]: ultrasparsifiers with n − 1 + O(m log^p n / k) edges imply solvers with O(m log^p n) running time • Given: G with n vertices, m edges, parameter k • Output: H with n vertices, n − 1 + O(m log^p n / k) edges • Goal: G ≺ H ≺ kG
Example: Complete Graph O(n log n) random edges (with scaling) suffice w.h.p.
General Graph Sampling Mechanism • For edge e, flip a coin: Pr(keep) = P(e) • Rescale kept edges to maintain expectation (see the sketch below) • Number of edges kept: ∑_e P(e) • Also need to prove concentration
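A minimal sketch of the mechanism (function name is mine). Concentration is the part that needs proof; the mechanics are just this:

```python
import numpy as np

def sample_edges(edges, prob, seed=0):
    """Keep edge e with probability prob[e] and rescale its weight by
    1/prob[e], so the sampled graph equals the original in expectation."""
    rng = np.random.default_rng(seed)
    kept = []
    for (u, v, w), p in zip(edges, prob):
        p = min(p, 1.0)
        if rng.random() < p:
            kept.append((u, v, w / p))
    return kept
```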
Effective Resistance • View the graph as an electrical circuit • R(u,v): pass 1 unit of current from u to v, measure the resistance of the circuit
EE101 • Effective resistance in general: solve Gx = e_uv, where e_uv is the indicator vector with +1 at u and −1 at v; then R(u,v) = x_u − x_v
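A dense sketch of exactly this recipe, with the pseudoinverse standing in where a fast Laplacian solver would go; the example instance is made up.

```python
import numpy as np

def effective_resistance(L, u, v):
    """Solve L x = e_uv (with +1 at u, -1 at v) and return x_u - x_v."""
    e = np.zeros(L.shape[0])
    e[u], e[v] = 1.0, -1.0
    x = np.linalg.pinv(L) @ e   # stand-in for a nearly-linear-time solve
    return x[u] - x[v]

# Unit-weight path u - w - v: R(u, v) = 1 + 1 = 2 (series rule, next slide).
L = np.array([[1., -1, 0], [-1, 2, -1], [0, -1, 1]])
print(effective_resistance(L, 0, 2))  # ~2.0
```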
(Remedial?) EE101 • Single edge of weight w1: R(u,v) = 1/w1 • Two edges in series, weights w1 and w2: R(u,v) = 1/w1 + 1/w2 • In general: single edge R(e) = 1/w(e); series R(u,v) = R(e1) + … + R(el)
Spectral Sparsification by Effective Resistance [Spielman-Srivastava `08]: setting P(e) to w(e)·R(e)·O(log n), where R(e) is the effective resistance between e’s endpoints, gives G ≺ H ≺ 2G* • [Foster `49]: ∑_e w(e)R(e) = n − 1 • So: spectral sparsifier with O(n log n) edges • Ultrasparsifier? Solver??? • *Ignoring probabilistic issues
The Chicken and Egg Problem How to find effective resistance? • [Spielman-Srivastava `08]: use solver • [Spielman-Teng `04]: need sparsifier
Our Workaround • Use upper bounds on effective resistance, R’(u,v) • Modify the problem
Rayleigh’s Monotonicity Law • R(u,v) can only increase when edges are removed • So: calculate effective resistance w.r.t. a spanning tree T, giving upper bounds R’(u,v)
Sampling Probabilities According to Tree • Sampling probability: edge weight times effective resistance of the tree path between its endpoints; this quantity is the stretch of the edge (a sketch of the computation follows) • Goal: small total stretch
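A small sketch of computing one edge’s stretch against a rooted spanning tree; the parent-array representation and names are mine. Walk both endpoints toward the root, accumulating tree-edge resistances 1/w, and meet at the lowest common ancestor.

```python
def tree_stretch(parent, pweight, u, v, w):
    """stretch(e) = w(e) * R_T(u, v), the tree-path resistance between
    u and v. parent[x]/pweight[x] give x's tree parent and the weight
    of that tree edge (the root has parent -1)."""
    def path_to_root(x):
        dist = {x: 0.0}
        while parent[x] != -1:
            dist[parent[x]] = dist[x] + 1.0 / pweight[x]
            x = parent[x]
        return dist
    du, dv = path_to_root(u), path_to_root(v)
    # The lowest common ancestor minimizes the through-vertex distance.
    return w * min(du[a] + dv[a] for a in du if a in dv)

# Unit-weight path 0-1-2 as the tree; off-tree edge (0, 2) has stretch 2.
print(tree_stretch([-1, 0, 1], [0.0, 1.0, 1.0], 0, 2, 1.0))  # 2.0
```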
Good Trees Exist (more in 12 slides, again!) Every graph has a spanning tree with total stretch O(m log n)* • *Hiding loglog n factors • So ∑_e w(e)R’(e) = O(m log n), giving O(m log^2 n) sampled edges. Too many!
‘Good’ Tree??? • Unit weight case: stretch ≥ 1 for all edges • Example: an off-tree edge whose tree path has 2 hops already has stretch = 1 + 1 = 2
What Are We Missing? We haven’t used k! • Need: G ≺ H ≺ kG with n − 1 + O(m log^p n / k) edges • Generated: G ≺ H ≺ 2G with n − 1 + O(m log^2 n) edges
Use k, Somehow • Increase weights of tree edges by a factor of k, giving G’ • Then G ≺ G’ ≺ kG • And the tree is good!
Result • Tree is heavier by a factor of k, so tree effective resistances decrease by a factor of k • In the example above: stretch = 1/k + 1/k = 2/k