Discrepancy and SDPs

Nikhil Bansal (TU Eindhoven, Netherlands), August 24, ISMP 2012, Berlin

Outline
Discrepancy Theory
• What is it
• Applications
• Basic Results (non-constructive)
SDP connection
• Algorithms
• Lower Bounds

Discrepancy Theory: What is it? Study of discrepancy between self-perception and reality

Discrepancy: What is it?
Study of irregularities in approximating the continuous by the discrete.
Historical motivation: numerical integration/sampling. How well can you approximate a region by discrete points?

Discrepancy: What is it?
Problem: How uniformly can you distribute $n$ points in a grid?
"Uniform": for every axis-parallel rectangle $R$, $|(\#\text{points in } R) - \text{Area}(R)|$ should be low.
Discrepancy: $\max_R |(\#\text{points in } R) - \text{Area}(R)|$ over all axis-parallel rectangles $R$.
[Figure: an $n^{1/2} \times n^{1/2}$ grid containing a rectangle $R$]

Distributing points in a grid
[Figure, $n = 64$ points. Uniform grid: $n^{1/2}$ discrepancy. Random: $n^{1/2}(\log\log n)^{1/2}$ discrepancy. Van der Corput set: $O(\log n)$ discrepancy!]
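A minimal sketch of the comparison in the figure, under stated assumptions: points live in the unit square (so "area" is scaled by $n$), and for speed only anchored boxes $[0,a) \times [0,b)$ with $a, b$ on a finite grid are checked rather than all axis-parallel rectangles, so the result is a lower estimate of the true discrepancy. The 2-d Van der Corput set is taken to be the Hammersley-type set $(k/n, \phi_2(k))$, where $\phi_2$ is the base-2 radical inverse. All function names are hypothetical.

```python
import math
import random


def grid_points(n):
    """Centres of the cells of a sqrt(n) x sqrt(n) grid in the unit square (n must be a square)."""
    side = math.isqrt(n)
    if side * side != n:
        raise ValueError("n must be a perfect square")
    return [((i + 0.5) / side, (j + 0.5) / side) for i in range(side) for j in range(side)]


def random_points(n, seed=0):
    """n independent uniform points in the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


def radical_inverse(k, base=2):
    """Mirror the base-`base` digits of k about the radix point, e.g. 6 = 110b -> 0.011b."""
    x, denom = 0.0, 1.0
    while k:
        k, digit = divmod(k, base)
        denom *= base
        x += digit / denom
    return x


def van_der_corput_set(n):
    """2-d Van der Corput (Hammersley-type) set: the points (k/n, radical_inverse(k))."""
    return [(k / n, radical_inverse(k)) for k in range(n)]


def anchored_discrepancy(points, resolution=128):
    """max |#points in [0,a) x [0,b) - n*a*b| over a, b in {1/resolution, ..., 1}."""
    n = len(points)
    worst = 0.0
    for ia in range(1, resolution + 1):
        a = ia / resolution
        ys_left = [y for (x, y) in points if x < a]  # y-coordinates of points with x in [0, a)
        for ib in range(1, resolution + 1):
            b = ib / resolution
            count = sum(1 for y in ys_left if y < b)
            worst = max(worst, abs(count - n * a * b))
    return worst


if __name__ == "__main__":
    n = 64
    for name, pts in [("grid", grid_points(n)), ("random", random_points(n)),
                      ("van der Corput", van_der_corput_set(n))]:
        print(f"{name:>15}: {anchored_discrepancy(pts):.2f}")
```

The grid is the easy case to reason about: a box whose corner sits just inside the last row and column of cell centres captures all points but only part of the area, which is what drives its $n^{1/2}$ behaviour.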

Quasi-Monte Carlo Methods
$n$ random samples (Monte Carlo): error $\propto n^{-1/2}$.
Quasi-Monte Carlo methods*: error $\propto$ (discrepancy of the sample points)$/n$.
Extensive research area.
*Different constant of proportionality.
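A small illustration of the two error rates, assuming a hypothetical 1-d test integrand $f(x) = x^2$ on $[0,1]$ (exact integral $1/3$) and using the base-2 Van der Corput sequence as the quasi-Monte Carlo point set. This is a sketch, not a benchmark; the function names are illustrative.

```python
import random


def radical_inverse(k, base=2):
    """Base-`base` Van der Corput radical inverse of the integer k."""
    x, denom = 0.0, 1.0
    while k:
        k, digit = divmod(k, base)
        denom *= base
        x += digit / denom
    return x


def average(f, points):
    """Equal-weight quadrature: estimate the integral of f over [0,1] by the mean of f."""
    return sum(f(p) for p in points) / len(points)


def integration_errors(f, exact, n, seed=0):
    """Absolute errors of Monte Carlo (random points) and QMC (Van der Corput points)."""
    rng = random.Random(seed)
    mc = average(f, [rng.random() for _ in range(n)])
    qmc = average(f, [radical_inverse(k) for k in range(n)])
    return abs(mc - exact), abs(qmc - exact)


if __name__ == "__main__":
    for n in (64, 256, 1024, 4096):
        mc_err, qmc_err = integration_errors(lambda x: x * x, 1 / 3, n)
        print(f"n={n:5d}  MC error={mc_err:.2e}  QMC error={qmc_err:.2e}")
```

For $n = 2^k$ the first $n$ Van der Corput points are exactly $\{0, 1/n, \dots, (n-1)/n\}$ in a scrambled order, so here the QMC error is $(3n-1)/(6n^2) \approx 1/(2n)$, while the Monte Carlo error shrinks only like $n^{-1/2}$.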

Discrepancy: Example 2
Input: $n$ points placed arbitrarily in a grid. Color them red/blue so that each axis-parallel rectangle is colored as evenly as possible.
Discrepancy: $\max_R |\#\text{red in } R - \#\text{blue in } R|$ over axis-parallel rectangles $R$.
Continuous: color each element 1/2 red and 1/2 blue (0 discrepancy).
Discrete: a random coloring gives about $O(n^{1/2}\log^{1/2} n)$; one can achieve $O(\log^{2.5} n)$.
Why do we care?

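Combinatorial Discrepancy
Universe: $U = [1, \dots, n]$. Subsets: $S_1, S_2, \dots, S_m$.
Color elements red/blue so that each set is colored as evenly as possible.
Find $\chi: [n] \to \{-1,+1\}$ to minimize $|\chi(\mathcal{S})|_\infty = \max_S |\sum_{i \in S} \chi(i)|$.
[Figure: sets $S_1, S_2, S_3, S_4$]
Examples on $U = \{1,2,3\}$: disc $= 0$ for one family of sets, disc $= 2$ for another.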

Combinatorial Discrepancy
If $A$ is an $m \times n$ matrix: $\mathrm{disc}(A) = \min_{x \in \{-1,+1\}^n} \|Ax\|_\infty$.
Set system: $A$ is the $\{0,1\}$ incidence matrix.
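The matrix definition can be evaluated directly by brute force on tiny instances. A minimal sketch (exponential in $n$, for illustration only), with two hypothetical set systems on $U = \{1,2,3\}$ written as incidence matrices:

```python
from itertools import product


def discrepancy(A):
    """Brute-force disc(A) = min over x in {-1,+1}^n of ||Ax||_inf; exponential in n."""
    n = len(A[0])
    best_value, best_x = None, None
    for x in product((-1, 1), repeat=n):
        value = max(abs(sum(a_ij * x_j for a_ij, x_j in zip(row, x))) for row in A)
        if best_value is None or value < best_value:
            best_value, best_x = value, x
    return best_value, best_x


if __name__ == "__main__":
    path = [[1, 1, 0], [0, 1, 1]]                 # sets {1,2}, {2,3}
    triangle = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]  # sets {1,2}, {2,3}, {1,3}
    print(discrepancy(path))      # disc 0 via an alternating coloring
    print(discrepancy(triangle))  # disc 2: two of the three elements always share a color
```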

Applications CS: Computational Geometry, Comb. Optimization, Monte-Carlo simulation, Machine learning, Complexity, Pseudo-Randomness, … Math: Dynamical Systems, Combinatorics, Mathematical Finance, Number Theory, Ramsey Theory, Algebra, Measure Theory, …

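Hereditary Discrepancy
Discrepancy is a useful measure of the complexity of a set system.
Hereditary discrepancy: $\mathrm{herdisc}(U, \mathcal{S}) = \max_{U' \subseteq U} \mathrm{disc}(U', \mathcal{S}|_{U'})$, a robust version of discrepancy.
Discrepancy itself is not so robust: duplicate every element ($1', 2', \dots, n'$) so that each set contains both copies ($A_j$ together with $A'_j$); coloring $i$ and $i'$ oppositely gives discrepancy $= 0$.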
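A brute-force sketch of the duplication example, assuming a hypothetical base system (the "triangle" $\{1,2\}, \{2,3\}, \{1,3\}$) in place of the figure's sets $A_j$. The doubled system has discrepancy 0, yet its hereditary discrepancy stays at 2 because restricting to the original elements recovers the triangle.

```python
from itertools import combinations, product


def restricted_disc(A, cols):
    """disc of the set system restricted to the columns (elements) in cols, by brute force."""
    if not cols:
        return 0
    best = None
    for signs in product((-1, 1), repeat=len(cols)):
        value = max(abs(sum(row[c] * s for c, s in zip(cols, signs))) for row in A)
        best = value if best is None else min(best, value)
    return best


def herdisc(A):
    """max over all subsets U' of the elements of disc restricted to U' (exponential time)."""
    n = len(A[0])
    return max(restricted_disc(A, list(cols))
               for k in range(n + 1) for cols in combinations(range(n), k))


if __name__ == "__main__":
    triangle = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
    doubled = [row + row for row in triangle]         # columns 3..5 are the copies 1', 2', 3'
    print(restricted_disc(doubled, list(range(6))))  # 0: color i and i' oppositely
    print(herdisc(doubled))                          # 2: restrict to {1, 2, 3}
```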

Two Applications

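Rounding
Lovasz-Spencer-Vesztergombi’86: Given any matrix $A$ and $x \in \mathbb{R}^n$, one can round $x$ to $\tilde{x} \in \mathbb{Z}^n$ such that $\|Ax - A\tilde{x}\|_\infty \le \mathrm{herdisc}(A)$.
Proof: Round the bits of $x$ one by one, starting from the last bit:
$x_1$: blah.0101101
$x_2$: blah.1101010
…
$x_n$: blah.0111101
Color the elements whose current last bit is 1 with a low-discrepancy coloring ($-1$ / $+1$) of the corresponding columns of $A$, and round them down/up accordingly.
Error $\le \mathrm{herdisc}(A)\,(1/2 + 1/4 + \dots) \le \mathrm{herdisc}(A)$.
Key point: a low-discrepancy coloring guides our updates!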
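A minimal sketch of the bit-by-bit rounding, assuming $x$ has finite binary expansions (all entries multiples of $2^{-\text{bits}}$) and using brute force to find each round's coloring, so it only runs on toy inputs. In the round that clears the $2^{-\ell}$ bit, the error added is $2^{-\ell}$ times the discrepancy of the columns being colored, which is at most $2^{-\ell}\,\mathrm{herdisc}(A)$. The matrix and vector in the usage example are hypothetical.

```python
from fractions import Fraction
from itertools import product


def best_coloring(A, support):
    """Brute-force a coloring of the columns in `support` minimising the max row imbalance."""
    best_value, best_chi = None, None
    for signs in product((-1, 1), repeat=len(support)):
        value = max(abs(sum(row[j] * s for j, s in zip(support, signs))) for row in A)
        if best_value is None or value < best_value:
            best_value, best_chi = value, dict(zip(support, signs))
    return best_chi


def round_bit_by_bit(A, x, bits):
    """Round x (entries multiples of 2**-bits) to integers, clearing the last bit each round."""
    x = [Fraction(v) for v in x]
    if any((v * 2 ** bits).denominator != 1 for v in x):
        raise ValueError("entries of x must be multiples of 2**-bits")
    for level in range(bits, 0, -1):
        scale = 2 ** level
        # Elements whose current last binary digit (the 2**-level bit) is 1.
        support = [j for j, v in enumerate(x) if (v * scale).numerator % 2 == 1]
        if not support:
            continue
        chi = best_coloring(A, support)
        for j, s in chi.items():
            x[j] += Fraction(s, scale)  # -1 clears the bit, +1 carries it upwards
    return x


def max_error(A, x, x_rounded):
    """||A x - A x_rounded||_inf."""
    return max(abs(sum(a * (u - v) for a, u, v in zip(row, x, x_rounded))) for row in A)


if __name__ == "__main__":
    A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]  # hypothetical example; herdisc(A) = 2
    x = [Fraction(3, 4), Fraction(1, 2), Fraction(5, 8)]
    x_rounded = round_bit_by_bit(A, x, bits=3)
    print(x_rounded, max_error(A, x, x_rounded))
```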

Rounding
The LSV’86 result guarantees that a good rounding exists. How can it be found efficiently? Nothing was known until recently.
Thm [B’10]: Can round efficiently so that the error is $O(\log(mn)) \cdot \mathrm{herdisc}(A)$.

Discrepancy and optimization
Corollary (LSV’86): If $A$ is an integer matrix with $\mathrm{herdisc}(A) = 1$, then $A$ is TU.
(Totally unimodular: the polytope $Ax \le b$ is integral for all integer vectors $b$.)
Ghouila-Houri test for TU matrices.
Open: Can you characterize matrices with $\mathrm{herdisc}(A) = 2$?
Bin packing: OPT $\le$ LP $+ O(1)$?
[Eisenbrand, Palvolgyi, Rothvoss’11]: Yes for a constant number of item sizes, if the $k$-permutation conjecture is true.
(Recently, Newman-Nikolov’11 disproved the $k$-permutation conjecture.)
Refined further by Rothvoss’12 (entropy rounding method).

Dynamic Data Structures
$N$ weighted points in a 2-d region; weights are updated over time.
Query: given an axis-parallel rectangle $R$, determine the total weight of the points in $R$.
Goal: preprocess into a data structure with
• low query time
• low update time (upon a weight change)

Example: Line (interval queries)
• Trivial: query time $O(n)$, update time $1$.
• Table of all entries $W[a,b]$: query time $1$, update time $O(n^2)$.
• Prefix sums, $W[a,b] = W[0,b] - W[0,a]$: query time $2$, update time $O(n)$.
• Query $O(\log n)$, update $O(\log n)$.
Recursively for 2-d.
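One standard way to get $O(\log n)$ query and $O(\log n)$ update on a line is a binary indexed (Fenwick) tree; the slide does not say which tree structure it has in mind, so this is a sketch of that particular choice. It stores partial sums over dyadic blocks and answers $W[a,b] = W[0,b] - W[0,a]$ with two prefix queries, assuming (hypothetically) half-open intervals covering positions $a, \dots, b-1$.

```python
class FenwickTree:
    """Binary indexed tree over positions 0..n-1: point update and prefix sum in O(log n)."""

    def __init__(self, n):
        self.tree = [0] * (n + 1)  # 1-based internal array; tree[i] covers (i - lowbit(i), i]

    def update(self, pos, delta):
        """Add delta to the weight at position pos."""
        i = pos + 1
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i  # move to the next node whose block contains pos

    def prefix(self, b):
        """W[0, b]: total weight at positions 0..b-1."""
        total = 0
        i = b
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # strip the lowest set bit
        return total

    def query(self, a, b):
        """W[a, b] = W[0, b] - W[0, a]: total weight at positions a..b-1."""
        return self.prefix(b) - self.prefix(a)


if __name__ == "__main__":
    ft = FenwickTree(8)
    ft.update(2, 5)
    ft.update(5, 3)
    print(ft.query(0, 8), ft.query(3, 8))  # 8 3
```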

What about other queries?
Circles, arbitrary rectangles, aligned triangles: it turns out that (query time) × (update time) must be large.
Reason: the set system $\mathcal{S}$ formed by the query sets and the points has large discrepancy [Larsen-Green’11].

Bounding Discrepancy

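General set system
What is the discrepancy of a general system on $m$ sets?
Useful fact: after $n$ coin tosses, $E[\#\text{Heads}] = n/2$, and $\#\text{Heads} \in [n/2 - c\,n^{1/2},\ n/2 + c\,n^{1/2}]$ with probability at least $1 - 2e^{-2c^2}$.
In general: for $n$ independent "nice" random variables $X_1, \dots, X_n$ (e.g. mean 0, values in $[-1,1]$), $\sum_i X_i \in [-c\,n^{1/2},\ c\,n^{1/2}]$ with probability at least $1 - 2e^{-c^2/2}$.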

(Previous) Best Algorithm
Random: color each element $i$ independently $x(i) = +1$ or $-1$ with prob. 1/2 each.
Thm: Discrepancy $= O((n \log m)^{1/2})$.
Pf: For each set, expect $O(n^{1/2})$ discrepancy.
Standard tail bounds: $\Pr[|\sum_{i \in S} x(i)| \ge c\,n^{1/2}] \le 2e^{-c^2/2}$.
Union bound + choose $c = \Theta((\log m)^{1/2})$.
Tight: random cannot do better. For the $m = n$ case, random gives $O((n \log n)^{1/2})$.
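A quick experiment with the random-coloring argument, on a hypothetical random set system (each element joins each set with probability 1/2). The threshold function solves $m \cdot 2e^{-t^2/(2n)} = \delta$, which is the Hoeffding-plus-union-bound calculation from the proof, so the observed discrepancy exceeds it with probability at most $\delta$.

```python
import math
import random


def random_set_system(n, m, rng):
    """m random subsets of {0..n-1}: each element joins each set independently with prob. 1/2."""
    return [[j for j in range(n) if rng.random() < 0.5] for _ in range(m)]


def random_coloring_disc(sets, n, rng):
    """Discrepancy of a uniformly random +-1 coloring: max over sets of |sum of colors|."""
    x = [rng.choice((-1, 1)) for _ in range(n)]
    return max(abs(sum(x[j] for j in s)) for s in sets)


def union_bound_threshold(n, m, delta):
    """t with m * 2 * exp(-t^2 / (2n)) = delta (Hoeffding for |S| <= n, then a union bound)."""
    return math.sqrt(2 * n * math.log(2 * m / delta))


if __name__ == "__main__":
    rng = random.Random(0)
    n = m = 200
    sets = random_set_system(n, m, rng)
    print("random coloring:", random_coloring_disc(sets, n, rng))
    print("bound (fails w.p. <= 0.001):", round(union_bound_threshold(n, m, 1e-3), 1))
    print("sqrt(n):", round(math.sqrt(n), 1))
```

The gap between the bound and $n^{1/2}$ is exactly the $(\log m)^{1/2}$ factor that Spencer's result removes.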

Better Colorings Exist!
[Spencer’85] (Six standard deviations suffice): any system with $n$ sets has discrepancy $\le 6n^{1/2}$.
(In general, for arbitrary $m \ge n$, discrepancy $= O(n^{1/2}\log^{1/2}(2m/n))$.)
Tight: for $m = n$, cannot beat $0.5\,n^{1/2}$ (Hadamard matrix).
Inherently non-constructive proof (counting): the powerful entropy method.
Question: Can we find such a coloring algorithmically?
Certain algorithms do not work [Spencer].
Conjecture [Alon-Spencer]: may not be possible.
[Figure: space of colorings]
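A sketch of why Hadamard matrices are hard to color, shown for the $\pm 1$ matrix itself (built here with Sylvester's doubling construction): since the rows are orthogonal, $\|Hx\|_2^2 = n\|x\|_2^2 = n^2$ for every $x \in \{-1,+1\}^n$, so some entry of $Hx$ has absolute value at least $n^{1/2}$. Converting to a $\{0,1\}$ set system costs roughly a factor of 2, which is consistent with the $0.5\,n^{1/2}$ bound on the slide. The brute-force check is only feasible for tiny $n$.

```python
import math
from itertools import product


def sylvester_hadamard(k):
    """2^k x 2^k Hadamard matrix via Sylvester's doubling H -> [[H, H], [H, -H]]."""
    H = [[1]]
    for _ in range(k):
        H = [row + row for row in H] + [row + [-v for v in row] for row in H]
    return H


def min_linf_over_colorings(H):
    """min over x in {-1,+1}^n of ||Hx||_inf, by brute force."""
    n = len(H)
    best = None
    for x in product((-1, 1), repeat=n):
        Hx = [sum(h * xi for h, xi in zip(row, x)) for row in H]
        # Orthogonal rows: ||Hx||_2^2 = n * ||x||_2^2 = n^2, so some |(Hx)_i| >= sqrt(n).
        assert sum(v * v for v in Hx) == n * n
        value = max(abs(v) for v in Hx)
        best = value if best is None else min(best, value)
    return best


if __name__ == "__main__":
    H = sylvester_hadamard(3)  # 8 x 8
    print(min_linf_over_colorings(H), ">=", round(math.sqrt(len(H)), 3))
```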

Results
Thm: Can get Spencer’s bound constructively, i.e., $O(n^{1/2})$ discrepancy for $m = n$ sets.
Thm: For any set system, can find a coloring with discrepancy $\le O(\log(mn)) \cdot$ hereditary discrepancy.
Corollary: rounding with error $O(\log(mn)) \cdot \mathrm{herdisc}(A)$.
General technique: $k$-permutation problem [Spencer, Srinivasan, Tetali], geometric problems, Beck-Fiala setting (Srinivasan’s bound), …

SDPs
Vector program view:
Variables: vectors $v_1, \dots, v_n$ (in arbitrary dimension).
Constraints: arbitrary linear constraints on the inner products $v_i \cdot v_j$, e.g. $|v_i|^2 = 1$.

Relaxations: LPs and SDPs
Linear program is useless: can color each element 1/2 red and 1/2 blue, so the discrepancy of each set is 0!
In general, if $x$ is a good coloring, then so is $-x$, and their average $0$ is always feasible.
SDPs: not clear how to use either.
$|\sum_{i \in S} v_i|^2 \le n \quad \forall S$, $\qquad |v_i|^2 = 1 \quad \forall i$.
Intended solution: $v_i = (+1, 0, \dots, 0)$ or $(-1, 0, \dots, 0)$.
Trivially feasible: $v_i = e_i$ (all $v_i$’s orthogonal).
Yet, SDPs will be a major tool.

Punch line
The SDP is very helpful if we need "tighter" bounds for some sets.
But why does it work for Spencer’s setting? An additional idea is needed.
The algorithm constructs the coloring over time, using several SDPs.

Algorithm (at high level)
Cube: $\{-1,+1\}^n$. Each dimension: an element. Each vertex: a coloring.
[Figure: a walk from "start" (the center of the cube) to "finish" (a vertex)]
Algorithm: "sticky" random walk.
Each step is generated by rounding a suitable SDP.
Moves in various dimensions are correlated, e.g. $\eta^t_1 + \eta^t_2 \approx 0$.
Analysis: few steps to reach a vertex (the walk has high variance); $\mathrm{disc}(S_i)$ does a random walk (with low variance).

An SDP Hereditary disc. ) the following SDP is feasible SDP: Low discrepancy |i 2 Sj vi |2 ·2 for each set . |vi|2 = 1 for each element i. Obtain vi2 Rn Perhaps can guide us how to update element i ? Trouble: is a vector. Need a real number. Perhapsproject onsome vector g?(i.e. for each i, consider i= g¢vi) Seems promising:

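Idea
Which vector $g$ to project on? Pick a random Gaussian vector $g \in \mathbb{R}^n$: $g = (g_1, \dots, g_n)$, each $g_i$ i.i.d. $N(0,1)$.
Lemma: If $g \in \mathbb{R}^n$ is a random Gaussian, then for any $v \in \mathbb{R}^n$, $g \cdot v$ is distributed as $N(0, |v|^2)$.
Pf: $N(0,a^2) + N(0,b^2) = N(0, a^2 + b^2)$ for independent summands, so $g \cdot v = \sum_i v(i)\, g_i \sim N(0, \sum_i v(i)^2)$.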
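A Monte Carlo check of the lemma, assuming a hypothetical test vector $v = (3, 4)$ with $|v|^2 = 25$: the sampled projections $g \cdot v$ should have mean near 0 and variance near 25.

```python
import random
import statistics


def projections(v, trials, seed=0):
    """Sample g . v for `trials` independent Gaussian vectors g with i.i.d. N(0,1) entries."""
    rng = random.Random(seed)
    out = []
    for _ in range(trials):
        g = [rng.gauss(0.0, 1.0) for _ in v]
        out.append(sum(gi * vi for gi, vi in zip(g, v)))
    return out


if __name__ == "__main__":
    v = [3.0, 4.0]  # |v|^2 = 25
    samples = projections(v, 20000)
    print(statistics.fmean(samples), statistics.pvariance(samples))  # near 0 and near 25
```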

Properties of Rounding
Recall: $\eta_i = g \cdot v_i$, and the SDP gives $|v_i|^2 = 1$ and $|\sum_{i \in S} v_i|^2 \le \lambda^2$.
• Each $\eta_i \sim N(0,1)$.
• For each set $S$: $\sum_{i \in S} \eta_i = g \cdot (\sum_{i \in S} v_i) \sim N(0, \le \lambda^2)$ (standard deviation $\le \lambda$).
The $\eta$’s will guide our updates to $x$.

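Algorithm Overview
Construct the coloring iteratively.
Initially: start with coloring $x_0 = (0, 0, \dots, 0)$ at $t = 0$.
At time $t$: update the coloring as $x_t = x_{t-1} + \gamma\,(\eta^t_1, \dots, \eta^t_n)$ ($\gamma$ tiny: $1/n$ suffices), so $x_t(i) = \gamma(\eta^1_i + \eta^2_i + \dots + \eta^t_i)$.
Color of element $i$: does a random walk over time with step size $\approx \gamma\,N(0,1)$; it is fixed once it reaches $-1$ or $+1$.
Set $S$: $x_t(S) = \sum_{i \in S} x_t(i)$ does a random walk with step $\gamma\,N(0, \le \lambda^2)$.
[Figure: $x(i)$ plotted against time, between $-1$ and $+1$]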
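A minimal sketch of the sticky walk's mechanics, under a loud simplifying assumption: the vectors $v_i$ are supplied by the caller and held fixed, whereas the algorithm described here re-solves an SDP on the floating elements at every step. The function name and parameters are hypothetical. With the trivial SDP solution $v_i = e_i$ the walks are independent (which behaves like a random coloring); with $v_2 = -v_1$ the steps satisfy $\eta_1 + \eta_2 = 0$ exactly, so the pair ends with opposite colors.

```python
import random


def sticky_walk(vectors, gamma=0.05, seed=0, max_steps=100_000):
    """Sketch of the "sticky" walk driven by fixed vectors v_i (stand-ins for SDP solutions).

    Each step draws g with i.i.d. N(0,1) entries, sets eta_i = g . v_i, and moves every
    floating coordinate by gamma * eta_i; a coordinate freezes once it reaches -1 or +1.
    Returns the final coloring and the number of steps taken.
    """
    rng = random.Random(seed)
    n, dim = len(vectors), len(vectors[0])
    x = [0.0] * n
    for step in range(max_steps):
        floating = [i for i in range(n) if abs(x[i]) < 1.0]
        if not floating:
            return x, step
        g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for i in floating:
            eta = sum(gk * vk for gk, vk in zip(g, vectors[i]))
            x[i] = max(-1.0, min(1.0, x[i] + gamma * eta))  # clamp: stick at the boundary
    return x, max_steps


if __name__ == "__main__":
    n = 20
    identity = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    x, steps = sticky_walk(identity)                # trivial SDP solution: independent walks
    print(steps, x[:5])
    y, _ = sticky_walk([[1.0, 0.0], [-1.0, 0.0]])   # v_2 = -v_1 forces eta_1 + eta_2 = 0
    print(y)                                        # opposite colors: the set {1, 2} is balanced
```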

Analysis Consider time T = O(1/2) Claim 1: With prob. ½, an element reaches -1 or +1. Pf: Each element doing random walk (martingale) with size ¼. Recall: Random walk with step 1, is ¼t1/2away in t steps. Claim 2: Each set has O() discrepancy in expectation. Pf: For each S, xt(S) doing random walk with step size ¼. At time T = O((log n)/) Prob. that an element still floating < 1/(10 n). Expected discrepancy of set = (By Chernoff, all have discrepancy O( )

Recap
At each step of the walk, formulate an SDP on the floating variables.
SDP solution -> guides the walk.
Properties of the walk:
High variance -> quick convergence.
Low variance for the discrepancy of sets -> low discrepancy.

Refinements
Spencer’s six standard deviations result: recall, we want $O(n^{1/2})$ discrepancy, but random coloring gives $n^{1/2}(\log n)^{1/2}$.
The previous approach seems useless: the expected discrepancy of a set is $O(n^{1/2})$, but some random walks will deviate by up to a $(\log n)^{1/2}$ factor.
Tune down the variance of dangerous sets (there are not too many).
Entropy method -> the SDP is still feasible.
[Figure: discrepancy scale starting at $0$, with danger levels Danger 1, Danger 2, Danger 3, … at $20n^{1/2}$, $30n^{1/2}$, $35n^{1/2}$, …]

Further Developments
Can be derandomized [Bansal-Spencer’11].
Our algorithm still uses the entropy method, so it gives no new proof of Spencer’s result.
Is there a purely constructive proof?
Lovett-Meka’12: Yes. Gaussian random walks + linear algebra.

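Matousek Lower Bound
Thm (Lovasz-Spencer-Vesztergombi’86): $\mathrm{herdisc}(A) \ge \frac{1}{2}\,\mathrm{detlb}(A)$, where $\mathrm{detlb}(A) = \max_k \max_B |\det(B)|^{1/k}$ over all $k \times k$ submatrices $B$ of $A$.
Conjecture (LSV’86): $\mathrm{herdisc} = O(1) \cdot \mathrm{detlb}$.
Remark: For TU matrices, $\mathrm{herdisc}(A) = 1$ and $\mathrm{detlb} = 1$ (every square submatrix has det $-1$, $0$ or $+1$).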
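A brute-force sketch of detlb for tiny matrices, using exact Gaussian elimination over fractions. The two example matrices are hypothetical: the "triangle" system $\{1,2\}, \{2,3\}, \{1,3\}$ has a $3 \times 3$ determinant of 2 (so detlb $= 2^{1/3}$, consistent with its herdisc of 2 via the LSV bound), and a consecutive-ones (interval) matrix, which is TU, has detlb $= 1$.

```python
from fractions import Fraction
from itertools import combinations


def det(M):
    """Exact determinant of a square integer matrix by Gaussian elimination over Fractions."""
    M = [[Fraction(v) for v in row] for row in M]
    n = len(M)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if M[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            M[c], M[pivot] = M[pivot], M[c]
            result = -result  # a row swap flips the sign
        result *= M[c][c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for cc in range(c, n):
                M[r][cc] -= factor * M[c][cc]
    return result


def detlb(A):
    """max over k and over all k x k submatrices B of |det B|^(1/k) (exponential time)."""
    m, n = len(A), len(A[0])
    best = 0.0
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                d = abs(det([[A[r][c] for c in cols] for r in rows]))
                best = max(best, float(d) ** (1.0 / k))
    return best


if __name__ == "__main__":
    triangle = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]   # det = 2, herdisc = 2
    intervals = [[1, 1, 0], [0, 1, 1], [1, 1, 1]]  # consecutive ones, hence TU
    print(detlb(triangle), detlb(intervals))       # 2**(1/3) and 1.0
```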

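Detlb
Hoffman: $\mathrm{detlb}(A) \le 2$, yet $\mathrm{herdisc}(A)$ can be of order almost $\log n$.
Palvolgyi’11: gap.
Matousek’11: $\mathrm{herdisc}(A) \le \mathrm{polylog}(m, n) \cdot \mathrm{detlb}(A)$ (about $(\log n)^{3/2}$ when $m$ is polynomial in $n$).
Idea: our algorithm -> the SDP relaxation is not too weak.
SDP duality -> a dual witness for large $\mathrm{herdisc}(A)$.
Dual witness -> a submatrix with large determinant.
Other implications: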

In Conclusion
Various basic questions in discrepancy remain open.
Algorithmic questions:
• Conjecture (Matousek’11): $\mathrm{disc}(A) = O(\mathrm{hervecdisc}(A))$ (would imply the tight bound $\mathrm{herdisc}(A) = O(\log m) \cdot \mathrm{detlb}(A)$).
• Constructive Banaszczyk bound ($O((t \log n)^{1/2})$, each element in at most $t$ sets) for Beck-Fiala.
• Approximation for hereditary discrepancy?
Various other non-constructive methods: counting, topological, fixed points, …
What can be made constructive is not so well understood.

Thank you for your attention
