Data-Powered Algorithms

Bernard Chazelle, Princeton University

Tools

Linear Programming

N constraints and d variables

Dimension Reduction 25 10000 Images (face recognition) Signals (voice recognition) Text (NLP) . . . Nearest neighbor searching Clustering . . .

Dimension reduction: all pairwise distances are nearly preserved.

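Johnson-Lindenstrauss Transform (JLT): multiply each vector $v \in \mathbb{R}^d$ by a random orthogonal $k \times d$ matrix (a projection onto a random subspace), where $k = c \log n / \varepsilon^2$, $n$ is the number of points, and $\varepsilon$ is the allowed distortion.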
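A minimal NumPy sketch of the JLT, under stated assumptions: the random orthogonal matrix is obtained by QR-orthonormalizing a Gaussian matrix, the output is rescaled by $\sqrt{d/k}$ so squared lengths are preserved in expectation, and the constants c = 4 and ε = 0.5 are illustrative choices, not values from the slides. The helpers jlt_orthogonal and max_distortion are hypothetical names; max_distortion reports the worst relative change of any pairwise distance.

```python
import numpy as np

def jlt_orthogonal(X, k, rng):
    """Project the rows of X (n x d) onto a random k-dimensional subspace.

    Assumption: QR of a d x k Gaussian matrix gives k orthonormal columns
    spanning a uniformly random subspace; sqrt(d / k) rescales lengths.
    """
    n, d = X.shape
    Q, _ = np.linalg.qr(rng.standard_normal((d, k)))  # d x k, orthonormal columns
    return np.sqrt(d / k) * (X @ Q)

def max_distortion(X, Y):
    """Largest |dist_Y(i, j) / dist_X(i, j) - 1| over all pairs i < j."""
    worst = 0.0
    n = X.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            ratio = np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
            worst = max(worst, abs(ratio - 1.0))
    return worst

rng = np.random.default_rng(0)
n, d, eps, c = 50, 2000, 0.5, 4                  # illustrative parameters
k = int(np.ceil(c * np.log(n) / eps**2))         # target dimension c log n / eps^2
X = rng.standard_normal((n, d))
Y = jlt_orthogonal(X, k, rng)
print(k, max_distortion(X, Y))
```

With these values k = 63, far below d = 2000, while every pairwise distance stays within the chosen factor 1 ± ε.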

Friendly JLT: a $k \times d$ matrix, $k = c \log n / \varepsilon^2$, with i.i.d. $N(0,1)$ entries.

Friendlier JLT: a $k \times d$ matrix, $k = c \log n / \varepsilon^2$, with random $\pm 1$ entries. Cost per vector: $d \cdot c \log n / \varepsilon^2 = \Theta(d \log n / \varepsilon^2)$.
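The appeal of $\pm 1$ entries is that the projection needs only additions and subtractions. A sketch assuming NumPy, with illustrative sizes d = 1000 and k = 200 (not from the slides); the hypothetical sign_jlt computes each output coordinate as the sum of the entries hit by +1 minus the sum of those hit by -1, rescaled by $1/\sqrt{k}$ so that squared lengths are preserved in expectation.

```python
import numpy as np

def sign_jlt(x, signs):
    """Apply a k x d matrix of +1/-1 entries to x with additions and
    subtractions only, then rescale by 1/sqrt(k)."""
    k, d = signs.shape
    out = np.empty(k)
    for r in range(k):
        plus = x[signs[r] > 0].sum()    # coordinates multiplied by +1
        minus = x[signs[r] < 0].sum()   # coordinates multiplied by -1
        out[r] = plus - minus
    return out / np.sqrt(k)

rng = np.random.default_rng(1)
d, k = 1000, 200                                 # illustrative sizes
signs = rng.choice([-1, 1], size=(k, d))         # random sign matrix
x = rng.standard_normal(d)
y = sign_jlt(x, signs)
print(np.linalg.norm(y) / np.linalg.norm(x))     # close to 1
```

Each of the k rows still touches all d coordinates, so applying the matrix costs k·d operations; the gain is in simplicity, not in asymptotic time.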

Sparse JLT? A $k \times d$ matrix ($k = c \log n / \varepsilon^2$) of $\pm 1$ entries in which only an $o(1)$-fraction are non-zero.
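Why the question mark? A sketch (NumPy; the sizes and the non-zero probability q are illustrative assumptions) of one standard difficulty: a very sparse projection matrix roughly preserves the length of a spread-out vector, but it sends a coordinate vector $e_j$ to zero whenever column j of the matrix is entirely zero, and with few non-zeros most columns are. The helper sparse_sign_matrix is hypothetical.

```python
import numpy as np

def sparse_sign_matrix(k, d, q, rng):
    """k x d matrix: each entry is 0 with probability 1 - q, otherwise
    +/- 1/sqrt(q k), so that E[||M x||^2] = ||x||^2 for every x."""
    mask = rng.random((k, d)) < q
    signs = rng.choice([-1.0, 1.0], size=(k, d))
    return mask * signs / np.sqrt(q * k)

rng = np.random.default_rng(2)
d, k, q = 10_000, 100, 0.001          # about 10 non-zeros per row
M = sparse_sign_matrix(k, d, q, rng)

dense = rng.standard_normal(d)
print(np.linalg.norm(M @ dense) / np.linalg.norm(dense))   # near 1

# Fraction of columns j with M e_j = 0: those spikes are mapped to the zero vector.
zero_cols = np.mean(~M.any(axis=0))
print(zero_cols)
```

Here each column is all-zero with probability $(1 - q)^k \approx 0.9$, so most 1-sparse inputs lose all their length.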

Main tool: the uncertainty principle (Heisenberg). A signal cannot be sharply concentrated in both time and frequency.
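A small illustration in the discrete setting, using one standard form of the principle (the Donoho–Stark inequality: for non-zero $x \in \mathbb{C}^d$, $|\mathrm{supp}\,x| \cdot |\mathrm{supp}\,\hat{x}| \ge d$); whether the slides rely on exactly this form is an assumption. A vector with a single non-zero has a perfectly flat spectrum, and a comb of 8 evenly spaced spikes shows the bound can be tight. The helper support_size and its tolerance are hypothetical choices.

```python
import numpy as np

def support_size(v, tol=1e-9):
    """Number of entries whose magnitude exceeds tol."""
    return int(np.sum(np.abs(v) > tol))

d = 64

spike = np.zeros(d)
spike[5] = 1.0                               # concentrated in "time"
spike_hat = np.fft.fft(spike) / np.sqrt(d)   # unitary DFT

comb = np.zeros(d)
comb[::8] = 1.0                              # 8 evenly spaced spikes
comb_hat = np.fft.fft(comb) / np.sqrt(d)

for name, v, v_hat in [("spike", spike, spike_hat), ("comb", comb, comb_hat)]:
    print(name, support_size(v), support_size(v_hat),
          support_size(v) * support_size(v_hat) >= d)
# spike 1 64 True
# comb 8 8 True
```

The spike's spectrum has all 64 entries of magnitude $1/\sqrt{64}$: a sparse vector becomes maximally spread out after the transform.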

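Fast Johnson-Lindenstrauss Transform (FJLT): $x \mapsto PHDx$, where $D$ is a $d \times d$ diagonal matrix of random $\pm 1$ signs, $H$ is the $d \times d$ discrete Fourier transform, and $P$ is a sparse $k \times d$ matrix ($k = c \log n / \varepsilon^2$) whose non-zero entries are $N(0,1)$. Cost per vector: $O(\varepsilon^{-2} \log^3 n + d \log d + d)$. Optimal?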
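A sketch of the FJLT pipeline, assuming NumPy, $d$ a power of two, and the Walsh-Hadamard transform standing in for the discrete Fourier transform (a common real-valued substitute, not what the slide names). The density $q = \min(\log^2 n / d, 1)$ (constant set to 1 as an assumption) gives $P$ about $k \cdot d \cdot q \approx \varepsilon^{-2} \log^3 n$ non-zeros, matching the first cost term; non-zeros are drawn from $N(0, 1/q)$ and the output is scaled by $1/\sqrt{k}$ so squared lengths are preserved in expectation. $P$ is stored densely here for brevity, so this sketch does not achieve the sparse running time. fwht and make_fjlt are hypothetical helpers.

```python
import numpy as np

def fwht(a):
    """Orthonormal fast Walsh-Hadamard transform, O(d log d); len(a) must be a power of 2."""
    a = np.array(a, dtype=float)      # float copy; the input is not modified
    d = len(a)
    h = 1
    while h < d:
        for i in range(0, d, 2 * h):
            left = a[i:i + h].copy()
            right = a[i + h:i + 2 * h].copy()
            a[i:i + h] = left + right
            a[i + h:i + 2 * h] = left - right
        h *= 2
    return a / np.sqrt(d)

def make_fjlt(d, k, q, rng):
    """Return x -> P H D x / sqrt(k) (hypothetical helper)."""
    D = rng.choice([-1.0, 1.0], size=d)                  # random signs on the diagonal
    mask = rng.random((k, d)) < q                        # keep each entry of P with prob. q
    P = np.where(mask, rng.standard_normal((k, d)) / np.sqrt(q), 0.0)  # N(0, 1/q) if kept

    def apply(x):
        return P @ fwht(D * x) / np.sqrt(k)
    return apply

rng = np.random.default_rng(3)
n, d, k = 100, 1024, 128                                 # illustrative sizes
q = min(np.log(n) ** 2 / d, 1.0)
fjlt = make_fjlt(d, k, q, rng)

spike = np.zeros(d)
spike[0] = 1.0                       # sparse input: H D spreads it out before P acts
dense = rng.standard_normal(d)
for v in (spike, dense):
    print(np.linalg.norm(fjlt(v)) / np.linalg.norm(v))  # both close to 1
```

Unlike the naive sparse matrix, the spike is handled well: after $HD$ all its coordinates have magnitude $1/\sqrt{d}$, so a sparse $P$ sees it as a dense vector.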

Data-Powered Algorithms

Theory, experimentation, and, from the 1950s on, computation.

Input → output: most interesting problems are too hard!

So we change the model: randomization, approximation.

Example: a PTAS for the Euclidean traveling salesman problem (ETSP).

But it is impossible to approximate the chromatic number within a factor of…

The Berkeley “school” (program checking and probabilistic proofs) combines randomization and approximation: property testing [RS’96, GGR’96].

Property Testing

Distance to the property is measured by edit distance, the number of changes needed to acquire it: 3 in the first example, 4 in the second.
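To make the distance concrete, a sketch assuming the property is bipartiteness and the allowed edits are edge deletions (an assumption; the slide's own examples are not recoverable). The hypothetical distance_to_bipartite tries all 2-colorings, so it only suits tiny graphs.

```python
from itertools import product

def distance_to_bipartite(n, edges):
    """Minimum number of edge deletions that make an n-vertex graph bipartite.

    Brute force over all 2^n two-colorings (tiny graphs only): for a fixed
    coloring, exactly the monochromatic edges must be deleted.
    """
    best = len(edges)
    for colors in product((0, 1), repeat=n):
        monochromatic = sum(1 for u, v in edges if colors[u] == colors[v])
        best = min(best, monochromatic)
    return best

triangle = [(0, 1), (1, 2), (2, 0)]
k4 = [(u, v) for u in range(4) for v in range(u + 1, 4)]
print(distance_to_bipartite(3, triangle))  # 1
print(distance_to_bipartite(4, k4))        # 2
```

In property testing a graph is called ε-far when this distance exceeds ε times a normalizing quantity (for example, $n^2$ in the dense-graph model).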

Exact question: bipartite? yes / no.

Relaxed question: bipartite → yes; far from bipartite → no; anything in between → either answer [GR’97].

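Mixing case: take random walks of polylogarithmic length from a start vertex. If two walks reach the same vertex with lengths of different parity, together they close an odd cycle: non-bipartite! Otherwise: bipartite! By the birthday paradox, relatively few walks suffice to produce such collisions.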
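A simplified sketch of the parity-collision idea, assuming the graph is an adjacency list and the number and length of walks are supplied by the caller (in the actual tester they come from the analysis; the values below are illustrative). It covers only the detection step in the mixing case, not the non-mixing case. random_walk_bipartite_test is a hypothetical name.

```python
import random

def random_walk_bipartite_test(adj, start, num_walks, walk_len, seed=0):
    """Return False if random walks from `start` expose an odd cycle, else True.

    In a bipartite graph every walk from `start` to a vertex v has the same
    length parity, so reaching v at both parities proves an odd cycle exists.
    """
    rng = random.Random(seed)
    seen = {}  # vertex -> set of step parities at which some walk reached it
    for _ in range(num_walks):
        v = start
        for step in range(walk_len + 1):
            seen.setdefault(v, set()).add(step % 2)
            if len(seen[v]) == 2:
                return False  # odd closed walk found: reject (non-bipartite)
            if step < walk_len:
                v = rng.choice(adj[v])
    return True  # no parity collision: accept

even_cycle = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
odd_cycle = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
print(random_walk_bipartite_test(even_cycle, 0, num_walks=50, walk_len=20))  # True
print(random_walk_bipartite_test(odd_cycle, 0, num_walks=50, walk_len=20))
```

The error is one-sided: a bipartite graph is never rejected, because no vertex can be reached at both parities.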

Non-mixing case: non-mixing implies small cuts [M’89].

Dense graphs: is the graph k-colorable? [GGR98, AK99] (Illustration: Hofstadter, Gödel, Escher, Bach.)

Main tool: Szemerédi’s Regularity Lemma. A graph far from k-colorable has lots of witnesses, that is, small subgraphs that are not k-colorable.
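The natural tester suggested by "lots of witnesses" samples a few random vertices and checks whether the induced subgraph is k-colorable. A sketch assuming an adjacency-set representation and a caller-chosen sample size; how large the sample must be as a function of k and ε is exactly what the [GGR98, AK99] analyses establish, and this sketch does not reproduce those bounds. is_k_colorable and sample_k_colorability_test are hypothetical helpers, and the brute-force coloring check is only practical for small samples.

```python
import random
from itertools import product

def is_k_colorable(vertices, adj, k):
    """Brute-force check: is the subgraph induced on `vertices` k-colorable?"""
    vs = list(vertices)
    for colors in product(range(k), repeat=len(vs)):
        col = dict(zip(vs, colors))
        if all(col[u] != col[v] for u in vs for v in adj[u] if v in col):
            return True
    return False

def sample_k_colorability_test(n, adj, k, sample_size, seed=0):
    """Accept iff the subgraph induced by a random vertex sample is k-colorable.

    One-sided: a k-colorable graph is always accepted, since every induced
    subgraph of it is k-colorable.
    """
    rng = random.Random(seed)
    sample = rng.sample(range(n), sample_size)
    return is_k_colorable(sample, adj, k)

n = 20
bipartite = {u: set(range(10, 20)) if u < 10 else set(range(10)) for u in range(n)}
complete = {u: set(range(n)) - {u} for u in range(n)}
print(sample_k_colorability_test(n, bipartite, k=2, sample_size=8))  # True
print(sample_k_colorability_test(n, complete, k=2, sample_size=6))   # False
```

In the complete graph any 3 sampled vertices already form a triangle, a witness that the graph is not 2-colorable.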

Property Testing (http://www.cs.princeton.edu/~chazelle/)
• Graph algorithms: connectivity, acyclicity, k-way cuts, clique
• Distributions: independence, entropy, monotonicity, distances
• Geometry: convexity, disjointness, Delaunay, plane EMST
