Data-Powered Algorithms. Bernard Chazelle, Princeton University. Tools: linear programming (N constraints and d variables); dimension reduction (e.g. from 10000 to 25 dimensions) for images (face recognition), signals (voice recognition), text (NLP), ...
Data-Powered Algorithms Bernard Chazelle Princeton University
Dimension Reduction: 10000 dimensions → 25 dimensions. Data: images (face recognition), signals (voice recognition), text (NLP), ... Uses: nearest neighbor searching, clustering, ...
Dimension reduction: all pairwise distances nearly preserved
Johnson-Lindenstrauss Transform (JLT): a vector v of dimension d is multiplied by a random orthogonal matrix with c log n / ε² rows and d columns
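For reference, "nearly preserved" can be stated precisely with the standard Johnson-Lindenstrauss lemma. The symbols $\varepsilon$ (distortion), $k$ (target dimension), $P$ (the point set) and $f$ (the map) are notation assumed here, not taken from the slides.

```latex
% Johnson-Lindenstrauss lemma (standard statement; notation assumed).
% For any set P of n points in R^d and any 0 < \varepsilon < 1 there is a
% linear map f : R^d -> R^k with k = O(\varepsilon^{-2} \log n) such that
\[
  (1-\varepsilon)\,\lVert u-v\rVert^{2}
  \;\le\; \lVert f(u)-f(v)\rVert^{2}
  \;\le\; (1+\varepsilon)\,\lVert u-v\rVert^{2}
  \qquad \text{for all } u, v \in P .
\]
```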
Friendly JLT: a matrix with c log n / ε² rows and d columns whose entries are all N(0,1) random variables
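A minimal sketch of the Gaussian construction, assuming a scaling by 1/sqrt(k) so that squared lengths are correct in expectation. The function names, the seed handling and the pure-Python matrix representation are illustrative choices, not from the slides.

```python
import math
import random


def gaussian_jlt(points, k, seed=0):
    """Project each d-dimensional point to k dimensions with an N(0,1) matrix."""
    rng = random.Random(seed)
    d = len(points[0])
    # k x d matrix of independent standard normal entries.
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)  # assumed normalization
    return [
        [scale * sum(m * x for m, x in zip(row, p)) for row in matrix]
        for p in points
    ]


def distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def worst_distortion(points, projected):
    """Largest relative change of any pairwise distance (points must be distinct)."""
    worst = 0.0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            ratio = distance(projected[i], projected[j]) / distance(points[i], points[j])
            worst = max(worst, abs(ratio - 1.0))
    return worst
```

Calling `worst_distortion(pts, gaussian_jlt(pts, k))` for a handful of random points gives a quick empirical check of how the distortion shrinks as `k` grows.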
Friendlier JLT: a matrix with c log n / ε² rows and d columns whose entries are random +1 or -1. Cost per vector: O(d log n / ε²)
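With ±1 entries, the projection needs only additions and subtractions of the input coordinates. The sketch below illustrates this under the assumption of independent fair signs and a 1/sqrt(k) scaling; `sign_jlt` and its parameters are hypothetical names.

```python
import math
import random


def sign_jlt(point, k, seed=0):
    """Project one point to k dimensions using a random +1/-1 matrix.

    The same seed always regenerates the same matrix, so calling this on
    several points with one seed applies one fixed linear map to all of them.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(k):
        total = 0.0
        for x in point:
            # A +1 entry adds the coordinate, a -1 entry subtracts it.
            if rng.random() < 0.5:
                total += x
            else:
                total -= x
        out.append(total / math.sqrt(k))  # assumed normalization
    return out
```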
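Sparse JLT? A matrix with c log n / ε² rows and d columns, entries +1, -1 or 0, with only an o(1)-fraction of non-zeros. Shown applied to the d-dimensional vector (1, 0, 0, ..., 0)
Main Tool: Uncertainty Principle (Heisenberg): time vs. frequency
Fast Johnson-Lindenstrauss Transform (FJLT): a product of three matrices: a sparse matrix with c log n / ε² rows and d columns whose entries are 0 or N(0,1), the d × d Discrete Fourier Transform, and a d × d diagonal matrix of random +1 or -1. Running time: O(log³ n / ε² + d log d + d). Optimal??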
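The three-stage structure can be sketched as follows. This is an illustration under stated assumptions, not the slide's exact construction: it uses the real-valued Walsh-Hadamard transform in place of the Discrete Fourier Transform, requires d to be a power of two, and takes the sparsity `q` as a free parameter. All function and parameter names are hypothetical.

```python
import math
import random


def fwht(vec):
    """Normalized fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    a = list(vec)
    n = len(a)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [s * x for x in a]


def make_fjlt(d, k, q, seed=0):
    """Return a function x -> P(H(Dx)) mapping d dimensions to k dimensions."""
    rng = random.Random(seed)
    # D: random +1/-1 diagonal.
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    # P: each entry is non-zero with probability q, then drawn from N(0, 1/q).
    sparse_rows = [
        [(j, rng.gauss(0.0, 1.0 / math.sqrt(q))) for j in range(d) if rng.random() < q]
        for _ in range(k)
    ]

    def apply(x):
        # H D x: randomizing signs, then the transform, spreads out sparse inputs.
        y = fwht([s * xi for s, xi in zip(signs, x)])
        # P y, scaled so squared lengths are correct in expectation.
        return [sum(w * y[j] for j, w in row) / math.sqrt(k) for row in sparse_rows]

    return apply
```

The point of the first two stages is that even a maximally sparse input such as (1, 0, ..., 0) becomes evenly spread before the sparse matrix is applied, which is where the uncertainty principle enters.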
Data-Powered Algorithms
theory, experimentation, computation (1950...)
input → output: Most interesting problems are too hard!!
input → output: So, we change the model… randomization, approximation
input → output (randomization, approximation): PTAS for ETSP
input → output (randomization, approximation): Impossible to approximate chromatic number within a factor of…
input → output (randomization, approximation): Berkeley “school” (program checking & probabilistic proofs); Property Testing [RS’96, GGR’96]
Edit distance: distance is 4
Testing bipartiteness [GR’97]: answer yes if the graph is bipartite, no if it is far from bipartite, and anything in between
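To make "far from bipartite" concrete, one common convention measures distance as the fewest edge deletions that leave a bipartite graph. The brute-force sketch below assumes that convention and is only practical for very small graphs; the function name and the vertex numbering 0..n-1 are illustrative.

```python
from itertools import product


def distance_to_bipartite(num_vertices, edges):
    """Fewest edge deletions that make the graph bipartite (brute force)."""
    best = len(edges)
    for sides in product((0, 1), repeat=num_vertices):
        # Edges with both endpoints on the same side must be deleted.
        bad = sum(1 for u, v in edges if sides[u] == sides[v])
        best = min(best, bad)
    return best
```

For example, a triangle has distance 1 and a 4-cycle has distance 0.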
Mixing case: birthday paradox, polylog cycles; verdict: bipartite! or non-bipartite!
Non-mixing case: non-mixing implies small cuts [M’89]
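One way to picture the random-walk idea is the sketch below: run many lazy random walks from a start vertex and report an odd cycle if some vertex is reached by both an even-length and an odd-length walk. This is a simplified illustration, not the algorithm of [GR’97]; the walk counts, the laziness of 1/2 and all names are assumptions, and every vertex is assumed to have at least one neighbor.

```python
import random


def finds_odd_cycle(adj, start, num_walks, walk_len, rng):
    """Return True if two walks from `start` reach a vertex with different parities.

    Such a pair certifies an odd closed walk, hence an odd cycle, so a
    bipartite graph is never reported as non-bipartite (one-sided error).
    """
    parities = {}  # endpoint -> set of parities of walks that ended there
    for _ in range(num_walks):
        v, parity = start, 0
        for _ in range(walk_len):
            if rng.random() < 0.5:
                continue  # lazy step: stay put, parity unchanged
            v = rng.choice(adj[v])
            parity ^= 1
        seen = parities.setdefault(v, set())
        seen.add(parity)
        if len(seen) == 2:
            return True
    return False
```

The birthday paradox is what makes collisions at a common endpoint likely after roughly the square root of n walks when the walks mix well.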
Dense graphs: Is the graph k-colorable? [GGR98, AK99] (Image: Hofstadter, Gödel, Escher, Bach.)
Main tool: Szemerédi’s Regularity Lemma. Far from k-colorable ⇒ lots of witnesses
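"Lots of witnesses" suggests a sampling test: pick a small random set of vertices and check whether the subgraph they induce is k-colorable. The sketch below illustrates that idea only; the sample size required for a formal guarantee is not given on the slide and is left as a parameter, and the function names are hypothetical.

```python
import random


def is_k_colorable(vertices, adj, k):
    """Backtracking check: can the subgraph induced by `vertices` be k-colored?"""
    vertices = list(vertices)
    chosen = set(vertices)
    color = {}

    def extend(i):
        if i == len(vertices):
            return True
        v = vertices[i]
        for c in range(k):
            # Color c is allowed if no already-colored sampled neighbor uses it.
            if all(color.get(u) != c for u in adj[v] if u in chosen):
                color[v] = c
                if extend(i + 1):
                    return True
                del color[v]
        return False

    return extend(0)


def sample_test_k_colorable(adj, k, sample_size, rng):
    """Accept iff a random induced subgraph is k-colorable.

    A k-colorable graph is always accepted, since every induced subgraph
    of a k-colorable graph is k-colorable.
    """
    sample = rng.sample(sorted(adj), min(sample_size, len(adj)))
    return is_k_colorable(sample, adj, k)
```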
Property Testing
http://www.cs.princeton.edu/~chazelle/
• Graph algorithms: connectivity, acyclicity, k-way cuts, clique
• Distributions: independence, entropy, monotonicity, distances
• Geometry: convexity, disjointness, Delaunay, plane EMST