
Data-Powered Algorithms

Bernard Chazelle, Princeton University. Topics: tools; linear programming with N constraints and d variables; dimension reduction for images (face recognition), signals (voice recognition), text (NLP), ...


Presentation Transcript


  1. Data-Powered Algorithms. Bernard Chazelle, Princeton University

  2. Tools

  3. Linear Programming

  4. N constraints and d variables

  5. N constraints and d variables

  6. Dimension Reduction [figure labels: 25, 10000]. Data: images (face recognition), signals (voice recognition), text (NLP), ... Uses: nearest neighbor searching, clustering, ...

  7. Dimension reduction: all pairwise distances nearly preserved
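
The guarantee on slide 7 is usually stated as below. This is the standard form of the Johnson-Lindenstrauss lemma rather than a formula taken from the slides; the names $P$ (the set of $n$ input points), $f$ (the map), $\varepsilon$ (the distortion) and $k$ (the target dimension) are notation assumed here.

```latex
(1-\varepsilon)\,\|u-v\|_2^2 \;\le\; \|f(u)-f(v)\|_2^2 \;\le\; (1+\varepsilon)\,\|u-v\|_2^2
\qquad \text{for all } u, v \in P,
\quad\text{where } f:\mathbb{R}^d \to \mathbb{R}^k,\;\; k = O(\varepsilon^{-2}\log n).
```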

  8. Johnson-Lindenstrauss Transform (JLT): a vector $v \in \mathbb{R}^d$ is multiplied by a random orthogonal matrix with $c\,\varepsilon^{-2}\log n$ rows and $d$ columns

  9. Friendly JLT: a $c\,\varepsilon^{-2}\log n \times d$ matrix whose entries are independent N(0,1) random variables
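
A minimal sketch of the Gaussian projection on slide 9, using only the Python standard library. The function names (`gaussian_jlt`, `dist`, `distance_ratios`), the `seed` parameter and the $1/\sqrt{k}$ scaling convention are assumptions for illustration; the target dimension `k` is left to the caller because the constant $c$ is not given on the slide.

```python
import math
import random


def gaussian_jlt(points, k, seed=0):
    """Project d-dimensional points to k dimensions with an N(0,1) matrix.

    The matrix is scaled by 1/sqrt(k) so that squared lengths are
    preserved in expectation (an assumed, common normalization).
    """
    rng = random.Random(seed)
    d = len(points[0])
    matrix = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [
        [scale * sum(row[j] * p[j] for j in range(d)) for row in matrix]
        for p in points
    ]


def dist(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_ratios(points, projected):
    """Ratio (projected distance / original distance) for every pair."""
    ratios = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            ratios.append(dist(projected[i], projected[j]) / dist(points[i], points[j]))
    return ratios
```

Calling `distance_ratios(points, gaussian_jlt(points, k))` on a handful of random points gives ratios that cluster around 1, and the spread shrinks as `k` grows.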

  10. Friendlier JLT: a $c\,\varepsilon^{-2}\log n \times d$ matrix whose entries are random ±1; cost proportional to $d\,\varepsilon^{-2}\log n$
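
A hedged sketch of the ±1 variant on slide 10. With sign entries the projection needs only additions and subtractions, which is one reason such a matrix is "friendlier". The names `sign_matrix` and `project` and the $1/\sqrt{k}$ scaling are assumptions, not taken from the slides.

```python
import math
import random


def sign_matrix(k, d, seed=0):
    """k-by-d matrix of independent, uniformly random +1/-1 entries."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(k)]


def project(matrix, x):
    """Apply the sign matrix to x, scaled by 1/sqrt(k).

    Each output coordinate is a signed sum of the inputs, so the inner
    loop uses no multiplications.
    """
    k = len(matrix)
    out = []
    for row in matrix:
        acc = 0.0
        for s, xi in zip(row, x):
            acc += xi if s > 0 else -xi
        out.append(acc / math.sqrt(k))
    return out
```

The work is still one pass over all $k \cdot d$ entries per vector, which is the cost the sparse and fast variants on the following slides try to reduce.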

  11. Sparse JLT? A $c\,\varepsilon^{-2}\log n \times d$ matrix of ±1 entries in which only an o(1)-fraction of the entries are non-zero

  12. Main Tool: Uncertainty Principle (Heisenberg): time vs. frequency

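  13. Fast Johnson-Lindenstrauss Transform (FJLT): a product of three matrices: a sparse $c\,\varepsilon^{-2}\log n \times d$ matrix whose entries are 0 or N(0,1), a $d \times d$ Discrete Fourier Transform, and a $d \times d$ diagonal matrix of ±1 entries. Running time: $O(\varepsilon^{-2}\log^3 n + d\log d + d)$. Optimal??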
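
The three-stage structure on slide 13 can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation: it swaps in a normalized Walsh-Hadamard transform for the Discrete Fourier Transform (so `d` must be a power of two), and the sparsity parameter `q`, the N(0, 1/q) variance of the non-zero entries, and the names `fwht` and `make_fjlt` are all choices made here for the example.

```python
import math
import random


def fwht(vec):
    """Normalized fast Walsh-Hadamard transform in O(d log d) time.

    len(vec) must be a power of two.
    """
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)
    return [norm * x for x in out]


def make_fjlt(d, k, q, seed=0):
    """Return a function x -> (1/sqrt(k)) * P * H * D * x.

    D: random +1/-1 diagonal, H: Walsh-Hadamard, P: sparse k-by-d matrix
    whose entries are N(0, 1/q) with probability q and 0 otherwise.
    """
    rng = random.Random(seed)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(d)]
    sigma = 1.0 / math.sqrt(q)
    # Store each sparse row as a list of (column, value) pairs.
    rows = [
        [(j, rng.gauss(0.0, sigma)) for j in range(d) if rng.random() < q]
        for _ in range(k)
    ]
    scale = 1.0 / math.sqrt(k)

    def apply(x):
        y = fwht([s * xi for s, xi in zip(signs, x)])
        return [scale * sum(val * y[j] for j, val in row) for row in rows]

    return apply
```

The sign flip and transform spread the mass of a sparse input across all coordinates, which is what allows the last matrix to be mostly zeros; this is the role the uncertainty principle plays on slide 12.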

  14. Data-Powered Algorithms

  15. theory, experimentation

  16. theory, experimentation, computation

  17. theory, experimentation, computation (1950...)

  18. [input/output] Most interesting problems are too hard!!

  19. [input/output] So, we change the model… randomization, approximation

  20. [input/output] PTAS for ETSP (randomization, approximation)

  21. [input/output] Impossible to approximate chromatic number within a factor of… (randomization, approximation)

  22. [input/output] Berkeley “school” (program checking & probabilistic proofs); Property Testing [RS’96, GGR’96] (randomization, approximation)

  23. Property Testing

  24. Distance is 3

  25. Distance is 4 (edit distance)

  26. Is the graph bipartite? Answer yes or no

  27. Bipartite: yes. Far from bipartite: no. In between: anything. [GR’97]

  28. Mixing case: bipartite! vs. non-bipartite!; polylog cycles; birthday paradox
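
A simplified sketch of the random-walk idea behind slide 28, not the exact tester from the cited papers. It runs several short walks from one start vertex and records the parity of the step count at which each vertex is reached; if some vertex is reached with both parities, the two routes form a closed walk of odd length, so the graph contains an odd cycle. The names `cycle` and `find_odd_cycle_witness` and the parameters `num_walks` and `walk_length` are hypothetical.

```python
import random


def cycle(n):
    """Adjacency lists of the n-vertex cycle (helper for the examples)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def find_odd_cycle_witness(adj, start, num_walks, walk_length, seed=0):
    """Return True if walks from `start` reach some vertex with both parities.

    One-sided: a bipartite graph is never reported, because in a
    bipartite graph the parity of any route to a vertex is fixed.
    Every vertex in `adj` must have at least one neighbor.
    """
    rng = random.Random(seed)
    parity_seen = {start: {0}}
    for _ in range(num_walks):
        v, parity = start, 0
        for _ in range(walk_length):
            v = rng.choice(adj[v])
            parity ^= 1
            seen = parity_seen.setdefault(v, set())
            seen.add(parity)
            if len(seen) == 2:
                return True  # odd closed walk, hence an odd cycle
    return False
```

A `True` answer is always correct, while a `False` answer only means no witness was found within the walk budget; the birthday-paradox argument on the slide concerns how many walks are needed before such collisions become likely.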

  29. Non-mixing case: nonmixing implies small cuts [M’89]

  30. Dense graphs (image: Hofstadter, Gödel, Escher, Bach). Is the graph k-colorable? [GGR98, AK99]

  31. Main tool: Szemerédi’s Regularity Lemma. Far from k-colorable implies lots of witnesses

  32. Property Testing (http://www.cs.princeton.edu/~chazelle/)
      • Graph algorithms: connectivity, acyclicity, k-way cuts, clique
      • Distributions: independence, entropy, monotonicity, distances
      • Geometry: convexity, disjointness, Delaunay, plane EMST
