So Much Data, So Little Time. Bernard Chazelle, Princeton University
So Many Slides (before lunch), So Little Time. Bernard Chazelle, Princeton University
math, algorithms, experimentation, computation (2006)
End of Moore's Law: by 2020, the party's over!
algorithms, experimentation, computation
This is not me. Long multiplication, 32 × 17: partial products 224 and 320, total 544.
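The grade-school computation on the slide, spelled out: the partial products of 32 × 17 are 224 (= 32 × 7) and 320 (= 32 × 10), summing to 544. A small sketch (the function name is illustrative):

```python
def partial_products(a, b):
    """Long multiplication of a * b: return the partial products of b,
    one per decimal digit (shifted by the digit's place value)."""
    parts = []
    shift = 0
    while b > 0:
        digit = b % 10
        parts.append(a * digit * 10 ** shift)
        b //= 10
        shift += 1
    return parts

parts = partial_products(32, 17)
print(parts)       # [224, 320]
print(sum(parts))  # 544
```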
FFT RSA
The Era of the Algorithm
Data: big, noisy, uncertain, unevenly priced, low entropy
Sloan Digital Sky Survey: 4 petabytes (~1MG), 10 petabytes/yr; biomedical imaging: 150 petabytes/yr
My A(9,9)-th paper: Collected Works of Micha Sharir
Sublinear Algorithms: sample a tiny fraction of a massive input to produce the output
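The simplest instance of the idea: estimate a global statistic (here, the mean) from a tiny random sample instead of reading the whole input. A hedged sketch; `approx_mean` and its parameters are chosen for illustration, not taken from the talk:

```python
import random

def approx_mean(data, sample_size=1000, seed=0):
    """Estimate the mean of a huge sequence by sampling a tiny fraction
    of its entries uniformly at random (with replacement)."""
    rng = random.Random(seed)
    sample = [data[rng.randrange(len(data))] for _ in range(sample_size)]
    return sum(sample) / len(sample)

data = list(range(1_000_000))  # true mean is 499999.5
est = approx_mean(data)        # close to the true mean, reading 0.1% of the data
```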
Shortest Paths [C-Liu-Magen '03]: New York to Delphi
Ray shooting (optimal!), volume, intersection, point location
Approximate MST [C-Rubinfeld-Trevisan ’01] Optimal!
E[ĉ] = no. of connected components, Var[ĉ] << (no. of connected components)²
so ĉ is, whp, a good estimator of the number of connected components
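A minimal sketch of an estimator of this flavor, assuming adjacency-list input: sample a few vertices, explore each one's component by a capped BFS, and average the reciprocals of the explored sizes. This is a toy variant for illustration, not the exact [C-Rubinfeld-Trevisan '01] algorithm; `estimate_components`, `cap`, and `num_samples` are made-up names.

```python
import random
from collections import deque

def estimate_components(adj, num_samples=200, cap=50, seed=1):
    """Estimate the number of connected components of a graph given as
    adjacency lists.  For each sampled vertex u, a BFS explores u's
    component but stops after `cap` vertices; the contribution
    1/min(|C(u)|, cap) averages to roughly (#components)/n when
    components are small, so n times the average estimates #components."""
    n = len(adj)
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        u = rng.randrange(n)
        seen = {u}
        queue = deque([u])
        while queue and len(seen) < cap:   # capped BFS: sublinear work
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        total += 1.0 / len(seen)
    return n * total / num_samples
```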
input space: worst case vs. average case (uniform)
“OK, if you elect NOT to have the surgery, the insurance company offers 6 days and 7 nights in Barbados.”
Self-Improving Algorithms: inputs drawn from an arbitrary, unknown random source
0110101100101000101001001010100010101001
inputs processed in times T1, T2, T3, T4, …
E[Tk] → optimal expected time for the random source
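As a toy illustration of the self-improving idea (not the construction from the talk): a training phase learns approximate quantiles of the unknown input distribution, after which each input is bucket-sorted using the learned boundaries, so later runs exploit what earlier runs revealed. Names and parameters below are illustrative.

```python
import bisect

class SelfImprovingSorter:
    """Toy self-improving sorter: train() learns approximate quantiles
    from sample inputs; sort() then buckets items by those quantiles and
    sorts each (expectedly small) bucket.  Falls back to plain sorting
    when untrained."""

    def __init__(self, num_buckets=4):
        self.num_buckets = num_buckets
        self.boundaries = None

    def train(self, samples):
        """Learn bucket boundaries (approximate quantiles) from samples."""
        flat = sorted(x for s in samples for x in s)
        step = max(1, len(flat) // self.num_buckets)
        self.boundaries = flat[step::step][: self.num_buckets - 1]

    def sort(self, arr):
        if not self.boundaries:
            return sorted(arr)  # no training yet: plain sort
        buckets = [[] for _ in range(len(self.boundaries) + 1)]
        for x in arr:
            buckets[bisect.bisect_left(self.boundaries, x)].append(x)
        out = []
        for b in buckets:
            out.extend(sorted(b))  # boundaries are sorted, so concatenation is sorted
        return out
```

Because the boundaries partition the value range in order, concatenating the sorted buckets always yields a correctly sorted output; training only affects speed, never correctness.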
Clustering [Ailon-C-Liu-Comandur '05]: k-median over the Hamming cube
minimize the sum of distances (NP-hard)
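k-median is NP-hard in general, but the k = 1 case over the Hamming cube has a closed form: the cost decomposes coordinate by coordinate, so taking the majority bit in each coordinate is optimal. A small sketch; function names are illustrative:

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))

def one_median(points):
    """1-median over the Hamming cube: the sum of distances splits into
    one term per coordinate, and the majority bit minimizes each term."""
    d = len(points[0])
    return tuple(int(sum(p[i] for p in points) * 2 > len(points))
                 for i in range(d))

pts = [(0, 1, 1), (0, 1, 0), (1, 1, 0)]
center = one_median(pts)                      # (0, 1, 0)
cost = sum(hamming(center, p) for p in pts)   # 2
```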
[Kumar-Sabharwal-Sen '04]: COST ≤ (1 + ε) · OPT
How to achieve linear limiting time? Input space {0,1}^dn.
Identify the core. Tail: use KSS, taken with prob < O(dn)/KSS (so the expected tail cost stays linear).
Store a sample of precomputed KSS solutions; nearest neighbor; incremental algorithm
Bring in da noise!
original:  011010110110101010110010101010110100111001101010010100010
corrupted: 011010110***110101010110010101010***10011100**10010***010
encode / decode: error-correcting codes
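The simplest error-correcting code that recovers data of this kind is the repetition code: repeat each bit r times and majority-decode each block, correcting up to (r-1)//2 flips per block. A toy sketch, not the codes used in practice:

```python
def encode(bits, r=3):
    """Repetition code: repeat every bit r times."""
    return [b for b in bits for _ in range(r)]

def decode(codeword, r=3):
    """Majority-decode each block of r symbols; corrects up to
    (r - 1) // 2 flipped bits per block."""
    out = []
    for i in range(0, len(codeword), r):
        block = codeword[i:i + r]
        out.append(int(sum(block) * 2 > len(block)))
    return out

msg = [0, 1, 1, 0, 1]
cw = encode(msg)
cw[1] ^= 1                 # one bit of noise
assert decode(cw) == msg   # recovered despite the flip
```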
The data was inaccessible before the noise, so what makes you think it's wrong?
It must satisfy some property (e.g., convex, bipartite) but does not quite.
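A property that "does not quite" hold can be spot-checked with a property tester. Below is a toy monotonicity tester assuming query access to f on {0, …, n-1}: sample random pairs i < j and check f(i) ≤ f(j). It always accepts monotone functions and rejects functions far from monotone with good probability; this is a simple pair sampler for illustration, not the testers behind the results in the talk.

```python
import random

def looks_monotone(f, n, trials=100, seed=0):
    """Spot-check whether f: {0,...,n-1} -> R looks monotone by sampling
    random index pairs i <= j and checking f(i) <= f(j)."""
    rng = random.Random(seed)
    for _ in range(trials):
        i = rng.randrange(n)
        j = rng.randrange(n)
        if i > j:
            i, j = j, i
        if f(i) > f(j):
            return False   # found a violating pair: not monotone
    return True            # no violation seen in `trials` samples
```

Note the one-sided guarantee: a `True` answer only means no violation was sampled, which is exactly the regime where the talk's filters step in.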
f(x) = ?  (f = the access function: query x, receive f(x) from the data)
But life being what it is…
Humans: define a distance from any object to the data class
No undo! f(x) = ? goes to the filter: the query x triggers lookups x1, x2, …; the answers f(x1), f(x2), … come back, and g(x) is returned.
g is an access function for a dataset that satisfies the property.
Online Data Reconstruction: early decisions are crucial!
Monotone functions [n]^d → R: the filter requires polylog(n) lookups [Ailon-C-Liu-Comandur '04]
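The online polylog-lookup filter is beyond a slide-sized sketch, but its offline analogue is easy to show: repair a nearly monotone sequence by replacing each entry with the running maximum, which yields a monotone output that agrees with the input wherever the input already respects its prefix maxima. A much simpler stand-in, for intuition only:

```python
def repair_monotone(values):
    """Offline 'repair' of a nearly monotone sequence: replace each entry
    by the running maximum seen so far.  The output is monotone, and
    entries that were already consistent are left unchanged."""
    out = []
    best = float("-inf")
    for v in values:
        best = max(best, v)   # running maximum so far
        out.append(best)
    return out

noisy = [1, 2, 0, 4, 3, 6]
fixed = repair_monotone(noisy)   # [1, 2, 2, 4, 4, 6]
```

The online setting is what makes the problem hard: the filter must answer each query immediately and consistently, with no undo, which is why early decisions are crucial.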