
Big Data

Presentation Transcript


  1. Big Data Lecture 5: Estimating the second moment, dimension reduction, applications

  2. The second moment A stream: A,B,A,C,D,D,A,A,E,B,E,E,F,… The second moment: F2 = Σi fi², where fi is the number of occurrences of item i in the stream
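
For concreteness, a few lines of Python computing F2 exactly from a stream (the helper name second_moment is mine, not from the slides):

```python
from collections import Counter

def second_moment(stream):
    """Exact second moment: F2 = sum over distinct items of frequency squared."""
    counts = Counter(stream)
    return sum(c * c for c in counts.values())

# The slide's stream has frequencies A:4, B:2, C:1, D:2, E:3, F:1,
# so F2 = 16 + 4 + 1 + 4 + 9 + 1 = 35.
print(second_moment("ABACDDAAEBEEF"))  # 35
```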

  3. Alon, Matias, Szegedy 96 Gödel Prize 2005 Maintain: Z = Σi fi·h(i), where h : [d] → {−1, +1} is a random sign function

  4. Alon, Matias, Szegedy 96 Gödel Prize 2005 Maintain: Z = Σi fi·h(i) — when item i arrives, update Z ← Z + h(i); at the end of the stream output Z² as the estimate of F2
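
A minimal Python sketch of the estimator on slides 3–4. For readability it draws a fresh independent sign per distinct item and stores it in a dict; this is only a stand-in for the lecture's hash family — a real streaming implementation uses a 4-wise independent hash precisely to avoid storing per-item state:

```python
import random

class AMSSketch:
    """One basic AMS estimator: maintain Z = sum_i f_i * h(i), output Z^2."""
    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.signs = {}  # stand-in for h : [d] -> {-1, +1}
        self.z = 0

    def _sign(self, item):
        if item not in self.signs:
            self.signs[item] = self.rng.choice((-1, 1))
        return self.signs[item]

    def update(self, item):
        self.z += self._sign(item)  # Z <- Z + h(item) when item arrives

    def estimate(self):
        return self.z ** 2          # E[Z^2] = F2
```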

  5. AMS Analysis E[Z²] = Σi fi² + Σi≠j fi fj E[h(i)h(j)] = F2, since E[h(i)h(j)] = 0 for i ≠ j when h is 2-wise independent

  6. 2-wise independent hash family Suppose h : [d] → [T] Fix 2 values t1 and t2 in the range of h Fix 2 values x1 ≠ x2 in the domain of h What is the probability that h(x1) = t1 and h(x2) = t2 ?

  7. 2-wise independent hash family H, a family of hash functions h, is 2-wise independent iff ∀ x1 ≠ x2 and ∀ t1, t2: Prh∈H (h(x1) = t1 and h(x2) = t2) = 1/|T|²

  8. 2-wise independent hash family H = {(ax+b) mod T | 0 ≤ a,b < T} is 2-wise independent if T is a prime > d H = {2(((ax+b) mod T) mod 2) − 1 | 0 ≤ a,b < T} is approximately 2-wise independent from [d] to {−1,1} We can get exact 2-wise independence into {−1,1} by more complicated constructions
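
Two quick illustrations in Python: an exhaustive check that the affine family on this slide really is 2-wise independent (for a small prime), and a sampler for the slide's approximate sign family (make_sign_hash is my name for it):

```python
import random
from itertools import product

# Check Pr(h(x1)=t1 and h(x2)=t2) = 1/|T|^2 for H = {(a*x + b) mod p}, p prime.
p = 5
family = [(a, b) for a in range(p) for b in range(p)]
for x1, x2 in [(0, 1), (1, 3), (2, 4)]:          # a few pairs with x1 != x2
    for t1, t2 in product(range(p), repeat=2):
        hits = sum(1 for a, b in family
                   if (a * x1 + b) % p == t1 and (a * x2 + b) % p == t2)
        assert hits / len(family) == 1 / p**2    # exactly 1/|T|^2

def make_sign_hash(p, seed=None):
    """Sample h(x) = 2*(((a*x + b) mod p) mod 2) - 1, mapping [d] -> {-1, +1}."""
    rng = random.Random(seed)
    a, b = rng.randrange(p), rng.randrange(p)
    return lambda x: 2 * (((a * x + b) % p) % 2) - 1
```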

  9. Draw h from 2-wise ind. family Then E[Z²] = F2: Z² is an unbiased estimator for F2 !

  10. What is the variance of Z² ? Here we will assume that h is drawn from a 4-wise independent family H

  11. What is the variance of Z² ? Var(Z²) = E[Z⁴] − (E[Z²])² ≤ 2F2² — 4-wise independence is what makes E[Z⁴] tractable
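
A Monte Carlo check of both claims — E[Z²] = F2 and Var(Z²) ≤ 2F2² — on the slide-2 stream, using fully independent signs as a stand-in for 4-wise independence:

```python
import random, statistics
from collections import Counter

stream = "ABACDDAAEBEEF"
f2 = sum(c * c for c in Counter(stream).values())          # true F2 = 35

estimates = []
for trial in range(20000):
    rng = random.Random(trial)
    signs = {x: rng.choice((-1, 1)) for x in sorted(set(stream))}
    z = sum(signs[x] for x in stream)                      # Z = sum_i f_i h(i)
    estimates.append(z * z)

print(statistics.mean(estimates))      # close to F2 = 35
print(statistics.variance(estimates))  # well below 2*F2^2 = 2450
```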

  12. Chebyshev’s Inequality Pr(|X − E[X]| ≥ t) ≤ Var(X)/t² Applied to X = Z² with t = εF2: Pr(|Z² − F2| ≥ εF2) ≤ 2F2²/(ε²F2²) = 2/ε²

  13. Chebyshev’s Inequality If ε is small the bound 2/ε² exceeds 1 and is meaningless… We need to reduce the variance How ?

  14. Averaging Draw k ind. hash functions h1, h2, …. , hk Use Y = (Z1² + Z2² + … + Zk²)/k, so that Var(Y) = Var(Z²)/k ≤ 2F2²/k
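
In code, averaging is a thin wrapper around the basic estimator (this reuses the AMSSketch class sketched after slide 4):

```python
def averaged_estimate(stream, k, seed=0):
    """Average k independent basic estimators; the variance drops by a factor of k."""
    sketches = [AMSSketch(seed=seed + i) for i in range(k)]
    for item in stream:
        for s in sketches:
            s.update(item)
    return sum(s.estimate() for s in sketches) / k
```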

  15. Chebyshev’s Inequality Pick k = 8/ε²: Pr(|Y − F2| ≥ εF2) ≤ (2F2²/k)/(ε²F2²) = 2/(kε²) = 1/4

  16. Boosting the confidence – Chernoff bounds Pick k = 8/ε², so each averaged estimator is within εF2 of F2 except with probability ≤ 1/4

  17. Boosting the confidence – Chernoff bounds Now repeat the experiment s = O(log(1/δ)) times We get A1,…..,As (assume they are sorted) Return their median Why is this good ?

  18. Boosting the confidence – Chernoff bounds Each of A1,…..,As is bad (outside (1 ± ε)F2) with probability ≤ ¼ For the median to be bad we need more than ½ of A1,…..,As to be bad (remove the pair consisting of the largest and smallest and repeat... If both components of some pair are good then the median is good…) A1, A2 , ……. ,As-1,As

  19. Boosting the confidence – Chernoff bounds What is the probability that more than ½ are bad ? Chernoff: Let X = X1 + …..+ Xs where each Xi is Bernoulli with p = ¼; then Pr(X > s/2) ≤ e^(−cs) for a constant c > 0 ⇒ s = O(log(1/δ)) with a large enough constant makes this at most δ
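
Putting slides 14–19 together, a hedged end-to-end sketch (the constants are illustrative, not tuned; it reuses averaged_estimate from the previous sketch):

```python
import math
import statistics

def ams_estimate(stream, eps, delta, seed=0):
    """Median of s = O(log(1/delta)) averaged estimators, each averaging
    k = 8/eps^2 basic sketches, so each is bad with probability <= 1/4."""
    k = math.ceil(8 / eps**2)
    s = max(1, math.ceil(4 * math.log(1 / delta)))
    runs = [averaged_estimate(stream, k, seed=seed + 10_000 * i) for i in range(s)]
    return statistics.median(runs)

print(ams_estimate("ABACDDAAEBEEF", eps=0.5, delta=0.1))  # close to F2 = 35
```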

  20. Recap Our estimate is (1/k) Σi Zi², where Zi = Σj fj hi(j) — in matrix form, Z = Mf for the k × d sign matrix M with Mij = hi(j)

  21. This is a random projection.. f ↦ Mf Preserve distances in the sense: E[ ||Mf||²/k ] = ||f||²

  22. Make it look more familiar.. Set A = M/√k and y = Ax Preserve distances in the sense: E[ ||Ax||² ] = ||x||²

  23. Dimension reduction (A a random orthonormal k × d matrix) y = Ax We project into a random k-dim. subspace

  24. Dimension reduction (A a random orthonormal k × d matrix) y = Ax We project into a random k-dim. subspace JL: for ε ∈ [0,1], (1−ε)||x||² ≤ (d/k)||Ax||² ≤ (1+ε)||x||² with high probability

  25. Dimension reduction (A a random orthonormal k × d matrix) y = Ax We project into a random k-dim. subspace JL: for ε ∈ [0,1], (1−ε)||x||² ≤ (d/k)||Ax||² ≤ (1+ε)||x||² with high probability

  26. Johnson-Lindenstrauss JL: Project the vectors x1,….,xn into a random k-dimensional subspace for k = O(log(n)/ε²); then with probability 1 − 1/nc every pairwise distance is preserved up to a factor of 1 ± ε
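
A short sketch of JL in practice. Note the assumption: instead of the lecture's random orthonormal projection, this uses a Gaussian matrix scaled by 1/√k, a standard JL variant with the same flavor of guarantee; jl_project is my name:

```python
import numpy as np

def jl_project(X, k, seed=0):
    """Project the rows of X into k dimensions with a random Gaussian map."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((k, X.shape[1])) / np.sqrt(k)
    return X @ A.T

# n = 20 points in d = 1000 dimensions; the lemma needs k = O(log(n)/eps^2).
X = np.random.default_rng(1).standard_normal((20, 1000))
Y = jl_project(X, k=300)
print(np.linalg.norm(X[0] - X[1]), np.linalg.norm(Y[0] - Y[1]))  # nearly equal
```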

  27. The proof (A a random orthonormal k × d matrix) y = Ax Obs1: It’s enough to prove it for vectors such that ||x||2 = 1

  28. The proof (A a random orthonormal k × d matrix) y = Ax Obs1: It’s enough to prove it for vectors such that ||x||2 = 1

  29. The proof (A a random orthonormal k × d matrix) y = Ax Obs2: Instead of projecting into a random k-dim subspace, look at the first k coordinates of a random unit vector

  30. The proof (y = the first k coordinates of a random unit vector) Obs2: Instead of projecting into a random k-dim subspace, look at the first k coordinates of a random unit vector

  31. The case k=1 (y = the first k coordinates of a random unit vector) Obs2: Instead of projecting into a random k-dim subspace, look at the first k coordinates of a random unit vector

  32. The case k=1 y = x1, the first coordinate of a random unit vector

  33. The case k=1 For a random unit vector, E[x1²] = 1/d; for ε ∈ [0,1] the proof bounds the probability that x1² deviates from 1/d by more than a factor of 1 ± ε
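
A numeric sanity check for the k=1 case: the first coordinate of a random unit vector in d dimensions has squared value 1/d on average (the only assumption is the standard way of sampling a unit vector, normalizing a Gaussian):

```python
import numpy as np

rng = np.random.default_rng(0)
d, trials = 50, 100_000
v = rng.standard_normal((trials, d))
v /= np.linalg.norm(v, axis=1, keepdims=True)  # rows are random unit vectors
print(np.mean(v[:, 0] ** 2), 1 / d)            # both approximately 0.02
```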

  34. An application: approximate period A sequence of length m: 10,3,20,1,10,3,18,1,11,5,20,2,12,1,19,1,......... Find r such that Σj (sj − sj+r)² is minimized

  35. An application, approximate period 10,3,20,1,10,3,18,1,11,5,20,2,12,1,19,1,......... Find r such that Σj (sj − sj+r)² is minimized

  36. An application, approximate period 10,3,20,1,10,3,18,1,11,5,20,2,12,1,19,1,......... Find r such that Σj (sj − sj+r)² is minimized

  37. An exact algorithm Find r such that Σj (sj − sj+r)² is minimized For each value of r this takes linear time ⇒ O(m²) total

  38. An exact algorithm Find r such that Σj (sj − sj+r)² is minimized For each value of r this takes linear time ⇒ O(m²) total We can sketch/project all windows of length r and compare the sketches … but O(m²k) just for sketching…
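
For reference, the naive exact algorithm in Python. The cost function below is my reading of the slides' objective, normalized so that large shifts, which contribute fewer terms, are not trivially favored:

```python
def best_period(seq):
    """Exhaustive search: linear work per shift r, O(m^2) total."""
    m = len(seq)
    best_r, best_cost = None, float("inf")
    for r in range(1, m):
        cost = sum((seq[j] - seq[j + r]) ** 2 for j in range(m - r)) / (m - r)
        if cost < best_cost:
            best_r, best_cost = r, cost
    return best_r, best_cost

# The slide's sequence is roughly 4-periodic; multiples of 4 score best.
print(best_period([10,3,20,1,10,3,18,1,11,5,20,2,12,1,19,1]))
```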

  39. Obs1: We can sketch faster.. A running inner-product of the sequence with a unit vector h — this is similar to a convolution of two vectors

  40.–43. Convolution [figure, animated across four slides: one vector (e.g. 3 1 2 0) slides along the other (e.g. 4 5 0 2 1 3), producing one inner product per shift]

  44. Convolution We can compute the convolution in O(m log(r)) time using the FFT
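
A sketch of the FFT trick with NumPy: one FFT-based convolution yields the inner product of h with every length-r window at once. This plain version costs O(m log m); the slide's O(m log(r)) comes from convolving in blocks of length about 2r:

```python
import numpy as np

def running_inner_products(seq, h):
    """Inner product of h with every length-r window of seq, via one FFT.
    Correlating with h equals convolving with h reversed."""
    seq = np.asarray(seq, dtype=float)
    h = np.asarray(h, dtype=float)
    m, r = len(seq), len(h)
    n = 1 << (m + r).bit_length()              # FFT length, a power of two
    prod = np.fft.rfft(seq, n) * np.fft.rfft(h[::-1], n)
    full = np.fft.irfft(prod, n)
    return full[r - 1 : m]                     # one value per window start

seq = [10,3,20,1,10,3,18,1,11,5,20,2,12,1,19,1]
h = np.random.default_rng(0).choice([-1.0, 1.0], size=4)
naive = [np.dot(seq[i:i + 4], h) for i in range(len(seq) - 3)]
print(np.allclose(running_inner_products(seq, h), naive))  # True
```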

  45. Obs1: We can sketch faster We can compute the first coordinate of all sketches in O(m log(r)) time ⇒ we can sketch all positions in O(m log(r) · k) But we still have many possible values for r…

  46. Obs2: Sketch only in powers of 2 We compute all these sketches in O(log(m) · m log(r) · k)

  47. When r is not a power of 2 ? Cover the window z by two windows x and y whose lengths are powers of 2 (they may overlap) Use S(x) + S(y) as S(z)

  48. The algorithm Compute sketches at powers of 2 in O(log(m) · m log(r) · k) time For a fixed r we can approximate the cost in O((m/r) · k) time Summing over r we get O(m log(m) · k)

  49. The algorithm Total running time is O(m log³ m)

  50. Bibliography • Noga Alon, Yossi Matias, Mario Szegedy: The Space Complexity of Approximating the Frequency Moments. J. Comput. Syst. Sci. 58(1) (1999), 137-147 • W. B. Johnson and J. Lindenstrauss: Extensions of Lipschitz maps into a Hilbert space. Contemp. Math. 26 (1984), 189-206 • Jirí Matousek: On variants of the Johnson-Lindenstrauss lemma. Random Struct. Algorithms 33(2) (2008), 142-156 • Piotr Indyk, Nick Koudas, S. Muthukrishnan: Identifying Representative Trends in Massive Time Series Data Sets Using Sketches. VLDB 2000, 363-372
