Slides are available at http://grigory.us/big-data.html. Sublinear Algorithms for Big Data. Lecture 4. Grigory Yaroslavtsev http://grigory.us. Today: Dimensionality reduction; AMS as dimensionality reduction; Johnson-Lindenstrauss transform; Sublinear time algorithms.
Slides are available at http://grigory.us/big-data.html • Sublinear Algorithms for Big Data • Lecture 4 • Grigory Yaroslavtsev http://grigory.us
Today • Dimensionality reduction • AMS as dimensionality reduction • Johnson-Lindenstrauss transform • Sublinear time algorithms • Definitions: approximation, property testing • Basic examples: approximating diameter, testing properties of images • Testing sortedness • Testing connectedness
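$L_p$-norm Estimation • Stream: $m$ updates $(x_i, \Delta_i) \in [n] \times \mathbb{R}$ that define vector $f$ where $f_j = \sum_{i : x_i = j} \Delta_i$. • Example: For • $L_p$-norm: $\|f\|_p = \left(\sum_{i} |f_i|^p\right)^{1/p}$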
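The stream model on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's code: the function names `frequency_vector` and `lp_norm` and the seven-update stream below are hypothetical, and each update is assumed to be a pair (coordinate, increment) with coordinates numbered from 1.

```python
def frequency_vector(n, updates):
    """Apply updates (j, delta), 1 <= j <= n, and return the vector f as a list."""
    f = [0.0] * n
    for j, delta in updates:
        f[j - 1] += delta  # each update adds delta to coordinate j
    return f


def lp_norm(f, p):
    """Return (sum_i |f_i|^p)^(1/p) for p > 0."""
    return sum(abs(x) ** p for x in f) ** (1.0 / p)


# Hypothetical stream over n = 4 coordinates (an assumption for illustration).
updates = [(1, 3), (3, 0.5), (1, 2), (2, -2), (2, 1), (1, -1), (4, 1)]
f = frequency_vector(4, updates)
l1 = lp_norm(f, 1)
l2 = lp_norm(f, 2)
```

Storing `f` explicitly takes space linear in the dimension, which is what the sketching techniques in this lecture are designed to avoid.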
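$L_p$-norm Estimation • $L_p$-norm: $\|f\|_p = \left(\sum_{i} |f_i|^p\right)^{1/p}$ • Two lectures ago: • $\|f\|_0 = F_0$-moment • $\|f\|_2^2 = F_2$-moment (via AMS sketching) • Space: $O\left(\frac{\log n}{\epsilon^2} \log \frac{1}{\delta}\right)$ • Technique: linear sketches • $\|f\|_0$: $\sum_{i \in S} f_i$ for random sets $S$ • $\|f\|_2^2$: $\sum_{i} \sigma_i f_i$ for random signs $\sigma_i$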
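AMS as dimensionality reduction • Maintain a “linear sketch” vector $Z = (Z_1, \dots, Z_k) = Rf$, where $Z_i = \sum_{j \in [n]} \sigma_{ij} f_j$ and each $\sigma_{ij}$ is a uniformly random sign in $\{-1, +1\}$ • Estimator $Y$ for $\|f\|_2^2$: $Y = \frac{1}{k} \sum_{i=1}^{k} Z_i^2 = \frac{\|Rf\|_2^2}{k}$ • “Dimensionality reduction”: $x \to Rx$, “heavy” tail: $\Pr\left[\,\left|Y - \|f\|_2^2\right| \ge c \sqrt{2/k}\, \|f\|_2^2\right] \le \frac{1}{c^2}$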
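A small Python sketch of the sign-based estimator may help make the "dimensionality reduction" view concrete. It is an illustration under stated assumptions: the helper names `make_signs`, `sketch`, and `estimate_f2` are hypothetical, the sign matrix is stored explicitly with fully independent entries for simplicity, and the test vector is made up.

```python
import random


def make_signs(k, n, seed=0):
    """Return a k x n matrix of independent uniform signs in {-1, +1}."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]


def sketch(R, f):
    """Linear sketch Z = R f, computed row by row."""
    return [sum(r * x for r, x in zip(row, f)) for row in R]


def estimate_f2(Z):
    """Average of the squared sketch coordinates, an estimate of ||f||_2^2."""
    return sum(z * z for z in Z) / len(Z)


f = [4.0, -1.0, 0.5, 1.0]        # hypothetical input vector
g = [1.0, 2.0, 0.0, -3.0]        # second hypothetical vector, to check linearity
R = make_signs(k=2000, n=len(f))
Y = estimate_f2(sketch(R, f))    # should be close to ||f||_2^2 = 18.25

# Linearity: the sketch of f + g equals the sum of the two sketches.
lhs = sketch(R, [a + b for a, b in zip(f, g)])
rhs = [a + b for a, b in zip(sketch(R, f), sketch(R, g))]
```

Linearity is what lets the sketch be maintained under stream updates: an update to one coordinate changes each sketch coordinate by the corresponding sign times the increment.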
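Normal Distribution • Normal distribution $N(0,1)$ • Range: $(-\infty, +\infty)$ • Density: $\mu(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$ • Mean = 0, Variance = 1 • Basic facts: • If $X$ and $Y$ are independent r.v. with normal distribution then $X + Y$ has normal distribution • If $X, Y$ are independent, then $\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$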
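The additivity of variance for independent normal variables can be checked empirically. The following is a rough simulation sketch, with the sample size and seed chosen arbitrarily; it checks only the first two moments of the sum, not normality itself.

```python
import random
import statistics

rng = random.Random(1)
N = 100_000  # number of samples (arbitrary choice)

xs = [rng.gauss(0, 1) for _ in range(N)]  # X ~ N(0, 1)
ys = [rng.gauss(0, 1) for _ in range(N)]  # Y ~ N(0, 1), independent of X
sums = [x + y for x, y in zip(xs, ys)]

mean_sum = statistics.fmean(sums)      # expected to be near 0
var_sum = statistics.pvariance(sums)   # expected to be near 1 + 1 = 2
```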
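Johnson-Lindenstrauss Transform • Instead of $\pm 1$ let $\sigma_i$ be i.i.d. random variables from normal distribution $N(0,1)$: $Z = \sum_{i} \sigma_i f_i$ • We still have $E[Z^2] = \sum_{i} f_i^2 = \|f\|_2^2$ because: • $E[\sigma_i \sigma_j] = E[\sigma_i] E[\sigma_j] = 0$ for $i \ne j$ • $E[\sigma_i^2]$ = “variance of $\sigma_i$” = 1 • Define $Z = (Z_1, \dots, Z_k)$ from $k$ independent copies and define: $\|Z\|_2^2 = \sum_{i=1}^{k} Z_i^2$, so that $E[\|Z\|_2^2] = k \|f\|_2^2$ • JL Lemma: There exists $C > 0$ s.t. for small enough $\epsilon > 0$: $\Pr\left[\,\left|\|Z\|_2^2 - k \|f\|_2^2\right| > \epsilon k \|f\|_2^2\right] \le \exp(-C \epsilon^2 k)$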
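The Gaussian version of the sketch can be tried out directly. This is a minimal sketch under assumptions: the function name `gaussian_sketch` is hypothetical, each sketch coordinate is taken to be a fresh N(0,1)-weighted sum of the coordinates of the input, and the input vector and the value of k are made up.

```python
import random


def gaussian_sketch(f, k, seed=0):
    """Return k coordinates, each a sum of f_j weighted by fresh N(0,1) draws."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) * fj for fj in f) for _ in range(k)]


f = [4.0, -1.0, 0.5, 1.0]              # hypothetical vector, ||f||_2^2 = 18.25
k = 2000
Z = gaussian_sketch(f, k)

true_norm_sq = sum(x * x for x in f)
est_norm_sq = sum(z * z for z in Z) / k   # ||Z||_2^2 / k estimates ||f||_2^2
rel_error = abs(est_norm_sq - true_norm_sq) / true_norm_sq
```

The lemma bounds the probability that `rel_error` exceeds a given threshold by a quantity decaying exponentially in k, which is a much lighter tail than the Chebyshev-type bound for the sign-based estimator.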
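Proof of JL Lemma • JL Lemma: $\exists C > 0$ s.t. for small enough $\epsilon > 0$: $\Pr\left[\,\left|\|Z\|_2^2 - k \|f\|_2^2\right| > \epsilon k \|f\|_2^2\right] \le \exp(-C \epsilon^2 k)$ • Assume $\|f\|_2 = 1$. • We have $Z_i = \sum_{j} \sigma_{ij} f_j \sim N(0,1)$ and $E[\|Z\|_2^2] = k$ • Alternative form of JL Lemma: $\Pr\left[\|Z\|_2^2 > k (1 + \epsilon)^2\right] \le \exp\left(-\epsilon^2 k + O(k \epsilon^3)\right)$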
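Proof of JL Lemma • Alternative form of JL Lemma: $\Pr\left[\|Z\|_2^2 > k (1 + \epsilon)^2\right] \le \exp\left(-\epsilon^2 k + O(k \epsilon^3)\right)$ • Let $Y = \|Z\|_2^2$ and $\alpha = k (1 + \epsilon)^2$ • For every $s > 0$ we have: $\Pr[Y > \alpha] = \Pr\left[e^{sY} > e^{s\alpha}\right]$ • By Markov and independence of $Z_i^2$: $\Pr[Y > \alpha] \le \frac{E[e^{sY}]}{e^{s\alpha}} = e^{-s\alpha} \prod_{i=1}^{k} E\left[e^{s Z_i^2}\right]$ • We have $Z_i \sim N(0,1)$, hence for $0 < s < \frac{1}{2}$: $E\left[e^{s Z_i^2}\right] = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{s t^2} e^{-t^2/2}\, dt = \frac{1}{\sqrt{1 - 2s}}$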
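The moment generating function of a squared standard normal variable, used in the last bullet, follows from a change of variables. The derivation below is a worked sketch; the symbols $s$, $t$, $u$ are notation assumed here, with $Z_i \sim N(0,1)$ and $0 < s < 1/2$ so that the integral converges.

```latex
\begin{align*}
E\left[e^{s Z_i^2}\right]
  &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{s t^2} e^{-t^2/2}\, dt
   = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-(1-2s) t^2/2}\, dt \\
  &= \frac{1}{\sqrt{1-2s}} \cdot \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-u^2/2}\, du
   = \frac{1}{\sqrt{1-2s}},
   \qquad u = t \sqrt{1-2s}.
\end{align*}
```

By independence of the $k$ coordinates, the product of these expectations is $(1-2s)^{-k/2}$.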
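Proof of JL Lemma • Alternative form of JL Lemma: $\Pr\left[\|Z\|_2^2 > k (1 + \epsilon)^2\right] \le \exp\left(-\epsilon^2 k + O(k \epsilon^3)\right)$ • For every $0 < s < \frac{1}{2}$ we have: $\Pr[Y > \alpha] \le e^{-s\alpha} (1 - 2s)^{-k/2}$, where $Y = \|Z\|_2^2$ • Let $s = \frac{1}{2}\left(1 - \frac{k}{\alpha}\right)$ and recall that $\alpha = k (1 + \epsilon)^2$ • A calculation finishes the proof: $\Pr[Y > \alpha] \le \exp\left(-\epsilon^2 k + O(k \epsilon^3)\right)$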
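The final calculation can be written out as follows. This is a worked sketch that assumes the notation $Y = \|Z\|_2^2$, $\alpha = k(1+\epsilon)^2$, the bound $\Pr[Y > \alpha] \le e^{-s\alpha}(1-2s)^{-k/2}$, and the choice $s = \frac{1}{2}(1 - k/\alpha)$, which is the standard minimizer of that bound; it uses the expansion $\ln(1+\epsilon) = \epsilon - \epsilon^2/2 + O(\epsilon^3)$.

```latex
\begin{align*}
1 - 2s &= \frac{k}{\alpha} = \frac{1}{(1+\epsilon)^2},
\qquad
s\alpha = \frac{\alpha - k}{2} = k\epsilon + \frac{k\epsilon^2}{2}, \\
\Pr[Y > \alpha]
  &\le e^{-s\alpha} (1-2s)^{-k/2}
   = \exp\left(-k\epsilon - \frac{k\epsilon^2}{2}\right) (1+\epsilon)^{k} \\
  &= \exp\left(-k\epsilon - \frac{k\epsilon^2}{2} + k \ln(1+\epsilon)\right) \\
  &= \exp\left(-k\epsilon - \frac{k\epsilon^2}{2} + k\epsilon - \frac{k\epsilon^2}{2} + O(k\epsilon^3)\right)
   = \exp\left(-\epsilon^2 k + O(k\epsilon^3)\right).
\end{align*}
```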
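Johnson-Lindenstrauss Transform • Single vector: $k = O\left(\frac{1}{\epsilon^2} \log \frac{1}{\delta}\right)$ dimensions suffice for failure probability $\delta$ • Tight: [Woodruff’10] • $n$ vectors simultaneously: $k = O\left(\frac{1}{\epsilon^2} \log \frac{n}{\delta}\right)$ by a union bound • Tight: [Molinaro, Woodruff, Y. ’13] • Distances between $n$ vectors = $O(n^2)$ vectors: $k = O\left(\frac{1}{\epsilon^2} \log \frac{n}{\delta}\right)$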
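Because the map is linear, preserving the distance between two points is the same as preserving the norm of their difference, so pairwise distances can be checked with the same Gaussian projection. The snippet below is an illustrative sketch only: `jl_project` and `sq_dist` are hypothetical helper names, the points are random, and the values d = 50, k = 1000 are arbitrary rather than derived from the bounds above (whose constants are hidden in the $O(\cdot)$).

```python
import itertools
import math
import random


def jl_project(points, k, seed=0):
    """Map each point x to (1/sqrt(k)) * R x, with R a k x d Gaussian matrix."""
    rng = random.Random(seed)
    d = len(points[0])
    R = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [[scale * sum(r * x for r, x in zip(row, p)) for row in R]
            for p in points]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


rng = random.Random(42)
points = [[rng.uniform(-1, 1) for _ in range(50)] for _ in range(5)]
proj = jl_project(points, k=1000)

# Ratio of projected to original squared distance for every pair of points.
ratios = [sq_dist(proj[i], proj[j]) / sq_dist(points[i], points[j])
          for i, j in itertools.combinations(range(len(points)), 2)]
```

All ratios should be close to 1; the spread shrinks as k grows.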