Sublinear Algorithms for Big Data

Slides are available at http://grigory.us/big-data.html

Sublinear Algorithms for Big Data, Lecture 4
Grigory Yaroslavtsev, http://grigory.us

Today
• Dimensionality reduction
  • AMS as dimensionality reduction
  • Johnson-Lindenstrauss transform
• Sublinear time algorithms
  • Definitions: approximation, property testing
  • Basic examples: approximating diameter, testing properties of images
  • Testing sortedness
  • Testing connectedness

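$L_p$-norm Estimation
• Stream: $m$ updates $(x_i, \Delta_i) \in [n] \times \mathbb{R}$ that define a vector $f$, where $f_j = \sum_{i : x_i = j} \Delta_i$.
• Example: For
• $L_p$-norm: $\|f\|_p = \left(\sum_i |f_i|^p\right)^{1/p}$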
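To make the stream model concrete, here is a minimal Python sketch in which each update is a pair (coordinate, increment) and the norm is computed from the resulting vector. The update list, the 0-based coordinates, and the helper names `apply_updates` and `lp_norm` are illustrative assumptions, not taken from the slides.

```python
def apply_updates(n, updates):
    """Return the vector f in R^n defined by a stream of (index, delta) updates."""
    f = [0.0] * n
    for x, delta in updates:
        f[x] += delta  # each update adds delta to coordinate x
    return f


def lp_norm(f, p):
    """Return (sum_i |f_i|^p)^(1/p) for p > 0."""
    return sum(abs(v) ** p for v in f) ** (1.0 / p)


# Hypothetical stream over n = 4 coordinates (0-based indices).
updates = [(0, 3.0), (2, 1.0), (0, -1.0), (3, 2.0), (2, -1.0)]
f = apply_updates(4, updates)
print(f)              # [2.0, 0.0, 0.0, 2.0]
print(lp_norm(f, 1))  # 4.0
print(lp_norm(f, 2))  # sqrt(8), about 2.828
```

This version stores the whole vector, so it uses linear space; it only serves as the exact baseline that a sketching algorithm approximates.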

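$L_p$-norm Estimation
• $L_p$-norm: $\|f\|_p = \left(\sum_i |f_i|^p\right)^{1/p}$
• Two lectures ago:
  • $F_0$-moment
  • $F_2$-moment (via AMS sketching)
  • Space: $O\!\left(\frac{\log n}{\epsilon^2} \log \frac{1}{\delta}\right)$ for a $(1 \pm \epsilon)$-approximation with probability $1 - \delta$
• Technique: linear sketches
  • $\sum_{i \in S} f_i$ for random sets $S$
  • $\sum_i \sigma_i f_i$ for random signs $\sigma_i$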

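AMS as dimensionality reduction
• Maintain a “linear sketch” vector $Z = (Z_1, \dots, Z_k) = Rf$, where $Z_i = \sum_{j=1}^{n} \sigma_{ij} f_j$ for random signs $\sigma_{ij} \in \{-1, +1\}$
• Estimator for $\|f\|_2^2$: $Y = \frac{1}{k} \sum_{i=1}^{k} Z_i^2 = \frac{\|Rf\|_2^2}{k}$
• “Dimensionality reduction”: $f \mapsto Rf$, but with a “heavy” tail (Chebyshev): $\Pr\left[\,\left|Y - \|f\|_2^2\right| \ge c \sqrt{2/k}\, \|f\|_2^2\right] \le \frac{1}{c^2}$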
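The linear sketch can be illustrated with a short Python example: a $k \times n$ matrix of random signs is applied to the vector, and the average of the squared sketch coordinates estimates the squared $L_2$ norm. This is a sketch under stated assumptions: it draws fully independent signs and stores the matrix explicitly (AMS only needs 4-wise independence and stores short seeds), and the names `make_sign_matrix`, `sketch`, and `estimate_l2_squared` are hypothetical.

```python
import random


def make_sign_matrix(k, n, rng):
    """k x n matrix of independent uniform +1/-1 signs (the sketch matrix R)."""
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(k)]


def sketch(R, f):
    """Linear sketch Z = R f, i.e. Z_i = sum_j R[i][j] * f[j]."""
    return [sum(r_ij * f_j for r_ij, f_j in zip(row, f)) for row in R]


def estimate_l2_squared(Z):
    """Average of Z_i^2, an unbiased estimate of ||f||_2^2."""
    return sum(z * z for z in Z) / len(Z)


rng = random.Random(0)                  # fixed seed so the run is reproducible
n, k = 50, 2000
f = [(j % 7) - 3 for j in range(n)]     # arbitrary integer test vector
R = make_sign_matrix(k, n, rng)

true_value = sum(v * v for v in f)
estimate = estimate_l2_squared(sketch(R, f))
print(true_value, estimate)             # the two values should be close
```

Because the map is linear, the sketch of a sum of two vectors equals the sum of their sketches, which is what allows it to be maintained under stream updates.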

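Normal Distribution
• Normal distribution $N(0, 1)$
• Range: $(-\infty, +\infty)$
• Density: $\mu(x) = (2\pi)^{-1/2} e^{-x^2/2}$
• Mean = 0, Variance = 1
• Basic facts:
  • If $X$ and $Y$ are independent r.v. with normal distribution, then $X + Y$ has normal distribution
  • If $X, Y$ are independent, then $\mathrm{Var}[X + Y] = \mathrm{Var}[X] + \mathrm{Var}[Y]$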

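Johnson-Lindenstrauss Transform
• Instead of $\pm 1$ let $\sigma_i$ be i.i.d. random variables from normal distribution $N(0, 1)$, and let $Z = \sum_i \sigma_i f_i$
• We still have $\mathbb{E}[Z^2] = \sum_i f_i^2 = \|f\|_2^2$ because:
  • $\mathbb{E}[\sigma_i \sigma_j] = \mathbb{E}[\sigma_i]\,\mathbb{E}[\sigma_j] = 0$ for $i \ne j$, and $\mathbb{E}[\sigma_i^2] =$ “variance of $\sigma_i$” $= 1$
• Define $Z = (Z_1, \dots, Z_k)$ from $k$ independent copies and define: $\|Z\|_2^2 = \sum_{i=1}^{k} Z_i^2$, so that $\mathbb{E}\left[\|Z\|_2^2\right] = k \|f\|_2^2$
• JL Lemma: There exists $C > 0$ s.t. for small enough $\epsilon > 0$: $\Pr\left[\,\left|\|Z\|_2^2 - k \|f\|_2^2\right| > \epsilon k \|f\|_2^2\right] \le \exp(-C \epsilon^2 k)$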
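A minimal Python sketch of the Gaussian variant: each sketch coordinate is a sum of the vector's entries weighted by independent $N(0, 1)$ draws, and the squared norm of the sketch, divided by $k$, should concentrate around the squared norm of the vector. The test vector, the value of `k`, and the function name `gaussian_sketch` are illustrative assumptions.

```python
import random


def gaussian_sketch(f, k, rng):
    """Return Z in R^k with Z_i = sum_j sigma_ij * f_j, sigma_ij i.i.d. N(0, 1)."""
    return [sum(rng.gauss(0.0, 1.0) * f_j for f_j in f) for _ in range(k)]


rng = random.Random(1)        # fixed seed so the run is reproducible
f = [3.0, -4.0, 12.0]         # ||f||_2^2 = 9 + 16 + 144 = 169
k = 4000
Z = gaussian_sketch(f, k, rng)

norm_sq = sum(v * v for v in f)
ratio = sum(z * z for z in Z) / (k * norm_sq)
print(ratio)                  # should be close to 1
```

The ratio fluctuates around 1 with standard deviation $\sqrt{2/k}$, so larger `k` gives tighter concentration.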

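Proof of JL Lemma
• JL Lemma: $\exists C > 0$ s.t. for small enough $\epsilon > 0$: $\Pr\left[\,\left|\|Z\|_2^2 - k \|f\|_2^2\right| > \epsilon k \|f\|_2^2\right] \le \exp(-C \epsilon^2 k)$
• Assume $\|f\|_2 = 1$ (both sides of the inequality scale with $\|f\|_2^2$).
• We have $Z_i = \sum_j \sigma_{ij} f_j \sim N(0, 1)$ and $\mathbb{E}\left[\|Z\|_2^2\right] = k$
• Alternative form of JL Lemma: $\Pr\left[\,\left|\|Z\|_2^2 - k\right| > \epsilon k\right] \le \exp(-C \epsilon^2 k)$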

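Proof of JL Lemma
• Alternative form of JL Lemma: $\Pr\left[\,\left|\|Z\|_2^2 - k\right| > \epsilon k\right] \le \exp(-C \epsilon^2 k)$
• Let $Y = \|Z\|_2^2$ and $\alpha = k(1 + \epsilon)$
• For every $s > 0$ we have: $\Pr[Y > \alpha] = \Pr\left[e^{sY} > e^{s\alpha}\right]$
• By Markov and independence of $Z_i$: $\Pr[Y > \alpha] \le e^{-s\alpha}\, \mathbb{E}\left[e^{sY}\right] = e^{-s\alpha} \prod_{i=1}^{k} \mathbb{E}\left[e^{s Z_i^2}\right]$
• We have $\mathbb{E}\left[e^{s Z_i^2}\right] = \frac{1}{\sqrt{1 - 2s}}$ for $0 < s < 1/2$, hence: $\Pr[Y > \alpha] \le (1 - 2s)^{-k/2} e^{-s\alpha}$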
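The moment generating function of a squared standard normal, $\mathbb{E}[e^{sG^2}] = 1/\sqrt{1 - 2s}$ for $s < 1/2$, is the one non-obvious ingredient in this step. The following Python snippet checks it numerically by integrating against the normal density with the trapezoid rule; the integration range, the step count, and the function name `mgf_of_square` are illustrative choices.

```python
import math


def mgf_of_square(s, half_width=40.0, steps=80000):
    """Trapezoid-rule estimate of E[exp(s * G^2)] for G ~ N(0, 1), with s < 1/2."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        x = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        # exp(s x^2) * exp(-x^2 / 2); the 1/sqrt(2 pi) factor is applied at the end
        total += weight * math.exp((s - 0.5) * x * x)
    return total * h / math.sqrt(2.0 * math.pi)


for s in (0.1, 0.25, 0.4):
    # numerical value next to the closed form 1 / sqrt(1 - 2s)
    print(s, mgf_of_square(s), 1.0 / math.sqrt(1.0 - 2.0 * s))
```

The integral diverges for $s \ge 1/2$, which is why the bound is only used for $0 < s < 1/2$.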

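Proof of JL Lemma
• Alternative form of JL Lemma: $\Pr\left[\,\left|\|Z\|_2^2 - k\right| > \epsilon k\right] \le \exp(-C \epsilon^2 k)$
• For every $0 < s < 1/2$ we have: $\Pr[Y > \alpha] \le (1 - 2s)^{-k/2} e^{-s\alpha}$, where $Y = \|Z\|_2^2$
• Let $s = \frac{1}{2}\left(1 - \frac{k}{\alpha}\right) = \frac{\epsilon}{2(1 + \epsilon)}$ and recall that $\alpha = k(1 + \epsilon)$
• A calculation finishes the proof: $\Pr[Y > \alpha] \le \exp(-C \epsilon^2 k)$ (the lower tail $\Pr[Y < k(1 - \epsilon)]$ is bounded analogously)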
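The final calculation can be written out as follows. This is a sketch of the standard argument for the upper tail, starting from the bound $\Pr[Y > \alpha] \le (1 - 2s)^{-k/2} e^{-s\alpha}$ with $Y = \|Z\|_2^2$ and $\alpha = k(1 + \epsilon)$; the choice $s = \frac{\epsilon}{2(1+\epsilon)}$ minimizes the right-hand side, and the constant $C = 1/12$ is simply what this derivation yields, not a value taken from the slides.

```latex
\begin{align*}
s = \frac{\epsilon}{2(1+\epsilon)}
  \;&\Longrightarrow\;
  1 - 2s = \frac{1}{1+\epsilon}, \qquad s\alpha = \frac{\epsilon k}{2}, \\
\Pr[Y > \alpha]
  &\le (1+\epsilon)^{k/2}\, e^{-\epsilon k/2}
   = \exp\!\Big(-\tfrac{k}{2}\big(\epsilon - \ln(1+\epsilon)\big)\Big), \\
\ln(1+\epsilon)
  &\le \epsilon - \tfrac{\epsilon^2}{2} + \tfrac{\epsilon^3}{3}
  \qquad (\epsilon \ge 0), \\
\Pr[Y > \alpha]
  &\le \exp\!\Big(-\tfrac{k}{2}\big(\tfrac{\epsilon^2}{2} - \tfrac{\epsilon^3}{3}\big)\Big)
   \le \exp\!\Big(-\tfrac{\epsilon^2 k}{12}\Big)
  \qquad (0 < \epsilon \le 1).
\end{align*}
```

The last inequality uses $\frac{\epsilon^2}{2} - \frac{\epsilon^3}{3} \ge \frac{\epsilon^2}{6}$, which holds exactly when $\epsilon \le 1$.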

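Johnson-Lindenstrauss Transform
• Single vector, failure probability $\delta$: $k = O\!\left(\frac{\log(1/\delta)}{\epsilon^2}\right)$
  • Tight: [Woodruff’10]
• $n$ vectors simultaneously: $k = O\!\left(\frac{\log(n/\delta)}{\epsilon^2}\right)$ (union bound over the $n$ vectors)
  • Tight: [Molinaro, Woodruff, Y. ’13]
• Distances between $n$ vectors = $O(n^2)$ vectors: $k = O\!\left(\frac{\log n}{\epsilon^2}\right)$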
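To illustrate the pairwise-distance statement, the following Python sketch projects a few random points with one shared Gaussian matrix and reports the worst relative distortion over all pairs. It is a minimal example under stated assumptions: the $1/\sqrt{k}$ scaling is added so that distances are preserved directly (the slides work with the unscaled sketch), the sizes `n`, `d`, `k` are arbitrary, and `jl_project` and `dist` are hypothetical helper names.

```python
import math
import random


def jl_project(points, k, rng):
    """Map each point x in R^d to (1/sqrt(k)) * G x, with one shared k x d N(0, 1) matrix G."""
    d = len(points[0])
    G = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [[scale * sum(g * x for g, x in zip(row, p)) for row in G] for p in points]


def dist(a, b):
    """Euclidean distance between two points given as lists."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


rng = random.Random(2)        # fixed seed so the run is reproducible
n, d, k = 8, 100, 400
points = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]
projected = jl_project(points, k, rng)

# Worst relative distortion over all n(n-1)/2 pairwise distances.
worst = max(
    abs(dist(projected[i], projected[j]) / dist(points[i], points[j]) - 1.0)
    for i in range(n) for j in range(i + 1, n)
)
print(worst)                  # small compared to 1 for this choice of k
```

The same matrix must be used for every point: the projection is linear, so the image of a difference of two points is the difference of their images, and the single-vector guarantee applies to each of the $O(n^2)$ difference vectors.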
