
Sublinear Algorithms for Big Data








  1. Sublinear Algorithms for Big Data • Lecture 3 • Grigory Yaroslavtsev • http://grigory.us

  2. SOFSEM 2015 • URL: http://www.sofsem.cz • 41st International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM’15) • When and where? • January 24-29, 2015. Czech Republic, Pec pod Snezkou • Deadlines: • August 1st (tomorrow!): Abstract submission • August 15th: Full papers (proceedings in LNCS) • I am on the Program Committee ;)

  3. Today • Count-Min (continued), Count Sketch • Sampling methods • ℓ2-sampling • ℓ0-sampling • Sparse recovery • k-sparse recovery • Graph sketching

  4. Recap • Stream: m elements from universe [n] = {1, …, n}, e.g. ⟨x1, x2, …, xm⟩ • fi = frequency of i in the stream = # of occurrences of value i

  5. Count-Min • h1, …, hd : [n] → [w] are 2-wise independent hash functions • Maintain d · w counters with values: c[j][v] = # of elements i in the stream with hj(i) = v • For every i the value c[j][hj(i)] ≥ fi and so: f̂i = minj c[j][hj(i)] ≥ fi • If w = 2/ε and d = log(1/δ) then: Pr[f̂i ≤ fi + ε‖f‖1] ≥ 1 − δ.
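The counter layout above can be sketched in a few lines of Python. This is an illustrative toy, not the authors' code: the parameter names w and d and the particular 2-wise hash family ((a·x + b) mod p) mod w are assumptions.

```python
import random

class CountMin:
    """Toy Count-Min sketch: d hash rows of w counters each.
    Estimates are biased upward: f_i <= estimate(i), and with good
    parameters estimate(i) <= f_i + eps * ||f||_1 with high probability."""

    def __init__(self, w, d, seed=0):
        rng = random.Random(seed)
        self.w, self.d = w, d
        self.p = 2_147_483_647  # Mersenne prime 2^31 - 1
        # (a*x + b) mod p mod w is a standard 2-wise independent family
        self.rows = [(rng.randrange(1, self.p), rng.randrange(self.p))
                     for _ in range(d)]
        self.counts = [[0] * w for _ in range(d)]

    def _h(self, j, x):
        a, b = self.rows[j]
        return ((a * x + b) % self.p) % self.w

    def update(self, x, delta=1):
        for j in range(self.d):
            self.counts[j][self._h(j, x)] += delta

    def estimate(self, x):
        # every row overestimates, so min removes as much noise as possible
        return min(self.counts[j][self._h(j, x)] for j in range(self.d))
```

Because every counter only ever adds collision mass, the estimate never falls below the true frequency, which is why the min over rows is the right aggregator here.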

  6. More about Count-Min • Authors: Graham Cormode, S. Muthukrishnan [LATIN’04] • Count-Min is linear: Count-Min(S1 + S2) = Count-Min(S1) + Count-Min(S2) • Deterministic version: CR-Precis • Count-Min vs. Bloom filters • Approximates frequencies, not just 0/1 set membership • Doesn’t require mutual independence (only 2-wise) • FAQ and Applications: • https://sites.google.com/site/countminsketch/home/ • https://sites.google.com/site/countminsketch/home/faq

  7. Fully Dynamic Streams • Stream: updates (i, Δ) with Δ ∈ {−1, +1} that define a vector f ∈ ℝⁿ, where fi = sum of the updates Δ to coordinate i. • Example: … • Count Sketch: Count-Min with random signs and median instead of min:

  8. Count Sketch • In addition to h1, …, hd use random signs s1, …, sd : [n] → {−1, +1} • Update: c[j][hj(i)] += sj(i) · Δ • Estimate: f̂i = medianj sj(i) · c[j][hj(i)] • Parameters: w = O(1/ε²) and d = O(log(1/δ)) give |f̂i − fi| ≤ ε‖f‖2 with probability ≥ 1 − δ
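The random-signs-plus-median variant can be sketched the same way (again a toy with assumed parameter names; the sign hash derived from the parity of a 2-wise hash is an illustrative choice):

```python
import random
import statistics

class CountSketch:
    """Toy Count Sketch: like Count-Min, but each row multiplies updates
    by a random sign, and the estimate is a median rather than a min.
    Collision noise becomes mean-zero, so errors scale with ||f||_2."""

    def __init__(self, w, d, seed=1):
        rng = random.Random(seed)
        self.w, self.d = w, d
        self.p = 2_147_483_647
        self.h = [(rng.randrange(1, self.p), rng.randrange(self.p))
                  for _ in range(d)]
        self.s = [(rng.randrange(1, self.p), rng.randrange(self.p))
                  for _ in range(d)]
        self.counts = [[0] * w for _ in range(d)]

    def _bucket(self, j, x):
        a, b = self.h[j]
        return ((a * x + b) % self.p) % self.w

    def _sign(self, j, x):
        a, b = self.s[j]
        return 1 if ((a * x + b) % self.p) % 2 == 0 else -1

    def update(self, x, delta=1):
        for j in range(self.d):
            self.counts[j][self._bucket(j, x)] += self._sign(j, x) * delta

    def estimate(self, x):
        # re-multiplying by the sign recovers f_x plus mean-zero noise;
        # the median across rows concentrates around f_x
        return statistics.median(
            self._sign(j, x) * self.counts[j][self._bucket(j, x)]
            for j in range(self.d))
```

Unlike Count-Min, estimates here can undershoot as well as overshoot, which is exactly why the median replaces the min.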

  9. ℓp-Sampling • Stream: updates (i, ±1) that define a vector f ∈ ℝⁿ where fi = sum of the updates to coordinate i. • ℓp-Sampling: Return random I ∈ [n] and fI where: Pr[I = i] = |fi|^p / ‖f‖p^p

  10. Application: Social Networks • Each of people in a social network is friends with some arbitrary set of other people • Each person knows only about their friends • With no communication in the network, each person sends a postcard to Mark Z. • If Mark wants to know if the graph is connected, how long should the postcards be?

  11. Optimal Fk estimation • Yesterday: (1 ± ε)-approximate Fk = Σi fi^k • polylog space for k ≤ 2 • Õ(n^(1−2/k)) space for k > 2 • New algorithm: Let (I, f̂I) be an ℓ2-sample. Return T = F̃2 · f̂I^(k−2), where F̃2 is an estimate of F2 • Expectation: E[T] = Σi (fi²/F2) · F2 · fi^(k−2) = Fk

  12. Optimal Fk estimation • New algorithm: Let (I, f̂I) be an ℓ2-sample. Return T = F̃2 · f̂I^(k−2), where F̃2 is an estimate of F2 • Variance: Var[T] ≤ E[T²] = Σi (fi²/F2) · F2² · fi^(2k−4) = F2 · F_(2k−2) • Exercise: Show that F2 · F_(2k−2) ≤ n^(1−2/k) · Fk² • Overall: O(n^(1−2/k)/ε²) samples give a (1 ± ε)-approximation • Apply average + median to copies
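A quick simulation makes the unbiasedness claim concrete: the estimator T = F2 · f_I^(k−2), with I drawn from the ℓ2-distribution, averages out to Fk. The helper names are hypothetical, and for simplicity this uses an exact ℓ2-sampler and the exact F2 rather than streaming sketches.

```python
import random

def f_k(freqs, k):
    """Frequency moment F_k = sum_i f_i^k."""
    return sum(f ** k for f in freqs)

def l2_sample(freqs, rng):
    """Exact l2-sampling for illustration: return index i with
    probability f_i^2 / F_2 (a streaming sampler approximates this)."""
    F2 = f_k(freqs, 2)
    r = rng.random() * F2
    acc = 0.0
    for i, f in enumerate(freqs):
        acc += f * f
        if r <= acc:
            return i
    return len(freqs) - 1

def estimate_Fk(freqs, k, reps, seed=0):
    """Average reps copies of T = F2 * f_I^(k-2).
    E[T] = sum_i (f_i^2/F2) * F2 * f_i^(k-2) = F_k, so this is unbiased."""
    rng = random.Random(seed)
    F2 = f_k(freqs, 2)
    ests = [F2 * freqs[l2_sample(freqs, rng)] ** (k - 2) for _ in range(reps)]
    return sum(ests) / reps
```

For example, averaging a few thousand copies on a small frequency vector lands within a few percent of the true F3, matching the variance bound on the slide.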

  13. ℓ2-Sampling: Basic Overview • Assume ‖f‖2 = 1. Weight fi by 1/√ui: gi = fi/√ui where ui ∈ [0, 1] is uniformly random • For some value t, return (i, fi) if there is a unique i such that gi² ≥ t • Probability i is returned if t is large enough: Pr[fi²/ui ≥ t] = fi²/t ∝ fi² • Probability some value is returned is ≈ ‖f‖2²/t = 1/t ⇒ repeat O(t) times.

  14. ℓ2-Sampling: Part 1 • Use Count-Sketch with parameters to sketch the weighted vector g • To estimate gi and ‖g‖2 • Lemma: With high probability if • Corollary: With high probability if and • Exercise: for large

  15. Proof of Lemma • Let • By the analysis of Count Sketch and by Markov: • If then • If , then where the last inequality holds with constant probability • Take the median over repetitions ⇒ high probability
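The "average + median" amplification used in this and the preceding slides can be written generically. This is a sketch under illustrative group sizes; the Chebyshev-then-Chernoff argument is the standard one, not specific to these slides.

```python
import random
import statistics

def median_of_means(draw, groups, per_group):
    """Average per_group independent estimates (Chebyshev: variance drops
    by a factor per_group, so each group succeeds with constant probability),
    then take the median across groups (Chernoff: the failure probability
    drops exponentially in the number of groups)."""
    means = [sum(draw() for _ in range(per_group)) / per_group
             for _ in range(groups)]
    return statistics.median(means)
```

For an estimator with variance σ², choosing per_group = O(σ²/ε²) and groups = O(log(1/δ)) yields an additive-ε answer with probability 1 − δ, which is how a constant-probability guarantee becomes a high-probability one.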

  16. ℓ2-Sampling: Part 2 • Let gi′ = gi if gi² ≥ t and gi′ = 0 otherwise • If there is a unique i with gi′ ≠ 0 then return (i, fi) • Note that if gi′ ≠ 0 then gi² ≥ t and ui is known, so fi = gi · √ui can be recovered • Lemma: With probability there is a unique i such that gi′ ≠ 0. If so then • Thm: Repeat times. Space:

  17. Proof of Lemma • Let We can upper-bound Similarly, • Using independence of , probability of unique with :

  18. Proof of Lemma • Let We can upper-bound Similarly, • We just showed: • If there is a unique i, probability is:

  19. ℓ0-sampling • Maintain R, a constant-factor approximation to ℓ0 = |{i : fi ≠ 0}| • Hash items using 2-wise independent hj : [n] → [2^j] for j = 1, …, log n • For each level j, maintain: Dj = Σ_{i : hj(i)=0} fi and Sj = Σ_{i : hj(i)=0} i · fi • Lemma: At level j ≈ log2 ℓ0 there is a unique element in the stream that maps to 0 (with constant probability) • Uniqueness is verified if Sj/Dj is an integer in [n]. If so, then output Sj/Dj as the index and Dj as the count.
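One level of this sampler can be sketched as follows. This toy shows the S/D recovery trick only: the level hashing, and the fingerprint test that a robust sampler adds to guard against S/D being an integer by accident, are omitted.

```python
class L0SamplerLevel:
    """One level of a basic l0-sampler over updates (i, delta).
    Maintains D = sum of frequencies and S = sum of i * f_i for the items
    hashed to this level. If exactly one item with nonzero frequency
    survives, then S = i * f_i and D = f_i, so S/D recovers its index."""

    def __init__(self):
        self.D = 0  # sum of frequencies of items mapped to this level
        self.S = 0  # index-weighted sum of those frequencies

    def update(self, i, delta):
        self.D += delta
        self.S += i * delta

    def recover(self):
        # uniqueness check: S/D must be an integer index; a real sampler
        # also checks a fingerprint, since this test can false-positive
        if self.D != 0 and self.S % self.D == 0:
            return self.S // self.D, self.D  # (index, frequency)
        return None
```

Because both counters are linear in the updates, deletions cancel cleanly, which is what makes the sampler work on fully dynamic streams.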

  20. Proof of Lemma • Let and note that • For any • Probability there exists a unique such that , • Holds even if are only 2-wise independent

  21. Sparse Recovery • Goal: Find f′ such that ‖f − f′‖1 is minimized among f′s with at most k non-zero entries. • Definition: Err_k(f) = min over k-sparse f′ of ‖f − f′‖1 • Exercise: Err_k(f) = Σ_{i ∉ T} |fi| where T are the indices of the k largest |fi| • Using O((k/ε) log n) space we can find a k-sparse f′ such that ‖f − f′‖1 ≤ (1 + O(ε)) · Err_k(f)
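The exercise's optimal f′ (keep the k largest-magnitude entries, zero the rest) is straightforward to compute offline; the point of the next slides is doing it from a small sketch. These helpers are illustrative, not a streaming algorithm.

```python
def best_k_sparse(f, k):
    """Zero all but the k largest-magnitude entries of f; per the exercise,
    this minimizes ||f - f'||_1 over all k-sparse vectors f'."""
    keep = set(sorted(range(len(f)), key=lambda i: abs(f[i]),
                      reverse=True)[:k])
    return [f[i] if i in keep else 0 for i in range(len(f))]

def err_k(f, k):
    """Err_k(f): the l1 mass left outside the k largest-magnitude entries."""
    fk = best_k_sparse(f, k)
    return sum(abs(a - b) for a, b in zip(f, fk))
```

For example, with f = [5, −7, 2, 0, 3] and k = 2 the best 2-sparse approximation keeps the entries 5 and −7, and Err_2(f) = 2 + 0 + 3 = 5.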

  22. Count-Min Revisited • Use Count-Min with w = 4k • For i, let f̂i = c[r][hr(i)] for some row r • Let T be the k indices with max. frequencies. Let E be the event that there doesn’t exist i′ ∈ T, i′ ≠ i with hr(i′) = hr(i) • Then Pr[E] ≥ 1 − k/w = 3/4 • Because w.h.p. all f̂i’s approx. fi up to an additive error in terms of Err_k(f)

  23. Sparse Recovery Algorithm • Use Count-Min with w = O(k/ε) and d = O(log n) • Let f̂ be the frequency estimates: f̂i = minj c[j][hj(i)] • Let f′ be f̂ with all but the k largest entries replaced by 0. • Lemma: ‖f − f′‖1 ≤ (1 + O(ε)) · Err_k(f)

  25. Proof of Lemma • Let S be the indices corresponding to the k largest values of f̂, and T the indices of the k largest |fi|. • For a vector x and a set U of indices, denote by x_U the vector formed by zeroing out all entries of x except for those in U. • Then: ‖f − f′‖1 ≤ Err_k(f) + 2‖(f − f̂)_S‖1 + 2‖(f − f̂)_T‖1 ≤ (1 + O(ε)) · Err_k(f)

  25. Thank you! • Questions?
