1 / 18

Lecture 7: Dynamic sampling Dimension Reduction

Lecture 7: Dynamic sampling Dimension Reduction. Plan. Admin: PSet 2 released later today, due next Wed Alex office hours: Tue 2:30-4:30 Plan: Dynamic streaming graph algorithms S2 : Dimension Reduction & Sketching Scriber?. Sub-Problem: dynamic sampling.

hruska
Download Presentation

Lecture 7: Dynamic sampling Dimension Reduction

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 7:Dynamic samplingDimension Reduction

  2. Plan • Admin: • PSet 2 released later today, due next Wed • Alex office hours: Tue 2:30-4:30 • Plan: • Dynamic streaming graph algorithms • S2: Dimension Reduction & Sketching • Scriber?

  3. Sub-Problem: dynamic sampling • Stream: general updates to a vector • Goal: • Output with probability

  4. Dynamic Sampling • Goal: output with probability • Let • Intuition: • Suppose • How can we sample with ? • Each is a -heavy hitter • Use CountSketch recover all of them • space total • Suppose • Downsample: pick a random set s.t. • Focus on substream on only (ignore the rest) • What’s ? • In expectation = 10 • Use CountSketch on the downsampled stream … • In general: prepare for all levels

  5. Basic Sketch • Hash function • Let #tail zeros in • for and • Partition stream into substreams • Substreamfocuses on elements with • Sketch: for each , • Store : CountSketch for • Store : distinct count sketch for approx= • would be sufficient here! • Both for success probability

  6. Estimation AlgorithmDynSampleBasic: Initialize: hash function # tail zeros in CountSketch sketches , DistinctCount sketches , Process(int, real ): Let Add to and Estimator: Let j be s.t. If no such j, FAIL random heavy hitter from Return • Find a substream s.t. output • If no such stream, then FAIL • Recover all with (using ) • Pick any of them at random

  7. Analysis AlgorithmDynSampleBasic: Initialize: hash function # tail zeros in CountSketch sketches , DistinctCount sketches , Process(int, real ): Let Add to and Estimator: Let j be s.t. If no such j, FAIL random heavy hitter from Return • If • then for some • Suppose • Let be such that • Chebyshev: deviates from expectation by with probability at most • Ie., probability of FAIL is at most 0.7

  8. Analysis (cont) AlgorithmDynSampleBasic: Initialize: hash function # tail zeros in CountSketch sketches , DistinctCount sketches , Process(int, real ): Let Add to and Estimator: Let j be s.t. If no such j, FAIL random heavy hitter from Return • Let with • All heavy hitters = • will recover a heavy hitter, i.e., • By symmetry, once we output some , it is random over • Randomness? • We just used Chebyshev pairwise is OK !

  9. Dynamic Sampling: overall • DynSampleBasic guarantee: • FAIL: with probability • Otherwise, output a random • Modulo a negligible probability of failing • Reduce FAIL probability? • DynSample-Full: • Take independent DynSampleBasic • Will not FAIL in at least one with probability at least • Space: words for: • repetitions • substreams • for each

  10. Back to Dynamic Graphs • Graph with edges inserted/deleted • Define node-edge incidence vectors: • For node , we have vector: • where • For : if edge exists • For : if edge exists • Idea: • Use Dynamic-Sample-Full to sample an edge from each vertex • Collapse edges • How to iterate? • Property: • For a set of nodes • Consider: • Claim: has non-zero in coordinate iff edge crosses from to outside (i.e., ) • Sketch enough for: for any set , can sample an edge from !

  11. Dynamic Connectivity where for : if for : if • Sketching algorithm: • Dynamic-Sample-Full for each • Check connectivity: • Sample an edge from each node • Contract all sampled edges • partitioned the graph into a bunch of components (each is connected) • Iterate on the components • How many iterations? • – each time we reduce the number of components by a factor • Issue: iterations not independent! • Can use a fresh Dynamic-Sampling-Full for each of the iterations

  12. A little history • [Ahn-Guha-McGregor’12]: the above streaming algorithm • Overall space • [Kapron-King-Mountjoy’13]: • Data structure for maintaining graph connectivity under edge inserts/deletes • First algorithm with time for update/connectivity ! • Open since ‘80s

  13. Section 2: Dimension Reduction & Sketching

  14. Why? Application: Nearest Neighbor Search in high dimensions Preprocess: a set of points Query: given a query point , report a point with the smallest distance to

  15. Motivation 000000 011100 010100 000100 010100 011111 000000 001100 000100 000100 110100 111111 • Generic setup: • Points model objects (e.g. images) • Distance models (dis)similarity measure • Application areas: • machine learning: k-NN rule • speech/image/video/music recognition, vector quantization, bioinformatics, etc… • Distance can be: • Euclidean, Hamming

  16. Low-dimensional: easy • Compute Voronoi diagram • Given query , perform point location • Performance: • Space: • Query time:

  17. High-dimensional case All exact algorithms degrade rapidly with the dimension

  18. Dimension Reduction • Reduce high dimension?! • “flatten” dimension into dimension • Not possible in general: packing bound • But can if: for a fixedsubset of

More Related