1 / 15

Lecture 17: Sublinear-time algorithms

This lecture discusses sublinear-time algorithms and their applications, including embeddings, distribution testing, and approximation. It covers the concept of uniformity testing and introduces the UNIFORM algorithm.

ollis
Download Presentation

Lecture 17: Sublinear-time algorithms

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Lecture 17:Sublinear-time algorithms COMS E6998-9F15

  2. Administrivia, Plan • Admin: • My office hours after class (CSB517) • Plan: • Finalize embeddings • Sublinear-time algorithms • Projects • Scriber?

  3. Embeddings of various metrics into edit( banana , ananas ) = 2 edit(1234567, 7123456) = 2

  4. Non-embeddabilityinto Distortion implies sketch (decision version) with approximation and size! (implies NNS) OPEN to get better for pretty much all these distances!

  5. Sublinear-time algorithms

  6. Setup • Can we get away with not even looking at all data? • Just use a sample… • Where do we get samples? • stored on disk, passing through a router, etc • Data comes as a sample • Observation of a “natural” phenomenon 2 5 7 5 5

  7. Two types of algorithms • Classic: • Output an answer, approximately • E.g.: number of triangles in a graph! • Property testing: • Does object have property or not • E.g.: does graph have a triangle or not • Distribution testing: =distribution • Need a new notion of approximation

  8. Distribution Testing • Problem: • given samples , from • do they come from a uniform distribution? • Hard to solve precisely: • Uniform except 6 has probability higher than normal • Do we care about ? • Use approximation… 1 2 3 4 5 6 7 8

  9. Approximation: total variation • Goal: distinguish between • exactly uniform • sufficiently non-uniform: • -far: • Why distance? • Equivalent to Total Variation distance: • How to distinguish distributions with 1 sample? • a test: is a set • Check whether a sample • Distinguishing probability: • We want the best such test: • Claim: • means: • sampling up to ~ times nearly-equivalent to sampling from a uniform distribution

  10. Algorithm attempt • How shall we test uniformity? • Estimate distribution empirically, • Compute … • How many samples do we need? • At least : if half the coordinates are zero, far from uniform! • test: also samples • Can we do better? • Theorem: can test uniformity with samples

  11. Algorithm for Uniformity Algorithm UNIFORM: Input: = 0; for(i=0; i<m; i++) for(j=i+1; j<m; j++) if () ++; if () return “Uniform”; else return “Not uniform”; // : constant dependent on • Counts the number of collisions • Intuition: • If not uniform, more likely to have more collisions 1 2 3 4 5 6 7 8

  12. Algorithm intuition Algorithm UNIFORM: Input: = 0; for(i=0; i<m; i++) for(j=i+1; j<m; j++) if () ++; if () return “Uniform”; else return “Not uniform”; // : constant dependent on • Uses samples • as long as all distinct, no way to tell apart • first collisions appear at - the birthday paradox! 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8

  13. Analysis Algorithm UNIFORM: Input: = 0; for(i=0; i<m; i++) for(j=i+1; j<m; j++) if () ++; if () return “Uniform”; else return “Not uniform”; // : constant dependent on • Consider distance! • If • If • Claim: • Hence, enough to distinguish: • (unif) • (non-unif) • Compute up to additive ?

  14. Analysis Algorithm UNIFORM: Input: = 0; for(i=0; i<m; i++) for(j=i+1; j<m; j++) if () ++; if () return “Uniform”; else return “Not uniform”; // : constant dependent on • New goal: distinguish • Lemma: [# collisions] is a good enough as long as • where

  15. Projects

More Related