Lecture 17: Sublinear-time algorithms

Lecture 17:Sublinear-time algorithms COMS E6998-9F15

Administrivia, Plan • Admin: • My office hours after class (CSB517) • Plan: • Finalize embeddings • Sublinear-time algorithms • Projects • Scriber?

Embeddings of various metrics into edit( banana , ananas ) = 2 edit(1234567, 7123456) = 2

Non-embeddabilityinto Distortion implies sketch (decision version) with approximation and size! (implies NNS) OPEN to get better for pretty much all these distances!

Sublinear-time algorithms

Setup • Can we get away with not even looking at all data? • Just use a sample… • Where do we get samples? • stored on disk, passing through a router, etc • Data comes as a sample • Observation of a “natural” phenomenon 2 5 7 5 5

Two types of algorithms • Classic: • Output an answer, approximately • E.g.: number of triangles in a graph! • Property testing: • Does object have property or not • E.g.: does graph have a triangle or not • Distribution testing: =distribution • Need a new notion of approximation

Distribution Testing • Problem: • given samples , from • do they come from a uniform distribution? • Hard to solve precisely: • Uniform except 6 has probability higher than normal • Do we care about ? • Use approximation… 1 2 3 4 5 6 7 8

Approximation: total variation • Goal: distinguish between • exactly uniform • sufficiently non-uniform: • -far: • Why distance? • Equivalent to Total Variation distance: • How to distinguish distributions with 1 sample? • a test: is a set • Check whether a sample • Distinguishing probability: • We want the best such test: • Claim: • means: • sampling up to ~ times nearly-equivalent to sampling from a uniform distribution

Algorithm attempt • How shall we test uniformity? • Estimate distribution empirically, • Compute … • How many samples do we need? • At least : if half the coordinates are zero, far from uniform! • test: also samples • Can we do better? • Theorem: can test uniformity with samples

Algorithm for Uniformity Algorithm UNIFORM: Input: = 0; for(i=0; i<m; i++) for(j=i+1; j<m; j++) if () ++; if () return “Uniform”; else return “Not uniform”; // : constant dependent on • Counts the number of collisions • Intuition: • If not uniform, more likely to have more collisions 1 2 3 4 5 6 7 8

Algorithm intuition Algorithm UNIFORM: Input: = 0; for(i=0; i<m; i++) for(j=i+1; j<m; j++) if () ++; if () return “Uniform”; else return “Not uniform”; // : constant dependent on • Uses samples • as long as all distinct, no way to tell apart • first collisions appear at - the birthday paradox! 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8

Analysis Algorithm UNIFORM: Input: = 0; for(i=0; i<m; i++) for(j=i+1; j<m; j++) if () ++; if () return “Uniform”; else return “Not uniform”; // : constant dependent on • Consider distance! • If • If • Claim: • Hence, enough to distinguish: • (unif) • (non-unif) • Compute up to additive ?

Analysis Algorithm UNIFORM: Input: = 0; for(i=0; i<m; i++) for(j=i+1; j<m; j++) if () ++; if () return “Uniform”; else return “Not uniform”; // : constant dependent on • New goal: distinguish • Lemma: [# collisions] is a good enough as long as • where

Projects

Lecture 17: Sublinear-time algorithms

Lecture 17: Sublinear-time algorithms

Presentation Transcript

“This is a Test. This is Only a Test!”

Software Testing

3D Test Issues

Test and Test Equipment December 2012 Hsin -Chu , Taiwan

Who wants to be a Millionaire?

Test Preparation, Test Taking Strategies, and Test Anxiety

Test Automation Tools: QF-Test and Selenium

System Test Specification

TDC ( Test Description Code)

Engine Condition Diagnosis

Chi-square test or c 2 test

200

Test del Software, con elementi di Verifica e Validazione, Qualità del Prodotto Software

Test of Significance

System Test Tools

Lesson 7