This work explores testing the closeness, independence, and other properties of distributions such as shopping trends in different locations. Sample complexity and pass/fail criteria for various tests are discussed, including uniformity, identity, and support size. The study delves into the concept of clustering distributions based on their properties and poses open questions about additional properties and testing methods.
Testing Collections of Properties
Reut Levi, Dana Ron, Ronitt Rubinfeld
ICS 2011
Shopping distribution: What properties do your distributions have?
Testing closeness of two distributions: transactions in California vs. transactions in New York. Trend change?
Testing Independence: Shopping patterns: Independent of zip code?
One distribution: D • D is an arbitrary black-box distribution over [n] that generates i.i.d. samples. • Sample complexity in terms of n? (Can it be sublinear?) • Setup: samples → Test → Pass/Fail?
Some answers… (bounds stated up to polylogarithmic factors)
• Uniformity: Θ(n^{1/2}) [Goldreich, Ron 00] [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01] [Paninski 08]
• Identity: Θ(n^{1/2}) [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01]
• Closeness: Θ(n^{2/3}) [Batu, Fortnow, Rubinfeld, Smith, White], [Valiant 08]
• Independence: O(n_1^{2/3} n_2^{1/3}), Ω(n_1^{2/3} n_2^{1/3}) [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01], this work
• Entropy: n^{1/β^2 + o(1)} [Batu, Dasgupta, Kumar, Rubinfeld 05], [Valiant 08]
• Support size: Θ(n / log n) [Raskhodnikova, Ron, Shpilka, Smith 09], [Valiant, Valiant 10]
• Monotonicity on total order: Θ(n^{1/2}) [Batu, Kumar, Rubinfeld 04]
• Monotonicity on poset: n^{1-o(1)} [Bhattacharyya, Fischer, Rubinfeld, Valiant 10]
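To make the black-box setting concrete, here is a minimal sketch of a collision-based uniformity check, in the spirit of the collision-counting idea associated with the uniformity results cited above. It relies on two facts: the uniform distribution over [n] has collision probability exactly 1/n, and any distribution that is eps-far from uniform in L1 distance has collision probability at least (1 + eps^2)/n. The function names, the midpoint threshold, and the choice of sample size are illustrative assumptions, not the tester from any of the cited papers.

```python
import random
from collections import Counter


def collision_rate(samples):
    """Fraction of unordered sample pairs that collide.

    This is an unbiased estimate of sum_i p_i^2, the collision
    probability of the distribution the samples came from.
    """
    s = len(samples)
    counts = Counter(samples)
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return colliding_pairs / (s * (s - 1) / 2)


def looks_uniform(sampler, n, num_samples, eps):
    """Pass if the estimated collision probability is near 1/n.

    `sampler` is a zero-argument callable returning one i.i.d. sample.
    The threshold (1 + eps^2 / 2) / n is an assumed midpoint between
    the uniform value 1/n and the eps-far lower bound (1 + eps^2) / n.
    """
    samples = [sampler() for _ in range(num_samples)]
    return collision_rate(samples) <= (1 + eps ** 2 / 2) / n
```

How many samples are needed for the estimate to be reliable is exactly the sample-complexity question on the slide; the sketch leaves `num_samples` to the caller rather than hard-coding a bound.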
Collection of distributions: D_1, D_2, …, D_m • Two models: • Sampling model: get (i, x) for random i, x ∼ D_i (further refinement: known or unknown distribution on the i's?) • Query model: get (i, x) for query i and x ∼ D_i • Sample complexity in terms of n, m? • Setup: samples → Test → Pass/Fail?
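The difference between the two access models can be sketched as a small simulation. The class below is a hypothetical wrapper around explicitly given distributions, used only to illustrate what a tester is allowed to ask for; a real tester sees only the returned pairs, never the underlying probabilities. The names `Collection`, `query`, `sample`, and `index_weights` are assumptions made for this example.

```python
import random


class Collection:
    """Simulated collection of m distributions over range(n).

    `pmfs[i]` lists the probabilities of D_i. `index_weights` is the
    distribution on indices used by the sampling model (uniform if
    omitted).
    """

    def __init__(self, pmfs, index_weights=None, seed=None):
        self.pmfs = pmfs
        self.m = len(pmfs)
        self.index_weights = index_weights or [1.0] * self.m
        self.rng = random.Random(seed)

    def _draw_from(self, i):
        pmf = self.pmfs[i]
        return self.rng.choices(range(len(pmf)), weights=pmf)[0]

    def query(self, i):
        """Query model: the tester chooses i and receives (i, x), x ~ D_i."""
        return i, self._draw_from(i)

    def sample(self):
        """Sampling model: i is drawn at random, then x ~ D_i."""
        i = self.rng.choices(range(self.m), weights=self.index_weights)[0]
        return i, self._draw_from(i)
```

In the query model the tester can concentrate its samples on particular indices; in the sampling model it has to work with whichever indices happen to come up.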
Properties considered: • Equivalence: all distributions are equal • "Clusterability": the distributions can be clustered into k clusters such that, within each cluster, all distributions are close
Equivalence vs. independence • Process of drawing pairs: draw i from [m], then x ∼ D_i; output (i, x) • Easy fact: i and x are independent iff the D_i's are all equal
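The easy fact can be checked exactly on small examples. The sketch below builds the joint distribution of (i, x) from explicit distributions and tests whether it equals the product of its marginals. It assumes every index is drawn with positive probability, and uses exact rational arithmetic to avoid floating-point comparisons. All function names are illustrative.

```python
from fractions import Fraction


def joint(pmfs, index_weights):
    """Joint distribution J[i][x] = Pr[index = i] * D_i(x)."""
    return [[w * p for p in pmf] for w, pmf in zip(index_weights, pmfs)]


def is_independent(J):
    """True iff J[i][x] equals the product of its marginals everywhere."""
    row = [sum(r) for r in J]        # marginal over indices i
    col = [sum(c) for c in zip(*J)]  # marginal over elements x
    return all(
        J[i][x] == row[i] * col[x]
        for i in range(len(row))
        for x in range(len(col))
    )


def all_equal(pmfs):
    """Equivalence property: every D_i equals D_0."""
    return all(p == pmfs[0] for p in pmfs)


half = Fraction(1, 2)
weights = [half, half]
same = [[half, half], [half, half]]
different = [[half, half], [Fraction(1), Fraction(0)]]

# On both examples, independence holds exactly when the D_i's are equal.
assert is_independent(joint(same, weights)) == all_equal(same)
assert is_independent(joint(different, weights)) == all_equal(different)
```

This is why an equivalence tester in the sampling model can double as an independence tester for the pair (i, x), and vice versa.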
Results • Def: (D_1, …, D_m) has the Equivalence property if D_i = D_{i'} for all 1 ≤ i, i' ≤ m. • Also yields a "tight" lower bound for independence testing
Clusterability • Can we cluster the distributions s.t. within each cluster the distributions are (very) close? • Sample complexity of the test is O(k n^{2/3}), for n = domain size, k = number of clusters • No dependence on the number of distributions • Closeness requirement is very stringent
Open Questions • Clusterability in the sampling model, less stringent notion of close • Other properties of collections? • E.g., all distributions are shifts of each other?