
Testing Collections of Properties

This talk studies property testing for collections of distributions, motivated by examples such as shopping trends in different locations. It reviews known sample-complexity bounds for tests on a single distribution (uniformity, identity, closeness, independence, entropy, support size, monotonicity), then considers two properties of a collection: equivalence (all distributions are equal) and clusterability (the distributions can be grouped into k clusters whose members are close to one another). It closes with open questions about further properties and testing models.





Presentation Transcript


  1. Testing Collections of Properties. Reut Levi, Dana Ron, Ronitt Rubinfeld. ICS 2011

  2. Shopping distribution: what properties do your distributions have?

  3. Testing closeness of two distributions: transactions in California vs. transactions in New York. Trend change?

  4. Testing independence: are shopping patterns independent of zip code?

  5. This work: Many distributions

  6. One distribution: D • D is an arbitrary black-box distribution over [n] that generates i.i.d. samples. • Sample complexity in terms of n? (Can it be sublinear?) • Diagram: D → samples → Test → Pass/Fail?

  7. Some answers… • Uniformity: $\tilde{\Theta}(n^{1/2})$ [Goldreich, Ron 00] [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01] [Paninski 08] • Identity: $\tilde{\Theta}(n^{1/2})$ [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01] • Closeness: $\tilde{\Theta}(n^{2/3})$ [Batu, Fortnow, Rubinfeld, Smith, White], [Valiant 08] • Independence: $O(n_1^{2/3} n_2^{1/3})$, $\Omega(n_1^{2/3} n_2^{1/3})$ [Batu, Fortnow, Fischer, Kumar, Rubinfeld, White 01], this work • Entropy: $n^{1/\beta^2 + o(1)}$ [Batu, Dasgupta, Kumar, Rubinfeld 05], [Valiant 08] • Support size: $\tilde{\Theta}(n/\log n)$ [Raskhodnikova, Ron, Shpilka, Smith 09], [Valiant, Valiant 10] • Monotonicity on total order: $\tilde{\Theta}(n^{1/2})$ [Batu, Kumar, Rubinfeld 04] • Monotonicity on poset: $n^{1-o(1)}$ [Bhattacharyya, Fischer, Rubinfeld, Valiant 10]
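To make the sublinear uniformity bound concrete, here is a minimal sketch of a collision-based tester, the standard idea behind the $n^{1/2}$ result. It relies on the fact that the collision probability of a distribution over a domain of size n is at least 1/n, with equality only for the uniform distribution. The function names, the constant `c`, and the exact dependence on `eps` are assumptions made for illustration and are not taken from the slides.

```python
import random
from collections import Counter


def collision_rate(samples):
    """Fraction of unordered sample pairs that hold the same value."""
    s = len(samples)
    counts = Counter(samples)
    colliding_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    return colliding_pairs / (s * (s - 1) / 2)


def looks_uniform(draw, n, eps, c=16, rng=None):
    """Hypothetical collision-based uniformity tester.

    draw(rng) must return one sample from a black-box distribution
    over range(n). The sample count c * sqrt(n) / eps**2 uses an
    assumed constant c; it is a sketch, not a tuned bound.
    """
    if rng is None:
        rng = random.Random()
    s = max(2, int(c * n ** 0.5 / eps ** 2))
    samples = [draw(rng) for _ in range(s)]
    # Uniform has collision probability exactly 1/n; a distribution that is
    # eps-far from uniform in L1 distance has at least (1 + eps**2) / n.
    threshold = (1 + eps ** 2 / 2) / n
    return collision_rate(samples) <= threshold
```

For example, `looks_uniform(lambda r: r.randrange(100), 100, 0.5)` draws 640 samples from a domain of size 100 and should accept with high probability, while a distribution supported on only half of the domain should be rejected.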

  8. Collection of distributions: $D_1, D_2, \ldots, D_m$ • Two models: • Sampling model: get $(i, x)$ for random $i$, $x \sim D_i$ (further refinement: known or unknown distribution on the $i$'s?) • Query model: get $(i, x)$ for query $i$ and $x \sim D_i$ • Sample complexity in terms of $n$, $m$? • Diagram: samples → Test → Pass/Fail?
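The difference between the two access models can be sketched as two methods on one object. This is an illustrative simulation only: the class name `Collection`, its constructor arguments, and the default of uniform weights over the indices are assumptions, and a real tester would see only the returned pairs, never the explicit probability vectors.

```python
import random


class Collection:
    """Hypothetical wrapper around m explicit distributions over range(n)."""

    def __init__(self, dists, weights=None, seed=None):
        self.dists = dists  # dists[i][x] is the probability D_i(x)
        m = len(dists)
        # Distribution over the indices i; uniform unless stated otherwise.
        self.weights = weights if weights is not None else [1.0 / m] * m
        self.rng = random.Random(seed)

    def _draw_from(self, i):
        n = len(self.dists[i])
        return self.rng.choices(range(n), weights=self.dists[i])[0]

    def sample(self):
        """Sampling model: the index i is random, then x is drawn from D_i."""
        m = len(self.dists)
        i = self.rng.choices(range(m), weights=self.weights)[0]
        return i, self._draw_from(i)

    def query(self, i):
        """Query model: the tester chooses i, then x is drawn from D_i."""
        return i, self._draw_from(i)
```

In the query model the tester can concentrate its samples on any distribution it likes; in the sampling model it has to take whichever index the environment hands it.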

  9. Properties considered: • Equivalence: all distributions are equal • "Clusterability": distributions can be clustered into k clusters such that within a cluster, all distributions are close

  10. Equivalence vs. independence • Process of drawing pairs: • Draw $i$ from $[m]$ and $x \sim D_i$, then output $(i, x)$ • Easy fact: $i$ and $x$ are independent iff the $D_i$'s are all equal
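The easy fact follows from a short calculation. As an assumption not stated on the slide, write $w_i > 0$ for the probability that index $i$ is drawn.

```latex
\Pr[(i,x)] = w_i \, D_i(x), \qquad
\Pr[x] = \sum_{j=1}^{m} w_j \, D_j(x).

% Independence of i and x means \Pr[(i,x)] = \Pr[i]\,\Pr[x] for all i and x:
w_i \, D_i(x) = w_i \sum_{j=1}^{m} w_j \, D_j(x)
\iff
D_i(x) = \sum_{j=1}^{m} w_j \, D_j(x) \quad \text{for all } i, x.
```

The right-hand side does not depend on $i$, so independence forces every $D_i$ to equal the same mixture, and hence all $D_i$ are equal. Conversely, if all $D_i$ equal some $D$, the mixture is also $D$ and the joint distribution factorizes.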

  11. Results • Def: $(D_1, \ldots, D_m)$ has the Equivalence property if $D_i = D_{i'}$ for all $1 \le i, i' \le m$. • Also yields a "tight" lower bound for independence testing

  12. Clusterability • Can we cluster distributions s.t. in each cluster, distributions are (very) close? • Sample complexity of test is $O(k n^{2/3})$ for n = domain size, k = number of clusters • No dependence on number of distributions • Closeness requirement is very stringent
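To pin down what the property asks for, the sketch below checks clusterability directly on explicitly given probability vectors. It is not the sample-based tester from the talk: it reads the full distributions, tries every assignment of the m distributions to k clusters (exponential time), and assumes L1 distance with a hypothetical tolerance parameter `tol` as the notion of "close".

```python
from itertools import product


def l1(p, q):
    """L1 distance between two probability vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(p, q))


def is_clusterable(dists, k, tol):
    """Brute-force check of the definition on explicit distributions.

    Returns True if the distributions can be split into at most k clusters
    such that any two members of the same cluster are within L1 distance tol.
    """
    m = len(dists)
    for labels in product(range(k), repeat=m):
        # An assignment is valid if every same-cluster pair is tol-close.
        if all(labels[a] != labels[b] or l1(dists[a], dists[b]) <= tol
               for a in range(m) for b in range(a + 1, m)):
            return True
    return False
```

With `tol = 0` and `k = 1` this reduces to the Equivalence property: all distributions must be identical.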

  13. Open Questions • Clusterability in the sampling model, less stringent notion of close • Other properties of collections? • E.g., all distributions are shifts of each other?

  14. Thank you
