Learning and testing k-modal distributions. Rocco A. Servedio, Columbia University. Joint work (in progress) with Costis Daskalakis (MIT) and Ilias Diakonikolas (UC Berkeley).
What this talk is about. Probability distributions over [N] = {1,2,…,N}. Monotone increasing distribution: p(i) ≤ p(i+1) for all i < N. (Whole talk: “increasing” means “non-decreasing”.)
k-modal distributions. k-modal: k “peaks and valleys”. Monotone distribution: 0-modal. (Figures: a unimodal distribution, another unimodal distribution, and a 3-modal distribution.)
The learning problem. Target distribution p is an unknown k-modal distribution over [N]. Algorithm gets samples from p. Goal: output a hypothesis h that is ε-close to p in total variation distance. Want an algorithm that uses few samples & is computationally efficient.
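Total variation distance is half the ℓ1 distance between the two probability vectors; here is a minimal illustration (my own, not from the slides) in Python:

```python
import numpy as np

def total_variation(p, q):
    """d_TV(p, q) = (1/2) * sum_i |p(i) - q(i)| for two pmfs over [N]."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()

# A hypothesis h is "ε-close" to p exactly when total_variation(p, h) <= ε.
```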
The testing problem. q is a known k-modal distribution over [N]. p is an unknown k-modal distribution over [N]. Algorithm gets samples from p. Goal: output “yes” w.h.p. if p = q, and “no” w.h.p. if dTV(p, q) ≥ ε.
Please note: the testing problem is not “given samples from an unknown distribution p, determine whether p is k-modal versus ε-far from every k-modal distribution.” That problem requires a number of samples growing polynomially with N, even for k = 0: it is hard to distinguish the uniform distribution over [N] from a distribution that is uniform over a random subset of [N].
Why study these questions? • k-modal distributions seem natural • would be nice if k-modal structure were exploitable by efficient learning / testing algorithms • post hoc justification: solutions exhibit interesting connections between testing and learning
The general case: learning. If we drop the k-modal assumption, the learning problem becomes: learn an arbitrary distribution over [N] to total variation distance ε. Θ(N/ε²) samples are necessary and sufficient.
The general case: testing. If we drop the k-modal assumption, the testing problem becomes: q is a known, arbitrary distribution over [N]; p is an unknown, arbitrary distribution over [N]; the algorithm gets samples from p. Goal: output “yes” if p = q, “no” if dTV(p, q) ≥ ε. Roughly √N · poly(1/ε) samples are necessary and sufficient [GR00, BFFKRW02, P08].
This work: main learning result. We give an algorithm that learns any k-modal distribution over [N] to accuracy ε. It uses roughly k·log(N/k)·poly(1/ε) samples and runs in time polynomial in the number of samples. Close to optimal: there is an Ω(k·log(N/k)/ε³)-sample lower bound for any algorithm.
Main testing result. We give an algorithm that solves the k-modal testing problem over [N] to accuracy ε. It uses roughly √(k·log N)·poly(1/ε) samples and runs in time polynomial in the number of samples; we also prove a lower bound on the number of samples any testing algorithm must use. Testing is easier than learning!
Prior work. k = 0, 1: [BKR04] gave a poly(log N, 1/ε)-sample efficient algorithm for the testing problem (with p, q both available only via sample access). k = 0, 1: [Birge87, Birge87a] gave an O(log N / ε³)-sample efficient algorithm for learning, and a matching lower bound. We’ll use this algorithm as a black box in our results.
Outline of rest of talk • Background: some tools • Learning k-modal distributions • Testing k-modal distributions
First tool: learning monotone distributions. Theorem [B87]: there is an efficient algorithm that learns any monotone decreasing distribution over [N] to accuracy ε. It uses O(log N / ε³) samples and runs in time linear in its input size. [B87b] also gave a matching Ω(log N / ε³) lower bound for learning a monotone distribution.
Second tool: learning a CDF – the Dvoretzky–Kiefer–Wolfowitz inequality. Theorem [DKW56]: let p be any distribution over [N] with CDF F, and let F̂ be the empirical estimate of F obtained from m samples. Then Pr[ sup_x |F̂(x) − F(x)| > ε ] ≤ 2e^(−2mε²); in particular, O(log(1/δ)/ε²) samples give accuracy ε with probability 1 − δ. (Figure: true CDF vs. empirical CDF.) Note: a similar sample bound, up to log factors, follows from an easy Chernoff-bound argument. Morally, this means you can partition [N] into intervals each of mass ≈ ε under p, using O(1/ε²) samples.
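A small illustration (my own, not from the slides) of the “morally” statement: use the empirical CDF to chop [N] into consecutive intervals of empirical mass roughly ε each. The function names are illustrative.

```python
import numpy as np

def empirical_cdf(samples, N):
    """Empirical CDF over {1, ..., N} from i.i.d. samples (integer values in 1..N)."""
    counts = np.bincount(samples, minlength=N + 1)[1:]
    return np.cumsum(counts) / len(samples)

def mass_eps_partition(samples, N, eps):
    """Chop [N] into consecutive intervals of empirical mass about eps each.
    By DKW, O(1/eps^2) samples make these empirical masses accurate to +-eps."""
    cdf = empirical_cdf(samples, N)
    cuts, last = [0], 0.0
    for i in range(1, N + 1):
        if cdf[i - 1] - last >= eps:
            cuts.append(i)
            last = cdf[i - 1]
    if cuts[-1] != N:
        cuts.append(N)
    return list(zip(cuts[:-1], cuts[1:]))   # intervals (a, b], i.e. points a+1..b
```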
The problem: learn an unknown k-modal distribution over [N].
What should we shoot for? Easy lower bound: Ω(k·log(N/k)/ε³) samples are needed (any algorithm must implicitly solve k monotone-distribution-learning problems, each over a domain of size about N/k, to accuracy ε). Want an algorithm that uses roughly this many samples and takes time polynomial in the number of samples.
The problem, again. Goal: learn an unknown k-modal distribution over [N]. We know how to efficiently learn an unknown monotone distribution… This would be easy if we knew the locations of the k peaks/valleys… Guessing them exactly: infeasible. Guessing them approximately: not too great either.
A first approach. Break up [N] into many intervals. Since p is k-modal, it is non-monotone on at most k of the intervals, so running the monotone distribution learner on each interval will usually give a good answer.
First approach in more detail • Use [DKW] to divide [N] into intervals I_1, …, I_t and obtain estimates of their masses p(I_j) • (Assumes each single point has small mass; heavier points are easy to detect and deal with.) • Run the monotone distribution learner on each interval I_j to get a hypothesis h_j • (Actually run it twice: once for increasing, once for decreasing. Do hypothesis testing to pick one as h_j.) • Combine the hypotheses in the obvious way: h puts the estimated mass of I_j on interval I_j, distributed within I_j according to h_j.
Sketch of analysis • Use [DKW] to divide [N] into intervals & obtain mass estimates: takes O(1/ε²)-type sample complexity • Run the monotone distribution learner on each interval: takes (number of intervals) × (monotone-learner sample complexity) samples • Combine hypotheses in the obvious way • Total error: error from the ≤ k non-monotone intervals, plus error from the scaling factors, plus error from estimating the interval masses by their empirical values
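A schematic sketch (my own; `learn_monotone` is an assumed helper standing in for the [B87] learner run twice plus hypothesis testing, and `mass_eps_partition` comes from the DKW sketch above) of how the first approach stitches the per-interval hypotheses together:

```python
import numpy as np

def learn_k_modal_first_approach(samples, N, eps, learn_monotone):
    """First approach: DKW-based partition + per-interval monotone learning.
    `learn_monotone(sub_samples, interval)` should return a pmf supported on
    the interval that sums to 1 (increasing or decreasing, whichever fits)."""
    intervals = mass_eps_partition(samples, N, eps)
    m = len(samples)
    h = np.zeros(N)
    for (a, b) in intervals:                      # interval (a, b], i.e. points a+1..b
        sub = [x for x in samples if a < x <= b]
        weight = len(sub) / m                     # estimated mass of the interval
        local = np.asarray(learn_monotone(sub, (a, b)), dtype=float)
        h[a:b] = weight * local                   # scale the local hypothesis
    return h / h.sum()                            # final hypothesis pmf over [N]
```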
Improving the approach. The extra cost came from running the monotone distribution learner on every interval of the partition, many more than k of them, rather than only about k times. If we could somehow check – more cheaply than learning – whether an interval is monotone before running the learner, we could run the learner fewer times and save… this is a property testing problem! The more sophisticated algorithm has two new ingredients.
First ingredient: testing k-modal distributions for monotonicity. Consider the following property testing problem: the algorithm gets samples from an unknown k-modal distribution p over [N]. Goal: output “yes” w.h.p. if p is monotone increasing, “no” w.h.p. if p is ε-far from monotone increasing. Note: the k-modal promise on p might save us from the lower bound mentioned earlier (the hard-to-distinguish pair of distributions from the “Please note” slide).
Efficiently testing k-modal distributions for monotonicity. Algorithm gets samples from an unknown k-modal distribution p over [N]. Goal: output “yes” w.h.p. if p is monotone increasing, “no” w.h.p. if p is ε-far from monotone increasing. Theorem: there is a tester for this problem whose sample complexity depends only on k and ε, not on N. We’ll use this to identify sub-intervals of [N] where p is close to monotone… but then, can we efficiently learn close-to-monotone distributions?
Second ingredient: agnostically learning monotone distributions. Consider the following “agnostic learning” problem: the algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone. Goal: output a hypothesis distribution h such that dTV(p, h) ≤ O(opt) + ε. If opt = 0, this is the original “learn a monotone distribution” problem. Want to handle the general case as efficiently as the opt = 0 case.
Semi-agnostically learning monotone distributions. Algorithm gets samples from an unknown distribution p over [N] that is opt-close to monotone. Goal: output a hypothesis distribution h such that dTV(p, h) ≤ O(opt) + ε. Theorem: there is a computationally efficient learning algorithm for this semi-agnostic problem that uses O(log N / ε³) samples. The [Birge87] monotone distribution learner does the job. We will only need the case opt = O(ε), so a guarantee of O(opt) + ε versus opt + ε doesn’t matter.
The learning algorithm: first phase • Use [DKW] to divide [N] into intervals I_1, I_2, … and obtain estimates of their masses • Run the monotonicity testers (increasing and decreasing) on I_1, then on I_1 ∪ I_2, etc., until the first time both say “no”, at some union I_1 ∪ … ∪ I_j. Mark I_j and continue from I_{j+1}. This uses O(number of intervals) invocations of the tester in total. (Alternative: use binary search, giving roughly k · log(number of intervals) invocations of the tester in total.)
The algorithm • Run the testers on growing unions of intervals, until the first time both say “no”; mark the last interval added and continue. • Each time an interval is marked: the block of unmarked intervals right before it is close-to-monotone (call this block a superinterval), and (at least) one of the k peaks/valleys of p is “used up”.
The learning algorithm: second phase • After the first phase, [N] is partitioned into: at most k+1 superintervals, each close to monotone (increasing or decreasing), and at most k “marked” intervals, each of small mass. • Rest of algorithm: run the semi-agnostic monotone distribution learner on each superinterval to get an accurate hypothesis for p restricted to that superinterval, then output the final hypothesis obtained by combining these pieces, weighted by their estimated masses.
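A phase-1 sketch (my own; `test_monotone` is a hypothetical stand-in for the k-modal monotonicity tester, and `mass_eps_partition` is from the DKW sketch above). Phase 2 then runs the semi-agnostic monotone learner on each superinterval and stitches the results together exactly as in the first-approach sketch.

```python
def partition_into_superintervals(samples, N, eps, test_monotone):
    """Greedily grow a union of DKW intervals while some monotonicity tester
    (increasing or decreasing) still accepts; when both reject, close off a
    superinterval and mark the offending small-mass interval."""
    intervals = mass_eps_partition(samples, N, eps)
    superintervals, marked, start = [], [], 0
    for j, (_, hi) in enumerate(intervals):
        lo = intervals[start][0]
        union = [x for x in samples if lo < x <= hi]
        inc = test_monotone(union, (lo, hi), "increasing")
        dec = test_monotone(union, (lo, hi), "decreasing")
        if not (inc or dec):                       # both testers say "no"
            if j > start:                          # close-to-monotone block before the culprit
                superintervals.append((lo, intervals[j - 1][1]))
            marked.append(intervals[j])            # marked interval: one peak/valley "used up"
            start = j + 1
    if start < len(intervals):                     # trailing close-to-monotone block
        superintervals.append((intervals[start][0], intervals[-1][1]))
    return superintervals, marked
```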
Analysis of the algorithm • Sample complexity: the runs of the tester each use a number of samples depending only on k and ε, and the at most k+1 runs of the semi-agnostic monotone learner each use O(log N / ε³)-type samples. • Error rate: error from the at most k marked intervals, plus total error from estimating the interval masses by their empirical values, plus total error from the scaling factors.
I owe you a tester. Algorithm gets samples from an unknown k-modal distribution p over [N]. Goal: output “yes” w.h.p. if p is monotone increasing, “no” w.h.p. if p is ε-far from monotone increasing. Theorem: there is a tester for this problem whose sample complexity depends only on k and ε.
The testing algorithm • Algorithm: run [DKW] with accuracy set as a suitable function of ε and k; let p̂ be the resulting empirical PDF. If there exist disjoint intervals [a, b] and [a′, b′] with b < a′ such that the average value of p̂ over [a, b] exceeds the average value of p̂ over [a′, b′] by more than the error margin, then output “no”; otherwise output “yes”. • Completeness: if p is monotone increasing, the test passes w.h.p.
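A rough sketch (my own; the margin `tau` and the coarse interval grid are placeholders, not the paper's exact choices) of the tester's comparison of interval averages:

```python
import numpy as np

def test_monotone_increasing(samples, N, eps, k):
    """Estimate the pdf to DKW-type accuracy, then look for an earlier interval
    whose average empirical value exceeds that of a later, disjoint interval by
    more than the margin; such a pair witnesses non-monotonicity."""
    tau = eps / (10 * (k + 1))                       # placeholder margin
    counts = np.bincount(samples, minlength=N + 1)[1:]
    p_hat = counts / len(samples)
    prefix = np.concatenate([[0.0], np.cumsum(p_hat)])

    def avg(a, b):                                   # average of p_hat on {a, ..., b}
        return (prefix[b] - prefix[a - 1]) / (b - a + 1)

    # Coarse grid of breakpoints; a real implementation would tie the intervals
    # to the empirical CDF and scale the margin by the DKW error / interval length.
    grid = sorted(set(np.linspace(1, N, num=min(N, 40), dtype=int)))
    for i in range(len(grid) - 1):
        for j in range(i + 1, len(grid) - 1):
            a, b = grid[i], grid[i + 1]              # earlier interval [a, b]
            a2, b2 = grid[j] + 1, grid[j + 1]        # later, disjoint interval [a2, b2]
            if avg(a, b) > avg(a2, b2) + tau:
                return False                         # "no": evidence against monotone increasing
    return True                                      # "yes"
```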
Soundness • Algorithm (recap): run [DKW] with suitable accuracy, let p̂ be the resulting empirical PDF, and output “no” exactly when some earlier interval has a noticeably larger average p̂-value than some later, disjoint interval. • Soundness lemma: if p is k-modal and no such violating pair of intervals exists, then p is ε-close to monotone increasing. • To prove the soundness lemma: show that under the lemma’s hypothesis, we can “correct” each peak/valley of p by “spending” at most about ε/k in variation distance.
Correcting a peak of p • Lemma: if p is k-modal and no violating pair of intervals exists, then p is ε-close to monotone increasing. • Consider a peak of p. Draw a horizontal line at a height such that (mass of the “hill” above the line) = (missing mass of the “valley” below the line). Correct the peak by bulldozing the hill into the valley.
Why it works • Lemma (restated): if p is k-modal and no violating pair of intervals exists, then p is ε-close to monotone increasing. • The correction moves exactly the mass of the hill, and the no-violation condition bounds that mass by about ε/k; so each of the at most k corrections costs at most about ε/k in variation distance, and the total cost is at most ε.
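A minimal numeric sketch (my own) of the correction step: on the segment being corrected, the mass-balancing line is simply the segment's average value, and the variation distance spent equals the hill mass moved.

```python
import numpy as np

def bulldoze_peak(p, lo, hi):
    """Flatten p on positions lo..hi-1 (0-indexed) to the level t at which
    (hill mass above t) = (missing valley mass below t); that level is the
    segment average. Returns the corrected pmf and the mass moved, which is
    exactly the variation distance spent on this peak."""
    p = np.asarray(p, dtype=float)
    seg = p[lo:hi]
    t = seg.mean()                                    # mass-conserving water level
    hill_mass = np.clip(seg - t, 0, None).sum()       # mass bulldozed into the valley
    corrected = p.copy()
    corrected[lo:hi] = t
    return corrected, hill_mass                       # d_TV(p, corrected) = hill_mass
```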
Summary • Sample- and time-efficient algorithms for learning and testing k-modal distributions over [N]. • Upper bounds pretty close to lower bounds for these problems. • Testing is easier than learning. • Learning algorithms have a testing component.
Future work • More efficient algorithms for restricted classes of k-modal distributions? • [DDS11]: any sum of n independent Bernoulli random variables (a “Poisson Binomial Distribution”, a special type of unimodal distribution) is learnable using a number of samples independent of n.
Key ingredient: oblivious decomposition. Decompose [N] into roughly log(N)/ε consecutive intervals whose widths increase as powers of (1+ε). Call these the oblivious buckets.
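A minimal sketch (my own) of the decomposition; bucket j has width roughly (1+ε)^j:

```python
def oblivious_buckets(N, eps):
    """Oblivious decomposition of {1, ..., N}: consecutive intervals whose widths
    grow roughly as powers of (1 + eps), giving about log(N)/eps buckets.
    Returned as (start, end) pairs, inclusive and 1-indexed."""
    buckets, start, width = [], 1, 1.0
    while start <= N:
        end = min(N, start + max(1, int(width)) - 1)
        buckets.append((start, end))
        start = end + 1
        width *= (1 + eps)
    return buckets

# e.g. oblivious_buckets(10**6, 0.1) has on the order of log(10**6)/0.1 buckets,
# far fewer than the 10**6 domain points.
```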
Flattening a monotone distribution using the oblivious decomposition. Given a monotone decreasing distribution p, the flattened version of p, denoted p̄, spreads p’s weight uniformly within each bucket of the oblivious decomposition. (Figure: the true pdf and its flattened version.) Lemma [B87]: for any monotone decreasing distribution p, dTV(p, p̄) = O(ε).
Learning monotone distributions using the oblivious decomposition [B87]. This reduces learning a monotone distribution over [N] to accuracy O(ε) to learning an arbitrary distribution over roughly log(N)/ε elements (one element per bucket) to accuracy ε. Algorithm: draw samples from p; the output hypothesis is the flattened empirical distribution. Analysis: view p̄ as an arbitrary distribution over the roughly log(N)/ε buckets; learning that distribution to accuracy ε takes O(log(N)/ε³) samples, and by the flattening lemma only another O(ε) in accuracy is lost.
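A minimal sketch (my own; `oblivious_buckets` is from the sketch above) of the [B87]-style learner, i.e., the flattened empirical distribution:

```python
import numpy as np

def birge_style_monotone_learner(samples, N, eps):
    """Estimate each oblivious bucket's mass from the samples, then spread that
    mass uniformly over the bucket. Output is a pmf over {1, ..., N}."""
    buckets = oblivious_buckets(N, eps)
    counts = np.bincount(samples, minlength=N + 1)[1:]
    m = len(samples)
    h = np.zeros(N)
    for (a, b) in buckets:
        bucket_mass = counts[a - 1:b].sum() / m       # empirical mass of bucket [a, b]
        h[a - 1:b] = bucket_mass / (b - a + 1)        # spread uniformly within the bucket
    return h
```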
Testing monotone distributions using the oblivious decomposition. We can use the learning algorithm to get an O(log(N)/ε³)-sample algorithm for the testing problem. But we can do better by using the oblivious decomposition directly: testing equality of monotone distributions over [N] to accuracy ε reduces to testing equality of arbitrary distributions over roughly log(N)/ε elements to accuracy O(ε). Here q, a known monotone distribution over [N], and p, an unknown monotone distribution over [N], become q̄, a known distribution over the buckets, and p̄, an unknown distribution over the buckets. Using [BFFKRW02], this gives a roughly √(log(N)/ε) · poly(1/ε)-sample testing algorithm. We can also show a corresponding lower bound for any tester.
[BKR04] implicitly gave an O(log²(N) · loglog(N) / ε⁵)-sample algorithm for learning a monotone distribution.