
PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers



Presentation Transcript


  1. PAC-Bayes Risk Bounds for Sample-Compressed Gibbs Classifiers ICML 2005 François Laviolette and Mario Marchand Université Laval

  2. PLAN • The “traditional” PAC-Bayes theorem (for the usual data-independent setting) • The “generalized” PAC-Bayes theorem (for the more general sample compression setting) • Implications and follow-ups

  3. A result from folklore:
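Presumably the folklore result meant here is the standard bound for a data-independent classifier, stated below in its usual Hoeffding form (the slide's exact constants may differ). For a single classifier h chosen before seeing the sample S of m examples drawn i.i.d. from D, with probability at least 1 - \delta over the draw of S,

R(h) \;\le\; R_S(h) + \sqrt{\frac{\ln(1/\delta)}{2m}},

where R(h) denotes the true risk and R_S(h) the empirical risk of h on S.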

  4. In particular, for Gibbs classifiers: What if we choose P after observing the data?
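Specializing the folklore bound to a Gibbs classifier G_P, which draws a fresh h \sim P for every prediction, works as long as the distribution P is fixed before seeing the data (a standard observation, restated here for context):

R(G_P) \;=\; \mathbb{E}_{h\sim P}\,R(h), \qquad R_S(G_P) \;=\; \mathbb{E}_{h\sim P}\,R_S(h), \qquad R(G_P) \;\le\; R_S(G_P) + \sqrt{\frac{\ln(1/\delta)}{2m}} \quad \text{with prob.}\;\ge 1-\delta.

Picking the distribution over classifiers after observing the data breaks the independence this argument relies on; handling that case is precisely what the PAC-Bayes theorem is for.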

  5. The “traditional” PAC-Bayes Theorem
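One common statement of the theorem is the Langford-Seeger form below (the version on the slide may differ in the logarithmic term): for any prior P over H fixed before seeing the data and any \delta \in (0,1], with probability at least 1-\delta over S \sim D^m, simultaneously for every posterior Q on H,

\mathrm{kl}\big(R_S(G_Q)\,\big\|\,R(G_Q)\big) \;\le\; \frac{\mathrm{KL}(Q\|P) + \ln\frac{m+1}{\delta}}{m},

where \mathrm{kl}(q\|p) = q\ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p} is the binary relative entropy and \mathrm{KL}(Q\|P) = \mathbb{E}_{h\sim Q}\ln\frac{Q(h)}{P(h)}. Since Q may depend on S, the KL(Q\|P) term is the price paid for the data-dependent choice.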

  6. The Gibbs and the majority vote • We have a bound for GQ but we normally use instead the Bayes classifier BQ (which is the Q-weighted majority vote classifier) • Consequently R(BQ) ≤ 2R(GQ) (can be improved with the “de-randomization” technique of Langford and Shawe-Taylor 2003) • So the PAC-Bayes theorem also gives a bound on the majority vote classifier.
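The factor of 2 follows from Markov's inequality. Writing W_Q(x,y) = \Pr_{h\sim Q}[h(x)\ne y] for the Q-weighted disagreement on an example, B_Q errs on (x,y) only if W_Q(x,y) \ge 1/2, so

R(G_Q) \;=\; \mathbb{E}_{(x,y)\sim D}\,W_Q(x,y) \;\ge\; \tfrac{1}{2}\,\Pr_{(x,y)\sim D}\!\big[W_Q(x,y) \ge \tfrac{1}{2}\big] \;\ge\; \tfrac{1}{2}\,R(B_Q),

which gives R(BQ) ≤ 2R(GQ).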

  7. The sample compression setting • Theorem 1 is valid in the usual data-independent setting where H is defined without reference to the training data • Example: H = the set of all linear classifiers h: Rn → {-1,+1} • In the more general sample compression setting, each classifier is identified by two different sources of information: • The compression set: an (ordered) subset of the training set • A message string of additional information needed to identify a classifier • Theorem 1 is not valid in this more general setting

  8. To be more precise: • In the sample compression setting, there exists a “reconstruction” function R that gives a classifier h = R(σ, Si) when given a compression set Si and a message string σ. • Recall that Si is an ordered subset of the training set S where the order is specified by i = (i1, i2, …, i|i|).

  9. Examples • Set Covering Machines (SCM) [Marchand and Shawe-Taylor, JMLR 2002] • Decision List Machines (DLM) [Marchand and Sokolova, JMLR 2005] • Support Vector Machines (SVM) • Nearest neighbour classifiers (NNC) • …
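To make the reconstruction function concrete, here is a toy sketch (my own illustration, not code from the paper) for a 1-nearest-neighbour classifier: the compression set Si holds the stored prototypes and the message string σ can be left empty, since the prototypes alone determine the classifier.

import numpy as np

def reconstruct_1nn(sigma, S_i):
    """Toy reconstruction function R(sigma, S_i) for a 1-NN classifier.

    S_i:   ordered list of (x, y) pairs kept from the training set
           (x a feature vector, y a label in {-1, +1}).
    sigma: message string of additional information; empty for 1-NN,
           since the stored prototypes alone determine the classifier.
    Returns the classifier h = R(sigma, S_i).
    """
    xs = np.array([x for x, _ in S_i], dtype=float)
    ys = np.array([y for _, y in S_i])

    def h(x):
        # Predict the label of the nearest stored prototype.
        dists = np.linalg.norm(xs - np.asarray(x, dtype=float), axis=1)
        return int(ys[np.argmin(dists)])

    return h

# Usage: two prototypes stand in for the whole training set.
S_i = [([0.0, 0.0], -1), ([1.0, 1.0], +1)]
h = reconstruct_1nn(sigma="", S_i=S_i)
print(h([0.9, 0.8]))  # prints 1 (nearest prototype is (1.0, 1.0))

For an SVM, the compression set would correspond to the support vectors; the other machines listed above come with their own reconstruction functions and message strings.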

  10. Priors in the sample compression setting • The priors must be data-independent • We will thus use priors defined over the set of all the parameters (i, σ) needed by the reconstruction function R, once a training set S is given. The priors should be written as:
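For instance (an illustrative factorization of my own; the exact prior used in the paper may differ), one can split the prior into a distribution over index vectors and a distribution over message strings given the indices:

P(\mathbf{i}, \sigma) \;=\; P_{\mathcal{I}}(\mathbf{i})\,P_{\mathcal{M}}(\sigma \mid \mathbf{i}), \qquad \text{e.g.}\; P_{\mathcal{I}}(\mathbf{i}) \;=\; \frac{\zeta(|\mathbf{i}|)}{N(|\mathbf{i}|)},

where ζ is any distribution over the possible compression-set sizes (choosing ζ to decrease with |i| is what penalizes large compression sets), N(d) is the number of index vectors of size d, and neither factor looks at the examples themselves, so the prior stays data-independent.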

  11. The “generalized” PAC-Bayes Theorem

  12. The complexity term of the new bound incorporates Occam’s principle of parsimony • The new PAC-Bayes theorem states that the risk bound for the Gibbs classifier GQ is lower than the risk bound for any of its members.

  13. The PAC-Bayes theorem for bounded compression set size

  14. Conclusion • The new PAC-Bayes bound • is valid in the more general sample compression setting. • automatically incorporates Occam’s principle of parsimony. • A sample-compressed Gibbs classifier can have a smaller risk bound than any of its members.

  15. The next steps • Finding derived bounds for particular sample-compressed classifiers like: • majority votes of SCMs and DLMs, • SVMs, • NNCs. • Developing new learning algorithms based on the theoretical information given by the bound. • A tight risk bound for majority vote classifiers?
