
A New Boosting Algorithm Using Input-Dependent Regularizer



Presentation Transcript


  1. A New Boosting Algorithm Using Input-Dependent Regularizer Rong Jin (1), Yan Liu (2), Luo Si (2), Jaime Carbonell (2), Alex G. Hauptmann (2) 1. Michigan State University, 2. Carnegie Mellon University

  2. Outline • Introduction to the AdaBoost algorithm • Problems with AdaBoost • New boosting algorithm: input-dependent regularizer • Experiments • Conclusion and future work

  3. AdaBoost Algorithm (I) • Boosts a weak classifier into a strong classifier by linearly combining an ensemble of weak classifiers • AdaBoost • Given: a weak classifier h(x) with a large classification error E_{(x,y)~P(x,y)}[h(x) ≠ y] • Output: H_T(x) = α1·h1(x) + α2·h2(x) + … + αT·hT(x), with a low classification error E_{(x,y)~P(x,y)}[H_T(x) ≠ y]

  4. AdaBoost Algorithm (II) • Sampling distribution D_t(x): focus only on the examples that are misclassified or weakly classified by the previous weak classifiers • Combining weak classifiers: the combination constants α_t are computed so as to minimize the training error • Choice of α_t: α_t = ½ ln((1 − ε_t) / ε_t), where ε_t is the weighted error of h_t
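
For reference, here is a minimal Python sketch of the AdaBoost loop described on the two slides above, using decision stumps as weak learners; the helper names (adaboost, predict) and the scikit-learn stump are illustrative choices, not part of the original presentation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=50):
    """Minimal AdaBoost sketch; labels y must be in {-1, +1}."""
    X, y = np.asarray(X), np.asarray(y)
    n = len(y)
    D = np.full(n, 1.0 / n)                      # sampling distribution over examples
    stumps, alphas = [], []
    for t in range(T):
        # Weak learner trained on the current distribution D_t
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        eps = D[pred != y].sum()                 # weighted training error of h_t
        if eps >= 0.5 or eps == 0:               # stop if the stump is no better than chance (or perfect)
            break
        alpha = 0.5 * np.log((1 - eps) / eps)    # standard AdaBoost choice of alpha_t
        # Re-weight: misclassified or weakly classified examples receive more mass
        D *= np.exp(-alpha * y * pred)
        D /= D.sum()
        stumps.append(h)
        alphas.append(alpha)
    return stumps, alphas

def predict(stumps, alphas, X):
    # H_T(x) = sum_t alpha_t * h_t(x); the predicted label is its sign
    H = sum(a * h.predict(np.asarray(X)) for h, a in zip(stumps, alphas))
    return np.sign(H)
```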

  5. Problem 1: Overfitting • AdaBoost seldom overfits • It not only minimizes the training error but also tends to maximize the classification margin (Onoda & Muller, 1998; Friedman et al., 1998) • AdaBoost does overfit when the data are noisy (Dietterich, 2000; Ratsch & Muller, 2000; Grove & Schuurmans, 1998) • The sampling distribution D_t(x) can place too much emphasis on noisy patterns • This is due to the “hard margin” criterion (Ratsch et al., 2000)

  6. Problem 1: Overfitting • Introduce regularization • Do not just minimize the training error • Typical solutions • Smoothing the combination constants (Schapire & Singer, 1998) • Epsilon boosting: equivalent to L1 regularization (Friedman & Tibshirani, 1998) • Boosting with a soft margin (Ratsch et al., 2000) • BrownBoost: a non-monotonic cost function (Freund, 2001)
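
To make one of these regularization strategies concrete, here is a brief sketch of epsilon boosting: each round adds the selected weak learner with a small fixed step instead of the full AdaBoost step, tracing an approximately L1-regularized path. The step size eps_step and the stump learner are illustrative assumptions, not the exact setup of the cited work.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def epsilon_boost(X, y, T=500, eps_step=0.01):
    """Epsilon-boosting sketch: many tiny steps instead of the optimal alpha_t."""
    X, y = np.asarray(X), np.asarray(y)
    H = np.zeros(len(y))                  # current ensemble score on the training set
    stumps = []
    for t in range(T):
        w = np.exp(-y * H)                # exponential-loss weights
        w /= w.sum()
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        H += eps_step * h.predict(X)      # small fixed step (approximate L1 path)
        stumps.append(h)
    return stumps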

  7. Problem 2: Why Linear Combination? • Each weak classifier h_t(x) is trained on a different sampling distribution D_t(x) • It is therefore good only for particular types of input patterns • {h_t(x)} is a diverse ensemble • A linear combination with fixed constants cannot exploit the full strength of the diverse ensemble {h_t(x)} • Solution: the combination constants should be input dependent

  8. Input-Dependent Regularizer • Addresses both problems: overfitting and fixed (input-independent) combination constants • Input-dependent regularizer • Main idea: a different combination form, in which the weight given to each h_t(x) depends on the input x

  9. Role of the Input-Dependent Factor • Regularizer • Prevents |H_T(x)| from growing too fast • Theorem: if all α_t are bounded by α_max, then |H_T(x)| ≤ a·ln(bT + c) • For the linear combination in AdaBoost, |H_T(x)| ~ O(T) • Router • Input-dependent combination constant • The prediction of h_t(x) is used only when |H_{t-1}(x)| is small • Consistent with the training procedure • h_t(x) is trained on the examples on which H_{t-1}(x) is uncertain
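
The precise combination formula did not survive in this transcript, so the sketch below assumes an input-dependent factor of the form exp(-beta·|H_{t-1}(x)|), which matches the behaviour described above (h_t(x) contributes only where the current ensemble output is small); beta and alpha_max are illustrative parameters. The snippet checks numerically that, in the worst case, |H_T(x)| then grows only logarithmically in T, in line with the theorem, whereas plain AdaBoost-style accumulation grows linearly.

```python
import numpy as np

def ensemble_growth(T, alpha_max=1.0, beta=1.0):
    """Worst-case growth of H_t(x) = H_{t-1}(x) + alpha_t * exp(-beta*|H_{t-1}(x)|) * h_t(x),
    assuming every h_t(x) predicts +1 and every alpha_t equals alpha_max."""
    H, trace = 0.0, []
    for t in range(T):
        H += alpha_max * np.exp(-beta * abs(H))   # damped, input-dependent step
        trace.append(H)
    return np.array(trace)

H = ensemble_growth(T=1000)
print(H[[9, 99, 999]])   # roughly 2.5, 4.6, 6.9: logarithmic growth in T
# Compare: undamped accumulation with the same alpha_max would give H_T = 1000.
```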

  10. WeightBoost Algorithm (1) • Similar to AdaBoost: minimize the exponential cost function • Training setup • h_i(x): x → {+1, −1}; a basis (weak) classifier • H_T(x): a weighted combination of the basis classifiers • Goal: minimize the training error

  11. WeightBoost Algorithm (2) • Sampling distribution: emphasize misclassified data patterns while avoiding overemphasis on noisy data patterns • Choice of α_t: computed in closed form, as in AdaBoost • As simple as AdaBoost!
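
Putting the pieces together, here is a hedged sketch of a WeightBoost-style training loop. The instance weights combine the usual exponential-loss term (emphasizing misclassified patterns) with the assumed damping factor exp(-beta·|H_{t-1}(x)|) (avoiding overemphasis on noisy patterns), and the same factor gives each h_t(x) its input-dependent weight at prediction time. The specific weight expression and the AdaBoost-style closed-form alpha_t are reconstructions from the slides, not the paper's exact formulas.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def weightboost(X, y, T=50, beta=0.5):
    """WeightBoost-style sketch; labels y in {-1, +1}, beta controls the damping."""
    X, y = np.asarray(X), np.asarray(y)
    H = np.zeros(len(y))                          # current ensemble output on the training set
    stumps, alphas = [], []
    for t in range(T):
        # Emphasize misclassified points, but damp points the ensemble is already sure about
        w = np.exp(-y * H) * np.exp(-beta * np.abs(H))
        w /= w.sum()
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = h.predict(X)
        eps = w[pred != y].sum()
        if eps >= 0.5 or eps == 0:
            break
        alpha = 0.5 * np.log((1 - eps) / eps)             # AdaBoost-style closed-form step (assumed)
        H += alpha * np.exp(-beta * np.abs(H)) * pred     # input-dependent combination
        stumps.append(h)
        alphas.append(alpha)
    return stumps, alphas, beta

def weightboost_predict(stumps, alphas, beta, X):
    # Rebuild H_{t-1}(x) sequentially so each h_t receives its input-dependent weight
    X = np.asarray(X)
    H = np.zeros(len(X))
    for h, a in zip(stumps, alphas):
        H += a * np.exp(-beta * np.abs(H)) * h.predict(X)
    return np.sign(H)
```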

  12. Empirical Studies • Datasets: eight UCI datasets, all with binary classes • Methods compared against • AdaBoost algorithm • WeightDecay Boost algorithm: close to L2 regularization • Epsilon Boosting: related to L1 regularization

  13. Experiment 1: Effectiveness • Comparison with AdaBoost • WeightBoost performs better than the AdaBoost algorithm • In many cases, WeightBoost performs substantially better than AdaBoost

  14. Experiment 2: Beyond Regularization • Comparison with other regularized boosting algorithms: WeightDecay Boost and Epsilon Boost • Overall, WeightBoost performs slightly better than the other regularized boosting algorithms • In several cases, WeightBoost performs better than both of the other regularized boosting algorithms

  15. Experiment 3: Resistance to Noise (results shown for 10% noise) • Randomly select 10%, 20%, and 30% of the training data and set their labels to random values • WeightBoost is more resistant to training noise than the AdaBoost algorithm • In several cases where AdaBoost overfits the label noise, WeightBoost is still able to perform well

  16. Experiments with Text Categorization • Reuters-21578 corpus with the 10 most popular categories: WeightBoost improves performance on 7 out of 10 categories

  17. Conclusion and Future Work • Introduced an input-dependent regularizer into the combination form • Prevents |H(x)| from increasing too fast → resistant to training noise • ‘Routes’ a test data pattern to its appropriate classifiers → improves classification accuracy even further than standard regularization • Future research issues • How to determine the regularization constant? • Other input-dependent regularizers?
