A speech about Boosting Presenter: Roberto Valenti
The Paper* *R. Schapire. The boosting approach to machine learning: An overview, 2001
I want YOU… …TO UNDERSTAND
Overview • Introduction • Adaboost • How does it work? • Why does it work? • Demo • Extensions • Performance & Applications • Summary & Conclusions • Questions
Introduction to Boosting Let’s start
Introduction • An example of Machine Learning: a spam classifier • Highly accurate rule: difficult to find • Inaccurate rule: easy, e.g. flag messages containing "BUY NOW" (sketched below) • Introducing Boosting: "An effective method of producing an accurate prediction rule from inaccurate rules"
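A single inaccurate rule of this kind is trivial to write. Here is a minimal illustrative sketch (the keyword and the example messages are invented, not from the paper):

```python
def buy_now_rule(email_text: str) -> int:
    """Weak rule of thumb: predict spam (+1) if the message
    contains 'buy now', otherwise ham (-1).
    Inaccurate on its own, but better than random guessing."""
    return 1 if "buy now" in email_text.lower() else -1

print(buy_now_rule("Buy now and save 50%!"))  # 1 (spam)
print(buy_now_rule("Meeting moved to 3pm"))   # -1 (ham)
```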
Introduction • History of boosting: • 1989: Schapire • First provably polynomial-time boosting algorithm • 1990: Freund • Much more efficient, but with practical drawbacks • 1995: Freund & Schapire • Adaboost: the focus of this presentation • …
Introduction • The Boosting Approach • Lots of weak classifiers → one strong classifier • Boosting key points: • Give importance to misclassified data • Find a way to combine the weak classifiers into a single general rule.
Adaboost How does it work?
Adaboost – How does it work? Base learner's job: • Find a base hypothesis $h_t : X \to \{-1,+1\}$ • Minimize the weighted error: $\epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i]$ • Choose $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$ • Update the weights: $D_{t+1}(i) = D_t(i)\,e^{-\alpha_t y_i h_t(x_i)} / Z_t$ • Output the final hypothesis $H(x) = \mathrm{sign}\left(\sum_t \alpha_t h_t(x)\right)$
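The whole loop fits in a few lines. Below is a minimal runnable Python sketch using simple threshold "stumps" as base classifiers (the toy dataset and the stump search are invented for illustration, not taken from the paper):

```python
# Minimal AdaBoost sketch with threshold stumps on a toy 1-D dataset.
import numpy as np

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1, -1, 1, 1, 1])  # not separable by one stump

def best_stump(X, y, D):
    """Base learner: pick the threshold/polarity minimizing weighted error."""
    best = (None, None, np.inf)
    for thresh in X:
        for polarity in (1, -1):
            pred = np.where(X >= thresh, polarity, -polarity)
            err = D[pred != y].sum()
            if err < best[2]:
                best = (thresh, polarity, err)
    return best

T = 10
D = np.full(len(X), 1 / len(X))         # start with uniform weights
stumps = []
for t in range(T):
    thresh, pol, eps = best_stump(X, y, D)
    alpha = 0.5 * np.log((1 - eps) / eps)
    pred = np.where(X >= thresh, pol, -pol)
    D *= np.exp(-alpha * y * pred)      # up-weight misclassified points
    D /= D.sum()
    stumps.append((alpha, thresh, pol))

# Final strong classifier: sign of the weighted vote of all stumps
f = sum(a * np.where(X >= th, p, -p) for a, th, p in stumps)
print("training error:", np.mean(np.sign(f) != y))
```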
Adaboost Why does it work?
Adaboost – Why does it work? • Basic property: it reduces the training error • On binary distributions, write the weighted error as $\epsilon_t = 1/2 - \gamma_t$ • The training error of $H$ is bounded by: $\prod_t 2\sqrt{\epsilon_t(1-\epsilon_t)} = \prod_t \sqrt{1-4\gamma_t^2} \leq e^{-2\sum_t \gamma_t^2}$ • If every $\gamma_t \geq \gamma$, this is at most $e^{-2T\gamma^2}$ → drops exponentially in T!
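To put numbers on the rate (a worked example, not from the paper): with a modest edge $\gamma = 0.1$, the bound after $T = 100$ rounds is $e^{-2 \cdot 100 \cdot 0.01} = e^{-2} \approx 0.14$, and after $T = 500$ rounds it is $e^{-10} \approx 4.5 \times 10^{-5}$.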
Adaboost – Why does it work? • Generalization error bounded by: $\Pr[H(x) \neq y] \leq \hat{\Pr}[H(x) \neq y] + \tilde{O}\left(\sqrt{Td/m}\right)$ • T = number of iterations • m = sample size • d = Vapnik-Chervonenkis dimension of the base classifier space • $\hat{\Pr}[\cdot]$ = empirical probability • $\tilde{O}(\cdot)$ hides logarithmic and constant factors • The bound grows with T: it predicts overfitting!
Adaboost – Why does it work? • Margins of the training examples: $\mathrm{margin}(x,y) = y \sum_t \alpha_t h_t(x) \,/\, \sum_t \alpha_t \in [-1,+1]$ • Positive only if correctly classified by H • Its magnitude is the confidence in the prediction • Boosting keeps increasing the margins even after the training error reaches zero, which explains why the test error can keep dropping • A qualitative explanation of effectiveness, not a quantitative one.
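Margins are cheap to compute for a trained ensemble. A minimal sketch (the per-round weights, base predictions, and labels are made-up numbers for illustration):

```python
import numpy as np

# Hypothetical trained ensemble: per-round weights and base predictions
alphas = np.array([0.8, 0.5, 0.3])       # alpha_t from 3 boosting rounds
base_preds = np.array([[ 1,  1, -1],     # h_t(x_i) in {-1,+1};
                       [ 1, -1, -1],     # rows = rounds, cols = examples
                       [-1,  1, -1]])
y = np.array([1, 1, -1])

margins = y * (alphas @ base_preds) / alphas.sum()
print(margins)  # in [-1, 1]; positive = correct, magnitude = confidence
```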
Adaboost – Other View • Adaboost as a zero-sum game • Game matrix M • Row player: Adaboost • Column player: the base learner • The row player plays rows with distribution P • The column player plays columns with distribution Q • Expected loss: $P^T M Q$ • Boosting amounts to playing this matrix game repeatedly
Adaboost – Other View • Von Neumann's minmax theorem: $\max_P \min_Q P^T M Q = \min_Q \max_P P^T M Q$ • If for every distribution there exists a base classifier with error $\epsilon \leq 1/2 - \gamma$, • then there exists a combination of base classifiers with margin $\geq 2\gamma$ • So Adaboost has the potential to succeed • Related to Linear Programming and Online Learning
Adaboost Demo
Adaboost Extensions
Adaboost - Extensions • History of Boosting: • … • 1997: Freund & Schapire • Adaboost.M1 • First multiclass generalization • Fails if the weak learner achieves accuracy below 50% • Adaboost.M2 • Creates a set of binary problems: for x, is label l1 better or label l2? • 1999: Schapire & Singer • Adaboost.MH • For x, is label l1 better or one of the others? (see the sketch below)
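The reductions differ in the binary questions they pose. A minimal sketch of the one-vs-rest encoding behind Adaboost.MH (the label set and example are invented for illustration):

```python
# Adaboost.MH reduces a k-class problem to k binary problems:
# for each example x and each label l, is l the correct label (+1) or not (-1)?
labels = ["sports", "politics", "tech"]

def mh_expand(x, true_label):
    """One (x, label) binary example per class, as in Adaboost.MH."""
    return [((x, l), 1 if l == true_label else -1) for l in labels]

for pair, sign in mh_expand("match report", "sports"):
    print(pair, sign)
```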
Adaboost - Extensions • 2001: Rochery, Schapire et al. • Incorporating human knowledge • Adaboost is data-driven • Human knowledge can compensate for the lack of data • Human expert: • Chooses a rule p mapping each x to p(x) ∈ [0,1] • Difficult! • Simple rules should suffice…
Adaboost - Extensions • To incorporate the human knowledge, minimize a combined loss that balances fit to the data against agreement with the human rule p: $\sum_i \ln\left(1+e^{-y_i f(x_i)}\right) + \eta \sum_i \mathrm{RE}\left(p(x_i)\,\|\,\sigma(f(x_i))\right)$, where $\sigma(f) = (1+e^{-f})^{-1}$ • Where $\mathrm{RE}(p\,\|\,q) = p\ln(p/q) + (1-p)\ln((1-p)/(1-q))$ is the binary relative entropy
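A minimal sketch of evaluating such a combined objective (the scores, labels, prior rule values, and η below are invented for illustration):

```python
import numpy as np

def sigma(f):                      # map a real-valued score to [0, 1]
    return 1.0 / (1.0 + np.exp(-f))

def RE(p, q):                      # binary relative entropy
    return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

def combined_loss(f, y, p, eta):
    """Data-fit term plus eta-weighted disagreement with the human rule p."""
    data_fit = np.log(1 + np.exp(-y * f)).sum()
    prior = RE(p, sigma(f)).sum()
    return data_fit + eta * prior

f = np.array([1.2, -0.4, 0.3])     # current scores f(x_i)
y = np.array([1, -1, 1])           # true labels
p = np.array([0.9, 0.2, 0.6])      # human rule's estimates p(x_i)
print(combined_loss(f, y, p, eta=0.5))
```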
Adaboost Performance and Applications
Adaboost - Performance & Applications [Figure: error rates on text categorization, on Reuters newswire articles and on AP newswire headlines]
Adaboost - Performance & Applications [Figure: six-class text classification (TREC), training and test error]
Adaboost - Performance & Applications [Figure: spoken language classification on the "How may I help you" and "Help desk" tasks]
Adaboost - Performance & Applications [Figure: OCR outliers, the most heavily weighted examples after rounds 4, 12, and 25, each annotated as class, label1/weight1, label2/weight2]
Adaboost - Applications • Text filtering • Schapire, Singer, Singhal. Boosting and Rocchio applied to text filtering. 1998 • Routing • Iyer, Lewis, Schapire, Singer, Singhal. Boosting for document routing. 2000 • "Ranking" problems • Freund, Iyer, Schapire, Singer. An efficient boosting algorithm for combining preferences. 1998 • Image retrieval • Tieu, Viola. Boosting image retrieval. 2000 • Medical diagnosis • Merler, Furlanello, Larcher, Sboner. Tuning cost-sensitive boosting and its application to melanoma diagnosis. 2001
Adaboost - Applications • Learning problems in natural language processing • Abney, Schapire, Singer. Boosting applied to tagging and PP attachment. 1999 • Collins. Discriminative reranking for natural language parsing. 2000 • Escudero, Marquez, Rigau. Boosting applied to word sense disambiguation. 2000 • Haruno, Shirai, Ooyama. Using decision trees to construct a practical parser. 1999 • Moreno, Logan, Raj. A boosting approach for confidence scoring. 2001 • Walker, Rambow, Rogati. SPoT: A trainable sentence planner. 2001
Summary and Conclusions At last…
Summary • Boosting takes a weak learner and converts it into a strong one • Works by asymptotically minimizing the training error • Effectively maximizes the margin of the combined hypothesis • Adaboost is related to many other topics (game theory, linear programming, online learning) • It works!
Conclusions • Adaboost advantages: • Fast, simple, and easy to program • No parameters to tune (apart from T) • Performance depends on: • Sample size: boosting is only useful for large sample sizes (Skurichina, 2001) • Choice of the weak classifier • Incorporation of the classifier weights • Data distribution
Questions ? (don’t be mean)