Study on Ensemble Learning
By Feng Zhou
Content
• Introduction
• A Statistical View of the M3 Network
• Future Work
Introduction
• Ensemble learning: combining a group of classifiers rather than designing a new one.
  • The decisions of multiple hypotheses are combined to produce more accurate results.
• Problems with traditional learning algorithms:
  • The statistical problem
  • The computational problem
  • The representation problem
• Related work:
  • Resampling techniques: Bagging, Boosting
  • Approaches for extending binary classifiers to multi-class problems: One-vs-One, One-vs-All (a One-vs-One sketch follows below)
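As a concrete illustration of the One-vs-One scheme just mentioned, here is a minimal sketch; the synthetic data, the LogisticRegression base learner, and the majority-vote rule are illustrative assumptions, not details from the slides.

```python
from collections import Counter
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy 3-class problem (assumption: synthetic data stands in for a real task).
X, y = make_classification(n_samples=300, n_classes=3, n_informative=4,
                           random_state=0)

# One-vs-One: train one binary classifier per pair of classes.
pairwise = {}
for a, b in combinations(sorted(set(y)), 2):
    mask = (y == a) | (y == b)
    pairwise[(a, b)] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])

def predict_ovo(x):
    """Each pairwise classifier votes for one class; the majority wins."""
    votes = Counter(int(clf.predict(x.reshape(1, -1))[0])
                    for clf in pairwise.values())
    return votes.most_common(1)[0][0]

print("predicted:", predict_ovo(X[0]), "true:", y[0])
```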
Min-Max-Modular (M3) Network (Lu, IEEE TNN 1999)
• Steps:
  • Divide the training sets (Chen, IJCNN 2006; Wen, ICONIP 2005)
  • Train pairwise classifiers
  • Integrate the outcomes (Zhao, IJCNN 2005) via the min process and the max process (a minimal sketch follows below)
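A minimal sketch of the integration step, assuming each pairwise module already returns a score for w+ in [0, 1]; the scores below are made-up placeholders.

```python
# Min-max integration for one positive class w+:
# MIN across the pairwise modules of each sub-problem (min process),
# then MAX across the sub-problems (max process).

# scores[i][j] = output of pairwise classifier c_ij for a given input x,
# with i indexing sub-positive classes and j indexing sub-negative classes.
scores = [
    [0.9, 0.7, 0.8],  # pairwise modules of sub-positive class w_1+ (vs each w_j-)
    [0.4, 0.6, 0.5],  # pairwise modules of sub-positive class w_2+ (vs each w_j-)
]

# Min process: each sub-positive class keeps its smallest (most informative) output.
sub_scores = [min(row) for row in scores]  # -> [0.7, 0.4]

# Max process: the positive class takes the best sub-positive score.
p_positive = max(sub_scores)               # -> 0.7
print(p_positive)
```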
A Statistical View
• Assumption: each pairwise classifier outputs a probabilistic value, obtained by fitting a sigmoid to its raw output f (J.C. Platt, ALMC 1999):
  P(w+ | f) = 1 / (1 + exp(A·f + B))  (a fitting sketch follows below)
• Bayesian decision theory
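A sketch of how the sigmoid parameters A and B could be fitted by maximum likelihood, in the spirit of Platt's method; the toy outputs, labels, and the use of scipy.optimize.minimize are assumptions, not the slides' procedure.

```python
import numpy as np
from scipy.optimize import minimize

# Toy raw classifier outputs f_i with binary labels t_i (placeholders).
f = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
t = np.array([0, 0, 0, 1, 1, 1])

def nll(params):
    """Negative log-likelihood of Platt's sigmoid P(w+|f) = 1/(1+exp(A f + B))."""
    A, B = params
    p = 1.0 / (1.0 + np.exp(A * f + B))
    eps = 1e-12  # guard against log(0)
    return -np.sum(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps))

A, B = minimize(nll, x0=[-1.0, 0.0]).x
print("fitted A, B:", A, B)
```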
A Simple Discrete Example (II)
• Three pairwise classifiers: classifier 0 (w+ : w-), classifier 1 (w+ : w1-), classifier 2 (w+ : w2-)
• Pc0(w+ | x = x2) = 1/3
• Pc1(w+ | x = x2) = 1/2
• Pc2(w+ | x = x2) = 1/2
• Hence Pc0 < min(Pc1, Pc2): the classifier facing the whole negative class w- reports less evidence for w+ than either partial classifier.
A More Complicated Example
• When one more classifier is considered, the evidence that x belongs to w+ shrinks:
  Pglobal(w+) < min(Ppartial(w+))
• The classifier reporting the minimum value contains the most information about w- (the minimization principle).
• If Ppartial(w+) = 1, no information about w- is contained.
• [Figure: information about w- increases as classifiers (w+ : w1-), (w+ : w2-), … are added; a numeric check follows below.]
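A tiny numeric check of the inequality, reusing the counts implied by the discrete example above (one training point per class at x = x2); the counting model is an assumption about the slides' setting.

```python
# At a fixed point x, suppose the class counts are (illustrative numbers):
n_pos  = 1   # examples of w+ observed at x
n_neg1 = 1   # examples of w1- at x
n_neg2 = 1   # examples of w2- at x

# Partial posteriors: each pairwise classifier sees w+ vs only one sub-negative class.
p_partial_1 = n_pos / (n_pos + n_neg1)           # 1/2
p_partial_2 = n_pos / (n_pos + n_neg2)           # 1/2

# Global posterior: w+ vs the whole negative class w- = w1- ∪ w2-.
p_global = n_pos / (n_pos + n_neg1 + n_neg2)     # 1/3

assert p_global <= min(p_partial_1, p_partial_2)
print(p_global, "<= min(", p_partial_1, ",", p_partial_2, ")")
```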
Analysis
• For each classifier cij: it estimates P(w+ | x) on the sub-problem (wi+ : wj-).
• For each sub-positive class wi+: combine its pairwise classifiers with the min process.
• For the positive class w+: combine the sub-positive classes with the max process (written out below).
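Written out, assuming c_ij denotes the pairwise classifier for sub-positive class ω_i+ against sub-negative class ω_j-, the min-max rule reads (a reconstruction; the slide's formulas did not survive extraction):

```latex
P(\omega_i^+ \mid x) = \min_j \, P_{c_{ij}}(\omega^+ \mid x) \quad \text{(min process)},
\qquad
P(\omega^+ \mid x) = \max_i \, P(\omega_i^+ \mid x) \quad \text{(max process)}.
```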
Analysis (II)
• Decomposition of a complex problem
• Restoration to the original resolution
Composition of Training Sets
[Figure legend: have been used / not used yet / trivial set, useless]
Another Way of Combination
• Training and testing time: [formula lost in extraction]
Experiments – Text Categorization (20 Newsgroups corpus)
• Experimental setup:
  • Word removal: stemming; stop-word removal; dropping words occurring fewer than 30 times
  • Naïve Bayes as the elementary classifier
  • Probability estimated with a sigmoid function (a runnable sketch follows below)
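A minimal, hedged reconstruction of such a setup using scikit-learn; the train/test split, the omission of stemming, and mapping the 30-occurrence cutoff to min_df are assumptions about the slides' pipeline, not a faithful reproduction.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# 20 Newsgroups, standard train/test split.
train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

# Stop-word removal and a minimum-document-frequency cutoff stand in for the
# slides' preprocessing (stemming is skipped here for brevity).
vec = CountVectorizer(stop_words="english", min_df=30)
X_train = vec.fit_transform(train.data)
X_test = vec.transform(test.data)

# Naive Bayes as the elementary classifier.
clf = MultinomialNB().fit(X_train, train.target)
print("test accuracy:", clf.score(X_test, test.target))
```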
Future Work
• The situation when noise is taken into consideration
• The crux of the problem: to access the underlying distribution
  • Independent parameters of the model
  • Constraints we obtain
• To obtain the best estimate, minimize the Kullback-Leibler distance (T. Hastie, Ann Statist 1998); its generic form is given below.
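For reference, the Kullback-Leibler distance between two discrete distributions p and q, which Hastie & Tibshirani's pairwise-coupling method minimizes (in a weighted form) to fit class probabilities:

```latex
D_{\mathrm{KL}}(p \,\|\, q) = \sum_i p_i \log \frac{p_i}{q_i}
```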
References
[1] T. Hastie & R. Tibshirani, "Classification by pairwise coupling," Ann. Statist., 1998.
[2] J. C. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," ALMC, 1999.
[3] B. Lu & M. Ito, "Task decomposition and module combination based on class relations: a modular neural network for pattern classification," IEEE Trans. Neural Networks, 1999.
[4] Y. M. Wen & B. Lu, "Equal clustering makes min-max modular support vector machines more efficient," ICONIP, 2005.
[5] H. Zhao & B. Lu, "On efficient selection of binary classifiers for min-max modular classifier," IJCNN, 2005.
[6] K. Chen & B. Lu, "Efficient classification of multi-label and imbalanced data using min-max modular classifiers," IJCNN, 2006.