Study on Ensemble Learning
By Feng Zhou
Content
• Introduction
• A Statistical View of the M3 Network
• Future Work
Introduction
• Ensemble learning: combining a group of classifiers rather than designing a new one.
  • The decisions of multiple hypotheses are combined to produce more accurate results.
• Problems with traditional learning algorithms:
  • The statistical problem
  • The computational problem
  • The representation problem
• Related work:
  • Resampling techniques: Bagging, Boosting
  • Approaches for extending binary classifiers to multi-class problems: One-vs-One, One-vs-All (a One-vs-One sketch follows below)
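As a concrete illustration of the One-vs-One scheme just mentioned, here is a minimal sketch; the synthetic data, the LogisticRegression base learner, and the majority-vote rule are illustrative assumptions, not details from the slides.

```python
from collections import Counter
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Toy 3-class problem (assumption: synthetic data stands in for a real task).
X, y = make_classification(n_samples=300, n_classes=3, n_informative=4,
                           random_state=0)

# One-vs-One: train one binary classifier per pair of classes.
pairwise = {}
for a, b in combinations(sorted(set(y)), 2):
    mask = (y == a) | (y == b)
    pairwise[(a, b)] = LogisticRegression(max_iter=1000).fit(X[mask], y[mask])

def predict_ovo(x):
    """Each pairwise classifier votes for one class; the majority wins."""
    votes = Counter(int(clf.predict(x.reshape(1, -1))[0])
                    for clf in pairwise.values())
    return votes.most_common(1)[0][0]

print("predicted:", predict_ovo(X[0]), "true:", y[0])
```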
Min-Max-Modular (M3) Network (Lu, IEEE TNN 1999)
• Steps:
  • Divide the training sets (Chen, IJCNN 2006; Wen, ICONIP 2005)
  • Train pairwise classifiers
  • Integrate the outcomes (Zhao, IJCNN 2005) via the min process and the max process (a minimal sketch follows below)
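A minimal sketch of the integration step, assuming each pairwise module already returns a score for w+ in [0, 1]; the scores below are made-up placeholders.

```python
# Min-max integration for one positive class w+:
# MIN across the pairwise modules of each sub-problem (min process),
# then MAX across the sub-problems (max process).

# scores[i][j] = output of pairwise classifier c_ij for a given input x,
# with i indexing sub-positive classes and j indexing sub-negative classes.
scores = [
    [0.9, 0.7, 0.8],  # pairwise modules of sub-positive class w_1+ (vs each w_j-)
    [0.4, 0.6, 0.5],  # pairwise modules of sub-positive class w_2+ (vs each w_j-)
]

# Min process: each sub-positive class keeps its smallest (most informative) output.
sub_scores = [min(row) for row in scores]  # -> [0.7, 0.4]

# Max process: the positive class takes the best sub-positive score.
p_positive = max(sub_scores)               # -> 0.7
print(p_positive)
```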
A Statistical View
• Assumption: each pairwise classifier outputs a probabilistic value, obtained by fitting a sigmoid to its raw output f (J.C. Platt, ALMC 1999):
  P(w+ | f) = 1 / (1 + exp(A·f + B))  (a fitting sketch follows below)
• Bayesian decision theory
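A sketch of how the sigmoid parameters A and B could be fitted by maximum likelihood, in the spirit of Platt's method; the toy outputs, labels, and the use of scipy.optimize.minimize are assumptions, not the slides' procedure.

```python
import numpy as np
from scipy.optimize import minimize

# Toy raw classifier outputs f_i with binary labels t_i (placeholders).
f = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
t = np.array([0, 0, 0, 1, 1, 1])

def nll(params):
    """Negative log-likelihood of Platt's sigmoid P(w+|f) = 1/(1+exp(A f + B))."""
    A, B = params
    p = 1.0 / (1.0 + np.exp(A * f + B))
    eps = 1e-12  # guard against log(0)
    return -np.sum(t * np.log(p + eps) + (1 - t) * np.log(1 - p + eps))

A, B = minimize(nll, x0=[-1.0, 0.0]).x
print("fitted A, B:", A, B)
```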
A Simple Discrete Example (II)
• Three pairwise classifiers: classifier 0 (w+ : w-), classifier 1 (w+ : w1-), classifier 2 (w+ : w2-)
• Pc0(w+ | x = x2) = 1/3
• Pc1(w+ | x = x2) = 1/2
• Pc2(w+ | x = x2) = 1/2
• Hence Pc0 < min(Pc1, Pc2): the classifier facing the whole negative class w- reports less evidence for w+ than either partial classifier.
A More Complicated Example
• When one more classifier is considered, the evidence that x belongs to w+ shrinks:
  Pglobal(w+) < min(Ppartial(w+))
• The classifier reporting the minimum value contains the most information about w- (the minimization principle).
• If Ppartial(w+) = 1, no information about w- is contained.
• [Figure: information about w- increases as classifiers (w+ : w1-), (w+ : w2-), … are added; a numeric check follows below.]
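A tiny numeric check of the inequality, reusing the counts implied by the discrete example above (one training point per class at x = x2); the counting model is an assumption about the slides' setting.

```python
# At a fixed point x, suppose the class counts are (illustrative numbers):
n_pos  = 1   # examples of w+ observed at x
n_neg1 = 1   # examples of w1- at x
n_neg2 = 1   # examples of w2- at x

# Partial posteriors: each pairwise classifier sees w+ vs only one sub-negative class.
p_partial_1 = n_pos / (n_pos + n_neg1)           # 1/2
p_partial_2 = n_pos / (n_pos + n_neg2)           # 1/2

# Global posterior: w+ vs the whole negative class w- = w1- ∪ w2-.
p_global = n_pos / (n_pos + n_neg1 + n_neg2)     # 1/3

assert p_global <= min(p_partial_1, p_partial_2)
print(p_global, "<= min(", p_partial_1, ",", p_partial_2, ")")
```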
Analysis
• For each classifier cij: it estimates P(w+ | x) on the sub-problem (wi+ : wj-).
• For each sub-positive class wi+: combine its pairwise classifiers with the min process.
• For the positive class w+: combine the sub-positive classes with the max process (written out below).
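Written out, assuming c_ij denotes the pairwise classifier for sub-positive class ω_i+ against sub-negative class ω_j-, the min-max rule reads (a reconstruction; the slide's formulas did not survive extraction):

```latex
P(\omega_i^+ \mid x) = \min_j \, P_{c_{ij}}(\omega^+ \mid x) \quad \text{(min process)},
\qquad
P(\omega^+ \mid x) = \max_i \, P(\omega_i^+ \mid x) \quad \text{(max process)}.
```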
Analysis (II)
• Decomposition of a complex problem
• Restoration to the original resolution
Composition of Training Sets
[Figure legend: have been used / not used yet / trivial set, useless]
Another Way of Combination
• Training and testing time: [formula lost in extraction]
Experiments – Text Categorization (20 Newsgroups corpus)
• Experimental setup:
  • Word removal: stemming; stop-word removal; dropping words occurring fewer than 30 times
  • Naïve Bayes as the elementary classifier
  • Probability estimated with a sigmoid function (a runnable sketch follows below)
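A minimal, hedged reconstruction of such a setup using scikit-learn; the train/test split, the omission of stemming, and mapping the 30-occurrence cutoff to min_df are assumptions about the slides' pipeline, not a faithful reproduction.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# 20 Newsgroups, standard train/test split.
train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

# Stop-word removal and a minimum-document-frequency cutoff stand in for the
# slides' preprocessing (stemming is skipped here for brevity).
vec = CountVectorizer(stop_words="english", min_df=30)
X_train = vec.fit_transform(train.data)
X_test = vec.transform(test.data)

# Naive Bayes as the elementary classifier.
clf = MultinomialNB().fit(X_train, train.target)
print("test accuracy:", clf.score(X_test, test.target))
```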
Future Work
• The situation when noise is taken into consideration
• The crux of the problem: to access the underlying distribution
  • Independent parameters of the model
  • Constraints we obtain
• To obtain the best estimate, minimize the Kullback-Leibler distance (T. Hastie, Ann Statist 1998); its generic form is given below.
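For reference, the Kullback-Leibler distance between two discrete distributions p and q, which Hastie & Tibshirani's pairwise-coupling method minimizes (in a weighted form) to fit class probabilities:

```latex
D_{\mathrm{KL}}(p \,\|\, q) = \sum_i p_i \log \frac{p_i}{q_i}
```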
References
[1] T. Hastie & R. Tibshirani, "Classification by pairwise coupling," Ann. Statist., 1998.
[2] J. C. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," ALMC, 1999.
[3] B. Lu & M. Ito, "Task decomposition and module combination based on class relations: a modular neural network for pattern classification," IEEE Trans. Neural Networks, 1999.
[4] Y. M. Wen & B. Lu, "Equal clustering makes min-max modular support vector machines more efficient," ICONIP, 2005.
[5] H. Zhao & B. Lu, "On efficient selection of binary classifiers for min-max modular classifier," IJCNN, 2005.
[6] K. Chen & B. Lu, "Efficient classification of multi-label and imbalanced data using min-max modular classifiers," IJCNN, 2006.