An experimental study on a new ensemble method using robust and order statistics Faisal Zaman and Hideo Hirose Kyushu Institute of Technology Fukuoka, Japan
Outline
• Ensemble Learning
• Popular Ensemble Methods: Properties, Base Classifiers, Combination Rules, Diversity
• New Ensemble Method: Design of an Ensemble, Trimmean and Spread Combination Rules, Overview, Algorithm
• Experiments and Discussion of Results: Aim and Set-up of the Experiments, Experiments with Linear Classifiers (Discussion, FLD Results, LogLC Results), Experiments with CART (Error Report, Diversity)
• Conclusion
Ensemble learning
Ensemble learning is a method that trains several predictors on the same problem and then combines their decisions to predict unseen instances. An ensemble is preferred over a single classifier for its
• Accuracy: a more reliable mapping can be obtained by combining the outputs of multiple “experts”.
• Efficiency: a complex problem can be decomposed into multiple sub-problems that are easier to understand and solve.
General architecture of an ensemble of classifiers
Starting from the original training set T:
Step 1: Create multiple datasets T1, T2, …, TB-1, TB.
Step 2: Build multiple classifiers C1, C2, …, CB-1, CB (the ensemble).
Step 3: Combine the decisions of the classifiers into a single combined classifier CCOM.
(Figure: general ensemble architecture.)
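The three steps can be sketched in a few lines of Python (an illustration only, not the authors' code); the base learner, the number of classifiers B, the bootstrap resampling and the majority-vote combiner are all assumptions chosen to keep the sketch short:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_ensemble(X, y, B=25, seed=0):
    """Steps 1-2: create B resampled training sets and fit one classifier on each."""
    rng = np.random.default_rng(seed)
    n = len(X)
    classifiers = []
    for _ in range(B):
        idx = rng.choice(n, size=n, replace=True)        # one resampled dataset T_b
        classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return classifiers

def predict_ensemble(classifiers, X_new):
    """Step 3: combine the individual decisions (here, by majority vote)."""
    votes = np.array([clf.predict(X_new) for clf in classifiers])   # shape (B, n_samples)
    # assumes integer class labels 0..K-1
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```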
Popular ensemble methods
The following ensemble methods are popular and standard among ensembles of classifiers: Bagging (Breiman, 1994), AdaBoost (Freund and Schapire, 1995), Random Forest (Breiman, 2001), Rotation Forest (Rodríguez et al., 2006). The error of an ensemble can be decomposed into three parts: Error = Intrinsic error + Bias + Variance. Ensemble methods usually reduce the variance and sometimes also the bias.
Review of ensemble methods: properties
Review of ensemble methods: base classifiers
The following base classifiers are used in ensemble methods: Classification and Regression Tree (CART), Neural Network (NN), and Decision Stump (DS), i.e., a CART with only 2-3 nodes.
Review of ensemble methods: combination rules
Majority Vote: selects “Class1”, since two of the three classifiers voted for Class1.
Weighted Majority Vote: selects “Class2”, since it receives the highest total weight.
Average: selects “Class2”, since the average class posterior probability for Class2 (0.53) is higher than for Class1 (0.47).
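A minimal Python sketch of the three rules, operating on a matrix of class posterior probabilities (one row per base classifier, one column per class); the individual posteriors are invented for illustration, chosen only so that the column averages reproduce the 0.47 / 0.53 figures of the slide:

```python
import numpy as np

# posteriors[i, k] = posterior probability of class k from base classifier i
posteriors = np.array([[0.60, 0.40],
                       [0.55, 0.45],
                       [0.26, 0.74]])
weights = np.array([0.2, 0.2, 0.6])          # hypothetical classifier weights

# Majority vote: each classifier votes for its most probable class
votes = posteriors.argmax(axis=1)                          # [0, 0, 1]
majority = np.bincount(votes).argmax()                     # -> 0 ("Class1")

# Weighted majority vote: sum the weights of the classifiers voting for each class
weighted = np.bincount(votes, weights=weights).argmax()    # -> 1 ("Class2")

# Average rule: average the posteriors over classifiers, pick the largest mean
average = posteriors.mean(axis=0).argmax()                 # means [0.47, 0.53] -> 1 ("Class2")
```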
Review of ensemble methods: diversity
FIGURE: Kappa-error diagram of popular ensemble methods, from a general point of view (only the cloud centers are plotted).
The kappa-error diagram is an efficient way to examine the diversity of the base classifiers of an ensemble. In this diagram, the average error of each pair of base classifiers is plotted against the kappa value of that pair; a lower kappa value indicates higher diversity.
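A sketch of how the points of a kappa-error diagram can be computed, assuming the base classifiers' predicted labels and the true labels of a test set are available (Cohen's kappa is taken from scikit-learn; this is not the authors' code):

```python
import numpy as np
from itertools import combinations
from sklearn.metrics import cohen_kappa_score

def kappa_error_points(predictions, y_true):
    """predictions: list of label vectors, one per base classifier.
    Returns one (kappa, average error) point per pair of classifiers,
    i.e. the cloud of points that a kappa-error diagram plots."""
    errors = [np.mean(p != y_true) for p in predictions]
    points = []
    for i, j in combinations(range(len(predictions)), 2):
        kappa = cohen_kappa_score(predictions[i], predictions[j])  # agreement of the pair
        avg_error = (errors[i] + errors[j]) / 2.0                  # y-axis of the diagram
        points.append((kappa, avg_error))
    return points
```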
Design of an ensemble method
To design an ensemble of classifiers one needs to ensure:
• Higher accuracy of the individual base classifiers; this refers to the bias and variance of the prediction error of each base classifier.
• Diversity among the decisions of the base classifiers.
• A good combination rule to combine the decisions of the base classifiers.
Better accuracy can be achieved by training the base classifiers over the whole feature space; to do that we need to construct the base classifiers on larger training sets. To make the base classifiers disagree with each other, each of them should be constructed on an independent training set. Subsampling with a high subsample rate, instead of bootstrapping, satisfies both of these criteria. It is also desirable to use a combination rule that is not susceptible to the imbalanced decisions of any single base classifier.
Trimmean and Spread combination rules
TRIMMEAN: 1. Sort the class posterior probabilities (CPP) for each class. 2. Trim a portion of them and average the remaining probabilities. 3. Select the class with the highest average.
In this example, if we sort the CPP for each class, trim the lowest ones and average the remaining CPP, the result is 0.65 for both classes, so we can select either “Class1” or “Class2”.
SPREAD: 1. Sort the CPP for each class. 2. Take the MAX CPP and MIN CPP of each class. 3. Compute the average of the MAX CPP and MIN CPP. 4. Select the class with the highest average.
In this example the MIN CPP and MAX CPP of Class1 are 0.1 and 0.7, so their average is 0.4; for Class2 this value is 0.6, so we select “Class2”.
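A minimal sketch of the two rules in Python, using the same posterior matrix layout as before (rows = base classifiers, columns = classes); the trim fraction of 20% is an assumption, and, following the slide's example, Trimmean here trims only the lowest posteriors:

```python
import numpy as np

def trimmean_rule(posteriors, trim=0.2):
    """Trimmed-mean rule: sort the CPP of each class over the classifiers,
    drop the lowest fraction `trim`, average the rest, pick the best class."""
    p = np.sort(posteriors, axis=0)              # sort each class column
    k = int(trim * p.shape[0])
    return p[k:].mean(axis=0).argmax()

def spread_rule(posteriors):
    """Spread rule: average the MIN and MAX CPP of each class, pick the best class."""
    mid = (posteriors.min(axis=0) + posteriors.max(axis=0)) / 2.0
    return mid.argmax()
```

For instance, a posterior matrix consistent with the slide's Spread example (the middle row is invented) is np.array([[0.7, 0.3], [0.5, 0.5], [0.1, 0.9]]): spread_rule returns the second class, since the MIN/MAX averages are 0.4 for Class1 and 0.6 for Class2.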
The new ensemble method: overview
The new ensemble method is constructed as follows: use subsampling to generate the training sets for the base classifiers; select classifiers on the basis of a rejection-threshold value; use a robust or order statistic to combine the decisions of the base classifiers. We generate the training set for each base classifier with a subsample rate of 0.75, i.e., each subsample contains 75% of the observations of the original training set; this ensures that each base classifier is trained on a large subsample. The rejection threshold is the leave-one-out cross-validation (LOOCV) error of the base classifier, computed on the original training set; we select only those base classifiers whose generalization error is less than or equal to this value. We use one robust statistic, the “Trimmed Mean”, and one order statistic, the “Spread”, as the combination rules.
The new ensemble method: algorithm
INITIAL PHASE: 0. Compute the LOOCV error of the base classifier C(x) on the training set X; denote it εLOOCV.
TRAINING PHASE (repeat for l = 1, 2, …, L): 1. Generate a subsample Xl from the training set X with 75% of the observations of X. 2. Construct a base classifier Cl on Xl. 3. Compute the error εl* of Cl on an independent bootstrap sample Xl* drawn from X.
SELECTION PHASE: 4. Select the classifier Cl if εl* ≤ εLOOCV; otherwise, train a weaker version of Cl. Denote the selected classifiers as ClVALID.
COMBINATION PHASE: 5. Combine the selected classifiers ClVALID, l = 1, 2, …, L, using the Trimmean or Spread combination rule.
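A compact Python sketch of the whole procedure under stated assumptions: scikit-learn supplies the base learner (CART) and the LOOCV estimate, the “weaker version” fallback is modelled as a depth-limited tree (the slides do not say how the weaker classifier is built), and the Trimmean rule is used in the combination phase:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import LeaveOneOut, cross_val_score

def train_new_ensemble(X, y, L=25, rate=0.75, seed=0):
    rng = np.random.default_rng(seed)
    # Initial phase: LOOCV error of the base classifier on the full training set
    loocv_acc = cross_val_score(DecisionTreeClassifier(), X, y, cv=LeaveOneOut())
    eps_loocv = 1.0 - loocv_acc.mean()

    ensemble, n = [], len(X)
    for _ in range(L):
        # Training phase: subsample 75% of the observations without replacement
        sub = rng.choice(n, size=int(rate * n), replace=False)
        clf = DecisionTreeClassifier().fit(X[sub], y[sub])

        # Selection phase: error on an independent bootstrap sample from X
        boot = rng.choice(n, size=n, replace=True)
        eps_l = np.mean(clf.predict(X[boot]) != y[boot])
        if eps_l > eps_loocv:
            # fallback: a weaker (shallower) classifier -- an assumption, since the
            # slides do not specify how the weaker version is built
            clf = DecisionTreeClassifier(max_depth=2).fit(X[sub], y[sub])
        ensemble.append(clf)
    return ensemble

def predict_new_ensemble(ensemble, X_new, trim=0.2):
    # Combination phase: trimmed mean of the class posteriors (Trimmean rule);
    # assumes every class appears in every subsample so the posterior shapes match
    probs = np.stack([clf.predict_proba(X_new) for clf in ensemble])   # (L, n, K)
    probs = np.sort(probs, axis=0)
    k = int(trim * len(ensemble))
    return probs[k:].mean(axis=0).argmax(axis=1)
```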
Aim and set-up of the experiments
1. Firstly, we wanted to check the performance of the new ensemble method with linear base classifiers. a. We used two linear classifiers: the Fisher Linear Discriminant (FLD) classifier and the Logistic Linear Classifier (LogLC). b. We compared the performance of the new ensemble method with Bagging and AdaBoost. c. We also checked whether the ensembles of these linear classifiers achieved a lower error rate than a single linear classifier.
2. Secondly, we compared the performance of the new ensemble method with Bagging, AdaBoost, Random Forest and Rotation Forest, with Classification And Regression Tree (CART) as the base classifier.
3. Thirdly, we checked the diversity of the proposed ensemble method using the κ-error diagram, with CART as the base classifier, and also the relation between the two combination rules and several diversity measures.
We used 15 datasets from the UCI Machine Learning Repository, and the mean of 10 repetitions of 10-fold cross-validation (CV) to compute the error of all methods.
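The evaluation protocol (mean of 10 repetitions of 10-fold CV) can be sketched as follows; make_classifier is any callable returning a fresh scikit-learn-style classifier, and the unstratified RepeatedKFold splitting is an assumption, since the slides do not specify how the folds were formed:

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

def repeated_cv_error(make_classifier, X, y, reps=10, folds=10, seed=0):
    """Mean misclassification error over `reps` repetitions of `folds`-fold CV."""
    cv = RepeatedKFold(n_splits=folds, n_repeats=reps, random_state=seed)
    errors = []
    for train_idx, test_idx in cv.split(X):
        clf = make_classifier().fit(X[train_idx], y[train_idx])
        errors.append(np.mean(clf.predict(X[test_idx]) != y[test_idx]))
    return float(np.mean(errors))
```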
Experiments with linear classifiers: discussion
Linear classifiers usually have high bias and low variance, so ensembles of linear classifiers generally do not improve much on the error rate of a single linear classifier. In the new ensemble method, due to the selection phase, the base linear classifiers that are generated are more accurate (which implies that they have lower bias than the single classifier). Also, because we train a weaker version of the linear classifier when the threshold is not met, the variance of the base classifiers increases, and bagging-type ensembles are known to reduce the variance of the base classifiers.
Experiments with linear classifiers: results (FLD)
TABLE: Misclassification error of a single FLD and ensembles of FLD.
Experiments with linear classifiers: results (LogLC)
TABLE: Misclassification error of a single LogLC and ensembles of LogLC.
Experiments with CART: error report
TABLE: Misclassification error of ensembles of CART.
Experiments with CART: diversity
TABLE: Correlation between the Trimmean and Spread combination rules and several diversity measures.
It is apparent from the table that there is no substantial correlation between the combination rules and the diversity measures.
Experiments with CART: diversity
FIGURE: Kappa-error diagram for the new ensemble of CART with the Trimmean combination rule (only the cloud centers for the Sonar and Wisconsin datasets are plotted).
Conclusion
The proposed ensemble is able to reduce the error of linear classifiers. It performed better with FLD than with LogLC, and both combination rules performed similarly with the linear classifiers. The new ensemble method produced a lower error rate than Bagging and AdaBoost. With CART, the new ensemble method performed similarly to Bagging and AdaBoost, but worse than Rotation Forest. The Trimmean and Spread rules have low correlation with the diversity measures, from which nothing significant can be concluded. In the κ-error diagram, the new ensemble with Trimmean shows either low diversity with high accuracy or high diversity with low accuracy.