Ensemble Methods: Bagging and Boosting
Ensemble Paradigm
[Diagram: Training Data → Data1, Data2, …, Data m → Learner1, Learner2, …, Learner m → Model1, Model2, …, Model m → Model Combiner → Final Model]
• Use m different learning styles to learn from one training data set.
• Combine the decisions of multiple classifiers using, e.g., weighted voting.
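As an illustration of the model combiner, here is a minimal Python sketch of weighted voting. The function name `weighted_vote`, the labels, and the weights are hypothetical; the slides do not say how the weights are chosen.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Return the label with the largest total weight.

    predictions: one predicted label per model
    weights: one non-negative weight per model, in the same order
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Pick the label with the highest accumulated weight;
    # on a tie, the label that was seen first wins.
    return max(totals, key=totals.get)

# Three hypothetical models vote on one example.
print(weighted_vote(["grass", "soil", "soil"], [0.9, 0.3, 0.4]))
```

With equal weights this reduces to a simple majority vote.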
Why Ensembles
• Sometimes a learning algorithm is unstable, i.e., a small change in the training set causes a big change in the learned classifier.
• Sometimes there is substantial noise in the training set.
• With an ensemble of classifiers, we do not depend on the decision of a single classifier.
• Disadvantages
  • Time consuming
  • Sometimes over-fits
Homogeneous Ensembles
• Use a single learning style, but manipulate the training data so that it learns multiple models.
  • Data1 ≠ Data2 ≠ … ≠ Data m
  • Learner1 = Learner2 = … = Learner m
• Different methods for changing the training data:
  • Bagging: resample the training data with replacement
  • Boosting: reweight individual training vectors
  • DECORATE: add artificial training data
• In WEKA: Classify=>Choose=>classifiers=>meta. These take a learning algorithm as an argument (the base classifier) and create a meta-classifier.
Bagging
• Create a set of m independent classifiers by randomly resampling the training data.
• Given a training set of size n, create m bootstrap samples of size n’ by drawing n’ examples from the original data with replacement; usually n’ < n.
• If n = n’, each bootstrap sample contains on average 63.2% of the unique training examples (1 − 1/e ≈ 0.632 for large n); the rest of the sample consists of duplicates.
• Combine the m resulting models using a simple majority vote.
• Decreases error by decreasing the variance in the results due to unstable learners, i.e., algorithms (like decision trees) whose output can change dramatically when the training data is slightly changed.
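The bagging recipe can be sketched in plain Python as follows. This is a toy illustration, not Weka's implementation: the base learner (`learn_1nn`, a 1-nearest-neighbour rule on one numeric feature), the synthetic data, and all function names are assumptions made for the example.

```python
import random
from collections import Counter

def bootstrap_sample(data, size, rng):
    """Draw `size` examples from `data` uniformly, with replacement."""
    return [rng.choice(data) for _ in range(size)]

def unique_fraction(n, rng):
    """Fraction of n distinct items that appear in one bootstrap sample of size n."""
    sample = bootstrap_sample(range(n), n, rng)
    return len(set(sample)) / n

def bag(train, learn, m, size, rng):
    """Train m models, each on its own bootstrap sample."""
    return [learn(bootstrap_sample(train, size, rng)) for _ in range(m)]

def majority_vote(models, x):
    """Simple (unweighted) majority vote over the models' predictions."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

def learn_1nn(sample):
    """Toy base learner: 1-nearest-neighbour on a single numeric feature."""
    def predict(x):
        nearest = min(sample, key=lambda ex: abs(ex[0] - x))
        return nearest[1]
    return predict

rng = random.Random(0)
# Synthetic training set: (feature, label) pairs.
train = [(x, "low" if x < 50 else "high") for x in range(100)]
models = bag(train, learn_1nn, m=11, size=len(train), rng=rng)
print(majority_vote(models, 10), majority_vote(models, 90))
# For large n this fraction should come out close to 1 - 1/e (about 0.632).
print(round(unique_fraction(10_000, rng), 3))
```

An odd number of models (here 11) avoids ties in a two-class majority vote.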
Satellite Images Data
• Generated by NASA
• Owned by the Australian Centre for Remote Sensing
• One frame of Landsat imagery consists of 4 digital images of the same scene in 4 different spectral bands.
  • Two of these are in the visible region: green and red
  • Two are in the near infra-red
• A pixel in the image corresponds to 80 m by 80 m of real land
Record format
• Example:
92 115 120 94 84 102 106 79 84 102 102 83 101 126 133 103 92 112 118 85 84 103 104 81 102 126 134 104 88 121 128 100 84 107 113 87 3
• Each line of data corresponds to a 3x3 square neighborhood of pixels
• Each line contains the pixel values in the 4 spectral bands
• 3x3x4 = 36 numbers
• The last number indicates the type of land
• The records are given in random order so that you cannot reconstruct the original landscape
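A short Python sketch of reading one such record. The grouping of the 36 values into nine pixels of four consecutive band values is an assumption about the ordering, which the slide does not state; `parse_record` is a hypothetical helper name.

```python
def parse_record(line):
    """Split one record into nine 4-band pixels and a class label."""
    values = [int(token) for token in line.split()]
    if len(values) != 37:
        raise ValueError(f"expected 37 numbers, got {len(values)}")
    *bands, label = values
    # Assumed layout: 4 consecutive band values per pixel, 9 pixels per record.
    pixels = [tuple(bands[i:i + 4]) for i in range(0, 36, 4)]
    return pixels, label

record = ("92 115 120 94 84 102 106 79 84 102 102 83 101 126 133 103 "
          "92 112 118 85 84 103 104 81 102 126 134 104 88 121 128 100 "
          "84 107 113 87 3")
pixels, label = parse_record(record)
print(len(pixels), pixels[0], label)
```

Under that assumed layout, `pixels[4]` would be the centre pixel of the 3x3 neighborhood.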
Class labels
There are no examples with class 6 in this particular dataset.
The classification for each pixel was performed on the basis of an actual site visit by Ms. Karen Hall, when working for Professor John A. Richards, at the Centre for Remote Sensing at the University of New South Wales, Australia.
Weka’s bagging
• Single classifier
  • Use satellite image training and test data
  • Classify test data using NaiveBayesSimple
  • Observe the outputs
• Bagging
  • Classify=>Choose=>meta=>Bagging
  • Set bagSizePercent to 80
  • Try numIterations = 80
    • Observe error rate
  • Try numIterations = 90
    • Observe error rate
  • …
Boosting
• Produce a sequence of classifiers
• Bagging is easily parallelized; Boosting is not.
• Each classifier depends on the previous one and focuses on the previous one’s errors
• Examples that were incorrectly predicted by previous classifiers are chosen more often or weighted more heavily
Boosting: Basic Algorithm
Training phase:
• Set all training examples to have equal weights.
• For i = 1 to m
  • Build the ith classifier: learn a hypothesis, hi, from the weighted examples.
  • Adjust the weights of the training data for the next classifier by decreasing the weights of the examples that hi classified correctly.
• End For
• m classifiers have been created.
Testing phase:
• Each of the m hypotheses gets a weighted vote, proportional to its accuracy on the training data, to classify an unseen example.
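The loop above can be sketched in Python roughly as follows. This sketch makes several assumptions not stated on the slide: it uses AdaBoost.M1-style updates (multiply the weights of correctly classified examples by beta = err / (1 − err), then renormalise), two classes coded as −1/+1, a threshold stump on one numeric feature as the base learner, and a vote weight of log(1/beta) rather than a weight literally proportional to accuracy. All names are illustrative.

```python
import math

def learn_stump(examples, weights):
    """Pick the threshold/sign stump with the lowest weighted error.

    examples: list of (x, y) with numeric x and y in {-1, +1}
    """
    best = None
    xs = sorted({x for x, _ in examples})
    # Candidate thresholds: below the smallest x, then midway between neighbours.
    thresholds = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    for threshold in thresholds:
        for sign in (+1, -1):
            err = sum(w for (x, y), w in zip(examples, weights)
                      if (sign if x > threshold else -sign) != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    err, threshold, sign = best
    return err, (lambda x: sign if x > threshold else -sign)

def boost(examples, m):
    """Build up to m weighted hypotheses; returns (vote_weight, hypothesis) pairs."""
    n = len(examples)
    weights = [1.0 / n] * n          # start with equal weights
    ensemble = []
    for _ in range(m):
        err, h = learn_stump(examples, weights)
        if err >= 0.5:
            break                    # base learner is no better than chance
        err = max(err, 1e-10)        # avoid division by zero on a perfect stump
        beta = err / (1.0 - err)
        ensemble.append((math.log(1.0 / beta), h))
        # Decrease the weights of correctly classified examples, then renormalise.
        weights = [w * beta if h(x) == y else w
                   for (x, y), w in zip(examples, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all hypotheses."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can separate: +1, then -1, then +1.
data = [(x, 1 if x < 3 or x > 6 else -1) for x in range(10)]
ensemble = boost(data, 3)
print(sum(predict(ensemble, x) != y for x, y in data), "training errors")
```

On this toy data a single stump must misclassify some examples, while the weighted vote of a few stumps can fit the middle band that each stump alone misses.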
King Rook vs King Pawn data
• Chess end-games
• King+Rook versus King+Pawn on a7. The pawn on a7 is one square away from queening.
• King+Rook: White to move
Chess end game analysis tree
• Basically, the machine is trying to learn this decision tree.
• 43 attributes altogether
Record format
• One instance (board position) per line
• A typical example:
f,f,f,f,f,f,f,f,f,f,f,f,l,f,n,f,f,t,f,f,f,f,f,f,f,t,f,f,f,f,f,f,f,t,t,n,won
  • 1st attribute bkblk is false
  • 2nd attribute bknwy is false
  • Etc.
  • Last attribute indicates that White can win ("won")
• White is deemed to be unable to win if the Black pawn can safely advance.
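A small Python sketch of splitting such a record into attribute values and the class label. Only the two attribute names mentioned on the slide (bkblk, bknwy) are used; `parse_position` is a hypothetical helper name.

```python
def parse_position(line):
    """Split one comma-separated record into attribute values and a class label."""
    *attributes, label = line.strip().split(",")
    return attributes, label

record = "f,f,f,f,f,f,f,f,f,f,f,f,l,f,n,f,f,t,f,f,f,f,f,f,f,t,f,f,f,f,f,f,f,t,t,n,won"
attributes, label = parse_position(record)

# Name only the first two attributes, as given on the slide.
named = dict(zip(["bkblk", "bknwy"], attributes))
print(named, label)
```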
Weka’s boosting
• Single classifier
  • Use King Rook vs King Pawn data
  • 10-fold cross-validation using NaiveBayesSimple
  • Observe the error rate
• Boosting
  • Classify=>Choose=>meta=>AdaBoostM1
  • Set the base classifier parameter of AdaBoostM1 to NaiveBayesSimple
  • Observe the error rate
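For readers unfamiliar with k-fold cross-validation, the idea can be sketched in Python as below. This is a plain, non-stratified split written for illustration, not a reproduction of Weka's procedure; the majority-class base learner, the synthetic data, and all names are assumptions.

```python
import random
from collections import Counter

def k_fold_indices(n, k, rng):
    """Yield (train_idx, test_idx) pairs; every index is tested exactly once."""
    indices = list(range(n))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

def cross_val_error(examples, learn, k, rng):
    """Fraction of examples misclassified by a model that did not train on them."""
    errors = 0
    for train_idx, test_idx in k_fold_indices(len(examples), k, rng):
        model = learn([examples[j] for j in train_idx])
        errors += sum(model(examples[j][0]) != examples[j][1] for j in test_idx)
    return errors / len(examples)

def learn_majority(train):
    """Toy base learner: always predict the most common training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label

# Synthetic two-class data: 60 "won" and 40 "nowin" examples.
examples = [(i, "won") for i in range(60)] + [(i, "nowin") for i in range(40)]
print(cross_val_error(examples, learn_majority, k=10, rng=random.Random(0)))
```

Comparing this error rate for a single classifier and for its boosted version is the experiment the slide describes.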
Experimental Results on Ensembles
• Ensembles have been used to improve generalization accuracy on a wide variety of problems.
• Bagging is almost always better than a single decision tree or ANN (artificial neural net).
• On average, Boosting provides a larger increase in accuracy than Bagging.
• Boosting can on some occasions degrade accuracy, particularly when there is significant noise in the training data. Bagging more consistently provides a modest improvement.
• Boosting is particularly subject to over-fitting.