Learn about ensemble methods, bias-variance decomposition, bagging, boosting, and algorithms like AdaBoost. Understand how combining classifiers can improve prediction accuracy.
Ensemble Methods • An ensemble method constructs a set of base classifiers from the training data • Also known as ensemble or classifier combination • Predicts the class label of previously unseen records by aggregating the predictions made by multiple classifiers
Why does it work? • Suppose there are 25 base classifiers • Each classifier has an error rate ε = 0.35 • Assume the classifiers are independent and their predictions are combined by majority vote • Probability that the ensemble classifier makes a wrong prediction (i.e., 13 or more of the 25 classifiers are wrong): Σ_{i=13..25} C(25, i) ε^i (1 − ε)^(25−i) ≈ 0.06
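To make the arithmetic concrete, here is a minimal sketch (assuming majority voting over the 25 independent base classifiers) that computes this probability:

```python
# Ensemble error under majority voting: the ensemble is wrong only when
# 13 or more of the 25 independent base classifiers are wrong.
from math import comb

eps, n = 0.35, 25
ensemble_error = sum(comb(n, i) * eps**i * (1 - eps)**(n - i)
                     for i in range(n // 2 + 1, n + 1))
print(f"ensemble error rate: {ensemble_error:.3f}")  # about 0.06, vs. 0.35 per classifier
```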
Methods • By manipulating the training set: a classifier is built on a sampled subset of the training data. • Two such ensemble methods: bagging (bootstrap aggregating) and boosting.
Characteristics • Ensemble methods work better with unstable classifiers. • Base classifiers that are sensitive to minor perturbations in the training set. • For example, decision trees and ANNs. • The variability among training examples is one of the primary sources of errors in a classifier.
Bias-Variance Decomposition • Consider the trajectory of a projectile launched with force f at angle θ. • Suppose the target is t, but the projectile hits x at a distance d away from t. • The observed distance d can be decomposed into three components: bias, variance, and intrinsic noise.
Two Decision Trees (2) • Bias: the stronger the assumptions a classifier makes about the nature of its decision boundary, the larger its bias will be. • A smaller tree makes stronger assumptions; with too large a bias, the algorithm cannot learn the target concept. • Variance: variability in the training data affects the expected error, because different compositions of the training set may lead to different decision boundaries. • Noise: intrinsic noise in the target class; for some domains the target class is non-deterministic, so identical attribute values may appear with different class labels.
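A small simulation can make the variance component tangible. The sketch below is an illustration only (the synthetic dataset, tree depths, and number of resamples are arbitrary choices): it trains a shallow tree and a fully grown tree on many bootstrap resamples and measures how often their predictions on a fixed set of points change.

```python
# Shallow trees make stronger assumptions (higher bias) but are more stable;
# fully grown trees are flexible but their predictions fluctuate more with
# the composition of the training set (higher variance).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_test = X[:100]                       # fixed evaluation points
rng = np.random.default_rng(0)

for depth in (1, None):                # 1 = decision stump, None = fully grown
    preds = []
    for _ in range(50):
        idx = rng.integers(0, len(X), len(X))          # bootstrap resample
        tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
        tree.fit(X[idx], y[idx])
        preds.append(tree.predict(X_test))
    preds = np.array(preds)
    # fraction of evaluation points on which the resampled trees disagree
    instability = np.mean(preds.std(axis=0) > 0)
    print(f"max_depth={depth}: fraction of unstable predictions = {instability:.2f}")
```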
Bagging • Sampling with replacement • Build a classifier on each bootstrap sample • Each training record has probability 1 − (1 − 1/n)^n of being selected in a given bootstrap sample; when n is large, a bootstrap sample Di therefore contains about 63.2% of the original training data.
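A minimal bagging sketch along these lines, assuming scikit-learn decision trees as the unstable base classifier (the dataset and the 25 rounds are illustrative choices):

```python
# Bagging: draw bootstrap samples, train one base classifier per sample,
# and aggregate their predictions by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, random_state=1)
rng = np.random.default_rng(1)
n, rounds = len(X), 25
classifiers = []

for _ in range(rounds):
    idx = rng.integers(0, n, n)                      # sampling with replacement
    classifiers.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

print("distinct records in last bootstrap sample:",
      len(np.unique(idx)) / n)                       # roughly 0.632

# Majority vote: labels are 0/1, so average the votes and threshold at 0.5
votes = np.array([clf.predict(X) for clf in classifiers])
y_pred = (votes.mean(axis=0) >= 0.5).astype(int)
print("training accuracy of the bagged ensemble:", np.mean(y_pred == y))
```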
A Bagging Example (1) • Consider a one-level binary decision tree (decision stump) with test condition x <= k, where the split point k is chosen to minimize entropy. • Without bagging, the best decision stump splits at • x <= 0.35 or x >= 0.75
Summary on Bagging • Bagging improves generalization error by reducing the variance of the base classifiers. • Bagging depends on the stability of the base classifier. • If a base classifier is unstable, bagging helps to reduce the errors associated with random fluctuations in the training data. • If a base classifier is stable, the error of the ensemble is primarily caused by bias in the base classifier; bagging may even increase the error, because each bootstrap sample contains only about 63% of the distinct training records (an effective sample size roughly 37% smaller).
Boosting • An iterative procedure that adaptively changes the distribution of the training data by focusing more on previously misclassified records • Initially, all N records are assigned equal weights, 1/N • Unlike bagging, the weights may change at the end of each boosting round • The weights can be used by the base classifier to learn a model that is biased toward higher-weight examples.
Boosting • Records that are wrongly classified will have their weights increased • Records that are classified correctly will have their weights decreased • Example 4 is hard to classify • Its weight is increased, therefore it is more likely to be chosen again in subsequent rounds
Example: AdaBoost • Base classifiers: C1, C2, …, CT • Error rate of classifier C_i (with record weights w_j summing to 1): ε_i = Σ_j w_j δ(C_i(x_j) ≠ y_j), where δ(·) = 1 if its argument is true and 0 otherwise • Importance of a classifier: α_i = ½ ln((1 − ε_i) / ε_i)
Example: AdaBoost • Weight update: w_j^(i+1) = (w_j^(i) / Z_i) · exp(−α_i) if C_i(x_j) = y_j, and w_j^(i+1) = (w_j^(i) / Z_i) · exp(α_i) if C_i(x_j) ≠ y_j, where Z_i is a normalization factor ensuring the updated weights sum to 1 • If any intermediate round produces an error rate higher than 50%, the weights are reverted to 1/N and the resampling procedure is repeated • Classification: C*(x) = argmax_y Σ_{i=1..T} α_i δ(C_i(x) = y)
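Putting the pieces together, here is a compact AdaBoost sketch following the update rules above. It assumes decision stumps trained with instance weights rather than explicit resampling; the dataset and the 10 rounds are illustrative choices.

```python
# AdaBoost: reweight records after every round, give each classifier an
# importance alpha_i, and classify by the importance-weighted vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y01 = make_classification(n_samples=300, random_state=2)
y = 2 * y01 - 1                                  # map labels to {-1, +1}
N, T = len(X), 10
w = np.full(N, 1.0 / N)                          # equal initial weights
stumps, alphas = [], []

for _ in range(T):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    eps = np.sum(w * (pred != y))                # weighted error rate
    if eps >= 0.5:                               # too weak: reset the weights
        w = np.full(N, 1.0 / N)
        continue
    alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))   # classifier importance
    w *= np.exp(-alpha * y * pred)               # raise weights of mistakes only
    w /= w.sum()                                 # normalization factor Z_i
    stumps.append(stump)
    alphas.append(alpha)

# Final classification: sign of the importance-weighted vote
scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("training accuracy:", np.mean(np.sign(scores) == y))
```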
A Boosting Example (1) • Consider a one-level binary decision tree x <= k, where the split point k is chosen to minimize entropy. • Without boosting, the best decision stump splits at • x <= 0.35 or x >= 0.75