Random Forest. Boosting (1). Ensemble learning Reminder - Bagging of Trees Random Forest. Adaboost. Ensemble learning. Aggregating a group of classifiers (“base classifiers”) as an ensemble committee and making the prediction by consensus.
Random Forest, Boosting (1) Outline: Ensemble learning; Reminder: Bagging of Trees; Random Forest; AdaBoost
Ensemble learning Aggregating a group of classifiers (“base classifiers”) as an ensemble committee and making the prediction by consensus. Weak learner ensembles: each base learner has a high expected prediction error (EPE), but is easy to train. Current Bioinformatics, 5(4):296-308, 2010.
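As a rough illustration of why a consensus of weak learners can help, the sketch below computes the accuracy of a majority vote over B base classifiers. It rests on a strong assumption that the slides do not make: the classifiers err independently and each is correct with the same probability p. The function name `majority_vote_accuracy` is hypothetical.

```python
from math import comb

def majority_vote_accuracy(p, B):
    """P(majority of B classifiers is correct), for odd B.

    Assumption: each classifier is correct with probability p,
    independently of the others (rarely true in practice).
    """
    if B % 2 == 0:
        raise ValueError("use an odd B so that the vote cannot tie")
    k_min = B // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(B, k) * p**k * (1 - p)**(B - k)
               for k in range(k_min, B + 1))

# Accuracy of the committee as it grows, for base accuracy 0.6.
for B in (1, 5, 25, 101):
    print(B, round(majority_vote_accuracy(0.6, B), 4))
```

With correlated base learners the gain is smaller, which is the motivation for the de-correlation idea behind random forests.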
Ensemble learning Strong learner ensembles (“Stacking” and beyond). Current Bioinformatics, 5(4):296-308, 2010.
Ensemble learning Why? (1) Statistical A learning algorithm searches a space of hypotheses for the best fit to the data. With insufficient data (almost always the case), the algorithm can find many equally good solutions. Averaging them reduces the risk of choosing a poor one. Thomas G. Dietterich, “Ensemble Methods in Machine Learning”
Ensemble learning Why? (2) Computational Modern learning algorithms pose complicated optimization problems, and a search often cannot guarantee a global optimum. An ensemble can be seen as running the search from many starting points. Thomas G. Dietterich, “Ensemble Methods in Machine Learning”
Ensemble learning Why? (3) Representational The true function may not be representable by any single hypothesis in the space. An ensemble expands the space of representable functions. Thomas G. Dietterich, “Ensemble Methods in Machine Learning”
Reminder - Bootstrapping Directly assess uncertainty from the training data. Basic thinking: assuming the data approximate the true underlying density, re-sampling from the data gives us an idea of the uncertainty caused by sampling.
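The resampling idea can be sketched in a few lines. The example below estimates the standard error of a sample mean by bootstrapping and prints it next to the textbook formula s/sqrt(n). The data are simulated, and the helper name `bootstrap_se`, the number of resamples, and the seeds are illustrative assumptions.

```python
import random
import statistics

def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    """Bootstrap estimate of the standard error of `statistic` on `data`."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Resample n points with replacement from the observed data.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    # The spread of the replicates approximates the sampling uncertainty.
    return statistics.stdev(replicates)

# Simulated data standing in for a training sample.
rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]

se_boot = bootstrap_se(sample, statistics.mean)
se_formula = statistics.stdev(sample) / len(sample) ** 0.5
print(se_boot, se_formula)
```

The same function works for statistics with no closed-form standard error, such as `statistics.median`.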
Bagging “Bootstrap aggregation.” Resample the training dataset, build a prediction model on each resampled dataset, and average the predictions: $\hat{f}_{bag}(x) = \frac{1}{B}\sum_{b=1}^{B} \hat{f}^{*b}(x)$. This is a Monte Carlo estimate of $E_{\hat{P}}\,\hat{f}^{*}(x)$, where $\hat{P}$ is the empirical distribution putting equal probability 1/N on each of the data points. Bagging only differs from the original estimate when f() is a non-linear or adaptive function of the data. When f() is a linear function of the data, the bagged estimate converges to the original estimate as $B \to \infty$. A tree is a perfect candidate for bagging: each bootstrap tree will differ in structure.
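The contrast between linear and non-linear estimators can be checked numerically. In the sketch below, bagging the sample mean (linear in the data) reproduces the original mean up to Monte Carlo noise. Bagging a hard threshold on the mean (non-linear) turns a 0/1 decision into a fraction. The toy data, the threshold 4.4, and the helper names are assumptions made for illustration.

```python
import random
import statistics

def bagged(data, estimator, n_boot=2000, seed=0):
    """Average `estimator` over bootstrap resamples of `data`."""
    rng = random.Random(seed)
    n = len(data)
    values = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        values.append(estimator(resample))
    return statistics.mean(values)

def above_threshold(sample):
    # Non-linear estimator: a hard 0/1 decision on the sample mean.
    return 1.0 if statistics.mean(sample) > 4.4 else 0.0

data = [float(i) for i in range(10)]  # toy data with mean 4.5

linear_original = statistics.mean(data)
linear_bagged = bagged(data, statistics.mean)      # stays near 4.5
hard_original = above_threshold(data)              # a hard decision
hard_bagged = bagged(data, above_threshold)        # smoothed by bagging

print(linear_original, linear_bagged)
print(hard_original, hard_bagged)
```

Trees behave like the thresholded estimator: small changes in the resample can change the splits, so averaging over resamples changes the prediction.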
Bagging trees Bagged trees are of different structure.
Random Forest Bagging can be seen as a method to reduce the variance of an estimated prediction function. It mostly helps high-variance, low-bias classifiers. In comparison, boosting builds weak classifiers one by one, allowing the collection to evolve in the right direction. Random forest is a substantial modification of bagging that builds a collection of de-correlated trees. Its performance is similar to boosting, and it is simpler to train and tune.
Random Forest The intuition: the average of random variables. For B i.i.d. random variables, each with variance $\sigma^2$, the mean has variance $\sigma^2 / B$. For B i.d. (identically distributed but not independent) random variables, each with variance $\sigma^2$ and positive pairwise correlation $\rho$, the mean has variance $\rho\sigma^2 + \frac{1-\rho}{B}\sigma^2$. Bagged trees are similar to i.d. samples. Random forest aims at reducing the correlation in order to reduce the variance. This is achieved by random selection of variables.
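The variance of an average of correlated variables can be checked by simulation. The sketch below compares the closed-form expression ρσ² + (1 − ρ)σ²/B with a Monte Carlo estimate. The correlated variables are built from a shared Gaussian component, a construction chosen here only for convenience; function names and settings are illustrative.

```python
import math
import random
import statistics

def variance_of_mean(sigma2, rho, B):
    """Variance of the mean of B identically distributed variables
    with variance sigma2 and pairwise correlation rho."""
    return rho * sigma2 + (1.0 - rho) / B * sigma2

def simulated_variance_of_mean(rho, B, n_rep=20000, seed=0):
    """Monte Carlo check with unit-variance Gaussians, 0 <= rho <= 1."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_rep):
        shared = rng.gauss(0.0, 1.0)  # component common to all B variables
        xs = [math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
              for _ in range(B)]
        means.append(sum(xs) / B)
    return statistics.pvariance(means)

for B in (1, 10, 100):
    print(B, variance_of_mean(1.0, 0.0, B), variance_of_mean(1.0, 0.5, B))

print(simulated_variance_of_mean(0.5, 10), variance_of_mean(1.0, 0.5, 10))
```

With ρ > 0 the first term does not shrink as B grows, so adding more trees cannot push the variance below ρσ². Lowering ρ is the only way to reduce that floor.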
Random Forest Benefit of RF: the out-of-bag (OOB) samples give a cross-validation-like error estimate. For sample i, compute its RF prediction error using only the trees built from bootstrap samples in which sample i did not appear. The OOB error rate is close to the N-fold cross-validation error rate. Unlike many other nonlinear estimators, RF can be fit in a single sequence, with the error estimate obtained along the way. Stop growing the forest when the OOB error stabilizes.
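A quick way to see why the OOB estimate is usable is to count how often a sample is left out of a bootstrap resample. The sketch below compares the simulated out-of-bag fraction with the exact probability (1 − 1/N)^N, which is close to e⁻¹ ≈ 0.37. The helper name `oob_fraction` and the settings are illustrative.

```python
import random

def oob_fraction(n, n_trees=200, seed=0):
    """Average fraction of the n samples left out of a bootstrap resample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trees):
        # Indices drawn with replacement: the "bag" for one tree.
        in_bag = {rng.randrange(n) for _ in range(n)}
        total += (n - len(in_bag)) / n
    return total / n_trees

n = 500
expected = (1 - 1 / n) ** n  # probability that a given sample is out of bag
simulated = oob_fraction(n)
print(simulated, expected)
```

Each sample is therefore out of bag for roughly a third of the trees, which is enough to form a prediction for it without a separate validation run.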
Random Forest Variable importance: find the most relevant predictors. At every split of every tree, the splitting variable contributes an improvement in the impurity measure. Accumulating the reduction of i(N) for every variable over all splits and trees gives a measure of the relative importance of the variables. The predictors that appear most often at split points, and lead to the largest reduction of impurity, are the important ones. Another method: permute the values of one predictor in the OOB samples of every tree; the resulting decrease in prediction accuracy is also a measure of that predictor's importance. Accumulate it over all trees.
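The permutation idea can be sketched without a forest. The example below shuffles one column at a time and records the drop in accuracy of a fixed classifier. It is a simplification: a random forest would do this per tree on that tree's OOB samples, whereas here a single stand-in model and one simulated dataset are used. All names are hypothetical.

```python
import random

def accuracy(model, X, y):
    return sum(model(row) == yi for row, yi in zip(X, y)) / len(y)

def permutation_importance(model, X, y, seed=0):
    """Drop in accuracy when each column of X is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    drops = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between column j and y
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        drops.append(baseline - accuracy(model, X_perm, y))
    return drops

# Simulated data: the label depends on column 0 only; column 1 is noise.
rng = random.Random(1)
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(400)]
y = [1 if row[0] > 0 else -1 for row in X]

def model(row):
    # Stand-in for a fitted classifier; it uses column 0 and ignores column 1.
    return 1 if row[0] > 0 else -1

drops = permutation_importance(model, X, y)
print(drops)
```

Shuffling the informative column costs accuracy, while shuffling the ignored column changes nothing.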
Random Forest Finding interactions between variables? $Y = \sin(2V_2) + V_5^2 + V_2 V_5 + V_8 V_9 + |V_9|$
Boosting Construct a sequence of weak classifiers, and combine them into a strong classifier by a weighted majority vote. “Weak”: better than random coin-tossing. Some properties: flexible; able to do feature selection; good generalization; could fit noise.
Boosting Adaboost:
Boosting “A Tutorial on Boosting”, Yoav Freund and Rob Schapire
Boosting AdaBoost uses two kinds of weights. One is the weight of the current weak classifier in the final model. The other is attached to individual observations; notice that it accumulates from step 1 onward. If an observation is correctly classified at this step, its weight does not change. If it is incorrectly classified, its weight increases.
Boosting Example with 10 predictors. The weak classifier is a stump: a two-level tree with a single split and two terminal nodes.
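The weighting scheme described above can be sketched with stumps as weak classifiers. The code follows the common AdaBoost.M1 formulation, in which the classifier weight is log((1 − err)/err) and misclassified observations are scaled by its exponential. This is an assumption about which variant the slides use. The data are a one-dimensional toy problem rather than the 10-predictor example, and all function names are hypothetical.

```python
import math

def stump_predict(x, thr, sign):
    # One split on x: predict `sign` above thr and `-sign` at or below it.
    return sign if x > thr else -sign

def fit_stump(xs, ys, w):
    """Return (weighted_error, thr, sign) of the stump with lowest weighted error."""
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if stump_predict(xi, thr, sign) != yi)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

def adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n  # observation weights, uniform at step 1
    model = []
    for _ in range(rounds):
        err, thr, sign = fit_stump(xs, ys, w)
        err = err / sum(w)                     # weighted error rate
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = math.log((1 - err) / err)      # weight of this weak classifier
        model.append((alpha, thr, sign))
        # Misclassified observations are scaled up; correct ones are unchanged.
        w = [wi * math.exp(alpha) if stump_predict(xi, thr, sign) != yi else wi
             for xi, yi, wi in zip(xs, ys, w)]
    return model

def predict(model, x):
    # Weighted majority vote of the weak classifiers.
    score = sum(alpha * stump_predict(x, thr, sign) for alpha, thr, sign in model)
    return 1 if score >= 0 else -1

def train_accuracy(model, xs, ys):
    return sum(predict(model, x) == y for x, y in zip(xs, ys)) / len(xs)

# Toy data: label +1 inside (-1, 1), -1 outside; no single stump can fit this.
xs = [-3 + 6 * i / 59 for i in range(60)]
ys = [1 if abs(x) < 1 else -1 for x in xs]

single = adaboost(xs, ys, rounds=1)
boosted = adaboost(xs, ys, rounds=80)
print(train_accuracy(single, xs, ys), train_accuracy(boosted, xs, ys))
```

The observation weights are left unnormalised so that correctly classified points keep exactly the same weight, as stated above. Dividing by the total weight when computing the error rate has the same effect as renormalising.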