Lecture 10. Trees and Boosting Instructed by Jinzhu Jia
Outline • Tree-based methods • CART • MARS • Boosting
Tree-based methods • Tree-based methods partition the feature space into a set of rectangles, and then fit a simple model (like a constant) in each one.
Regression Trees • P inputs and a response: (x_i, y_i), i = 1, 2, ..., N • Goal: automatically decide on the splitting variables and split points, and also the topology (shape) of the tree • Algorithm: suppose first that we know the partition R_1, R_2, ..., R_M
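Given the partition, the standard CART regression fit is a constant in each region; a minimal LaTeX statement of this setup, assuming squared-error loss:

```latex
% Regression tree model for a known partition R_1, ..., R_M
f(x) = \sum_{m=1}^{M} c_m \, I(x \in R_m),
\qquad
\hat{c}_m = \operatorname{ave}\left( y_i \mid x_i \in R_m \right)
```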
Regression Trees • Greedy procedure: at each step, choose the splitting variable and split point that give the largest decrease in squared error (see the sketch below)
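A minimal sketch of one greedy split search, assuming squared-error loss and NumPy; the function name best_split and its return convention are illustrative, not from the lecture.

```python
import numpy as np

def best_split(X, y):
    """Greedy search for the (variable j, split point s) pair that
    minimizes the total squared error of region-wise constant fits."""
    n, p = X.shape
    best = (None, None, np.inf)          # (j, s, loss)
    for j in range(p):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            if len(left) == 0 or len(right) == 0:
                continue
            # Each half of the split is fit with its mean response.
            loss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if loss < best[2]:
                best = (j, s, loss)
    return best  # split the node on X[:, j] <= s
```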
Tree size • A large tree might over-fit the data • A small tree might not capture the important structure • Tree size is a tuning parameter • Stop the splitting process only when some minimum node size (say 5) is reached • Then this large tree is pruned using cost-complexity pruning
Pruning • Define a subtree of the large tree to be any tree that can be obtained by pruning it, that is, by collapsing any number of its internal nodes. • Index terminal nodes by m; |T| denotes the number of terminal nodes in T.
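A sketch of the standard cost-complexity criterion used for pruning, where N_m is the number of observations in terminal node m, Q_m(T) is that node's mean squared error, and alpha trades off fit against tree size:

```latex
% Cost-complexity criterion: alpha >= 0 penalizes the number of terminal nodes |T|
C_\alpha(T) = \sum_{m=1}^{|T|} N_m \, Q_m(T) + \alpha \, |T|,
\qquad
Q_m(T) = \frac{1}{N_m} \sum_{x_i \in R_m} \left( y_i - \hat{c}_m \right)^2
```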
Classification Trees • The target variable is a classification outcome taking values 1, 2, ..., K • The only changes needed are the criteria for splitting nodes and pruning the tree. • Measures of node impurity: misclassification error, the Gini index, and cross-entropy (see below)
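The three standard impurity measures, written with p_hat_{mk} denoting the proportion of class-k observations in node m:

```latex
% Node impurity measures for classification trees
\text{Misclassification error: } 1 - \hat{p}_{m k(m)}, \quad k(m) = \arg\max_k \hat{p}_{mk} \\
\text{Gini index: } \sum_{k=1}^{K} \hat{p}_{mk} \left( 1 - \hat{p}_{mk} \right) \\
\text{Cross-entropy (deviance): } -\sum_{k=1}^{K} \hat{p}_{mk} \log \hat{p}_{mk}
```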
MARS • Multivariate Adaptive Regression Splines • Well suited for high-dimensional problems • A generalization of stepwise linear regression • It uses expansions in piecewise linear basis functions of the form (x - t)_+ and (t - x)_+
MARS • A reflected pair: the two functions (x - t)_+ = max(x - t, 0) and (t - x)_+ = max(t - x, 0), each piecewise linear with a knot at the value t
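A minimal sketch of the reflected-pair (hinge) basis in NumPy; the function name reflected_pair and the example knot t = 0.5 are illustrative.

```python
import numpy as np

def reflected_pair(x, t):
    """Return the MARS reflected pair (x - t)_+ and (t - x)_+ with knot t."""
    return np.maximum(x - t, 0.0), np.maximum(t - x, 0.0)

# Example: a pair of piecewise linear basis functions with a knot at t = 0.5
x = np.linspace(0.0, 1.0, 5)
h_plus, h_minus = reflected_pair(x, 0.5)
```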
MARS • The model has the form f(X) = beta_0 + sum_{m=1}^{M} beta_m h_m(X), where each h_m(X) is a function in C (the collection of reflected pairs), or a product of two or more such functions. • Adaptive way to add basis functions: at each stage, consider as candidates all products of a term already in the model with a new reflected pair, and add the candidate that reduces the residual error the most
MARS • The size of the model (the number of terms) matters • M(lambda) is the effective number of parameters in the model: this accounts both for the number of terms and for the number of parameters used in selecting the positions of the knots
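Model size is chosen by generalized cross-validation; a sketch of the standard criterion, where f_hat_lambda is the fitted model with lambda terms, M(lambda) is its effective number of parameters, and N is the sample size:

```latex
% Generalized cross-validation for choosing the MARS model size
\mathrm{GCV}(\lambda) =
\frac{\sum_{i=1}^{N} \left( y_i - \hat{f}_\lambda(x_i) \right)^2}
     {\left( 1 - M(\lambda)/N \right)^2}
```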
Boosting • Originally designed for classification problems • Can be extended to regression problems • Motivation: combine the outputs of many weak classifiers to produce a powerful “committee”
Adaboost • Consider a two-class problem with output Y in {-1, +1} • G(X) is the classifier • Error rate on the training sample: err = (1/N) sum_{i=1}^{N} I(y_i != G(x_i)) • Weak classifier: one whose error rate is only slightly better than random guessing
Adaboost • Boosting sequentially applies the weak classification algorithm to repeatedly modified versions of the data, thereby producing a sequence of weak classifiers G_m(x), m = 1, 2, ..., M
Adaboost • The AdaBoost.M1 algorithm (Freund and Schapire, 1997); see the sketch below
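A minimal sketch of AdaBoost.M1 with decision stumps, assuming scikit-learn is available; the function name adaboost_m1, the number of rounds M, and the error clipping are illustrative choices, not part of the lecture.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1(X, y, M=50):
    """AdaBoost.M1 with stumps; y must take values in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                         # observation weights
    stumps, alphas = [], []
    for _ in range(M):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        miss = (pred != y)
        err = np.clip(np.sum(w * miss) / np.sum(w), 1e-10, 1 - 1e-10)
        alpha = np.log((1 - err) / err)             # classifier weight
        w *= np.exp(alpha * miss)                   # up-weight misclassified points
        stumps.append(stump)
        alphas.append(alpha)

    def G(X_new):
        # Weighted majority vote of the weak classifiers
        scores = sum(a * s.predict(X_new) for a, s in zip(alphas, stumps))
        return np.sign(scores)
    return G
```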
Examples • 2,000 training observations and 10,000 test observations • The weak classifier is just a stump: a two-terminal-node classification tree • Error rate: 48.5%
Boosting Fits an Additive Model • Boosting is a way of fitting an additive expansion in a set of elementary “basis” functions • For AdaBoost, the basis functions are the weak classifiers G_m(x) • More generally, an additive model has the form f(x) = sum_{m=1}^{M} beta_m b(x; gamma_m)
Exponential Loss and AdaBoost • AdaBoost is equivalent to forward stagewise additive modeling using the exponential loss function L(y, f(x)) = exp(-y f(x)) • Forward step: (beta_m, G_m) = argmin_{beta, G} sum_{i=1}^{N} exp[-y_i (f_{m-1}(x_i) + beta G(x_i))]
Exp Loss and AdaBoost • Derivation in steps 2(a)–2(d)
Why Exp Loss? • Computational reasons • It leads to the simple reweighting scheme of AdaBoost • Question: what does AdaBoost estimate? • Modeling: what is the population minimizer of the exponential loss? (see below)
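The standard answer to the question on this slide: the population minimizer of the exponential loss is one-half the log-odds, so AdaBoost estimates (half) the log-odds of class membership.

```latex
% Population minimizer of the exponential loss
f^*(x) = \arg\min_{f(x)} \mathrm{E}_{Y \mid x}\!\left[ e^{-Y f(x)} \right]
       = \frac{1}{2} \log \frac{\Pr(Y = 1 \mid x)}{\Pr(Y = -1 \mid x)}
```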
Loss Functions and Robustness • For regression: Huber loss
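A sketch of the Huber loss for regression, quadratic for small residuals and linear for large ones (the threshold delta controls where the transition happens):

```latex
% Huber loss: robust to outliers in the response
L(y, f(x)) =
\begin{cases}
\left[ y - f(x) \right]^2, & |y - f(x)| \le \delta, \\
2\delta \, |y - f(x)| - \delta^2, & \text{otherwise.}
\end{cases}
```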
Boosting for Regression • Iteratively fit the residuals (see the sketch below)
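A minimal sketch of least-squares boosting with regression stumps, assuming scikit-learn; the function name ls_boost and the shrinkage value are illustrative choices.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def ls_boost(X, y, M=100, shrinkage=0.1):
    """Least-squares boosting: each tree is fit to the current residuals."""
    f = np.zeros(len(y))                  # start from the zero function
    trees = []
    for _ in range(M):
        residuals = y - f
        tree = DecisionTreeRegressor(max_depth=1)
        tree.fit(X, residuals)            # fit a stump to the residuals
        f += shrinkage * tree.predict(X)  # take a small step toward the residual fit
        trees.append(tree)

    def predict(X_new):
        return shrinkage * sum(t.predict(X_new) for t in trees)
    return predict
```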
Exercise (Not Homework) • 1. Reproduce Figure 10.2