Lecture 10. Trees and Boosting

Presentation Transcript


  1. Lecture 10. Trees and Boosting. Instructed by Jinzhu Jia

  2. Outline • Tree based methods • CART • MARS • Boosting

  3. Tree based methods • Tree-based methods partition the feature space into a set of rectangles, and then fit a simple model (like a constant) in each one.

  4. Tree based methods

  5. Regression Trees • Data: p inputs and a response, (x_i, y_i), i = 1, 2, ..., N • Goal: automatically decide on the splitting variables and split points, and also the topology (shape) of the tree • Alg: suppose first that we know the partition R_1, R_2, ..., R_M; then model the response as a constant c_m in each region, f(x) = sum_m c_m I(x in R_m), where under squared-error loss the best c_m is simply the average of the y_i in region R_m

  6. Regression Trees • Greedy procedure: for a splitting variable j and split point s, define the half-planes R_1(j, s) = {X | X_j <= s} and R_2(j, s) = {X | X_j > s}; scan over all pairs (j, s) to find the one that minimizes the total squared error of the two regions, then repeat on each resulting region (a sketch follows below)
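
A minimal sketch of one step of this greedy search (NumPy assumed; the function name best_split is illustrative, not from the lecture):

```python
import numpy as np

def best_split(X, y):
    """Scan all (variable, split point) pairs and return the pair that
    minimizes the total squared error of the two resulting regions."""
    n, p = X.shape
    best_j, best_s, best_sse = None, None, np.inf
    for j in range(p):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            if len(left) == 0 or len(right) == 0:
                continue
            # Each region is fit with a constant: the mean response in that region.
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best_sse:
                best_j, best_s, best_sse = j, s, sse
    return best_j, best_s, best_sse
```

Growing the tree amounts to applying this search recursively to the two regions produced by the best split.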

  7. Tree size • A large tree might over-fit the data • A small tree might not capture the important structure • Tree size is a tuning parameter • Stop the splitting process only when some minimum node size (say 5) is reached • This large tree is then pruned using cost-complexity pruning

  8. Pruning • Define a subtree to be any tree that can be obtained by pruning the large tree, that is, by collapsing any number of its internal nodes • Index terminal nodes by m; |T| denotes the number of terminal nodes in T • The cost-complexity criterion is C_alpha(T) = sum_m N_m Q_m(T) + alpha |T|, where N_m is the number of observations in node m, Q_m(T) is its within-node squared error (or impurity), and the tuning parameter alpha >= 0 trades off tree size against goodness of fit
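
A brief illustration of this grow-then-prune recipe, sketched here with scikit-learn (an assumed library choice, not named in the lecture): grow a large tree with minimum node size 5, compute the cost-complexity pruning path, and pick the value of alpha that does best on held-out data.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic data purely for illustration.
X, y = make_regression(n_samples=500, n_features=5, noise=1.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Grow a large tree, stopping only at a minimum node size of 5.
big_tree = DecisionTreeRegressor(min_samples_leaf=5, random_state=0).fit(X_tr, y_tr)

# Candidate values of the complexity parameter alpha for cost-complexity pruning.
path = big_tree.cost_complexity_pruning_path(X_tr, y_tr)

# Refit the pruned subtree for each alpha and keep the one that scores best on held-out data.
scores = [DecisionTreeRegressor(min_samples_leaf=5, ccp_alpha=a, random_state=0)
          .fit(X_tr, y_tr).score(X_te, y_te) for a in path.ccp_alphas]
best_alpha = path.ccp_alphas[scores.index(max(scores))]
```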

  9. Classification Trees • The target variable is a classification outcome taking values 1, 2, ..., K • The only changes needed are the criteria for splitting nodes and pruning the tree • Measures of node impurity: misclassification error, the Gini index, and cross-entropy (deviance)
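
A small sketch of these three impurity measures for a node with class proportions p_hat_k (NumPy assumed; the helper name node_impurity is illustrative):

```python
import numpy as np

def node_impurity(y, n_classes):
    """Impurity measures for a node containing integer class labels y in {0, ..., K-1}."""
    p = np.bincount(y, minlength=n_classes) / len(y)       # class proportions p_hat_k
    misclassification = 1.0 - p.max()                       # 1 - max_k p_hat_k
    gini = np.sum(p * (1.0 - p))                            # sum_k p_hat_k * (1 - p_hat_k)
    cross_entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))    # -sum_k p_hat_k * log(p_hat_k)
    return misclassification, gini, cross_entropy
```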

  10. MARS • Multivariate Adaptive Regression Splines • Well suited for high-dimensional problems • A generalization of stepwise linear regression • It uses expansions in piecewise-linear basis functions of the form (x − t)_+ and (t − x)_+, with knots t placed at observed values of each input

  11. MARS • The two functions (x − t)_+ and (t − x)_+ form a reflected pair: each is piecewise linear and zero on one side of the knot t
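
A minimal sketch of such a reflected pair (NumPy assumed; hinge_pair is an illustrative name), showing the two piecewise-linear hinges that share a knot t:

```python
import numpy as np

def hinge_pair(x, t):
    """Reflected pair with knot t: (x - t)_+ and (t - x)_+.
    Each function is piecewise linear and zero on one side of the knot."""
    return np.maximum(x - t, 0.0), np.maximum(t - x, 0.0)

# MARS forms one such pair for every input X_j, with knots t at each
# observed value x_ij of that input.
```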

  12. MARS • The model has the form f(X) = beta_0 + sum_m beta_m h_m(X), where each h_m(X) is a function in the collection C of reflected pairs, or a product of two or more such functions • Adaptive way to add basis functions: at each step, consider products of a term already in the model with a reflected pair from C, and add the pair that gives the largest decrease in training error

  13. MARS

  14. MARS • The size of the model matters: model selection uses generalized cross-validation, GCV(lambda) = sum_i (y_i − f_hat_lambda(x_i))^2 / (1 − M(lambda)/N)^2 • M(lambda) is the effective number of parameters in the model: this accounts both for the number of terms and for the number of parameters used in selecting the positions of the knots
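
A short sketch of this criterion (the function name gcv is illustrative; the formula is the generalized cross-validation score described above):

```python
import numpy as np

def gcv(y, y_hat, m_eff):
    """Generalized cross-validation: residual sum of squares divided by
    (1 - M/N)^2, where m_eff = M is the effective number of parameters."""
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return rss / (1.0 - m_eff / n) ** 2
```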

  15. Boosting • Originally designed for classification problems • Can be extended to regression problems • Motivation: combines the output of many weak classifiers to produce a powerful “committee”

  16. AdaBoost • Consider a two-class problem with output Y in {−1, 1} • G(X) is the classifier • Error rate on the training sample: err = (1/N) sum_i I(y_i != G(x_i)) • Weak classifier: error rate is only slightly better than random guessing

  17. AdaBoost • Boosting sequentially applies the weak classification algorithm to repeatedly modified versions of the data, thereby producing a sequence of weak classifiers G_m(x), m = 1, 2, ..., M • The predictions are then combined through a weighted majority vote: G(x) = sign(sum_m alpha_m G_m(x))

  18. AdaBoost • The AdaBoost.M1 algorithm (Freund and Schapire, 1997): at each step, fit a classifier G_m to the weighted data, compute its weighted error err_m and weight alpha_m = log((1 − err_m)/err_m), then multiply the weights of the misclassified observations by exp(alpha_m) and renormalize (a sketch follows below)
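
A compact sketch of AdaBoost.M1 with two-node stumps as the weak learners (scikit-learn stumps are an assumption made here for brevity; any weak classifier would do):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_m1(X, y, n_rounds=400):
    """AdaBoost.M1 for labels y in {-1, +1}, using stumps as weak learners."""
    n = len(y)
    w = np.full(n, 1.0 / n)                        # observation weights, start uniform
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        miss = (pred != y)
        err = np.clip(np.sum(w * miss) / np.sum(w), 1e-16, 1 - 1e-16)  # weighted error rate
        alpha = np.log((1.0 - err) / err)          # classifier weight
        w = w * np.exp(alpha * miss)               # up-weight misclassified observations
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """Weighted majority vote: G(x) = sign(sum_m alpha_m G_m(x))."""
    return np.sign(sum(a * s.predict(X) for a, s in zip(alphas, stumps)))
```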

  19. Examples • 2,000 training observations and 10,000 test observations • The weak classifier is just a stump: a two-terminal-node classification tree • Error rate: 48.5%

  20. Boosting Fits an Additive Model • Boosting is a way of fitting an additive expansion in a set of elementary “basis” functions • For AdaBoost, the basis functions are the weak classifiers G_m(x) • More generally, additive model: f(x) = sum_{m=1}^{M} beta_m b(x; gamma_m), with expansion coefficients beta_m and simple basis functions b(x; gamma) parameterized by gamma

  21. Forward Stagewise Additive Modeling • Start with f_0(x) = 0 • For m = 1 to M: find (beta_m, gamma_m) minimizing sum_i L(y_i, f_{m−1}(x_i) + beta b(x_i; gamma)), then set f_m(x) = f_{m−1}(x) + beta_m b(x; gamma_m) • Terms added earlier are never readjusted

  22. Exponential Loss and AdaBoost • AdaBoost is equivalent to forward stagewise additive modeling using the exponential loss function L(y, f(x)) = exp(−y f(x)) • Forward step: (beta_m, G_m) = argmin over (beta, G) of sum_i exp(−y_i (f_{m−1}(x_i) + beta G(x_i)))

  23. Exponential Loss and AdaBoost (derivation figure, panels 2(a)–2(d))

  24. Why Exponential Loss? • Computational reason: it leads to the simple reweighting scheme of AdaBoost • Question: what does AdaBoost estimate? • Modeling answer: the population minimizer of the exponential loss is f*(x) = (1/2) log [Pr(Y = 1 | x) / Pr(Y = −1 | x)], so AdaBoost estimates one-half the log-odds of class membership

  25. Loss functions and Robustness

  26. Loss functions and Robustness • For regression, squared-error loss is sensitive to outliers; the Huber loss is more robust: L(y, f(x)) = (y − f(x))^2 when |y − f(x)| <= delta, and 2*delta*|y − f(x)| − delta^2 otherwise
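
A small sketch of the Huber loss (NumPy assumed; huber_loss is an illustrative name), which is quadratic for small residuals and linear for large ones, so individual outliers have limited influence:

```python
import numpy as np

def huber_loss(y, f, delta=1.0):
    """Huber loss: (y - f)^2 when |y - f| <= delta, 2*delta*|y - f| - delta^2 otherwise."""
    r = np.abs(y - f)
    return np.where(r <= delta, r ** 2, 2.0 * delta * r - delta ** 2)
```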

  27. Boosting for Regression • Iteratively fit the residuals: at each step, fit a weak learner to the current residuals y_i − f_{m−1}(x_i) and add it to the model (a sketch follows below)
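
A minimal sketch of this idea for squared-error loss (scikit-learn stumps and a shrinkage factor nu are assumptions made for illustration): each round fits a small tree to the current residuals and adds a shrunken copy of it to the model.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost_regression(X, y, n_rounds=200, nu=0.1):
    """Boosting for regression: each round fits the residuals y - f(x)."""
    f = np.zeros(len(y))                                       # current fit, start at zero
    trees = []
    for _ in range(n_rounds):
        tree = DecisionTreeRegressor(max_depth=1).fit(X, y - f)  # fit current residuals
        f += nu * tree.predict(X)                               # shrink and add to the model
        trees.append(tree)
    return trees

def boost_predict(trees, X, nu=0.1):
    """Sum of the shrunken tree predictions."""
    return nu * sum(t.predict(X) for t in trees)
```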

  28. Exercise (Not Homework) • 1. Reproduce Figure 10.2
