
Lecture 10

Tree-based methods, neural networks. Lecture 10. Tree-based methods: statistical methods in which the input space (feature space) is partitioned into a set of cuboids (rectangles), and a simple model is then fitted in each one. Why decision trees: compact representation of the data.


Presentation Transcript


  1. Tree-based methods, neural networks Lecture 10

  2. Tree-based methods Statistical methods in which the input space (feature space) is partitioned into a set of cuboids (rectangles), and then a simple model is set up in each one

  3. Why decision trees • Compact representation of data • Possibility to predict outcome of new observations

  4. Tree structure • Root • Nodes • Leaves (terminal nodes) • Parent-child relationship • Condition • Label is assigned to a leaf [Figure: example binary tree with condition nodes Cond.1–Cond.6 and leaf nodes N4–N7]

  5. Example [Figure: the root node "Body temperature?" splits into Cold, leading to the leaf "Non-mammals", and Warm, leading to the internal node "Gives birth?", which in turn splits into Yes (leaf "Mammals") and No (leaf "Non-mammals")]

  6. How to build a decision tree: Hunt’s algorithm Proc Hunt(Dt, t) • Given a data set Dt = {(X1i, ..., Xpi, Yi), i = 1..n} and the current node t • If all Yi are equal, mark t as a leaf with label Yi • If not, use the test condition to split Dt into Dt1, …, Dtn, create children t1, …, tn, and run Hunt(Dt1, t1), …, Hunt(Dtn, tn) (a sketch follows below)
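A minimal Python sketch of this recursive procedure, assuming numeric attributes and a binary test condition; the choose_split argument (returning a feature index and a threshold) is a placeholder for whatever splitting rule is used and is not part of the original slide.

```python
from collections import Counter

def hunt(X, y, choose_split, depth=0, max_depth=5):
    """Grow a decision tree recursively; the tree is returned as nested dicts."""
    if len(set(y)) == 1:                    # all labels equal -> leaf with that label
        return {"label": y[0]}
    if depth == max_depth:                  # early termination -> majority-class leaf
        return {"label": Counter(y).most_common(1)[0][0]}
    j, s = choose_split(X, y)               # test condition: feature j, threshold s
    left = [i for i in range(len(y)) if X[i][j] < s]
    right = [i for i in range(len(y)) if X[i][j] >= s]
    if not left or not right:               # identical attributes -> majority-class leaf
        return {"label": Counter(y).most_common(1)[0][0]}
    return {"feature": j, "threshold": s,
            "left": hunt([X[i] for i in left], [y[i] for i in left],
                         choose_split, depth + 1, max_depth),
            "right": hunt([X[i] for i in right], [y[i] for i in right],
                          choose_split, depth + 1, max_depth)}
```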

  7. Hunt’s algorithm example [Figure: a tree that splits first on X1 (< 9 vs ≥ 9), then on X2 (thresholds 16 and 7) and again on X1 (threshold 15), with leaf labels 0 and 1, shown next to the corresponding rectangular partition of the (X1, X2) plane]

  8. Hunt’s algorithm • What if some combinations of attributes are missing? An empty node is assigned the label representing the majority class among the records (instances, objects, cases) in its parent node • What if all records in a node have identical attributes? The node is declared a leaf node with the same class label as the majority class of this node

  9. CART: Classification and regression trees Regression trees • Given Dt={(X1i,..Xpi, Yi), i=1..n}, Y – continuous, build a tree that will fit the data best Classification trees • Given Dt={(X1i,..Xpi, Yi), i=1..n}, Y – categorical, build a tree that will classify the observations best

  10. A CART algorithm: Regression trees Aim: find the regions R1, …, RM and constants cm that fit the data best; it is computationally expensive to test all possible splits. Instead: Splitting variables and split points Consider a splitting variable j and a split point s, and define the pair of half-planes R1(j, s) = {X | Xj ≤ s} and R2(j, s) = {X | Xj > s}. We seek the splitting variable j and split point s that solve min_{j,s} [ min_{c1} Σ_{xi ∈ R1(j,s)} (yi − c1)² + min_{c2} Σ_{xi ∈ R2(j,s)} (yi − c2)² ], where the inner minima are attained at the region means c1 = ave(yi | xi ∈ R1(j, s)) and c2 = ave(yi | xi ∈ R2(j, s))
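As an illustration, a brute-force search over all (j, s) pairs could look as follows; this is a sketch under the assumption that X is a numeric n × p numpy array, not code from the lecture.

```python
import numpy as np

def best_regression_split(X, y):
    """Find the splitting variable j and split point s minimizing the pooled
    squared error over the half-planes R1(j, s) and R2(j, s)."""
    best_j, best_s, best_sse = None, None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j])[:-1]:        # candidate split points
            left, right = y[X[:, j] <= s], y[X[:, j] > s]
            # the optimal constants c1, c2 are the means of y in each half-plane
            sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if sse < best_sse:
                best_j, best_s, best_sse = j, s, sse
    return best_j, best_s, best_sse
```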

  11. Post-pruning How large a tree to grow? Too large – overfitting! Grow a large tree T0, then prune this tree using post-pruning. Define a subtree T and index its terminal nodes by m, with node m representing region Rm. Let |T| denote the number of terminal nodes in T and set Cα(T) = Σ_{m=1}^{|T|} Nm Qm(T) + α|T|, where Nm is the number of observations in Rm, ĉm = (1/Nm) Σ_{xi ∈ Rm} yi and Qm(T) = (1/Nm) Σ_{xi ∈ Rm} (yi − ĉm)². Then minimize this expression, using cross-validation to select the factor α that penalizes complex trees.
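For reference, scikit-learn exposes this kind of cost-complexity pruning through the ccp_alpha parameter; the sketch below (assuming arrays X, y for a regression problem) selects the penalty α by cross-validation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

def prune_by_cv(X, y, cv=5):
    # Candidate penalties alpha along the pruning path of the large tree T0
    path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
    scores = [cross_val_score(DecisionTreeRegressor(random_state=0, ccp_alpha=a),
                              X, y, cv=cv).mean()
              for a in path.ccp_alphas]
    best_alpha = path.ccp_alphas[int(np.argmax(scores))]
    # Refit the pruned tree on all the data with the selected penalty
    return DecisionTreeRegressor(random_state=0, ccp_alpha=best_alpha).fit(X, y)
```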

  12. CART: Classification trees • For each node m, containing Nm records of region Rm, define the class proportions p̂mk = (1/Nm) Σ_{xi ∈ Rm} I(yi = k) • Define a measure of impurity, e.g. the misclassification error 1 − max_k p̂mk, the Gini index Σk p̂mk(1 − p̂mk), or the cross-entropy −Σk p̂mk log p̂mk
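A small numpy sketch of these node proportions and the standard impurity measures, assuming y holds the class labels of the records in one node; the function names are mine.

```python
import numpy as np

def proportions(y):
    """p_mk: fraction of records in the node that belong to each class k."""
    _, counts = np.unique(y, return_counts=True)
    return counts / counts.sum()

def gini(y):
    p = proportions(y)
    return 1.0 - np.sum(p ** 2)

def cross_entropy(y):
    p = proportions(y)
    return -np.sum(p * np.log2(p))

def misclassification_error(y):
    return 1.0 - proportions(y).max()
```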

  13. Design issues of decision tree induction • How to split the training records? We need a measure for evaluating the goodness of various test conditions • How to terminate the splitting procedure? 1) Continue expanding nodes until either all the records belong to the same class or all the records have identical attribute values, or 2) define criteria for early termination

  14. How to split: CART Select the split with maximal information gain Δ = I(parent) − Σ_j [N(vj) / N] · I(vj), where I(·) is the impurity measure of a given node, N is the total number of records at the parent node, and N(vj) is the number of records associated with the child node vj
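An illustrative computation of the gain Δ for a candidate split; the gini helper here is just one choice of impurity measure I(·), and the names are assumptions rather than lecture code.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(parent, children, impurity=gini):
    """Delta = I(parent) - sum_j N(v_j)/N * I(v_j)."""
    n = len(parent)
    return impurity(parent) - sum(len(c) / n * impurity(c) for c in children)

# Example: splitting [0, 0, 1, 1] into [0, 0] and [1, 1] gives the maximal gain 0.5
```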

  15. How to split: C4.5 Impurity measures such as Gini index tend to favour attributes that have a large number of distinct values Strategy 1: Restrict the test conditions to binary splits only Strategy 2: Use the gain ratio as splitting criterion
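A sketch of C4.5's gain ratio, which divides the information gain by the split information −Σj P(vj) log2 P(vj) so that splits into many small partitions are penalized; the helper names are mine, not from the slide.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(parent, children):
    n = len(parent)
    gain = entropy(parent) - sum(len(c) / n * entropy(c) for c in children)
    split_info = -sum((len(c) / n) * math.log2(len(c) / n) for c in children if c)
    return gain / split_info if split_info > 0 else 0.0
```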

  16. Constructing decision trees [Figure: decision tree for a loan-default example. The root node "Home owner" splits into Yes, leading to the leaf "Defaulted = No", and No, leading to the node "Marital status"; the Married branch leads to the leaf "Defaulted = No", while the Not married branch leads to the node "Income", split into ≤ 100 K and > 100 K, with the label still to be determined ("Defaulted = ?")]

  17. Expressing attribute test conditions • Binary attributes: binary splits • Nominal attributes: binary or multiway splits • Ordinal attributes: binary or multiway splits honoring the order of the attribute values • Continuous attributes: binary or multiway splits into disjoint intervals

  18. Characteristics of decision tree induction • Nonparametric approach (no underlying probability model) • Computationally inexpensive techniques have been developed for constructing decision trees. Once a decision tree has been built, classification is extremely fast • The presence of redundant attributes will not adversely affect the accuracy of decision trees • The presence of irrelevant attributes can lower the accuracy of decision trees, especially if no measures are taken to avoid overfitting • At the leaf nodes, the number of records may be too small (data fragmentation)

  19. Neural networks • Joint theoretical framework for prediction and classification

  20. Principal components regression (PCR) Extract principal components (transformations of the inputs) as derived features, and then model the target (response) as a linear function of these features [Figure: network diagram with inputs x1, x2, …, xp feeding the derived features z1, z2, …, zM, which feed the output y]
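A minimal scikit-learn sketch of PCR, assuming data arrays X and y: the PCA step produces the derived features z1, …, zM and the linear model regresses y on them.

```python
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

def pcr(n_components):
    """Principal components regression: the response is modelled as a linear
    function of the first n_components principal components of the inputs."""
    return make_pipeline(PCA(n_components=n_components), LinearRegression())

# pcr(M).fit(X, y) first computes z = PCA(X), then fits y ~ z1 + ... + zM
```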

  21. Neural networks with a single target Extract linear combinations of the inputs as derived features, and then model the target (response) as a linear function of a sigmoid function of these features [Figure: network diagram with inputs x1, x2, …, xp feeding the hidden units z1, z2, …, zM, which feed the output y]

  22. Artificial neural networks Introduction from biology: • Neurons • Axons • Dendrites • Synapses Capabilities of neural networks: • Memorization (stable under noise and under fragmentary input!) • Classification

  23. Terminology Feed-forward neural network • Input layer • [Hidden layer(s)] • Output layer [Figure: network diagram with input layer x1, x2, …, xp, hidden layer z1, z2, …, zM, and output layer f1, …, fK]

  24. Terminology • Feed-forward network: nodes in one layer are connected only to the nodes in the next layer • Recurrent network: nodes in one layer may also be connected to nodes in a previous layer or within the same layer

  25. Terminology Formulas for the multilayer perceptron (MLP) • C1, C2 – combination functions • g, ς – activation functions • α0m, β0k – biases of the hidden and output units • αim, βjk – weights of the connections Putting these together, the hidden units and outputs are zm = ς(α0m + Σi αim xi), m = 1, …, M, and fk = g(β0k + Σj βjk zj), k = 1, …, K
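A numpy sketch of one forward pass through these formulas, taking the combination functions C1, C2 to be linear, the hidden-layer activation ς to be a sigmoid, and the output activation g to be the identity; the array shapes are my assumptions.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def mlp_forward(x, alpha0, alpha, beta0, beta, g=lambda t: t):
    """x: (p,) inputs; alpha0: (M,), alpha: (M, p); beta0: (K,), beta: (K, M)."""
    z = sigmoid(alpha0 + alpha @ x)   # hidden units: z_m = sigmoid(alpha_0m + sum_i alpha_im x_i)
    return g(beta0 + beta @ z)        # outputs:      f_k = g(beta_0k + sum_j beta_jk z_j)
```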

  26. Recommended reading • Book, Section 9.2 • EM Reference: Tree node Start with: • Book, Chapter 11 • EM Reference: Neural Network node
