
Decision Trees



Presentation Transcript


  1. Decision Trees Jyh-Shing Roger Jang (張智星) CSIE Dept, National Taiwan University

  2. Classification • Stages in classification • Model construction: Given a collection of records (the training set), where each record has a set of attributes including the class, find a model (classifier) that predicts the class as a function of the other attributes. • Model evaluation: Use previously unseen records (the test set) to measure how accurately the model assigns classes. • Model application: Apply the model directly to new records.
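
As an illustration of these three stages (not part of the original slides), here is a minimal sketch using scikit-learn; the dataset and the 70/30 split are arbitrary assumptions.

```python
# Illustrative sketch of the three stages with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

clf = DecisionTreeClassifier()        # model construction ...
clf.fit(X_train, y_train)             # ... from the training set
print(clf.score(X_test, y_test))      # model evaluation on unseen records
print(clf.predict(X_test[:5]))        # model application to new data
```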

  3. Stages in Classification • (Figure: the stages of classification)

  4. Example

  5. Examples of Classification/Regression Tasks • Classification • Predict the trend (up or down) of stock markets • Predict tumors as benign or malignant • Classify credit card transactions as legitimate or fraudulent • Categorize news articles as finance, weather, entertainment, sports, etc. • Regression • Predict the temperature 3 hours from now • Predict tomorrow’s gold/oil price • Estimate the path of a typhoon

  6. Methods for Classification • Numerous methods for classification • Decision trees • Minimum-distance classifiers • Artificial neural networks • Naïve Bayes classifiers • Quadratic classifiers • Gaussian-mixture-model classifiers • Support vector machines • Rule-based methods • …

  7. Decision Tree Induction • Again, many algorithms • Hunt’s algorithm (one of the earliest) • CART (classification and regression trees) • ID3, C4.5 • SLIQ, SPRINT • …

  9. General Steps in Tree Induction • Idea • We want to send all the training data down the tree until it reaches the leaves, where the data should be as “pure” as possible. • Let D be the data set that reaches a node • General procedure (see the sketch below) • If D contains only records belonging to the same class y, mark the node as a leaf with class y. • Otherwise, use a test on an attribute to split the data set and build subtrees recursively.
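
A minimal sketch of this recursive procedure, in the spirit of Hunt’s algorithm; best_split and the Node layout are hypothetical helpers, not from the slides.

```python
from collections import Counter

class Node:
    def __init__(self, label=None, test=None, children=None):
        self.label = label          # class label if this is a leaf
        self.test = test            # splitting test if internal
        self.children = children    # subtrees, one per split outcome

def grow_tree(D, best_split):
    """D: list of (attributes, class) records.  best_split: hypothetical
    helper that returns (test, list_of_partitions) or None."""
    classes = Counter(y for _, y in D)
    if len(classes) == 1:                       # pure node -> leaf
        return Node(label=next(iter(classes)))
    split = best_split(D)
    if split is None:                           # no useful test left
        return Node(label=classes.most_common(1)[0][0])
    test, parts = split
    return Node(test=test, children=[grow_tree(p, best_split) for p in parts])
```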

  10. Tree Induction • Issues in tree induction • How to split the dataset at a node: split based on a greedy search that optimizes a certain criterion/test • When to stop splitting: when the “impurity measure” falls below a threshold

  11. How to Specify the Test? • Depends on attribute type • Nominal (a.k.a. “factor”) • Car type: family, sports, luxury, etc. • Ordinal • T-shirt size: small, medium, big, etc. • Continuous • Temperature: 10.3, 25.6, 38, etc. • Depends on the number of ways to split • Binary (2-way) split • Multi-way split

  12. Splitting Based on Nominal/Ordinal Attributes • Multi-way split • Use as many partitions as distinct values, e.g., CarType → {Family}, {Sports}, {Luxury} • Binary split • Divide the values into two subsets via optimal partitioning, e.g., CarType → {Sports, Luxury} vs. {Family}, or CarType → {Family, Luxury} vs. {Sports}
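
For a nominal attribute with k distinct values there are 2^(k−1) − 1 nontrivial binary partitions to consider. A small sketch enumerating them (the helper name is my own):

```python
from itertools import combinations

def binary_partitions(values):
    """Yield each unordered two-subset partition of a list of nominal
    values exactly once (values[0] is pinned to the left side)."""
    first, rest = values[0], values[1:]
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {first, *combo}
            right = set(rest) - set(combo)
            if right:                 # skip the trivial all-vs-nothing split
                yield left, right

for left, right in binary_partitions(["Family", "Sports", "Luxury"]):
    print(left, "vs", right)          # 2^(3-1) - 1 = 3 partitions
```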

  13. Splitting Based on Continuous Attributes • Multi-way split • Discretization to form an ordinal categorical attribute • Binary split (A < v or A ≥ v) • Consider all possible splits to find the best one
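
A small sketch of the discretization option, turning a continuous attribute into an ordinal one with numpy (the cut points and labels are arbitrary assumptions):

```python
import numpy as np

temps = np.array([10.3, 25.6, 38.0, 18.2, 31.5])
edges = [15, 30]                            # assumed cut points
bins = np.digitize(temps, edges)            # 0, 1, or 2 per record
labels = np.array(["cold", "mild", "hot"])[bins]
print(labels)                               # ['cold' 'mild' 'hot' 'mild' 'hot']
```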

  14. To Determine the Best Split • Goal • Nodes with a homogeneous (pure) class distribution are preferred • Need a measure of node impurity (which should be kept as low as possible during split selection) • (Figure: a non-homogeneous node has a high degree of impurity; a homogeneous node has a low degree of impurity)

  15. Measures of Node Impurity • Numerous measures of node impurity, where p(j|t) denotes the relative frequency of class j at node t • Gini index: Gini(t) = 1 − Σ_j [p(j|t)]² • Entropy: Entropy(t) = −Σ_j p(j|t) log p(j|t) • Classification error: Error(t) = 1 − max_j p(j|t) • (Figure: comparison of the three measures for a 2-class problem)
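
A minimal sketch computing the three measures from a node’s class counts (the function names are my own):

```python
import math

def gini(counts):
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    n = sum(counts)
    return sum(-(c / n) * math.log2(c / n) for c in counts if c > 0)

def classification_error(counts):
    return 1 - max(counts) / sum(counts)

# All three are 0 for a pure node and maximal for a uniform one.
print(gini([3, 3]), entropy([3, 3]), classification_error([3, 3]))   # 0.5 1.0 0.5
print(gini([6, 0]), entropy([6, 0]), classification_error([6, 0]))   # 0.0 0.0 0.0
```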

  16. Impurity Measure: Gini Index • Gini index for a given node t: Gini(t) = 1 − Σ_j [p(j|t)]², where p(j|t) is the relative frequency of class j at node t (the “confusion” measure in HW4) • Extreme values • Minimum = 0, when all records belong to the same class • Maximum = 1 − 1/(# of classes), when records are equally distributed among all classes

  17. Splitting Based on Gini Index • The quality of splitting a node t into k children (the “total confusion” in HW4): Gini_split = Σ_{i=1..k} (n_i/n) Gini(t_i) • t_i = node of child i • n_i = number of records at t_i • n = number of records at node t
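
A direct transcription of this weighted-average formula (a sketch; the input layout, one class-count list per child, is an assumption):

```python
def gini(counts):
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def gini_split(children):
    """children: one class-count list per child node.  Returns the
    size-weighted average of the children's Gini indices."""
    n = sum(sum(c) for c in children)
    return sum(sum(c) / n * gini(c) for c in children)
```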

  18. Gini Index for General Binary Split • Example of computing the Gini index for a binary split on attribute B, where B = Yes leads to node N1 with class counts (C1=5, C2=2) and B = No leads to node N2 with (C1=1, C2=4): • Gini(N1) = 1 − (5/7)² − (2/7)² ≈ 0.408 • Gini(N2) = 1 − (1/5)² − (4/5)² = 0.320 • Gini_split(B) = 7/12 × 0.408 + 5/12 × 0.320 ≈ 0.371
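
A quick, self-contained check of these numbers:

```python
gini = lambda counts: 1 - sum((c / sum(counts)) ** 2 for c in counts)

g1 = gini([5, 2])                  # node N1 (B = Yes): 5 of C1, 2 of C2
g2 = gini([1, 4])                  # node N2 (B = No):  1 of C1, 4 of C2
split = 7/12 * g1 + 5/12 * g2      # children weighted by record counts
print(round(g1, 3), round(g2, 3), round(split, 3))   # 0.408 0.32 0.371
```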

  19. Gini Index for Nominal Attributes • For each child, obtain the counts for each class • Compute the Gini index for each child • Compute the Gini index for the split • (Figure: a multi-way split vs. a two-way split, where the latter requires finding the best partition of values)

  20. Gini Index for Binary Split on Continuous Attributes • For each attribute • Sort the attribute values • Linearly scan the sorted values, updating the class-count matrix and recomputing the Gini index at each candidate split position • Choose the split with the smallest Gini index • (Figure: a table of sorted values and candidate split positions)
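
A minimal sketch of this sort-then-scan procedure for a single continuous attribute (the (value, label) input layout, the midpoint thresholds, and the sample data are assumptions):

```python
def best_continuous_split(pairs):
    """pairs: list of (value, class_label).  Returns (threshold, gini)
    minimizing the size-weighted Gini index over candidate midpoints."""
    pairs = sorted(pairs)                        # sort once ...
    n = len(pairs)
    left = {lab: 0 for _, lab in pairs}          # counts below the threshold
    right = {lab: 0 for _, lab in pairs}
    for _, lab in pairs:
        right[lab] += 1
    def gini(counts, total):
        return 1 - sum((c / total) ** 2 for c in counts.values())
    best = (None, float("inf"))
    for i in range(n - 1):                       # ... then scan linearly
        _, lab = pairs[i]
        left[lab] += 1                           # move one record across
        right[lab] -= 1
        if pairs[i][0] == pairs[i + 1][0]:
            continue                             # no threshold between ties
        k = i + 1                                # records on the left side
        w = k / n * gini(left, k) + (n - k) / n * gini(right, n - k)
        if w < best[1]:
            best = ((pairs[i][0] + pairs[i + 1][0]) / 2, w)
    return best

data = [(60, "No"), (70, "No"), (75, "No"), (85, "Yes"),
        (90, "Yes"), (95, "Yes"), (100, "No")]
print(best_continuous_split(data))               # (80.0, ~0.214)
```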
