
Decision Trees

Learn how to construct effective identification trees using the ID3 algorithm, working through a sunburn-prediction dataset. Covers choosing splits that build the smallest tree consistent with the samples, measuring disorder with entropy and information gain, and converting trees to rules and simplifying them.



Presentation Transcript


  1. Decision Trees Evolutionary Computing Systems Lab (ECSL), University of Nevada, Reno

  2. Resources • Artificial Intelligence, 3rd Edition, Patrick Henry Winston, Ch. 21 • http://www.cse.unr.edu/~sushil/class/games/notes/ch21.pdf • Artificial Intelligence: A Modern Approach, 3rd Edition, Russell, Norvig, Ch. 18.3, pp. 531-554

  3. Identification Tree • Type of Decision Tree • The Winston book calls its methods SPROUTER and PRUNER, but they are basically a simplified version of an algorithm called 'ID3'

  4. Identification Tree • Sunburn Dataset • Select one attribute to be predicted/identified • All other attributes are used to identify the selected target attribute, or classification

  5. Identification Tree • Predict Sunburns • More than one tree can correctly identify the dataset • Some trees generalize information better • Smaller trees tend to be better (Occam's Razor) • The smallest identification tree consistent with the samples is the one most likely to identify unknown objects correctly • How do we construct the smallest/'best' tree?

  6. Identification Tree • Computationally impractical to find the smallest tree when many tests are required • Use a procedure that builds small trees, but is NOT guaranteed to build the SMALLEST possible tree.

  7. Identification Tree • Split the samples based on the best attribute • A single attribute that comes closest to correctly grouping the samples based on the target classification • Number of samples in homogeneous sets, per attribute: Hair = 4, Height = 2, Weight = 0, Lotion = 3 (computed in the sketch below)
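As a minimal sketch of this first splitting measure, the snippet below reconstructs Winston's sunburn samples (the exact encoding here is my assumption) and counts, for each attribute, how many samples land in homogeneous branches:

```python
from collections import defaultdict

# Winston's sunburn samples: (hair, height, weight, lotion, sunburned)
SAMPLES = [
    ("blonde", "average", "light",   "no",  True),
    ("blonde", "tall",    "average", "yes", False),
    ("brown",  "short",   "average", "yes", False),
    ("blonde", "short",   "average", "no",  True),
    ("red",    "average", "heavy",   "no",  True),
    ("brown",  "tall",    "heavy",   "no",  False),
    ("brown",  "average", "heavy",   "no",  False),
    ("blonde", "short",   "light",   "yes", False),
]
ATTRIBUTES = ["hair", "height", "weight", "lotion"]

def homogeneous_count(samples, attr_index):
    """Count the samples that land in branches containing only one class."""
    branches = defaultdict(list)
    for s in samples:
        branches[s[attr_index]].append(s[-1])  # group class labels by attribute value
    return sum(len(labels) for labels in branches.values()
               if len(set(labels)) == 1)

for i, name in enumerate(ATTRIBUTES):
    print(name, homogeneous_count(SAMPLES, i))
# hair 4, height 2, weight 0, lotion 3 -> hair gives the best first split
```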

  8. Identification Tree • Select best attribute, and repeat with remaining attributes • Must repeat for each heterogeneous branch • Only split the samples that went down that branch • The next attribute you select for one branch may be different from the attribute you select for another branch, even if they share the same parent node

  9. Identification Tree • In real data, unlikely to get ANY homogeneous branches • Need a measure of inhomogeneity/disorder/entropy • Minimize disorder/entropy (or maximize Information Gain) • Many different measurements/calculations can be used • Example: Entropy(S) = -Σ pᵢ log₂(pᵢ), where pᵢ is the proportion of samples in S belonging to class i (sketched below)
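A minimal sketch of that entropy measure (the function name is my own):

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels: -sum_i p_i * log2(p_i)."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

# 8 sunburn samples, 3 positive; a 50/50 split would give exactly 1 bit
print(entropy([True] * 3 + [False] * 5))  # ~0.954
```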

  10. Identification Tree • Results using the new disorder measurement [slide figures: the disorder calculation for the Hair attribute, plus all disorder calculations for the first and second nodes]

  11. Identification Tree • Information Gain • Expected reduction in entropy due to sorting sample set S on attribute A: Gain(S, A) = Entropy(S) - Σᵥ (|Sᵥ| / |S|) · Entropy(Sᵥ), where the sum runs over the values v of A and Sᵥ is the subset of S with A = v (sketched below)
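A sketch of the gain computation, assuming the same tuple encoding as the earlier sunburn sketch (the expected result on that data is noted in the comment):

```python
from math import log2
from collections import Counter, defaultdict

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def information_gain(samples, attr_index):
    """Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v)."""
    labels = [s[-1] for s in samples]
    branches = defaultdict(list)
    for s in samples:
        branches[s[attr_index]].append(s[-1])
    remainder = sum(len(sub) / len(samples) * entropy(sub)
                    for sub in branches.values())
    return entropy(labels) - remainder

# On the sunburn SAMPLES from the earlier sketch, splitting on hair
# (attribute index 0) yields a gain of about 0.454 bits.
```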

  12. Identification Tree • SPROUTER algorithm (shown as a figure on the slide; a sketch follows below)
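The slide presents SPROUTER only as a figure. As a hedged stand-in, here is a compact ID3-style builder that recursively splits on the attribute with the lowest weighted disorder; all names and the dict tree layout are my own:

```python
from math import log2
from collections import Counter, defaultdict

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def build_tree(samples, attributes):
    """Grow an identification tree. Samples are tuples whose last element
    is the class; attributes are the tuple indices still available to test."""
    labels = [s[-1] for s in samples]
    if len(set(labels)) == 1:                 # homogeneous branch: make a leaf
        return labels[0]
    if not attributes:                        # no tests left: majority-class leaf
        return Counter(labels).most_common(1)[0][0]

    def disorder(attr):                       # weighted average branch entropy
        branches = defaultdict(list)
        for s in samples:
            branches[s[attr]].append(s[-1])
        return sum(len(sub) / len(samples) * entropy(sub)
                   for sub in branches.values())

    best = min(attributes, key=disorder)      # lowest disorder = highest gain
    groups = defaultdict(list)
    for s in samples:                         # only the samples that went down each branch
        groups[s[best]].append(s)
    remaining = [a for a in attributes if a != best]
    return {"attr": best,
            "branches": {v: build_tree(sub, remaining)
                         for v, sub in groups.items()}}

# e.g. build_tree(SAMPLES, [0, 1, 2, 3]) with the sunburn samples above
```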

  13. Tree to Rules • Each path, from root to leaf, is a rule • The attribute values along the path are the antecedents • The leaf value is the consequent (see the sketch below)
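A sketch of the path-to-rule conversion, assuming the dict tree layout from the previous sketch:

```python
def tree_to_rules(tree, antecedents=()):
    """Walk every root-to-leaf path: the attribute tests along the path are
    the antecedents, the leaf value is the consequent."""
    if not isinstance(tree, dict):                       # leaf node
        return [(list(antecedents), tree)]
    rules = []
    for value, subtree in tree["branches"].items():
        test = (tree["attr"], value)                     # (attribute, required value)
        rules.extend(tree_to_rules(subtree, antecedents + (test,)))
    return rules

# Example rule: IF hair = blonde AND lotion = no THEN sunburned
```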

  14. Simplify Rules • For each rule, drop an antecedent if dropping it doesn't change what the rule does on any of the samples (a sketch follows below)
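A greedy sketch of antecedent dropping (helper names mine): for each test in the rule, remove it if every sample the relaxed rule matches still carries the rule's consequent.

```python
def matches(antecedents, sample):
    """True if the sample satisfies every (attribute index, value) test."""
    return all(sample[attr] == value for attr, value in antecedents)

def simplify_rule(antecedents, consequent, samples):
    """Greedily drop antecedents that don't change the rule's behaviour
    on the sample set."""
    kept = list(antecedents)
    for test in list(antecedents):
        trial = [t for t in kept if t != test]
        covered = [s for s in samples if matches(trial, s)]
        # safe to drop the test if the relaxed rule is still never wrong
        if covered and all(s[-1] == consequent for s in covered):
            kept = trial
    return kept
```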

  15. Eliminate Rules • Once all individual rules have been simplified, you can eliminate unnecessary rules • Creating a "default rule" eliminates the most rules • In the event of a tie, make up some metric to break the tie • Examples: pick the default that covers the most common consequent in the sample set, or the one that leaves the simplest rules (a sketch follows below)
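One hedged way to realize the default-rule idea (the representation is mine): pick the most common consequent, drop every rule that concludes it, and add an if-nothing-else-matches rule that must be checked last.

```python
from collections import Counter

def eliminate_rules(rules):
    """rules: list of (antecedents, consequent) pairs, as produced above."""
    default = Counter(c for _, c in rules).most_common(1)[0][0]
    kept = [(ants, c) for ants, c in rules if c != default]
    kept.append(([], default))  # empty antecedents: fires when nothing else matches
    return kept
```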

  16. Eliminate Rules

  17. Decision Tree Algorithms • ID3 (Iterative Dichotomiser 3) • Greedy; can get stuck in local optima • Not good on attributes with continuous values • C4.5/J4.8 • Extension of ID3 • Better handling of attributes with continuous values • Can handle training data where some attribute values are missing/unknown • Handles attributes with different costs • Prunes the tree after creation • C5.0/See5.0 • Commercial, closed-source • Not talking about this, but it exists

  18. C4.5 • Pruning • Helps avoid overfitting • Prepruning • Deciding not to split a set of samples any further, based on some heuristic, during tree construction • Usually based on some statistical test, e.g. chi-squared (see the sketch below) • Postpruning • Subtree Replacement • Subtree Raising
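As a sketch of a chi-squared prepruning heuristic (using SciPy's chi2_contingency; the threshold and framing are my assumptions, not the slide's): only split when the class distribution across the candidate branches differs significantly from chance.

```python
from scipy.stats import chi2_contingency

def should_split(branch_class_counts, alpha=0.05):
    """branch_class_counts: one row per candidate branch, one column per class,
    e.g. [[2, 2], [1, 0], [0, 3]] for the hair split on the sunburn data.
    Split only if the branch/class association is statistically significant."""
    _, p_value, _, _ = chi2_contingency(branch_class_counts)
    return p_value < alpha
```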

  19. C4.5 • Continuous Values • For an attribute with continuous values, sort all samples based on that attribute • Mark a 'split point' between samples where the classification changes • Calculate information gain at all split points • Select the split point with the highest information gain and use it for that attribute (see the sketch below)
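A self-contained sketch of that split-point search (names mine):

```python
from math import log2
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())

def best_split_point(values, labels):
    """Sort on the attribute, place a candidate threshold between adjacent
    samples where the class changes, and keep the one with the highest gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_threshold = -1.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][1] != pairs[i][1]:          # class changes here
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            left = [lab for v, lab in pairs if v <= threshold]
            right = [lab for v, lab in pairs if v > threshold]
            gain = base - (len(left) / len(pairs) * entropy(left)
                           + len(right) / len(pairs) * entropy(right))
            if gain > best_gain:
                best_gain, best_threshold = gain, threshold
    return best_threshold, best_gain

print(best_split_point([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"]))  # (2.5, 1.0)
```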
