Data Mining, CSCI 307, Spring 2019, Lecture 16: Constructing Trees, Covering Algorithms
Computing the Gain Ratio
The gain ratio corrects the information gain by taking the intrinsic information of a split into account.
Example: intrinsic information for ID code
info([1,1,...,1]) = 14 x (-1/14 x log(1/14)) = 3.807 bits
The value of an attribute decreases as its intrinsic information gets larger.
Definition of gain ratio:
gain_ratio(attribute) = gain(attribute) / intrinsic_info(attribute)
Example:
gain_ratio(ID code) = 0.940 bits / 3.807 bits = 0.246
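As a quick check of these numbers, here is a minimal Python sketch (the helper name info is a convenience for this sketch, not notation from the lecture) that computes the intrinsic information of the ID code split and the resulting gain ratio:

```python
import math

def info(counts):
    # Entropy in bits of a split with the given branch sizes.
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# Intrinsic information of splitting on ID code: 14 branches of one instance each.
intrinsic_id = info([1] * 14)            # 14 * (-1/14 * log2(1/14)) ~ 3.807 bits

gain_id = 0.940                          # information gain of ID code (from the slide)
print(round(intrinsic_id, 3))            # 3.807
print(round(gain_id / intrinsic_id, 3))  # ~0.247; the slide reports 0.246 with slightly different rounding
```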
Gain Ratios for Outlook
Info: 0.693
Gain: 0.940 - 0.693 = 0.247
Split info: ?
Gain ratio: ?
To calculate the gain ratio, we first need to calculate the split information, i.e. info([5,4,5]). Then calculate the ratio, gain / info([5,4,5]).
Gain Ratios for Weather Data
Outlook: Info: 0.693; Gain: 0.940 - 0.693 = 0.247; Split info: info([5,4,5]) = 1.577; Gain ratio: 0.247/1.577 = 0.157
Temperature: Info: 0.911; Gain: 0.940 - 0.911 = 0.029; Split info: info([4,6,4]) = 1.557; Gain ratio: 0.029/1.557 = 0.019
Humidity: Info: 0.788; Gain: 0.940 - 0.788 = 0.152; Split info: info([7,7]) = 1.000; Gain ratio: 0.152/1.000 = 0.152
Windy: Info: 0.892; Gain: 0.940 - 0.892 = 0.048; Split info: info([8,6]) = 0.985; Gain ratio: 0.048/0.985 = 0.049
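These table values can be reproduced with a few lines of Python; the dictionary below simply packages each attribute's post-split information and branch sizes from the slide:

```python
import math

def info(counts):
    # Entropy in bits of a split with the given branch sizes.
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

# (information after the split, branch sizes) for each attribute, from the slide
attributes = {
    "outlook":     (0.693, [5, 4, 5]),
    "temperature": (0.911, [4, 6, 4]),
    "humidity":    (0.788, [7, 7]),
    "windy":       (0.892, [8, 6]),
}
for name, (post_info, branches) in attributes.items():
    gain = 0.940 - post_info             # 0.940 bits = information before the split
    split_info = info(branches)
    print(f"{name}: gain={gain:.3f}, split info={split_info:.3f}, "
          f"gain ratio={gain / split_info:.3f}")
```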
More on the Gain Ratio
• "Outlook" still comes out on top
• However, "ID code" has a greater gain ratio
• Standard fix: an ad hoc test to prevent splitting on that type of attribute
• Problem with the gain ratio: it may overcompensate
• It may choose an attribute just because its intrinsic information is very low
• Standard fix: only consider attributes with greater-than-average information gain
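A small sketch of that last fix (the function name choose_split and the dictionary layout are illustrative assumptions): restrict attention to attributes whose information gain is at least average, then take the largest gain ratio among them.

```python
from statistics import mean

def choose_split(gains, gain_ratios):
    # Keep only attributes whose information gain is at least average,
    # then pick the one with the largest gain ratio.
    avg_gain = mean(gains.values())
    candidates = [a for a in gains if gains[a] >= avg_gain]
    return max(candidates, key=lambda a: gain_ratios[a])

# Numbers for the weather data from the previous slide
gains       = {"outlook": 0.247, "temperature": 0.029, "humidity": 0.152, "windy": 0.048}
gain_ratios = {"outlook": 0.157, "temperature": 0.019, "humidity": 0.152, "windy": 0.049}
print(choose_split(gains, gain_ratios))  # outlook
```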
Discussion
• This top-down induction of decision trees is essentially the same as the ID3 algorithm developed by Ross Quinlan
• The gain ratio is just one modification of this basic algorithm
• ID3 eventually evolved into C4.5, which deals with numeric attributes, missing values, and noisy data
• A similar approach: the CART (classification and regression tree) algorithm
• There are many other attribute selection criteria (but they make little difference in the accuracy of the result)
Covering Algorithm
• Convert a decision tree into a rule set: straightforward, but the rule set is overly complex; more effective conversions are not trivial
• Instead, we can generate a rule set directly: for each class in turn, find a rule set that covers all instances in it (excluding instances not in the class)
• Called a covering approach: at each stage a rule is identified that "covers" some of the instances
Rules versus Trees
Corresponding decision tree: (produces exactly the same predictions)
• But rule sets can be more perspicuous where decision trees suffer from replicated subtrees
• Also, in multiclass situations, a covering algorithm concentrates on one class at a time, whereas a decision tree learner takes all classes into account
Simple Covering Algorithm
• Generates a rule by adding tests that maximize the rule's accuracy
• Similar to the situation in decision trees: the problem of selecting an attribute to split on
• But a decision tree inducer maximizes overall purity
Each new test reduces the rule's coverage:
Selecting a Test
• Goal: maximize accuracy
• t: the total number of instances covered by the rule
• p: the positive examples of the class covered by the rule
• t - p: the number of errors made by the rule
• Select the test that maximizes the ratio p/t
• We are finished when p/t = 1 or the set of instances cannot be split any further
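Below is a minimal Python sketch of this rule-growing loop in the spirit of a covering algorithm such as PRISM; the data layout (each instance as a dict that also holds its class label under class_key) and the function name grow_rule are assumptions for illustration, not the lecture's notation.

```python
def grow_rule(instances, attributes, target_class, class_key="class"):
    # Greedily add the attribute = value test with the best p/t ratio
    # (ties broken by larger p) until p/t = 1 or no attributes remain.
    rule = {}                        # conjunction of attribute = value tests
    covered = list(instances)        # instances still covered by the rule
    unused = dict(attributes)        # attribute -> possible values, not yet used
    while covered and unused:
        best = None                  # (p/t, p, attribute, value, covered subset)
        for attr, values in unused.items():
            for value in values:
                subset = [x for x in covered if x[attr] == value]
                if not subset:
                    continue
                p = sum(1 for x in subset if x[class_key] == target_class)
                t = len(subset)
                candidate = (p / t, p, attr, value, subset)
                if best is None or candidate[:2] > best[:2]:
                    best = candidate
        if best is None:
            break
        ratio, _, attr, value, covered = best
        rule[attr] = value
        del unused[attr]
        if ratio == 1.0:             # rule is perfect on the remaining instances
            break
    return rule, covered
```

Run on the contact lens data with target_class "hard", a loop like this should reproduce the sequence of tests traced on the following slides.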
Example: Contact Lens Data
Rule we seek: If ? then recommendation = hard
Possible tests:
Age = Young
Age = Pre-presbyopic
Age = Presbyopic
Spectacle prescription = Myope
Spectacle prescription = Hypermetrope
Astigmatism = no
Astigmatism = yes
Tear production rate = Reduced
Tear production rate = Normal
Modified Rule and Resulting Data
Rule with the best test added: If astigmatism = yes then recommendation = hard
Instances covered by the modified rule:
Further Refinement
Current state: If astigmatism = yes and ? then recommendation = hard
Possible tests:
Age = Young
Age = Pre-presbyopic
Age = Presbyopic
Spectacle prescription = Myope
Spectacle prescription = Hypermetrope
Tear production rate = Reduced
Tear production rate = Normal
Modified Rule and Resulting Data
Rule with the best test added: If astigmatism = yes and tear production rate = normal then recommendation = hard
Instances covered by the modified rule:
Further Refinement
Current state: If astigmatism = yes and tear production rate = normal and ? then recommendation = hard
Possible tests:
Age = Young
Age = Pre-presbyopic
Age = Presbyopic
Spectacle prescription = Myope
Spectacle prescription = Hypermetrope
The Result
Final rule: If astigmatism = yes and tear production rate = normal and spectacle prescription = myope then recommendation = hard
This rule does not cover all of the "hard lens" instances, so the covering process is repeated on the remaining instances to produce further rules.