
Decision Trees



Presentation Transcript


  1. Decision Trees • 10-601 Recitation • 1/17/08 • Mary McGlohon • mmcgloho+10601@cs.cmu.edu

  2. Announcements • HW 1 out- DTs and basic probability • Due Mon, Jan 28 at start of class • Matlab • High-level language, specialized for matrices • Built-in plotting software, lots of math libraries • On campus lab machines • Interest in tutorial? • Smiley Award Plug

  3. AttendClass? • Represent this tree as a logical expression. [Tree: Raining? False -> Yes; True -> Is10601?. Is10601? True -> Yes; False -> Material?. Material? Old -> No; New -> Before10?. Before10? True -> No; False -> Yes]

  4. AttendClass? • Represent this tree as a logical expression. • AttendClass = Yes if: (Raining = False) OR (Is10601 = True) OR (Material = New AND Before10 = False) [Same tree as on the previous slide]
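The expression on this slide can be checked with a few lines of Python. This is a minimal sketch; the function name `attend_class` and the keyword-argument spelling of the attributes are mine, the attribute names themselves come from the slides.

```python
def attend_class(raining, is10601, material, before10):
    """True iff the tree predicts AttendClass = Yes: one disjunct per root-to-Yes path."""
    return (not raining) or is10601 or (material == "New" and not before10)

# Raining, not in 10-601, but new material and the class is before 10am: attend.
print(attend_class(raining=True, is10601=False, material="New", before10=False))  # True
```

Each disjunct corresponds to one path from the root to a "Yes" leaf, which is exactly how the slide reads the tree.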

  5. Split decisions • There are other trees logically equivalent. • How do we know which one to use?

  6. Split decisions • There are other trees logically equivalent. • How do we know which one to use? • Depends on what is important to us.

  7. Information Gain • Classically we rely on “information gain”, which uses the principle that we want to use the least number of bits, on average, to get our idea across. • Suppose I want to send a weather forecast with 4 possible outcomes: Rain, Sun, Snow, and Tornado. 4 outcomes = 2 bits with a fixed-length code. • In Pittsburgh there’s Rain 90% of the time, Snow 5%, Sun 4.9%, and Tornado .01%. So if you assign Rain a 1-bit codeword (and longer codewords to the rest), you rarely send more than 1 bit.
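The slide's argument can be made concrete with a small calculation. The particular prefix code below is my own choice (the slide does not fix the codewords); the probabilities are the slide's.

```python
# A prefix code that gives the frequent outcome the shortest codeword.
code = {"Rain": "0", "Snow": "10", "Sun": "110", "Tornado": "111"}
probs = {"Rain": 0.90, "Snow": 0.05, "Sun": 0.049, "Tornado": 0.0001}

# Expected message length: most messages are the 1-bit "Rain" codeword.
expected_bits = sum(probs[o] * len(code[o]) for o in code)
print(round(expected_bits, 3))  # 1.147, well under the 2 bits of a fixed-length code
```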

  8. Entropy • For a set S with a fraction p+ of positive examples and p- of negative examples: H(S) = -p+ log2(p+) - p- log2(p-)

  9. Entropy • Set S has 6 positive, 2 negative examples. H(S) = -.75 log2(.75) - .25 log2(.25) = .8113
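The slide's computation can be reproduced directly. A minimal sketch; the function name `entropy` is mine:

```python
import math

def entropy(pos, neg):
    """Binary entropy (in bits) of a set with `pos` positive and `neg` negative examples."""
    h = 0.0
    for count in (pos, neg):
        if count:  # by convention, 0 * log2(0) = 0
            p = count / (pos + neg)
            h -= p * math.log2(p)
    return h

print(round(entropy(6, 2), 4))  # 0.8113, matching the slide
```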

  10. Conditional Entropy “The average number of bits it would take to encode a message Y, given knowledge of X”

  11. Conditional Entropy H(Attend | Rain) = H(Attend | Rain=T)*P(Rain=T) + H(Attend|Rain=F)*P(Rain=F)

  12. Conditional Entropy • H(Attend | Rain) = H(Attend | Rain=T)*P(Rain=T) + H(Attend | Rain=F)*P(Rain=F) = 1 * 0.5 + 0 * 0.5 = 0.5 • (The Rain=T examples have entropy 1; the Rain=F examples have entropy 0.)
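The weighted average on this slide can be computed from data. The 8-example dataset below is hypothetical, chosen only to be consistent with the slide's numbers (a 2/2 Attend split when Rain=T, all Yes when Rain=F, each half the data); the helper names `H` and `conditional_entropy` are mine.

```python
import math
from collections import Counter

def H(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(data, attr, target):
    """H(target | attr): entropy of each attr-value subset, weighted by subset size."""
    n = len(data)
    total = 0.0
    for v in {row[attr] for row in data}:
        subset = [row[target] for row in data if row[attr] == v]
        total += (len(subset) / n) * H(subset)
    return total

data = (
    [{"Rain": True, "Attend": "Yes"}] * 2
    + [{"Rain": True, "Attend": "No"}] * 2
    + [{"Rain": False, "Attend": "Yes"}] * 4
)
print(conditional_entropy(data, "Rain", "Attend"))  # 0.5
```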

  13. Information Gain • IG(S,A) = H(S) - H(S|A) • “How much conditioning on attribute A increases our knowledge of S (i.e., decreases the entropy of S).”

  14. Information Gain • IG(Attend, Rain) = H(Attend) - H(Attend|Rain) = .8113 - .5 = .3113
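Putting the last few slides together, the gain can be verified numerically. The 6/8, 2/4, and 4/4 counts below are from a hypothetical dataset consistent with the slides' numbers; the helper name `h2` is mine.

```python
import math

def h2(p):
    """Binary entropy (in bits) of a Bernoulli(p) variable."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

h_attend = h2(6 / 8)                                     # H(Attend) = .8113
h_attend_given_rain = 0.5 * h2(2 / 4) + 0.5 * h2(4 / 4)  # H(Attend|Rain) = .5
print(round(h_attend - h_attend_given_rain, 4))  # 0.3113
```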

  15. What about this? • For some dataset, could we ever build this DT? [Alternative tree shown, rooted at Material (New/Old), with internal nodes on Raining, Before10, and Is10601]

  16. What about this? • For some dataset, could we ever build this DT? • What if you were taking 20 classes, and it rains 90% of the time? [Alternative tree shown, rooted at Material (New/Old), with internal nodes on Raining, Before10, and Is10601]

  17. What about this? • For some dataset, could we ever build this DT? • What if you were taking 20 classes, and it rains 90% of the time? • If most information is gained from Material or Before10, we won’t ever need to test Is10601. So even a bigger tree (node-wise) may be “simpler”, for some sets of data. [Same alternative tree shown]

  18. Node-based pruning • Until further pruning is harmful: • For each node n in trained tree T, • Let Tn’ be T without n (and its descendants), with the removed subtree replaced by its “best choice” (majority-class) leaf. • Record the error of Tn’ on a validation set. • Let T = Tk’, where Tk’ is the pruned tree with the best performance on the validation set.
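The loop above can be sketched in Python. The dict-based tree representation, the helper names, and the example data are all my own illustrative choices, not the course's code: a leaf is a label string, and an internal node is a dict with its attribute, its majority-class label, and its branches.

```python
import copy

def classify(tree, example):
    """Descend to a leaf; a leaf is a label string, an internal node a dict."""
    while isinstance(tree, dict):
        tree = tree["branches"][example[tree["attr"]]]
    return tree

def accuracy(tree, data):
    return sum(classify(tree, ex) == ex["label"] for ex in data) / len(data)

def internal_paths(tree, path=()):
    """Yield the branch-value path to every internal node."""
    if isinstance(tree, dict):
        yield path
        for value, sub in tree["branches"].items():
            yield from internal_paths(sub, path + (value,))

def pruned_copy(tree, path):
    """Copy of `tree` with the internal node at `path` replaced by its majority leaf."""
    new = copy.deepcopy(tree)
    if not path:
        return new["majority"]
    node = new
    for value in path[:-1]:
        node = node["branches"][value]
    node["branches"][path[-1]] = node["branches"][path[-1]]["majority"]
    return new

def reduced_error_prune(tree, validation):
    """Until further pruning is harmful, keep the best-performing pruned tree."""
    while isinstance(tree, dict):
        best = max((pruned_copy(tree, p) for p in internal_paths(tree)),
                   key=lambda t: accuracy(t, validation))
        if accuracy(best, validation) < accuracy(tree, validation):
            break
        tree = best
    return tree

# Hypothetical tree and validation set: the Is601 split turns out to be noise.
tree = {"attr": "Rain", "majority": "Yes", "branches": {
    "F": "Yes",
    "T": {"attr": "Is601", "majority": "No", "branches": {"T": "Yes", "F": "No"}},
}}
val = [{"Rain": "F", "Is601": "F", "label": "Yes"},
       {"Rain": "F", "Is601": "T", "label": "Yes"},
       {"Rain": "T", "Is601": "T", "label": "No"},
       {"Rain": "T", "Is601": "F", "label": "No"}]
print(reduced_error_prune(tree, val))  # the Is601 node collapses to a "No" leaf
```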

  19. Node-based pruning • For each node, record the performance on the validation set of the tree without that node. • Suppose our initial tree has 0.7 accuracy on the validation set. [Tree rooted at Material shown]

  20. Node-based pruning • For each node, record the performance on the validation set of the tree without that node. • Suppose our initial tree has 0.7 accuracy on the validation set. • Let’s test this node... [Tree rooted at Material shown, with one internal node selected for removal]

  21. Node-based pruning • For each node, record the performance on the validation set of the tree without that node. • Suppose our initial tree has 0.7 accuracy on the validation set. • Suppose that most examples where Material=New and Before10=True are “Yes”. Our new subtree has a “Yes” leaf here. [Pruned tree shown]

  22. Node-based pruning • For each node, record the performance on the validation set of the tree without that node. • Suppose our initial tree has 0.7 accuracy on the validation set. • Suppose that most examples where Material=New and Before10=True are “Yes”. Our new subtree has a “Yes” leaf here. • Now, test this tree! [Pruned tree shown]


  24. Node-based pruning • For each node, record the performance on the validation set of the tree without that node. • Suppose our initial tree has 0.7 accuracy on the validation set. • Suppose that most examples where Material=New and Before10=True are “Yes”. Our new subtree has a “Yes” leaf here. • Suppose we get accuracy of 0.73 on this pruned tree. Repeat the test procedure by removing a different node from the original tree... [Pruned tree shown]

  25. Node-based pruning • Try this tree (with a different node pruned)... [Original tree shown with a different internal node selected for removal]

  26. Node-based pruning • Try this tree (with a different node pruned)... • Now, test this tree and record its accuracy. [Pruned tree shown, with the removed subtree replaced by a leaf]

  27. Node-based pruning • Once we test all possible prunings, modify our tree T with the pruning that has the best performance. • Repeat the entire pruning-selection procedure on the new T, replacing T each time with the best-performing pruned tree, until we no longer gain anything by pruning. [Pruned tree shown]

  28. Rule-based pruning • 1. Convert the tree to rules, one for each leaf: • IF Material=Old AND Raining=False THEN Attend=Yes • IF Material=Old AND Raining=True AND Is601=True THEN Attend=Yes • ... [Tree rooted at Material shown]

  29. Rule-based pruning • 2. Prune each rule. For instance, to prune this rule: • IF Material=Old AND Raining=F THEN Attend=T • Test each candidate rule with one precondition removed on the validation set, and compare to the performance of the original rule on that set: • IF Material=Old THEN Attend=T • IF Raining=F THEN Attend=T

  30. Rule-based pruning • Suppose we got the following accuracy for each rule: • IF Material=Old AND Raining=F THEN Attend=T -- 0.6 • IF Material=Old THEN Attend=T -- 0.5 • IF Raining=F THEN Attend=T -- 0.7

  31. Rule-based pruning • Suppose we got the following accuracy for each rule: • IF Material=Old AND Raining=F THEN Attend=T -- 0.6 • IF Material=Old THEN Attend=T -- 0.5 • IF Raining=F THEN Attend=T -- 0.7 • Then, we would keep the best one and drop the others.

  32. Rule-based pruning • Repeat for the next rule, comparing the original rule with each rule with one precondition removed: • IF Material=Old AND Raining=T AND Is601=T THEN Attend=T • IF Material=Old AND Raining=T THEN Attend=T • IF Material=Old AND Is601=T THEN Attend=T • IF Raining=T AND Is601=T THEN Attend=T

  33. Rule-based pruning • Repeat for the next rule, comparing the original rule with each rule with one precondition removed: • IF Material=Old AND Raining=T AND Is601=T THEN Attend=T -- 0.6 • IF Material=Old AND Raining=T THEN Attend=T -- 0.75 • IF Material=Old AND Is601=T THEN Attend=T -- 0.3 • IF Raining=T AND Is601=T THEN Attend=T -- 0.65

  34. Rule-based pruning • Repeat for the next rule, comparing the original rule with each rule with one precondition removed: • IF Material=Old AND Raining=T AND Is601=T THEN Attend=T -- 0.6 • IF Material=Old AND Raining=T THEN Attend=T -- 0.75 • IF Material=Old AND Is601=T THEN Attend=T -- 0.3 • IF Raining=T AND Is601=T THEN Attend=T -- 0.65 • If a shorter rule works better, we may also choose to further prune on this step before moving on to the next leaf: • IF Material=Old AND Raining=T THEN Attend=T -- 0.75 • IF Material=Old THEN Attend=T -- 0.3 • IF Raining=T THEN Attend=T -- 0.2

  35. Rule-based pruning • Repeat for the next rule, comparing the original rule with each rule with one precondition removed: • IF Material=Old AND Raining=T AND Is601=T THEN Attend=T -- 0.6 • IF Material=Old AND Raining=T THEN Attend=T -- 0.75 • IF Material=Old AND Is601=T THEN Attend=T -- 0.3 • IF Raining=T AND Is601=T THEN Attend=T -- 0.65 • If a shorter rule works better, we may also choose to further prune on this step before moving on to the next leaf: • IF Material=Old AND Raining=T THEN Attend=T -- 0.75 • IF Material=Old THEN Attend=T -- 0.3 • IF Raining=T THEN Attend=T -- 0.2 • Well, maybe not this time!

  36. Rule-based pruning • Once we have done the same pruning procedure for each rule in the tree... • 3. Order the kept rules by their accuracy, and do all subsequent classification with that priority: • IF Material=Old AND Raining=T THEN Attend=T -- 0.75 • IF Raining=F THEN Attend=T -- 0.7 • ... (and so on for the other pruned rules) • (Note that you may wind up with a differently-structured DT than before, as discussed in class.)
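Steps 2 and 3 above can be sketched as follows. The rule representation (a tuple of (attribute, value) preconditions plus a predicted label), the helper names, and the tiny dataset are all illustrative stand-ins, not the course's code.

```python
def rule_accuracy(rule, data):
    """Accuracy of `rule` on the validation examples its preconditions match."""
    preconds, label = rule
    matched = [ex for ex in data if all(ex.get(a) == v for a, v in preconds)]
    if not matched:
        return 0.0
    return sum(ex["label"] == label for ex in matched) / len(matched)

def prune_rule(rule, data):
    """Greedily drop preconditions as long as validation accuracy improves."""
    preconds, label = rule
    best = rule_accuracy(rule, data)
    improved = True
    while improved and preconds:
        improved = False
        for i in range(len(preconds)):
            shorter = preconds[:i] + preconds[i + 1:]
            acc = rule_accuracy((shorter, label), data)
            if acc > best:
                preconds, best, improved = shorter, acc, True
                break
    return (preconds, label), best

def prune_and_order(rules, data):
    """Prune every rule, then order the kept rules by validation accuracy (step 3)."""
    return sorted((prune_rule(r, data) for r in rules),
                  key=lambda rule_acc: rule_acc[1], reverse=True)

# A hypothetical rule and validation set where dropping the B=1 precondition helps.
rule = ((("A", 1), ("B", 1)), "Yes")
data = [{"A": 1, "B": 1, "label": "No"},
        {"A": 1, "B": 0, "label": "Yes"},
        {"A": 1, "B": 2, "label": "Yes"}]
print(prune_rule(rule, data))  # keeps only the A=1 precondition
```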

  37. Adding randomness • What if you didn’t know if you had new material? For instance, you wanted to classify this: [Tree from slide 3 shown; the example reaches the Material node, but its Material value is unknown]

  38. Adding randomness • What if you didn’t know if you had new material? For instance, you wanted to classify this: • Where to go? You could look at the training set, and see that when Rain=T and Is10601=F, a fraction p of the examples had new material. Then flip a p-biased coin and descend the appropriate branch. • But that might not be the best idea. Why not? [Tree from slide 3 shown; the path Raining=True, Is10601=False ends at the Material node]

  39. Adding randomness • Also, you may have missing data in the training set. There are also methods to deal with this using probability: “Well, 60% of the time when Rain and not 601, there’s new material (among the examples where we do know the material). So we’ll just randomly select 60% of the rainy, non-601 examples where we don’t know the material to be new material.” [Tree fragment shown: the Raining=True, Is10601=False branch with Material unknown]
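The proportional fill-in described on this slide might be sketched like this. The data layout (`None` marks a missing value) and the function name are my own; in practice you would apply it within the relevant branch's subset (e.g. the Rain=T, non-601 examples), as the slide describes.

```python
import random

def fill_missing(examples, attr, rng=None):
    """Fill missing `attr` values in proportion to the observed value frequencies."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    known = [ex[attr] for ex in examples if ex[attr] is not None]
    values = sorted(set(known))
    weights = [known.count(v) for v in values]
    for ex in examples:
        if ex[attr] is None:
            ex[attr] = rng.choices(values, weights=weights)[0]
    return examples

# 3 of 5 known examples have new material, so missing ones become "New" ~60% of the time.
examples = ([{"Material": "New"} for _ in range(3)]
            + [{"Material": "Old"} for _ in range(2)]
            + [{"Material": None} for _ in range(5)])
filled = fill_missing(examples, "Material")
print(all(ex["Material"] in {"New", "Old"} for ex in filled))  # True
```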

  40. Adventures in Probability • That approach tends to work well. Still, we may have the following trouble. • What if there aren’t very many training examples where Rain = True and 10601=False? Wouldn’t we still want to use examples where Rain=False to get the missing value? • Well, it “depends”. Stay tuned for lecture next week!
