
Iterative Dichotomiser (ID3) Algorithm

By: Phuong H. Nguyen. Professor: Lee, Sin-Min. Course: CS 157B, Section 2. Date: 05/08/07 (Spring 2007).


Presentation Transcript


  1. By: Phuong H. Nguyen Professor: Lee, Sin-Min Course: CS 157B Section: 2 Date: 05/08/07 Spring 2007 Iterative Dichotomiser (ID3) Algorithm

  2. Overview • Introduction • Entropy • Information Gain • Detailed Example Walkthrough • Conclusion • References

  3. Introduction • The ID3 algorithm is a greedy algorithm for decision tree construction, developed by Ross Quinlan and published in 1986. • ID3 uses information gain to select the best attribute for the root node and for each decision node, following the Max-Gain approach: split on the attribute with the highest information gain.

  4. Entropy • Measures the impurity or randomness of a collection of examples. • A quantitative measurement of the homogeneity of a set of examples. • In short, it tells us how mixed the given examples are with respect to the target classification.

  5. Entropy (cont.) • Entropy(S) = -P_positive log2(P_positive) - P_negative log2(P_negative) Where: - P_positive = proportion of positive examples - P_negative = proportion of negative examples • Example: If S is a collection of 14 examples with 9 YES and 5 NO, then: Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
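The two-class entropy formula on slide 5 can be checked numerically. The following is a minimal sketch, not part of the original slides; the function name `binary_entropy` is an assumption chosen for illustration, and a zero count is treated as contributing 0 (the usual convention for 0 * log2(0)).

```python
import math

def binary_entropy(positive: int, negative: int) -> float:
    """Entropy in bits of a collection with the given two class counts."""
    total = positive + negative
    result = 0.0
    for count in (positive, negative):
        if count:  # convention: a class with no members contributes 0
            p = count / total
            result -= p * math.log2(p)
    return result

# The slide's example: 14 examples, 9 YES and 5 NO.
print(round(binary_entropy(9, 5), 3))  # 0.94
```

Working from raw counts rather than precomputed proportions keeps the sketch close to how the slide states the example.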

  6. Entropy (cont.) • More than two classification classes: Entropy(S) = ∑_i -p(i) log2 p(i), where p(i) is the proportion of examples in class i. • With two classes, the result of any entropy calculation is between 0 and 1; with c classes the maximum is log2(c). • Two special cases (two classes): - If Entropy(S) = 1 (max value), members are split equally between the two classes (min uniformity, max randomness). - If Entropy(S) = 0, all members of S belong to exactly one class (max uniformity, min randomness).
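The multi-class formula and the range of its values can be illustrated with a short sketch. This is an assumption-laden example rather than anything from the slides: the function `entropy` takes a list of class labels, and the labels "A", "B", "C" are made up purely to show that three evenly split classes give log2(3), which is greater than 1.

```python
import math
from collections import Counter

def entropy(labels) -> float:
    """Entropy in bits of a list of class labels (any number of classes)."""
    counts = Counter(labels)          # only classes that actually occur
    total = sum(counts.values())
    result = 0.0
    for n in counts.values():
        p = n / total
        result -= p * math.log2(p)
    return result

# Hypothetical label lists, for illustration only.
print(entropy(["YES"] * 7 + ["NO"] * 7))   # evenly split, two classes: 1.0
print(entropy(["YES"] * 4))                # a single class: 0.0
print(entropy(["A", "B", "C"]))            # about 1.585, i.e. log2(3)
```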

  7. Information Gain • A statistical property that measures how well a given attribute separates the example collection into the target classes. • The ID3 algorithm uses the Max-Gain approach (highest information gain) to select the best attribute for the root node and for each decision node.

  8. Information Gain (cont.) • Gain(S, A) = Entropy(S) - ∑_v (|S_v| / |S|) * Entropy(S_v), summed over every value v of attribute A. Where: - A is an attribute of collection S - S_v = subset of S for which attribute A has value v - |S_v| = number of elements in S_v - |S| = number of elements in S

  9. Information Gain (cont.) • Example: Collection S = 14 examples (9 YES, 5 NO). Wind speed is one attribute of S, with values {Weak, Strong}: - Weak = 8 occurrences (6 YES, 2 NO) - Strong = 6 occurrences (3 YES, 3 NO) • Calculation:
      Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
      Entropy(S_weak) = -(6/8) log2(6/8) - (2/8) log2(2/8) = 0.811
      Entropy(S_strong) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1.00
      Gain(S, Wind) = Entropy(S) - (8/14) * Entropy(S_weak) - (6/14) * Entropy(S_strong) = 0.940 - (8/14)*0.811 - (6/14)*1.00 = 0.048
      • The information gain of every other attribute in S is calculated in the same way. • The attribute with the highest gain is used as the root node or decision node.
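The Wind calculation on slide 9 can be reproduced in a few lines. This is a sketch under stated assumptions: the helper names `entropy` and `information_gain` and the representation of each subset as a (YES, NO) count pair are choices made here for illustration, not something the slides prescribe.

```python
import math

def entropy(counts) -> float:
    """Entropy in bits from a sequence of per-class counts."""
    total = sum(counts)
    result = 0.0
    for n in counts:
        if n:  # skip empty classes (0 * log2(0) is taken as 0)
            p = n / total
            result -= p * math.log2(p)
    return result

def information_gain(parent_counts, subsets) -> float:
    """Gain(S, A): parent entropy minus the size-weighted subset entropies."""
    total = sum(parent_counts)
    remainder = sum(sum(c) / total * entropy(c) for c in subsets.values())
    return entropy(parent_counts) - remainder

# (YES, NO) counts taken from the slide's Wind example.
wind = {"Weak": (6, 2), "Strong": (3, 3)}
print(round(information_gain((9, 5), wind), 3))  # 0.048
```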

  10. Example Walkthrough • A company sends promotions to various houses and records a few facts about each house, along with whether the occupants responded.

  11. Example Walkthrough (cont.) The target classification is “Outcome”, which can be “Responded” or “Nothing”. The attributes in the collection are “District”, “House Type”, “Income”, and “Previous Customer”. The attributes and the target take the following values: - District = {Suburban, Rural, Urban} - House Type = {Detached, Semi-detached, Terrace} - Income = {High, Low} - Previous Customer = {No, Yes} - Outcome = {Nothing, Responded}

  12. Example Walkthrough (cont.) Detailed calculation for Gain(S, District):
      Entropy(S = [9/14 responses, 5/14 no responses]) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.4098 + 0.5305 = 0.9403
      Entropy(S_District=Suburban = [2/5 responses, 3/5 no responses]) = -(2/5) log2(2/5) - (3/5) log2(3/5) = 0.5288 + 0.4422 = 0.9709
      Entropy(S_District=Rural = [4/4 responses, 0/4 no responses]) = -(4/4) log2(4/4) = 0
      Entropy(S_District=Urban = [3/5 responses, 2/5 no responses]) = -(3/5) log2(3/5) - (2/5) log2(2/5) = 0.4422 + 0.5288 = 0.9709
      Gain(S, District) = Entropy(S) - ((5/14) * Entropy(S_District=Suburban) + (5/14) * Entropy(S_District=Urban) + (4/14) * Entropy(S_District=Rural)) = 0.9403 - ((5/14)*0.9709 + (5/14)*0.9709 + (4/14)*0) = 0.9403 - 0.6935 = 0.2468
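To make the weighted terms of Gain(S, District) easier to follow, the sketch below prints each District value's weight, entropy, and contribution. It assumes only the (responded, nothing) counts stated on slide 12; the variable names are illustrative. Without intermediate rounding the gain comes out at about 0.247, which agrees with the slide's 0.2468 to within rounding.

```python
import math

def entropy(counts) -> float:
    """Entropy in bits from a sequence of per-class counts."""
    total = sum(counts)
    result = 0.0
    for n in counts:
        if n:
            p = n / total
            result -= p * math.log2(p)
    return result

# (responded, nothing) counts per District value, as given on slide 12.
district = {"Suburban": (2, 3), "Rural": (4, 0), "Urban": (3, 2)}
parent = (9, 5)

total = sum(parent)
gain = entropy(parent)
for value, counts in district.items():
    weight = sum(counts) / total      # |S_v| / |S|
    h = entropy(counts)               # Entropy(S_v)
    gain -= weight * h
    print(f"{value:<9} weight={weight:.4f} entropy={h:.4f} term={weight * h:.4f}")
print(f"Gain(S, District) = {gain:.4f}")
```

The Rural term is zero because that subset is pure, so only the Suburban and Urban subsets reduce the gain.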

  13. Example Walkthrough (cont.) • So we now have: Gain(S, District) = 0.2468 • Applying the same process to the remaining three attributes of S, we get: - Gain(S, House Type) = 0.049 - Gain(S, Income) = 0.151 - Gain(S, Previous Customer) = 0.048 • Comparing the information gain of the four attributes, we see that “District” has the highest value. • “District” will be the root node of the decision tree. • So far the decision tree looks like this:
      District
        - Suburban: ???
        - Rural: ???
        - Urban: ???
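The Max-Gain selection step amounts to taking the attribute with the largest gain. A minimal sketch, using the four gain values reported on slide 13 (the dictionary layout and variable names are assumptions made for illustration):

```python
# Information gains for the full collection S, as reported on slide 13.
gains = {
    "District": 0.2468,
    "House Type": 0.049,
    "Income": 0.151,
    "Previous Customer": 0.048,
}

# Max-Gain approach: pick the attribute whose gain is highest.
best = max(gains, key=gains.get)
ranking = sorted(gains, key=gains.get, reverse=True)

print(best)     # District
print(ranking)  # ['District', 'Income', 'House Type', 'Previous Customer']
```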

  14. Example Walkthrough (cont.) • Applying the same process to the left branch of the root node (Suburban), we get: - Entropy(S_suburban) = 0.970 - Gain(S_suburban, House Type) = 0.570 - Gain(S_suburban, Income) = 0.970 - Gain(S_suburban, Previous Customer) = 0.019 • The information gain of “Income” is highest, so “Income” becomes the decision node for this branch. • The decision tree then looks like this:
      District
        - Suburban: Income
        - Rural: ???
        - Urban: ???

  15. Example Walkthrough (cont.) The center branch of the root node (Rural) is a special case: Entropy(S_rural) = 0, so all members of S_rural belong to exactly one target class, which is “Responded”. We therefore skip the gain calculations and add that class value to the tree as a leaf. The decision tree then looks like this:
      District
        - Suburban: Income
        - Rural: Responded
        - Urban: ???

  16. Example Walkthrough (cont.) • Applying the same process to the right branch of the root node (Urban), we get: - Entropy(S_urban) = 0.970 - Gain(S_urban, House Type) = 0.019 - Gain(S_urban, Income) = 0.019 - Gain(S_urban, Previous Customer) = 0.970 • The information gain of “Previous Customer” is highest, so “Previous Customer” becomes the decision node for this branch. • The decision tree then looks like this:
      District
        - Suburban: Income
        - Rural: Responded
        - Urban: Previous Customer

  17. Example Walkthrough (cont.) For the “Income” branch, we have: High → Nothing (3/3), Entropy = 0, and Low → Responded (2/2), Entropy = 0. For the “Previous Customer” branch, we have: No → Responded (3/3), Entropy = 0, and Yes → Nothing (2/2), Entropy = 0. Every subset is now pure, so the tree needs no further splitting, and the final decision tree looks like this:
      District
        - Suburban: Income
            - High: Nothing
            - Low: Responded
        - Rural: Responded
        - Urban: Previous Customer
            - No: Responded
            - Yes: Nothing
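The procedure followed in slides 12 to 17 (compute gains, split on the best attribute, recurse, stop at pure subsets) can be written as a short recursive function. This is a hedged sketch, not the presenter's implementation: it assumes each example is a dict mapping attribute names to values, returns the tree as nested dicts, and falls back to the majority label when no attributes remain, a common choice that the slides do not discuss.

```python
import math
from collections import Counter

def entropy(rows, target) -> float:
    """Entropy in bits of the target labels in rows."""
    counts = Counter(row[target] for row in rows)
    result = 0.0
    for n in counts.values():
        p = n / len(rows)
        result -= p * math.log2(p)
    return result

def gain(rows, attribute, target) -> float:
    """Information gain from splitting rows on attribute."""
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset, target)
    return entropy(rows, target) - remainder

def id3(rows, attributes, target):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:          # entropy 0: pure subset becomes a leaf
        return labels[0]
    if not attributes:                 # assumption: majority label as fallback
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = id3(subset, remaining, target)
    return {best: branches}
```

Called as `id3(rows, ["District", "House Type", "Income", "Previous Customer"], "Outcome")` on the 14 recorded examples, a function like this should split on the highest-gain attribute at each level in the same way as the walkthrough.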

  18. Example Walkthrough (cont.) Final decision tree:
      District
        - Suburban: Income
            - High: Nothing
            - Low: Responded
        - Rural: Responded
        - Urban: Previous Customer
            - No: Responded
            - Yes: Nothing
      • From the above decision tree, some rules can be extracted. Examples: - (District = Suburban) AND (Income = Low) → (Outcome = Responded) - (District = Rural) → (Outcome = Responded) - (District = Urban) AND (Previous Customer = Yes) → (Outcome = Nothing) - and so on…
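Rule extraction from the final tree can be sketched by walking every path from the root to a leaf. The nested-dict encoding of the tree and the function names `classify` and `extract_rules` below are assumptions made for illustration; the tree contents themselves come from slides 17 and 18.

```python
# The final decision tree from the walkthrough, encoded as nested dicts.
TREE = {"District": {
    "Suburban": {"Income": {"High": "Nothing", "Low": "Responded"}},
    "Rural": "Responded",
    "Urban": {"Previous Customer": {"No": "Responded", "Yes": "Nothing"}},
}}

def classify(tree, example):
    """Follow the branches matching the example until a leaf is reached."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[example[attribute]]
    return tree

def extract_rules(tree, conditions=()):
    """Return one 'conditions -> outcome' rule per root-to-leaf path."""
    if not isinstance(tree, dict):
        lhs = " AND ".join(f"({a} = {v})" for a, v in conditions)
        return [f"{lhs} -> (Outcome = {tree})"]
    attribute, branches = next(iter(tree.items()))
    rules = []
    for value, subtree in branches.items():
        rules.extend(extract_rules(subtree, conditions + ((attribute, value),)))
    return rules

for rule in extract_rules(TREE):
    print(rule)
```

Each leaf yields exactly one rule, so this tree produces five rules, including the three listed on the slide.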

  19. Conclusion • The ID3 algorithm is straightforward to implement once entropy and information gain are understood. • The ID3 algorithm is one of the most important techniques in data mining. • The ID3 algorithm has proven effective for data mining in industry.

  20. References • Dr. Lee’s Slides, San Jose State University, Spring 2007, http://www.cs.sjsu.edu/%7Elee/cs157b/cs157b.html • Andrew Colin, "Building Decision Trees with the ID3 Algorithm", Dr. Dobb's Journal, June 1996 • Paul E. Utgoff, "Incremental Induction of Decision Trees", Kluwer Academic Publishers, 1989 • http://www.cise.ufl.edu/~ddd/cap6635/Fall-97/Short-papers/2.htm • http://decisiontrees.net/node/27
