ID3 and Decision tree by Tuan Nguyen
ID3 and Decision tree ID3 algorithm • An algorithm for constructing a decision tree from labeled examples • Uses entropy to compute the information gain of each attribute • The attribute with the highest information gain is selected at each node
ID3 and Decision tree Entropy • The complete formula for entropy is: E(S) = -(p+)*log2(p+) - (p-)*log2(p-) • Where p+ is the proportion of positive examples in S • Where p- is the proportion of negative examples in S • Where S is the sample of training examples
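As a sanity check, here is a minimal Python sketch of this two-class entropy formula (the helper name `entropy` and the example counts are illustrative, not from the slides):

```python
import math

def entropy(pos: int, neg: int) -> float:
    """E(S) = -(p+)*log2(p+) - (p-)*log2(p-)."""
    total = pos + neg
    # Skip zero counts: the usual convention is 0*log2(0) = 0,
    # which matters for pure nodes such as Outlook = Overcast later on.
    return sum(-c / total * math.log2(c / total)
               for c in (pos, neg) if c > 0)

print(round(entropy(9, 5), 3))  # 0.940 -- the [9+, 5-] sample used later
```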
ID3 and Decision tree Example Consider a sample S = [29+, 35-] that attribute A1 splits into True → [21+, 5-] and False → [8+, 30-]. The entropies are computed as follows: E(S) = -29/64*log2(29/64) - 35/64*log2(35/64) = 0.9937 • The Entropy of True: E(TRUE) = -21/26*log2(21/26) - 5/26*log2(5/26) = 0.7063 • The Entropy of False: E(FALSE) = -8/38*log2(8/38) - 30/38*log2(30/38) = 0.7425
ID3 and Decision tree Information Gain • Gain (Sample, Attribute) or Gain(S, A) is the expected reduction in entropy due to sorting S on attribute A: Gain(S, A) = Entropy(S) - Σ v∈Values(A) (|Sv|/|S|) * Entropy(Sv) So, for the previous example, the information gain is calculated: Gain(S, A1) = E(S) - (21+5)/(29+35) * E(TRUE) - (8+30)/(29+35) * E(FALSE) = 0.9937 - 26/64 * 0.7063 - 38/64 * 0.7425 = 0.2659
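A short Python check of these numbers, assuming the same two-class `entropy` helper as in the sketch above:

```python
import math

def entropy(pos: int, neg: int) -> float:
    total = pos + neg
    return sum(-c / total * math.log2(c / total) for c in (pos, neg) if c)

# S = [29+, 35-] split on A1: True -> [21+, 5-], False -> [8+, 30-]
e_s, e_true, e_false = entropy(29, 35), entropy(21, 5), entropy(8, 30)
gain = e_s - 26 / 64 * e_true - 38 / 64 * e_false

print(round(e_s, 4), round(e_true, 4), round(e_false, 4))  # 0.9937 0.7063 0.7425
print(round(gain, 4))                                      # 0.2659
```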
ID3 and Decision tree The complete example Consider the following table of training examples (the classic PlayTennis sample):

Day  Outlook   Temperature  Humidity  Wind    PlayTennis
D1   Sunny     Hot          High      Weak    No
D2   Sunny     Hot          High      Strong  No
D3   Overcast  Hot          High      Weak    Yes
D4   Rain      Mild         High      Weak    Yes
D5   Rain      Cool         Normal    Weak    Yes
D6   Rain      Cool         Normal    Strong  No
D7   Overcast  Cool         Normal    Strong  Yes
D8   Sunny     Mild         High      Weak    No
D9   Sunny     Cool         Normal    Weak    Yes
D10  Rain      Mild         Normal    Weak    Yes
D11  Sunny     Mild         Normal    Strong  Yes
D12  Overcast  Mild         High      Strong  Yes
D13  Overcast  Hot          Normal    Weak    Yes
D14  Rain      Mild         High      Strong  No
ID3 and Decision tree Decision tree • We want to build a decision tree that predicts whether a tennis match is played • The decision depends on the weather attributes (Outlook, Temperature, Humidity, and Wind) • We apply the entropy and information-gain calculations above to this table
ID3 and Decision tree Example • Calculating the information gains for each of the weather attributes: • For the Wind • For the Humidity • For the Outlook
ID3 and Decision tree For the Wind S = [9+, 5-], E(S) = 0.940. Splitting on Wind: Weak → [6+, 2-] (E = 0.811), Strong → [3+, 3-] (E = 1.0) Gain(S, Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048
ID3 and Decision tree For the Humidity S = [9+, 5-], E(S) = 0.940. Splitting on Humidity: High → [3+, 4-] (E = 0.985), Normal → [6+, 1-] (E = 0.592) Gain(S, Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151
ID3 and Decision tree For the Outlook S = [9+, 5-], E(S) = 0.940. Splitting on Outlook: Sunny → [2+, 3-] (E = 0.971), Overcast → [4+, 0-] (E = 0.0), Rain → [3+, 2-] (E = 0.971) Gain(S, Outlook) = 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971 = 0.247 Outlook yields the largest gain (0.247 vs. 0.151 for Humidity and 0.048 for Wind), so it is chosen as the root attribute.
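The following Python sketch recomputes all four gains from the table (the tuple layout and helper names are my own; Temperature, which the slides skip, scores lowest at about 0.029, and Humidity comes out as 0.152 here because the slide rounds the subset entropies before combining):

```python
import math
from collections import Counter

# The 14-day PlayTennis sample from the table above.
data = [
    # (Outlook, Temperature, Humidity, Wind, PlayTennis)
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
ATTRS = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

def entropy(rows):
    """Entropy of the PlayTennis labels (last field) over a list of rows."""
    total = len(rows)
    counts = Counter(r[-1] for r in rows)
    return sum(-c / total * math.log2(c / total) for c in counts.values())

def gain(rows, attr):
    """Information gain of splitting `rows` on attribute `attr`."""
    i, total = ATTRS[attr], len(rows)
    rem = 0.0
    for v in {r[i] for r in rows}:                    # each value of the attribute
        subset = [r for r in rows if r[i] == v]
        rem += len(subset) / total * entropy(subset)  # weighted subset entropy
    return entropy(rows) - rem

for a in ATTRS:
    print(a, round(gain(data, a), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048
```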
ID3 and Decision tree Complete tree • Then the complete tree is:

Outlook
  Sunny → Humidity
    High → No [D1, D2, D8]
    Normal → Yes [D9, D11]
  Overcast → Yes [D3, D7, D12, D13]
  Rain → Wind
    Strong → No [D6, D14]
    Weak → Yes [D4, D5, D10]
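To tie the steps together, here is a compact recursive ID3 sketch; it continues the previous snippet, reusing its `data`, `ATTRS`, `entropy`, and `gain` definitions, and reproduces exactly this tree:

```python
from collections import Counter
from pprint import pprint

def id3(rows, attrs):
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:             # pure node: emit a leaf
        return labels[0]
    if not attrs:                         # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a))   # highest information gain
    i = ATTRS[best]
    rest = [a for a in attrs if a != best]
    # One subtree per value of the chosen attribute, built recursively.
    return {best: {v: id3([r for r in rows if r[i] == v], rest)
                   for v in sorted({r[i] for r in rows})}}

pprint(id3(data, list(ATTRS)))
# {'Outlook': {'Overcast': 'Yes',
#              'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
#              'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}}}
```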
Reference: • Dr. Lee's slides, San Jose State University, Spring 2007 • Andrew Colin, "Building Decision Trees with the ID3 Algorithm", Dr. Dobb's Journal, June 1996 • Paul E. Utgoff, "Incremental Induction of Decision Trees", Kluwer Academic Publishers, 1989 • http://www.cise.ufl.edu/~ddd/cap6635/Fall-97/Short-papers/2.htm • http://decisiontrees.net/node/27