ID3 Algorithm

ID3 Algorithm. CS 157B: Spring 2010 Meg Genoar. Iterative Dichotomiser 3. Ross Quinlan – 1987 C4.5 Precursor Decision Tree Generation. Ross Quinlan. Computer Scientist – UW 1968 Data Mining & Decision Theory AI: Data Mining ID3, C4.5, & C5.0 RuleQuest Research. ID3 & Entropy.

Presentation Transcript


  1. ID3 Algorithm CS 157B: Spring 2010 Meg Genoar

  2. Iterative Dichotomiser 3 • Ross Quinlan – 1987 • C4.5 Precursor • Decision Tree Generation

  3. Ross Quinlan • Computer Scientist – UW 1968 • Data Mining & Decision Theory • AI: Data Mining • ID3, C4.5, & C5.0 • RuleQuest Research

  4. ID3 & Entropy • Measure of Uncertainty • Randomness • Efficient Separation of Decision Tree Elements • Max-Gain Split • Most Useful Attribute • Highest Information Gain → Best Attribute

  5. Entropy • Entropy(S) = – P_positive Log2(P_positive) – P_negative Log2(P_negative) • P_positive: proportion of positive data • P_negative: proportion of negative data

  6. Example… • A collection S consists of 20 data examples: 13 Yes, 7 No • Entropy(S) = – (13/20) Log2(13/20) – (7/20) Log2(7/20) = 0.934
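The two-class entropy formula from slides 5 and 6 can be checked numerically. This is a minimal sketch, not part of the original slides; the function name `entropy` and its count-based signature are assumptions made for illustration.

```python
from math import log2


def entropy(positives: int, negatives: int) -> float:
    """Two-class entropy in bits, computed from raw class counts."""
    total = positives + negatives
    result = 0.0
    for count in (positives, negatives):
        if count:  # an empty class contributes nothing: 0 * log2(0) is taken as 0
            p = count / total
            result -= p * log2(p)
    return result


# Slide 6: 20 examples, 13 Yes and 7 No.
print(round(entropy(13, 7), 3))  # 0.934
```

Working from counts rather than proportions keeps the zero-count case explicit, which matters later when a branch contains only successes or only failures.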

  7. Entropy → Gain Value • Gain: Place to Split the Tree • High Gain > Low Gain • High Gain: Top of the Tree • Gain(A) = E(Current Set) – ∑ (|Child Set| / |Current Set|) * E(Child Set), summed over all child sets produced by splitting on A

  8. Movie Example

  9. Entropy of Table • Is the Film a Success? • Entropy(5 Yes, 5 No) = – (5/10) Log2(5/10) – (5/10) Log2(5/10) = 1

  10. Split – Country of Origin

  11. Gain – Country of Origin • Where is the film from? • Entropy(USA) = – (3/4) Log2(3/4) – (1/4) Log2(1/4) = 0.811 • Entropy(Europe) = – (2/4) Log2(2/4) – (2/4) Log2(2/4) = 1 • Entropy(Rest of World) = – (0/2) Log2(0/2) – (2/2) Log2(2/2) = 0, taking 0 * Log2(0) as 0 • Gain(Origin) = 1 – (4/10 * 0.811 + 4/10 * 1 + 2/10 * 0) = 0.276

  12. Split – Big Star

  13. Gain – Big Star • Is there a Big Star in the film? • Entropy(Yes) = – (4/7) Log2(4/7) – (3/7) Log2(3/7) = 0.985 • Entropy(No) = – (1/3) Log2(1/3) – (2/3) Log2(2/3) = 0.918 • Gain(Star) = 1 – (7/10 * 0.985 + 3/10 * 0.918) = 0.0351
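The Gain(Star) figure on slide 13 is computed from entropies already rounded to three decimals. The sketch below, which is an illustration and not part of the original deck, recomputes the gain both ways from the slide's counts; the helper name `entropy` is an assumption.

```python
from math import log2


def entropy(positives: int, negatives: int) -> float:
    """Two-class entropy in bits; a zero count contributes 0."""
    total = positives + negatives
    return -sum(c / total * log2(c / total) for c in (positives, negatives) if c)


# Counts from slide 13: 7 films with a big star (4 Yes, 3 No), 3 without (1 Yes, 2 No).
h_star, h_no_star = entropy(4, 3), entropy(1, 2)

gain_exact = 1 - (7 / 10 * h_star + 3 / 10 * h_no_star)
gain_rounded = 1 - (7 / 10 * round(h_star, 3) + 3 / 10 * round(h_no_star, 3))

print(f"{gain_exact:.4f}")    # 0.0349
print(f"{gain_rounded:.4f}")  # 0.0351
```

The difference is far too small to change the ranking of the attributes, so the slide's values are fine for choosing a split.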

  14. Split – Genre

  15. Gain – Genre • What genre is the film? • Entropy(SciFi) = – (1/3) Log2(1/3) – (2/3) Log2(2/3) = 0.918 • Entropy(Com) = – (4/6) Log2(4/6) – (2/6) Log2(2/6) = 0.918 • Entropy(Rom) = – (0/1) Log2(0/1) – (1/1) Log2(1/1) = 0 • Gain(Genre) = 1 – (3/10 * 0.918 + 6/10 * 0.918 + 1/10 * 0) = 0.1738

  16. Compare Gains… • Gain(Origin) = 0.276 • Gain(Star) = 0.0351 • Gain(Genre) = 0.1738

  17. Compare Gains… • Gain(Origin) = 0.276 • Gain(Star) = 0.0351 • Gain(Genre) = 0.1738 • First Split: Origin
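The comparison on slides 16 and 17 amounts to computing a size-weighted gain for each attribute and taking the maximum. A minimal sketch follows, using only the (Yes, No) counts quoted on slides 11, 13 and 15; the names `entropy`, `gain` and `splits` are illustrative assumptions, not from the deck.

```python
from math import log2


def entropy(positives: int, negatives: int) -> float:
    """Two-class entropy in bits; a zero count contributes 0."""
    total = positives + negatives
    return -sum(c / total * log2(c / total) for c in (positives, negatives) if c)


def gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child sets."""
    total = sum(parent)
    weighted = sum((p + n) / total * entropy(p, n) for p, n in children)
    return entropy(*parent) - weighted


# (Yes, No) success counts per attribute value, read off slides 11, 13 and 15.
splits = {
    "Origin": [(3, 1), (2, 2), (0, 2)],  # USA, Europe, Rest of World
    "Star": [(4, 3), (1, 2)],            # big star, no big star
    "Genre": [(1, 2), (4, 2), (0, 1)],   # SciFi, Comedy, Romance
}

gains = {name: gain((5, 5), children) for name, children in splits.items()}
best = max(gains, key=gains.get)
print(best)  # Origin
```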

  18. [Tree diagram: All Movies splits by origin into United States, Europe, and Rest of World, each leading to a new table.]

  20. New Table – United States • Entropy(3 Yes, 1 No) = – (3/4) Log2(3/4) – (1/4) Log2(1/4) = 0.811

  21. Split – Big Star

  22. Gain – Big Star • Is there a Big Star in the film? • Entropy(Yes) = – (3/3) Log2(3/3) – (0/3) Log2(0/3) = 0 • Entropy(No) = – (0/1) Log2(0/1) – (1/1) Log2(1/1) = 0 • Gain(Star) = 0.811 – (3/4 * 0 + 1/4 * 0) = 0.811

  23. Split – Genre

  24. Gain – Genre • What genre is the film? • Entropy(SciFi) = – (1/1) Log2(1/1) – (0/1) Log2(0/1) = 0 • Entropy(Com) = – (2/3) Log2(2/3) – (1/3) Log2(1/3) = 0.918 • Gain(Genre) = 0.811 – (1/4 * 0 + 3/4 * 0.918) = 0.1225

  25. Compare Gains… • Gain(Star) = 0.811 • Gain(Genre) = 0.1225

  26. Compare Gains… • Gain(Star) = 0.811 • Gain(Genre) = 0.1225 • Split: Star

  27. [Tree diagram: All Movies splits by origin into United States, Europe, and Rest of World, each with a new table; the United States table splits into Star and No Star, each with a new table.]
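The repeated "split, then build a new table for each branch" step can be written as a short recursion. The sketch below is an illustration under stated assumptions, not the presenter's code. The four United States rows are not the original slide table (which did not survive in this transcript); they are reconstructed from the counts on slides 22 and 24, which determine them up to row order. The column name `Outcome` and the function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())


def partition(rows, attribute):
    """Group rows by their value for one attribute (one 'new table' per value)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row)
    return groups


def gain(rows, attribute, target):
    """Entropy of the rows minus the size-weighted entropy of each partition."""
    remainder = sum(
        len(group) / len(rows) * entropy([r[target] for r in group])
        for group in partition(rows, attribute).values()
    )
    return entropy([r[target] for r in rows]) - remainder


def id3(rows, attributes, target):
    """Return a leaf label, or a nested dict {attribute: {value: subtree}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:  # pure table: stop with a leaf
        return labels[0]
    if not attributes:  # nothing left to split on: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    return {
        best: {
            value: id3(subset, remaining, target)
            for value, subset in partition(rows, best).items()
        }
    }


# Reconstructed United States table (assumption: derived from slide counts).
us_films = [
    {"Star": "Yes", "Genre": "Sci-Fi", "Outcome": "Success"},
    {"Star": "Yes", "Genre": "Comedy", "Outcome": "Success"},
    {"Star": "Yes", "Genre": "Comedy", "Outcome": "Success"},
    {"Star": "No", "Genre": "Comedy", "Outcome": "Failure"},
]

print(id3(us_films, ["Star", "Genre"], "Outcome"))
# {'Star': {'Yes': 'Success', 'No': 'Failure'}}
```

Because both Star branches are already pure, this sketch stops after one split on the United States table and does not go on to split by genre.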

  28. [Tree diagram: All Movies splits by origin into United States, Europe, and Rest of World; under United States, Star leads to a new table split into Sci-Fi (Success) and Comedy (Success), and No Star leads to Failure.]

  29. [Tree diagram of the finished tree: All Movies splits by origin into Rest of World, United States, and Europe; the lower levels split on Star / No Star and then on Sci-Fi / Comedy, ending in Success and Failure leaves.]

  30. [Tree diagram: finished tree, as on slide 29] • Comedy from the US, with a big star…
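Classifying a new film means walking the tree from the root, following the branch that matches the film at each split. The sketch below is an assumption-laden illustration: the full tree diagram is not recoverable from this transcript, so `tree` contains only the branches whose outcomes follow from the counts on slides 11 and 22 (Rest of World has no successes; US films with a star all succeed; the one US film without a star fails). The Europe subtree is omitted, and the names `tree` and `classify` are hypothetical.

```python
# Partial tree as nested dicts: {attribute: {value: subtree_or_leaf}}.
tree = {
    "Origin": {
        "United States": {"Star": {"Yes": "Success", "No": "Failure"}},
        "Rest of World": "Failure",
        # Europe subtree omitted: not recoverable from the transcript.
    }
}


def classify(node, film):
    """Follow matching branches until a leaf label (a plain string) is reached."""
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))
        value = film[attribute]
        if value not in branches:
            return None  # the partial tree has no branch for this value
        node = branches[value]
    return node


film = {"Origin": "United States", "Star": "Yes", "Genre": "Comedy"}
print(classify(tree, film))  # Success
```

Under these assumptions the film's genre is never consulted, because the Star split already separates the United States films perfectly.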
