Homework 1: Solutions

CS4445/B12. Provided by: Kenneth J. Loomis.


Presentation Transcript


  1. Homework 1: Solutions. CS4445/B12. Provided by: Kenneth J. Loomis.

  2. Entropy of the original set: Entropy(target attribute)
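The entropy value itself did not survive in the transcript. As a minimal sketch, assuming base-2 logarithms and the class counts implied by the leaf frequency counts on slide 23 (9 yes and 5 no, 14 instances in total), the entropy of the target attribute could be computed as follows. The function name `entropy` is illustrative and not taken from the slides.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class distribution given as raw counts."""
    total = sum(counts)
    # Classes with a zero count contribute nothing and are skipped.
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Assumed class counts: 9 "yes" and 5 "no", summed from the leaf counts on slide 23.
original_set_entropy = entropy([9, 5])
print(round(original_set_entropy, 4))
```

A perfectly mixed two-class set gives an entropy of 1, and a homogeneous set gives 0, which is why ID3 stops splitting once a subset contains a single class.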

  3. Determine the root node attribute. Candidate: genre (values: comedy, drama, action). Entropy = .6935

  4. Determine the root node attribute. Candidate: critics-reviews (values: thumbs-down, neutral, thumbs-up). Entropy = .9111

  5. Determine the root node attribute. Candidate: rating (values: PG-13, R). Entropy = .7885

  6. Determine the root node attribute. Candidate: IMAX (values: TRUE, FALSE). Entropy = .8922

  7. Determine the root node attribute. Split entropies: genre = .6935, critics-reviews = .9111, rating = .7885, IMAX = .8922. Genre gives the lowest entropy, so it becomes the root node of our ID3 tree, with branches comedy, drama, and action.
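The comparison above can be sketched in a few lines. This is an illustration under stated assumptions, not the author's code: the per-genre (yes, no) counts are read off the leaf frequency counts on slide 23, the other three entropies are copied from slides 4 to 6 because their underlying counts are not in the transcript, and the names `entropy` and `split_entropy` are hypothetical.

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def split_entropy(branches):
    """Average entropy of the subsets of a split, weighted by subset size."""
    n = sum(sum(b) for b in branches)
    return sum(sum(b) / n * entropy(b) for b in branches)

# (yes, no) counts per genre value, assumed from the leaf counts on slide 23.
genre_branches = {"comedy": (2, 3), "drama": (4, 0), "action": (3, 2)}
genre_entropy = split_entropy(list(genre_branches.values()))

# Split entropies as reported on slides 3 to 6.
reported = {"genre": 0.6935, "critics-reviews": 0.9111, "rating": 0.7885, "IMAX": 0.8922}
root = min(reported, key=reported.get)  # ID3 picks the lowest split entropy

print(round(genre_entropy, 4), root)
```

Under these assumed counts the computed value for genre agrees with the .6935 reported on slide 3.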

  8. Determine the left child attribute. Tree so far: genre (comedy: ?; drama; action). We now move on to the left child node of our tree, the genre = comedy branch. Which attribute do we choose for this node? Options: critics-reviews, rating, IMAX.

  9. Determine the left child attribute. Candidate under genre = comedy: critics-reviews (values: thumbs-down, neutral, thumbs-up). Entropy = .4000

  10. Determine the left child attribute. Candidate under genre = comedy: rating (values: PG-13, R).

  11. Determine the left child attribute. Candidate under genre = comedy: IMAX (values: TRUE, FALSE).

  12. Determine the left child attribute. Rating gives the lowest entropy (critics-reviews, by comparison, gives .4000), so it becomes the left child node of our ID3 tree.

  13. Determine the left child attribute. Tree so far: genre (comedy: rating (PG-13: [yes]; R: [no]); drama; action). The rating split is also homogeneous, so we can add our leaf nodes here.

  14. Determine the center child attribute. Tree so far: genre (comedy: rating (PG-13: [yes]; R: [no]); drama: [yes]; action). The genre = drama branch is a homogeneous subset, so we can place a leaf node here.

  15. Determine the right child attribute. Tree so far: genre (comedy: rating (PG-13: [yes]; R: [no]); drama: [yes]; action: ?). We now move on to the right child node of our tree, the genre = action branch. Which attribute do we choose for this node? Options: critics-reviews, rating, IMAX.

  16. Determine the right child attribute. Candidate under genre = action: critics-reviews (values: thumbs-down, neutral, thumbs-up).

  17. Determine the right child attribute. Candidate under genre = action: rating (values: PG-13, R).

  18. Determine the right child attribute. Candidate under genre = action: IMAX (values: FALSE, TRUE).

  19. Determine the right child attribute. Entropy(critics-reviews) = .9510, Entropy(rating) = .9510, Entropy(IMAX) = 0.0. IMAX gives the lowest entropy, so it becomes the right child node of our ID3 tree.

  20. Determine the right child attribute. Tree so far: genre (comedy: rating (PG-13: [yes]; R: [no]); drama: [yes]; action: IMAX (FALSE: [yes]; TRUE: [no])). The IMAX split is also homogeneous, so we can add our leaf nodes here.

  21. ID3 decision tree is complete. Final tree: genre (comedy: rating (PG-13: [yes]; R: [no]); drama: [yes]; action: IMAX (FALSE: [yes]; TRUE: [no])). Since only leaf nodes remain, we are finished building our tree.
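One way to hold the finished tree in code is as nested tuples and dictionaries. The sketch below is only an illustration: the representation, the `classify` helper, and the sample instance are assumptions made here, while the branch labels and leaf classes follow the slides.

```python
# Internal nodes are (attribute, {value: subtree}); leaves are class labels.
TREE = ("genre", {
    "comedy": ("rating", {"PG-13": "yes", "R": "no"}),
    "drama": "yes",
    "action": ("IMAX", {"FALSE": "yes", "TRUE": "no"}),
})

def classify(tree, instance):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while not isinstance(tree, str):
        attribute, children = tree
        tree = children[instance[attribute]]
    return tree

# Hypothetical fully specified instance, not taken from the homework data.
sample = {"genre": "comedy", "critics-reviews": "neutral", "rating": "R", "IMAX": "FALSE"}
prediction = classify(TREE, sample)
print(prediction)
```

Note that critics-reviews never appears in the tree, so its value has no effect on the prediction.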

  22. Handling missing values during prediction. Given an instance with Genre = action, Critics-reviews = ?, Rating = R, IMAX = ?, how do we classify it? How can we handle missing values using this decision tree?

  23. Handling missing values during prediction: a solution. Consider adding frequency counts to each leaf node, shown here in curly braces: genre (comedy: rating (PG-13: [yes] {2}; R: [no] {3}); drama: [yes] {4}; action: IMAX (FALSE: [yes] {3}; TRUE: [no] {2})).

  24. Handling missing values during prediction: a solution. Instance: Genre = action, Critics-reviews = ?, Rating = R, IMAX = ?. Traverse the tree with the frequency counts from slide 23.

  25. Handling missing values during prediction: a solution. Traverse the decision tree normally when the attribute value is known. Here Genre = action is known, so we follow the action branch to the IMAX node.

  26. Handling missing values during prediction: a solution. Traverse every possible path when a missing value is encountered. Here IMAX = ?, so we follow both the FALSE and the TRUE branches.

  27. Handling missing values during prediction: a solution. Sum the frequency counts of all like leaf nodes that are reached: yes = 3 (IMAX = FALSE), no = 2 (IMAX = TRUE).

  28. Handling missing values during prediction: a solution. Classify based on the highest frequency count. Since yes = 3 exceeds no = 2, the result is like = yes.

  29. Handling missing values during prediction: 2nd example. Consider this 2nd example: Genre = ?, Critics-reviews = ?, Rating = R, IMAX = TRUE. Genre is missing, so all three branches are followed, reaching comedy/R [no] {3}, drama [yes] {4}, and action/TRUE [no] {2}. That gives no = 5 and yes = 4, so the result is like = no.

  30. Handling missing values during prediction: 3rd example. Consider the case where all attribute values are unknown: Genre = ?, Critics-reviews = ?, Rating = ?, IMAX = ?. Every leaf is reached, giving yes = 9 and no = 5, so the result is like = yes.
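The procedure from slides 23 to 30 can be sketched as a short recursive function. This is a minimal illustration rather than the author's implementation: the tuple-based tree encoding, the names `leaf_counts` and `classify`, and the convention that a missing attribute is simply absent from the instance dictionary are all assumptions made for the example. The leaf counts are those shown on slide 23.

```python
from collections import Counter

# Internal nodes: (attribute, {value: subtree}); leaves: (class label, count).
TREE = ("genre", {
    "comedy": ("rating", {"PG-13": ("yes", 2), "R": ("no", 3)}),
    "drama": ("yes", 4),
    "action": ("IMAX", {"FALSE": ("yes", 3), "TRUE": ("no", 2)}),
})

def leaf_counts(node, instance):
    """Sum leaf frequency counts per class over every path the instance can take."""
    first, second = node
    if isinstance(second, int):  # leaf: (label, count)
        return Counter({first: second})
    value = instance.get(first)  # None means the attribute value is missing
    # Known value: follow one branch. Missing value: follow every branch.
    branches = [second[value]] if value is not None else second.values()
    total = Counter()
    for child in branches:
        total += leaf_counts(child, instance)
    return total

def classify(instance):
    """Predict the class with the highest summed frequency count."""
    return leaf_counts(TREE, instance).most_common(1)[0][0]

first_example = classify({"genre": "action", "rating": "R"})
second_example = classify({"rating": "R", "IMAX": "TRUE"})
third_example = classify({})
print(first_example, second_example, third_example)
```

The three calls correspond to the three instances on slides 22, 29, and 30. The slides do not say how to break a tie between equal counts; this sketch returns whichever tied class was counted first.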
