
Machine Learning in Practice MidTerm Review

Carolyn Penstein Rosé, Kishore Prahallad. Language Technologies Institute.

Presentation Transcript


  1. Machine Learning in Practice: MidTerm Review. Carolyn Penstein Rosé, Kishore Prahallad. Language Technologies Institute

  2. Error Analysis

  3. Error Analysis from Assgn4

     === Confusion Matrix ===

         a     b     c     d     e     f     g     h   <-- classified as
      1443   105    98   127   141    27   396    73 |  a = irf03
       934   190   125   264    86    58   604   163 |  b = irf04
       985    95   350   219   108    69   863   134 |  c = irf06
       841   152   177   774   127    80   524   161 |  d = irf07
       269    98    25    78   111   180  1208   294 |  e = irm02
       369    94    27    61    95   438  1062   235 |  f = irm05
       241    70    43    38   123   216  1457    88 |  g = irm06
       470    66    38    55    73   211  1188   422 |  h = irm07

  5. Error Analysis from Assgn4 (same confusion matrix as slide 3)
     • Diagonal elements (correctly classified instances) are non-zero

  6. Error Analysis from Assgn4 (same confusion matrix as slide 3)
     • Diagonal elements (correctly classified instances) are non-zero
     • Non-diagonal elements (misclassified instances) should ideally be zero
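The diagonal versus off-diagonal reading of the Assgn4 matrix can be checked numerically. The following is a minimal sketch in plain Python, not part of the original slides; the variable names are illustrative, and the numbers are copied from the confusion matrix above (rows are actual classes, columns are predicted classes).

```python
# Confusion matrix from the Assgn4 slide.
# Rows: actual class. Columns: predicted class ("classified as").
labels = ["irf03", "irf04", "irf06", "irf07", "irm02", "irm05", "irm06", "irm07"]
matrix = [
    [1443, 105,  98, 127, 141,  27,  396,  73],
    [ 934, 190, 125, 264,  86,  58,  604, 163],
    [ 985,  95, 350, 219, 108,  69,  863, 134],
    [ 841, 152, 177, 774, 127,  80,  524, 161],
    [ 269,  98,  25,  78, 111, 180, 1208, 294],
    [ 369,  94,  27,  61,  95, 438, 1062, 235],
    [ 241,  70,  43,  38, 123, 216, 1457,  88],
    [ 470,  66,  38,  55,  73, 211, 1188, 422],
]

total = sum(sum(row) for row in matrix)
# Diagonal cells are the correctly classified instances.
correct = sum(matrix[i][i] for i in range(len(matrix)))
accuracy = correct / total

# Per-class recall: diagonal cell divided by the row total.
recall = {labels[i]: matrix[i][i] / sum(matrix[i]) for i in range(len(labels))}

print(f"accuracy = {accuracy:.3f}")
for name, value in recall.items():
    print(f"recall({name}) = {value:.3f}")
```

A class whose recall is far below the overall accuracy is a natural place to start the error analysis.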

  7. Try to find an explanation for the large error cells in the confusion matrix

  11. From Assgn6

      === Stratified cross-validation ===
      === Summary ===

      Correctly Classified Instances       77     51.3333 %
      Incorrectly Classified Instances     73     48.6667 %
      Kappa statistic                       0.0235

      === Confusion Matrix ===

        a   b   <-- classified as
       33  40 |  a = negative
       33  44 |  b = positive
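The Kappa statistic in this output can be reproduced from the confusion matrix alone. The sketch below uses the standard Cohen's kappa formula (observed agreement corrected for the agreement expected from the row and column totals); the function name `cohen_kappa` is illustrative and is not taken from Weka.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows actual, columns predicted)."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    # Chance agreement: expected diagonal mass if predictions were independent
    # of the true class but kept the same marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Confusion matrix from the Assgn6 slide (negative, positive).
baseline = [[33, 40], [33, 44]]
kappa = cohen_kappa(baseline)
print(f"kappa = {kappa:.4f}")
```

A kappa this close to zero means the 51.3 % accuracy is almost entirely what chance agreement would give for these class proportions.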

  12. Ranked Attributes

      Ranked attributes:
      16.6146    6465  life
      15.3272     996  bad
      14.3417    7565  nothing
      12.3659    2625  created
      12.24     12337  world
      11.7684    7798  others
      10.8115   10654  stupid
       9.6538   11050  terrible
       9.5345    2552  could
       9.0771    3388  dream
       8.86     11285  top
       8.4936    1992  children

  13. Add Bigrams (only) and select Top 5 Attributes

      Correctly Classified Instances       87     58 %
      Incorrectly Classified Instances     63     42 %
      Kappa statistic                       0.1414

      === Confusion Matrix ===

        a   b   <-- classified as
       12  61 |  a = negative
        2  75 |  b = positive
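Accuracy rises in this run, but the confusion matrix shows where the gain comes from. A small sketch, using only the four counts above, computes per-class recall and the number of instances predicted as positive; the variable names are illustrative.

```python
# Confusion matrix from the bigram, top-5-attributes run.
# Rows: actual class. Columns: predicted class. Order: negative, positive.
labels = ["negative", "positive"]
matrix = [[12, 61], [2, 75]]

total = sum(sum(row) for row in matrix)
# Recall per class: diagonal cell divided by the row total.
recall = {labels[i]: matrix[i][i] / sum(matrix[i]) for i in range(len(labels))}
# How many instances the classifier labels positive, regardless of truth.
predicted_positive = sum(row[1] for row in matrix)

print(f"recall(negative) = {recall['negative']:.3f}")
print(f"recall(positive) = {recall['positive']:.3f}")
print(f"predicted positive: {predicted_positive} of {total}")
```

The classifier labels nearly everything positive, so most negative reviews are missed even though the headline accuracy improved.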

  14. What are these Top 5 Attributes?

      Ranked attributes:
      7.745   22  entir_movi
      5.456   23  fall_flat
      5.456   42  million_dollar
      4.904   56  support_role
      4.904   59  visual_effect

  15. Add All Features and run Naïve Bayes

      Correctly Classified Instances      107     71.3 %
      Incorrectly Classified Instances     43     28.7 %
      Kappa statistic                       0.4252

      === Confusion Matrix ===

        a   b   <-- classified as
       49  24 |  a = negative
       19  58 |  b = positive

  16. Methods of Analyzing Error
      • Confusion amongst classes
          • Check the confusion matrix
          • Check what the two confused classes have in common
          • Find ways to remove these commonalities through feature extraction or selection
      • Algorithms
          • Errors are often also due to the nature of the ML algorithm used
          • Experiment with different algorithms
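As one illustration of checking what confused classes have in common, the Assgn4 matrix can be collapsed into groups of labels. The sketch below assumes that the three-letter prefix of each label ("irf" or "irm") marks a meaningful group; the slides do not say what the prefixes mean, so this grouping is a guess made only for the example.

```python
# Confusion matrix from the Assgn4 slide.
# Rows: actual class. Columns: predicted class.
labels = ["irf03", "irf04", "irf06", "irf07", "irm02", "irm05", "irm06", "irm07"]
matrix = [
    [1443, 105,  98, 127, 141,  27,  396,  73],
    [ 934, 190, 125, 264,  86,  58,  604, 163],
    [ 985,  95, 350, 219, 108,  69,  863, 134],
    [ 841, 152, 177, 774, 127,  80,  524, 161],
    [ 269,  98,  25,  78, 111, 180, 1208, 294],
    [ 369,  94,  27,  61,  95, 438, 1062, 235],
    [ 241,  70,  43,  38, 123, 216, 1457,  88],
    [ 470,  66,  38,  55,  73, 211, 1188, 422],
]


def group_of(label):
    # Assumption: the first three characters name the group ("irf" or "irm").
    return label[:3]


groups = sorted({group_of(name) for name in labels})
group_matrix = {g: {h: 0 for h in groups} for g in groups}
for i, actual in enumerate(labels):
    for j, predicted in enumerate(labels):
        group_matrix[group_of(actual)][group_of(predicted)] += matrix[i][j]

total = sum(sum(row) for row in matrix)
exact = sum(matrix[i][i] for i in range(len(labels)))
within_group = sum(group_matrix[g][g] for g in groups)

print(group_matrix)
print(f"exact-class accuracy = {exact / total:.3f}")
print(f"group-level accuracy = {within_group / total:.3f}")
```

If the group-level accuracy is much higher than the exact-class accuracy, most errors stay inside a group, which points toward features that separate members of the same group.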
