Machine Learning in Practice MidTerm Review

Carolyn Penstein Rosé, Kishore Prahallad
Language Technologies Institute

Error Analysis

Error Analysis from Assgn4
=== Confusion Matrix ===
    a    b    c    d    e    f    g    h   <-- classified as
 1443  105   98  127  141   27  396   73 |  a = irf03
  934  190  125  264   86   58  604  163 |  b = irf04
  985   95  350  219  108   69  863  134 |  c = irf06
  841  152  177  774  127   80  524  161 |  d = irf07
  269   98   25   78  111  180 1208  294 |  e = irm02
  369   94   27   61   95  438 1062  235 |  f = irm05
  241   70   43   38  123  216 1457   88 |  g = irm06
  470   66   38   55   73  211 1188  422 |  h = irm07

Reading the confusion matrix above:
• Diagonal elements are non-zero
• Non-diagonal elements should be zero

Try to find an explanation for the large error cells in the confusion matrix.
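One way to locate the large error cells is to rank the off-diagonal entries of the Assgn4 matrix. The following is a minimal sketch rather than part of the original assignment; the helper names `largest_error_cells` and `per_class_recall` are hypothetical, and rows are taken to be the actual class and columns the predicted class, as in Weka's output.

```python
# Confusion matrix from the Assgn4 slide: rows = actual class, columns = predicted class.
labels = ["irf03", "irf04", "irf06", "irf07", "irm02", "irm05", "irm06", "irm07"]
matrix = [
    [1443, 105,  98, 127, 141,  27,  396,  73],
    [ 934, 190, 125, 264,  86,  58,  604, 163],
    [ 985,  95, 350, 219, 108,  69,  863, 134],
    [ 841, 152, 177, 774, 127,  80,  524, 161],
    [ 269,  98,  25,  78, 111, 180, 1208, 294],
    [ 369,  94,  27,  61,  95, 438, 1062, 235],
    [ 241,  70,  43,  38, 123, 216, 1457,  88],
    [ 470,  66,  38,  55,  73, 211, 1188, 422],
]


def largest_error_cells(matrix, labels, top_n=5):
    """Return the top_n off-diagonal cells as (count, actual, predicted)."""
    cells = [
        (count, labels[i], labels[j])
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if i != j  # skip the diagonal, which holds correct classifications
    ]
    return sorted(cells, reverse=True)[:top_n]


def per_class_recall(matrix, labels):
    """Fraction of each actual class that was classified correctly."""
    return {labels[i]: row[i] / sum(row) for i, row in enumerate(matrix)}


for count, actual, predicted in largest_error_cells(matrix, labels):
    print(f"{count:5d}  actual={actual}  predicted={predicted}")

for label, recall in per_class_recall(matrix, labels).items():
    print(f"{label}: recall={recall:.3f}")
```

The largest cells point at the class pairs worth explaining first: here the three biggest errors are all instances predicted as irm06.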

From Assgn6
=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances          77      51.3333 %
Incorrectly Classified Instances        73      48.6667 %
Kappa statistic                          0.0235
=== Confusion Matrix ===
  a  b   <-- classified as
 33 40 |  a = negative
 33 44 |  b = positive
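The kappa statistic reported here can be recomputed from the confusion matrix alone, which makes clear why 51% accuracy on two nearly balanced classes is close to chance. This is a minimal sketch of Cohen's kappa; the function name `cohen_kappa` is hypothetical, not a Weka call.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows = actual, columns = predicted)."""
    total = sum(sum(row) for row in matrix)
    # Observed agreement: share of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total
    # Chance agreement: expected diagonal share given the row and column totals.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


assgn6 = [[33, 40],   # actual negative
          [33, 44]]   # actual positive
print(round(cohen_kappa(assgn6), 4))  # 0.0235
```

Observed agreement is 77/150 and chance agreement is about 0.50, so the classifier is barely better than guessing.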

Ranked Attributes
Ranked attributes:
 16.6146    6465 life
 15.3272     996 bad
 14.3417    7565 nothing
 12.3659    2625 created
 12.24     12337 world
 11.7684    7798 others
 10.8115   10654 stupid
  9.6538   11050 terrible
  9.5345    2552 could
  9.0771    3388 dream
  8.86     11285 top
  8.4936    1992 children

Add Bigrams (only) and select Top 5 Attributes
Correctly Classified Instances          87      58 %
Incorrectly Classified Instances        63      42 %
Kappa statistic                          0.1414
=== Confusion Matrix ===
  a  b   <-- classified as
 12 61 |  a = negative
  2 75 |  b = positive
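Per-class precision and recall make the pattern in this matrix explicit: accuracy rose, but mostly because nearly every instance is now labeled positive. The sketch below is illustrative only; `per_class_metrics` is a hypothetical helper name.

```python
def per_class_metrics(matrix, labels):
    """Precision and recall per class (rows = actual, columns = predicted)."""
    metrics = {}
    for i, label in enumerate(labels):
        true_positive = matrix[i][i]
        actual = sum(matrix[i])                     # row total
        predicted = sum(row[i] for row in matrix)   # column total
        metrics[label] = {
            "recall": true_positive / actual if actual else 0.0,
            "precision": true_positive / predicted if predicted else 0.0,
        }
    return metrics


labels = ["negative", "positive"]
bigram_top5 = [[12, 61],   # actual negative
               [2, 75]]    # actual positive

for label, m in per_class_metrics(bigram_top5, labels).items():
    print(f"{label}: precision={m['precision']:.3f} recall={m['recall']:.3f}")

# Share of all instances that were predicted positive.
predicted_positive = sum(row[1] for row in bigram_top5) / sum(map(sum, bigram_top5))
print(f"predicted positive: {predicted_positive:.1%}")
```

Only 12 of the 73 negative reviews are recognized as negative, while 136 of the 150 instances are predicted positive.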

What are these Top 5 attributes?
Ranked attributes:
 7.745   22 entir_movi
 5.456   23 fall_flat
 5.456   42 million_dollar
 4.904   56 support_role
 4.904   59 visual_effect

Add all features and run Naïve Bayes
Correctly Classified Instances         107      71.3 %
Incorrectly Classified Instances        43      28.7 %
Kappa statistic                          0.4252
=== Confusion Matrix ===
  a  b   <-- classified as
 49 24 |  a = negative
 19 58 |  b = positive

Methods of Analyzing Error
• Confusion amongst classes
  • Check the confusion matrix
  • Check to see what is common across the two confused classes
  • Find ways to remove these commonalities by feature extraction or selection
• Algorithms
  • Often errors are also due to the nature of the ML algorithm used
  • Experiment with different algorithms
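As a rough illustration of checking what two confused classes have in common, the sketch below counts tokens that occur frequently in both classes. The reviews are made-up toy data, and `shared_features` and its `min_count` threshold are hypothetical choices, not something taken from the assignments.

```python
from collections import Counter


def shared_features(docs_a, docs_b, min_count=2):
    """Tokens occurring at least min_count times in BOTH classes, with their counts."""
    counts_a = Counter(tok for doc in docs_a for tok in doc.lower().split())
    counts_b = Counter(tok for doc in docs_b for tok in doc.lower().split())
    common = {
        tok: (counts_a[tok], counts_b[tok])
        for tok in counts_a
        if counts_a[tok] >= min_count and counts_b[tok] >= min_count
    }
    return dict(sorted(common.items()))


# Hypothetical toy reviews, for illustration only.
negative = ["the movie was bad", "the plot was terrible"]
positive = ["the movie was great", "the cast was wonderful"]

print(shared_features(negative, positive))
```

Tokens that are equally frequent in both classes carry little information for separating them, so they are natural candidates to drop during feature selection.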
