Machine Learning in Practice: Midterm Review
Carolyn Penstein Rosé, Kishore Prahallad
Language Technologies Institute
Error Analysis from Assgn4
• === Confusion Matrix ===
     a     b     c     d     e     f     g     h   <-- classified as
  1443   105    98   127   141    27   396    73 | a = irf03
   934   190   125   264    86    58   604   163 | b = irf04
   985    95   350   219   108    69   863   134 | c = irf06
   841   152   177   774   127    80   524   161 | d = irf07
   269    98    25    78   111   180  1208   294 | e = irm02
   369    94    27    61    95   438  1062   235 | f = irm05
   241    70    43    38   123   216  1457    88 | g = irm06
   470    66    38    55    73   211  1188   422 | h = irm07
• Rows are the actual classes and columns are the predicted classes, so the diagonal elements count correct classifications and should be non-zero.
• For a perfect classifier, all non-diagonal elements would be zero; every non-zero off-diagonal cell is a misclassification.
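The diagonal/off-diagonal reading can be made concrete with a few lines of Python (standard library only). This is an illustrative sketch, not part of the assignment code: it computes overall accuracy as the diagonal sum divided by the total, and per-class recall as each row's diagonal entry divided by its row total. `accuracy` and `per_class_recall` are hypothetical helper names.

```python
# Assgn4 confusion matrix: rows = actual class, columns = predicted class
LABELS = ["irf03", "irf04", "irf06", "irf07", "irm02", "irm05", "irm06", "irm07"]
MATRIX = [
    [1443, 105, 98, 127, 141, 27, 396, 73],
    [934, 190, 125, 264, 86, 58, 604, 163],
    [985, 95, 350, 219, 108, 69, 863, 134],
    [841, 152, 177, 774, 127, 80, 524, 161],
    [269, 98, 25, 78, 111, 180, 1208, 294],
    [369, 94, 27, 61, 95, 438, 1062, 235],
    [241, 70, 43, 38, 123, 216, 1457, 88],
    [470, 66, 38, 55, 73, 211, 1188, 422],
]


def accuracy(matrix):
    """Fraction of all instances that lie on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix, labels):
    """For each actual class (row), the fraction of its instances predicted correctly."""
    return {label: row[i] / sum(row) for i, (label, row) in enumerate(zip(labels, matrix))}


print(f"accuracy = {accuracy(MATRIX):.3f}")
for label, recall in per_class_recall(MATRIX, LABELS).items():
    print(f"{label}: recall = {recall:.3f}")
```

For this matrix the diagonal holds 5185 of 19936 instances (about 26% accuracy); irm06 has the highest recall (1457 of 2276) and irm02 the lowest (111 of 2263).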
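Try to find an explanation for the large error cells (the large off-diagonal counts) in the confusion matrix. For example, irm02, irm05 and irm07 instances are most often classified as irm06 (column g), and irf04, irf06 and irf07 instances are most often classified as irf03 (column a).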
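A quick way to locate the large error cells is to rank all off-diagonal entries by count. The sketch below does this for the Assgn4 matrix; `largest_errors` is a hypothetical helper, and `k=5` is an arbitrary default.

```python
# Assgn4 confusion matrix: rows = actual class, columns = predicted class
LABELS = ["irf03", "irf04", "irf06", "irf07", "irm02", "irm05", "irm06", "irm07"]
MATRIX = [
    [1443, 105, 98, 127, 141, 27, 396, 73],
    [934, 190, 125, 264, 86, 58, 604, 163],
    [985, 95, 350, 219, 108, 69, 863, 134],
    [841, 152, 177, 774, 127, 80, 524, 161],
    [269, 98, 25, 78, 111, 180, 1208, 294],
    [369, 94, 27, 61, 95, 438, 1062, 235],
    [241, 70, 43, 38, 123, 216, 1457, 88],
    [470, 66, 38, 55, 73, 211, 1188, 422],
]


def largest_errors(matrix, labels, k=5):
    """Return the k largest off-diagonal cells as (actual, predicted, count) tuples."""
    cells = [
        (labels[i], labels[j], count)
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
        if i != j  # skip the diagonal, which holds correct classifications
    ]
    return sorted(cells, key=lambda cell: cell[2], reverse=True)[:k]


for actual, predicted, count in largest_errors(MATRIX, LABELS):
    print(f"{actual} classified as {predicted}: {count}")
```

Here the five largest cells are irm02, irm07 and irm05 classified as irm06 (1208, 1188, 1062), then irf06 and irf04 classified as irf03 (985, 934), so the explanation to look for is what makes those classes resemble irm06 and irf03.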
From Assgn6
• === Stratified cross-validation ===
• === Summary ===
• Correctly Classified Instances 77 51.3333%
• Incorrectly Classified Instances 73 48.6667%
• Kappa statistic 0.0235
• === Confusion Matrix ===
   a   b   <-- classified as
  33  40 | a = negative
  33  44 | b = positive
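The kappa statistic corrects accuracy for agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed accuracy and p_e is the chance agreement computed from the row and column totals. A minimal sketch (hypothetical helper name `cohen_kappa`) shows why 51.3% accuracy here gives a kappa near zero: p_o = 77/150 ≈ 0.513 while p_e ≈ 0.502.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows = actual, columns = predicted)."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(n)) / total  # plain accuracy
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(row[j] for row in matrix) for j in range(n)]
    # Agreement expected if predictions were independent of the true class
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Assgn6 matrix: rows = negative, positive
print(round(cohen_kappa([[33, 40], [33, 44]]), 4))  # 0.0235
```

The same function reproduces the kappa values reported for the later bigram and Naïve Bayes matrices (0.1414 and 0.4252).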
Ranked Attributes
• Ranked attributes (score, attribute index, attribute name):
  16.6146   6465  life
  15.3272    996  bad
  14.3417   7565  nothing
  12.3659   2625  created
  12.24    12337  world
  11.7684   7798  others
  10.8115  10654  stupid
   9.6538  11050  terrible
   9.5345   2552  could
   9.0771   3388  dream
   8.86    11285  top
   8.4936   1992  children
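The slide does not say which evaluator produced these scores. They exceed 1, which information gain cannot do on a two-class problem, so this sketch assumes a chi-squared statistic over binary presence features. `chi_squared` and `rank_attributes` are hypothetical names, and the toy documents are made up for illustration; ranking and then keeping the top k is the selection step applied to the top 5 attributes in the next experiment.

```python
def chi_squared(n11, n10, n01, n00):
    """Chi-squared statistic for a 2x2 feature/class table.

    n11: feature present, class positive    n10: feature present, class negative
    n01: feature absent,  class positive    n00: feature absent,  class negative
    """
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return numerator / denominator if denominator else 0.0


def rank_attributes(docs, labels, positive="positive"):
    """Score every feature by chi-squared and return (feature, score) pairs, best first.

    docs: list of sets of features present in each document.
    labels: class of each document; anything other than `positive` counts as negative.
    """
    n_pos = sum(1 for y in labels if y == positive)
    n_neg = len(labels) - n_pos
    scores = {}
    for feature in set().union(*docs):
        n11 = sum(1 for d, y in zip(docs, labels) if feature in d and y == positive)
        n10 = sum(1 for d, y in zip(docs, labels) if feature in d and y != positive)
        scores[feature] = chi_squared(n11, n10, n_pos - n11, n_neg - n10)
    # Descending score; ties broken alphabetically so the order is deterministic
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = [{"great", "film"}, {"great", "plot"}, {"bad", "film"}, {"bad", "plot"}]
labels = ["positive", "positive", "negative", "negative"]
top_k = rank_attributes(docs, labels)[:2]  # keep only the k best features
print(top_k)
```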
Use Bigrams Only and Select the Top 5 Attributes
• Correctly Classified Instances 87 58%
• Incorrectly Classified Instances 63 42%
• Kappa statistic 0.1414
• === Confusion Matrix ===
   a   b   <-- classified as
  12  61 | a = negative
   2  75 | b = positive
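This model predicts positive for 136 of the 150 instances, so it is worth comparing its accuracy with a majority-class baseline (the same idea as Weka's ZeroR). A small sketch with hypothetical helper names:

```python
def accuracy(matrix):
    """Fraction of instances on the diagonal (rows = actual, columns = predicted)."""
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i][i] for i in range(len(matrix))) / total


def majority_baseline(matrix):
    """Accuracy of always predicting the most frequent actual class (largest row total)."""
    row_totals = [sum(row) for row in matrix]
    return max(row_totals) / sum(row_totals)


bigrams_top5 = [[12, 61], [2, 75]]  # rows: negative, positive
print(f"model:    {accuracy(bigrams_top5):.3f}")           # 0.580
print(f"baseline: {majority_baseline(bigrams_top5):.3f}")  # 0.513
```

The baseline is 77/150 ≈ 51.3%, which is exactly the accuracy of the first Assgn6 model, consistent with its near-zero kappa.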
What are these top 5 attributes?
• Ranked attributes (stemmed word bigrams):
  7.745   22  entir_movi
  5.456   23  fall_flat
  5.456   42  million_dollar
  4.904   56  support_role
  4.904   59  visual_effect
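The attribute names appear to be stemmed word bigrams joined by an underscore (entir_movi from "entire movie"). The slides do not say which tokenizer or stemmer produced them, so this sketch of bigram extraction leaves stemming as a pluggable placeholder (identity by default); `bigram_features` is a hypothetical name.

```python
import re


def bigram_features(text, stem=lambda word: word):
    """Return adjacent-word pairs joined by '_' (e.g. "falls flat" -> "falls_flat").

    `stem` is a placeholder hook; pass a real stemmer to get forms like entir_movi.
    """
    tokens = [stem(t) for t in re.findall(r"[a-z]+", text.lower())]
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


print(bigram_features("The plot falls flat."))
```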
Add All Features and Train Naïve Bayes
• Correctly Classified Instances 107 71.3%
• Incorrectly Classified Instances 43 28.7%
• Kappa statistic 0.4252
• === Confusion Matrix ===
   a   b   <-- classified as
  49  24 | a = negative
  19  58 | b = positive
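The slide does not say which Naïve Bayes variant was run (Weka offers both NaiveBayes and NaiveBayesMultinomial). As an illustration only, here is a from-scratch multinomial Naïve Bayes over token lists with add-one (Laplace) smoothing; it is not Weka's implementation, and the class name and toy data are hypothetical.

```python
import math
from collections import Counter


class MultinomialNB:
    """Minimal multinomial Naive Bayes over token lists, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {token for doc in docs for token in doc}
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}
        self.token_counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(doc)
        self.total_tokens = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def log_posterior(self, doc, c):
        """Unnormalized log P(c | doc) = log P(c) + sum of log P(token | c)."""
        score = self.log_prior[c]
        for token in doc:
            if token in self.vocab:  # tokens never seen in training are ignored
                count = self.token_counts[c][token]
                score += math.log((count + 1) / (self.total_tokens[c] + len(self.vocab)))
        return score

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_posterior(doc, c))


train = [["great", "fun"], ["great", "acting"], ["bad", "boring"], ["terrible", "bad"]]
labels = ["positive", "positive", "negative", "negative"]
model = MultinomialNB().fit(train, labels)
print(model.predict(["great", "plot"]))  # positive
```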
Methods of Analyzing Error
• Confusion among classes
  • Check the confusion matrix
  • Check what the two confused classes have in common
  • Find ways to remove these commonalities through feature extraction or feature selection
• Algorithms
  • Errors are often also due to the nature of the ML algorithm used
  • Experiment with different algorithms
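One way to check what two confused classes have in common is to list features that occur frequently in both; such features are candidates for removal or refinement. This is a sketch under stated assumptions: `doc_freq` and `shared_features` are hypothetical helpers, the 0.5 threshold is arbitrary, and it complements rather than replaces reading the misclassified instances themselves.

```python
from collections import Counter


def doc_freq(docs):
    """Fraction of documents in which each feature occurs at least once."""
    counts = Counter(feature for doc in docs for feature in set(doc))
    return {feature: n / len(docs) for feature, n in counts.items()}


def shared_features(docs_a, docs_b, min_freq=0.5):
    """Features occurring in at least `min_freq` of the documents of BOTH classes,
    most-shared first (ranked by the smaller of the two frequencies)."""
    freq_a, freq_b = doc_freq(docs_a), doc_freq(docs_b)
    common = {f: min(freq_a[f], freq_b[f]) for f in freq_a.keys() & freq_b.keys()}
    kept = [f for f, freq in common.items() if freq >= min_freq]
    return sorted(kept, key=lambda f: (-common[f], f))


class_a = [["movie", "plot", "great"], ["movie", "great"]]
class_b = [["movie", "plot", "bad"], ["movie", "bad"]]
print(shared_features(class_a, class_b))  # ['movie', 'plot']
```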