Learn about ROC curves, the basics of rule induction, and the basics of text mining. See why two-class classification matters in fields such as medical, computer vision, security, and biotech applications. Understand the significance of true positives, false positives, false negatives, and true negatives in classification tasks, dive into the concepts of sensitivity and specificity, and explore how to interpret and optimize rulesets for accurate classification.
Data Mining (and machine learning)
ROC curves
Rule Induction
Basics of Text Mining
Two classes: a common and special case
Medical applications: cancer, or not?
Computer vision applications: landmine, or not?
Security applications: terrorist, or not?
Biotech applications: gene, or not?
…
Two classes: a common and special case
True Positive: these are ideal, e.g. we correctly detect cancer.
False Positive: to be minimised, as they cause false alarms. It can be better to be safe than sorry, but false positives can be very costly.
False Negative: also to be minimised; missing a landmine or a cancer is very bad in many applications.
True Negative?
Sensitivity and Specificity: common measures of accuracy in this kind of 2-class task
Sensitivity = TP/(TP+FN): what fraction of the real ‘Yes’ cases are detected? How well can it detect the condition?
Specificity = TN/(TN+FP): what fraction of the real ‘No’ cases are correctly classified? How well can it recognise the absence of the condition?
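Both formulas can be checked directly from a list of real and predicted labels. This is a minimal sketch, assuming labels are booleans with True meaning ‘Yes’ (condition present); the function names `confusion_counts`, `sensitivity` and `specificity` and the example labels are illustrative, not taken from the slides.

```python
def confusion_counts(actual, predicted):
    """Return (TP, FP, FN, TN), treating True as 'Yes' (e.g. cancer present)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1  # real Yes, predicted Yes
        elif p:
            fp += 1  # real No, predicted Yes: a false alarm
        elif a:
            fn += 1  # real Yes, predicted No: a missed case
        else:
            tn += 1  # real No, predicted No
    return tp, fp, fn, tn


def sensitivity(tp, fn):
    """TP / (TP + FN): fraction of the real Yes cases that are detected."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """TN / (TN + FP): fraction of the real No cases classified as No."""
    return tn / (tn + fp)


# Hypothetical labels for eight cases.
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]
tp, fp, fn, tn = confusion_counts(actual, predicted)
print(sensitivity(tp, fn), specificity(tn, fp))
```

Here one real ‘Yes’ is missed and one real ‘No’ raises a false alarm, so both measures come out at 0.75.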
[Figure: YES and NO examples with the classification line moved to four different positions]
Sensitivity: 100%, Specificity: 25%. 100% sensitivity means the classifier detects all cancer cases (or whatever the condition is), but possibly with many false positives.
Sensitivity: 93.8%, Specificity: 50%
Sensitivity: 81.3%, Specificity: 83.3%
Sensitivity: 56.3%, Specificity: 100%. 100% specificity means there are no false positives, but here some cancer cases (or whatever) are missed.
Sensitivity and Specificity: common measures of accuracy in this kind of 2-class task
Sensitivity = TP/(TP+FN): what fraction of the real TRUE cases are detected? How sensitive is the classifier to TRUE cases? With a highly sensitive test for cancer, if it says “NO” you can be sure it’s “NO”.
Specificity = TN/(TN+FP): how sensitive is the classifier to the negative cases? With a highly specific test for cancer, if it says “YES” you can be sure it’s “YES”.
With many trained classifiers, you can ‘move the line’ in this way. E.g. with Naive Bayes (NB), we could use a threshold indicating how much higher the log likelihood for Y should be than for N.
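‘Moving the line’ can be sketched as a threshold sweep over classifier scores. This is a minimal sketch, assuming each case already has a numeric score (for NB this could be the log likelihood for Y minus the log likelihood for N) and that a case is predicted YES when its score is at or above the threshold; the function `roc_points` and the example scores are hypothetical. Each threshold yields one (1 - specificity, sensitivity) pair, and plotting these pairs gives an ROC curve.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per threshold.

    A case is predicted YES when its score >= threshold; labels are True for YES.
    """
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted YES
    for t in sorted(set(scores), reverse=True):  # lower the threshold step by step
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and not y)
        # 1 - specificity = FP / (FP + TN); sensitivity = TP / (TP + FN)
        points.append((fp / neg, tp / pos))
    return points


# Hypothetical scores, e.g. log-likelihood(Y) - log-likelihood(N) for each case.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, True, False, False]
for fpr, sens in roc_points(scores, labels):
    print(f"1 - specificity = {fpr:.2f}, sensitivity = {sens:.2f}")
```

Lowering the threshold can only add YES predictions, so sensitivity never falls while specificity never rises: this is the trade-off shown by the four sensitivity/specificity pairs above.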
ROC curves
David Corne and Nick Taylor, Heriot-Watt University, dwcorne@gmail.com
These slides and related resources: http://www.macs.hw.ac.uk/~dwcorne/Teaching/dmml.html
Rule Induction
• Rules are useful when you want to learn a clear / interpretable classifier, and are less worried about squeezing out as much accuracy as possible.
• There are a number of different ways to ‘learn’ rules or rulesets.
• Before we go there, what is a rule / ruleset?
Rules
IF Condition … Then Class Value is …
Rules are Rectangular
[Figure: YES and NO points plotted on X (0 to 12) and Y (0 to 5) axes, with the region covered by each rule drawn as a rectangle]
IF (X>0)&(X<5)&(Y>0.5)&(Y<5) THEN YES
IF (X>5)&(X<11)&(Y>4.5)&(Y<5.1) THEN NO
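A rule like these is a conjunction of interval tests, one per feature, which is why it carves out a rectangle. This is a minimal sketch, assuming two numeric features X and Y and the strict inequalities used on the slides; the class name `BoxRule` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoxRule:
    """IF x_lo < X < x_hi AND y_lo < Y < y_hi THEN label (strict, as on the slides)."""
    x_lo: float
    x_hi: float
    y_lo: float
    y_hi: float
    label: str

    def covers(self, x: float, y: float) -> bool:
        # Each condition bounds one feature, so together they cut out a rectangle.
        return self.x_lo < x < self.x_hi and self.y_lo < y < self.y_hi


# The two rules shown on the slides.
rule_yes = BoxRule(0, 5, 0.5, 5, "YES")
rule_no = BoxRule(5, 11, 4.5, 5.1, "NO")
print(rule_yes.covers(2, 3), rule_no.covers(2, 3))  # True False
```

With more features the same idea gives an axis-aligned box in higher dimensions.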
A Ruleset
IF Condition1 … Then Class = A
IF Condition2 … Then Class = A
IF Condition3 … Then Class = B
IF Condition4 … Then Class = C
…
What’s wrong with this ruleset? (two things)
[Figure: a ruleset drawn over YES and NO points; X axis 0 to 12, Y axis 0 to 5]
What about this ruleset?
[Figure: a second ruleset drawn over YES and NO points; X axis 0 to 12, Y axis 0 to 5]
Two ways to interpret a ruleset: As a Decision List
IF Condition1 … Then Class = A
ELSE IF Condition2 … Then Class = A
ELSE IF Condition3 … Then Class = B
ELSE IF Condition4 … Then Class = C
…
ELSE … predict Background Majority Class
Two ways to interpret a ruleset: As an unordered set
IF Condition1 … Then Class = A
IF Condition2 … Then Class = A
IF Condition3 … Then Class = B
IF Condition4 … Then Class = C
Check each rule and gather votes for each class. If there is no winner, predict the background majority class.
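The two interpretations can give different answers for the same ruleset and input. This is a minimal sketch, assuming each rule is a (condition, class) pair where the condition is any predicate over a dict of feature values; the rules, features and the default class "C" below are hypothetical, and a tied vote is treated as ‘no winner’.

```python
from collections import Counter
from typing import Callable

# A rule is (condition, class); the condition is a predicate over feature values.
Rule = tuple[Callable[[dict], bool], str]


def predict_decision_list(rules: list[Rule], x: dict, default: str) -> str:
    """Decision list: the first rule whose condition holds decides the class."""
    for condition, label in rules:
        if condition(x):
            return label
    return default  # no rule fired: predict the background majority class


def predict_unordered(rules: list[Rule], x: dict, default: str) -> str:
    """Unordered set: every rule that fires casts one vote for its class."""
    votes = Counter(label for condition, label in rules if condition(x))
    ranked = votes.most_common()
    if not ranked or (len(ranked) > 1 and ranked[0][1] == ranked[1][1]):
        return default  # no votes, or a tie: no winner
    return ranked[0][0]


rules: list[Rule] = [
    (lambda x: x["X"] > 3, "A"),
    (lambda x: x["Y"] > 2, "B"),
    (lambda x: x["Y"] > 1, "B"),
]
point = {"X": 4, "Y": 3}
print(predict_decision_list(rules, point, default="C"))  # first firing rule: A
print(predict_unordered(rules, point, default="C"))      # votes A:1, B:2 -> B
```

In the decision-list reading, rule order matters and the first rule wins; in the unordered reading, the two B rules outvote it.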
Three broad ways to learn rulesets
1. Just build a decision tree with ID3 (or something else), and you can translate the tree into rules!
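Translating a tree into rules amounts to reading off every root-to-leaf path. This is a minimal sketch, assuming an ID3-style tree stored as nested tuples: a leaf is a class label (a string) and an internal node is (attribute, {value: subtree}). The function names and the weather-style tree are hypothetical examples, not from the slides.

```python
def tree_to_rules(tree, path=()):
    """Return one (tests, label) rule per root-to-leaf path of the tree."""
    if isinstance(tree, str):  # leaf: the tests along the path form the condition
        return [(list(path), tree)]
    attribute, children = tree
    rules = []
    for value, subtree in children.items():
        rules.extend(tree_to_rules(subtree, path + ((attribute, value),)))
    return rules


def rule_to_text(rule):
    tests, label = rule
    condition = " AND ".join(f"{a} = {v}" for a, v in tests) or "TRUE"
    return f"IF {condition} THEN Class = {label}"


# Hypothetical ID3-style tree.
tree = ("Outlook", {
    "sunny": ("Humidity", {"high": "No", "normal": "Yes"}),
    "overcast": "Yes",
    "rain": ("Windy", {"true": "No", "false": "Yes"}),
})
for rule in tree_to_rules(tree):
    print(rule_to_text(rule))
```

Because each example follows at most one path through the tree, the resulting rules never overlap, so it makes no difference whether they are read as a decision list or as an unordered set.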
Three broad ways to learn rulesets
2. Use any good search/optimisation algorithm. Evolutionary (genetic) algorithms are the most common. You will do this in coursework 3. This means simply guessing a ruleset at random, then trying mutations and variants, gradually improving them over time.
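The guess-then-mutate idea can be shown on the smallest possible case. This is a simplified sketch, not a full genetic algorithm (there is no population or crossover): it evolves a single rectangular YES rule by random mutation, keeping a variant whenever it classifies the training data at least as well. The (x, y, label) data format, the function names and the example points are assumptions; the 0 to 12 and 0 to 5 ranges only mirror the axes of the earlier figures.

```python
import random


def accuracy(box, data):
    """Fraction of (x, y, label) points classified correctly by
    IF x_lo < x < x_hi AND y_lo < y < y_hi THEN YES ELSE NO."""
    x_lo, x_hi, y_lo, y_hi = box
    correct = 0
    for x, y, label in data:
        predicted = "YES" if (x_lo < x < x_hi and y_lo < y < y_hi) else "NO"
        correct += predicted == label
    return correct / len(data)


def random_box(rng):
    """Guess a rule at random within X in [0, 12], Y in [0, 5]."""
    x_lo, x_hi = sorted(rng.uniform(0, 12) for _ in range(2))
    y_lo, y_hi = sorted(rng.uniform(0, 5) for _ in range(2))
    return (x_lo, x_hi, y_lo, y_hi)


def mutate(box, rng, step=0.5):
    """Return a variant with one randomly chosen bound nudged by Gaussian noise."""
    b = list(box)
    b[rng.randrange(4)] += rng.gauss(0, step)
    if b[0] > b[1]:
        b[0], b[1] = b[1], b[0]  # keep x_lo <= x_hi
    if b[2] > b[3]:
        b[2], b[3] = b[3], b[2]  # keep y_lo <= y_hi
    return tuple(b)


def evolve_rule(data, generations=2000, seed=0):
    rng = random.Random(seed)
    best = random_box(rng)  # start from a random guess
    best_fit = accuracy(best, data)
    for _ in range(generations):
        child = mutate(best, rng)
        child_fit = accuracy(child, data)
        if child_fit >= best_fit:  # accept variants that are no worse
            best, best_fit = child, child_fit
    return best, best_fit


# Hypothetical training points.
data = [(1, 1, "YES"), (2, 3, "YES"), (3, 2, "YES"),
        (8, 4, "NO"), (10, 1, "NO"), (6, 4.5, "NO")]
box, fit = evolve_rule(data)
print([round(v, 2) for v in box], fit)
```

Accepting equally good variants lets the rule drift across flat regions of the fitness landscape instead of getting stuck immediately; a real ruleset learner would evolve several rules at once.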
Three broad ways to learn rulesets
3. A number of ‘old’ AI algorithms exist that still work well, and/or can be engineered to work with an evolutionary algorithm. The basic idea is: iterated coverage.
[Figures: YES and NO training points on X (0 to 12) and Y (0 to 5) axes, illustrating each step]
Take each class in turn…
Pick a random member of that class in the training set.
Extend it as much as possible without including another class.
Next class.
And so on…
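The iterated-coverage steps can be sketched in Python. This is a minimal sketch under several assumptions the slides leave open: points are (x, y) pairs, a box is grown one side at a time in small fixed steps (`step=0.1`), growth stops at the edge of a given feature space, and after each box the process repeats on the still-uncovered members of the class until all of them are covered. All names here are hypothetical.

```python
import random


def inside(box, p):
    """Closed-interval test, so a box always contains the point it was grown from."""
    x_lo, x_hi, y_lo, y_hi = box
    return x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi


def grow_box(seed_point, others, bounds, step=0.1):
    """Extend a box outwards from seed_point, one side and one step at a time,
    as long as it stays within bounds and contains no point of another class."""
    x, y = seed_point
    box = [x, x, y, y]  # x_lo, x_hi, y_lo, y_hi
    moves = [(0, -step), (1, step), (2, -step), (3, step)]
    grew = True
    while grew:
        grew = False
        for side, delta in moves:
            trial = box.copy()
            trial[side] += delta
            if not (bounds[0] <= trial[0] and trial[1] <= bounds[1]
                    and bounds[2] <= trial[2] and trial[3] <= bounds[3]):
                continue  # would leave the feature space
            if any(inside(trial, q) for q in others):
                continue  # would include another class
            box, grew = trial, True
    return tuple(box)


def iterated_coverage(points, labels, bounds, seed=0):
    """Return a list of (box, class) rules that together cover every training point."""
    rng = random.Random(seed)
    rules = []
    for cls in sorted(set(labels)):  # take each class in turn
        members = [p for p, l in zip(points, labels) if l == cls]
        others = [p for p, l in zip(points, labels) if l != cls]
        uncovered = members
        while uncovered:
            start = rng.choice(uncovered)  # a random, not yet covered member
            box = grow_box(start, others, bounds)
            rules.append((box, cls))
            uncovered = [p for p in uncovered if not inside(box, p)]
    return rules


# Hypothetical training data on the same axes as the figures.
points = [(1, 1), (2, 2), (8, 4), (9, 3)]
labels = ["YES", "YES", "NO", "NO"]
for box, cls in iterated_coverage(points, labels, bounds=(0, 12, 0, 5)):
    print(cls, [round(v, 1) for v in box])
```

The sorted class order and the round-robin growth order are arbitrary choices here; a different order can produce differently shaped boxes.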
[Word clouds: 2012 and 2014 students’ implementation choices for DMML CW1]
“Word Clouds”: word frequency patterns provide useful information
Classify sentiment
“Word Clouds” (word frequency patterns) provide useful information, which can be used to predict a class value / category / signal.
In this case, the document(s) are “tweets mentioning our airline over the past few hours”, and the class value is a satisfaction score between 0 and 1.
[Figure: ACS Index vs Twitter sentiment]
http://www.inside-r.org/howto/mining-twitter-airline-consumer-sentiment
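Word frequencies, the raw material of a word cloud, take only a few lines to compute. This is a minimal sketch, assuming a crude tokeniser (lower-case runs of letters and apostrophes); the function name and the example tweets are hypothetical.

```python
import re
from collections import Counter


def word_frequencies(documents):
    """Count each lower-cased word across all documents (e.g. a batch of tweets)."""
    counts = Counter()
    for doc in documents:
        # Crude tokeniser: runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", doc.lower()))
    return counts


# Hypothetical tweets mentioning an airline.
tweets = [
    "Flight delayed again",
    "Great crew, great flight!",
    "Lost my bag, delayed flight",
]
print(word_frequencies(tweets).most_common(3))
```

A word cloud draws each word at a size that grows with counts like these; to predict a class value, the same counts become the features of a classifier.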
Sentiment map of NYC: more information from tweets, this time a “happiness” score.
http://necsi.edu/research/social/newyork/sentimentmap/
“Similar pages”: based on distances between word frequency patterns
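One common way to compare word frequency patterns is the cosine similarity between word-count vectors; the slide does not say which distance the ‘similar pages’ feature actually uses. This is a minimal sketch, assuming whitespace tokenisation; the function names and page texts are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """1.0 for identical word-frequency patterns, 0.0 for no shared words."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, pages):
    """Return the name of the page whose word counts are closest to the query's."""
    q = Counter(query.lower().split())
    return max(pages, key=lambda name: cosine_similarity(q, Counter(pages[name].lower().split())))


# Hypothetical pages.
pages = {
    "cats": "cats purr and cats sleep",
    "cars": "cars need fuel and cars need tyres",
}
print(most_similar("my cats sleep all day", pages))
```

Cosine similarity ignores document length, so a short page and a long page with the same word proportions count as similar; a distance can be taken as 1 minus the similarity.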
Predicting relationship between two people based on their text messages
Can you predict the class (Desktop, Laptop or LED-TV) from the word frequencies of a product description on Amazon?
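One way to attempt this is Naive Bayes over word counts, the same classifier mentioned for moving the threshold. This is a minimal sketch of multinomial Naive Bayes with add-one (Laplace) smoothing, assuming whitespace tokenisation; the function names and the three training descriptions are hypothetical, and a real attempt would need many descriptions per class.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Multinomial Naive Bayes: per-class word counts plus class frequencies."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = set().union(*word_counts.values())
    return word_counts, class_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log prior + sum of log word likelihoods."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for word in doc.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing so an unseen class/word pair does not give log(0).
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical product descriptions.
docs = [
    "tower pc with quad core processor and graphics card",
    "lightweight laptop with battery life and webcam",
    "led tv with hdmi ports and full hd screen",
]
labels = ["Desktop", "Laptop", "LED-TV"]
model = train_nb(docs, labels)
print(predict_nb(model, "hd screen with hdmi"))
```

Working in log space avoids multiplying many small probabilities together, and the difference between two class scores is exactly the kind of quantity a threshold can be placed on.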