Tutorial 2
LIU Tengfei
2/19/2009
Contents
• Introduction
• TP, FP, ROC
• Precision, recall
• Confusion matrix
• Other performance measures
• Resources
TP rate, FP rate (1)
Consider a diagnostic test:
• A false positive (FP): the person tests positive but does not actually have the disease.
• A false negative (FN): the person tests negative, suggesting they are healthy, but they actually do have the disease.
Note: true positives and true negatives are defined analogously; there the test result agrees with the actual condition.
TP rate, FP rate (2)
• TP rate = true positive rate
• FP rate = false positive rate
TP rate, FP rate (3)
Definitions:
• TP rate = TP / (TP + FN)
• FP rate = FP / (FP + TN)
Both rates are computed with respect to the actual class values: the TP rate is the fraction of actual positives that are correctly identified, and the FP rate is the fraction of actual negatives that are incorrectly flagged as positive.
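The arithmetic is simple enough to show directly. Below is a minimal Java sketch of both rates; the four counts are made-up illustrative values, not output from any particular classifier.

    // A minimal sketch: computing TP rate and FP rate from the four basic counts.
    // The counts below are illustrative values, not real classifier output.
    public class Rates {
        public static void main(String[] args) {
            int tp = 40, fn = 10;  // actual positives: correctly / incorrectly classified
            int fp = 5,  tn = 45;  // actual negatives: incorrectly / correctly classified

            double tpRate = (double) tp / (tp + fn);  // fraction of actual positives found
            double fpRate = (double) fp / (fp + tn);  // fraction of actual negatives misflagged

            System.out.printf("TP rate = %.2f, FP rate = %.2f%n", tpRate, fpRate);
        }
    }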
ROC curve (1)
• ROC = receiver operating characteristic
• The curve plots TP rate (y-axis) against FP rate (x-axis) as the classification threshold varies.
ROC curve (2)
Which method (A or B) is better? When neither curve clearly dominates the other, compare the ROC area (AUC), the area under the ROC curve: a larger area indicates better ranking performance, with 1.0 perfect and 0.5 no better than random guessing.
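To make the "area" concrete, here is a small from-scratch Java sketch of the equivalent rank-based view of AUC: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The scores and labels are illustrative values, not real classifier output.

    // A from-scratch sketch of the ROC area as a ranking probability.
    public class AucSketch {
        static double auc(double[] scores, boolean[] isPositive) {
            long pairs = 0, wins = 0, ties = 0;
            for (int i = 0; i < scores.length; i++) {
                if (!isPositive[i]) continue;            // i ranges over positives
                for (int j = 0; j < scores.length; j++) {
                    if (isPositive[j]) continue;         // j ranges over negatives
                    pairs++;
                    if (scores[i] > scores[j]) wins++;   // positive ranked above negative
                    else if (scores[i] == scores[j]) ties++;
                }
            }
            return (wins + 0.5 * ties) / pairs;          // ties count half
        }

        public static void main(String[] args) {
            double[] scores = {0.9, 0.8, 0.7, 0.4, 0.3}; // illustrative scores
            boolean[] pos   = {true, true, false, true, false};
            System.out.println("AUC = " + auc(scores, pos)); // 5/6 here
        }
    }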
Precision, Recall (1)
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
In information-retrieval terms, precision is the probability that a retrieved document is relevant, and recall is the probability that a relevant document is retrieved by the search.
Precision, Recall (2)
• F-measure = 2 * (precision * recall) / (precision + recall)
• Precision, recall, and the F-measure come from the information retrieval domain; the F-measure is the harmonic mean of precision and recall.
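A minimal Java sketch of these three formulas, using made-up retrieval counts:

    // Precision, recall, and F-measure from illustrative counts
    // (not real retrieval results).
    public class PrecisionRecall {
        public static void main(String[] args) {
            int tp = 30, fp = 10, fn = 20;

            double precision = (double) tp / (tp + fp);  // retrieved docs that are relevant
            double recall    = (double) tp / (tp + fn);  // relevant docs that were retrieved
            double f = 2 * precision * recall / (precision + recall);  // harmonic mean

            System.out.printf("precision = %.2f, recall = %.2f, F = %.2f%n",
                    precision, recall, f);
        }
    }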
Confusion matrix
• A confusion matrix tabulates actual classes (rows) against predicted classes (columns), so correct predictions lie on the diagonal.
• Example: using J48 to process iris.arff in Weka; the classifier output ends with the 3x3 confusion matrix for the three iris classes.
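For readers who want to reproduce the example programmatically rather than through the Weka Explorer, the sketch below runs J48 on iris.arff with 10-fold cross-validation and prints the confusion matrix. It assumes weka.jar (Weka 3) on the classpath and iris.arff in the working directory.

    // A sketch of the slide's example in code: J48 on iris.arff.
    import java.util.Random;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class ConfusionMatrixDemo {
        public static void main(String[] args) throws Exception {
            Instances data = DataSource.read("iris.arff");
            data.setClassIndex(data.numAttributes() - 1);  // class is the last attribute

            Evaluation eval = new Evaluation(data);
            eval.crossValidateModel(new J48(), data, 10, new Random(1));

            // Rows are actual classes, columns are predicted classes:
            // the same table Weka's Explorer shows in its classifier output.
            System.out.println(eval.toMatrixString("=== Confusion Matrix ==="));
        }
    }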
Other performance measures
For numeric prediction, error is measured directly on the predicted values p_1, ..., p_n and the actual values a_1, ..., a_n (see Witten & Frank, Chapter 5), for example:
• Mean-squared error: Σ(p_i − a_i)² / n
• Root mean-squared error: √(Σ(p_i − a_i)² / n)
• Mean absolute error: Σ|p_i − a_i| / n
*p are predicted values and a are actual values
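These measures are straightforward to compute by hand; the Java sketch below evaluates MSE, RMSE, and MAE over a few illustrative predicted/actual pairs (not output from a real model).

    // Numeric-prediction error measures over illustrative data.
    public class NumericErrors {
        public static void main(String[] args) {
            double[] p = {2.5, 0.0, 2.1, 7.8};   // predicted values
            double[] a = {3.0, -0.5, 2.0, 8.0};  // actual values

            double se = 0, ae = 0;
            for (int i = 0; i < p.length; i++) {
                double d = p[i] - a[i];
                se += d * d;        // accumulate squared error
                ae += Math.abs(d);  // accumulate absolute error
            }
            double mse = se / p.length;
            System.out.printf("MSE = %.4f, RMSE = %.4f, MAE = %.4f%n",
                    mse, Math.sqrt(mse), ae / p.length);
        }
    }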
Resources
1. Wikipedia page for true/false positives and the ROC curve
2. Wikipedia page for precision and recall
3. Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques, 2nd edition, Chapter 5.