
Data Mining CSCI 307, Spring 2019 Lecture 34

This lecture covers evaluating classifiers with ROC curves, including how ROC curves can be generated from cross-validation and how the convex hull lets you achieve any point between two learning schemes. It also reviews classifier evaluation metrics such as precision, recall, sensitivity, and specificity, and introduces error measures for evaluating numeric prediction.


Presentation Transcript


  1. Data Mining CSCI 307, Spring 2019, Lecture 34: More on ROC Curves and Evaluating Numeric Prediction

  2. Cross-Validation and ROC Curves
  A simple method of getting an ROC curve using cross-validation:
  • Collect the predicted probabilities for the instances in the test folds
  • Sort the instances according to these probabilities
  This is the method implemented in WEKA. However, it is just one possibility; another is to generate an ROC curve for each fold and average them.
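  A minimal sketch of the pooled approach described above, using scikit-learn rather than WEKA (an assumption for illustration; the data set and classifier are stand-ins): probabilities are collected while each instance sits in a test fold, then used to trace the ROC curve.

  ```python
  from sklearn.datasets import make_classification
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import cross_val_predict
  from sklearn.metrics import roc_curve, auc

  # Toy two-class data set (illustrative stand-in).
  X, y = make_classification(n_samples=500, random_state=0)

  # Collect the predicted probability of the positive class for every instance
  # while it is in a test fold of 10-fold cross-validation.
  probs = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                            cv=10, method="predict_proba")[:, 1]

  # Sorting by probability and sweeping a threshold is what roc_curve does internally.
  fpr, tpr, thresholds = roc_curve(y, probs)
  print("Pooled 10-fold AUC:", auc(fpr, tpr))
  ```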

  3. ROC Curves for Two Learning Schemes
  • For a small, focused sample, use method A
  • For a larger sample, use method B
  • In between, choose between A and B with appropriate probabilities

  4. The Convex Hull
  Given two learning schemes, we can achieve any point on the convex hull:
  • TP and FP rates for scheme 1: t1 and f1
  • TP and FP rates for scheme 2: t2 and f2
  • If scheme 1 is used to predict 100 x q% of the cases and scheme 2 for the rest, then:
  • TP rate for the combined scheme: q x t1 + (1 - q) x t2
  • FP rate for the combined scheme: q x f1 + (1 - q) x f2
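  A small worked example of the interpolation above; the operating points for the two schemes are made up for illustration, not taken from the slides.

  ```python
  # Operating points of the two schemes (illustrative values).
  t1, f1 = 0.6, 0.1   # TP and FP rates for scheme 1
  t2, f2 = 0.9, 0.4   # TP and FP rates for scheme 2

  q = 0.5  # use scheme 1 for 50% of the cases, scheme 2 for the rest

  tp_combined = q * t1 + (1 - q) * t2   # 0.75
  fp_combined = q * f1 + (1 - q) * f2   # 0.25
  print(tp_combined, fp_combined)
  ```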

  5. Summary: Classifier Evaluation Metrics
  • Classifier accuracy (aka recognition rate): percentage of test set instances correctly classified. Accuracy = (TP + TN)/All
  • Precision (exactness): what % of instances that the classifier labeled as positive are actually positive? Precision = TP/(TP + FP)
  • Recall (completeness): what % of positive instances did the classifier label as positive? Recall = TP/(TP + FN)
  • Sensitivity: true positive recognition rate. Sensitivity = TP/P, where P = TP + FN
  • Specificity: true negative recognition rate. Specificity = TN/N, where N = TN + FP
  • Note: Recall == Sensitivity
  Example from the slide: Precision = 90/230 = 39.13%, Recall = 90/300 = 30.00%
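  A short sketch of these formulas in Python. TP, FP, and FN below follow from the slide's example (Precision = 90/230 and Recall = 90/300 imply TP = 90, FP = 140, FN = 210); the TN count does not appear in the transcript, so the value used here is only a placeholder.

  ```python
  def classification_metrics(tp, fp, tn, fn):
      """Compute the metrics summarized on slide 5 from confusion-matrix counts."""
      p = tp + fn            # all actual positives
      n = tn + fp            # all actual negatives
      return {
          "accuracy":    (tp + tn) / (tp + tn + fp + fn),
          "precision":   tp / (tp + fp),
          "recall":      tp / (tp + fn),   # == sensitivity
          "sensitivity": tp / p,
          "specificity": tn / n,
      }

  # TP/FP/FN from the slide's example; TN is a placeholder value.
  print(classification_metrics(tp=90, fp=140, tn=1000, fn=210))
  ```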

  6. Summary of Some Measures

  7. 5.9 Evaluating Numeric Prediction
  • Same strategies: independent test set, cross-validation, significance tests, etc.
  • Difference: the error measures
  • Actual target values: a1, a2, ..., an; predicted target values: p1, p2, ..., pn
  • Most popular measure, the mean-squared error: ((p1 - a1)^2 + (p2 - a2)^2 + ... + (pn - an)^2) / n
  • Mean-squared error is easy to manipulate mathematically
  Other measures:
  • The root mean-squared error: the square root of the mean-squared error
  • The mean absolute error, (|p1 - a1| + |p2 - a2| + ... + |pn - an|) / n, is less sensitive to outliers
  • Sometimes relative error values are more appropriate (e.g. 10% for an error of 50 when predicting 500)
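  A brief sketch of these error measures in Python (NumPy assumed; the arrays are illustrative stand-ins for actual and predicted target values):

  ```python
  import numpy as np

  a = np.array([500.0, 200.0, 100.0, 50.0])   # actual target values
  p = np.array([550.0, 190.0, 110.0, 45.0])   # predicted target values

  mse  = np.mean((p - a) ** 2)        # mean-squared error
  rmse = np.sqrt(mse)                 # root mean-squared error
  mae  = np.mean(np.abs(p - a))       # mean absolute error

  # Relative error per prediction, e.g. an error of 50 when predicting 500 is 10%.
  rel = np.abs(p - a) / np.abs(a)

  print(mse, rmse, mae, rel)
  ```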

  8. Improvement on the Mean
  How much does the scheme improve on simply predicting the average? (Here a-bar denotes the mean of the actual values.)
  • The relative squared error is: ((p1 - a1)^2 + ... + (pn - an)^2) / ((a1 - a-bar)^2 + ... + (an - a-bar)^2)
  • The root relative squared error is the square root of the relative squared error
  • The relative absolute error is: (|p1 - a1| + ... + |pn - an|) / (|a1 - a-bar| + ... + |an - a-bar|)
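  Continuing the sketch above, the relative measures compare the scheme's errors against the errors of always predicting the mean of the actual values (same illustrative arrays as before):

  ```python
  import numpy as np

  a = np.array([500.0, 200.0, 100.0, 50.0])   # actual target values
  p = np.array([550.0, 190.0, 110.0, 45.0])   # predicted target values

  a_bar = a.mean()                             # the "simply predict the average" baseline

  rse  = np.sum((p - a) ** 2) / np.sum((a - a_bar) ** 2)        # relative squared error
  rrse = np.sqrt(rse)                                           # root relative squared error
  rae  = np.sum(np.abs(p - a)) / np.sum(np.abs(a - a_bar))      # relative absolute error

  print(rse, rrse, rae)
  ```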

  9. Correlation Coefficient
  Measures the statistical correlation between the predicted values and the actual values
  • Scale independent, between -1 and +1
  • Good performance leads to values close to +1
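  The slide's formula does not survive in the transcript; the standard sample correlation coefficient between predicted values p and actual values a (an assumption based on the usual presentation of this measure) is:

  ```latex
  \rho = \frac{S_{PA}}{\sqrt{S_P \, S_A}}, \qquad
  S_{PA} = \frac{\sum_i (p_i - \bar{p})(a_i - \bar{a})}{n - 1}, \quad
  S_P = \frac{\sum_i (p_i - \bar{p})^2}{n - 1}, \quad
  S_A = \frac{\sum_i (a_i - \bar{a})^2}{n - 1}
  ```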

  10. Which Measure?
  • Best to look at all of them
  • Often it doesn't matter
  Example: (comparison table not included in the transcript)

  11. Summary: Performance Measures for Numeric Prediction
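  The summary table on this slide is not in the transcript; as a recap of the measures defined on the preceding slides (actual values a_i, predicted values p_i, mean of the actual values \bar{a}):

  ```latex
  \begin{align*}
  \text{mean-squared error} &= \tfrac{1}{n}\textstyle\sum_{i=1}^{n}(p_i - a_i)^2 \\
  \text{root mean-squared error} &= \sqrt{\tfrac{1}{n}\textstyle\sum_{i=1}^{n}(p_i - a_i)^2} \\
  \text{mean absolute error} &= \tfrac{1}{n}\textstyle\sum_{i=1}^{n}|p_i - a_i| \\
  \text{relative squared error} &= \frac{\sum_i (p_i - a_i)^2}{\sum_i (a_i - \bar{a})^2} \\
  \text{root relative squared error} &= \sqrt{\frac{\sum_i (p_i - a_i)^2}{\sum_i (a_i - \bar{a})^2}} \\
  \text{relative absolute error} &= \frac{\sum_i |p_i - a_i|}{\sum_i |a_i - \bar{a}|} \\
  \text{correlation coefficient} &= \frac{S_{PA}}{\sqrt{S_P \, S_A}}
  \end{align*}
  ```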

  12. Issues Affecting Model Selection
  • Accuracy: classifier accuracy in predicting the class label
  • Speed: time to construct the model (training time) and time to use the model (classification/prediction time)
  • Robustness: handling noise and missing values
  • Scalability: efficiency in disk-resident databases
  • Interpretability: understanding and insight provided by the model
  • Other measures, e.g. goodness of rules, such as decision tree size or compactness of classification rules
