
Data Mining CSCI 307, Spring 2019 Lecture 25



  1. Data Mining CSCI 307, Spring 2019, Lecture 25: Evaluating the Results

  2. Credibility: Evaluating What's Been Learned
  • Issues: training, testing, tuning
  • Predicting performance: confidence limits
  • Holdout, cross-validation, bootstrap

  3. Evaluation: the Key to Success
  How predictive is the model we learned?
  • Performance on the training data is not a good indicator of performance on future data.
  • Simple solution that can be used if lots of (labeled) data is available: split the data into a training set and a test set (a sketch follows below).
  • However: quality labeled data is often scarce.
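A minimal sketch of this train/test split, assuming scikit-learn is available; the synthetic dataset, the decision-tree learner, and the one-third test proportion are illustrative assumptions, not part of the lecture.

```python
# Minimal holdout-split sketch, assuming scikit-learn.
# The data here is synthetic; in practice X and y come from your own dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Hold out one third of the instances; they play no part in building the classifier.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("Accuracy on held-out test set:", model.score(X_test, y_test))
```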

  4. Issues in Evaluation
  • We need statistics to estimate differences in performance.
  • What should be measured? There are several choices of performance measure:
    • Number of correct classifications
    • Accuracy of probability estimates in predicting the class
    • Error in numeric predictions (versus nominal predictions)
  • As a practical matter, costs are assigned to different types of errors: the cost of a misclassification depends on the type of error, i.e. whether a positive example is erroneously classified as negative or vice versa (a cost sketch follows below).
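To illustrate unequal error costs, the sketch below weights false negatives more heavily than false positives; the specific cost values (1 and 5) and the toy label vectors are assumptions made for demonstration, assuming scikit-learn.

```python
# Illustrative sketch of cost-weighted error, assuming scikit-learn.
# The costs (1 per false positive, 5 per false negative) are invented
# for demonstration; real costs are domain-specific.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical actual classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical predicted classes

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
total_cost = 1 * fp + 5 * fn        # a false negative costs five times as much here
print("False positives:", fp, "False negatives:", fn, "Total cost:", total_cost)
```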

  5. Model Evaluation and Selection Summary
  Evaluation metrics: how can we measure accuracy? What other metrics should we consider?
  • Use a test set of class-labeled instances, rather than the training set, when assessing accuracy.
  Looking ahead:
  • Methods for estimating a classifier's accuracy: holdout method, cross-validation, bootstrap (a cross-validation preview is sketched below)
  • Comparing classifiers: confidence intervals, cost-benefit analysis, and ROC (receiver operating characteristic, a graph showing classifier performance) curves
  • Thinking about numeric prediction
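As a preview of cross-validation (covered in detail later), here is a minimal sketch assuming scikit-learn; the 10-fold choice and the decision-tree learner are conventional illustrative picks, not something fixed by this slide.

```python
# Preview sketch of 10-fold cross-validation, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Each instance is used for testing exactly once across the 10 folds.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print("Per-fold accuracy:", scores.round(3))
print("Mean accuracy estimate:", round(scores.mean(), 3))
```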

  6. 5.1 Training and Testing
  • Natural performance measure for classification problems: error rate
  • Success: instance's class is predicted correctly
  • Error: instance's class is predicted incorrectly
  • Error rate: proportion of errors made over the whole set of instances
  • Resubstitution error: error rate obtained from using the training data to measure performance
  • Resubstitution error is (hopelessly) optimistic (the sketch below contrasts it with the test-set error rate)
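A small sketch contrasting resubstitution error with the error rate on an independent test set, assuming scikit-learn; an unpruned decision tree is used so the optimism of resubstitution error is easy to see, and the synthetic data is an illustrative assumption.

```python
# Sketch contrasting resubstitution error with test-set error, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Error rate = proportion of misclassified instances = 1 - accuracy.
resubstitution_error = 1 - model.score(X_train, y_train)  # measured on the training data
test_error = 1 - model.score(X_test, y_test)              # measured on independent data
print("Resubstitution error:", round(resubstitution_error, 3))  # typically near 0: optimistic
print("Test-set error:", round(test_error, 3))
```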

  7. Training and Testing continued
  • Test set: independent instances that have played no part in the formation of the classifier
  • Assumption: both training data and test data are representative samples of the underlying problem
  • Test and training data may differ in nature
  • Example: classifiers built using customer data from two different towns, A and B. To estimate performance of the classifier from town A in a completely new town, test it on data from town B.

  8. Parameter Tuning
  It is important that the test data is not used in any way to create the classifier.
  • Some learning schemes operate in two stages:
    • Stage 1: build the basic structure
    • Stage 2: optimize parameter settings
  • Test data cannot be used for parameter tuning. We must use three sets: training data (for stage 1), validation data (for stage 2), and test data (a three-way split is sketched below).
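A minimal sketch of such a three-way split, assuming scikit-learn; the 60/20/20 proportions and the candidate max_depth values are illustrative assumptions rather than anything prescribed by the lecture.

```python
# Sketch of a training/validation/test split for parameter tuning, assuming scikit-learn.
# The 60/20/20 split and the candidate max_depth values are illustrative choices.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# First carve off 20% as the test set; it is never touched during tuning.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
# Split the remainder into training (stage 1) and validation (stage 2) sets.
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

best_depth, best_acc = None, -1.0
for depth in (2, 4, 8, None):                      # candidate parameter settings
    model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    acc = model.score(X_val, y_val)                # tune using validation data only
    if acc > best_acc:
        best_depth, best_acc = depth, acc

final = DecisionTreeClassifier(max_depth=best_depth, random_state=0).fit(X_train, y_train)
print("Chosen max_depth:", best_depth, "Test-set accuracy:", final.score(X_test, y_test))
```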

  9. Making the Most of the Data
  • Often, once evaluation is complete, all the data can be used to build the final classifier (see the sketch below).
  • Generally, the larger the training data, the better the classifier.
  • The larger the test data, the more accurate the error estimate.
  • Holdout procedure: method of splitting the original data into a training set and a test set.
  • Dilemma: ideally both the training set and the test set should be large (and representative).
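A short sketch of this final step, assuming scikit-learn: the error estimate comes from the holdout split, and the classifier that is actually kept is rebuilt from all of the data afterwards. The synthetic data and learner are illustrative assumptions.

```python
# Sketch: estimate error with a holdout split, then rebuild the final
# classifier from all of the data. Assumes scikit-learn; data is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Step 1: evaluate on the held-out test set to estimate future performance.
estimate = DecisionTreeClassifier(random_state=0).fit(X_train, y_train).score(X_test, y_test)
print("Estimated accuracy from holdout:", round(estimate, 3))

# Step 2: once evaluation is done, use every instance to build the final model.
final_model = DecisionTreeClassifier(random_state=0).fit(X, y)
```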
