
Data Mining CSCI 307, Spring 2019 Lecture 25



  1. Data Mining CSCI 307, Spring 2019, Lecture 25: Evaluating the Results

  2. Credibility: Evaluating What's Been Learned
  • Issues: training, testing, tuning
  • Predicting performance: confidence limits
  • Holdout, cross-validation, bootstrap

  3. Evaluation: the Key to Success
  How predictive is the model we learned?
  • Performance on the training data is not a good indicator of performance on future data.
  • Simple solution that can be used if lots of (labeled) data is available: split the data into a training set and a test set (a sketch follows below).
  • However: quality labeled data is often scarce.
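A minimal sketch of this train/test split, assuming scikit-learn is available; the synthetic dataset, the decision-tree learner, and the one-third test proportion are illustrative assumptions, not part of the lecture.

```python
# Minimal holdout-split sketch, assuming scikit-learn.
# The data here is synthetic; in practice X and y come from your own dataset.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Hold out one third of the instances; they play no part in building the classifier.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, stratify=y, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("Accuracy on held-out test set:", model.score(X_test, y_test))
```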

  4. Issues in Evaluation
  • We need statistics to estimate differences in performance.
  • What should be measured? There are several choices of performance measure:
    • Number of correct classifications
    • Accuracy of probability estimates in predicting the class
    • Error in numeric predictions (versus nominal predictions)
  • As a practical matter, costs are assigned to different types of errors: the cost of a misclassification depends on the type of error, i.e. whether a positive example is erroneously classified as negative or vice versa (a cost sketch follows below).
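To illustrate unequal error costs, the sketch below weights false negatives more heavily than false positives; the specific cost values (1 and 5) and the toy label vectors are assumptions made for demonstration, assuming scikit-learn.

```python
# Illustrative sketch of cost-weighted error, assuming scikit-learn.
# The costs (1 per false positive, 5 per false negative) are invented
# for demonstration; real costs are domain-specific.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical actual classes
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical predicted classes

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
total_cost = 1 * fp + 5 * fn        # a false negative costs five times as much here
print("False positives:", fp, "False negatives:", fn, "Total cost:", total_cost)
```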

  5. Model Evaluation and Selection Summary
  Evaluation metrics: how can we measure accuracy? What other metrics should we consider?
  • Use a test set of class-labeled instances, rather than the training set, when assessing accuracy.
  Looking ahead:
  • Methods for estimating a classifier's accuracy: holdout method, cross-validation, bootstrap (a cross-validation preview is sketched below)
  • Comparing classifiers: confidence intervals, cost-benefit analysis, and ROC (receiver operating characteristic, a graph showing classifier performance) curves
  • Thinking about numeric prediction
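As a preview of cross-validation (covered in detail later), here is a minimal sketch assuming scikit-learn; the 10-fold choice and the decision-tree learner are conventional illustrative picks, not something fixed by this slide.

```python
# Preview sketch of 10-fold cross-validation, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# Each instance is used for testing exactly once across the 10 folds.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10)
print("Per-fold accuracy:", scores.round(3))
print("Mean accuracy estimate:", round(scores.mean(), 3))
```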

  6. 5.1 Training and Testing
  • Natural performance measure for classification problems: error rate
  • Success: instance's class is predicted correctly
  • Error: instance's class is predicted incorrectly
  • Error rate: proportion of errors made over the whole set of instances
  • Resubstitution error: error rate obtained from using the training data to measure performance
  • Resubstitution error is (hopelessly) optimistic (the sketch below contrasts it with the test-set error rate)
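A small sketch contrasting resubstitution error with the error rate on an independent test set, assuming scikit-learn; an unpruned decision tree is used so the optimism of resubstitution error is easy to see, and the synthetic data is an illustrative assumption.

```python
# Sketch contrasting resubstitution error with test-set error, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Error rate = proportion of misclassified instances = 1 - accuracy.
resubstitution_error = 1 - model.score(X_train, y_train)  # measured on the training data
test_error = 1 - model.score(X_test, y_test)              # measured on independent data
print("Resubstitution error:", round(resubstitution_error, 3))  # typically near 0: optimistic
print("Test-set error:", round(test_error, 3))
```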

  7. Training and Testing continued
  • Test set: independent instances that have played no part in the formation of the classifier
  • Assumption: both training data and test data are representative samples of the underlying problem
  • Test and training data may differ in nature
  • Example: classifiers built using customer data from two different towns, A and B. To estimate performance of the classifier from town A in a completely new town, test it on data from town B.

  8. Parameter Tuning
  It is important that the test data is not used in any way to create the classifier.
  • Some learning schemes operate in two stages:
    • Stage 1: build the basic structure
    • Stage 2: optimize parameter settings
  • Test data cannot be used for parameter tuning. We must use three sets: training data (for stage 1), validation data (for stage 2), and test data (a three-way split is sketched below).
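A minimal sketch of such a three-way split, assuming scikit-learn; the 60/20/20 proportions and the candidate max_depth values are illustrative assumptions rather than anything prescribed by the lecture.

```python
# Sketch of a training/validation/test split for parameter tuning, assuming scikit-learn.
# The 60/20/20 split and the candidate max_depth values are illustrative choices.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# First carve off 20% as the test set; it is never touched during tuning.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.20, random_state=0)
# Split the remainder into training (stage 1) and validation (stage 2) sets.
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

best_depth, best_acc = None, -1.0
for depth in (2, 4, 8, None):                      # candidate parameter settings
    model = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    acc = model.score(X_val, y_val)                # tune using validation data only
    if acc > best_acc:
        best_depth, best_acc = depth, acc

final = DecisionTreeClassifier(max_depth=best_depth, random_state=0).fit(X_train, y_train)
print("Chosen max_depth:", best_depth, "Test-set accuracy:", final.score(X_test, y_test))
```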

  9. Making the Most of the Data
  • Often, once evaluation is complete, all the data can be used to build the final classifier (see the sketch below).
  • Generally, the larger the training data, the better the classifier.
  • The larger the test data, the more accurate the error estimate.
  • Holdout procedure: method of splitting the original data into a training set and a test set.
  • Dilemma: ideally both the training set and the test set should be large (and representative).
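A short sketch of this final step, assuming scikit-learn: the error estimate comes from the holdout split, and the classifier that is actually kept is rebuilt from all of the data afterwards. The synthetic data and learner are illustrative assumptions.

```python
# Sketch: estimate error with a holdout split, then rebuild the final
# classifier from all of the data. Assumes scikit-learn; data is synthetic.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Step 1: evaluate on the held-out test set to estimate future performance.
estimate = DecisionTreeClassifier(random_state=0).fit(X_train, y_train).score(X_test, y_test)
print("Estimated accuracy from holdout:", round(estimate, 3))

# Step 2: once evaluation is done, use every instance to build the final model.
final_model = DecisionTreeClassifier(random_state=0).fit(X, y)
```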
