
8. Evaluation Methods



Presentation Transcript


  1. 8. Evaluation Methods
     • Errors and Error Rates
     • Precision and Recall
     • Similarity
     • Cross Validation
     • Various Presentations of Evaluation Results
     • Statistical Tests
     Data Mining – Evaluation, H. Liu (ASU) & G. Dong (WSU)

  2. How to evaluate/estimate error
     • Resubstitution
       • one data set used for both training and testing
     • Holdout (training and testing)
       • 2/3 for training, 1/3 for testing
     • Leave-one-out
       • if the data set is small
     • Cross validation
       • 10-fold, why 10?
       • m × 10-fold CV
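The holdout and leave-one-out schemes listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `holdout_split` and `leave_one_out`, the fixed seed, and the toy data are all assumptions made for the example.

```python
import random

def holdout_split(data, train_fraction=2 / 3, seed=0):
    """Shuffle a copy of data and split it into (train, test)."""
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def leave_one_out(data):
    """Yield (train, test) pairs; each instance is the single test item once."""
    for i in range(len(data)):
        yield data[:i] + data[i + 1:], [data[i]]

data = list(range(9))                  # hypothetical data set of 9 instances
train, test = holdout_split(data)      # 2/3 for training, 1/3 for testing
loo_pairs = list(leave_one_out(data))  # 9 train/test pairs, one per instance
```

Resubstitution needs no split at all: the same `data` would be passed as both the training and the testing set, which is why it tends to give an optimistic error estimate.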

  3. Error and Error Rate
     • Mean and Median
       • mean = $\frac{1}{n}\sum_i x_i$
       • weighted mean = $\left(\sum_i w_i x_i\right) / \sum_i w_i$
       • median = $x_{(n+1)/2}$ if $n$ is odd, else $\left(x_{n/2} + x_{(n/2)+1}\right)/2$
     • Error: disagreement between y and y' (predicted)
       • 1 if they disagree, 0 otherwise (0-1 loss $l_{01}$)
       • Other definitions depending on the output of a predictor, such as quadratic loss $l_2$ and absolute loss $l_{|\cdot|}$
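A small Python sketch of the summary statistics and loss functions named on this slide. The function names are chosen for the example, and the median assumes the values are sorted first, which the slide's indexing implies but does not state.

```python
def mean(xs):
    return sum(xs) / len(xs)

def weighted_mean(xs, ws):
    return sum(w * x for w, x in zip(ws, xs)) / sum(ws)

def median(xs):
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    # Odd n: the middle value x_{(n+1)/2} (1-based) is s[mid] (0-based).
    # Even n: average x_{n/2} and x_{n/2+1}, i.e. s[mid - 1] and s[mid].
    return s[mid] if n % 2 else (s[mid - 1] + s[mid]) / 2

def zero_one_loss(y, y_pred):
    return 0 if y == y_pred else 1

def quadratic_loss(y, y_pred):
    return (y - y_pred) ** 2

def absolute_loss(y, y_pred):
    return abs(y - y_pred)
```

The 0-1 loss fits class labels, while the quadratic and absolute losses only make sense when the predictor outputs a number.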

  4. Error estimation
     • Error rate e = #Errors/N, where N is the total number of instances
     • Accuracy A = 1 - e
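As a worked example of the two formulas, the following sketch counts disagreements between true and predicted labels. The labels are made up for illustration.

```python
def error_rate(y_true, y_pred):
    """e = #Errors / N under 0-1 loss."""
    errors = sum(1 for y, p in zip(y_true, y_pred) if y != p)
    return errors / len(y_true)

y_true = ["a", "b", "a", "a", "b"]   # hypothetical true labels
y_pred = ["a", "a", "a", "b", "b"]   # hypothetical predictions (2 errors)

e = error_rate(y_true, y_pred)       # 2 errors out of N = 5
accuracy = 1 - e
```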

  5. Precision and Recall
     • False negative and false positive
     • Types of errors for k classes = $k^2 - k$
       • k = 3: 3*3 - 3 = 6; k = 2: 2*2 - 2 = 2
     • Precision (wrt the retrieved)
       • P = TP/(TP+FP)
     • Recall (wrt the total relevant)
       • R = TP/(TP+FN)
     • Precision×Recall (PR) and PR gain
       • PR gain = (PR' - PR0)/PR0
     • Accuracy
       • A = (TP+TN)/(TP+TN+FP+FN)
     [Figure: plot with axes labeled P and R]
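The formulas on this slide can be checked with a short Python sketch. The helper names and the example labels are assumptions; PR is read here as the product of precision and recall, with PR' the value for a new method and PR0 the value for a baseline, which is one plausible reading of the slide.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary task with the given positive label."""
    tp = fp = fn = tn = 0
    for y, p in zip(y_true, y_pred):
        if p == positive and y == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif y == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def pr_gain(pr_new, pr_base):
    """Relative improvement of PR' over the baseline PR0."""
    return (pr_new - pr_base) / pr_base

def error_types(k):
    """Number of off-diagonal cells in a k-by-k confusion matrix."""
    return k * k - k

y_true = [1, 1, 1, 0, 0, 0, 0, 1]    # hypothetical ground truth
y_pred = [1, 1, 0, 1, 0, 0, 0, 1]    # hypothetical predictions
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
p, r = precision(tp, fp), recall(tp, fn)
```

With these labels there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so precision, recall and accuracy all come out to 0.75.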

  6. Similarity or Dissimilarity Measures
     • Distance (dissimilarity) measures (Triangle Inequality)
       • Euclidean
       • City-block, or Manhattan
       • Cosine: $\cos(p_i, p_j) = \sum_k p_{ik} p_{jk} \,/\, \sqrt{\sum_k p_{ik}^2 \, \sum_k p_{jk}^2}$
     • Inter-clusters and intra-clusters
       • Single linkage vs. complete linkage
         • $D_{min} = \min |p_i - p_j|$, two data points
         • $D_{max} = \max |p_i - p_j|$
       • Centroid methods
         • $D_{avg} = \frac{1}{n_i n_j} \sum \sum |p_i - p_j|$
         • $D_{mean} = |m_i - m_j|$, two means
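A sketch of the point-level and cluster-level measures, assuming points are equal-length tuples of numbers and that |p_i - p_j| in the cluster distances means Euclidean distance (the slide does not fix the metric). All function names are chosen for the example.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def cosine(p, q):
    dot = sum(a * b for a, b in zip(p, q))
    return dot / math.sqrt(sum(a * a for a in p) * sum(b * b for b in q))

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def d_min(A, B, dist=euclidean):       # single linkage
    return min(dist(p, q) for p in A for q in B)

def d_max(A, B, dist=euclidean):       # complete linkage
    return max(dist(p, q) for p in A for q in B)

def d_avg(A, B, dist=euclidean):       # average over all cross-cluster pairs
    return sum(dist(p, q) for p in A for q in B) / (len(A) * len(B))

def d_mean(A, B, dist=euclidean):      # distance between the two cluster means
    return dist(centroid(A), centroid(B))

A = [(0, 0), (0, 2)]                   # hypothetical cluster i
B = [(3, 0), (3, 2)]                   # hypothetical cluster j
```

For these two clusters the closest pair is 3 apart, the farthest pair is sqrt(13) apart, and the centroids (0, 1) and (3, 1) are also 3 apart. Cosine is a similarity rather than a distance: larger values mean more alike.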

  7. k-Fold Cross Validation
     • Cross validation
       • 1 fold for testing, the rest for training
       • rotate until every fold has been used for testing
       • calculate average
     • m × k-fold cross validation
       • reshuffle data, repeat XV m times
       • what is a suitable k?
     • Model complexity
       • use of XV
       • tree complexity, training/testing error rates
     [Figure: data partitioned into Fold 1, Fold 2, Fold 3]
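The procedure can be sketched as follows, assuming the usual convention in which each fold is held out for testing exactly once while the remaining folds are used for training. The majority-class "learner", the function names, and the toy data are placeholders chosen so the example is self-contained; any function that takes (train, test) and returns an error rate could be plugged in.

```python
import random
from collections import Counter

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validate(data, k, train_and_score, seed=0):
    """Average the score over k rounds, each holding out one fold for testing."""
    folds = k_fold_indices(len(data), k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train = [data[j] for f, fold in enumerate(folds) if f != i for j in fold]
        test = [data[j] for j in test_idx]
        scores.append(train_and_score(train, test))
    return sum(scores) / k

def repeated_cv(data, k, m, train_and_score):
    """m x k-fold CV: reshuffle (new seed) and repeat k-fold CV m times."""
    return sum(cross_validate(data, k, train_and_score, seed=r) for r in range(m)) / m

def majority_error(train, test):
    """Toy learner: predict the most common training label; return test error rate."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return sum(1 for _, label in test if label != majority) / len(test)

# Hypothetical data: 30 instances, 20 labeled "pos" and 10 labeled "neg".
data = [(i, "pos" if i % 3 else "neg") for i in range(30)]
cv_error = cross_validate(data, 10, majority_error)
```

Because the majority label is "pos" in every training split, the averaged error equals the share of "neg" instances, 1/3, regardless of how the data are shuffled. A real learner would give different estimates per shuffle, which is the motivation for repeating the procedure m times.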

  8. Presentations of Evaluation Results
     • Results are usually about time, space, trend, average case
     • Learning (happy) curves
       • Accuracy increases over X
       • Its opposite (or error) decreases over X
     • Box-plot
       • Whiskers (min, max)
       • Box: confidence interval
       • Graphical equivalent of t-test
     [Figure: box plot labeled max, mean, min]

  9. Statistical Tests
     • Null hypothesis and alternative hypothesis
     • Type I and Type II errors
     • Student's t test comparing two means
     • Paired t test comparing two means
     • Chi-Square test
       • Contingency table

  10. Null Hypothesis
     • Null hypothesis (H0)
       • No difference between the test statistic and the actual value of the population parameter
       • E.g., H0: $\mu = \mu_0$
     • Alternative hypothesis (H1)
       • It specifies the parameter value(s) to be accepted if H0 is rejected.
       • E.g., H1: $\mu \neq \mu_0$ (two-tailed test)
       • Or H1: $\mu > \mu_0$ (one-tailed test)

  11. Type I, II errors
     • Type I errors ($\alpha$)
       • Rejecting a null hypothesis when it is true (FP)
     • Type II errors ($\beta$)
       • Accepting a null hypothesis when it is false (FN)
     • Power = $1 - \beta$
     • Costs of different errors
       • A life-saving medicine appears to be effective; it is cheap and has no side effects (H0: non-effective)
       • Type I error: it is judged effective when it is not; not costly
       • Type II error: it is judged non-effective when it is effective; very costly

  12. Test using Student's t Distribution
     • Using the t distribution to test the difference between two population means is appropriate if
       • the population standard deviations are not known
       • the samples are small (n < 30)
       • the populations are assumed to be approximately normal
       • the two unknown standard deviations are assumed equal: $\sigma_1 = \sigma_2$
     • H0: $(\mu_1 - \mu_2) = 0$, H1: $(\mu_1 - \mu_2) \neq 0$
     • Check the difference of the estimated means, normalized by the standard error based on the common (pooled) standard deviation
     • Degrees of freedom and p, the level of significance
       • df = $n_1 + n_2 - 2$
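The slide does not spell out the statistic, so the sketch below uses the standard pooled-variance form of the two-sample t test, which matches the stated assumptions (equal unknown standard deviations, df = n1 + n2 - 2). The function name and the sample values are made up for illustration.

```python
import math
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Two-sample t statistic with pooled variance; returns (t, df)."""
    n1, n2 = len(sample1), len(sample2)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance (variance() is the sample variance).
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))   # standard error of the difference
    return (mean(sample1) - mean(sample2)) / se, df

a = [1, 2, 3, 4, 5]        # hypothetical sample 1 (mean 3, variance 2.5)
b = [2, 3, 4, 5, 6]        # hypothetical sample 2 (mean 4, variance 2.5)
t, df = pooled_t(a, b)     # t is about -1 with df = 8

# Two-tailed critical value for df = 8 at the 5% level, taken from a t table.
T_CRIT_DF8_5PCT = 2.306
reject_h0 = abs(t) > T_CRIT_DF8_5PCT
```

Here |t| is well below the critical value, so H0 (equal means) is not rejected at the 5% level.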

  13. Paired t test
     • With paired observations, use the paired t test
     • Now H0: $\mu_d = 0$ and H1: $\mu_d \neq 0$
     • Check the estimated mean difference
     • The t values in the previous and current cases are calculated differently.
     • Both are two-tailed tests; p = 1% means 0.5% on each side
     • Excel can do that for you!
     [Figure: t distribution centered at 0 with a rejection region of $\alpha/2$ in each tail]
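For paired observations the statistic is computed from the per-pair differences rather than from the two samples separately. The sketch below uses the standard form, t = mean(d) / (s_d / sqrt(n)) with df = n - 1; the function name and the measurements are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(first, second):
    """Paired t statistic on the differences second - first; returns (t, df)."""
    d = [s - f for f, s in zip(first, second)]
    n = len(d)
    t = mean(d) / (stdev(d) / math.sqrt(n))   # stdev() is the sample std deviation
    return t, n - 1

before = [10, 12, 9, 11]    # hypothetical paired measurements
after = [12, 13, 11, 14]    # differences are 2, 1, 2, 3
t, df = paired_t(before, after)
```

With these numbers the mean difference is 2 and t works out to sqrt(24), about 4.90, on 3 degrees of freedom. Comparing a method's error rates on the same cross-validation folds against another method's is a typical case where the pairing applies.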

  14. Chi-Square Test (the goodness-of-fit)
     • Testing a null hypothesis that the population distribution for a random variable follows a specified form.
     • The chi-square statistic is calculated:
       $\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{k} (A_{ij} - E_{ij})^2 / E_{ij}$
     • Degrees of freedom df = k - m - 1
       • k = number of data categories
       • m = number of parameters estimated
       • m = 0 for uniform, 1 for Poisson, 2 for normal
     • The expected frequency in each cell should be at least 5
     • One-tailed test
     [Figure: chi-square distribution with the rejection region marked]
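A minimal goodness-of-fit sketch, assuming a single row of k categories, so the statistic reduces to one sum of (A - E)^2 / E over the categories. The die-roll counts, the function name, and the table-lookup constant are illustrative assumptions.

```python
def chi_square_gof(observed, expected, m=0):
    """Return (chi-square statistic, df) with df = k - m - 1."""
    if any(e < 5 for e in expected):
        raise ValueError("each expected frequency should be at least 5")
    stat = sum((a - e) ** 2 / e for a, e in zip(observed, expected))
    return stat, len(observed) - m - 1

# Hypothetical experiment: a die rolled 60 times, tested against a uniform
# distribution (no parameters estimated from the data, so m = 0).
observed = [8, 9, 12, 11, 6, 14]
expected = [10] * 6
stat, df = chi_square_gof(observed, expected)

# Upper-tail critical value for df = 5 at the 5% level, taken from a chi-square table.
CHI2_CRIT_DF5_5PCT = 11.070
reject_h0 = stat > CHI2_CRIT_DF5_5PCT
```

The statistic here is 4.2 on 5 degrees of freedom, below the critical value, so the hypothesis of a fair die is not rejected. The test is one-tailed because only large values of the statistic count as evidence against H0.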

  15. Bibliography
     • W. Klosgen & J.M. Zytkow (eds.), 2002. Handbook of Data Mining and Knowledge Discovery. Oxford University Press.
     • L.J. Kazmier & N.F. Pohl, 1987. Basic Statistics for Business and Economics.
     • R.E. Walpole & R.H. Myers, 1993. Probability and Statistics for Engineers and Scientists (5th edition). Macmillan Publishing Company.
