Performance measurement • We must be careful about which performance metric we use • For example, say we have a neural network (NN) classifier with one output unit, and we code ‘1 = YES’ and ‘0 = NO’ • Should we threshold at 0.5, saying that any output > 0.5 is a 1 and any output ≤ 0.5 is a 0?
Performance measurement • Only if the classification/misclassification costs are the same for each of the two classes • An output threshold of 0.5 is not set in stone • What is the performance if we use a decision threshold of 0.6, or 0.4?
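To make the effect of moving the threshold concrete, here is a minimal sketch. The function name `apply_threshold` and the network outputs are made up for illustration; they are not from the slides.

```python
def apply_threshold(outputs, threshold):
    """Return 1 (YES) where the output exceeds the threshold, else 0 (NO)."""
    return [1 if o > threshold else 0 for o in outputs]

# Hypothetical network outputs, invented for this example.
outputs = [0.15, 0.45, 0.55, 0.62, 0.90]

# The same outputs give different decisions at different thresholds.
for t in (0.4, 0.5, 0.6):
    print(t, apply_threshold(outputs, t))
```

The two borderline cases (0.45 and 0.55) switch class as the threshold moves, which is why any performance figure depends on the threshold chosen.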
Performance measurement • For example, in predicting consumer creditworthiness: • Are the costs of lending money to someone who then defaults the same as • the costs of not lending money to someone who would in fact have repaid the loan?
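One way to explore this question is to put a cost on each kind of mistake and total it for several thresholds. The sketch below assumes YES means "will repay", so a false positive is a loan that defaults and a false negative is a good customer turned away. The cost values (5 and 1), the data, and the function name `total_cost` are illustrative assumptions only.

```python
def total_cost(labels, outputs, threshold, cost_fp, cost_fn):
    """Sum the cost of every misclassification at the given threshold."""
    cost = 0
    for y, o in zip(labels, outputs):
        pred = 1 if o > threshold else 0
        if pred == 1 and y == 0:
            cost += cost_fp   # lent money, customer defaulted
        elif pred == 0 and y == 1:
            cost += cost_fn   # refused a customer who would have repaid
    return cost

# Hypothetical data: 1 = repaid, 0 = defaulted.
labels = [0, 0, 1, 0, 1, 1]
outputs = [0.2, 0.55, 0.45, 0.7, 0.8, 0.9]

# Assumed costs: a default is five times as costly as a lost customer.
COST_FP, COST_FN = 5, 1

thresholds = [0.4, 0.5, 0.6, 0.75]
best = min(thresholds,
           key=lambda t: total_cost(labels, outputs, t, COST_FP, COST_FN))
print("lowest-cost threshold:", best)
```

With these assumed costs the cheapest threshold sits well above 0.5; with different costs it could sit below.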
Confusion matrix/crosstabs • Calculate four quantities: • True Positives (TP): answer = YES, network said YES • True Negatives (TN): answer = NO, network said NO • False Positives (FP): answer = NO, network said YES • False Negatives (FN): answer = YES, network said NO
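The four quantities can be counted directly from the true answers and the network's thresholded decisions. This is a minimal sketch; `confusion_counts` and the example data are hypothetical names and values chosen for illustration.

```python
def confusion_counts(labels, predictions):
    """Count TP, TN, FP and FN for binary labels coded 1 = YES, 0 = NO."""
    tp = tn = fp = fn = 0
    for y, p in zip(labels, predictions):
        if y == 1 and p == 1:
            tp += 1   # answer YES, network said YES
        elif y == 0 and p == 0:
            tn += 1   # answer NO, network said NO
        elif y == 0 and p == 1:
            fp += 1   # answer NO, network said YES
        else:
            fn += 1   # answer YES, network said NO
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}

# Hypothetical true answers and network decisions.
labels      = [1, 0, 1, 1, 0, 0, 1, 0]
predictions = [1, 0, 0, 1, 1, 0, 1, 0]

counts = confusion_counts(labels, predictions)
print(counts)
```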
Confusion matrix • Calculate a confusion matrix for many different output thresholds (e.g., 0.1, 0.2 … 0.9) • From each matrix, calculate the following probabilities: • hit rate = true positive ratio = sensitivity = TP/(TP+FN) • false alarm rate = false positive ratio = FP/(FP+TN) • Plot the Receiver Operating Characteristic (ROC) curve: hit rate against false alarm rate, one point per threshold
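The procedure on this slide can be sketched as a sweep over thresholds that yields one (false alarm rate, hit rate) point each. The data and the names `rates` and `roc_points` are assumptions made for the example, and the sketch assumes both classes are present so that neither denominator is zero.

```python
def rates(labels, outputs, threshold):
    """Return (hit rate, false alarm rate) at one decision threshold."""
    tp = fp = tn = fn = 0
    for y, o in zip(labels, outputs):
        pred = 1 if o > threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1 and pred == 0:
            fn += 1
        elif y == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    hit_rate = tp / (tp + fn)           # TP/(TP+FN)
    false_alarm_rate = fp / (fp + tn)   # FP/(FP+TN)
    return hit_rate, false_alarm_rate

# Hypothetical true answers and raw network outputs.
labels  = [0, 0, 1, 0, 1, 1, 0, 1]
outputs = [0.05, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.95]

# One ROC point (x = false alarm rate, y = hit rate) per threshold 0.1 ... 0.9.
thresholds = [i / 10 for i in range(1, 10)]
roc_points = []
for t in thresholds:
    hit, false_alarm = rates(labels, outputs, t)
    roc_points.append((false_alarm, hit))

for t, point in zip(thresholds, roc_points):
    print(t, point)
```

Plotting `roc_points` with any charting tool gives the ROC curve; low thresholds land near the top right, high thresholds near the bottom left.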
ROC curves • The area under the curve (AUC) gives an idea of how good the classifier is: 0.5 = no better than chance, approaching 1 = excellent • We can then build the profits/costs of the different correct answers/mistakes into the confusion matrices to build a Gains Chart; again, look at the area on the chart • The classifier with the highest area on the gains chart is the most profitable
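A simple way to estimate the area under an ROC curve from a handful of points is the trapezoidal rule. The sketch below uses that rule as an assumption, since the slides do not say how the area is computed. It adds the end points (0, 0) and (1, 1), and `area_under_curve` is a hypothetical helper name.

```python
def area_under_curve(points):
    """Trapezoidal area under ROC points given as (false alarm rate, hit rate)."""
    # Add the curve's end points and sort by false alarm rate, then hit rate.
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# A classifier on the diagonal is no better than chance.
chance = area_under_curve([(0.5, 0.5)])

# A classifier that reaches hit rate 1 with no false alarms is ideal.
perfect = area_under_curve([(0.0, 1.0)])

# A hypothetical classifier with a single measured point.
example = area_under_curve([(0.25, 0.75)])

print(chance, perfect, example)
```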
Performance of regression networks • Mean square error? • Goodness of fit, R squared values? • Again, are the costs/benefits of errors the same across all output values? • It is useful to ‘eyeball’ the data to see whether there are some areas where the network is good and some where it is not as good, and to look at the relative costs
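For reference, the two summary measures named on this slide can be computed as below. This is a sketch using the usual definitions (mean of squared errors, and one minus the ratio of residual to total sum of squares). The function names and the data are invented for illustration.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)

def r_squared(targets, predictions):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_t = sum(targets) / len(targets)
    ss_res = sum((t - p) ** 2 for t, p in zip(targets, predictions))
    ss_tot = sum((t - mean_t) ** 2 for t in targets)
    return 1 - ss_res / ss_tot

# Hypothetical targets and network predictions.
targets     = [1.0, 2.0, 3.0, 4.0]
predictions = [1.5, 2.0, 2.5, 4.0]

mse = mean_squared_error(targets, predictions)
r2 = r_squared(targets, predictions)
print(mse, r2)
```

Both numbers average over the whole range, so they can hide regions where the network does badly; computing them separately on slices of the data is one way to back up the 'eyeball' check.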
Summary • When you measure performance, be careful what you are measuring!