CSC445: A Case Study Case-Based Reasoning
Case-Based Reasoning • Case-based reasoning • is akin to the human intuitive thinking process • makes use of analogies or cases of previous experiences when solving problems • is useful in a wide variety of software development domains • software quality estimation • software cost estimation • software design and reuse
Case-Based Reasoning (cont.) • Working hypothesis for CBR • modules with similar attributes should belong to the same quality-based group • To obtain a CBR model for a given data set, some parameters have to be assigned • e.g., nN (the number of nearest neighbors) and c • In order to obtain a preferred model, we have to vary the combinations of parameters, build the models and choose the "best one" manually
Case-Based Reasoning (cont.) • A CBR system comprises three major components: • a case library • a similarity function • a solution algorithm • In a CBR system, program modules from previously developed systems are stored in a case library
Case-Based Reasoning (cont.) • A similarity function measures the distance between the current case and all the cases in the case library. • Modules with the smallest distances from the module under investigation are considered similar and designated as the nearest neighbors. • Many similarity functions can be used, such as • city block, Euclidean & Mahalanobis
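The nearest-neighbor lookup described on this slide can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: the function names, the `case_library` layout (one list of metric values per module) and the sample values are all assumptions made for the example.

```python
import math

def city_block(x, c):
    """City block (Manhattan) distance between two metric vectors."""
    return sum(abs(a - b) for a, b in zip(x, c))

def euclidean(x, c):
    """Euclidean distance between two metric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, c)))

def nearest_neighbors(x, case_library, n_n, distance=euclidean):
    """Return the indices of the n_n library cases closest to x."""
    ranked = sorted(range(len(case_library)),
                    key=lambda j: distance(x, case_library[j]))
    return ranked[:n_n]

# Hypothetical case library: each row holds one module's metrics (made-up values).
case_library = [[10, 2], [50, 9], [12, 3], [48, 7]]
current_case = [11, 2]
print(nearest_neighbors(current_case, case_library, n_n=2))  # prints [0, 2]
```

In practice the metrics are often standardized first so that no single metric dominates the distance; that step is left out here.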
Case-Based Reasoning (cont.) • Mahalanobis distance: d_ij = (x_i − c_j)′ S^(−1) (x_i − c_j), where • x_i stands for the current case • c_j is the jth case in the case library • the prime (′) implies a transpose • S is the variance-covariance matrix of the independent variables over the entire case library
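A small sketch of the Mahalanobis computation, assuming the quadratic form (x − c)′ S⁻¹ (x − c) with no square root, which gives the same neighbor ranking as the rooted version. The helper names and the sample library are hypothetical. Rather than inverting S explicitly, the code solves the linear system S·y = (x − c).

```python
from statistics import mean

def covariance_matrix(cases):
    """Sample variance-covariance matrix S over the columns of `cases`."""
    n, p = len(cases), len(cases[0])
    means = [mean(row[k] for row in cases) for k in range(p)]
    return [[sum((row[a] - means[a]) * (row[b] - means[b]) for row in cases) / (n - 1)
             for b in range(p)] for a in range(p)]

def solve(matrix, rhs):
    """Solve matrix * y = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in matrix[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    y = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][k] * y[k] for k in range(i + 1, n))
        y[i] = (aug[i][n] - tail) / aug[i][i]
    return y

def mahalanobis(x, c, S):
    """Quadratic form (x - c)' S^-1 (x - c); assumed to omit the square root."""
    diff = [a - b for a, b in zip(x, c)]
    y = solve(S, diff)  # y = S^-1 (x - c)
    return sum(d * v for d, v in zip(diff, y))

# Hypothetical case library with two metrics per module (made-up values).
library = [[1, 2], [2, 1], [3, 5], [4, 3]]
S = covariance_matrix(library)
print(mahalanobis([2, 2], library[0], S))
```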
Case-Based Reasoning (cont.) • A generalized data clustering classification rule is used as the solution algorithm of the CBR system
Case-Based Reasoning (cont.) • In the context of a two-group classification model, two types of misclassifications can occur: • Type I: a not-fault-prone (nfp) module classified as fault-prone (fp) • Type II: an fp module classified as nfp
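As a rough illustration, the two error rates can be computed from actual and predicted labels as below. The sketch assumes the usual definitions (Type I rate = misclassified nfp modules divided by all nfp modules; Type II rate = misclassified fp modules divided by all fp modules); the function name and label strings are made up for the example.

```python
def error_rates(actual, predicted):
    """Return (type_i_rate, type_ii_rate) for 'nfp' / 'fp' labels."""
    pairs = list(zip(actual, predicted))
    nfp_total = sum(1 for a, _ in pairs if a == "nfp")
    fp_total = sum(1 for a, _ in pairs if a == "fp")
    # Type I: actually nfp, predicted fp. Type II: actually fp, predicted nfp.
    type_i = sum(1 for a, p in pairs if a == "nfp" and p == "fp")
    type_ii = sum(1 for a, p in pairs if a == "fp" and p == "nfp")
    return type_i / nfp_total, type_ii / fp_total

# Made-up labels for six modules.
actual = ["nfp", "nfp", "nfp", "nfp", "fp", "fp"]
predicted = ["nfp", "fp", "nfp", "nfp", "nfp", "fp"]
print(error_rates(actual, predicted))  # prints (0.25, 0.5)
```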
Case-Based Reasoning (cont.) An Example: • For a given nN, an inverse relationship between the Type I and Type II error rates is observed when varying the value of c • The preferred balance is that the two error rates are approximately equal, with the Type II error rate being as low as possible. Preferred balance: c = 0.95, Type I = 23.16%, Type II = 23.14%
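One way to mechanize this selection strategy is sketched below. The `tolerance` threshold for "approximately equal" is an assumption, not something the slides define, and only the c = 0.95 row comes from the slide; the other two rows are made-up values added to show the inverse relationship.

```python
def preferred_model(candidates, tolerance=2.0):
    """Pick the balanced model with the lowest Type II error rate.

    A model counts as balanced when its two error rates differ by at most
    `tolerance` percentage points (an assumed threshold).
    """
    def gap(m):
        return abs(m["type_i"] - m["type_ii"])

    balanced = [m for m in candidates if gap(m) <= tolerance]
    if not balanced:
        # No balanced model: fall back to the smallest gap between the rates.
        return min(candidates, key=gap)
    return min(balanced, key=lambda m: m["type_ii"])

candidates = [
    {"c": 0.80, "type_i": 30.10, "type_ii": 17.50},  # made-up values
    {"c": 0.95, "type_i": 23.16, "type_ii": 23.14},  # values from the slide
    {"c": 1.10, "type_i": 18.20, "type_ii": 29.90},  # made-up values
]
print(preferred_model(candidates)["c"])  # prints 0.95
```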
Case-Based Reasoning (cont.) 1. Create a new project 2. Choose the fit data set. Cross-validation: In K-fold cross-validation, the original sample is partitioned into K subsamples. Of the K subsamples, a single subsample is retained as the validation data for testing the model, and the remaining K − 1 subsamples are used as training data. The cross-validation process is then repeated K times (the folds), with each of the K subsamples used exactly once as the validation data. The K results from the folds can then be averaged (or otherwise combined) to produce a single estimate.
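The K-fold procedure can be sketched in a few lines. This is an illustrative version only: it assigns samples to folds by interleaving indices rather than shuffling, and `evaluate` is a hypothetical callback that builds a model on the training indices and returns a score for the validation indices.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists; each index is validated exactly once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for held_out in range(k):
        validation = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, validation

def cross_validate(n_samples, k, evaluate):
    """Average the K per-fold results into a single estimate."""
    scores = [evaluate(train, validation)
              for train, validation in k_fold_indices(n_samples, k)]
    return sum(scores) / k

# Toy evaluator (an assumption for the demo): score = validation fold size.
print(cross_validate(10, 5, lambda train, validation: len(validation)))  # prints 2.0
```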
Case-Based Reasoning (cont.) 3. Select the metrics (independent variables) and dependent variable. 4. Choose the model with CBR
Case-Based Reasoning (cont.) 5. Create a new experiment 6. Choose the similarity function. Note: If you choose the Mahalanobis distance, use the “pooled covariance” 7. Model type should be “classification”
Case-Based Reasoning (cont.) Press the “Execute” button to run the program.
Case-Based Reasoning (cont.) The results will be displayed in this box. 8. Choose the preferred model based on the model selection strategy.
Case-Based Reasoning (cont.) Which one is the preferred model?
Case-Based Reasoning (cont.) Which one is the preferred model? c = 0.6 and nN = 14, or c = 0.6 and nN = 15. Type I error rate = 27.551%, Type II error rate = 28.571%
Case-Based Reasoning (cont.) 9. Once you choose the preferred model, record the parameters you used, for example c = 0.6 and nN = 15. 10. Then apply the selected model (the selected parameters) to the test data set.
Case-Based Reasoning (cont.) This is the prediction result on the test data set
Case-Based Reasoning (cont.) 12. Calculate the expected cost of misclassification (ECM). For example: ECM = (15×1 + 14×5)/94 = 0.904255
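The arithmetic in this example can be checked with a short function. The sketch reads the slide's numbers as 15 Type I misclassifications at a cost of 1 each, 14 Type II misclassifications at a cost of 5 each, and 94 modules in the test set; that reading, and the parameter names, are assumptions.

```python
def expected_cost_of_misclassification(n_type_i, n_type_ii, n_modules,
                                       cost_i=1.0, cost_ii=5.0):
    """ECM = (cost_i * n_type_i + cost_ii * n_type_ii) / n_modules."""
    return (cost_i * n_type_i + cost_ii * n_type_ii) / n_modules

ecm = expected_cost_of_misclassification(15, 14, 94)
print(round(ecm, 6))  # prints 0.904255
```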
Case-Based Reasoning (cont.) In terms of Type I and Type II error rates