R. Weber, College of Information Science & Technology, Drexel University • Identifying Critical Factors in Case-Based Prediction
Outline • Case-Based Prediction, Critical Factors • Motivation • Background: Use of Domain Knowledge • Methods to Identify Critical Factors: Gradient descent, Logistic regression, Feature-oriented, Case-based, Knowledge-based, Union • Comparative Study: Dataset, Methodology, Results • Conclusions • Future Work
Case-Based Prediction • The predicted outcome can be: • Irreversible • Path of natural disasters, e.g., hurricanes, tornadoes • Reversible • Outcome, effort, and cost of an ongoing project; health conditions • Critical Factors: • features (feature-value pairs) that support the outcome • significant changes in their values can potentially reverse the prediction, either alone or in conjunction with changes in the values of other critical factors • Critical Success and Critical Failure Factors
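The idea that a critical factor may reverse a prediction alone or only in conjunction with others can be made concrete with a short Python sketch. Everything here is an illustrative assumption: `toy_predict` is a hypothetical rule standing in for a real case-based predictor, the feature names are invented, and `reversing_sets` simply enumerates small subsets of candidate features, swaps in alternative values, and keeps the minimal subsets that change the predicted outcome.

```python
from itertools import combinations


def reversing_sets(case, predict, alternatives, max_size=2):
    """Find minimal sets of features whose changed values reverse predict(case).

    case: dict mapping feature -> current value
    predict: callable taking a case dict and returning an outcome label
    alternatives: dict mapping candidate feature -> value to try instead (hypothetical)
    max_size: largest conjunction of features to try
    """
    original = predict(case)
    found = []
    for size in range(1, max_size + 1):
        for subset in combinations(sorted(alternatives), size):
            # Skip supersets of a set that already reverses the outcome, so results stay minimal.
            if any(set(prev) <= set(subset) for prev in found):
                continue
            changed = dict(case)
            for feature in subset:
                changed[feature] = alternatives[feature]
            if predict(changed) != original:
                found.append(subset)
    return found


# Hypothetical stand-in for a case-based predictor:
# success only if all three factors are favorable.
def toy_predict(case):
    favorable = ("scope_defined", "user_time", "sponsor")
    return "success" if all(case[f] == "yes" for f in favorable) else "failure"


case = {"scope_defined": "no", "user_time": "no", "sponsor": "yes"}
alternatives = {"scope_defined": "yes", "user_time": "yes"}

print(reversing_sets(case, toy_predict, alternatives))  # [('scope_defined', 'user_time')]
```

In this toy case neither factor reverses the failure prediction alone; only their conjunction does. Exhaustive enumeration grows combinatorially with the number of features, which is why `max_size` keeps the search small.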
Motivation • Assumption: • Users are interested in predictions of reversible outcomes so that they can reverse unwanted predictions • Health conditions, project/system failure • Aamodt and Nygaard (1995): • Consider the entire application context (including the user's perspective) to maximize the usefulness of CBR systems • Motivation: • Case-based prediction systems that do not indicate effective and efficient ways to reverse unwanted outcomes do not take into account the user's perspective. • Find a minimal set of critical factors that maximizes the chances of reversing unwanted outcomes
Background on Case-Based Prediction • ICCBR 2001: Kadoda et al. stated that design decisions depend on the dataset • FLAIRS 2002: Watson et al. evaluated different design decisions because of such bias • CBRW 1991: Cain, Pazzani, and Silverstein proposed EBL+CBR to improve the accuracy of case-based prediction when features outnumber cases • ICCBR 2003: Weber et al. confirmed the accuracy improvement (under scarce data and bias) over other CBR techniques and logistic regression
Methods to Identify Critical Factors: Scope • Personalized • Methods that identify failure and success factors specific to the case under assessment and to its actual values • Collective • Methods that identify only the features, not their values • They provide trends based upon a community of cases. When this community consists of real-world experiences, these trends represent evidence of the importance of the factors
Collective Methods • Gradient descent • Critical factors are those features whose learned importance values (weights) are above the overall average. • Logistic regression • Critical factors are the features most strongly correlated with the outcome; these features are then used for prediction • Feature-oriented • Using leave-one-out cross-validation (LOOCV), submit each project description for prediction and record the resulting accuracy; then submit each feature separately, and select as success factors the features whose accuracy is closest to the overall true-positive accuracy, and as failure factors those whose accuracy is closest to the overall true-negative accuracy
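Two of the collective rules can be sketched in Python. This is one possible reading, not the study's implementation: the gradient-descent training itself is not shown (the learned weights are assumed as input), the LOOCV predictor is assumed to be 1-nearest-neighbor with overlap similarity on symbolic features, `k` (how many "closest" features to keep) is a hypothetical parameter, and the data are invented.

```python
from statistics import mean


def above_average(weights):
    """Gradient-descent rule: keep features whose learned weight exceeds the mean weight."""
    avg = mean(weights.values())
    return sorted(f for f, w in weights.items() if w > avg)


def loocv_class_accuracy(cases, labels, features):
    """Leave-one-out 1-NN using only `features`; returns accuracy per true class label."""
    hits, totals = {}, {}
    for i, target in enumerate(cases):
        best_j, best_sim = None, -1
        for j, other in enumerate(cases):
            if j == i:
                continue
            # Overlap similarity: number of matching symbolic values (assumed metric).
            sim = sum(target[f] == other[f] for f in features)
            if sim > best_sim:  # ties keep the earliest case
                best_j, best_sim = j, sim
        label = labels[i]
        totals[label] = totals.get(label, 0) + 1
        hits[label] = hits.get(label, 0) + (labels[best_j] == label)
    return {label: hits[label] / totals[label] for label in totals}


def feature_oriented(cases, labels, k=1):
    """Return (success_factors, failure_factors): the k features whose single-feature
    accuracy is closest to the overall true-positive / true-negative accuracy."""
    features = sorted(cases[0])
    overall = loocv_class_accuracy(cases, labels, features)
    single = {f: loocv_class_accuracy(cases, labels, [f]) for f in features}

    def closest(label):
        return sorted(features, key=lambda f: abs(single[f][label] - overall[label]))[:k]

    return closest("success"), closest("failure")


# Hypothetical toy data: two symbolic features, four projects.
cases = [
    {"scope": "yes", "tools": "yes"},
    {"scope": "yes", "tools": "no"},
    {"scope": "no", "tools": "yes"},
    {"scope": "no", "tools": "no"},
]
labels = ["success", "success", "failure", "failure"]

print(above_average({"scope": 0.9, "users": 0.7, "tools": 0.2}))  # ['scope', 'users']
print(feature_oriented(cases, labels))  # (['scope'], ['tools'])
```

The failure-factor pick here is driven purely by the toy numbers: `tools` is selected because its single-feature accuracy (0.0) matches the overall true-negative accuracy, which shows how literal closeness can select uninformative features on very small datasets.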
Personalized Methods • Case-based • Failure factors are feature-value pairs that co-occur in the target case and in the similar case(s) used to predict failure for the target • Knowledge-based • Submit the new case to the EBL method to obtain relevance factors along with the resulting prediction • When failure is predicted, the feature-value pairs assigned relevance factors are critical failure factors • For the remaining features, we replaced the predicted outcome with the alternate outcome to assign relevance factors for that outcome • Union • We combined the knowledge-based and case-based methods by taking the union of the factors each identifies individually.
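The case-based and union methods reduce to set operations on feature-value pairs. In this Python sketch the feature names and cases are hypothetical, the requirement that a pair appear in *every* retrieved failure case is an assumption (the slide allows one or more similar cases), and the knowledge-based factors are a hard-coded placeholder because the EBL rules are not given here.

```python
def case_based_factors(target, retrieved, failure="failure"):
    """Feature-value pairs the target shares with the retrieved cases that predicted failure.

    target: dict feature -> value
    retrieved: list of (case_dict, outcome) pairs returned by similarity retrieval
    Assumption: a pair must appear in every failed neighbor; a looser reading would accept any.
    """
    failed = [case for case, outcome in retrieved if outcome == failure]
    if not failed:
        return set()
    shared = set(target.items())
    for case in failed:
        shared &= set(case.items())
    return shared


def union_factors(case_based, knowledge_based):
    """Union method: every factor identified by either personalized method."""
    return case_based | knowledge_based


target = {"scope": "undefined", "user_time": "no", "sponsor": "yes"}
retrieved = [
    ({"scope": "undefined", "user_time": "no", "sponsor": "no"}, "failure"),
    ({"scope": "undefined", "user_time": "yes", "sponsor": "yes"}, "failure"),
    ({"scope": "defined", "user_time": "no", "sponsor": "yes"}, "success"),
]
cb = case_based_factors(target, retrieved)

# Hypothetical placeholder for the knowledge-based (EBL) output.
kb = {("user_time", "no")}

print(sorted(union_factors(cb, kb)))  # [('scope', 'undefined'), ('user_time', 'no')]
```

Retrieved success cases are ignored, so only neighbors that actually supported the failure prediction contribute factors.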
Comparative Study: Dataset • Dataset • 20 out of 88 real cases of software development projects • 23 symbolic features • 12 of the 21 projects originally failed and, when submitted to the EBL+CBR prediction, were predicted to fail.
Comparative Study: Methodology • Methodology consists of 3 stages: • 1) Identification of critical factors • 2) Overturn • 3) Prediction
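The three stages can be sketched in Python for a single identification method. Assumptions: each project is a dict of symbolic features already predicted to fail, "overturn" means replacing each identified factor's value with a hypothetical favorable value, and a method is scored by the fraction of projects reversed together with the mean number of factors changed. `toy_predict`, the feature names, and both identification functions are hypothetical.

```python
def evaluate_method(projects, identify, favorable, predict):
    """Run identification, overturn, and prediction for projects predicted to fail.

    Returns (fraction of projects reversed, mean number of factors changed).
    """
    reversed_count, changed_total = 0, 0
    for project in projects:
        factors = identify(project)            # stage 1: identify critical factors
        overturned = dict(project)
        for f in factors:                       # stage 2: overturn their values
            overturned[f] = favorable[f]
        changed_total += len(factors)
        if predict(overturned) != "failure":    # stage 3: predict again
            reversed_count += 1
    return reversed_count / len(projects), changed_total / len(projects)


# Hypothetical predictor: success requires a defined scope and user time.
def toy_predict(project):
    ok = project["scope"] == "defined" and project["user_time"] == "yes"
    return "success" if ok else "failure"


favorable = {"scope": "defined", "user_time": "yes", "tools": "new"}
projects = [
    {"scope": "undefined", "user_time": "yes", "tools": "old"},
    {"scope": "undefined", "user_time": "no", "tools": "old"},
]


def only_scope(project):
    return ["scope"]


def all_mismatches(project):
    return [f for f in favorable if project[f] != favorable[f]]


print(evaluate_method(projects, only_scope, favorable, toy_predict))      # (0.5, 1.0)
print(evaluate_method(projects, all_mismatches, favorable, toy_predict))  # (1.0, 2.5)
```

The two toy methods illustrate the trade-off between reversal and efficiency: changing every mismatched factor reverses more predictions, but only by changing a larger set of factors.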
Results for Collective Methods • GD maximizes reversal but does not minimize the set of factors • Feature-oriented is the most efficient • The methods currently in use performed the worst
Results for Personalized Methods
Results for Knowledge-Based Overturning
Knowledge-Based Overturning • Personalized • Different methods reverse a project's prediction using different sets of factors, and one method reversed a prediction in a way that contradicts domain knowledge. • Collective • GD failed to reverse one project, and with knowledge-based overturning it still could not reverse that project. More interestingly, with knowledge-based overturning some previously reversed projects were no longer reversed.
Conclusions and Recommendations • Domain-specific conclusion • 2 factors were identified by all methods: • a well-defined scope • end users having time for requirements gathering • Domain knowledge combined with contextual experiential knowledge may uncover new knowledge • Define the level of reversibility of factors, e.g., using measures of the efficiency of factors across the dataset and by project. Factors that are easy to reverse should receive priority.
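One possible reading of the reversibility recommendation, sketched in Python under explicit assumptions: a factor's efficiency is taken to be the fraction of overturn attempts involving it that reversed the prediction, each factor gets a hypothetical ease-of-change score in [0, 1], and priority is their product. Neither measure is defined in the slides, and the outcome data below are invented.

```python
from collections import defaultdict


def factor_efficiency(results):
    """Fraction of overturn attempts involving each factor that reversed the prediction.

    results: list of (factors_changed, was_reversed) pairs, e.g. one per project.
    """
    changed = defaultdict(int)
    reversed_ = defaultdict(int)
    for factors, was_reversed in results:
        for f in factors:
            changed[f] += 1
            reversed_[f] += int(was_reversed)
    return {f: reversed_[f] / changed[f] for f in changed}


def prioritize(efficiency, ease):
    """Rank factors by efficiency x ease of change (both assumed to lie in [0, 1])."""
    return sorted(efficiency, key=lambda f: (-efficiency[f] * ease.get(f, 0.0), f))


# Hypothetical overturn outcomes and ease scores, for illustration only.
results = [
    ({"scope"}, True),
    ({"scope", "user_time"}, True),
    ({"user_time"}, False),
    ({"tools"}, False),
]
ease = {"scope": 0.2, "user_time": 0.9, "tools": 1.0}

eff = factor_efficiency(results)
print(prioritize(eff, ease))  # ['user_time', 'scope', 'tools']
```

Here `user_time` outranks the more efficient `scope` because it is assumed to be much easier to change, which is the kind of priority the recommendation describes.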
Future Work • Case-based framework to learn: • Weights for EBL rules • Dependencies between rules • Dependencies between factors • How to use contextual knowledge embedded in cases to reverse unwanted outcomes? • Use collective methods to identify critical factors and then use cases to assess their potential to reverse unwanted outcomes
Acknowledgements • Co-authors • William Evanco, Michael Waller, June Verner • Colleagues • This and previous work • Anonymous reviewers • National Institute for Systems Test and Productivity