CBR for Modeling Complex Systems
Rosina Weber, Jason M. Proctor, Ilya Waldstein, College of Information Science & Technology
Andres Kriete, School of Biomedical Engineering, Science and Health System, Coriell Institute for Medical Research
In a Nutshell
• Some systems are too complex to be used directly in reasoning tasks
  • E.g., biological systems, large organizations, ecosystems
  • They are large, hard to access, and difficult to understand, and they have hidden interactions
• The alternative is to use models to represent these systems
• Models can be built when there is knowledge or data about the system
• In the absence of both, we propose using CBR to recommend a model for reuse
Rosina Weber, ICCBR05 Chicago, Il Aug 26 2005
Open research questions
• How can CBR manipulate complex systems and models?
• Can CBR recommend accurate models for reuse?
Manipulating Complex Systems with CBR
[Figure: case problems are complex systems (Complex system1 … Complex systemn) and case solutions are their models (Model1 … Modeln); an unknown complex system (Complex systemn+1) needs a model (Modeln+1).]
CBR for Modeling Complex Systems
• Does this work in CBR?
[Figure: each case pairs a case problem (Complex system1 … Complex systemn) with a case solution (Model1 … Modeln) and a case outcome, the Estimated Measure of Certainty (EMC1 … EMCn) of the model.]
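The case structure on this slide (problem = complex system, solution = model, outcome = Estimated Measure of Certainty) can be sketched as a small data type. This is a minimal sketch under stated assumptions: the class name `Case`, the field names, the numeric feature dictionaries, and the tie-break on the outcome are hypothetical illustrations rather than the authors' representation, and the similarity function is left as a parameter because the slides have not yet defined one.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = Dict[str, float]

@dataclass
class Case:
    problem: Features   # hypothetical numeric description of the complex system
    solution: Features  # hypothetical numeric description of the model that represents it
    emc: float          # case outcome: Estimated Measure of Certainty (assumed in [0, 1])

def recommend_model(case_base: List[Case], target: Features,
                    sim: Callable[[Features, Features], float]) -> Features:
    """Return the model (solution) of the case whose problem is most similar to the target.

    Ties in problem similarity are broken by the higher EMC (an assumption, not from the slides).
    """
    best = max(case_base, key=lambda c: (sim(c.problem, target), c.emc))
    return best.solution
```

Any similarity function can be plugged in, for example a negated Manhattan distance over the problem features.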
Challenges
• What makes one system similar to another?
• How can models be compared?
• How can we find similar solutions for similar problems?
Approach: Assumptions (i)
• 1st assumption: Two solutions are similar if they have similar features in a chosen representation.
[Figure: Problemi and Problemj with their solutions Solutioni and Solutionj and outcomes Outcome1 and Outcome2]
Approach: Assumptions (ii)
• 2nd assumption: Two problems are similar if they are solved by solutions that are considered similar.
[Figure: Problemi and Problemj linked through Solutioni and Solutionj]
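Taken together, the two assumptions turn solution similarity into a label for problem similarity. The sketch below assumes solution similarity has already been reduced to one cluster id per case (as the approach slide proposes) and derives the pairs of problems that the 2nd assumption treats as similar; the function name and the dictionary input are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, Set, Tuple

def similar_problem_pairs(solution_cluster: Dict[str, Hashable]) -> Set[Tuple[str, str]]:
    """Return pairs of case ids whose solutions share a cluster.

    By the 2nd assumption, the problems of these cases are treated as similar.
    """
    pairs = set()
    for a, b in combinations(sorted(solution_cluster), 2):
        if solution_cluster[a] == solution_cluster[b]:
            pairs.add((a, b))
    return pairs
```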
Approach
• 1st step: Identify similar solutions
  • Cluster the existing problem-solution pairs (cases) based on the features of their solutions
• 2nd step: Identify the problem features that support the clustering
  • Determine how much each problem feature participates in each cluster, and eliminate less relevant features
• 3rd step: Define a similarity measure for all cases
  • Use the results of step 2 to assess the similarity between problems
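The slides do not fix the algorithms at this point (the methodology slides later use cluster analysis and stepwise discriminant analysis). As a minimal stand-in, the sketch below uses plain k-means for step 1, the correlation ratio (between-cluster variance over total variance) of each problem feature as its relevance weight for step 2, and a weighted Euclidean distance mapped to (0, 1] as the similarity of step 3. All function names are hypothetical, and features are assumed numeric and pre-scaled.

```python
import random
from statistics import mean
from typing import List, Sequence

Vector = Sequence[float]

def kmeans(points: List[Vector], k: int, iters: int = 50, seed: int = 0) -> List[int]:
    """Step 1 (stand-in): cluster solution-feature vectors; returns one label per case."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centers[c])))
                  for p in points]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [mean(col) for col in zip(*members)]
    return labels

def feature_weights(problems: List[Vector], labels: List[int]) -> List[float]:
    """Step 2 (stand-in): correlation ratio of each problem feature w.r.t. the clusters.

    A weight near 1 means the feature separates the solution clusters well;
    a weight of 0 means it carries no cluster information and can be dropped.
    """
    weights = []
    for j in range(len(problems[0])):
        col = [p[j] for p in problems]
        grand = mean(col)
        groups = {}
        for value, lab in zip(col, labels):
            groups.setdefault(lab, []).append(value)
        between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
        total = sum((v - grand) ** 2 for v in col)
        weights.append(between / total if total > 0 else 0.0)
    return weights

def similarity(p: Vector, q: Vector, weights: List[float]) -> float:
    """Step 3 (stand-in): weighted Euclidean distance mapped to (0, 1]."""
    d = sum(w * (x - y) ** 2 for w, x, y in zip(weights, p, q)) ** 0.5
    return 1.0 / (1.0 + d)
```

In use, the labels from `kmeans` over the solution features feed `feature_weights` over the problem features, and those weights let `similarity` compare a new problem with each stored one.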
Open research questions
• How can CBR manipulate complex systems and models?
• Can CBR recommend accurate models for reuse?
Validation: Dataset
• Requirements: complex systems, models to represent them, and verification of the models' quality
• Domain chosen: software systems
CI-Tool (baseline approach)
• Gives no indication of what makes one software program similar to another for the purposes of input-output analysis
Data Set
• Twenty-one (21) software programs described through 23 features
• Problem features
  • Parameters, e.g., # of inputs
• Solution features
  • ANN (artificial neural network) configuration parameter values
  • Dataset used for training
• Outcome feature
  • Success rate of the ANN
Validation: Hypothesis, Metrics
• Hypothesis
  • Our approach can recommend models that are as accurate as those recommended by the baseline approach
• Metric: accuracy
  • Average accuracy of the models recommended by our CBR approach, compared with that of the baseline approach
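The metric amounts to averaging, over the same software programs, the accuracy of the model each approach recommends. A minimal sketch, assuming one accuracy value per program for each approach (the function name and argument layout are hypothetical); it reports only the averages and their difference, not the statistical test behind the results.

```python
from statistics import mean
from typing import Sequence, Tuple

def compare_average_accuracy(cbr: Sequence[float],
                             baseline: Sequence[float]) -> Tuple[float, float, float]:
    """Return (CBR mean, baseline mean, CBR minus baseline) over paired programs."""
    if len(cbr) != len(baseline) or not cbr:
        raise ValueError("expected one accuracy per program for both approaches")
    m_cbr, m_base = mean(cbr), mean(baseline)
    return m_cbr, m_base, m_cbr - m_base
```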
Methodology: LOOCV (leave-one-out cross-validation)
• 1st step: Cluster analysis
[Figure: cases, each with a problem part Pi and a solution part Si, clustered on their solution features]
Methodology: LOOCV
• 2nd step: Stepwise discriminant analysis
  • Discriminant functions that map problem features into the cluster space
[Figure: cases (Pi, Si) grouped by solution cluster]
Methodology: LOOCV
• 3rd step: Apply the discriminant functions to assess similarity between cases
[Figure: a target problem TPi compared against the stored cases (Pi, Si)]
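The three LOOCV steps can be wired together roughly as follows. This is a simplified sketch, not the authors' procedure: it substitutes a nearest-centroid rule in problem space for the stepwise discriminant functions, assumes each case's solution-cluster label is already known (a faithful LOOCV would redo steps 1 and 2 on the remaining cases in every fold), and takes a hypothetical `evaluate` callback that scores the recommended model on the held-out program, for instance the accuracy of the recommended ANN configuration.

```python
from statistics import mean
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

Vector = Sequence[float]
Case = Tuple[Vector, Hashable, int]  # (problem features, solution, solution-cluster label)

def centroid(vectors: List[Vector]) -> List[float]:
    return [mean(col) for col in zip(*vectors)]

def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))

def recommend(train: List[Case], target: Vector) -> Hashable:
    """Nearest-centroid stand-in for the discriminant functions, then nearest case."""
    clusters: Dict[int, List[Case]] = {}
    for case in train:
        clusters.setdefault(case[2], []).append(case)
    # Map the target problem to the cluster whose problem-space centroid is closest.
    best = min(clusters, key=lambda lab: sq_dist(centroid([c[0] for c in clusters[lab]]), target))
    # Within that cluster, reuse the solution of the closest stored problem.
    return min(clusters[best], key=lambda c: sq_dist(c[0], target))[1]

def loocv(cases: List[Case], evaluate: Callable[[Hashable, Case], float]) -> List[float]:
    """Hold out each case once, recommend from the others, and score the recommendation."""
    scores = []
    for i, held_out in enumerate(cases):
        train = cases[:i] + cases[i + 1:]
        scores.append(evaluate(recommend(train, held_out[0]), held_out))
    return scores
```

Averaging the returned scores gives the average accuracy used as the metric.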
Results
• 71.4% of the results support our hypothesis
  • 61.9% show no statistically significant difference
  • 9.5% are significantly higher than the baseline
• CBR can recommend accurate models for reuse in the absence of an alternative
• CBR may also be considered for finding highly suitable models
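The slides do not state the counts behind these percentages. Assuming each percentage is a share of the 21 leave-one-out runs (one per software program), they correspond to 13 programs with no statistically significant difference and 2 with significantly higher accuracy, 15 in total; the sketch below only checks that arithmetic, and the helper name is hypothetical.

```python
def share(count: int, total: int = 21) -> str:
    """Format a count of leave-one-out runs as a percentage with one decimal."""
    return f"{count / total:.1%}"

no_difference = 13  # inferred count, assuming 21 runs
higher = 2          # inferred count, assuming 21 runs
print(share(no_difference), share(higher), share(no_difference + higher))
# prints: 61.9% 9.5% 71.4%
```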
Performing tasks with gene expression data
• Biological systems can be described through gene expression data
• Gene expression can be measured with microarrays
• Microarrays reveal how genes "behave"
• Reasoning tasks: the case solution includes both the model and the task solution (e.g., a diagnosis or a prescription)
[Figure: case problems are biological systems (Biological system1 … Biological systemn); case solutions pair a model with a task solution such as Diagnosis1 or Prescription1; case outcomes are Estimated Measures of Certainty (EMC1 … EMCn).]
Example
• A study is represented in one case
• The model is built with the study's data and diagnosis
• The EMC is determined with the study's statistics
• Task: diagnose a new target individual using this case base
  • No gene expression (GE) data is available for brain cells
  • Retrieval uses the information and data that are available
  • The system recommends models
[Figure: each case problem is an individual described by demographics and gene expression data (Individual1, Demographics1, GE[x1y1] … Demographicsn, GE[xnyn]); case solutions pair a model with a diagnosis or prescription; case outcomes are EMC1 … EMCn; the target individual is described by demographics and GE[yn].]
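Retrieval "uses the information and data available": when a feature such as gene expression for brain cells is missing for the target, it cannot contribute to similarity. One simple way to express that, assuming features are numeric and pre-scaled to [0, 1] and that a missing measurement is stored as `None`, is to average per-feature similarity over only the features known for both the target and the case. The function names and feature keys are hypothetical; the slides do not specify the similarity measure used here.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]

def partial_similarity(target: Record, case: Record) -> float:
    """Mean of 1 - |difference| over features present (not None) in both records."""
    shared = [k for k, v in target.items() if v is not None and case.get(k) is not None]
    if not shared:
        return 0.0  # nothing to compare on
    return sum(1.0 - abs(target[k] - case[k]) for k in shared) / len(shared)

def rank_cases(target: Record, case_base: List[Tuple[str, Record]]) -> List[str]:
    """Return case ids ordered from most to least similar to the target."""
    ranked = sorted(case_base, key=lambda item: partial_similarity(target, item[1]), reverse=True)
    return [case_id for case_id, _ in ranked]
```

Because the denominator counts only shared features, a case is not penalised for measurements the target lacks; whether to down-weight cases with little overlap instead is a design choice the slides leave open.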
Conclusion
• As more studies are conducted, more cases are created
• The certainty of the diagnosis therefore has the potential to increase
• Incorporating analogy through CBR can increase understanding of the domain
Future Work
• Develop and test reuse methods
• Test other models (e.g., SVM, IFN)
• Develop methods for determining the EMC
• Apply the approach to biological and environmental problems
Thank you! Any questions?