
Verification and Validation of Simulation Models in Bioinformatics Computing

This presentation covers two topics. The first is verification and validation of agent-based and equation-based simulation models, with case studies on a Natural Organic Matter (NOM) evolution model and an economic model, aimed at finding the more cost-effective techniques. The second is bioinformatics computing: identifying transposable elements in the Aedes aegypti genome.


Presentation Transcript


  1. Verification and Validation of Agent-based and Equation-based Simulations and Bioinformatics Computing: Identifying Transposable Elements in the Aedes aegypti Genome • Ryan C. Kennedy • Department of Computer Science and Engineering • University of Notre Dame

  2. Verification and Validation of Agent-based and Equation-based Simulations

  3. Overview • Introduction • Motivation • Concepts of Verification and Validation • Research Objectives and Methods • Case Study I • An Agent-based Scientific Model • Case Study II • An Equation-based Economic Model • Conclusion • Future Work

  4. Motivation • NSF Blue Ribbon Panel (February 2006): “New theory and methods are needed for handling stochastic models and for developing meaningful and efficient approaches to the quantification of uncertainties. As they stand now, verification, validation, and uncertainty quantification are challenging and necessary research areas that must be actively pursued.” • Dr. Richard W. Amos • Deputy to the Commanding General, U.S. Army Aviation and Missile Command (AMCOM) • Previously the Director of the System Simulation and Development Directorate in the Aviation and Missile Research, Development and Engineering Center (AMRDEC) • Verification and Validation • 10-15% of total cost of model development, but often overlooked in overall lifecycle *Oden: “Simulation-Based Engineering Science: Revolutionizing Engineering Science through Simulation”

  5. Model Verification & Validation (V & V) • V & V • Verification: • solving the model right • Validation: • solving the right model • The cost and value of V & V influence confidence in the model • Want optimal cost-effectiveness of V & V *Adapted from Sargent: “Verification and Validation of Simulation Models”

  6. Verification and Validation Process *Adapted from Sargent: “Verification and Validation of Simulation Models” and Huang: “Agent-Based Scientific Simulation”

  7. Applicable Verification and Validation Methods *Balci: “Handbook of Simulation: Principles, Methodology, Advances, Applications, and Practice” lists more than 75 Methods

  8. V & V: Subjective Analysis • Examples of V & V Techniques • Face Validity • Animation • Graphical Representation • Turing Test • Internal Validity • Tracing • Black-Box Testing

  9. V & V: Quantitative Analysis • Examples of V & V Techniques • Docking (Model-to-Model Comparison) • Historical Data Validation • Sensitivity Analysis/Parameter Variability • Prediction Validation

  10. What and How • Research objective • Perform V & V on distinct models and identify the more cost-effective techniques • How • Two very different projects as case studies • Evaluate and adapt the formalized V & V techniques from industrial and systems engineering

  11. Case Study I: An Agent-based Scientific Model • NSF-funded interdisciplinary project • Understanding the evolution and heterogeneous structure of Natural Organic Matter (NOM) • E-science example • Chemists, biologists, ecologists, and computer scientists • Agent-based stochastic model • Web-based simulation model

  12. Case Study I: NOM • What is NOM? • Heterogeneous mixture of molecules in terrestrial and aquatic ecosystems • Why study NOM? • Plays a crucial role in the evolution of soils, the transport of pollutants, and the global carbon cycle • Understanding NOM helps us better understand natural ecosystems • Hard to study in laboratory

  13. Case Study I: The Conceptual Model I • Agents • A large number of molecules • Heterogeneous properties • Elemental composition • Molecular weight • Characteristic functional groups • Behaviors • Transport through soil pores (spatial mobility) • Chemical reactions: first order and second order • Sorption

  14. Case Study I: The Conceptual Model II • Stochastic Model • Individual behaviors and interactions are stochastically determined by: • Internal attributes • Molecular structure • State (adsorbed, desorbed, reacted, etc.) • External conditions • Environment (pH, light intensity, etc.) • Proximity to other molecules • Length of time step, Δt • Space • 2D Grid Structure • Emergent properties • Distribution of molecular properties over time
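The conceptual model on slides 13 and 14 can be illustrated with a minimal sketch. It is not the NOM simulator itself: the `Molecule` class, the rate constants `k_sorb`, `k_desorb`, and `k_move`, the wrap-around grid, and the rule "probability = rate × Δt" are all assumptions chosen only to show how agent state, a 2D grid, and the time step Δt can drive stochastic behavior.

```python
import random
from dataclasses import dataclass


@dataclass
class Molecule:
    x: int
    y: int
    weight: float            # molecular weight, hypothetical units
    adsorbed: bool = False   # agent state


def step(molecules, width, height, dt, rng,
         k_sorb=0.1, k_desorb=0.05, k_move=1.0):
    """Advance every molecule by one time step of length dt.

    Each behavior fires with probability min(1, rate * dt); the rates
    are illustrative assumptions, not values from the presentation.
    """
    for m in molecules:
        if m.adsorbed:
            # An adsorbed molecule may desorb; it does not move this step.
            if rng.random() < min(1.0, k_desorb * dt):
                m.adsorbed = False
            continue
        if rng.random() < min(1.0, k_sorb * dt):
            m.adsorbed = True
            continue
        if rng.random() < min(1.0, k_move * dt):
            dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
            m.x = (m.x + dx) % width    # wrap-around grid (assumption)
            m.y = (m.y + dy) % height


def run(n=200, width=20, height=20, steps=50, dt=0.5, seed=1):
    """Create n molecules at random cells and simulate the given steps."""
    rng = random.Random(seed)
    molecules = [
        Molecule(rng.randrange(width), rng.randrange(height),
                 rng.uniform(100.0, 2000.0))
        for _ in range(n)
    ]
    for _ in range(steps):
        step(molecules, width, height, dt, rng)
    return molecules


def adsorbed_fraction(molecules):
    """An emergent, population-level property of the final state."""
    return sum(m.adsorbed for m in molecules) / len(molecules)
```

Seeding the generator makes a run repeatable, which is what lets the same stochastic model be rerun and compared during internal validity checks.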

  15. Case Study I: Implementations

  16. Case Study I: Face Validity

  17. Case Study I: Internal Validity I

  18. Case Study I: Internal Validity II

  19. Case Study I: Docking I • Compare the model with a validated one • Compare the model with a non-validated one • Different implementations • Different programming languages • Different packages • Different modeling approaches • Agent-based approach vs. equation-based approach • Powerful method
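Docking compares the outputs of two models or implementations. One way to make that comparison quantitative is a two-sample Kolmogorov-Smirnov statistic, the largest gap between the two empirical distribution functions. The slides do not say which statistic was used, so the sketch below is an illustrative choice, and `ks_statistic` is a hypothetical helper name.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical
    cumulative distribution functions of the two samples:
    0.0 means identical distributions, 1.0 means no overlap.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    # The largest gap between two step functions occurs at a data point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

In a docking exercise the two samples would be the same output variable (for example, final molecular weights) collected from each implementation over repeated runs. A small statistic suggests the implementations agree in distribution, while a large one points to a discrepancy worth tracing.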

  20. Case Study I: Docking II

  21. Case Study I: Docking III

  22. Case Study I: Docking IV

  23. Case Study I: Docking V

  24. Case Study II: An Economic Model • Interdisciplinary project • Initially written in Matlab within Department of Finance • Converted to C++ by Computer Scientists • Equation-based system • Concerned with identifying ideal economic variables, such as debt, money growth, and tax rate

  25. Case Study II: The Conceptual Model • Equation-based system • Nonlinear projection methods used to solve Ramsey problems in a stochastic money economy • Goal is to generate the best social welfare for a given economy • Motivation

  26. Case Study II: Face Verification

  27. Case Study II: Tracing • Matlab: it 44, af 3.7496e-08, rc 0, timer 11.1, l 0.1382704496, m -0.0092286139, t 0.1881024991, h 0.3093668925 cc1 0.4861695543, cc2 0.6212795130, rl 1.0092221442 it 45, af 2.64653e-08, rc 0, timer 11.0, l 0.1382704643, m -0.0092286175, t 0.1881024947, h 0.3093668931 cc1 0.4861695553, cc2 0.6212795120, rl 1.0092221442 • C++: it: 44 af: 0.00144839 rc: 0 l: 0.138359 m: -0.00936025 t: 0.188252 h: 0.309338 cc1: 0.486205 cc2: 0.621244 rl: -0.65888 it: 45 af: 0.00144784 rc: 0 l: 0.138401 m: -0.00937062 t: 0.188239 h: 0.30934 cc1: 0.486208 cc2: 0.621241 rl: -0.665511
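Comparing traces like these by eye gets tedious, so a small script can flag the variables that disagree. The sketch below parses one iteration from each trace and reports the names whose relative difference exceeds a tolerance. The function names, the regular expression, and the 5% tolerance are assumptions for illustration, not the tooling used in the project.

```python
import re

# Matches "name value" (Matlab trace) and "name: value" (C++ trace).
PAIR = re.compile(r"([A-Za-z]\w*):?\s+(-?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_trace(line):
    """Turn one trace line into a {variable name: float} dictionary."""
    return {name: float(value) for name, value in PAIR.findall(line)}


def mismatches(trace_a, trace_b, rel_tol=0.05):
    """Names present in both traces whose values differ by more than rel_tol."""
    flagged = []
    for name, a in trace_a.items():
        if name not in trace_b:
            continue  # e.g. "timer" appears only in the Matlab trace
        b = trace_b[name]
        scale = max(abs(a), abs(b))
        diff = 0.0 if scale == 0 else abs(a - b) / scale
        if diff > rel_tol:
            flagged.append(name)
    return flagged


# Iteration 44 from each trace on the slide.
matlab_44 = ("it 44, af 3.7496e-08, rc 0, timer 11.1, l 0.1382704496, "
             "m -0.0092286139, t 0.1881024991, h 0.3093668925 "
             "cc1 0.4861695543, cc2 0.6212795130, rl 1.0092221442")
cpp_44 = ("it: 44 af: 0.00144839 rc: 0 l: 0.138359 m: -0.00936025 "
          "t: 0.188252 h: 0.309338 cc1: 0.486205 cc2: 0.621244 rl: -0.65888")

flagged = mismatches(parse_trace(matlab_44), parse_trace(cpp_44))
print(flagged)
```

At a 5% tolerance this flags `af` and `rl` for iteration 44: the other variables agree to within about 1.5%, while `af` differs by several orders of magnitude and `rl` differs in sign.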

  28. Case Study II: Docking

  29. Case Study II: Performance

  30. Summary & Conclusion • Applied V & V techniques to distinct case studies to increase model confidence • Some techniques are more cost-effective

  31. Future Work • More in-depth survey of V & V methods • More rigorous quantitative methods • Compare simulation results against empirical data • Invalidation Testing • More general and formalized V & V process model

  32. Bioinformatics Computing: Identifying Transposable Elements in the Aedes aegypti Genome

  33. Overview • Introduction • Motivation • Basic Biological Concepts • Bioinformatics • Aedes aegypti • Transposable Elements • Approaches to Identifying Transposable Elements • Conclusion • Future Work

  34. Motivation • Bioinformatics field is rapidly growing • Computer scientists can help advance its study • A better understanding of the biology of organisms would be helpful to scientists • Transposable elements can be useful tools to scientists • Computer scientists can help biologists develop advanced techniques to find transposable elements

  35. Biological Foundations • All cells contain DNA, RNA, and protein molecules • DNA • Composed of four nucleotides • Building block of life • RNA • Carries genetic information from DNA within the cell • Protein • Laborer of the cell • Central Dogma of Molecular Biology: DNA → RNA → protein
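The flow of information in the central dogma, from DNA to RNA to protein, can be shown with a toy example. The sketch below assumes the input is the coding strand and uses a deliberately tiny codon table: the six codons listed are standard, but every other codon is reported as "X" rather than looked up. The names `CODON_TABLE`, `transcribe`, and `translate` are hypothetical.

```python
# A small subset of the standard genetic code; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",  # methionine (start)
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "UAA": "*",
    "UAG": "*",
    "UGA": "*",
}


def transcribe(dna):
    """DNA coding strand to mRNA: thymine (T) becomes uracil (U)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """mRNA to a one-letter amino acid string, stopping at a stop codon."""
    protein = []
    for i in range(0, len(rna) - 2, 3):
        amino_acid = CODON_TABLE.get(rna[i:i + 3], "X")  # X = not in table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGTTTGGCTAA")))  # prints "MFG"
```

Real pipelines use a full 64-codon table and handle reading frames on both strands. BioPerl, named on the tools slide, provides such functionality.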

  36. Bioinformatics • Collective study of numerous fields and techniques to solve biological problems • Focused on the study of DNA and its underlying characteristics • Computer science lends itself well to bioinformatics

  37. Bioinformatics Research Topics • Genome Annotation • Assigning biological meaning to regions of a sequence • Sequence Alignment • Comparing two or more sequences • Sequencing • Finding the structure of a given sequence • Genome Assembly • Assembling many short sequences of DNA

  38. Bioinformatics Tools • Perl • BioPerl • BLAST • Popular alignment tool • Hidden Markov Model • Clustal X • Phylogenetic Tree • Relationships between sequences • Bioinformatics Collaboratories • NCBI, Ensembl, VectorBase

  39. Aedes aegypti • Tropical mosquito • Vector for dengue and yellow fever viruses • Its unannotated genome was recently released • Much larger genome than that of other mosquitoes

  40. Transposable Elements • Often referred to as “jumping genes” • Can make up large portions of a genome • Can transfer genetic material • Useful when performing evolutionary studies • Typically divided into Class I, Class II, and Class III elements

  41. Transposons • Class II transposable elements • Divided into many families • piggyBac, Tc1, pogo, mariner, P element • Typical structure of a transposon:

  42. Typical Approach • BLAST known transposons against a new genome • Good for identifying known or similar transposons in new genomes • Does not account for sequence variations

  43. Approach I • Focused on identifying P elements • Utilized multiple tools and scripts • Able to identify previously unknown transposons • Clustal X and the HMMER suite allowed us to perform a more thorough search • Cannot account for frame shifts

  44. Approach II • Used for five families of transposons • Utilized GeneWise • Did not search for new transposons

  45. Hybrid Approach: A Transposable Element Discovery Methodology • Proposed approach • Utilize better aspects of first two approaches • Can be used for all families described in this study

  46. Phylogenetic Tree • mariner family • Clustered clades indicate close relationships

  47. Summary & Conclusion • Found a reasonable number of transposons • Utilized novel approaches to finding transposons • First such study using this type of approach on the Aedes aegypti genome • Proposed a hybrid approach

  48. Future Work • Utilize hybrid approach • Automate process • Comparison of transposable elements found in Aedes aegypti and Anopheles gambiae

  49. Questions or Comments?
