Explore the importance of verification and validation in agent-based and equation-based simulations, followed by a bioinformatics study on identifying transposable elements in the Aedes aegypti genome. The verification and validation work looks for cost-effective techniques across two distinct case studies: an agent-based model of NOM evolution and an equation-based economic model.
Verification and Validation of Agent-based and Equation-based Simulations and Bioinformatics Computing: Identifying Transposable Elements in the Aedes aegypti Genome • Ryan C. Kennedy • Department of Computer Science and Engineering • University of Notre Dame
Verification and Validation of Agent-based and Equation-based Simulations
Overview • Introduction • Motivation • Concepts of Verification and Validation • Research Objectives and Methods • Case Study I • An Agent-based Scientific Model • Case Study II • An Equation-based Economic Model • Conclusion • Future Work
Motivation • NSF Blue Ribbon Panel (February 2006): “New theory and methods are needed for handling stochastic models and for developing meaningful and efficient approaches to the quantification of uncertainties. As they stand now, verification, validation, and uncertainty quantification are challenging and necessary research areas that must be actively pursued.” • Dr. Richard W. Amos • Deputy to the Commanding General, U.S. Army Aviation and Missile Command (AMCOM) • Previously the Director of the System Simulation and Development Directorate in the Aviation and Missile Research, Development and Engineering Center (AMRDEC) • Verification and Validation • 10-15% of total cost of model development, but often overlooked in overall lifecycle *Oden: “Simulation-Based Engineering Science: Revolutionizing Engineering Science through Simulation”
Model Verification & Validation (V & V) • V & V • Verification: • solve model right • Validation: • solve right model • The cost and value influence confidence of model • Want optimal cost-effectiveness of V & V *Adapted from Sargent: “Verification and Validation of Simulation Models”
Verification and Validation Process *Adapted from Sargent: “Verification and Validation of Simulation Models” and Huang: “Agent-Based Scientific Simulation”
Applicable Verification and Validation Methods *Balci: “Handbook of Simulation: Principles, Methodology, Advances, Applications, and Practice” lists more than 75 Methods
V & V: Subjective Analysis • Examples of V & V Techniques • Face Validity • Animation • Graphical Representation • Turing Test • Internal Validity • Tracing • Black-Box Testing
V & V: Quantitative Analysis • Examples of V & V Techniques • Docking (Model-to-Model Comparison) • Historical Data Validation • Sensitivity Analysis/Parameter Variability • Prediction Validation
What and How • Research objective • Perform V & V on distinct models and identify the more cost-effective techniques • How • Two very different projects as case studies • Evaluate and adapt the formalized V & V techniques from industrial and systems engineering
Case Study I: An Agent-based Scientific Model • NSF-funded interdisciplinary project • Understanding the evolution and heterogeneous structure of Natural Organic Matter (NOM) • E-science example • Chemists, biologists, ecologists, and computer scientists • Agent-based stochastic model • Web-based simulation model
Case Study I: NOM • What is NOM? • Heterogeneous mixture of molecules in terrestrial and aquatic ecosystems • Why study NOM? • Plays a crucial role in the evolution of soils, the transport of pollutants, and the global carbon cycle • Understanding NOM helps us better understand natural ecosystems • Hard to study in laboratory
Case Study I: The Conceptual Model I • Agents • A large number of molecules • Heterogeneous properties • Elemental composition • Molecular weight • Characteristic functional groups • Behaviors • Transport through soil pores (spatial mobility) • Chemical reactions: first order and second order • Sorption
Case Study I: The Conceptual Model II • Stochastic Model • Individual behaviors and interactions are stochastically determined by: • Internal attributes • Molecular structure • State (adsorbed, desorbed, reacted, etc.) • External conditions • Environment (pH, light intensity, etc.) • Proximity to other molecules • Length of time step, Δt • Space • 2D Grid Structure • Emergent properties • Distribution of molecular properties over time
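The conceptual model above can be illustrated with a minimal sketch. This is not the project's simulator: the rate constants `k_react` and `k_sorb`, the halving of molecular weight on reaction, the wrap-around grid, and all default values are hypothetical choices made only to show how stochastic per-molecule behavior on a 2D grid yields a distribution of properties over time.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Molecule:
    x: int
    y: int
    weight: float          # molecular weight, arbitrary units
    adsorbed: bool = False  # sorption state


def step(molecules, rng, size, dt, k_react, k_sorb):
    """Advance every molecule by one time step of length dt."""
    # Convert (hypothetical) rate constants into per-step probabilities.
    p_react = 1.0 - math.exp(-k_react * dt)
    p_sorb = 1.0 - math.exp(-k_sorb * dt)
    for m in molecules:
        # Sorption: toggle between adsorbed and desorbed.
        if rng.random() < p_sorb:
            m.adsorbed = not m.adsorbed
        # Transport: only desorbed molecules move, one cell per step.
        if not m.adsorbed:
            dx, dy = rng.choice([(0, 1), (0, -1), (1, 0), (-1, 0)])
            m.x = (m.x + dx) % size
            m.y = (m.y + dy) % size
        # First-order reaction: assumed here to halve the weight.
        if rng.random() < p_react:
            m.weight *= 0.5


def simulate(n=200, steps=50, size=20, dt=0.1,
             k_react=0.5, k_sorb=0.2, seed=1):
    """Run the toy model and record the mean molecular weight per step."""
    rng = random.Random(seed)
    molecules = [Molecule(rng.randrange(size), rng.randrange(size), 1000.0)
                 for _ in range(n)]
    mean_weights = []
    for _ in range(steps):
        step(molecules, rng, size, dt, k_react, k_sorb)
        mean_weights.append(sum(m.weight for m in molecules) / n)
    return molecules, mean_weights
```

Fixing the seed makes a run repeatable, which is useful when tracing or testing internal validity; varying it shows the run-to-run spread of the emergent quantity.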
Case Study I: Docking I • Compare the model with a validated one • Compare the model with a non-validated one • Different implementations • Different programming languages • Different packages • Different modeling approaches • Agent-based approach vs. Equation-based approach • Powerful method
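As a sketch of docking an agent-based model against an equation-based one, the example below compares a stochastic first-order decay with its closed-form solution N(t) = N0·exp(-k·t). The reaction, the parameter values, and the function names are assumptions chosen for illustration; they are not taken from the NOM simulator.

```python
import math
import random


def agent_based_decay(n0, k, dt, steps, seed=0):
    """Each surviving agent reacts independently with probability p per step."""
    rng = random.Random(seed)
    p = 1.0 - math.exp(-k * dt)
    alive = n0
    counts = [alive]
    for _ in range(steps):
        reacted = sum(1 for _ in range(alive) if rng.random() < p)
        alive -= reacted
        counts.append(alive)
    return counts


def equation_based_decay(n0, k, dt, steps):
    """Closed-form solution of dN/dt = -k N sampled at the same time points."""
    return [n0 * math.exp(-k * dt * i) for i in range(steps + 1)]


def max_relative_gap(observed, reference):
    """Largest relative difference between two equally sampled trajectories."""
    return max(abs(o - r) / max(r, 1e-12) for o, r in zip(observed, reference))
```

A small gap between the two trajectories raises confidence in both implementations; a large one says only that at least one of them is wrong, not which.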
Case Study II: An Economic Model • Interdisciplinary project • Initially written in Matlab within Department of Finance • Converted to C++ by Computer Scientists • Equation-based system • Concerned with identifying ideal economic variables, such as debt, money growth, and tax rate
Case Study II: The Conceptual Model • Equation-based system • Nonlinear projection methods used to solve Ramsey problems in a stochastic money economy • Goal is to generate the best social welfare for a given economy • Motivation
Case Study II: Tracing
• Matlab:
it 44, af 3.7496e-08, rc 0, timer 11.1, l 0.1382704496, m -0.0092286139, t 0.1881024991, h 0.3093668925 cc1 0.4861695543, cc2 0.6212795130, rl 1.0092221442
it 45, af 2.64653e-08, rc 0, timer 11.0, l 0.1382704643, m -0.0092286175, t 0.1881024947, h 0.3093668931 cc1 0.4861695553, cc2 0.6212795120, rl 1.0092221442
• C++:
it: 44 af: 0.00144839 rc: 0 l: 0.138359 m: -0.00936025 t: 0.188252 h: 0.309338 cc1: 0.486205 cc2: 0.621244 rl: -0.65888
it: 45 af: 0.00144784 rc: 0 l: 0.138401 m: -0.00937062 t: 0.188239 h: 0.30934 cc1: 0.486208 cc2: 0.621241 rl: -0.665511
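Comparing such traces by eye is error-prone, so a small script can do it. The sketch below parses one trace line from each implementation and lists the variables that differ by more than a tolerance. The parser, the function names, and the tolerance are assumptions for illustration; only the two iteration-44 trace lines come from the slide.

```python
import re

# A variable name, an optional colon, then a number (plain or scientific).
TOKEN = re.compile(r"([A-Za-z]\w*):?\s+([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)")

MATLAB_44 = ("it 44, af 3.7496e-08, rc 0, timer 11.1, l 0.1382704496, "
             "m -0.0092286139, t 0.1881024991, h 0.3093668925 "
             "cc1 0.4861695543, cc2 0.6212795130, rl 1.0092221442")
CPP_44 = ("it: 44 af: 0.00144839 rc: 0 l: 0.138359 m: -0.00936025 "
          "t: 0.188252 h: 0.309338 cc1: 0.486205 cc2: 0.621244 rl: -0.65888")


def parse_trace(line):
    """Turn one trace line into a {variable: value} dictionary."""
    return {name: float(value) for name, value in TOKEN.findall(line)}


def mismatches(trace_a, trace_b, tol):
    """Names of variables present in both traces that differ by more than tol."""
    a, b = parse_trace(trace_a), parse_trace(trace_b)
    shared = sorted(set(a) & set(b))
    return [name for name in shared if abs(a[name] - b[name]) > tol]


if __name__ == "__main__":
    print(mismatches(MATLAB_44, CPP_44, tol=1e-3))
```

With an absolute tolerance of 1e-3, the variables flagged for iteration 44 are `af` and `rl`; the remaining shared variables agree to within that tolerance. The right tolerance depends on the model and would need to be chosen by its developers.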
Summary & Conclusion • Applied V & V techniques to distinct case studies to increase model confidence • Some techniques are more cost-effective
Future Work • More in-depth survey of V & V methods • More rigorous quantitative methods • Compare simulation results against empirical data • Invalidation Testing • More general and formalized V & V process model
Bioinformatics Computing: Identifying Transposable Elements in the Aedes aegypti Genome
Overview • Introduction • Motivation • Basic Biological Concepts • Bioinformatics • Aedes aegypti • Transposable Elements • Approaches to Identifying Transposable Elements • Conclusion • Future Work
Motivation • Bioinformatics field is rapidly growing • Computer scientists can help advance its study • A better understanding of the biology of organisms would be helpful to scientists • Transposable elements can be useful tools to scientists • Computer scientists can help biologists develop advanced techniques to find transposable elements
Biological Foundations • All cells contain DNA, RNA, and protein molecules • DNA • Composed of four nucleotides • Building block of life • RNA • Carries genetic information from DNA to the rest of the cell • Protein • Laborer of the cell • Central Dogma of Molecular Biology: DNA → RNA → Protein
Bioinformatics • Collective study of numerous fields and techniques to solve biological problems • Focused on the study of DNA and its underlying characteristics • Computer science lends itself well to bioinformatics
Bioinformatics Research Topics • Genome Annotation • Assigning biological meaning to regions of a sequence • Sequence Alignment • Comparing two or more sequences • Sequencing • Determining the order of nucleotides in a given sequence • Genome Assembly • Assembling many short sequences of DNA
Bioinformatics Tools • Perl • BioPerl • BLAST • Popular alignment tool • Hidden Markov Model • Clustal X • Phylogenetic Tree • Relationships between sequences • Bioinformatics Collaboratories • NCBI, Ensembl, VectorBase
Aedes aegypti • Tropical Mosquito • Vector for dengue and yellow fever viruses • Its unannotated genome recently released • Much larger genome than that of other mosquitoes
Transposable Elements • Often referred to as “jumping genes” • Can make up large portions of a genome • Can transfer genetic material • Useful when performing evolutionary studies • Typically divided into Class I, Class II, and Class III elements
Transposons • Class II transposable elements • Divided into many families • piggyBac, Tc1, pogo, mariner, P element • Typical structure of a transposon:
Typical Approach • BLAST known transposons against a new genome • Good for identifying known or similar transposons in new genomes • Does not account for sequence variations
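The limitation noted above can be illustrated with a toy similarity scan. This is not BLAST and does not reproduce its algorithm or scoring: it slides a known element along a sequence, scores each window by ungapped percent identity, and keeps windows above a threshold. The sequences, function names, and thresholds are made up for illustration.

```python
def identity(window, query):
    """Fraction of positions at which two equal-length sequences agree."""
    return sum(a == b for a, b in zip(window, query)) / len(query)


def scan(genome, query, min_identity):
    """Return (position, identity) for every window scoring >= min_identity."""
    n = len(query)
    hits = []
    for i in range(len(genome) - n + 1):
        score = identity(genome[i:i + n], query)
        if score >= min_identity:
            hits.append((i, score))
    return hits


if __name__ == "__main__":
    query = "GATTACAGCT"     # stand-in for a known element
    diverged = "GATAACAGGT"  # same element with two substitutions
    genome = "TTTT" + query + "CCCC" + diverged + "TTTT"
    print(scan(genome, query, 0.9))  # strict threshold
    print(scan(genome, query, 0.8))  # relaxed threshold
```

In this example the exact copy starts at position 4 and the diverged copy at position 18. The strict threshold skips the diverged copy and the relaxed one recovers it, which is the trade-off that motivates profile-based searches for more divergent family members.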
Approach I • Focused on identifying P elements • Utilized multiple tools and scripts • Able to identify previously unknown transposons • Clustal X and the HMMER suite allowed us to perform a more thorough search • Cannot account for frame shifts
Approach II • Used for five families of transposons • Utilized GeneWise • Did not search for new transposons
Hybrid Approach: A Transposable Element Discovery Methodology • Proposed approach • Utilize better aspects of first two approaches • Can be used for all families described in this study
Phylogenetic Tree • mariner family • Clustered clades indicate close relationships
Summary & Conclusion • Found a reasonable number of transposons • Utilized novel approaches to finding transposons • First such study using this type of approach on the Aedes aegypti genome • Proposed a hybrid approach
Future Work • Utilize hybrid approach • Automate process • Comparison of transposable elements found in Aedes aegypti and Anopheles gambiae