Working in Real Time: Building Ontologies While Annotating the Mouse from Genotype to Phenotype Judith Blake, Ph.D. Mouse Genome Informatics The Jackson Laboratory Bar Harbor, ME 04609
Mouse Genome Informatics
Genotype, Expression, Phenotype
• Mouse Genome Database Project (MGD)
  • Genes and Gene Products
  • Comparative Analysis
  • Alleles and Phenotypes
• Gene Expression DB Project (GXD)
  • Embryonic gene expression
  • Extensive experimental data
• Mouse Genome Sequence Project (MGS)
  • Connecting sequence & biology
Objective: Facilitate the use of the mouse as a model for human biology by furthering our understanding of the relationship between genotype and phenotype.
MGI Integration Efforts
• Integrated experimental and consensus views
• Mapping, molecular, alleles, expression, phenotypes
• Gene to GO associations
• Canonical gene and sequence
• Collaborations with SWISS-PROT and LocusLink
• Nomenclature standards, gene groupings
• Curated mammalian orthologies
  • used in collaborations with RatDB, NCBI and others
• Index of primary literature
• Share knowledge from mouse disease models with medical informatics resources
All data associations supported with evidence and citation
Common Issues for Model Organism Databases
• Data Integration
  • From Genotype to Phenotype
  • Experimental and Consensus Views
• Incorporation of large datasets
  • Whole genome annotation pipelines
  • Large-scale mutagenesis projects
• Computational vs. literature-based data collection and evaluation
• Data mining: extraction of new knowledge
Challenges
• Genotype
  • Mouse and Human genome sequences
  • Integrating genes/models with existing biological information
  • Updates, emerging knowledge
• Phenotype
  • Mega-mutagenesis programs
  • Phenome project / baselines
  • Standard screens
  • Integration of mutant information, targeted mutations, transgenes, expression arrays
jblake-Manchester BioInform Wk
Numbers (20 March 2002)
No. of References     70,874
No. of Genes          35,404
No. of Markers        54,834
Genes w/ NT Seq       31,386
Genes w/ AA Seq       12,875
Genes w/ Orthologs     7,051
Genes Mapped          19,058
Genes and Markers Mammalian Homology Sequences and Maps Strains and Polymorphisms Embryonic Expression mouse BLAST, molecular segments References, AccID, Access to MGI resources Alleles and Phenotypes
Enable Complex Queries
“Show me all genes with their human orthologs located between cM 5 and 7 on Chr. 3 whose gene products localize to the mitochondrial membrane and whose associated mutant phenotypes include ‘skeletal dysmorphology’.”
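The query on this slide combines criteria from several data types (map position, orthology, cellular component, phenotype). A minimal sketch of that kind of conjunctive filter follows. The `Gene` record, its field names, and every gene symbol are invented for illustration and are not the MGD schema or real MGI data; requiring a recorded ortholog is also an assumption about how the query would be read.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Gene:
    """Hypothetical, flattened view of one gene record (not the MGD schema)."""
    symbol: str
    chromosome: str
    cm: float                              # genetic map position in cM
    human_ortholog: Optional[str] = None   # None if no ortholog is recorded
    cellular_components: Set[str] = field(default_factory=set)
    mutant_phenotypes: Set[str] = field(default_factory=set)


def complex_query(genes: List[Gene], chromosome: str, cm_min: float,
                  cm_max: float, component: str,
                  phenotype: str) -> List[Tuple[str, str]]:
    """Return (mouse symbol, human ortholog) pairs meeting every criterion."""
    return [
        (g.symbol, g.human_ortholog)
        for g in genes
        if g.chromosome == chromosome
        and cm_min <= g.cm <= cm_max
        and g.human_ortholog is not None   # assumption: an ortholog is required
        and component in g.cellular_components
        and phenotype in g.mutant_phenotypes
    ]


# Invented records, for illustration only.
genes = [
    Gene("GeneA", "3", 5.5, "HUMAN_A",
         {"mitochondrial membrane"}, {"skeletal dysmorphology"}),
    Gene("GeneB", "3", 6.0, "HUMAN_B",
         {"nucleus"}, {"skeletal dysmorphology"}),
    Gene("GeneC", "3", 9.0, "HUMAN_C",
         {"mitochondrial membrane"}, {"skeletal dysmorphology"}),
    Gene("GeneD", "4", 6.0, "HUMAN_D",
         {"mitochondrial membrane"}, {"skeletal dysmorphology"}),
]

hits = complex_query(genes, "3", 5.0, 7.0,
                     "mitochondrial membrane", "skeletal dysmorphology")
print(hits)
```

Only the record that satisfies all five conditions is returned. In a real resource each condition would be answered from a different, separately curated data set, which is why the integration work matters.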
GO annotations: Gene detail page in MGD for the vitamin D receptor gene, Vdr
Sets of Orthologs Data associations supported by evidence and citation Orthologs of Vdr
Gene/Marker Type Allele Type Assay Type Expression Mapping Molecular Mutation Inheritance Mode Nomenclature Evidence Codes Tissue Cell Lines Units Cytogenetic Molecular ES Cell Line Strain Multiple Keyword Sets
Allele Query Form Controlled Vocabularies for Describing Alleles
Structured Vocabularies and Ontologies
• Anatomy
• GO:
  • Molecular function
  • Biological process
  • Cellular component
• Phenotypes
• Disease Models
Anatomical Dictionary Theiler stage 10 (7 dpc) http://genex.hgu.mrc.ac.uk/Databases/Anatomy/ Collaboration with MRC / Edinburgh 3D-Atlas project
Links between anatomical structures at successive stages of mouse development enable the analysis of differentiation pathways
Alternative anatomical hierarchies
- describe and view anatomy from different anatomical, physiological, and disease perspectives: not just ‘geographical location’, but systems (e.g., circulatory) that ‘span geography’
- integrated analysis of expression and phenotype / disease data
Consolidated Anatomical Dictionary (94 lines)
| heart
| %cardiogenic plate
| %primitive heart tube
| | <myocardium
| | <endocardium
| | <cardiac jelly
| <aortic sinus
| <atrio-ventricular canal (ependymal canal)
| <atrio-ventricular cushion tissue (bulbar cushion, ependymal cushion tissue)
| <atrium
| | %primitive atrium
| | %common atrial chamber
| | | <common atrial chamber bulbous cordis
| | | <common atrial chamber, left part
| | | | <common atrial chamber, left part, cardiac muscle (myocardium)
| | | | <common atrial chamber, left part, endocardial lining
| | | | <common atrial chamber, left part, cardiac jelly
| | | <common atrial chamber, right part
| | | | <common atrial chamber, right part, cardiac muscle (myocardium)
| | | | <common atrial chamber, right part, endocardial lining
| | | | <common atrial chamber, right part, cardiac jelly
| | <left atrium
| | | <left atrium auricular region
| | | | <left atrium auricular region cardiac muscle (myocardium)
| | | | <left atrium auricular region endocardial lining
| | | <left atrium cardiac muscle (myocardium)
| | | <left atrium endocardial lining
| | <right atrium
| | | <right atrium auricular region
| | | | <right atrium auricular region cardiac muscle (myocardium)
| | | | <right atrium auricular region endocardial lining
| | | <right atrium cardiac muscle (myocardium)
| | | <right atrium endocardial lining
| | | <right atrium valve
| | | | %right atrium venous valve
| | <interatrial septum
| | | <foramen ovale
| | | <septum primum
| | | | <foramen primum (ostium primum)
| | | | <foramen secundum (ostium secundum)
| | | <septum secundum
| <endocardial tissue
| | <endocardial cushion tissue (bulbar cushion)
| | <bulboventricular groove
| | <bulbus cordis
| | | <bulbus cordis caudal half (myocardium)
| | | | <bulbus cordis caudal half cardiac muscle (myocardium)
| | | | <bulbus cordis caudal half endocardial lining
| | | | <bulbus cordis caudal half cardiac jelly
| | | <bulbus cordis rostral half (conotruncus)
| | | | <bulbus cordis rostral half cardiac muscle (myocardium)
| | | | <bulbus cordis rostral half endocardial lining
| | | | <bulbus cordis rostral half cardiac jelly
| <heart mesentery
| | <dorsal mesocardium (dorsal mesentery of heart)
| | | <dorsal mesocardium transverse pericardial sinus
| <outflow tract
| | <outflow tract aortic component
| | <outflow tract aortico-pulmonary spiral septum
| | | <outflow tract future ascending aorta
| | <outflow tract pulmonary component
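A listing like the one above can be turned into explicit parent/child links with a small parser. The sketch below rests on two assumptions the slide does not state: that `%` and `<` carry the meanings they have in the GO flat files (is-a and part-of), and that the one term written without a relation symbol is the root. The function name and the output shape are illustrative only.

```python
import re
from typing import List, Optional, Tuple

# ASSUMPTION: '%' = is-a and '<' = part-of, as in the GO flat files;
# the slide itself does not define the symbols.
RELATIONS = {"%": "is_a", "<": "part_of"}

# Leading pipes (depth markers), optional relation symbol, then the term name.
LINE = re.compile(r"^((?:\|\s*)*)([%<]?)\s*(.+?)\s*$")

Entry = Tuple[str, Optional[str], Optional[str]]


def parse_hierarchy(text: str) -> List[Entry]:
    """Return (term, relation, parent) triples from an indented listing."""
    entries: List[Entry] = []
    path: List[str] = []   # path[d] is the most recent term seen at depth d
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        pipes, symbol, name = LINE.match(line).groups()
        # ASSUMPTION: a term with no relation symbol is the root (depth 0),
        # whatever its pipe count; otherwise depth = number of pipes.
        depth = pipes.count("|") if symbol else 0
        path = path[:depth]
        parent = path[-1] if path else None
        entries.append((name, RELATIONS.get(symbol), parent))
        path.append(name)
    return entries


# A few lines taken from the slide's listing.
sample = """\
| heart
| %primitive heart tube
| | <myocardium
| | <endocardium
| <atrium
| | <left atrium
"""

for term, relation, parent in parse_hierarchy(sample):
    print(term, relation, parent)
```

Keeping each link as an explicit triple, rather than relying on indentation, is what allows the same terms to be regrouped under alternative hierarchies.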
Biol. Process Phenotype Anatomy Gene expression
Mouse Heart Development
From “The Heart” by Margaret Kirby, in “Embryos, Genes and Birth Defects”, edited by Peter Thorogood
Beyond mouse
• Data integration depends on indexing to defined sets of objects.
• Speaking the same language
  • ‘Development’
  • ‘Heart’
• Comparisons between model organisms
Goals of the Consortium
• Develop structured vocabularies (ontologies)
  • Unique ID, Definition, Defined relationships
• Annotate genes/gene products to vocabularies
  • Evidence and citation
• Support common data resource for integrated queries across multiple organisms
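These goals imply two kinds of record: a vocabulary term (unique ID, definition, defined relationships) and an annotation that ties a gene to a term with evidence and a citation. A minimal sketch, assuming hypothetical class and field names; the IDs, gene symbols, and references are invented, and "IDA" is used only as an example of a GO evidence code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Term:
    term_id: str                                      # unique ID
    name: str
    definition: str
    relationships: Tuple[Tuple[str, str], ...] = ()   # (relation, parent term_id)


@dataclass(frozen=True)
class Annotation:
    organism: str
    gene: str
    term_id: str
    evidence: str    # evidence code
    citation: str    # reference supporting the assertion


def annotate(organism: str, gene: str, term: Term,
             evidence: str, citation: str) -> Annotation:
    """Create an annotation, refusing any that lack evidence or a citation."""
    if not evidence or not citation:
        raise ValueError("an annotation needs both evidence and a citation")
    return Annotation(organism, gene, term.term_id, evidence, citation)


def genes_by_organism(annotations: List[Annotation],
                      term_id: str) -> Dict[str, List[str]]:
    """Group the genes annotated to one term by organism."""
    result: Dict[str, List[str]] = {}
    for a in annotations:
        if a.term_id == term_id:
            result.setdefault(a.organism, []).append(a.gene)
    return result


# Invented term and annotations, for illustration only.
term = Term("EX:0000001", "example process", "A placeholder definition.",
            (("is_a", "EX:0000000"),))
annotations = [
    annotate("mouse", "GeneA", term, "IDA", "REF:1"),
    annotate("fly", "GeneB", term, "IDA", "REF:2"),
]
print(genes_by_organism(annotations, "EX:0000001"))
```

Because every organism's annotations point at the same term ID, one lookup answers the question across species.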
First-Pass Phenotype Set
Query: genes with mutants classified with term ‘eye dysmorphology’ Ey
Genotype/Phenotype A genotype consists of zero, one or more allele pairs on a defined genetic background. The genetic background may be an inbred strain, or it may be unknown.
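The definition above maps directly onto a small data structure. A minimal sketch, assuming hypothetical class names and placeholder allele symbols; C57BL/6J appears only as an example of an inbred strain.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class AllelePair:
    allele1: str
    allele2: str


@dataclass(frozen=True)
class Genotype:
    allele_pairs: Tuple[AllelePair, ...] = ()   # zero, one or more pairs
    background: Optional[str] = None            # inbred strain; None = unknown

    def describe(self) -> str:
        pairs = ", ".join(f"{p.allele1}/{p.allele2}" for p in self.allele_pairs)
        background = self.background or "unknown"
        return f"{pairs or 'no allele pairs'} on {background} background"


# Placeholder allele symbols, for illustration only.
wild_type = Genotype(background="C57BL/6J")
mutant = Genotype((AllelePair("Gene1-m1", "Gene1-m1"),))

print(wild_type.describe())
print(mutant.describe())
```

Allowing an empty tuple of allele pairs lets an unmodified inbred strain be recorded as a genotype in its own right, which is useful when baseline phenotypes are attached to strains.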
Some Definitions
• Trait: measurable characteristic of an individual or population
  • Blood pressure, coat color, % body fat
  • May be associated with an anatomical structure, e.g., an immune response with its site of action
• Phenotype: name for a group of traits, a syndrome, or a condition
  • e.g., type II diabetes, obesity, lymphocytic leukemia
A phenotype can be characterized by many traits, and a trait can help characterize many phenotypes.
Leprdb-3J/Leprdb-3J
Phenotype a, Phenotype b, Phenotype c
Trait 1, Trait 2, … Trait n
Developing structured descriptors for traits
• Use existing and develop new controlled vocabularies that cover orthogonal concepts
• Combine terms from these vocabularies to describe traits
• Assign phenotype (disease) terms for nomenclature ease
Joel Richardson, Michael Ashburner, Martin Ringwald
Concept Examples
System: Immune system, cardiovascular system
Tissue: heart, lung, liver, eye, skin
Cell type: epithelial, fibroblast, myoblast, melanocyte
Age: E15, P25
Biol. Process: apoptosis, growth, cell differentiation, behavior
Metabolite: Glucose, Calcium
Qualifier: abnormal, absent, enlarged, increased, disrupted
DCS = dolichostenomelia = disproportionally long limbs, due to long bone overgrowth
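Describing a trait by combining terms from orthogonal vocabularies can be sketched as a validated set of (vocabulary, term) pairs. The vocabulary contents below are the examples from this slide; the dictionary keys, the function name, and the particular combination shown are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Example terms from the slide; the keys are hypothetical axis names.
VOCABULARIES: Dict[str, Set[str]] = {
    "system": {"Immune system", "cardiovascular system"},
    "tissue": {"heart", "lung", "liver", "eye", "skin"},
    "cell_type": {"epithelial", "fibroblast", "myoblast", "melanocyte"},
    "age": {"E15", "P25"},
    "process": {"apoptosis", "growth", "cell differentiation", "behavior"},
    "metabolite": {"Glucose", "Calcium"},
    "qualifier": {"abnormal", "absent", "enlarged", "increased", "disrupted"},
}


def describe_trait(**terms: str) -> Tuple[Tuple[str, str], ...]:
    """Build a trait descriptor from one term per chosen vocabulary."""
    for axis, term in terms.items():
        if axis not in VOCABULARIES:
            raise KeyError(f"unknown vocabulary: {axis}")
        if term not in VOCABULARIES[axis]:
            raise ValueError(f"{term!r} is not in the {axis} vocabulary")
    # Sorting by vocabulary name gives equal descriptors an equal form.
    return tuple(sorted(terms.items()))


# An illustrative combination, not a curated trait.
trait = describe_trait(qualifier="enlarged", tissue="heart", age="E15")
print(trait)
```

Because each vocabulary covers a separate concept, a handful of short term lists can describe a large number of traits without a pre-enumerated trait list.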
Relationships of Mouse Models to Human Diseases
• Mouse gene ortholog, same mutation
  • Same phenotype
  • Different phenotype
• Mouse gene ortholog, different or unknown mutations
  • Same or different phenotypes
• Mouse phenotype same as human
  • Mouse gene ortholog
  • Another mouse gene
  • Gene unknown
• Mouse phenotype similar
  • Unknown genetic component
  • Gene same or different
Goal: Query Mouse Data by Human Disease
Test Results
• 1676 disease listings in OMIM
  • 382 have phenotype reports
• 3187 annotated mouse/human orthologs
  • 958 correspond to OMIM entries
  • 305 have phenotype reports
• 8535 listings in MeSH disease tree
  • 709 correspond to orthologs
  • 237 have phenotype reports
Summary
• Integration
  • Requires both manual and computational approaches
  • Attention to data modeling, object identity, data migration issues
• Ontologies and standardized vocabularies
  • Integral component of integration effort
  • Essential for extracting knowledge
• Parallel development
  • ontology representations
  • data acquisition and integration efforts
Acknowledgments - MGI
Carol Bult, Ben King, Richard Baldarelli, Dirck Bradt, Sridhar Ramachandran, Deborah Reed, Diane Dahman, Sophia Zhu, Donnie Qi, LongLong Yang, Pat Grant, Nancy Butler, Janan Eppig, Joel Richardson, Martin Ringwald, Jim Kadin, Lois Maltais, Louise McKenzie, Harold Drabkin, Tom Weigers, Jon Beal, Lori Corbani, Cathy Lutz, Cynthia Smith, Teresa Chu, Sharon Cousins, Donna Burkart, Ira Lu, Li Ni, Carroll Goldsmith, Moyha Lennon-Pierce, Antonio Planchart
www.informatics.jax.org
David Hill, Dale Begley, Terry Hayamizu, Ingeborg McCright, Connie Smith
Matt, Mike, Leslie, Jeff, Prita, Jill, Diane, DebbieK, Dieter, Lucette, Janice,
Mouse Genome Informatics: http://www.informatics.jax.org
Gene Ontology: http://www.geneontology.org