
Using Ontologies to Annotate Phenotypic Data

Using Ontologies to Annotate Phenotypic Data. Janan T. Eppig December 2008. Mouse Genome Informatics. www.informatics.jax.org. Human FOXN1 forkhead box N1 T-CELL IMMUNODEFICIENCY, CONGENITAL ALOPECIA, AND NAIL DYSTROPHY. Frank J, et al. Nature 398, 473 - 474 (1999).



  1. Using Ontologies to Annotate Phenotypic Data Janan T. Eppig December 2008 Mouse Genome Informatics www.informatics.jax.org

  2. www.informatics.jax.org Human FOXN1 forkhead box N1 T-CELL IMMUNODEFICIENCY, CONGENITAL ALOPECIA, AND NAIL DYSTROPHY Frank J, et al. Nature 398, 473 - 474 (1999) Mouse Foxn1. Homozygous “nude” mouse. One of 8 known phenotypic mutations in mouse for the forkhead box N1 gene.

  3. Gather data from multiple sources • Factor out common objects • Assemble integrated objects Data Integration Centers: mutagenesis, gene trap, etc Primary literature Data Loads: GenBank, SNPs, clone collections, UniProt, RIKEN, etc Electronic Submissions (individual labs) Processing, QC, and curation

  4. Integration is hard…not just a matter of combining data sources… • Data from multiple sources can be of differing quality • The same data can enter the system via various paths • Naming conventions may or may not follow standards • Some data sources don’t maintain unique accession numbers (or allow them to change) • Periodic updates from data sources can cause problems • If objects have disappeared… (or reappeared) • If objects have split in two

  5. Data integration is hard • “Bucketizing” establishes types of correspondence between objects in the input sets. • Allows immediate incorporation of 1:1 corresponding data. • Sorts conflicting data into bins that allow prioritization for curator resolution.
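The "bucketizing" idea on this slide can be sketched in a few lines. This is a minimal illustration, not MGI's actual load software: the function names, the bucket labels (`1:1`, `1:n`, `n:1`, `n:m`, `1:0`, `0:1`), and the sample IDs are all assumptions made for the example. It groups objects from an existing set and an incoming set by their asserted equivalences, then labels each group by the shape of the correspondence, assuming every pair refers only to IDs listed in the two sets.

```python
from collections import defaultdict


def _component(start, left_to_right, right_to_left):
    """Collect every left and right ID reachable from one left ID."""
    lefts, rights = {start}, set()
    frontier = [("L", start)]
    while frontier:
        side, node = frontier.pop()
        if side == "L":
            for r in left_to_right.get(node, ()):
                if r not in rights:
                    rights.add(r)
                    frontier.append(("R", r))
        else:
            for l in right_to_left.get(node, ()):
                if l not in lefts:
                    lefts.add(l)
                    frontier.append(("L", l))
    return lefts, rights


def bucketize(left_ids, right_ids, pairs):
    """Sort objects into buckets named after their correspondence type.

    pairs holds (left_id, right_id) equivalences asserted by the data source.
    Bucket names are hypothetical labels chosen for this sketch.
    """
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for left, right in pairs:
        left_to_right[left].add(right)
        right_to_left[right].add(left)

    buckets = defaultdict(list)
    seen_left, seen_right = set(), set()
    for left in sorted(left_ids):
        if left in seen_left:
            continue
        lefts, rights = _component(left, left_to_right, right_to_left)
        seen_left |= lefts
        seen_right |= rights
        if not rights:
            name = "1:0"          # existing object with no incoming match
        elif len(lefts) == 1 and len(rights) == 1:
            name = "1:1"          # safe to incorporate immediately
        elif len(lefts) == 1:
            name = "1:n"          # one object maps to several incoming ones
        elif len(rights) == 1:
            name = "n:1"          # several objects map to one incoming one
        else:
            name = "n:m"          # tangled group, needs a curator
        buckets[name].append((sorted(lefts), sorted(rights)))

    # Incoming objects that matched nothing in the existing set.
    for right in sorted(set(right_ids) - seen_right):
        buckets["0:1"].append(([], [right]))
    return dict(buckets)


# Hypothetical IDs: A..E already in the database, v..z arriving in a load.
existing = ["A", "B", "C", "D", "E"]
incoming = ["v", "w", "x", "y", "z"]
pairs = [("A", "x"), ("B", "y"), ("B", "z"), ("C", "w"), ("D", "w")]
buckets = bucketize(existing, incoming, pairs)
```

In this sketch only the `1:1` bucket would be loaded automatically; every other bucket is a queue that curators could prioritize.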

  6. Literature & Loads Annotation Pipeline New Gene, Strain or Sequence? • Data Acquisition • Object Identity • Standardizations • Data Associations • Integration with other bioinformatics resources Controlled Vocabularies Evidence & Citation Co-curation of shared objects and concepts

  7. Making semantic sense Controlled vocabularies/nomenclatures • Strains • Genes • Alleles (phenotypic or variant) • Classes of genetic markers • Types of mutations • Types of assays • Developmental stages • Tissues • Clone libraries • ES cell lines • and more… ….. organized as lists or simple hierarchies

  8. DAGs Semantics plus relationship data Ontologies/structured vocabularies • Gene Ontology (GO) • Molecular function • Biological process • Cellular component • Mouse Anatomy (MA) • Embryonic • Adult • Mammalian Phenotype (MP) • Sequence Ontology (SO) • Trait Ontology ….. organized as directed acyclic graphs (DAGs)

  9. Vocabulary Note Terms DAGs Growth retardation EE J:65322 IDA J:62648 Dilated renal tubules MP:1956 Synonyms Definition TAS J:65378 Postnatal lethality … Respiratory failure … Annotations Vocabularies in MGI Genotype Strain: AEJ Alleles:bd/bd Strain: C57BL/6 Alleles: Ppp1r3atm1Adpt/ Ppp1r3atm1Adpt

  10. Common software for users to access vocabularies in MGI

  11. Mammalian Phenotype Ontology Synonyms • Structured as DAG • >6,250 terms covering physiological systems, behavior, survival, and development • Available in web browser and in OBO and text formats from MGI ftp and OBO sites • Each term linked to all annotations to the term or its children • >133,00 annotations genotype - MP Term in context Links to all mouse genotypes with this phenotype
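The statement that each term is linked to all annotations to the term or its children can be illustrated with a small roll-up query. This is a sketch under stated assumptions: the three-term hierarchy, the genotype labels, and the function names are invented for the example and are not taken from the MP ontology files or MGI software.

```python
# Hypothetical is_a edges, stored as parent -> list of children.
CHILDREN = {
    "abnormal kidney morphology": ["abnormal renal tubule morphology"],
    "abnormal renal tubule morphology": ["dilated renal tubules"],
    "dilated renal tubules": [],
}

# Hypothetical direct annotations: term -> genotypes annotated to that term.
DIRECT = {
    "abnormal kidney morphology": {"genotype-1"},
    "dilated renal tubules": {"genotype-2", "genotype-3"},
}


def descendants(term, children):
    """Return every term below `term`, following child edges transitively."""
    found = set()
    stack = [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def annotations_for(term, children, direct):
    """Genotypes annotated to `term` itself or to any of its descendants."""
    result = set()
    for t in {term} | descendants(term, children):
        result |= direct.get(t, set())
    return result


broad = annotations_for("abnormal kidney morphology", CHILDREN, DIRECT)
narrow = annotations_for("dilated renal tubules", CHILDREN, DIRECT)
```

A search on the broad term returns all three genotypes, while a search on the most specific term returns only the two annotated directly to it.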

  12. behavior/neurological phenotype • muscle phenotype • abnormal involuntary movement • abnormal muscle physiology • myoclonus • opisthotonus • tremors • abnormal reflex
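What distinguishes a DAG from a simple hierarchy is that a term may have more than one parent. The sketch below borrows term names from this slide, but the edges between them are assumptions, since the transcript does not preserve the figure's layout. It uses the standard-library `graphlib` module (Python 3.9 or later) to confirm the graph has no cycles.

```python
from graphlib import CycleError, TopologicalSorter

# Child -> set of parents. Edges are assumed for illustration only.
PARENTS = {
    "tremors": {"abnormal involuntary movement"},
    "abnormal involuntary movement": {
        "behavior/neurological phenotype",
        "abnormal muscle physiology",
    },
    "abnormal muscle physiology": {"muscle phenotype"},
    "behavior/neurological phenotype": set(),
    "muscle phenotype": set(),
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following parent edges."""
    found = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def is_acyclic(parents):
    """True if the parent graph can be topologically sorted (no cycles)."""
    try:
        tuple(TopologicalSorter(parents).static_order())
    except CycleError:
        return False
    return True


tremor_ancestors = ancestors("tremors", PARENTS)
```

Under these assumed edges, "tremors" is reachable from both the behavior/neurological branch and the muscle branch, which a strict tree could not express.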

  13. Mammalian Phenotype (MP) Ontology • …make phenotype & disease model data robust & accessible to researchers & computational biologists • semantically consistent search methods • integrated access to all phenotypic variation sources • (single-gene, genomic mutations, engineered mutations, QTL, strains) • data on human disease correlation • access to mouse models from various approaches • - Genetic • - Phenotypic • - Computational

  14. Developing the Mammalian Phenotype Ontology • New terms from ongoing curation process • Collaborative community efforts • identify new terms • suggest improved organization of terms • Rat Genome Database • Mutagenesis Centers • Human (NCBI) • OMIA (Online Mendelian Inheritance in Animals) • Proprietary Databases • Future (International Mouse Knockout Projects) • Comparisons among Ontologies (GO Process, Mouse Anatomy, FMA, Cell Type, MPath, etc.) • Systematic review by domain experts

  15. Making Mammalian Phenotype Ontology Work DAGs • accommodate bio-specific terms • computationally useful • human accessible • practical for curation • cross-reference to other ontologies

  16. Terms in MP

  17. Complex Examples:

      id: MP:0006159 ! ocular albinism
      intersection_of: PATO:0001558 ! lacking processual parts
      intersection_of: inheres_in MA:0000261 ! eye
      intersection_of: towards GO:0006582 ! melanin metabolic process
      MP definition: absence of melanin (pigment) production in the eye with identifiable melanocytes present

      id: MP:0006110 ! ventricular fibrillation
      !intersection_of: PATO:0000688 ! asynchronous
      !intersection_of: inheres_in CL:0000746 ! cardiac muscle cell
      !intersection_of: towards GO:0060048 ! cardiac muscle contraction
      !intersection_of: located_in MA:0000079 ! ventricle endocardium
      !intersection_of: located_in MA:0000082 ! ventricle myocardium
      MP definition: asynchronous contraction or quivering of individual cardiac muscle fibers in the ventricles
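The `intersection_of` lines decompose an MP term into a quality (here a PATO term) plus relations to terms from other ontologies. As a rough sketch of how such a definition might be read by a program, the snippet below parses the ocular albinism example from this slide. The function name and the returned dictionary layout are assumptions made for illustration, and the parser handles only the few line shapes shown here, not the full OBO format.

```python
STANZA = """
id: MP:0006159 ! ocular albinism
intersection_of: PATO:0001558 ! lacking processual parts
intersection_of: inheres_in MA:0000261 ! eye
intersection_of: towards GO:0006582 ! melanin metabolic process
"""


def parse_logical_definition(stanza):
    """Split a stanza into its ID, its genus term, and its differentia.

    A bare intersection_of value is treated as the genus (the quality);
    a "relation TERM" value is treated as one differentia.
    """
    term = {"id": None, "name": None, "genus": None, "differentia": []}
    for raw in stanza.strip().splitlines():
        # Text after "!" is a human-readable label, not part of the value.
        line, _, label = raw.partition("!")
        tag, _, value = line.partition(":")
        tag, value, label = tag.strip(), value.strip(), label.strip()
        if tag == "id":
            term["id"], term["name"] = value, label
        elif tag == "intersection_of":
            parts = value.split()
            if len(parts) == 1:
                term["genus"] = (parts[0], label)
            elif len(parts) == 2:
                relation, filler = parts
                term["differentia"].append((relation, filler, label))
    return term


ocular_albinism = parse_logical_definition(STANZA)
```

Once a definition is in this shape, a term can be cross-referenced by anatomy (`MA`), process (`GO`), or quality (`PATO`) without any text matching on the term name.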

  18. Status of Phenotype & Disease Data

  19. Current QTL Display

  20. Current QTL Display

  21. Changes planned for QTL Display Genome coordinates: 132851306-135646474 (MGI Mouse GBrowse)

  22. Need for a trait ontology • What is measured • Blood pressure • % body fat • Coat color • Annotation of • QTL • Strain characteristics / baseline • Measurements Some issues • specific vs broad • synchronizing with MP • “how much” cross-species?

  23. OBO-Edit, curation tool for building ontologies

  24. Working on Trait Ontology • MGI • IMPC • MPD • RGD • Domestic Species (Animal QTL) Currently: approx. 3600 terms, built initially by stripping MP; working systematically on branches

  25. MGI Phenotype Data Staff Anna Anagnostopoulos Randal P. Babiuk Susan M. Bello Donna L. Burkart Howard Dene Michelle Knowlton Ira Lu Hiroaki Onda Cynthia L. Smith Monika Tomczuk Linda L. Washburn Jonathan S. Beal Kim L. Forthofer Peter Frost

  26. NHGRI grant HG000330 The End
