
Linking Diseases and Genes through Informatics Knowledge Bases and Ontologies

Joyce A. Mitchell, Ph.D., National Library of Medicine; University of Missouri. Research collaborators: Olivier Bodenreider, M.D., Ph.D.; Alexa T. McCray, Ph.D.; Allen C. Browne.



  1. Linking Diseases and Genes through Informatics Knowledge Bases and Ontologies Joyce A. Mitchell, Ph.D. National Library of Medicine University of Missouri

  2. Research Collaborators • Olivier Bodenreider, M.D., Ph.D. • Alexa T. McCray, Ph.D. • Allen C. Browne

  3. Research Goals • Investigate methods of connecting disease and genomic information. • Overall goals are to: • Overcome difficulties in traversing multiple information resources • Examine coverage of the Unified Medical Language System® (UMLS®), Gene Ontology™ (GO), and LocusLink-OMIM • Develop methods to use ontologies more effectively • Present data in an understandable manner

  4. Background – UMLS • Developed and maintained by NLM • Purpose: facilitate retrieval & integration of information from multiple biomedical sources • Interrelates 60 biomedical terminologies • MeSH, SNOMED, Read Codes, ICD, etc. • No vocabulary focused on molecular biology • 1.5 million English terms; 800,000 concepts • 134 Semantic Types; 54 Semantic Relationships

  5. Background – Gene Ontology • Developed and maintained by the GO Consortium • Purpose: • Promotes cross-species methodologies for functional comparisons • Allows annotation of molecular information on genes and gene products • “an essential start to creating a shared language of biology” ** • Focused on • molecular function (5626 terms) • biological processes (4677 terms) • cellular components (1077 terms) • Two semantic relations (is-a and part-of) **Genome Research 2001; 11:1425-33.

  6. Background - LocusLink • Curated, gene-centered resource of National Center for Biotechnology Information (NLM) • Gene names, gene product names, gene product functions, and reference sequences (DNA, RNA, protein) • Associates phenotype (diseases) to the genotype via Online Mendelian Inheritance in Man (OMIM) • Online links to major bioinformatics knowledge bases and the literature

  7. Specific Questions This study looked at coverage in UMLS of • 1244 genes associated with human diseases • 1702 diseases associated with the genes • 11,380 Gene Ontology terms • 38,832 genes/gene products in GO database (141,071 names) • Associations of genes and their functions in UMLS • Representation of gene function in GO compared to the UMLS

  8. Methods • LocusLink query: • human genes whose sequence is known and associated with disease (1244 loci) • LocusLink data: • Genes/gene products (official names, synonyms, symbols) • Phenotypes (diseases) (1702 diseases) • GO data: • all concepts (ontology terms), excluding obsolete terms (11,380 terms) • Gene products from all species (134,646 unique names, 38,832 genes)

  9. Methods • LocusLink and GO terms mapped to UMLS concepts • normalization used • mappings constrained by semantic type • LocusLink loci studied for relationships in UMLS • Gene/GP – phenotype • Gene/GP – molecular function • Gene/GP – biological process • Gene/GP – cellular component • For specific genes, annotations in GO compared to the representation in UMLS
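
The mapping step on this slide (normalize a term, look it up, keep only candidates with an acceptable semantic type) can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: `normalize` is a simplified stand-in for lexical normalization (it only lowercases, strips punctuation, and sorts words), and the two-entry concept table, `build_index`, and `map_term` are hypothetical names with illustrative data rather than real UMLS records.

```python
import string

# Translation table that turns every punctuation character into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def normalize(term: str) -> str:
    """Simplified normalization (an assumption): lowercase, drop punctuation, sort words."""
    cleaned = term.lower().translate(_PUNCT_TO_SPACE)
    return " ".join(sorted(cleaned.split()))


def build_index(concepts):
    """Index (label, semantic_type) pairs by their normalized label."""
    index = {}
    for label, semantic_type in concepts:
        index.setdefault(normalize(label), []).append((label, semantic_type))
    return index


def map_term(term, index, allowed_types):
    """Return candidate concepts whose normalized label matches and whose type is allowed."""
    candidates = index.get(normalize(term), [])
    return [(label, st) for label, st in candidates if st in allowed_types]


# Illustrative stand-in for a concept table; not real UMLS content.
concepts = [
    ("Cystic Fibrosis", "Disease or Syndrome"),
    ("Mitotic spindle", "Cell Component"),
]
index = build_index(concepts)

# Word order and punctuation differ, but the normalized forms agree.
print(map_term("fibrosis, cystic", index, {"Disease or Syndrome"}))
# The same string is rejected when the semantic type constraint does not fit.
print(map_term("fibrosis, cystic", index, {"Cell Component"}))
```

Constraining by semantic type is what keeps a lexically identical but semantically unrelated concept from being counted as a match.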

  10. Results - 1 • For 1244 genes from LocusLink • 18% found in the UMLS

  11. Results - 2 • For 1702 phenotypes (diseases) corresponding to 1244 genes • 34% found in the UMLS (575/1702) • Most frequent single gene diseases covered • Huntington Disease • Cystic Fibrosis • Marfan Syndrome • Phenylketonuria • Achondroplasia

  12. Results - 3 • GO terms found in MeSH: 2764 terms • GO terms found in SNOMED: 1366 terms • GO terms found overall: 27% (3062/11,380)
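
The overall GO figure can be checked against the counts given earlier in the talk. The sketch below only redoes the arithmetic from the slides; `coverage` is a hypothetical helper name.

```python
def coverage(mapped: int, total: int) -> float:
    """Percentage of items that mapped."""
    return 100.0 * mapped / total


# Term counts per GO ontology, as stated on the Gene Ontology background slide.
go_terms = {
    "molecular function": 5626,
    "biological process": 4677,
    "cellular component": 1077,
}

total_go = sum(go_terms.values())
print(total_go)                         # 11380, the total used in the study
print(round(coverage(3062, total_go)))  # 27, the overall GO coverage in percent
```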

  13. Results - 4 • For 134,646 unique gene names in GO database

  14. Results - 5 • LocusLink – UMLS Relationship Categories found overall: 72%

  15. Results - 5 Type of Relationship • Associative 613 • Co-occurrence 3353 • Hierarchical 1168

  16. Results - 6 • Representation of gene function in GO compared to the UMLS

  17. Neurofibromin 2 – merlin in GO

  18. Discussion

  19. Best & Worst Mappings Best mapping categories • Molecular function (GO) 44% • Cellular component (GO) 35% • Phenotype (LL) 34% Worst mapping categories • Gene synonym (GO) 6% • Biological process (GO) 5% • Gene symbol (GO) 2%

  20. Only 34% of diseases? In OMIM-LL, diseases are subdivided by genetic causes but not in UMLS E.g. Limb Girdle Muscular Dystrophy LGMD is represented in UMLS • A SNOMED term • in MeSH it is an entry term for muscular dystrophies • MeSH notes for MD: A general term for a group of inherited disorders which are characterized by progressive degeneration of skeletal muscles (ed, 2000)

  21. Limb Girdle Muscular Dystrophy – genetic types

  22. Only 5% of Biological Processes? • Only 256 of the biological processes mapped to terms in UMLS. • In GO, processes are elaborated & organism specific • Example: UMLS - Mitotic spindle • GO • Mitotic spindle assembly • Mitotic spindle assembly (sensu Saccharomyces) • Mitotic spindle assembly (sensu Fungi) • Mitotic spindle checkpoint • Mitotic spindle elongation • Mitotic spindle orientation • Mitotic spindle positioning • Mitotic spindle positioning and orientation
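
The mismatch in granularity can be illustrated with the terms on this slide. The sketch below is an assumption-laden illustration, not the study's matching code: it uses a case-insensitive exact comparison as a stand-in for the real mapping, and `strip_sensu` is a hypothetical helper that removes the organism qualifier.

```python
import re

umls_term = "Mitotic spindle"

# GO biological process terms listed on the slide.
go_terms = [
    "Mitotic spindle assembly",
    "Mitotic spindle assembly (sensu Saccharomyces)",
    "Mitotic spindle assembly (sensu Fungi)",
    "Mitotic spindle checkpoint",
    "Mitotic spindle elongation",
    "Mitotic spindle orientation",
    "Mitotic spindle positioning",
    "Mitotic spindle positioning and orientation",
]


def strip_sensu(term: str) -> str:
    """Remove an organism qualifier such as '(sensu Fungi)' from a GO term."""
    return re.sub(r"\s*\(sensu [^)]*\)", "", term).strip()


# No GO term equals the UMLS term, although every one of them elaborates on it.
exact = [t for t in go_terms if t.lower() == umls_term.lower()]
related = [t for t in go_terms if t.lower().startswith(umls_term.lower())]

# Dropping organism qualifiers merges the three "assembly" variants into one process.
distinct = sorted({strip_sensu(t) for t in go_terms})

print(len(exact), len(related), len(distinct))  # 0 8 6
```

Even after the organism-specific variants are merged, six distinct processes remain for a single UMLS term, which is consistent with the low mapping rate reported for biological processes.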

  23. Why so few gene names and synonyms mapped? • Official gene names have metadata and comments. • dystrophin (muscular dystrophy, Duchenne and Becker types), includes DXS143, DXS164, DXS206, DXS230, DXS239, DXS268, DXS269, DXS270, DXS272 • No single source has all names and synonyms • GO synonym field contains the IPI number for well-known genes, which does not match UMLS (a useful cross-reference but not a synonym) • Symbols are short acronyms and match poorly
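
To see why the embedded metadata hurts matching, consider what a name looks like before and after the comments are removed. The sketch below is purely illustrative: `clean_gene_name` is a hypothetical heuristic, not something the study reports using, and the example name is a shortened version of the one on the slide.

```python
import re


def clean_gene_name(official_name: str) -> str:
    """Hypothetical heuristic: drop a trailing ', includes ...' list and parenthetical comments."""
    name = re.sub(r",\s*includes\b.*$", "", official_name)
    name = re.sub(r"\s*\([^)]*\)", "", name)
    return name.strip()


# Shortened form of the official name shown on the slide.
official = "dystrophin (muscular dystrophy, Duchenne and Becker types), includes DXS143, DXS164"

print(clean_gene_name(official))  # dystrophin
```

The full string is unlikely to appear verbatim in a terminology, whereas the bare name has a chance of matching. Such stripping also discards information, so it would need checking before use.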

  24. Summary 1 • UMLS needs improvement in molecular biology domain but has considerable content: • 27% of GO concepts map • 34% of single gene diseases • Existing UMLS terms come primarily from MeSH and SNOMED • Overall, positive mapping for 13,000 terms

  25. Summary continued • If the terms are in UMLS, it is possible to find a relationship between genes and phenotypes and gene function much of the time. • UMLS does better with the human genes (20%+) than with genes from all organisms (11%) • UMLS and GO representations complement each other.
