
How Ontologies Create Research Communities



Presentation Transcript


  1. How Ontologies Create Research Communities Barry Smith University at Buffalo http://ontology.buffalo.edu/smith

  2. Who am I? NCBO: National Center for Biomedical Ontology (NIH Roadmap Center) • Stanford Medical Informatics • University of California, San Francisco Medical Center • Berkeley Drosophila Genome Project • Cambridge University Department of Genetics • The Mayo Clinic • University at Buffalo Department of Philosophy

  3. Who am I? NYS Center of Excellence in Bioinformatics and Life Sciences Ontology Research Group Buffalo Clinical and Translational Science Institute (CTSI) Duke/Dallas/Houston CTSA Ontology Consortium

  4. Who am I? Cleveland Clinic Semantic Database Gene Ontology Ontology for Biomedical Investigations Open Biomedical Ontologies Consortium Institute for Formal Ontology and Medical Information Science BIRN Ontology Task Force ...

  5. Multiple kinds of data in multiple kinds of silos • Lab / pathology data • Electronic Health Record data • Clinical trial data • Patient histories • Medical imaging • Microarray data • Protein chip data • Flow cytometry • Mass spec • Genotype / SNP data

  6. How to find your data? How to find other people’s data? How to reason with data when you find it? How to work out what data does not yet exist?

  7. Multiple kinds of standardization for data • Terminologies (SNOMED, UMLS) • CDEs (Clinical research) • Information Exchange Standards (HL7 RIM) • LIMS (LOINC) • MGED standards for microarray data, etc.

  8. how to solve the problem of making such data queryable and re-usable by others, so as to address NIH mandates? part of the solution must involve: standardized terminologies and coding schemes

  9. most successful, thus far: UMLS • collection of separate terminologies built by trained experts • massively useful for information retrieval and information integration • UMLS Metathesaurus: a system of post hoc mappings between overlapping source vocabularies

  10. for UMLS • local usage respected • regimentation frowned upon • cross-framework consistency not important • no concern to establish consistency with basic science • different grades of formal rigor, different degrees of completeness, different update policies

  11. caBIG approach: BRIDG (top-down imposition)

  12. A new approach, for science: where do you find scientifically validated information linking gene products and other entities represented in biochemical databases to semantically meaningful terms pertaining to disease, anatomy, development in different model organisms?

  13. where in the body ? where in the cell ?

  14. where in the body ? where in the cell ? what kind of organism ?

  15. where in the body ? where in the cell ? what kind of organism ? what kind of disease process ?

  16. we need semantic annotation of data = we need ontologies

  17. = natural language labels designed for use in annotations to make the data cognitively accessible to human beings and algorithmically tractable to computers

  18. compare: legends for maps

  19. compare: legends for maps common legends allow (cross-border) integration

  20. ontologies are legends for data

  21. ontologies = high quality controlled structured vocabularies for the annotation (description) of data

  22. compare: legends for diagrams

  23. or chemistry diagrams: legends for chemistry diagrams. Prasanna et al., Chemical Compound Navigator: A Web-Based Chem-BLAST, Chemical Taxonomy-Based Search Engine for Browsing Compounds. PROTEINS: Structure, Function, and Bioinformatics 63:907–917 (2006)

  24. Ramirez et al. Linking of Digital Images to Phylogenetic Data Matrices Using a Morphological Ontology Syst. Biol. 56(2):283–294, 2007

  25. computationally tractable legends • help integrate complex representations of reality • help human beings find things in complex representations of reality • help computers reason with complex representations of reality

  26. The Gene Ontology

  27. what cellular component? what molecular function? what biological process?

  28. The Idea of Common Controlled Vocabularies GlyProt MouseEcotope sphingolipid transporter activity DiabetInGene GluChem

  29. The Network Effects of Synchronization GlyProt MouseEcotope Holliday junction helicase complex DiabetInGene GluChem
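A minimal sketch of the synchronization idea on slides 28 and 29, assuming that each database tags its entries with terms from one shared vocabulary. The database names and the two terms come from the slides; the entry IDs and the `build_index` helper are hypothetical, invented purely for illustration.

```python
from collections import defaultdict

# Hypothetical entries: database names and terms are taken from the slides,
# the entry IDs are invented for illustration only.
ANNOTATIONS = {
    "GlyProt": [("GP-001", "Holliday junction helicase complex")],
    "MouseEcotope": [
        ("ME-042", "Holliday junction helicase complex"),
        ("ME-007", "sphingolipid transporter activity"),
    ],
    "DiabetInGene": [("DG-113", "sphingolipid transporter activity")],
    "GluChem": [("GC-250", "Holliday junction helicase complex")],
}


def build_index(annotations):
    """Map each shared term to every (database, entry) annotated with it."""
    index = defaultdict(list)
    for database, rows in annotations.items():
        for entry_id, term in rows:
            index[term].append((database, entry_id))
    return index


index = build_index(ANNOTATIONS)
hits = index["Holliday junction helicase complex"]
```

Because all four databases use the same label, one lookup reaches entries in each of them; with locally coined labels the same query would need a separate mapping per database.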

  30. Five bangs for your GO buck • based in biological science • incremental approach (evidence-based evolutionary pathway) • cross-species data comparability (human, mouse, yeast, fly ...) • cross-granularity data integration (molecule, cell, organ, organism) • cumulation of scientific knowledge in algorithmically tractable form, links people to software

  31. The methodology of annotations Model organism databases employ scientific curators who use the experimental observations reported in the biomedical literature to associate GO terms with entries in gene product and other molecular biology databases ($4 mill. p.a. NIH funding)
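One way to picture what a curator produces under this methodology is sketched below. It assumes a simplified record with four fields; the `Annotation` class, the `summarize` helper, the gene product ID `EX:0001` and the reference strings are all hypothetical, and real model organism databases use richer formats. The three term labels are the ones that appear on the slides.

```python
from dataclasses import dataclass

# The three questions GO answers about a gene product.
ASPECTS = ("cellular component", "molecular function", "biological process")


@dataclass(frozen=True)
class Annotation:
    gene_product: str  # hypothetical database entry ID
    term: str          # GO term label chosen by the curator
    aspect: str        # one of ASPECTS
    reference: str     # literature source for the experimental observation


def summarize(annotations, gene_product):
    """Group the terms attached to one gene product by GO aspect."""
    summary = {aspect: [] for aspect in ASPECTS}
    for annotation in annotations:
        if annotation.gene_product == gene_product:
            summary[annotation.aspect].append(annotation.term)
    return summary


# Invented records: only the term labels come from the slides.
CURATED = [
    Annotation("EX:0001", "Holliday junction helicase complex",
               "cellular component", "placeholder reference A"),
    Annotation("EX:0001", "sphingolipid transporter activity",
               "molecular function", "placeholder reference B"),
    Annotation("EX:0002", "osteoblast differentiation",
               "biological process", "placeholder reference C"),
]

summary = summarize(CURATED, "EX:0001")
```

Keeping the literature reference on every record is what makes the pathway evidence-based: each association can be traced back to a reported observation.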

  32. what cellular component? what molecular function? what biological process?

  33. How to extend the GO methodology to other domains of clinical and translational medicine?

  34. the problem • existing clinical vocabularies are of variable quality and low mutual consistency • current proliferation of tiny ontologies by different groups with urgent annotation needs

  35. http://ontologist.com

  36. the solution: establish common rules governing best practices for creating ontologies in coordinated fashion, with an evidence-based pathway to incremental improvement

  37. First step (2003) a shared portal for (so far) 58 ontologies (low regimentation) http://obo.sourceforge.net NCBO BioPortal

  38. OBO now the principal entry point for creation of web-accessible biomedical data • OBO and OBOEdit low-tech to encourage users • Simple (web-service-based) tools created to support the work of biologists in creating annotations (data entry) • OBO ↔ OWL DL converters make OBO Foundry annotated data immediately accessible to Semantic Web data integration projects

  39. Second step (2004): reform efforts initiated, e.g. linking GO formally to other ontologies and data sources
      id: CL:0000062
      name: osteoblast
      def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
      is_a: CL:0000055
      relationship: develops_from CL:0000008
      relationship: develops_from CL:0000375
      GO + Cell type = New Definition
      Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
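The osteoblast stanza on slide 39 is in the tag-value style of the OBO flat-file format, which is simple enough to read with a few lines of code. The following is a minimal sketch, assuming one `tag: value` pair per line and no trailing modifiers or cross-references; `parse_obo_stanza` is a hypothetical helper, not part of any OBO tool.

```python
def parse_obo_stanza(text: str) -> dict:
    """Parse one simplified OBO term stanza into a dictionary.

    Assumes one 'tag: value' pair per line; repeated tags (is_a,
    relationship) are collected into lists.
    """
    term = {"is_a": [], "relationship": []}
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line:
            continue
        tag, _, value = line.partition(": ")
        if tag == "is_a":
            term["is_a"].append(value)
        elif tag == "relationship":
            # e.g. "develops_from CL:0000008" -> ("develops_from", "CL:0000008")
            relation, target = value.split(maxsplit=1)
            term["relationship"].append((relation, target))
        elif tag == "def":
            term["def"] = value.strip('"')
        else:
            term[tag] = value
    return term


# Stanza copied from slide 39.
STANZA = '''
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
'''

osteoblast = parse_obo_stanza(STANZA)
```

Once the `develops_from` links are available as data, a definition such as the one for osteoblast differentiation can refer to the cell type terms by ID instead of restating them in free text.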

  40. Third step (2006): The OBO Foundry http://obofoundry.org/

  41. Building out from the original GO

  42. RELATION TO TIME GRANULARITY initial OBO Foundry coverage

  43. CRITERIA • The ontology is open and available to be used by all. • The ontology is in, or can be instantiated in, a common formal language. • The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap.

  44. CRITERIA • UPDATE: The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement. • ORTHOGONALITY: They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary.
