
Ontology (Science) vs. Ontology (Engineering)

Ontology (Science) vs. Ontology (Engineering). Barry Smith, University at Buffalo, http://ontology.buffalo.edu/smith. Working in ontology since 1975. Working with biomedical ontologists since 2002: Gene Ontology, Protein Ontology, Infectious Disease Ontology.


Presentation Transcript


  1. Ontology (Science) vs. Ontology (Engineering) • Barry Smith • University at Buffalo • http://ontology.buffalo.edu/smith

  2. Working in ontology since 1975 • Working with biomedical ontologists since 2002 • Gene Ontology • Protein Ontology • Infectious Disease Ontology • OBO (Open Biomedical Ontologies) Foundry

  3. NCBO • National Center for Biomedical Ontology • Dissemination and Ontology Best Practices • http://bioontology.org

  4. ICBO • International Conference on Biomedical Ontology • Buffalo, NY. July 24-26, 2009 • http://icbo.buffalo.edu

  5. Example ontologies • Basic Formal Ontology (BFO) • Common Anatomy Reference Ontology (CARO) • Environment Ontology (EnvO) • Foundational Model of Anatomy (FMA) • Infectious Disease Ontology (IDO) • Ontology for Biomedical Investigations (OBI) • Ontology for Clinical Investigations (OCI) • Phenotypic Quality Ontology (PATO) • Relation Ontology (RO)

  6. Multiple kinds of data in multiple kinds of silos • Lab / pathology data • Electronic Health Record data • Clinical trial data • Patient histories • Medical imaging • Microarray data • Protein chip data • Flow cytometry

  7. How to find your data? • How to find other people’s data? • How to reason with data when you find it? • How to understand the significance of the data you collected 3 years earlier? • To solve the silo problem, medical researchers need the help of ontology engineers

  8. Ontologies facilitate retrieval of data by allowing grouping of annotations • Annotation counts: brain 20, hindbrain 15, rhombomere 10 • Query ‘brain’ without ontology: 20 • Query ‘brain’ with ontology: 45
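
The arithmetic on slide 8 can be sketched in a few lines of Python. This is only an illustration: the child-to-parent table and the function names are assumptions made for the example, and the counts are the ones quoted on the slide. With the ontology, a query for ‘brain’ also collects annotations attached to terms that sit below ‘brain’ in the hierarchy.

```python
# Hypothetical child -> parent links (rhombomere under hindbrain under brain),
# assumed for illustration; the slide only gives the three counts.
PARENT = {"hindbrain": "brain", "rhombomere": "hindbrain"}

# Number of annotations attached directly to each term (from slide 8).
ANNOTATIONS = {"brain": 20, "hindbrain": 15, "rhombomere": 10}


def ancestors(term):
    """Return every term reachable by following parent links upward."""
    found = []
    while term in PARENT:
        term = PARENT[term]
        found.append(term)
    return found


def query(term, use_ontology):
    """Count annotations retrieved by a query for `term`."""
    if not use_ontology:
        # Plain string match: only annotations made with the exact term.
        return ANNOTATIONS.get(term, 0)
    # Ontology-aware match: the term itself plus all of its descendants.
    return sum(
        count
        for annotated, count in ANNOTATIONS.items()
        if annotated == term or term in ancestors(annotated)
    )


print(query("brain", use_ontology=False))  # 20
print(query("brain", use_ontology=True))   # 45
```

The second query returns 45 because 20 + 15 + 10 annotations all fall under ‘brain’ once the hierarchy is taken into account.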

  9. Uses of ‘ontology’ in PubMed abstracts

  10. biologists need help from ontology engineers

  11. How to do biology across the genome? MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV

  12. MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGEMKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLC
PRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGE

  13. Gene Ontology: three types of questions • what cellular component? • what molecular function? • what biological process?

  14. Clark et al., 2005 is_a part_of

  15. and through curation of literature what cellular component? what molecular function? what biological process?

  16. The Idea of Common Controlled Vocabularies • GlyProt • MouseEcotope • DiabetInGene • GluChem • shared term: sphingolipid transporter activity

  17. The Idea of Common Controlled Vocabularies • GlyProt • MouseEcotope • DiabetInGene • GluChem • shared term: Holliday junction helicase complex

  18. Gene Ontology: ca. 25,000 nodes, e.g. ‘male courtship behavior, orientation prior to leg tapping and wing vibration’

  19. Benefits of GO • rooted in basic experimental biology • links people to data and to literature • links data to data • across species (human, mouse, yeast, fly ...) • across granularities (molecule, cell, organ, organism, population)

  20. Benefits of GO • links medicine to biological science • allows cumulation of scientific knowledge in algorithmically tractable form LET’S GENERALIZE THESE BENEFITS TO OTHER AREAS OF BIOLOGY AND MEDICINE …

  21. The standard engineering methodology • Pragmatics (‘usefulness’) is everything • Usefulness = we get to write software which runs on our machines

  22. The standard engineering methodology • It is easier to write useful software if one works with a simplified model • (“…we can’t know what reality is like in any case; we only have our concepts…”) • This looks like a useful model to me • (One week goes by:) This other thing looks like a useful model to him • Data in Pittsburgh does not interoperate with data in Vancouver

  23. The standard engineering methodology • Pragmatics (‘usefulness’) is everything → Science is siloed

  24. Why build scientific ontologies • There are many ways to create ontologies • Multiple ontologies only make our data silo problems worse • We need to constrain ontologies so that they converge

  25. Science-based ontology development • Q: What is to serve as constraint in order to avoid silo creation? • A: Reality, as revealed, incrementally, by experimentally-based science

  26. Ontological realism • Find out what the world is like by doing science • Ontology is ineluctably a multi-disciplinary enterprise – it cannot be left to the engineers • Build representations adequate to this world, not to some simplified model in your laptop

  27. In the olden days • people measured lengths using inches, ulnas, perches, king’s feet, Swiss feet, leagues of Portugal, varas of Texas, etc., etc.

  28. on June 22, 1799, in Paris, everything changed

  29. we now have the International System of Units

  30. The SI is a Controlled Vocabulary • Each SI unit is represented by a symbol, not an abbreviation. The use of unit symbols is regulated by precise rules. • The symbols are designed to be the same in every language. • Use of the SI system makes scientific results comparable

  31. The SI is an Ontology • Quantities are universals • one for each measurable dimension of reality • Can we provide an analogue of the SI system for (basic dimensions of) biology?

  32. First step • OBO (Open Biomedical Ontologies) library • comprehends some 70 ontologies • now made available also on the NCBO Bioportal • the majority of these ontologies are built to work well with the Gene Ontology

  33. Goal of the OBO Foundry • all biomedical research data should cumulate to form a single, algorithmically processable whole • Smith et al., Nature Biotechnology, Nov 2007

  34. Goal of the OBO Foundry • to provide a suite of controlled structured vocabularies for the calibrated annotation of data to support integration and algorithmic reasoning across the entire domain of biomedicine • as biomedical knowledge grows, these ontologies must be evolved in tandem

  35. The Gene Ontology within the OBO Foundry

  36. The ontology is open and available to be used by all. • The ontology is instantiated in a common formal language and shares a common formal architecture. • The developers of the ontology agree in advance to collaborate with developers of other OBO Foundry ontologies where domains overlap. OBO FOUNDRY CRITERIA

  37. The developers of each ontology commit to its maintenance in light of scientific advance, and to soliciting community feedback for its improvement. • They commit to working with other Foundry members to ensure that, for any particular domain, there is community convergence on a single controlled vocabulary. • http://obofoundry.org OBO FOUNDRY CRITERIA

  38. Orthogonality = modularity • one ontology for each domain • no need for semantic matching • no need for ontology integration • no need for mappings (which are in any case too expensive, too fragile, very difficult to keep up-to-date as mapped ontologies change)
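
The ‘one ontology for each domain’ rule on slide 38 can be pictured as a simple check over a registry. The sketch below is an assumption-laden illustration, not an OBO Foundry tool: the registry layout, the domain labels and the ‘LocalAnatomy’ entry are all hypothetical.

```python
from collections import defaultdict


def overlapping_domains(registry):
    """Return the domains claimed by more than one ontology.

    `registry` maps an ontology name to the set of domain labels it covers.
    An empty result means the registry is orthogonal in the slide's sense.
    """
    owners = defaultdict(list)
    for ontology, domains in registry.items():
        for domain in domains:
            owners[domain].append(ontology)
    return {d: sorted(names) for d, names in owners.items() if len(names) > 1}


# Hypothetical registry: two ontologies both claim the anatomy domain.
registry = {
    "FMA": {"anatomy"},
    "PATO": {"phenotypic quality"},
    "LocalAnatomy": {"anatomy"},  # invented project-specific ontology
}

print(overlapping_domains(registry))
```

Any domain reported here is one where mappings between competing ontologies would otherwise have to be built and maintained.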

  39. Orthogonality • is our best (perhaps our only) hope of solving the data silo problem • Why do computer engineers hate orthogonality (and like ‘relativism’ – every project its own, new ontology) – so much?

  40. All OBO Foundry ontologies work in the same way • we have data (biosample, haplotype, clinical data, survey data, ...) • we need to make this data available for more than just string-based search, and for algorithmic processing • we create a consensus-based ontology for annotating the data

  41. We have data • BioHealthBase: Tuberculosis Database • VFDB: Virulence Factor DB • TropNetEurop: Dengue Case Data • BioHealthBase: Influenza Database • PathPort: Pathogen Portal Project • IMBB: Malaria Data

  42. We need to annotate this data to allow retrieval and integration of • sequence and protein data for pathogens • case report data for patients • clinical trial data for drugs, vaccines • epidemiological data for surveillance, prevention • ... Goal: to make data deriving from different sources comparable and computable

  43. We need common controlled vocabularies to describe these data in ways that will assure comparability and cumulation • What content is needed to adequately cover the infectious disease domain? • Host-related terms (e.g. carrier, susceptibility) • Pathogen-related terms (e.g. virulence) • Vector-related terms (e.g. reservoir) • Terms for the biology of disease pathogenesis (e.g. evasion of host defense) • Population-level terms (e.g. epidemic, endemic, pandemic)

  44. IDO provides a common template • It contains terms (like ‘pathogen’, ‘vector’, ‘host’) which apply to organisms of all species involved in infectious disease and its transmission • Disease- and organism-specific ontologies are then built as refinements of the IDO core; the common core guarantees some level of comparability of data
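
The core-plus-refinement pattern described on slide 44 might be sketched as follows. The three core terms are the ones named on the slide; the extension tables, their term names and the helper functions are assumptions chosen for illustration and are not taken from IDO itself.

```python
# Core terms named on slide 44.
CORE_TERMS = {"pathogen", "vector", "host"}

# Hypothetical disease-specific extensions: each term names its core parent.
EXTENSIONS = {
    "malaria": {
        "Plasmodium falciparum": "pathogen",
        "Anopheles mosquito": "vector",
    },
    "dengue": {
        "dengue virus": "pathogen",
        "Aedes mosquito": "vector",
    },
}


def to_core(extension, term):
    """Map a disease-specific term to the core term it refines."""
    parent = EXTENSIONS[extension][term]
    if parent not in CORE_TERMS:
        raise ValueError(f"{term!r} does not refine a core term")
    return parent


def group_by_core(records):
    """Group (extension, term) annotations under their shared core term."""
    grouped = {}
    for extension, term in records:
        grouped.setdefault(to_core(extension, term), []).append(term)
    return grouped


records = [
    ("malaria", "Plasmodium falciparum"),
    ("dengue", "dengue virus"),
    ("dengue", "Aedes mosquito"),
]
print(group_by_core(records))
```

Because both extensions hang their terms off the same core, malaria and dengue annotations end up in the same ‘pathogen’ group, which is the kind of comparability the shared core is meant to provide.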

  45. IDO Processes

  46. IDO Qualities

  47. IDO Roles
