An Introduction to Ontology for Evolutionary Biology. Barry Smith. Who am I?. Stanford Medical Informatics University of San Francisco Medical Center The Mayo Clinic University at Buffalo (PI of Dissemination and Ontology Best Practices).
An Introduction to Ontology for Evolutionary Biology Barry Smith
Who am I? • Stanford Medical Informatics • University of San Francisco Medical Center • The Mayo Clinic • University at Buffalo (PI of Dissemination and Ontology Best Practices) NCBO: National Center for Biomedical Ontology (NIH Roadmap Center)
NCBO will offer • Technology for uploading, browsing, and using biomedical ontologies • Methods to make the online “publication” of ontologies more like that of journal articles • Tools to enable the biomedical community to put ontologies to work on a daily basis
Who am I? • Co-PI, Protein Ontology • Advisory Boards of: Ontology for Biomedical Investigations; Cleveland Clinic Semantic Database in Cardiothoracic Surgery; Gene Ontology Scientific Advisory Board; Advancing Clinico-Genomic Trials on Cancer (ACGT)
W-LOV World’s Longest Ontology Video Introduction to Biomedical Ontologies This 8-lecture course provides a basic introduction to ontology, with special reference to applications in the field of biomedical research. It is designed to be of interest to both philosophers and those with a background in the life sciences. 1. What is an ontology and what is it useful for? 2. Basic Formal Ontology: An upper-level ontology for scientific research 3. Open Biomedical Ontologies (OBO) and the Web Ontology Language (OWL) 4. The OBO Relation Ontology 5. An ontological introduction to biomedicine: Defining organism, function and disease 6. The Gene Ontology (GO), the Foundational Model of Anatomy (FMA) and the Infectious Disease Ontology (IDO) 7. The OBO Foundry: A suite of biomedical ontologies to support reasoning and data integration 8. Further applications http://ontology.buffalo.edu/smith/Ontology_Course.html
How to do biology across the genome? MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGEMKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIE
IKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGE
The GO idea: through annotation of data what cellular component? what molecular function? what biological process?
three types of data what cellular component? what molecular function? what biological process?
The GO Idea GlyProt MouseEcotope sphingolipid transporter activity DiabetInGene GluChem
The GO Idea GlyProt MouseEcotope Holliday junction helicase complex DiabetInGene GluChem
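The idea of linking separate databases through shared GO terms can be sketched as a small index. This is only an illustration: the database names and the two GO terms come from the slides, but the record ids and the pairing of records with terms are made up for the example.

```python
from collections import defaultdict

# (database, record id, GO term, aspect)
# Record ids and the record-to-term pairings are hypothetical.
ANNOTATIONS = [
    ("GlyProt", "rec-1", "sphingolipid transporter activity", "molecular function"),
    ("MouseEcotope", "rec-7", "sphingolipid transporter activity", "molecular function"),
    ("DiabetInGene", "rec-3", "Holliday junction helicase complex", "cellular component"),
    ("GluChem", "rec-9", "sphingolipid transporter activity", "molecular function"),
]

def records_by_term(annotations):
    """Group (database, record id) pairs under the GO term they are annotated with."""
    index = defaultdict(list)
    for database, record_id, term, _aspect in annotations:
        index[term].append((database, record_id))
    return dict(index)

index = records_by_term(ANNOTATIONS)
# Records from different databases become comparable because they share a term.
print(index["sphingolipid transporter activity"])
```

Because every database uses the same term string, a query against one term reaches data in all of them without any pairwise mapping between the databases.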
Benefits of GO • rooted in experimental biology • links people to data and to literature • links data to data (comparability) • across species (human, mouse, yeast, fly ...) • across granularities (molecule, cell, organ, organism, population) • links medicine to biological science • serves cumulation of scientific knowledge in algorithmically tractable form
How to extend the GO methodology to other areas of the life sciences? OBO (Open Biomedical Ontologies) created 2001 by Ashburner and Lewis: a shared portal for (so far) 60 ontologies http://obo.sourceforge.net with a common OBO flatfile format
In 2004 reform efforts initiated linking GO to other ontologies and data sources via formal relations
id: CL:0000062
name: osteoblast
def: "A bone-forming cell which secretes an extracellular matrix. Hydroxyapatite crystals are then deposited into the matrix to form bone."
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
GO + Cell type = New Definition
Osteoblast differentiation: Processes whereby an osteoprogenitor cell or a cranial neural crest cell acquires the specialized features of an osteoblast, a bone-forming cell which secretes extracellular matrix.
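The osteoblast entry above uses simple "tag: value" lines, so its formal relations can be read mechanically. The following is a minimal sketch that handles only the lines shown on the slide (the def line is left out for brevity); it is not a full OBO flat-file parser, and the function names are illustrative.

```python
from collections import defaultdict

# Tag-value lines copied from the slide (def line omitted).
STANZA = """\
id: CL:0000062
name: osteoblast
is_a: CL:0000055
relationship: develops_from CL:0000008
relationship: develops_from CL:0000375
"""

def parse_stanza(text):
    """Return a dict mapping each tag to the list of its values."""
    tags = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split at the first ": " so that ids such as CL:0000062 stay intact.
        tag, _, value = line.partition(": ")
        tags[tag].append(value)
    return dict(tags)

def relationships(stanza):
    """Split each 'relationship' value into a (relation, target id) pair."""
    return [tuple(value.split(" ", 1)) for value in stanza.get("relationship", [])]

term = parse_stanza(STANZA)
print(term["name"])
print(relationships(term))
```

Keeping repeated tags as lists matters here, since the stanza carries two develops_from relationships.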
OBO Foundry http://obofoundry.org
Goal: create the ontology resources for evolutionary biology
Administrative/database ontologies • Highly task-dependent – reusability and compatibility not (always) important • Entities may be brought into existence by the ontology itself (convention ...) • If there is no field for gender in our database, then persons do not have gender • Can be secret, local, temporary • Are comparable to software artifacts
Scientific ontologies are comparable to scientific theories • must be open, based on consensus • must be compatible with neighboring scientific ontologies and with results of scientific research • must be stable, evolve gracefully in tandem with the advance of knowledge • must be evidence-based (testable)
Foundry ontologies are scientific ontologies • Every representational unit in the ontology must be such that the developers believe it to refer to some entity on the basis of the best current scientific evidence • Important role of instances that we can observe in the laboratory
Ontologies are like science texts – they are representations of what is general in reality, aka universals, kinds, types, categories, species, genera, ...
A central distinction • universal vs. instance • (catalog vs. inventory) • (science text vs. diary) • (human being vs. Arnold Schwarzenegger)
For scientific ontologies • it is generalizations (universals) that are important • For databases it is (normally) instances that are important • = particulars in reality: • mouse #0000000001 • tail #000000004 • video image #23300014, etc.
Ontologies are representations of what is general in reality, aka universals, kinds, types, categories, species, genera, ... Instances in reality are linked to universals via the instance_of relation
The distinction between universals and instances allows us to provide clear formal definitions of the relations which connect ontology terms A is_a B =def. A is narrower in meaning than B cancer documentation is_a cancer
The distinction between universals and instances allows us to provide clear logical definitions of the relations which connect ontology terms A is_a B =def. every instance of A is an instance of B
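The logical definition of is_a can be checked against recorded instances. The sketch below assumes a toy table of instances per universal (the table itself is hypothetical); a finite sample like this can refute an is_a claim but can never establish it for all instances.

```python
# Hypothetical toy data: each universal maps to the set of its recorded instances.
INSTANCES = {
    "human being": {"Arnold Schwarzenegger", "Mary"},
    "organism": {"Arnold Schwarzenegger", "Mary", "mouse #0000000001"},
}

def is_a(a, b, instances=INSTANCES):
    """A is_a B holds on this data if every instance of A is an instance of B."""
    return instances[a] <= instances[b]

print(is_a("human being", "organism"))
print(is_a("organism", "human being"))
```

The second call fails because the mouse is an organism but not a human being, which is exactly the kind of counterexample the instance-based definition makes available.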
part_of A part_of B =def. every instance of A is an instance-level part of some instance of B Mary’s heart instance-level part of Mary cell nucleus part_of cell
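The universal-level part_of relation can be sketched in the same way on top of instance-level parthood. The instance names and tables below are illustrative assumptions, including a second cell recorded without a nucleus.

```python
# Hypothetical toy data.
INSTANCE_OF = {
    "Mary": "human being",
    "Mary's heart": "heart",
    "cell #1": "cell",
    "cell #2": "cell",  # a cell recorded without a nucleus
    "nucleus #1": "cell nucleus",
}
# Instance-level parthood: (part, whole).
INSTANCE_PART_OF = {("Mary's heart", "Mary"), ("nucleus #1", "cell #1")}

def instances_of(universal):
    return [x for x, u in INSTANCE_OF.items() if u == universal]

def part_of(a, b):
    """A part_of B: every instance of A is an instance-level part of some instance of B."""
    return all(
        any((x, y) in INSTANCE_PART_OF for y in instances_of(b))
        for x in instances_of(a)
    )

print(part_of("cell nucleus", "cell"))
print(part_of("cell", "cell nucleus"))
```

The definition quantifies over instances of A only, so cell nucleus part_of cell holds here even though cell #2 has no nucleus.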
FMA (Foundational Model of Anatomy) [diagram of is_a and part_of links among: Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Subdivision, Organ Component, Organ Cavity, Organ Cavity Subdivision, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Tissue, Pleural Sac, Pleura (Wall of Sac), Pleural Cavity, Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Interlobar recess, Mesothelium of Pleura]
Kinds of relations <universal, universal>: is_a, part_of, ... <instance, universal>: this cell instance_of the universal cell <instance, instance>: Mary’s heart part_of Mary
Foundry principle for definitions Definitions should be of the following form an A =def. a B which Cs where B is the is_a parent of A and C is some differentia Definitions are rooted in the is_a hierarchy
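The definition pattern can be treated as a template filled from the is_a parent and the differentia. In this sketch the Term class and the helper names are hypothetical, and the parent and differentia given for osteoblast are a simplified illustration rather than the Cell Ontology's actual entries.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Term:
    name: str         # A, the term being defined
    parent: str       # B, the is_a parent of A
    differentia: str  # C, what marks out As among the Bs

def article(noun):
    """Pick 'a' or 'an' by first letter (a rough rule, enough for this example)."""
    return "an" if noun[0].lower() in "aeiou" else "a"

def definition(term):
    """Render 'an A =def. a B which Cs'."""
    return (
        f"{article(term.name)} {term.name} =def. "
        f"{article(term.parent)} {term.parent} which {term.differentia}"
    )

# Illustrative values only.
osteoblast = Term("osteoblast", "cell", "forms bone and secretes an extracellular matrix")
print(definition(osteoblast))
```

Since the parent is a required field, a term cannot be given a definition of this form without also being placed in the is_a hierarchy.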
OBO Relation Ontology 1.0 “Relations in Biomedical Ontologies”, Genome Biology, April 2005
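derives_from [diagram of instances along a time axis: c1 at t1 (instance of C1), c at t (instance of C), c' at t (instance of C'); example labels: ovum, zygote, sperm]
transformation_of [diagram: the same instance c, shown at t and at t1 under the universals C1 and C, along a time axis; example pairs: pre-RNA, mature RNA; child, adult; pupa, larva]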
transformation_of C2 transformation_of C1 =def. any instance of C2 was at some earlier time an instance of C1 • fetus transformation_of embryo • pupa transformation_of larva • adult transformation_of child
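The definition of transformation_of involves time, so a check needs a history per instance. The sketch below assumes a hypothetical record of (time, universal) stages for each organism. As with the other instance-based checks, it can only test the claim against the recorded data.

```python
# Hypothetical histories: for each instance, the universal it instantiates at each time.
HISTORIES = {
    "organism #1": [(0, "embryo"), (1, "fetus"), (2, "child"), (3, "adult")],
    "organism #2": [(0, "embryo"), (1, "fetus")],
}

def transformation_of(c2, c1, histories=HISTORIES):
    """C2 transformation_of C1: any instance of C2 was at some earlier time an instance of C1."""
    for stages in histories.values():
        times_as_c2 = [t for t, universal in stages if universal == c2]
        if not times_as_c2:
            continue  # this instance is never a C2, so it cannot be a counterexample
        first_c2 = min(times_as_c2)
        if not any(universal == c1 and t < first_c2 for t, universal in stages):
            return False
    return True

print(transformation_of("fetus", "embryo"))
print(transformation_of("embryo", "fetus"))
```

Note that the check returns True when no instance of C2 is recorded at all, which follows from the universal quantifier in the definition.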
C1 C c at t c at t1 embryological development
fusion: two continuants fuse to form a new continuant [diagram labels: C1, c1 at t1; C, c at t; C', c' at t]
fission: one initial continuant is replaced by two successor continuants [diagram labels: C1, c1 at t1; C, c at t; C2, c2 at t1]
budding: one continuant detaches itself from an initial continuant, which itself continues to exist [diagram labels: C, c at t; c at t1; C1, c1 at t]
capture: one continuant is absorbed by a second continuant [diagram labels: C1, c1 at t1; C, c at t; C', c' at t]
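The four cases (fusion, fission, budding, capture) differ in how many continuants exist before and after, and in whether any of them persists. The sketch below encodes one reading of the slides; in particular it assumes that in capture the absorbing continuant continues to exist, and the identifiers are placeholders.

```python
def classify(before, after):
    """Classify an event from the continuants existing before and after it."""
    before, after = set(before), set(after)
    survivors = before & after  # continuants that exist both before and after
    if len(before) == 2 and len(after) == 1:
        # Two go in, one comes out: absorbed into a survivor, or merged into something new.
        return "capture" if survivors else "fusion"
    if len(before) == 1 and len(after) == 2:
        # One goes in, two come out: the original persists, or it is replaced.
        return "budding" if survivors else "fission"
    return "unclassified"

print(classify({"c1", "c2"}, {"c3"}))   # two continuants replaced by a new one
print(classify({"c1"}, {"c2", "c3"}))   # one continuant replaced by two successors
print(classify({"c1"}, {"c1", "c2"}))   # c1 persists, c2 detaches
print(classify({"c1", "c2"}, {"c1"}))   # c2 absorbed, c1 persists
```

Identity over time carries the distinction: the same counts before and after give different relations depending on whether a continuant survives the event.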
New 'regulates' relations in GO def: "A relation between a process and a process. A regulates B if the unfolding of A affects the frequency, rate or extent of B. A is called the regulating process, B the regulated process." A regulates B =def. A is a process type and B is a process type and every instance of A is such that its unfolding affects the frequency, rate or extent of some instance of B.
Relations proposed for RO 2.0 inheres_in has_input has_function has_quality realization_of directly_descends_from descends_from (CARO) homologous_to (CARO)
An ontology is a representation of universals • We learn about universals in reality from looking at the results of scientific experiments as expressed in the form of scientific theories – which describe, not what is particular in reality, but what is general
A photographic image is a representation of an instance We learn about instances in reality by performing scientific experiments on the basis of scientific hypotheses and describing the results in general terms provided (ideally) by ontologies
Mature OBO Foundry ontologies • Cell Ontology (CL) • Foundational Model of Anatomy (FMA) • Gene Ontology (GO) • Phenotypic Quality Ontology (PATO) • Relation Ontology (RO) • Sequence Ontology (SO)