The Relation Ontology. Barry Smith. Concepts, Types and Frames.
The Relation Ontology Barry Smith
[Diagram: TLR-2 signalling pathway. Nodes: TLR2:MyD88 complex, TLR2-MyD88 binding, TLR2, MyD88, LTA binding, TLR2-TLR2 ligand binding, TIR-TIR binding, TIR domain, process. Relation labels: has_output, has_disposition, has_participant, regulated_by, preceded_by, has_lower_level_granularity, has_part]
How to do biology across the genome? • [Slide shows a long raw amino-acid sequence, beginning MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVV...]
what cellular component? what molecular function? what biological process?
GO used in curation of literature • what cellular component? • what molecular function? • what biological process?
and in integration of databases • GlyProt • MouseEcotope • DiabetInGene • GluChem • GO term: sphingolipid transporter activity
The GO Idea • GlyProt • MouseEcotope • DiabetInGene • GluChem • GO term: Holliday junction helicase complex
The GO Idea • GlyProt • MouseEcotope • DiabetInGene • GluChem • GO term: sphingolipid transporter activity
GO used in reasoning (Clark et al., 2005) • is_a • part_of
GO provides a controlled system of representations for use in annotating data • multi-species • multi-disciplinary • multi-granularity, from molecules to populations
Gene products involved in cardiac muscle development in humans
$100 million invested in literature curation using GO • over 11 million annotations relating gene products described in UniProt, Ensembl and other databases to terms in the GO
GO allows a new kind of biological research based on analysis and comparison of the massive quantities of annotations linking GO terms to the gene products described in scientific literature and in scientific databases
GO is amazingly successful in overcoming data silo problems • but it covers only • cellular components • molecular functions • biological processes
The OBO Foundry – to extend the GO to enable intelligent integration of gigantic bodies of heterogeneous data across the entire domain of the life sciences, including clinical medicine – to create an evolving, map-like, computable representation of the entire domain of biological and medical reality
The OBO Foundry Initial Candidate Members • GO Gene Ontology • CL Cell Ontology • SO Sequence Ontology • ChEBI Chemical Entities of Biological Interest • PATO Phenotype (Quality) Ontology • FMA Foundational Model of Anatomy • CARO Common Anatomy Reference Ontology • PRO Protein Ontology
The OBO Foundry Under development • Disease Ontology • Infectious Disease Ontology • Mammalian Phenotype Ontology • Plant Trait Ontology • Environment Ontology • Ontology for Biomedical Investigations • Behavior Ontology • RNA Ontology • RO Relation Ontology
A success story in top-down information integration • Ontologies configured as extensions of a single upper level ontology (BFO) • Used by 100s of researchers to promote interoperability of experimental data in scores of high-throughput domains of biology and medicine via semantic annotation
The linguistic approach • Bottom-up, focused on linguistic properties manifested by the contents of a large corpus viewed from a cognitive perspective (mapping/modeling meanings or concepts rather than entities in reality)
Automatic mining of “associations” from MEDLINE • FACTA: Finding Associated Concepts with Text Analysis • What diseases are related to a particular chemical? • What proteins are related to a particular disease? • http://text0.mib.man.ac.uk/software/facta/
For the linguistic approach • fiction may be no less important than fact • English has no privileged status (the larger the corpus, the better) • consistency (and thus additivity) of annotations is not important, because cognitive perspectives differ • goal is automatic generation of semantic annotations via pattern-matching
For the scientific approach • factual discourse alone is important • English is the lingua franca • regimentation is allowed • goal of truth: to create a single computer-processable map of reality via painstaking Handarbeit (manual work) • truth is one, so we strive for consistency of annotations
The linguistic approach is concerned with knowledge representation • The scientific approach is concerned with reality representation
Relation Ontology • supports consistent linkage of OBO Foundry ontologies through a common system of formally defined relations • to enable reasoning both within and across ontologies, and thus also within and between the literature annotated in its terms
Relation Ontology • instance_of • is_a (= is a subtype of) • depends_on • part_of • inheres_in • has_input • has_participant • …. • http://obofoundry.org/ro/
Basic Formal Ontology (BFO) • Continuant • Independent Continuant • Dependent Continuant • Occurrent (Process, Event) • http://ifomis.uni-saarland.de/bfo/
Fundamental Dichotomy • Continuants preserve their identity through change • Occurrents (aka processes) • have temporal parts • unfold themselves in successive phases • exist only in their phases • have all their parts of necessity
instance_of • [Diagram: the BFO types (Continuant, with Independent Continuant "thing" and Dependent Continuant "quality"; Occurrent "process, event") shown above their instances, linked by instance_of]
types vs. instances • compare OWL: T-box vs. A-box • (terminology vs. assertions)
3 kinds of (binary) relations • Between types • human is_a mammal • human heart part_of human • Between an instance and a type • this human instance_of the type human • this human allergic_to the type Tamiflu • Between instances • Mary’s heart part_of Mary • Mary’s aorta connected_to Mary’s heart
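The three kinds of relation listed above can be told apart mechanically once types and instances are kept in separate classes. The following is a minimal Python sketch, not part of the Relation Ontology itself; the class names `TypeNode`, `InstanceNode` and `Assertion` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TypeNode:
    """A type (universal), e.g. 'human'. Hypothetical helper class."""
    name: str


@dataclass(frozen=True)
class InstanceNode:
    """An instance (particular), e.g. 'Mary'. Hypothetical helper class."""
    name: str


@dataclass(frozen=True)
class Assertion:
    """A binary relational assertion: subject, relation name, target."""
    subject: object
    relation: str
    target: object

    def kind(self) -> str:
        # Classify the assertion by what stands on each side of the relation.
        pair = (type(self.subject), type(self.target))
        if pair == (TypeNode, TypeNode):
            return "type-type"
        if pair == (InstanceNode, TypeNode):
            return "instance-type"
        if pair == (InstanceNode, InstanceNode):
            return "instance-instance"
        raise ValueError("not one of the three kinds listed on the slide")


human, mammal = TypeNode("human"), TypeNode("mammal")
mary, marys_heart = InstanceNode("Mary"), InstanceNode("Mary's heart")

examples = [
    Assertion(human, "is_a", mammal),          # between types
    Assertion(mary, "instance_of", human),     # between an instance and a type
    Assertion(marys_heart, "part_of", mary),   # between instances
]
kinds = [a.kind() for a in examples]
```

Keeping the two node classes distinct means that part_of between types and part_of between instances cannot be confused, even though they share a name.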
depends_on • [Diagram: BFO type hierarchy (Continuant, with Independent Continuant "thing" and Dependent Continuant "quality"; Occurrent "process, event")] • quality depends on bearer
Dependent continuants • the whiteness quality of this cheese • your role as lecturer • the disposition of this peach to ripen
depends_on • [Diagram: BFO type hierarchy (Continuant, with Independent Continuant "thing" and Dependent Continuant "quality"; Occurrent "process")] • temperature depends on bearer
depends_on • [Diagram: BFO type hierarchy (Continuant, with Independent Continuant "thing" and Dependent Continuant "quality, …"; Occurrent "process, event")] • event depends on participant
Type-level relations presuppose the underlying instance-level relations • A is_a B =def. A and B are types and all instances of A are instances of B • A part_of B =def. A and B are types and every instance of A is instance_level_part_of some instance of B
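The two definitions above can be checked directly against a finite set of instances. This is a minimal Python sketch under stated assumptions: the instance names (`m1`, `c1`, `mary`, `rex`) and the dictionary layout are invented for illustration, and a real ontology makes these claims about all instances, not a finite sample.

```python
def is_a(a, b, instance_of):
    """A is_a B: all instances of A are instances of B."""
    return instance_of[a] <= instance_of[b]


def part_of(a, b, instance_of, instance_part_of):
    """A part_of B (All-Some): every instance of A is an
    instance-level part of some instance of B."""
    return all(
        any((x, y) in instance_part_of for y in instance_of[b])
        for x in instance_of[a]
    )


# Hypothetical toy data: type name -> set of instance names.
instance_of = {
    "human": {"mary"},
    "mammal": {"mary", "rex"},
    "cell membrane": {"m1", "m2"},
    "cell": {"c1", "c2"},
}
# Instance-level parthood as (part, whole) pairs.
instance_part_of = {("m1", "c1"), ("m2", "c2")}

human_is_mammal = is_a("human", "mammal", instance_of)
mammal_is_human = is_a("mammal", "human", instance_of)
membrane_part_of_cell = part_of("cell membrane", "cell", instance_of, instance_part_of)
```

Note that in this sketch a type with no recorded instances satisfies both checks vacuously, which is a property of the finite check rather than of the definitions on the slide.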
The assertions linking terms in ontologies must hold universally. • Hence all type-level relations in RO are provided with All-Some definitions. • (For linguists, Some-Some relations are equally important.)
Including only All-Some relations means: • All relations are evaluable as • Transitive • Symmetric • Reflexive • Anti-Symmetric • All relations support logical reasoning, as contrasted with: is_related_to, is_associated_with, is_narrower_in_meaning_than …
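What it means for a relation to be "evaluable" for these four properties can be illustrated on a small finite relation given as a set of pairs. The sketch below is illustrative only; the three-term is_a fragment, including the link from DNA binding protein to protein and the reflexive pairs, is an assumption of the example rather than something stated on the slides.

```python
def is_reflexive(rel, domain):
    """Every element is related to itself."""
    return all((x, x) in rel for x in domain)


def is_symmetric(rel):
    """Whenever x R y holds, y R x holds too."""
    return all((y, x) in rel for (x, y) in rel)


def is_antisymmetric(rel):
    """x R y and y R x together force x == y."""
    return all(x == y or (y, x) not in rel for (x, y) in rel)


def is_transitive(rel):
    """x R y and y R w together force x R w."""
    return all((x, w) in rel for (x, y) in rel for (z, w) in rel if y == z)


# Hypothetical is_a fragment, already closed under reflexivity and transitivity.
domain = {"transcription factor", "DNA binding protein", "protein"}
is_a_rel = {(t, t) for t in domain} | {
    ("transcription factor", "DNA binding protein"),
    ("DNA binding protein", "protein"),
    ("transcription factor", "protein"),
}

properties = {
    "reflexive": is_reflexive(is_a_rel, domain),
    "symmetric": is_symmetric(is_a_rel),
    "antisymmetric": is_antisymmetric(is_a_rel),
    "transitive": is_transitive(is_a_rel),
}
```

A loose relation such as is_associated_with gives no fixed answer to any of these four checks, which is the contrast the slide draws.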
Reasoning should be able to cascade from one relational assertion (A R1 B) to the next (B R2 C). • "Find all DNA binding proteins" should also find all transcription factor proteins, because • Transcription factor is_a DNA binding protein • Only the All-Some structure guarantees such cascading of relational assertions
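The query example above amounts to expanding a search term to all of its is_a subtypes before matching annotations. A minimal Python sketch follows; the gene product names (`protein_X`, `protein_Y`, `protein_Z`), the `kinase` term and the function names are hypothetical and used only for illustration.

```python
def subtypes(term, is_a_edges):
    """Return the term together with every type that reaches it via is_a."""
    children = {}
    for child, parent in is_a_edges:
        children.setdefault(parent, set()).add(child)
    found, stack = {term}, [term]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def find_all(term, is_a_edges, annotations):
    """Find gene products annotated with the term or any of its subtypes."""
    wanted = subtypes(term, is_a_edges)
    return sorted(product for product, t in annotations if t in wanted)


# (child type, parent type) pairs; the single edge comes from the slide.
is_a_edges = [("transcription factor", "DNA binding protein")]

# Hypothetical annotations: (gene product, ontology term).
annotations = [
    ("protein_X", "transcription factor"),
    ("protein_Y", "DNA binding protein"),
    ("protein_Z", "kinase"),
]

hits = find_all("DNA binding protein", is_a_edges, annotations)
```

The expansion is only safe because is_a holds for all instances of the subtype; with a Some-Some reading, `protein_X` could not be returned with certainty.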
Organisms are continuants • they are entities which endure through time through gain and loss of parts • Processes are occurrents • they are entities which unfold through time, and have all their parts as a matter of necessity
human testis part_of adult human being • but not • human being has_part human testis • and not even • male human being has_part human testis
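The asymmetry above follows from the All-Some reading: part_of quantifies over all instances of the part type, while has_part quantifies over all instances of the whole type. The Python sketch below shows this on invented data; the individuals `adam`, `eve` and `bob` (with `bob` standing for a male human being without a testis) are hypothetical.

```python
def all_some(a, b, instance_of, pairs):
    """All-Some pattern: every instance of A stands in the
    instance-level relation (given as pairs) to some instance of B."""
    return all(
        any((x, y) in pairs for y in instance_of[b])
        for x in instance_of[a]
    )


# Hypothetical toy data: type name -> set of instance names.
instance_of = {
    "human testis": {"testis_1"},
    "adult human being": {"adam", "eve", "bob"},
    "human being": {"adam", "eve", "bob"},
    "male human being": {"adam", "bob"},
}

# Instance-level parthood as (part, whole); has_part is its inverse.
part_pairs = {("testis_1", "adam")}
has_part_pairs = {(whole, part) for part, whole in part_pairs}

testis_part_of_adult = all_some(
    "human testis", "adult human being", instance_of, part_pairs)
human_has_part_testis = all_some(
    "human being", "human testis", instance_of, has_part_pairs)
male_has_part_testis = all_some(
    "male human being", "human testis", instance_of, has_part_pairs)
```

At the instance level has_part is simply the inverse of part_of, but at the type level the two All-Some assertions are independent claims.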
part_of for continuant types • A part_of B =def. • For all x, t: if x instance_of A at t, then there is some y such that y instance_of B at t and x instance_level_part_of y at t • Example: cell membrane part_of cell
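Because continuants can gain and lose parts, the definition above is indexed by time. A minimal Python sketch of the time-indexed check is given below; the instance names, the integer time points and the triple layout are assumptions made for the example.

```python
def continuant_part_of(a, b, instance_of_at, part_of_at):
    """For all x, t: if x instance_of A at t, then some y exists with
    y instance_of B at t and x instance-level part of y at t."""
    return all(
        any(
            (y, b, t) in instance_of_at and (x, y, t) in part_of_at
            for (y, _type, _time) in instance_of_at
        )
        for (x, type_name, t) in instance_of_at
        if type_name == a
    )


# Hypothetical data: (instance, type, time) and (part, whole, time) triples.
instance_of_at = {
    ("m1", "cell membrane", 1), ("c1", "cell", 1),
    ("m1", "cell membrane", 2), ("c1", "cell", 2),
}
part_of_at = {("m1", "c1", 1), ("m1", "c1", 2)}

holds = continuant_part_of("cell membrane", "cell", instance_of_at, part_of_at)

# Dropping parthood at time 2 breaks the universal claim.
holds_after_loss = continuant_part_of(
    "cell membrane", "cell", instance_of_at, {("m1", "c1", 1)})
```

The second call shows why the time index matters: a membrane that is part of a cell at one time but not at another falsifies the type-level assertion.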
part_of for occurrent types • A part_of B =def. • For all x: if x instance_of A, then there is some y such that y instance_of B and x instance_level_part_of y • EVERY A IS PART OF SOME B
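The two part_of definitions can be written side by side in first-order notation. This is a sketch in which the predicate names inst and part are shorthand introduced here for instance_of and instance_level_part_of; they are not notation taken from the slides.

```latex
% Continuant types: instantiation and parthood are time-indexed.
A \;\mathrm{part\_of}\; B \;=_{\mathrm{def}}\;
  \forall x\,\forall t\,\bigl[\mathrm{inst}(x, A, t) \rightarrow
  \exists y\,\bigl(\mathrm{inst}(y, B, t) \wedge \mathrm{part}(x, y, t)\bigr)\bigr]

% Occurrent types: no time index is needed.
A \;\mathrm{part\_of}\; B \;=_{\mathrm{def}}\;
  \forall x\,\bigl[\mathrm{inst}(x, A) \rightarrow
  \exists y\,\bigl(\mathrm{inst}(y, B) \wedge \mathrm{part}(x, y)\bigr)\bigr]
```

The only difference is the time argument: occurrents have all their parts of necessity, so their parthood does not vary over time.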