
@Interontology08, February 27, 2008


Presentation Transcript


  1. @Interontology08, February 27, 2008 The Semantic Web for Scientific Research: A ‘perfect storm’ for the development of Ontology Alan Ruttenberg Principal Scientist

  2. Weather conditions • Open source ethic is mainstream • Beginnings of a viable Semantic Web • Funders: products of public science not optimally used • Burgeoning quality-focused developer community

  3. Beginnings of a viable Semantic Web • Initial standardizations • OWL1.0 (OWL 1.1 WG in progress) • SPARQL • Viable tools • Scalable triple stores e.g. Virtuoso, Oracle… • Reasoners: Pellet, Fact++, CEL, QuOnto…

  4. Funders: Products of public science not optimally used • Both government and philanthropies • Data sharing mandates • Open access publication mandates • Recognition that Ontology can play key role (and funding) • Wonderweb, NCBO, JCOR, (more in Europe, beginnings in Australia, China) • E.g. NIH Ontology grants

  5. Burgeoning quality-focused developer community • W3C Semantic Web for Life Sciences Interest Group • Brings together scientists, medical researchers, science writers and informaticians from academia, government, non-profit organizations - health care, pharmaceuticals and industry vendors • Chartering of second phase in progress • OBO Foundry • Principle-based development of science-based ontologies with the goal of creating a suite of interoperable reference ontologies for biomedicine. • Process and governance are being refined • Groups are lining up to join

  6. Some projects I’m involved in • The challenge of data integration at Web scales • The Neurocommons • Collaborative Ontology Development • OBI – The Ontology for Biomedical Investigations • Identifying and working through aspects of Ontology • Working with, and on, the Basic Formal Ontology • What is a Gene Ontology Annotation?

  7. The Neurocommons Publications CCDB SAO NeuroMorpho OBO Ontologies Neuronbank PDSPki Gene ontology annotations NeuronDB Reactome AddGene Plasmids Coriell cells BAMS Allen Brain Atlas BrainPharm Antibodies Entrez Gene MESH Neurocommons text mining PubChem Mammalian Phenotype SWAN AlzGene Homologene

  8. What’s a (Science) Commons? • Built on open resources: public domain, open databases, open literature • Encoded in open architectures and technical standards

  9. Science Commons • Science Commons is a project of Creative Commons • Creative Commons provides free tools that let authors, scientists, artists, and educators easily mark their creative work with the freedoms they want it to carry • 140,000,000 objects on the Web under CC licenses in 40+ countries • 700+ peer-reviewed journals carry CC licensing, including Public Library of Science • Science Commons specializes CC to science • For consumers of knowledge: make it easy to use and re-use information and increase chances for discovery • For providers of knowledge: provide legal certainty and automated attribution and tracking • For funders: provide new metrics for tracking return on investment based on re-use

  10. Neurocommons approach • From OBO Foundry: Carefully model biology to enable integration of data sources. “Audit trail to reality” • From Web: Assign all biological entities URIs (lots already provided by OBO) and translate to OWL/RDF • From OWL: Add triples inferred by a reasoner to increase the expressiveness of queries, even with a simple query engine • From software engineering: Provide data via SPARQL first (API). Build tools on top of that. • From open source movement: Make it freely available, reproducible
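The "From OWL" point can be illustrated with a minimal sketch. It assumes the only inference rule is the transitivity of rdfs:subClassOf (a real OWL reasoner does far more), and the `ex:` class names are hypothetical placeholders, not terms from the Neurocommons data. Once the inferred triples are stored, a plain membership lookup answers a question that would otherwise need a reasoner at query time.

```python
SUBCLASS = "rdfs:subClassOf"


def materialize_subclass_closure(triples):
    """Return the input triples plus every inferred rdfs:subClassOf triple."""
    # Index the direct parents of each class.
    parents = {}
    for subject, predicate, obj in triples:
        if predicate == SUBCLASS:
            parents.setdefault(subject, set()).add(obj)

    closed = set(triples)
    for cls in parents:
        stack = list(parents[cls])
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue  # already handled; also guards against cycles
            seen.add(ancestor)
            closed.add((cls, SUBCLASS, ancestor))
            stack.extend(parents.get(ancestor, ()))
    return closed


# Hypothetical class names, used for illustration only.
asserted = {
    ("ex:DendriteDevelopment", SUBCLASS, "ex:NeuronProjectionDevelopment"),
    ("ex:NeuronProjectionDevelopment", SUBCLASS, "ex:BiologicalProcess"),
}
closed = materialize_subclass_closure(asserted)

# A simple lookup now finds the indirect subclass relationship.
indirect = ("ex:DendriteDevelopment", SUBCLASS, "ex:BiologicalProcess")
print(indirect in asserted, indirect in closed)
```

The trade-off in this sketch is storage for simplicity: the store grows, but the query engine needs no inference support of its own.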

  11. The Gene Ontology The gene ontology names many biological processes and tells us which genes are known to be involved in those processes.

  12. The Gene Ontology (a small portion) Biological Process is_a part_of Activation of innate immune response Cell surface pattern recognition receptor signaling pathway

  13. A simple query: Biological processes in dendrites? Alzheimer’s disease is characterized by neural degeneration. Among other things, there is damage to dendrites and axons, parts of nerve cells. What resources do we have available to learn more about biological processes in dendrites?

  14. Biological processes naming dendrites

      PREFIX owl: <http://www.w3.org/2002/07/owl#>
      PREFIX go: <http://purl.org/obo/owl/GO#>
      PREFIX obo: <http://www.geneontology.org/formats/oboInOwl#>
      PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

      select ?name ?class ?definition
      from <http://purl.org/commons/hcls/20070416>
      where {
        graph <http://purl.org/commons/hcls/20070416/classrelations>
          { ?class rdfs:subClassOf go:GO_0008150 }
        ?class rdfs:label ?name .
        ?class obo:hasDefinition ?def .
        ?def rdfs:label ?definition
        filter(regex(?name, "[Dd]endrite"))
      }

      go:GO_0008150: URI for Biological Process (OBO Foundry principles guarantee unique names for each Universal)

  15. From the “console”

  16. But answers are also available by a “GET” • /sparql/?query=PREFIX%20owl%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E%0APREFIX%20go%3A%20%3Chttp%3A%2F%2Fpurl.org%2Fobo%2Fowl%2FGO%23%3E%0APREFIX%20obo%3A%20%3Chttp%3A%2F%2Fwww.geneontology.org%2Fformats%2FoboInOwl%23%3E%0APREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0A%0Aselect%20%20%3Fname%20%20%3Fclass%20%3Fdefinition%0Afrom%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2F20070416%3E%0Awhere%0A%7B%20%20%20graph%20%3Chttp%3A%2F%2Fpurl.org%2Fcommons%2Fhcls%2F20070416%2Fclassrelations%3E%0A%20%20%20%20%20%7B%3Fclass%20rdfs%3AsubClassOf%20go%3AGO_0008150%7D%0A%20%20%20%20%3Fclass%20rdfs%3Alabel%20%3Fname.%0A%20%20%20%20%3Fclass%20obo%3AhasDefinition%20%3Fdef.%0A%20%20%20%20%3Fdef%20rdfs%3Alabel%20%3Fdefinition%20%0A%20%20%20%20filter(regex(%3Fname%2C%22%5BDd%5Dendrite%22))%0A%7D%0A&format=&maxrows=50 So someone, somewhere else, can build something better *Note: Different query than previous slide
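As a rough sketch of how such a GET request path could be assembled programmatically: the `/sparql/` path and the `query`, `format` and `maxrows` parameter names are taken from the URL on the slide, while the function name and its defaults are assumptions made for illustration. The host is not given on the slide, so only the path and query string are built, and nothing is sent over the network.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

QUERY = """PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX go: <http://purl.org/obo/owl/GO#>
PREFIX obo: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

select ?name ?class ?definition
from <http://purl.org/commons/hcls/20070416>
where {
  graph <http://purl.org/commons/hcls/20070416/classrelations>
    { ?class rdfs:subClassOf go:GO_0008150 }
  ?class rdfs:label ?name .
  ?class obo:hasDefinition ?def .
  ?def rdfs:label ?definition
  filter(regex(?name, "[Dd]endrite"))
}
"""


def build_sparql_get_path(query, endpoint_path="/sparql/", max_rows=50):
    """Build the path and query string for a SPARQL GET request (hypothetical helper)."""
    params = {"query": query, "format": "", "maxrows": str(max_rows)}
    # quote_via=quote encodes spaces as %20, as in the URL on the slide,
    # instead of the "+" that urlencode produces by default.
    return endpoint_path + "?" + urlencode(params, quote_via=quote)


path = build_sparql_get_path(QUERY)

# Decoding the query string recovers the original query text unchanged.
decoded = parse_qs(urlsplit(path).query, keep_blank_values=True)
print(decoded["query"][0] == QUERY)
```

Because the whole request is an ordinary URL, any HTTP client, script or web page can reuse it, which is the point of the slide.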

  17. Three levels of representing scientific knowledge • Record level: Represent database records. Inconsistent if two sources disagree about contents of a field. • Statement level: Represent what researchers say. Inconsistent if two people disagree about what a paper said • Domain level: OBO Foundry approach. Represent your best understanding of consensus. Inconsistent if facts contradict. • We need all three (but make clear which is which) • Next slide query is hybrid of Record/Domain

  18. A SPARQL query for processes involved in pyramidal neurons

      prefix go: <http://purl.org/obo/owl/GO#>
      prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
      prefix owl: <http://www.w3.org/2002/07/owl#>
      prefix mesh: <http://purl.org/commons/record/mesh/>
      prefix sc: <http://purl.org/science/owl/sciencecommons/>
      prefix ro: <http://www.obofoundry.org/ro/ro.owl#>

      select ?genename ?processname
      where {
        graph <http://purl.org/commons/hcls/pubmesh> {
          ?paper ?p mesh:D017966 .
          ?article sc:identified_by_pmid ?paper .
          ?gene sc:describes_gene_or_gene_product_mentioned_by ?article .
        }
        graph <http://purl.org/commons/hcls/goa> {
          ?protein rdfs:subClassOf ?res .
          ?res owl:onProperty ro:has_function .
          ?res owl:someValuesFrom ?res2 .
          ?res2 owl:onProperty ro:realized_as .
          ?res2 owl:someValuesFrom ?process .
          graph <http://purl.org/commons/hcls/20070416/classrelations> {
            { ?process <http://purl.org/obo/owl/obo#part_of> go:GO_0007166 }
            union
            { ?process rdfs:subClassOf go:GO_0007166 }
          }
          ?protein rdfs:subClassOf ?parent .
          ?parent owl:equivalentClass ?res3 .
          ?res3 owl:hasValue ?gene .
        }
        graph <http://purl.org/commons/hcls/gene> { ?gene rdfs:label ?genename }
        graph <http://purl.org/commons/hcls/20070416> { ?process rdfs:label ?processname }
      }

      Slide callouts • Mesh: Pyramidal Neurons • Pubmed: Journal Articles • Entrez Gene: Genes • GO: Signal Transduction • Inference required

  19. Google: 223,000 results

  20. Results • Many of the genes are indeed related to Alzheimer’s Disease through gamma secretase (presenilin) activity

      Gene, Entrez Gene ID: process
      DRD1, 1812: adenylate cyclase activation
      ADRB2, 154: adenylate cyclase activation
      ADRB2, 154: arrestin mediated desensitization of G-protein coupled receptor protein signaling pathway
      DRD1IP, 50632: dopamine receptor signaling pathway
      DRD1, 1812: dopamine receptor, adenylate cyclase activating pathway
      DRD2, 1813: dopamine receptor, adenylate cyclase inhibiting pathway
      GRM7, 2917: G-protein coupled receptor protein signaling pathway
      GNG3, 2785: G-protein coupled receptor protein signaling pathway
      GNG12, 55970: G-protein coupled receptor protein signaling pathway
      DRD2, 1813: G-protein coupled receptor protein signaling pathway
      ADRB2, 154: G-protein coupled receptor protein signaling pathway
      CALM3, 808: G-protein coupled receptor protein signaling pathway
      HTR2A, 3356: G-protein coupled receptor protein signaling pathway
      DRD1, 1812: G-protein signaling, coupled to cyclic nucleotide second messenger
      SSTR5, 6755: G-protein signaling, coupled to cyclic nucleotide second messenger
      MTNR1A, 4543: G-protein signaling, coupled to cyclic nucleotide second messenger
      CNR2, 1269: G-protein signaling, coupled to cyclic nucleotide second messenger
      HTR6, 3362: G-protein signaling, coupled to cyclic nucleotide second messenger
      GRIK2, 2898: glutamate signaling pathway
      GRIN1, 2902: glutamate signaling pathway
      GRIN2A, 2903: glutamate signaling pathway
      GRIN2B, 2904: glutamate signaling pathway
      ADAM10, 102: integrin-mediated signaling pathway
      GRM7, 2917: negative regulation of adenylate cyclase activity
      LRP1, 4035: negative regulation of Wnt receptor signaling pathway
      ADAM10, 102: Notch receptor processing
      ASCL1, 429: Notch signaling pathway
      HTR2A, 3356: serotonin receptor signaling pathway
      ADRB2, 154: transmembrane receptor protein tyrosine kinase activation (dimerization)
      PTPRG, 5793: transmembrane receptor protein tyrosine kinase signaling pathway
      EPHA4, 2043: transmembrane receptor protein tyrosine kinase signaling pathway
      NRTN, 4902: transmembrane receptor protein tyrosine kinase signaling pathway
      CTNND1, 1500: Wnt receptor signaling pathway

  21. What happens when data is discoverable, queryable, and accessible on the open web? [Diagram labels: Allen Brain Institute Servers • http://hcls1.csail.mit.edu/map/#Kcnip3@2850,Kcnd1@2800 • Javascript • http://www.brainmap.org://….0205032816_B.aff/TileGroup3/1-0-1.jpg • SPARQL AJAX • URL Query • Google Maps API • Neurocommons Servers]

  22. Others can “view source”, use our code in their own applications

  23. Background Technology • So far about 350M triples in OpenLink Virtuoso (~20 GB) • Commodity hardware: 2x2core duo/2 disks/8G RAM • Biggest so far is MeSH associations to articles (200M triples) • Smaller sources range from 10K to 10M triples each • A small fraction of biological knowledge • (Another element of the perfect storm is that computer hardware is so cheap and powerful)

  24. Results are a success, but the process even more so • Sample of three interesting cases on the way to the Neurocommons • Integration of SenseLab • Finding and addressing inconsistency • Modeling Gene Ontology Annotations

  25. Process(1): NeuronDB • Started with a homegrown ontology. Problem: how to link with anything else • E.g. no links to evidence; “receptors” versus proteins with receptor activity (like GOA) • Process: iterate many times, fixing OWL, GO understanding/conformance, augmenting what is in the ontology. • Ends with something that links with GO Function. Accepted process for how to move both NeuronDB and GO forward. • Next slides – in detail how the discussion/teaching goes

  26. Words mix up functions and objects Ligand Neurotransmitter Hormone Peptide Looking for peptides?

  27. Foundry approach connects words to their corresponding entities in reality • PeptideReceptorLigand - A peptide that has a function which makes it able to bind to a receptor • PeptideNeurotransmitter - A peptide expressed in a neuron that has a function which makes it able to regulate another neuron • PeptideHormone - A peptide that is produced in one organ and has a regulatory effect in another • Peptide - A “short” polymer of amino acids • Looking for peptides?

  28. Peptides from CHEBI (Chemical Entities of Biological Interest)

  29. Hormone Activity from GO Molecular Function

  30. Towards RDF/OWL(1) ALL instances of PeptideHormone are instances of Peptide that has_role SOME instance of HormoneActivity

  31. Towards RDF/OWL(2) ALL instances of PeptideHormone are instances of Peptide that has_role SOME instance of HormoneActivity

  32. Towards RDF/OWL(3) - Instances

  33. Towards RDF/OWL(4) URIs chebi:25905 = <http://purl.org/obo/owl/CHEBI#CHEBI_25905>
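A minimal sketch of the shorthand-to-URI expansion shown on this slide. The pattern (upper-cased namespace, then `#`, then the namespace repeated before the numeric identifier) is an assumption generalized from the single CHEBI example above and from the GO namespace used in the earlier queries; `expand_obo_id` is a hypothetical helper, not part of any OBO or Neurocommons tool.

```python
OBO_OWL_BASE = "http://purl.org/obo/owl/"


def expand_obo_id(short_id):
    """Expand a shorthand such as 'chebi:25905' into a full URI (assumed pattern)."""
    prefix, local_id = short_id.split(":", 1)
    namespace = prefix.upper()
    return f"{OBO_OWL_BASE}{namespace}#{namespace}_{local_id}"


chebi_uri = expand_obo_id("chebi:25905")
go_uri = expand_obo_id("go:0008150")
print(chebi_uri)
print(go_uri)
```

Under this assumed pattern, `go:0008150` expands to the same URI that `go:GO_0008150` denotes under the `go:` prefix declared in the earlier query.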

  34. Towards OWL(5): triples

      chebi:25905 rdfs:subClassOf chebi:16670 .
      chebi:25905 rdfs:subClassOf _:1 .
      _:1 owl:onProperty ro:hasRole .
      _:1 owl:someValuesFrom go:GO_00179 .
      …

  35. SPARQLing: Put ?variables where you are looking for matches

      Triples:
      chebi:25905 rdfs:subClassOf chebi:16670 .
      chebi:25905 rdfs:subClassOf _:1 .
      _:1 owl:onProperty ro:hasRole .
      _:1 owl:someValuesFrom go:GO_00179 .

      Query:
      select ?moleculeClass
      where {
        ?moleculeClass rdfs:subClassOf chebi:16670 .
        ?moleculeClass rdfs:subClassOf ?res .
        ?res owl:onProperty ro:hasRole .
        ?res owl:someValuesFrom go:GO_00179 .
      }

      Result: ?moleculeClass = chebi:25905
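The matching idea on this slide can be sketched in a few lines of Python. This is a toy pattern matcher written for illustration, not how a SPARQL engine is implemented; the triples and identifiers are copied as they appear on the slide, and the function name is hypothetical.

```python
def match_patterns(triples, patterns, binding=None):
    """Return every variable binding that satisfies all patterns in order."""
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in triples:
        candidate = dict(binding)
        matched = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable binds on first use and must agree afterwards.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            results.extend(match_patterns(triples, rest, candidate))
    return results


# Triples as written on the slide (blank node written as _:1).
triples = [
    ("chebi:25905", "rdfs:subClassOf", "chebi:16670"),
    ("chebi:25905", "rdfs:subClassOf", "_:1"),
    ("_:1", "owl:onProperty", "ro:hasRole"),
    ("_:1", "owl:someValuesFrom", "go:GO_00179"),
]

# The same triples with ?variables where matches are wanted.
patterns = [
    ("?moleculeClass", "rdfs:subClassOf", "chebi:16670"),
    ("?moleculeClass", "rdfs:subClassOf", "?res"),
    ("?res", "owl:onProperty", "ro:hasRole"),
    ("?res", "owl:someValuesFrom", "go:GO_00179"),
]

solutions = match_patterns(triples, patterns)
selected = [solution["?moleculeClass"] for solution in solutions]
print(selected)
```

The second pattern also matches the first triple (binding ?res to chebi:16670), but that candidate is discarded because no triple states an owl:onProperty for chebi:16670; only the blank-node restriction survives all four patterns.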

  36. Process(2): Inconsistency! • Once NeuronDB is coded properly and an OWL reasoner is run, the reasoner declares the ontology inconsistent • Problem: There are contradictory assertions about whether a particular ionic current occurs in a particular cell type. • What to do? The “three levels of representing scientific knowledge” tell us how inconsistency arises in each • Inconsistency is NOT acceptable, but might this be an issue of confusion over the desired level?

  37. The dispute: Ionic current? Yes or No One investigation Another investigation Illustration – not the particular cell/current

  38. Resolving the inconsistency • If at the statement level, there need be no inconsistency if the assertions are qualified as being statements of someone. Choice 1: Rework the representation to make this so • If at the domain level, then only one can be right. Choice 2: As curator, make a judgement about which is right, or see whether information is missing from the representation that would make this not a contradiction. • Resolution: Domain level is desired. Closer examination of the papers finds that the results come from different species. • Example of “ontological commitment” and dealing with its consequences.
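The resolution can be sketched with a toy consistency check. The cell, current and species names below are placeholders (the slides do not name the actual ones), and `find_contradictions` is a hypothetical helper standing in for what an OWL reasoner would report. The point is only that the same two findings contradict each other when keyed by cell and current alone, and stop contradicting once species is part of the representation.

```python
from collections import defaultdict


def find_contradictions(assertions, key_fields):
    """Return the keys for which both 'present' and 'absent' are asserted."""
    values_by_key = defaultdict(set)
    for assertion in assertions:
        key = tuple(assertion[field] for field in key_fields)
        values_by_key[key].add(assertion["present"])
    return sorted(key for key, values in values_by_key.items() if len(values) > 1)


# Placeholder data: two investigations reporting on the same cell type and current.
assertions = [
    {"cell": "ExampleCell", "current": "ExampleCurrent",
     "species": "species A", "present": True},
    {"cell": "ExampleCell", "current": "ExampleCurrent",
     "species": "species B", "present": False},
]

# Species left out of the representation: the two findings contradict.
without_species = find_contradictions(assertions, ["cell", "current"])

# Species included: each finding is about a different kind of cell.
with_species = find_contradictions(assertions, ["species", "cell", "current"])

print(without_species)
print(with_species)
```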

  39. Process(3): What is a GO Annotation

  40. Problems with integrating annotations with other knowledge • What are the entities? • What are the relationships between the process and the entities? • How can we make All-Some statements involving annotations?

  41. A closer look Ask me about evidence?

  42. Semantic Web technology and ontology in the service of science • Let our tools help us find mistakes (and other insights) by having a representation that is good enough to be wrong. • Expressed formally, and in conjunction with a reasoner, we might find that there cannot possibly be any instances of this class (unsatisfiable)

  43. Public science: What we’d like to do better • Broader knowledge base - cells, anatomy, physiology, behavior, protocols, reagents • Beyond simple interaction: More precise representations of mechanism to be able to query and exploit computationally • Built in an open, scalable, scientifically credible way, to encourage sustained contribution, and to take advantage of “web effects”

  44. How do we get there? • Interoperation is paramount, but modeling is hard: Work with the OBO Foundry • Build a skilled community • Use (open!) Semantic Web Technologies to enable web effects • Support and nurture a growing and vigorous community (SWAN, BIRN, OBI) all of whom build on the rest and enable others to build more • Work to advance key technologies and infrastructure - text mining, structured abstracts, query, reasoning. • Recruit more ontologists! (That’s you)
