
Towards ontology driven navigation of the lipid bibliosphere

Towards ontology driven navigation of the lipid bibliosphere. Christopher J. O. Baker, Rajaraman Kanagasabai, Wee Tiong Ang, Anitha Veeramani, Hong-Sang Low, and Markus R. Wenk. International Conference on Bioinformatics 2007 (InCoB 2007), 27-31 August 2007.


Presentation Transcript


  1. Towards ontology driven navigation of the lipid bibliosphere • Christopher J. O. Baker, Rajaraman Kanagasabai, Wee Tiong Ang, Anitha Veeramani, Hong-Sang Low, and Markus R. Wenk • International Conference on Bioinformatics 2007 (InCoB 2007), 27-31 August 2007

  2. Motivation • Lipid research in the 21st century needs reliable and sensible integration of data from different sources. • Lipid nomenclature in the biomedical literature is highly heterogeneous. • Semantic data integration is necessary for lipid research, yet it is hard to achieve because there is no single unified, consistent, and universally accepted lipid classification system.

  3. Objective • Develop a system that facilitates navigation of the lipid bibliosphere using a standardized lipid vocabulary with precise semantics. • Use the expressivity of a W3C-endorsed standard, the Web Ontology Language (OWL), to represent lipid nomenclature and hierarchy.

  4. Lipids and Ontologies (each lipid problem is paired with the ontology feature that addresses it)
     • Lipids: Lipids have many properties and biologically related information that needs to be systematically captured in a domain model. Ontologies: Capture knowledge, that is, the meaning of important vocabulary (classes, properties/relations and instance data in a domain model).
     • Lipids: Lipids have no universally accepted nomenclature. Ontologies: Provide a common terminology for a domain.
     • Lipids: Lipid nomenclature is not always intuitive. Ontologies: Make the content in information sources explicit.
     • Lipids: Semantics of lipid terminology can be ambiguous, synonym rich, and non-standard. Ontologies: Provide an index and query model to a repository of information.
     • Lipids: Integration of lipid data is hampered by the lack of a unified classification system and the presence of multiple data formats. Ontologies: Provide a basis for interoperability between information systems.

  5. Lipid Ontology

  6. Lipid Upper Ontology • Implemented in OWL-DL language • Uses LIPIDMAPS systematic lipid nomenclature • 560 named classes • 352 lipid subclasses • 71 Object properties • 4 Data properties • Lipid instance: LIPIDMAPS systematic name • Depth: 8 levels

  7. Modeling lipid information • Multiple features of lipids are modeled in the Lipid_Specification concepts and are directly related to the lipid classification hierarchy found under the Lipid concept

  8. Linking lipids with other biological information • Lipid-Protein • Modeled with the Protein concept • Protein instance: protein name from Swiss-Prot • The Lipid concept is linked to the Protein concept via the InteractsWith_Protein property • Lipid-Disease • Modeled with the Disease concept • Disease instance: disease name from the Disease Ontology • The Lipid concept is linked to the Disease concept via the hasRole_In_Disease property
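The two links described on this slide can be sketched as a small data structure. This is only an illustration in Python, not the authors' OWL model: the `Lipid` dataclass, the `as_triples` helper and the placeholder "ExampleDisease" are assumptions introduced here, while the property names InteractsWith_Protein and hasRole_In_Disease and the POPC/TLR4 pair come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Lipid:
    """Hypothetical stand-in for an individual of the Lipid concept."""
    name: str
    interacts_with_protein: set = field(default_factory=set)
    has_role_in_disease: set = field(default_factory=set)


def as_triples(lipid):
    """Flatten one lipid into (subject, property, object) assertions."""
    triples = [(lipid.name, "InteractsWith_Protein", p)
               for p in sorted(lipid.interacts_with_protein)]
    triples += [(lipid.name, "hasRole_In_Disease", d)
                for d in sorted(lipid.has_role_in_disease)]
    return triples


popc = Lipid("POPC")
popc.interacts_with_protein.add("TLR4")          # pair taken from slide 14
popc.has_role_in_disease.add("ExampleDisease")   # placeholder, not real data

for triple in as_triples(popc):
    print(triple)
```

Keeping the assertions as plain triples mirrors how property assertions between individuals are stored in an OWL ontology, which makes the later query slides easier to follow.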

  9. A LIPID has many names • Phosphatidylcholine is an important component of the mucus layer in the large intestine. • The distribution of these pores was examined using 1,2-di-oleoyl-sn-glycero-3-phosphocholine (DOPC) phospholipid vesicles under a standard fluorescent microscope. • Lecithin is usually used as a synonym for pure phosphatidylcholine, which is the major component isolated from egg yolk or soy beans. • 2-[[(2R)-2,3-di(octadecanoyloxy)propoxy]-hydroxyphosphoryl]oxyethyl-trimethylazanium

  10. Modelling Synonyms • 4 types of name • LIPIDMAPS systematic name • IUPAC systematic name • Broad lipid name (non-systematic) • Exact lipid name (non-systematic) • Instances of names are connected via the properties • hasIUPAC_Synonym • hasLIPIDMAPS_Synonym • hasBroad_Lipid_Synonym • hasExact_Lipid_Synonym
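Grounding different names to one lipid can be sketched as a lookup table. The following is a minimal Python illustration under stated assumptions: the `SynonymIndex` class and its methods are hypothetical, the name kinds "broad" and "exact" follow the slide's name types, and the names from slide 9 are used as stand-in canonical names (a real system would ground to LIPIDMAPS systematic names).

```python
def normalize(name):
    """Case-fold and collapse whitespace so lookups ignore surface variation."""
    return " ".join(name.lower().split())


class SynonymIndex:
    """Hypothetical index mapping any known name to one canonical lipid name."""

    def __init__(self):
        self._entries = {}

    def add(self, canonical, synonym, kind):
        # kind is one of the slide's name types, e.g. "broad" or "exact"
        self._entries[normalize(synonym)] = (canonical, kind)
        self._entries.setdefault(normalize(canonical), (canonical, "canonical"))

    def ground(self, mention):
        """Return the canonical name for a mention, or None if unknown."""
        entry = self._entries.get(normalize(mention))
        return entry[0] if entry else None


index = SynonymIndex()
index.add("phosphatidylcholine", "lecithin", "broad")
index.add("1,2-di-oleoyl-sn-glycero-3-phosphocholine", "DOPC", "exact")

print(index.ground("Lecithin"))
print(index.ground("dopc"))
```

A dictionary keyed on normalized strings is the simplest choice here; it handles case and spacing differences but not spelling variants, which would need fuzzier matching.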

  11. Literature Specification

  12. Literature-driven, ontology-centric … • Content Delivery Platform - Automated • Document delivery from PubMed (PDF) / USPTO (HTML) • Tools for conversion of documents to text-minable text • Text Mining - Customized and Automated • Regular Expressions, Named Entities, Relations, Co-reference • Knowledge Engineering - Ontology Creation • Domain Modeling / Customized / Rapid Prototype • Knowledge Navigation / Ontology Interrogation Tools - Interactive • Visual Query, Natural Language Interfaces • Service platform for knowledge-intensive lipid navigation tasks

  13. Lipid Ontology as a knowledge integration vehicle • Major Knowledge Sources • Lipid Ontology • NLP tagged text • Database content • Knowledge navigation: • OWL interrogation • DL reasoning & inference • nRQL (new RACER Query Language) • Semantic query tools

  14. Ontology and Text Mining
     • Inputs: Term List DBs (lipid names, LIPIDMAPS, Lipid Bank, KEGG classifications, disease names, protein names) and stemmed interactions
     • Step 1: Document Content
     • Step 2: Sentence Extraction
     • Step 3: Sentence Detection: lipid, interaction, protein
     • Step 4: Entity Recognition: term identification / assign lipid class
     • Step 5: Normalization: collapse lipid synonyms
     • Step 6: Relation Extraction: Lipid-Protein or Lipid-Disease
     • Step 7: Classification: identify ontology classes and specify relations for all sentences, proteins, lipid subclasses
     • Step 8: Populate OWL ontology (JENA API)
     • Intermediate output: document and sentence metadata, e.g. "TLR4 binds to POPC" tagged as "<term category="protein">TLR4</term> binds to <term category="lipid">POPC</term>"
     • Final output: complete instantiated OWL-DL ontology
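The tagging and relation extraction steps on this slide can be sketched in a few lines of Python. This is a toy under loud assumptions, not the BioText Suite or the authors' pipeline: the `TERMS` dictionary, the `INTERACTION_STEMS` tuple and both function names are hypothetical, and matching is by exact term and word prefix only. The example sentence and tag format are the ones shown on the slide.

```python
import re

# Hypothetical term list: surface form -> category (stands in for the term DBs).
TERMS = {"TLR4": "protein", "POPC": "lipid"}
# Hypothetical stemmed interaction words.
INTERACTION_STEMS = ("bind", "interact")


def tag_sentence(sentence, terms):
    """Wrap every known term in a <term category="..."> tag."""
    alternatives = "|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True))
    pattern = re.compile(r"\b(" + alternatives + r")\b")
    return pattern.sub(
        lambda m: f'<term category="{terms[m.group(1)]}">{m.group(1)}</term>',
        sentence)


def extract_relations(sentence, terms, stems=INTERACTION_STEMS):
    """Emit lipid-protein triples when an interaction word is present."""
    words = re.findall(r"[A-Za-z0-9]+", sentence)
    if not any(w.lower().startswith(stems) for w in words):
        return []
    lipids = [w for w in words if terms.get(w) == "lipid"]
    proteins = [w for w in words if terms.get(w) == "protein"]
    return [(l, "InteractsWith_Protein", p) for l in lipids for p in proteins]


sentence = "TLR4 binds to POPC"
print(tag_sentence(sentence, TERMS))
print(extract_relations(sentence, TERMS))
```

Pairing every lipid with every protein in a sentence is deliberately naive; it over-generates on long sentences, which is one reason a real pipeline adds normalization and classification steps.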

  15. [Figure: indexed lipid sentences, linked to a lipid class and its lipid instances]

  16. Knowledge integration pipeline • User input query “lipid interact* protein” • PubMed: 110 full text papers • NLP tagging: 87 docs tagged with relevant named entities • 123 lipids, 361 proteins, 920 lipid-protein interactions • Processing rate: 2 sec/doc • Ontology instantiation • Output for end user: “Instantiated ontology”, the user's knowledge navigation vehicle • Specification • Content Acquisition pipeline: • Automated PubMed query • Text format converter

  17. Knowledge integration pipeline (same diagram) • Specification • Text-mining & NLP: • BioText Suite for tokenization, part of speech tagging, named entity recognition, grounding, association mining

  18. Knowledge integration pipeline (same diagram) • Specification • Ontology Instantiation pipeline: • custom script based on JENA API

  19. Knowledge integration pipeline (same diagram) • Specification • Knowledge Navigation platform: • Knowledge navigator or Knowlegator • RACER • nRQL

  20. OWL-DL Query with nRQL • nRQL queries are built on a Lisp syntax • Elementary query atoms, combinable into highly expressive but syntactically complex A-box queries to derive assertions about instance data (individuals) • Unary concept query (instance classification and retrieval) • Does this instance belong to this class? • What are the instances of class X? • To which classes does instance X belong? • Binary role query • What instances are related by relation X? • Binary role constraint query • Unary has known successor (Ancestor / Descendant) • Negation • Intersect / Conjunction • Union / Disjunction • Combinations (And / Union) • Reference: Haarslev V., Moeller R., Wessel M., Querying the Semantic Web with Racer + nRQL. In Sean Bechhofer, Volker Haarslev, Carsten Lutz, Ralf Moeller (Eds), CEUR Workshop Proceedings of the KI-2004 Workshop on Applications of Description Logics (ADL 04), Ulm, Germany, Sep 24 2004 • The New Racer Query Language: www.cs.concordia.ca/~haarslev/racer/racer-queries.pdf
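The query atom types listed on this slide can be illustrated on a toy A-box. The Python below is a sketch only: it is not nRQL and does not call RACER, it evaluates over plain dictionaries with no DL reasoning beyond a subclass chain, and the class name Glycerophospholipid, the data and all function names are assumptions made for the example.

```python
# Hypothetical T-box: class -> direct superclass (None for a top-level class).
SUBCLASS_OF = {"Glycerophospholipid": "Lipid", "Lipid": None, "Protein": None}
# Hypothetical A-box: individual -> most specific asserted class.
TYPES = {"POPC": "Glycerophospholipid", "TLR4": "Protein"}
# Hypothetical role assertions between individuals.
ROLES = {("POPC", "InteractsWith_Protein", "TLR4")}


def classes_of(individual):
    """Unary concept query: to which classes does this instance belong?"""
    classes, cls = [], TYPES[individual]
    while cls is not None:
        classes.append(cls)
        cls = SUBCLASS_OF[cls]
    return classes


def is_instance(individual, cls):
    """Unary concept query: does this instance belong to this class?"""
    return cls in classes_of(individual)


def instances_of(cls):
    """Unary concept query: what are the instances of class X?"""
    return {i for i in TYPES if is_instance(i, cls)}


def role_pairs(role):
    """Binary role query: which instances are related by relation X?"""
    return {(s, o) for (s, r, o) in ROLES if r == role}


def has_known_successor(role):
    """Individuals with at least one asserted filler for the role."""
    return {s for (s, _) in role_pairs(role)}


# Conjunction, union and negation map onto set operations in this toy.
lipids_with_partner = instances_of("Lipid") & has_known_successor("InteractsWith_Protein")
lipids_or_proteins = instances_of("Lipid") | instances_of("Protein")
not_lipids = set(TYPES) - instances_of("Lipid")
print(lipids_with_partner, lipids_or_proteins, not_lipids)
```

Negation here is a closed-world set difference over the known individuals, which is a simplification chosen for the sketch.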

  21. Knowledge Navigation Tool • [Screenshot with labelled regions: Query Composition Panel, Results Panel, Ontology Content, Query Syntax, Query Engine Dialogue, Concept Properties Overview]

  22. Lipid Ontology as a Query Model

  23. Query: Find documents containing sentences where lipids interact with proteins and the lipids are related to a disease.
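The query on this slide is a conjunction of three conditions: the sentence belongs to a document, a lipid in it interacts with a protein, and that lipid has a disease role. A minimal Python sketch of that conjunction follows; it is not the nRQL query the tool generates, and the record layout, document ids, "ExampleProtein", "ExampleDisease" and the function name are all hypothetical.

```python
# Hypothetical sentence records produced by text mining.
SENTENCES = [
    {"doc": "doc1", "lipid": "POPC", "protein": "TLR4"},
    {"doc": "doc2", "lipid": "DOPC", "protein": None},
    {"doc": "doc3", "lipid": "DOPC", "protein": "ExampleProtein"},
]
# Hypothetical hasRole_In_Disease assertions: lipid -> diseases.
DISEASE_ROLES = {"POPC": {"ExampleDisease"}}


def documents_with_disease_linked_interactions(sentences, disease_roles):
    """Documents with a lipid-protein sentence whose lipid has a disease role."""
    return sorted({
        s["doc"]
        for s in sentences
        if s["protein"] is not None          # lipid interacts with a protein
        and disease_roles.get(s["lipid"])    # lipid is related to a disease
    })


print(documents_with_disease_linked_interactions(SENTENCES, DISEASE_ROLES))
```

Only doc1 satisfies both conditions in this toy data: doc2 has no protein, and the lipid in doc3 has no disease role.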

  24. Summary • We built a lipid ontology in the Web Ontology Language (OWL) to represent the LIPIDMAPS classification hierarchy. • The ontology model resolves nomenclature inconsistencies by grounding lipid synonyms to individual lipid names. • We report a document delivery system that, in conjunction with a lipid-specific text mining platform, instantiates lipid sentences into the lipid ontology. • We facilitate navigation of the lipid literature using a drag ‘n’ drop visual query composer which poses description logic queries to the OWL-DL ontology. • Lipid-disease and lipid-protein statements in the lipid literature can be readily queried and made easily available to lipid researchers.

  25. Acknowledgement • A*STAR – Agency for Science, Technology and Research, Singapore Government. • National University of Singapore, Graduate Student Travel Grant.
