ResearchIQ: An ontology-anchored integrative query tool
Satyajeet Raje, Ankush Srivastava, Omkar Lele, Tara Payne
http://www.ceti.cse.ohio-state.edu; http://citih.osumc.edu

Introduction

Vision
• “To create a single knowledge resource portal for the clinical and translational research community that would provide a ‘front door’ for a variety of resources.”

The Idea
• There is a need for an alternative, engaging system that will assist in finding and leveraging opportunities for collaboration on a broader scale.
• In the context of clinical and translational research, the ability to manage and reason upon complex and large-scale data sets is of particular importance, and remains an area of open research.
• The goal of ResearchIQ is to empower non-technical domain experts to reason upon and pose questions related to such heterogeneous data sets.

Design Phases
The initial design and evaluation of Research-IQ was conducted in three phases [Fig 1].

Technology: Semantic Web and Ontology
• Semantic search seeks to improve search accuracy by understanding
  • searcher intent
  • the contextual meaning of terms as they appear in the searchable data space
• It can be applied on the Web or within a closed system.
• The goal is to generate more relevant results.

Figure 5: The Semantic Web

• The Semantic Web is a "web of data".
• It extends the network of hyperlinked human-readable web pages by inserting machine-readable metadata about pages and how they are related to each other. [Fig 5]

Figure 6: The Semantic Web stack
Figure 7: An example Ontology

• RDF - The Resource Description Framework is a family of W3C specifications originally designed as a metadata data model.
• OWL - The Web Ontology Language is a family of knowledge representation languages for authoring ontologies.
• SPARQL - An RDF query language; SPARQL allows for a query to consist of triple patterns, conjunctions, disjunctions, and optional patterns.
• Ontology - An ontology is a standardized representation of knowledge as a set of concepts within a domain, and the relationships between those concepts.
  • It can be used to reason about the entities within that domain, and may be used to describe the domain.
  • “formal, explicit specification of a shared conceptualization” [Gruber, 1993]

Figure 1: Overview of the three-phase design (Phases 1 and 2) and initial evaluation (Phase 3) process utilized for Research-IQ.

System Architecture
• The system takes existing databases (like GenBank and PubMed) or free-text sources (OSU pro web site) as input.
• The output is essentially an ontology-anchored data store and an interface to query it.
• Visualizing the results is equally important, as this is a portal to several distinct resources and services.

Figure 2: System Architecture Diagram

Annotation
• The annotation engine uses MetaMap, a tool provided by the National Library of Medicine (NLM).
• The “semantic view” of a document is a list of concepts from the different domain ontologies.
• The data is then indexed using Lucene.
• At the same time, it is pushed into a triple store.
• A derived ontology is generated for query purposes by inferencing on the acquired data based on the UMLS.

Figure 3: Annotation Pipeline

User Interface
Figure 3: Search engine home page
Figure 4: Search engine page with data

Discussion

Significance
• To the best of our knowledge, no such platforms have been developed in the clinical and translational research domain.
• The pilot study and early results showed that
  • Ontologies can be used to implement Semantic Search successfully.
  • The results were more comprehensive than pure syntactic search.

Challenges
• Speed of annotating and querying
• Evaluation of results
• Dependence on other tools
• Security
• New data sources (New annotation pipelines)
• Change in standards over time (New versions of the ontology)

Querying
• The query is in the form of a list of concepts.
• The initial solution set and their scores are obtained by running the query through Lucene.
• Using this set as seeds, we propagate the scores through the triple store to find conceptually connected results.
• The inferred ontology is used as a graph to generate these results.

Acknowledgements
• This application was supported by Award Number UL1RR025755 from the National Center for Research Resources. The content is solely the responsibility of the developers and does not necessarily represent the official views of the National Center for Research Resources or the National Institutes of Health.
• A special thanks to Prof. Rajiv Ramnath and Prof. Jay Ramanathan for initiating this project at CETI.

Figure 4: Query Pipeline
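
The querying steps (seed results scored by Lucene, then scores propagated through the graph) can be illustrated with a small sketch. The poster does not specify the propagation rule, so everything below is an assumption made for illustration: the function name `propagate_scores`, the per-hop `decay` factor, the `max_hops` limit, the treatment of triples as undirected (subject, object) edges, and the sample node names. It is a simple spreading-activation scheme, not the ResearchIQ implementation.

```python
from collections import defaultdict


def propagate_scores(seed_scores, edges, decay=0.5, max_hops=2):
    """Spread seed scores to connected nodes, attenuating by `decay` per hop.

    seed_scores: dict of node -> initial score (e.g. from a text index).
    edges: iterable of (subject, object) pairs, treated as undirected links.
    All parameters are hypothetical; the source does not define them.
    """
    # Build an undirected adjacency map from the (subject, object) pairs.
    neighbours = defaultdict(set)
    for subject, obj in edges:
        neighbours[subject].add(obj)
        neighbours[obj].add(subject)

    scores = dict(seed_scores)    # best score known for each node
    frontier = dict(seed_scores)  # nodes whose score changed in the last hop
    for _ in range(max_hops):
        next_frontier = {}
        for node, score in frontier.items():
            passed = score * decay
            for other in neighbours[node]:
                # Keep only the highest score that reaches each node.
                if passed > scores.get(other, 0.0):
                    scores[other] = passed
                    next_frontier[other] = passed
        frontier = next_frontier

    # Highest score first; ties broken by node name for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    seeds = {"doc_a": 1.0, "doc_b": 0.4}
    links = [("doc_a", "concept_x"), ("concept_x", "doc_c"), ("doc_b", "doc_c")]
    for node, score in propagate_scores(seeds, links):
        print(node, score)
```

In this toy run, `doc_c` is not in the seed set but still receives a score because it is linked to `doc_a` through `concept_x`, which is the kind of conceptually connected result the querying bullets describe. Taking the maximum incoming score (rather than summing) is a design choice made here to keep scores bounded by the seed values.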