OntoQuest is a system designed to explore ontology terms, relate them to data sources, update databases, and maintain mapping consistency. It ingests OWL-expressed ontologies, enables ontology exploration through SPARQL queries, and supports data exploration via ontology classes. Developed on IBM's IODT, OntoQuest is user-friendly with a biologist-friendly GUI, APIs, Updater, Query Mediator, Reasoner, and Cache functionalities. It enhances data integration, primarily focusing on subcellular anatomy ontology with data sources such as Derby and MySQL. The system is under development, aiming at inference completeness and performance optimization.
OntoQuest: Exploring Ontological Data Made Easy
Authors: Li Chen, Maryann Martone, Amarnath Gupta, Lisa Fong, Mona Wong-Barnum
Background • Many application domains in the natural sciences are rapidly building ontologies • To attempt to standardize the vocabulary of their domains • To record known relationships that have been established from years of scientific research in the discipline • To use the ontology as the common framework to exchange, assimilate and compare information • Experimental data collected by research groups • Curated data compiled from the literature • To establish relationships with data and ontologies from other domains to achieve interoperability and information integration
The Problem / Requirement • Need a system • To explore the ontology itself (OWL) • To relate the terms and relationships in an ontology to data sources (RDBMS, RDF, XML) • To explore multiple data sources as part of the ontology exploration process (instance inference) • To update the databases through the ontology exploration tool (instance inference triggered by update) • To update the ontology and propagate the effects of the update to the mappings between data sources and the ontology (mapping change triggered by update)
OntoQuest • Ingests any OWL-expressed ontology • Uses IBM’s IODT tool (modified) to shred the OWL ontology to a schema • Instances of ontology classes may reside locally or accessed from remote sources • Provides the ability for ontology exploration • By traversal of any transitive relationship • By SPARQL queries • Allows data exploration through ontology classes • Allows single instance updates
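Exploration "by traversal of any transitive relationship" can be pictured as a reachability search over the ontology graph. The following is a minimal sketch under stated assumptions, not OntoQuest's implementation: the edge list, the `subclass_of` label, and the `descendants` helper are hypothetical, and only the class names Dendrite, Dendrite_Tree, and Neuron_Compartment come from the slides.

```python
from collections import deque

# Hypothetical edge list: (child, relationship, parent).
# The Axon edge is purely illustrative.
EDGES = [
    ("Dendrite_Tree", "subclass_of", "Dendrite"),
    ("Dendrite", "subclass_of", "Neuron_Compartment"),
    ("Axon", "subclass_of", "Neuron_Compartment"),
]


def descendants(root, relation, edges):
    """Return every term that reaches `root` through one or more `relation` edges."""
    # Index the edges by parent so each lookup is a dictionary access.
    children = {}
    for child, rel, parent in edges:
        if rel == relation:
            children.setdefault(parent, []).append(child)

    # Breadth-first search; `seen` also protects against cycles.
    seen, queue = set(), deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    print(sorted(descendants("Neuron_Compartment", "subclass_of", EDGES)))
```

The same function works for any relationship label stored in the edge list, which is the point of treating the traversal as generic over transitive relationships.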
OntoQuest Builds on IODT • Our system is developed on top of IBM's integrated ontology toolkit (IODT) • IODT implements a high-performance ontology repository built on a relational database • It supports a subset of W3C's OWL and the SPARQL query language • It uses a description logic reasoner for class-level inference and a set of logic rules translated from DLP (Description Logic Programs) for instance-level inference • Hence, inference completeness and soundness on DLP can be guaranteed • The back-end database schema is designed for efficient querying and inference; its performance is superior to that of Jena, Sesame, etc.
[Architecture diagram. Components shown: Biologist-Friendly GUI, SKIL APIs, Updater, Query Mediator, Reasoner, Cache, IBM ToolKit; two connections labeled SQL]
System Development Facts • OntoQuest has a domain-user-friendly GUI and a library of customized APIs • Updater: enables inserting classes and instances incrementally into the ontology repository • Query Mediator: forms the user's request as a query against the global view; decomposes it into sub-queries in the form of SQL and SPARQL and sends them to CCDB and CKB; reassembles the results and renders an appropriate view (e.g., a graphic) for the user • Reasoner: executes rules to compute indirect class memberships and properties • Cache: further enhances system efficiency by caching or prefetching frequent query results • The system is still under development; some of the functionalities are not completed or need to be improved • e.g., propagation of ontology updates
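The Query Mediator and Cache roles described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual OntoQuest code: `query_ckb`, `query_ccdb`, and `mediate` are hypothetical names, the canned rows stand in for real SQL or SPARQL sub-queries, and Python's `functools.lru_cache` stands in for whatever caching strategy the system uses.

```python
from functools import lru_cache


# Hypothetical stand-ins for the two sources named on the slide.
# A real mediator would issue SQL or SPARQL here instead of reading a dict.
def query_ckb(class_name):
    data = {"Dendrite": [{"id": "Dendritic_Tree_1", "has_Component": "Microtubule"}]}
    return data.get(class_name, [])


def query_ccdb(class_name):
    data = {"Dendrite": [{"id": "Dendritic_Tree_1", "image": "ccdb_image_placeholder"}]}
    return data.get(class_name, [])


@lru_cache(maxsize=128)
def mediate(class_name):
    """Decompose a class-level request into per-source sub-queries and merge by id."""
    merged = {}
    for row in query_ckb(class_name) + query_ccdb(class_name):
        merged.setdefault(row["id"], {}).update(row)
    # Return a tuple ordered by id so repeated calls give a stable result.
    return tuple(merged[key] for key in sorted(merged))


if __name__ == "__main__":
    print(mediate("Dendrite"))
    print(mediate.cache_info())
```

A second call with the same class name is served from the cache rather than re-running the sub-queries, which is the behavior the Cache bullet describes for frequent queries.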
Data Integration with OntoQuest • For every class: • the ids of the instances of the class are tracked from the respective data stores and maintained locally • a mapping is used to fetch instances of the class from the relevant store to a local instance store on demand • only the properties that are associated with the ontology classes are retrieved, in a GAV (global-as-view) fashion • all other properties are obtained (for now) only by letting the user query the data source directly
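The integration scheme above (locally tracked ids, a per-class mapping, on-demand fetching into a local instance store) might look like the sketch below. Everything here is an assumption made for illustration: an in-memory SQLite database stands in for a remote source, and the table `dendrite_data`, its columns, and the names `MAPPINGS`, `INSTANCE_IDS`, and `fetch_instance` are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for a remote data source (assumption).
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE dendrite_data (id TEXT, diameter_um REAL, length_um REAL, notes TEXT)"
)
source.executemany(
    "INSERT INTO dendrite_data VALUES (?, ?, ?, ?)",
    [("d1", 3.0, 140.0, "raw note"), ("d2", 3.4, 160.0, "raw note")],
)

# GAV-style mapping: each ontology class is defined as a query over the source.
# Only ontology-level properties are selected; `notes` is deliberately left out.
MAPPINGS = {
    "Dendrite": "SELECT id, diameter_um, length_um FROM dendrite_data WHERE id = ?",
}

# Instance ids are tracked locally per class.
INSTANCE_IDS = {"Dendrite": ["d1", "d2"]}

# Local instance store, filled on demand: (class, id) -> property dict.
local_store = {}


def fetch_instance(cls, instance_id):
    """Return an instance's ontology properties, fetching from the source once."""
    key = (cls, instance_id)
    if key not in local_store:
        row = source.execute(MAPPINGS[cls], (instance_id,)).fetchone()
        if row is None:
            raise KeyError(f"{instance_id} not found for class {cls}")
        local_store[key] = {"id": row[0], "diameter_um": row[1], "length_um": row[2]}
    return local_store[key]


if __name__ == "__main__":
    for instance_id in INSTANCE_IDS["Dendrite"]:
        print(fetch_instance("Dendrite", instance_id))
```

Properties outside the mapping, such as `notes` in this sketch, never reach the local store; a user who wants them would query the source directly, as the last bullet says.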
The Application Setting for this Demo • The Ontology • Developed by the neuroscientists in our group • describes the subcellular anatomy of the nervous system, including cell types and their subcellular properties and multicellular domains • The knowledge base was constructed as a directed graph using the open source tool Protégé (http://protege.stanford.edu), a freely available knowledge management tool written in Java. • The ontology is expressed in OWL-DL • Since OWL-DL supports description logic, inferences are made from the property rules • e.g., protein Kv3.2 is located in the plasma membrane; if an instance of axon terminal expresses Kv3.2, then it must have a plasma membrane. • Data Sources • A Derby data store for literature-curated instances of subcellular anatomy (CKB) • A relational (MySQL) source containing experimental data from CCDB
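The Kv3.2 example amounts to a rule of the form "if x expresses p and p is located in c, then x has component c". The sketch below applies that single rule by forward chaining over triples. It is not a description logic reasoner and not the rule set OntoQuest uses; the predicate names (`expresses`, `located_in`, `has_component`) and the instance name `axon_terminal_1` are assumptions for illustration.

```python
# Facts as (subject, predicate, object) triples. Predicate and instance
# names are hypothetical; only the Kv3.2 example itself is from the slide.
FACTS = {
    ("Kv3.2", "located_in", "Plasma_Membrane"),
    ("axon_terminal_1", "expresses", "Kv3.2"),
}


def infer_components(facts):
    """Apply: (x expresses p) and (p located_in c) => (x has_component c)."""
    # Group the location facts by protein for direct lookup.
    locations = {}
    for subject, predicate, obj in facts:
        if predicate == "located_in":
            locations.setdefault(subject, set()).add(obj)

    inferred = set(facts)
    for subject, predicate, obj in facts:
        if predicate == "expresses":
            for compartment in locations.get(obj, ()):
                inferred.add((subject, "has_component", compartment))
    return inferred


if __name__ == "__main__":
    new_facts = infer_components(FACTS) - FACTS
    print(new_facts)
```

One pass is enough here because the rule's conclusion never feeds back into its own premises; a general rule engine would iterate until no new facts appear.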
[Figure: Subcellular Ontology diagram. Node labels include: Property, Component, Morphometrics, Organelle, Shape, Cytoskeleton, Distribution, Cilium, Orientation, Specialization, Inclusion, Plasma Membrane, Cytoplasm, Subcellular Space, Nerve Cell, Multi-cellular Domain, Intercellular Junction, Extracellular Space, Synaptic Cleft, Glomerulus, Neuropil, Node of Ranvier, Pinceau, Neuron, Glia, Compartment, Dendrite, Axon, Cell body, Spine, Microglia, Macroglia. Legend: Molecule, Compartment, subclass, has-a. Detail panel labels: Dendritic Spine, Shaft, Compartment, Component, Property, Morphometrics, Post synaptic, SER, Shape, Actin Filament, Distribution, Ribosome, PSD, Orientation]
Step 1: other options for expanding different types of hierarchies, e.g., the compartment types for Neuroepithelial_Cell and those for Neuron
Step 2: get the detailed info (instances and properties) of the subclass Dendrite of Neuron_Compartment
Step 2a: accessing the property values for the selected class
Step 2b: the CCDB image page corresponding to the selected instance Dendritic_Tree_1 is shown here
Step 2’: some concepts (like Cellular_Dependent_Continuant here) have properties but no instances in CKB
Step 3: right-clicking a concept in the hierarchy pops up a list of view functions to choose from
Step 4: aggregate the has_Component values of all Dendrite instances; the last row shows a statistics summary. Note that the instances of Dendrite include those of its subclasses (such as Dendrite_Tree).
Step 5: drill down to view instances of Dendrite_Tree and aggregate on several numeric property values
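Steps 4 and 5 can be sketched as two small aggregations: one that counts has_Component values over a class and its subclasses, and one that averages a numeric property. This is a rough illustration only. The instance records, the `branch_count` property, and the helper names are hypothetical, and the values are made up rather than taken from CKB or CCDB.

```python
from collections import Counter
from statistics import mean

# Hypothetical data: class names follow the slides, values are invented.
SUBCLASSES = {"Dendrite": ["Dendrite_Tree"]}
INSTANCES = {
    "Dendrite": [
        {"id": "Dendrite_1", "has_Component": ["Microtubule", "Mitochondrion"]},
    ],
    "Dendrite_Tree": [
        {"id": "Dendritic_Tree_1", "has_Component": ["Microtubule"], "branch_count": 700},
        {"id": "Dendritic_Tree_2", "has_Component": ["Mitochondrion"], "branch_count": 820},
    ],
}


def all_instances(cls):
    """Instances of `cls` plus, recursively, those of its subclasses."""
    found = list(INSTANCES.get(cls, []))
    for sub in SUBCLASSES.get(cls, []):
        found.extend(all_instances(sub))
    return found


def component_counts(cls):
    """Step 4: count each has_Component value across all instances of `cls`."""
    counts = Counter()
    for instance in all_instances(cls):
        counts.update(instance.get("has_Component", []))
    return counts


def average(cls, prop):
    """Step 5: mean of a numeric property, skipping instances that lack it."""
    values = [inst[prop] for inst in all_instances(cls) if prop in inst]
    return mean(values) if values else None


if __name__ == "__main__":
    print(component_counts("Dendrite"))
    print(average("Dendrite_Tree", "branch_count"))
```

Because `all_instances` walks the subclass map, aggregating on Dendrite automatically picks up the Dendrite_Tree instances, matching the note in Step 4.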
“Rules” for cellular assembly • What are the cellular components of a dendrite? • 29 instances of dendrite • 1. Microtubules • 2. Mitochondria • 3. Hypolemmal cisternae • 4. Plasma membrane • 5. Smooth endoplasmic reticulum • 6. Rough endoplasmic reticulum • 7. Polyribosomes • 8. Neurofilaments • Average diameter = 3.2 µm • Average length = 150 µm • How many dendrites does a Purkinje cell have? • 3 instances of Purkinje cell dendritic tree • 1. Avg branch order = 22 • 2. Number of primary dendrites = 1.3 • 3. Avg number of branches = 760 • Note: computes aggregate properties from instances