1 / 25

Ontology-driven Provenance Management in eScience: An Application in Parasite Research

Ontology-driven Provenance Management in eScience: An Application in Parasite Research. Satya S. Sahoo 1 , D. Brent Weatherly 2 , Raghava Mutharaju 1 , Pramod Anantharam 1 , Amit Sheth 1 , Rick L. Tarleton 2. 1 Kno.e.sis Center, Wright State University;

evan
Download Presentation

Ontology-driven Provenance Management in eScience: An Application in Parasite Research

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Ontology-driven Provenance Management in eScience: An Application in Parasite Research Satya S. Sahoo1, D. Brent Weatherly2, Raghava Mutharaju1, Pramod Anantharam1, Amit Sheth1, Rick L. Tarleton2 1Kno.e.sis Center, Wright State University; 2 Center for Tropical and Emerging Diseases, University of Georgia ODBASE2009 Vilamoura, Algarve-Portugal November 05, 2009

  2. Parasite Research: Creation of Parasite Strains New Parasite Strains

  3. Provenance in Parasite Research Gene Name Related Queries from Biologists • Q2: List all groups in the lab that used a Target Region Plasmid? • Q3: Which researcher created a new strain of the parasite (with ID = 66)? • An experiment was not successful – has this experiment been conducted earlier? What were the results? Gene Knockout and Strain Creation* Sequence Extraction 3‘ & 5’ Region Drug Resistant Plasmid Gene Name Plasmid Construction Knockout Construct Plasmid T.Cruzi sample ? Transfection Transfected Sample Drug Selection Cloned Sample Selected Sample Cell Cloning Cloned Sample *T.cruzi Semantic Problem Solving Environment Project, Courtesy of D.B. Weatherly and Flora Logan, Tarleton Lab, University of Georgia

  4. Provenance Management in Science • Provenance from the French word “provenir” describes the lineage or history of a data entity • For Verification and Validation of Data Integrity, Process Quality, and Trust • Issues in Provenance Management • Provenance Modeling • A Dedicated Query Infrastructure • Practical Provenance Management Systems

  5. Outline • Provenance Modeling: Provenir →Parasite Experiment ontology • Provenance Query Infrastructure • Provenance Query Engine • Evaluation Results • Query Optimization: Materialized Provenance Views

  6. Ontologies for Provenance Modeling • Advantages of using Ontologies • Formal Description: Machine Readability, Consistent Interpretation • Use Reasoning: Knowledge Discovery over Large Datasets • Problem: A gigantic, monolithic Provenance Ontology! – not feasible • Solution: Modular Approach using a Foundational Ontology FOUNDATIONAL ONTOLOGY PARASITE EXPERIMENT GLYCOPROTEIN EXPERIMENT OCEANOGRAPHY

  7. Provenir Ontology Gene Name Sequence Extraction 3‘ & 5’ Region Drug Resistant Plasmid AGENT Plasmid Construction Knockout Construct Plasmid T.Cruzi sample has_agent Transfection Transfection Machine DATA Transfected Sample Drug Selection participates_in Selected Sample PROCESS Cell Cloning Cloned Sample

  8. Provenir Ontology Schema SPATIAL THEMATIC TEMPORAL is_a is_a is_a located_in PARAMETER DATA COLLECTION is_a is_a AGENT has_temporal_value DATA participates_in has_agent PROCESS preceded_by

  9. Domain-specific Provenance: Parasite Experiment ontology PROVENIR ONTOLOGY agent has_agent is_a is_a data parameter has_participant is_a data_collection is_a process is_a spatial_parameter temporal_parameter domain_parameter is_a is_a is_a is_a is_a is_a transfection_machine location is_a drug_selection is_a is_a sample has_participant Time:DateTimeDescritption transfection cell_cloning is_a transfection_buffer strain_creation_ protocol Tcruzi_sample PARASITE EXPERIMENT ONTOLOGY has_parameter *Parasite Experiment ontology available at: http://wiki.knoesis.org/index.php/Trykipedia

  10. Outline • Provenance Modeling: Provenir →Parasite Experiment ontology • Provenance Query Infrastructure • Provenance Query Engine • Evaluation Results • Query Optimization: Materialized Provenance Views

  11. Provenance Query Classification Classified Provenance Queries into Three Categories • Type 1: Querying for Provenance Metadata • Example: Which gene was used create the cloned sample with ID = 66? • Type 2: Querying for Specific Data Set • Example: Find all knockout construct plasmids created by researcher Michelle using “Hygromycin” drug resistant plasmid betweenApril 25, 2008 and August 15, 2008 • Type 3: Operations on Provenance Metadata • Example: Were the two cloned samples 65 and 46 prepared under similar conditions – compare the associated provenance information

  12. Provenance Query Operators Four Query Operators – based on Query Classification • provenance () – Closure operation, returns the complete set of provenance metadata for input data entity • provenance_context() - Given set of constraints defined on provenance, retrieves datasets that satisfy constraints • provenance_compare () - adapt the RDF graph equivalence definition • provenance_merge () - Two sets of provenance information are combined using the RDF graph merge

  13. Answering Provenance Queries using provenance () Operator

  14. Outline • Provenance Modeling: Provenir →Parasite Experiment ontology • Provenance Query Infrastructure • Provenance Query Engine • Evaluation Results • Query Optimization: Materialized Provenance Views

  15. Provenance Query Engine • Available as API for integration with provenance management systems • Layer on top of a RDF Data Store (Oracle 10g), requires support for: • Rule-based reasoning • SPARQL query execution • Input: • Type of provenance query operator : provenance () • Input value to query operator: cloned sample 66 • User details to connect to underlying RDF store

  16. Outline • Provenance Modeling: Provenir →Parasite Experiment ontology • Provenance Query Infrastructure • Provenance Query Engine • Evaluation Results • Query Optimization: Materialized Provenance Views

  17. Evaluation Results • Queries expressed in SPARQL • Datasets using real experiment data

  18. Evaluation Results

  19. Outline • Provenance Modeling: Provenir →Parasite Experiment ontology • Provenance Query Infrastructure • Provenance Query Engine • Evaluation Results • Query Optimization: Materialized Provenance Views

  20. Query Optimization: Materialized Provenance Views • Materializes a single logical unit of provenance • Does not require query-rewriting • View updates: addressed by characteristics of provenance • Created using a memoization approach

  21. Provenance Query Engine Architecture QUERY OPTIMIZER TRANSITIVE CLOSURE

  22. Evaluation Results using Materialized Provenance Views

  23. Provenance Management System for Parasite Research

  24. Acknowledgement • Flora Logan – The Wellcome Trust Sanger Institute, Cambridge, UK • Priti Parikh – Kno.e.sis Center, Wright State University • Roger Barga– Microsoft Research, Redmond • Jonathan Goldstein – Microsoft Research, Redmond

  25. Thank you Contact email: satyasahoo@gmail.com Web: http://knoesis.wright.edu/researchers/satya/

More Related