Provenance Management Framework
Satya S. Sahoo, Kno.e.sis Center, Wright State University
In collaboration with Tarleton Lab, University of Georgia, and Microsoft Research
http://knoesis.wright.edu/research/semsci/application_domain/sem_prov/
Outline
• Provenance – Introduction
• Provenance Representation
• Classification of Provenance Queries & Query Operators
• Provenance Query Engine
• T.cruzi SPSE Provenance Management System
Provenance – Introduction
• Provenance, from the French word "provenir", describes the lineage or history of a data entity
• Used for verification and validation of data integrity, process quality, and trust
• Application of provenance metadata beyond verification and validation: eScience data management
[Figure: Gene Knockout and Strain Creation* workflow. Process steps: Sequence Extraction, Plasmid Construction, Transfection, Drug Selection, Cell Cloning. Data items: Gene Name, 3' & 5' Region, Drug Resistant Plasmid, Knockout Construct Plasmid, T.cruzi Sample, Transfected Sample, Selected Sample, Cloned Sample.]
*T.cruzi Semantic Problem Solving Environment Project, courtesy of D.B. Weatherly and Flora Logan, Tarleton Lab, University of Georgia
Provenir ontology
• A common provenance model defined in OWL-DL: the Provenir ontology
• Provenance metadata represented as RDF, which allows use of the Semantic Web reasoning framework
• A suite of domain-specific provenance ontologies, with Provenir as the common reference model
• Three base classes, 8 specialized sub-classes, and eleven foundational relations, reusing the Relation Ontology
[Figure: the gene knockout workflow annotated with Provenir terms. Base classes: AGENT (for example, Transfection Machine), DATA (for example, Transfected Sample), PROCESS (for example, Cell Cloning). Relations shown: contained_in, has_agent, participates_in.]
Domain-specific Provenance: Parasite Experiment ontology*
[Figure: Parasite Experiment ontology classes linked to Provenir ontology classes. Provenir ontology terms: agent, data, process, data_collection, parameter, spatial_parameter, temporal_parameter, domain_parameter. Parasite Experiment ontology terms: transfection_machine, location, drug_selection, sample, Time:DateTimeDescription, transfection, cell_cloning, transfection_buffer, strain_creation_protocol, Tcruzi_sample. Relations shown: is_a, has_agent, has_participant, has_parameter.]
*Parasite Experiment ontology available at: http://wiki.knoesis.org/index.php/Trykipedia
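The link between domain-specific terms and the common Provenir classes can be illustrated with a small sketch. The is_a edges below are an assumption: they are a loose reading of the slide's figure, not a copy of the published ontology, and the helper names `ancestors` and `provenir_base` are hypothetical.

```python
# Assumed is_a edges (child -> parent), read loosely from the slide's figure.
# The real Parasite Experiment ontology may differ.
IS_A = {
    "data_collection": "data",
    "parameter": "data",
    "spatial_parameter": "parameter",
    "temporal_parameter": "parameter",
    "domain_parameter": "parameter",
    "location": "spatial_parameter",
    "transfection_buffer": "domain_parameter",
    "sample": "data",
    "Tcruzi_sample": "sample",
    "transfection_machine": "agent",
    "transfection": "process",
    "drug_selection": "process",
    "cell_cloning": "process",
}


def ancestors(cls):
    """Return the superclasses of cls, nearest first."""
    chain = []
    while cls in IS_A:
        cls = IS_A[cls]
        chain.append(cls)
    return chain


def provenir_base(cls):
    """Return the top-level class that cls falls under (cls itself if it has no parent)."""
    chain = ancestors(cls)
    return chain[-1] if chain else cls


if __name__ == "__main__":
    for term in ("Tcruzi_sample", "location", "transfection_machine"):
        print(term, "->", provenir_base(term))
```

Because every domain term resolves to a Provenir base class, a query phrased against the common model can, in principle, be answered over data annotated with domain-specific terms.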
Provenance Query Classification
Provenance queries are classified into three categories:
• Type 1: Querying for provenance metadata
  Example: Which gene was used to create the cloned sample with ID = 65?
• Type 2: Querying for a specific data set
  Example: Find all knockout construct plasmids created by researcher Michelle using the "Hygromycin" drug resistant plasmid between April 25, 2008 and August 15, 2008
• Type 3: Operations on provenance metadata
  Example: Were the two cloned samples 65 and 46 prepared under similar conditions? Answered by comparing the associated provenance information
Provenance Query Operators
Four query operators, based on the query classification:
• provenance(): closure operation; returns the complete set of provenance metadata for an input data entity
• provenance_context(): given a set of constraints defined on provenance, retrieves the data sets that satisfy those constraints
• provenance_compare(): compares two sets of provenance information by adapting the RDF graph equivalence definition
• provenance_merge(): combines two sets of provenance information using the RDF graph merge
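A minimal sketch of the four operators over an in-memory set of triples may help make their roles concrete. It is not the authors' implementation: the triples and IDs are invented, blank nodes are assumed absent (so graph equivalence reduces to set equality and graph merge to set union), and the function signatures are hypothetical.

```python
from collections import deque

# Hypothetical provenance triples (subject, relation, object); all IDs are made up.
G = {
    ("cloned_sample_65", "participates_in", "cell_cloning_1"),
    ("cell_cloning_1", "has_participant", "selected_sample_65"),
    ("selected_sample_65", "participates_in", "drug_selection_1"),
    ("drug_selection_1", "has_agent", "researcher_1"),
    ("cloned_sample_46", "participates_in", "cell_cloning_2"),
}


def provenance(graph, entity):
    """Closure: every triple reachable from entity by following subject -> object."""
    seen, result = {entity}, set()
    queue = deque([entity])
    while queue:
        node = queue.popleft()
        for s, r, o in graph:
            if s == node:
                result.add((s, r, o))
                if o not in seen:
                    seen.add(o)
                    queue.append(o)
    return result


def provenance_context(graph, candidates, constraints):
    """Keep candidates whose provenance contains every (relation, object) constraint."""
    matches = []
    for entity in candidates:
        pairs = {(r, o) for _, r, o in provenance(graph, entity)}
        if constraints <= pairs:
            matches.append(entity)
    return matches


def provenance_compare(p1, p2):
    """Simplified equivalence: plain set equality, valid only without blank nodes."""
    return set(p1) == set(p2)


def provenance_merge(p1, p2):
    """Simplified merge: set union, valid only without blank nodes."""
    return set(p1) | set(p2)


if __name__ == "__main__":
    p65 = provenance(G, "cloned_sample_65")
    p46 = provenance(G, "cloned_sample_46")
    print(sorted(p65))
    print(provenance_context(G, ["cloned_sample_65", "cloned_sample_46"],
                             {("has_agent", "researcher_1")}))
    print(provenance_compare(p65, p46), len(provenance_merge(p65, p46)))
```

In this reading, Type 1 queries map to provenance(), Type 2 queries to provenance_context(), and Type 3 queries to provenance_compare() and provenance_merge(). A real comparison of two samples would also need to abstract over instance IDs, which this sketch does not attempt.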
Provenance Query Engine
• Supports the provenance query operators over an RDF store
• Based on the Jena plug-in for the Oracle RDF store (with support for the SPARQL specification)
• Developed as an API, compatible with any RDF store that supports rules
• Maps the query operators to a domain-specific provenance ontology using RDFS entailment rules
• Query optimization: defines a new class of materialized views called Materialized Provenance Views (MPV)
• MPVs are defined using the Provenir ontology
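The slide does not say what exactly an MPV stores, so the following is only a sketch of the general idea of materialization: compute the provenance closure of an entity once, keep the result, and answer repeated queries from the stored copy. The class name `ProvenanceStore`, the naive invalidation policy, and the sample triples are all assumptions made for illustration.

```python
class ProvenanceStore:
    """Toy triple store that materializes provenance closures per entity."""

    def __init__(self, triples):
        self.triples = set(triples)
        self.views = {}       # entity -> frozenset of triples (the materialized view)
        self.traversals = 0   # counts how often the graph is actually walked

    def _closure(self, entity):
        """Walk the graph from entity, following subject -> object edges."""
        self.traversals += 1
        seen, result, stack = {entity}, set(), [entity]
        while stack:
            node = stack.pop()
            for s, r, o in self.triples:
                if s == node:
                    result.add((s, r, o))
                    if o not in seen:
                        seen.add(o)
                        stack.append(o)
        return frozenset(result)

    def provenance(self, entity):
        """Serve from the materialized view when present, else compute and store it."""
        if entity not in self.views:
            self.views[entity] = self._closure(entity)
        return self.views[entity]

    def add(self, triple):
        """Add a triple and drop all views (naive invalidation, assumed for brevity)."""
        self.triples.add(triple)
        self.views.clear()


if __name__ == "__main__":
    # Hypothetical triples; IDs are made up.
    store = ProvenanceStore({
        ("cloned_sample_65", "participates_in", "cell_cloning_1"),
        ("cell_cloning_1", "has_agent", "researcher_1"),
    })
    store.provenance("cloned_sample_65")
    store.provenance("cloned_sample_65")   # answered from the stored view
    print("graph traversals:", store.traversals)
```

The trade-off is the usual one for materialized views: reads become cheap at the cost of storage and of keeping the views consistent when the underlying triples change.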
Conclusions
• A common model of provenance: interoperable, consistently interpreted, with well-defined semantics
• Categorization of provenance queries and corresponding query operators
• Provenance query engine
• Application of provenance metadata beyond verification and validation: eScience data management
[Slide callouts: Provenance Algebra; Materialized Provenance View]
Acknowledgement
• D. Brent Weatherly – Tarleton Lab, University of Georgia
• Flora Logan – The Wellcome Trust Sanger Institute
• Roger Barga – Microsoft Research
• Jonathan Goldstein – Microsoft Research
• Raghava Mutharaju – Kno.e.sis Center, Wright State University
• Pramod Anantharam – Kno.e.sis Center, Wright State University
More Resources at:
• Satya S. Sahoo et al., "Where did you come from...Where did you go?" An Algebra and RDF Query Engine for Provenance: http://knoesis.wright.edu/library/resource.php?id=00706
• Trykipedia, a wiki-based public resource for parasite researchers: http://knoesis.wright.edu/trykipedia
• Provenance Management Framework: http://knoesis.wright.edu/research/semsci/application_domain/sem_prov/
• T.cruzi Semantic Problem Solving Environment: http://knoesis.wright.edu/research/semsci/application_domain/sem_life_sci/tcruzi_pse/