Provenir ontology: Towards a Framework for eScience Provenance Management
Satya S. Sahoo, Amit P. Sheth
Kno.e.sis Center, Wright State University
Microsoft eScience Workshop 2009, Pittsburgh, Oct 16
Outline
• Provenance: A Tale of Two Use Cases
• Provenance Ontologies: A Modular Approach
• Provenir: A Foundational Model of Provenance
• Provenance Query Infrastructure
• Application to Parasite Research
Provenance in GlycoProtein Analysis
[Figure: glycoprotein analysis workflow. Labels, in reading order: Cell Culture; extract; Glycoprotein Fraction; proteolysis; Glycopeptides Fraction (1); Separation technique I; Glycopeptides Fraction (n); PNGase; Peptide Fraction (n); Separation technique II; Peptide Fraction (n*m); Mass spectrometry; ms data and ms/ms data; Data reduction (both branches); ms peaklist and ms/ms peaklist; binning; Peptide identification; N-dimensional array; Parent protein and peptide list; Peptide list; Data correlation; Signal integration. A "?" callout is labeled "Proteolytic enzyme".]
Provenance in Parasite Research
• Provenance, from the French word “provenir”, describes the lineage or history of a data entity
• Used for Verification and Validation of Data Integrity, Process Quality, and Trust
• Issues in Provenance Management
  • Interoperability
  • Consistent Modeling
  • Reducing Terminological Heterogeneity
[Figure: Gene Knockout and Strain Creation* workflow. Labels, in reading order: Gene Name; Sequence Extraction; 3' & 5' Region; Drug Resistant Plasmid; Plasmid Construction; Knockout Construct Plasmid; T. cruzi sample; Transfection; Transfected Sample; Drug Selection; Selected Sample; Cell Cloning; Cloned Sample. A "?" callout also appears in the diagram.]
*T. cruzi Semantic Problem Solving Environment Project, courtesy of D.B. Weatherly and Flora Logan, Tarleton Lab, University of Georgia
Ontologies for Provenance Modeling
• Advantages of using Ontologies
  • Formal Description: Machine Readability, Consistent Interpretation
  • Use Reasoning: Knowledge Discovery over Large Datasets
• Problem: A gigantic, monolithic Provenance Ontology is not feasible
• Solution: Modular Approach using a Foundational Ontology
[Figure: a FOUNDATIONAL ONTOLOGY shown with three domain modules: PARASITE EXPERIMENT, GLYCOPROTEIN EXPERIMENT, OCEANOGRAPHY]
Provenir Ontology
[Figure: the gene knockout and strain creation workflow annotated with Provenir terms. Workflow labels: Gene Name; Sequence Extraction; 3' & 5' Region; Drug Resistant Plasmid; Plasmid Construction; Knockout Construct Plasmid; T. cruzi sample; Transfection; Transfection Machine; Transfected Sample; Drug Selection; Selected Sample; Cell Cloning; Cloned Sample. Annotations: AGENT, DATA, PROCESS, has_agent, participates_in.]
Provenir Ontology Schema
[Figure: schema diagram. Classes: DATA, DATA COLLECTION, PARAMETER (SPATIAL, THEMATIC, TEMPORAL), AGENT, PROCESS. Relations shown: is_a, located_in, has_temporal_value, participates_in, has_agent, preceded_by.]
Domain-specific Provenance: Parasite Experiment ontology
[Figure: the Parasite Experiment ontology extending the Provenir ontology through is_a links.
PROVENIR ONTOLOGY classes: agent, data, parameter, data_collection, process, spatial_parameter, temporal_parameter, domain_parameter.
PARASITE EXPERIMENT ONTOLOGY classes: transfection_machine, location, drug_selection, sample, Time:DateTimeDescription, transfection, cell_cloning, transfection_buffer, strain_creation_protocol, Tcruzi_sample.
Relations shown: is_a, has_agent, has_participant, has_parameter.]
*Parasite Experiment ontology available at: http://wiki.knoesis.org/index.php/Trykipedia
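To make the modular extension concrete, here is a minimal Python sketch of how domain classes could resolve to foundational classes by following is_a links. The class names come from the slide, but the individual is_a edges below are assumptions chosen for illustration, since the diagram's exact edges are not recoverable from the transcript. The helper names `ancestors` and `is_a` are hypothetical.

```python
# Assumed is_a edges inside the foundational (Provenir) module.
PROVENIR_IS_A = {
    "data_collection": "data",
    "parameter": "data",
    "spatial_parameter": "parameter",
    "temporal_parameter": "parameter",
    "domain_parameter": "parameter",
}

# Assumed is_a edges from the Parasite Experiment module into Provenir.
PARASITE_IS_A = {
    "transfection_machine": "agent",
    "transfection": "process",
    "drug_selection": "process",
    "cell_cloning": "process",
    "sample": "data",
    "Tcruzi_sample": "sample",
    "transfection_buffer": "domain_parameter",
    "location": "spatial_parameter",
}


def ancestors(term, *ontologies):
    """Return the chain of classes reached from term by following is_a edges."""
    merged = {}
    for ontology in ontologies:
        merged.update(ontology)
    chain = []
    while term in merged:
        term = merged[term]
        chain.append(term)
    return chain


def is_a(term, cls, *ontologies):
    """True if term is cls itself or a (transitive) subclass of cls."""
    return term == cls or cls in ancestors(term, *ontologies)


# A domain term resolves to foundational classes once both modules are combined.
print(ancestors("transfection_buffer", PROVENIR_IS_A, PARASITE_IS_A))
print(is_a("cell_cloning", "process", PROVENIR_IS_A, PARASITE_IS_A))
```

The point of the sketch is that a query written against a foundational class such as `process` can also match domain terms such as `cell_cloning`, without the foundational module knowing about the domain module.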
Provenance Query Classification
Classified Provenance Queries into Three Categories
• Type 1: Querying for Provenance Metadata
  • Example: Which gene was used to create the cloned sample with ID = 65?
• Type 2: Querying for Specific Data Set
  • Example: Find all knockout construct plasmids created by researcher Michelle using “Hygromycin” drug resistant plasmid between April 25, 2008 and August 15, 2008
• Type 3: Operations on Provenance Metadata
  • Example: Were the two cloned samples 65 and 46 prepared under similar conditions? Compare the associated provenance information
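As an illustration of the Type 2 example, the following Python sketch treats the query as a filter over provenance records. Everything here is hypothetical: the `PlasmidRecord` fields, the record IDs, the second researcher and drug name, and the dates are invented sample data, and a real system would evaluate the constraints against an RDF store rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PlasmidRecord:
    """Hypothetical provenance record for one knockout construct plasmid."""
    plasmid_id: str
    researcher: str
    drug_resistant_plasmid: str
    created_on: date


# Invented sample data, for illustration only.
RECORDS = [
    PlasmidRecord("kcp-1", "Michelle", "Hygromycin", date(2008, 5, 2)),
    PlasmidRecord("kcp-2", "Michelle", "Neomycin", date(2008, 6, 10)),
    PlasmidRecord("kcp-3", "Michelle", "Hygromycin", date(2008, 9, 1)),
    PlasmidRecord("kcp-4", "Alex", "Hygromycin", date(2008, 7, 4)),
]


def find_plasmids(records, researcher, drug, start, end):
    """Return IDs of plasmids whose provenance satisfies all constraints."""
    return [
        r.plasmid_id
        for r in records
        if r.researcher == researcher
        and r.drug_resistant_plasmid == drug
        and start <= r.created_on <= end  # inclusive date range
    ]


matches = find_plasmids(
    RECORDS, "Michelle", "Hygromycin", date(2008, 4, 25), date(2008, 8, 15)
)
print(matches)
```

Note the direction of the query: the constraints are stated on provenance (who, with what, when) and the result is a set of data entities, which is the reverse of a Type 1 query.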
Provenance Query Operators
Four Query Operators, based on the Query Classification
• provenance() – Closure operation; returns the complete set of provenance metadata for the input data entity
• provenance_context() – Given a set of constraints defined on provenance, retrieves the datasets that satisfy the constraints
• provenance_compare() – Compares two sets of provenance information, adapting the RDF graph equivalence definition
• provenance_merge() – Combines two sets of provenance information using the RDF graph merge
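The behavior of three of these operators can be sketched in a few lines of Python over a set of (subject, predicate, object) triples. This is a simplified illustration under stated assumptions, not the authors' implementation: the triples are invented, the closure follows outgoing edges only, and the graphs contain no blank nodes, in which case RDF graph equivalence reduces to set equality and RDF graph merge reduces to set union.

```python
from collections import deque

# Invented provenance triples using Provenir relation names.
TRIPLES = {
    ("cloned_sample_65", "participates_in", "cell_cloning_1"),
    ("cell_cloning_1", "preceded_by", "drug_selection_1"),
    ("drug_selection_1", "preceded_by", "transfection_1"),
    ("transfection_1", "has_agent", "transfection_machine_1"),
    ("unrelated_sample", "participates_in", "other_process"),
}


def provenance(triples, entity):
    """Closure: every triple reachable from entity via outgoing edges."""
    result = set()
    seen = {entity}
    queue = deque([entity])
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            if s == node:
                result.add((s, p, o))
                if o not in seen:
                    seen.add(o)
                    queue.append(o)
    return result


def provenance_compare(graph_a, graph_b):
    """Equivalence check; valid as plain equality only without blank nodes."""
    return graph_a == graph_b


def provenance_merge(graph_a, graph_b):
    """Merge; valid as plain union only without shared blank nodes."""
    return graph_a | graph_b


prov_65 = provenance(TRIPLES, "cloned_sample_65")
prov_other = provenance(TRIPLES, "unrelated_sample")
print(len(prov_65), provenance_compare(prov_65, prov_other))
print(len(provenance_merge(prov_65, prov_other)))
```

With blank nodes present, both simplifications break down: equivalence then requires finding a consistent mapping between blank nodes, and merge requires renaming blank nodes apart before taking the union.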
Provenance Query Engine Architecture
• Available as API for integration with provenance management systems
• Input:
  • Type of provenance query operator: provenance()
  • Input value to query operator: cloned sample 65
  • User details to connect to underlying Oracle RDF store
[Figure: architecture diagram; component labels include QUERY OPTIMIZER and TRANSITIVE CLOSURE]
Conclusions
• Provenir ontology as a foundational model for provenance
• Extensible to model domain-specific provenance
  • Parasite Experiment ontology
  • Trident ontology
  • ProPreO ontology
• Query Infrastructure to support provenance modeled using Provenir ontology
• Application in an NIH-funded project for Parasite Research
Acknowledgement
• Roger Barga – Microsoft Research, eScience
• D. Brent Weatherly – Center for Tropical and Emerging Diseases, University of Georgia
• Flora Logan – The Wellcome Trust Sanger Institute, Cambridge, UK
• Raghava Mutharaju – Kno.e.sis Center, Wright State University
• Pramod Anantharam – Kno.e.sis Center, Wright State University
References
• Provenir ontology: http://wiki.knoesis.org/index.php/Provenir_Ontology
• Provenance Management in Parasite Research: http://knoesis.wright.edu/library/resource.php?id=00712
• Provenance Management Framework: http://knoesis.wright.edu/research/semsci/application_domain/sem_prov/
• T.cruzi Semantic Problem Solving Environment: http://knoesis.wright.edu/research/semsci/application_domain/sem_life_sci/tcruzi_pse/