Abstract Provenance Graphs: Anticipating and Exploiting Schema-Level Data Provenance
Daniel Zinn, Bertram Ludäscher
University of California at Davis
Presented at IPAW 2010
Motivation
Figure: Phylogenetic Tree of Primates
Scientific Workflows
• Actors (in the example: Clustal, Quicktree, DrawTree)
• Tokens: int, string, record{..}, array[..], ..
• Ports
• Channels
• Data in the example: AA-Sequences, Aligned AA-Sequences, Newick Tree
• SciWF = executable specification of the scientific method
Virtual Data Assembly Lines (COMAD in Kepler)
• Data is organized as XML-like tree structures
• Each actor is encapsulated within a configurable shell
• Three configuration parameters:
  Scope σ: selects the scope of an actor invocation
  Input assembler γ: creates the inputs for the wrapped component
  Write expression ω: writes the results back into the data stream
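The role of the three shell parameters can be illustrated with a small sketch. This is not the COMAD or Kepler API: the `Node` and `Shell` classes, the label-based matching, and the sample labels (`Family`, `Seq`, `Alignment`) are all hypothetical simplifications. Here σ, γ and ω are reduced to plain node labels, whereas the real system presumably uses richer expressions.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """One node of an XML-like data tree (hypothetical simplification)."""
    label: str
    value: object = None
    children: List["Node"] = field(default_factory=list)


def find_all(node, label):
    """Yield every node in the subtree (including the root) with this label."""
    if node.label == label:
        yield node
    for child in node.children:
        yield from find_all(child, label)


@dataclass
class Shell:
    scope: str          # sigma: label of the subtrees that trigger an invocation
    input_label: str    # gamma: label of the nodes assembled into the input
    output_label: str   # omega: label under which the result is written back
    component: Callable[[list], object]  # the wrapped actor

    def run(self, root: Node) -> int:
        """Invoke the component once per scope match; return the invocation count."""
        invocations = 0
        # Materialize the matches first, because the tree is modified below.
        for sub in list(find_all(root, self.scope)):
            inputs = [n.value for n in find_all(sub, self.input_label)]
            if not inputs:
                continue  # nothing to assemble: the actor stays idle here
            sub.children.append(Node(self.output_label, self.component(inputs)))
            invocations += 1
        return invocations


root = Node("Project", children=[
    Node("Family", children=[Node("Seq", "MKV"), Node("Seq", "MKI")]),
    Node("Family", children=[Node("Seq", "MRT")]),
])

# A stand-in "aligner" that just joins its input sequences.
align = Shell("Family", "Seq", "Alignment", lambda seqs: "|".join(seqs))
count = align.run(root)

# A misconfigured scope matches nothing, so the actor never fires.
idle = Shell("Genus", "Seq", "Alignment", lambda seqs: "|".join(seqs))
idle_count = idle.run(root)
```

In this sketch a scope that matches nothing leaves the actor idle without any error, which is the kind of silent misconfiguration the later examples are concerned with.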
Ex1: Phylogenetics Workflow
Figure panels: Configurations, Workflow Graph, APG
Ex2: Bug! Idle Actor: No Input
Figure panels: Configurations, Workflow Graph, APG (annotated "No input")
Ex3: Bug! Wrong Input
Figure panels: Configurations, Workflow Graph, APG
APGs for the Three Examples
• Ex1: Desired result
• Ex2: Bug, no input
• Ex3: Bug, too much input
Time-Collapsed Flowgraph
• Show the collection structure only at the end
• Collapse provenance
Structure-Collapsed Flowgraph
• Collapse the collection edges
Summary
• Abstract Provenance Graphs
  … summarize potential provenance graphs via graph homomorphisms
  … are constructed via static analysis of the workflow, without running it
  … explain the workflow's data flow
  … make it easier to spot certain configuration bugs
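What "summarize via graph homomorphisms" means can be sketched as follows. The idea is that a concrete provenance graph is covered by an abstract graph if some mapping from concrete nodes to abstract nodes sends every concrete edge to an abstract edge. The function name, the edge-set encoding, and the sample node names below are assumptions made for illustration; they are not taken from the slides.

```python
def is_homomorphism(concrete_edges, abstract_edges, h):
    """Return True if the node mapping h sends every concrete edge
    (u, v) to an edge (h[u], h[v]) of the abstract graph."""
    return all((h[u], h[v]) in abstract_edges for u, v in concrete_edges)


# Hypothetical abstract graph at the schema level.
abstract_edges = {("Seq", "Alignment"), ("Alignment", "Tree")}

# Hypothetical provenance graph from one run: two sequences, one alignment, one tree.
concrete_edges = {("seq1", "aln1"), ("seq2", "aln1"), ("aln1", "tree1")}

# Each concrete data item is mapped to its schema-level type.
h = {"seq1": "Seq", "seq2": "Seq", "aln1": "Alignment", "tree1": "Tree"}

covered = is_homomorphism(concrete_edges, abstract_edges, h)

# A dependency the abstract graph does not anticipate breaks the mapping.
unexpected = concrete_edges | {("seq1", "tree1")}
still_covered = is_homomorphism(unexpected, abstract_edges, h)
```

Under these assumptions `covered` is true and `still_covered` is false, since the abstract graph has no edge from `Seq` directly to `Tree`.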