RWS Provenance Experiments in Kepler (Kepler + PR + RWS)
Norbert Podhorszki, Ilkay Altintas, Bertram Ludaescher
in collaboration with Shawn Bowers, Timothy McPhillips
Initial Provenance Framework (IPAW’06, Altintas et al.)
• Vision:
  • Modeled as a separate concern in the system
  • Optional drag and drop feature
• Listen to execution and save information (customizable):
  • Context: who, what, where, when, and why that is associated with the run
  • Input data and its associated metadata
  • Workflow outputs and intermediate data products
  • Workflow definition (entities, parameters, connections): a specification of what exists in the workflow and can have a context of its own
  • Information about the workflow evolution (workflow trail)
Kepler System Architecture (IPAW’06, Altintas et al.)
[Architecture diagram. Components shown: Authentication, GUI, Kepler GUI Extensions, Vergil, Documentation, Provenance Recorder, Smart Re-run / Failure Recovery, SMS, Kepler Object Manager, Type System Ext, Actor & Data SEARCH, Kepler Core Extensions, Ptolemy]
Kepler Provenance Recorder (IPAW’06, Altintas et al.)
• Parametric and customizable
  • Different report formats
  • Variable levels of verbosity: all, some, medium, on error
  • Multiple cache destinations
• Saves information on user name, date, run, etc.
Read-Write-ReSet Model (IPAW’06, McPhillips et al.)
[Diagram: an actor with read events (r … r) on its inputs and write events (w … w) on its outputs; labels “A3”, “PS”, “[s!]”, “???”]
• r, r, …, r, w, w, …, w, r, …, r, w, …, w, … (firing)
• What about actor state? What about “real” dependencies?
• The reset event s defines when the actor “cuts off” dependencies
  • a semantic notion, known to the actor [developer] (or part of a higher-order scheme)
• r, r, …, r, w, w, …, w, [s!] r, …, r, w, …, w, …
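The effect of the reset event can be illustrated with a small Python sketch. It is not the authors' implementation: it assumes one reading of the slide, namely that each written token depends on every token the actor has read since its last reset. The function name `write_dependencies` and the token names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# One event is (kind, token); kind is "r" (read), "w" (write) or "s" (reset).
Event = Tuple[str, str]

def write_dependencies(events: List[Event]) -> Dict[str, Set[str]]:
    """Map each written token to the tokens read since the last reset.

    Assumption (not stated on the slide in this form): a write depends on
    all reads of the same actor that happened after the most recent "s".
    """
    reads_since_reset: Set[str] = set()
    deps: Dict[str, Set[str]] = {}
    for kind, token in events:
        if kind == "r":
            reads_since_reset.add(token)
        elif kind == "w":
            deps[token] = set(reads_since_reset)  # copy the current window
        elif kind == "s":
            reads_since_reset.clear()             # the actor cuts off dependencies
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return deps

# Hypothetical event sequence: r, r, w, [s!], r, w
events = [("r", "t1"), ("r", "t2"), ("w", "t3"),
          ("s", ""), ("r", "t4"), ("w", "t5")]
deps = write_dependencies(events)
```

With the reset in place, `t5` depends only on `t4`. Without it, `t5` would also inherit `t1` and `t2`, which is the over-approximation the reset event is meant to avoid.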
Goals of the PR+RWS Experiments
• Use the RWS model for Kepler workflows
  • both single-level and nested workflows (fun starts here :-)
• Extend the Kepler Provenance Recorder
  • Modify the methods of the provenance listener class
  • Add classes to store execution data about the workflow
    • to generate the send-receive relations of the tokens correctly
    • to count actor firings correctly
• Disclaimer: initially only one workflow run is targeted
  • (but the approach can handle multiple actor firings due to pipeline parallelism)
  • future: queries over several runs and workflow provenance
  • (others in Kepler are already doing this; merge efforts in the future)
Implementation: Data Model
• Port-actor relationship
  • portTable(Port, Actor, type)
  • type is r for real and v for virtual (transparent)
• Token-object relationship
  • tokenTable(Token, Object)
• Object-value relationship
  • objectTable(Object, Value, Type)
  • type is currently not recorded
• RWS trace
  • traceTable(Port, Event, Token, FiringCounter)
  • event: r for read, w for write, or s for state-reset
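As a reading aid, the four tables can be written down as Python records. This is only a sketch: the column names follow the slide, but the class names, the helper `value_of_token`, and all sample rows (port names, token and object ids) are hypothetical and not taken from the actual recorder.

```python
from typing import List, NamedTuple, Optional

class PortRow(NamedTuple):       # portTable(Port, Actor, type)
    port: str
    actor: str
    port_type: str               # "r" = real, "v" = virtual (transparent)

class TokenRow(NamedTuple):      # tokenTable(Token, Object)
    token: str
    obj: str

class ObjectRow(NamedTuple):     # objectTable(Object, Value, Type)
    obj: str
    value: str
    value_type: Optional[str] = None   # the type is currently not recorded

class TraceRow(NamedTuple):      # traceTable(Port, Event, Token, FiringCounter)
    port: str
    event: str                   # "r" = read, "w" = write, "s" = state-reset
    token: str
    firing: int

def value_of_token(token: str,
                   token_table: List[TokenRow],
                   object_table: List[ObjectRow]) -> Optional[str]:
    """Join tokenTable and objectTable to find the value a token carries."""
    for t in token_table:
        if t.token == token:
            for o in object_table:
                if o.obj == t.obj:
                    return o.value
    return None

# Hypothetical sample rows, for illustration only.
port_table = [PortRow(".pc.AlignWarp1.output", ".pc.AlignWarp1", "r")]
token_table = [TokenRow("t1", "o1")]
object_table = [ObjectRow("o1", "warp1.warp")]
trace_table = [TraceRow(".pc.AlignWarp1.output", "w", "t1", 1)]
```

Keeping tokens, objects, and values in separate tables means the trace only has to refer to token ids; the value is looked up through the join when a query needs it.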
Implementation: Class Hierarchy
• Extends the existing provenance execution listener with
  • Methods
  • More event listeners
• Supporting classes
  • RWSPortInfo, RWSActorInfo
    • Data structures for building and containing info about the workflow (and counters for event records)
  • RWSEvent
    • Handles RWS events
Execution: Initialization
• Initialization phase: initialize()
  • Generate RWS portMap: an RWSPortInfo for each port (info locally known at a port)
  • For each port, build the connection info in its RWSPortInfo
  • Generate RWS actorMap: an RWSActorInfo for each actor
  • Create new RWS event list
  • Record static workflow info (portTable)
Execution: Event Handling and Modifications
• Just before the run: validate()
  • Called before the model is executed; the event handling methods are extended here
  • Subscribe to token listeners: TokenSend, TokenGet
• When the workflow is modified: changeExecuted()
  • Something has changed in the workflow
  • Re-generate RWS portMap
Execution: During the workflow run
• When a token event occurs (tables written: tokenTable, objectTable, traceTable)
• TokenSendEvent()
  • New RWS event w
  • Print sent token’s info (token id, object id, value)
  • For each connected transparent port: generate virtual TokenGet event
• TokenGetEvent()
  • New RWS event r
  • If it is a transparent port: generate virtual TokenSend event
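The interplay of real and virtual events can be sketched in Python as follows. This is an illustration under stated assumptions, not the Kepler listener code: the class `RwsRecorder`, its method names, and the port names are hypothetical; firing counters are left out; and the connection graph is assumed to be acyclic so that the mutual recursion terminates.

```python
from typing import Dict, List, Set, Tuple

class RwsRecorder:
    """Records (port, event, token) rows, adding virtual events at transparent ports."""

    def __init__(self, connections: Dict[str, List[str]], transparent: Set[str]):
        self.connections = connections   # port -> ports it delivers tokens to
        self.transparent = transparent   # ports on composite-actor boundaries
        self.trace: List[Tuple[str, str, str]] = []

    def token_send(self, port: str, token: str) -> None:
        self.trace.append((port, "w", token))        # new RWS event w
        for dest in self.connections.get(port, []):
            if dest in self.transparent:
                self.token_get(dest, token)          # virtual TokenGet event

    def token_get(self, port: str, token: str) -> None:
        self.trace.append((port, "r", token))        # new RWS event r
        if port in self.transparent:
            self.token_send(port, token)             # virtual TokenSend event

# Hypothetical wiring: A.out -> S.C (transparent) -> E.in
recorder = RwsRecorder(
    connections={"A.out": ["S.C"], "S.C": ["E.in"]},
    transparent={"S.C"},
)
recorder.token_send("A.out", "t1")   # real send by actor A
recorder.token_get("E.in", "t1")     # real get by actor E
```

The real send at `A.out` produces a virtual read and a virtual write at `S.C`, so the trace contains an unbroken read/write chain across the transparent port before the real read at `E.in`.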
A Kepler Workflow Implementation
RWS trace
Table         # of elements   Size in KB
portTable     81              4
tokenTable    30              2
objectTable   30              3
traceTable    86              6
Query 1.a
Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed, etc.
Answer a. List of actors that contributed to the result (21 actors). They appear in reverse order of execution.
?- q1b_actors('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ActorList), print(ActorList).
[ .pc.Convert_x, .pc.Slicer_x, .pc.SoftMean,
  .pc.Reslice3, .pc.Reslice2, .pc.Reslice4, .pc.Reslice1,
  .pc.AlignWarp3, .pc.RefImg, .pc.RefHdr, .pc.InputHdr3, .pc.InputImg3,
  .pc.AlignWarp2, .pc.InputHdr2, .pc.InputImg2,
  .pc.AlignWarp4, .pc.InputHdr4, .pc.InputImg4,
  .pc.AlignWarp1, .pc.InputImg1, .pc.InputHdr1 ]
Query 1.b
Answer b. List of intermediate values created by the workflow (26 values).
?- q1b_values('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ValueList), print(ValueList).
[ "/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage4/atlas-x.pgm",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.hdr",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.hdr",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.hdr",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.hdr",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.hdr",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp3.warp",
  "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.hdr",
  "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy3.hdr",
  "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy3.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp2.warp",
  "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy2.hdr",
  "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy2.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp4.warp",
  "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy4.hdr",
  "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy4.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp1.warp",
  "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy1.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy1.hdr" ]
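The Prolog rules behind q1b_actors and q1b_values are not shown in the slides. The Python sketch below only illustrates the general idea of such a lineage query: walk the trace backwards from a token, collecting every actor that wrote a token on the path and every token that actor read. It assumes, as a simplification, that a write depends on the reads of the same actor with the same firing counter and ignores reset events. The function name `lineage`, the port names, and the three-row pipeline are hypothetical.

```python
from typing import Dict, List, Tuple

TraceRow = Tuple[str, str, str, int]   # (port, event, token, firing counter)

def lineage(token: str,
            trace: List[TraceRow],
            port_actor: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (actors, tokens) that contributed to `token`, most recent first."""
    actors: List[str] = []
    tokens: List[str] = []
    seen = set()
    stack = [token]
    while stack:
        tok = stack.pop()
        if tok in seen:
            continue
        seen.add(tok)
        tokens.append(tok)
        for port, event, t, firing in trace:
            if event == "w" and t == tok:
                actor = port_actor[port]
                if actor not in actors:
                    actors.append(actor)
                # Simplifying assumption: the write depends on all reads
                # made by the same actor in the same firing.
                for p2, e2, t2, f2 in trace:
                    if e2 == "r" and f2 == firing and port_actor[p2] == actor:
                        stack.append(t2)
    return actors, tokens

# Hypothetical three-actor pipeline: Input -> AlignWarp -> Reslice
trace = [
    ("Input.out", "w", "t1", 1),
    ("AlignWarp.in", "r", "t1", 1),
    ("AlignWarp.out", "w", "t2", 1),
    ("Reslice.in", "r", "t2", 1),
    ("Reslice.out", "w", "t3", 1),
]
port_actor = {
    "Input.out": "Input",
    "AlignWarp.in": "AlignWarp", "AlignWarp.out": "AlignWarp",
    "Reslice.in": "Reslice", "Reslice.out": "Reslice",
}
actors, tokens = lineage("t3", trace, port_actor)
```

As in the answers above, the actors come out in reverse order of execution because the traversal starts at the final product and works backwards.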
Improved PC workflow (cf. COMAD wf)
• A more generic workflow that accepts any number of images
• Smaller number of actors
  • This affects the number of values, as it requires additional array operations
• cf. also the COMAD approach and the Taverna approach (but we fire AlignWarp individually here)
RWS trace
Table         # of elements   Size in KB
portTable     42              2
tokenTable    51              3
objectTable   39              4
traceTable    150             9
Query 1
Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is. This should tell us the new brain images from which the averaged atlas was generated, the warping performed, etc.
Answer a. List of actors that contributed to the result (15 actors). They appear in reverse order of execution.
?- q1b_actors('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ActorList), print(ActorList).
[ .pca.Convert, .pca.Slicer, .pca.hdrrepeat, .pca.seqXYZ, .pca.imgrepeat,
  .pca.SoftMeanArray, .pca.imgarray, .pca.hdrarray, .pca.Reslice, .pca.AlignWarp,
  .pca.RefHdr, .pca.InputHdr, .pca.InputImg, .pca.RefImg, .pca.Ramp ]
Query 1
Answer b. List of intermediate values created by the workflow (33 values). In addition to the original file names, it includes internal data values (arrays).
?- q1b_values('"/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif"', ValueList), print(ValueList).
[ "/usr/home/pnorbert/Provenance/ProvCh/data/output/atlas-x.gif",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage4/atlas-x.pgm",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.hdr",
  "x",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage3/atlas.img",
  { "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.img",
    "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.img",
    "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.img",
    "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.img" },
  { "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.hdr",
    "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.hdr",
    "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.hdr",
    "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.hdr" },
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced1.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced2.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced3.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage2/resliced4.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/out-stage1/warp1.warp",
  "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.hdr",
  "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy1.hdr",
  "/usr/home/pnorbert/Provenance/ProvCh/data/input/anatomy1.img",
  "/usr/home/pnorbert/Provenance/ProvCh/data/input/reference.img",
  1, etc...
The trick
• Multi-port of Ptolemy
  • two distinct channels going into S and out from S
• A’s output is delivered to S.C
• B’s output is delivered to S.D
• S.C’s output is delivered to E
• S.D’s output is delivered to F
Lineage of actors and values
Who contributed to the value C.1 that arrived at E?
?- q1('"C.1"', ActorList, ValueList).
ActorList = ['.WF15.S.C', '.WF15.S', '.WF15.A']
ValueList = ['"C.1"', '1', '1']
Who contributed to the value D.2 that arrived at F?
?- q1('"D.2"', ActorList, ValueList).
ActorList = ['.WF15.S.D', '.WF15.S', '.WF15.B']
ValueList = ['"D.2"', '2', '2']
Single-level lineage of actors and values
Who contributed to the value C.1 that arrived at E?
?- q1b('"C.1"', ActorList, ValueList).
ActorList = ['.WF15.S', '.WF15.A']
ValueList = ['"C.1"', '1']
Who contributed to the value D.2 that arrived at F?
?- q1b('"D.2"', ActorList, ValueList).
ActorList = ['.WF15.S', '.WF15.B']
ValueList = ['"D.2"', '2']
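The definition of q1b is not given in the slides. One way to obtain an actor list of the same shape from the nested answer is to keep only the actors that are direct children of the top-level workflow. The Python sketch below does just that, assuming that nesting is encoded in the dotted actor names as in the answers above; the function name `single_level` is hypothetical, and the sketch covers the actor list only, not the value list.

```python
from typing import List

def single_level(actors: List[str], workflow: str) -> List[str]:
    """Keep only actors that are direct children of `workflow`.

    Assumption: '.WF15.S.C' names actor C nested inside composite S,
    which is itself a direct child of workflow '.WF15'.
    """
    prefix = workflow + "."
    return [a for a in actors
            if a.startswith(prefix) and "." not in a[len(prefix):]]

nested_c = [".WF15.S.C", ".WF15.S", ".WF15.A"]   # ActorList from q1 for "C.1"
nested_d = [".WF15.S.D", ".WF15.S", ".WF15.B"]   # ActorList from q1 for "D.2"
flat_c = single_level(nested_c, ".WF15")
flat_d = single_level(nested_d, ".WF15")
```

For the two nested answers this yields `['.WF15.S', '.WF15.A']` and `['.WF15.S', '.WF15.B']`, the same actor lists that q1b reports.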
Conclusions
• First attempt at combining the Kepler PR and the Kepler RWS provenance model
  • Both published at IPAW 2006
• Query 1 was successfully answered.
• Queries 2 and 3 are answerable, but have not been implemented yet.
• Queries on multiple runs and on workflow design provenance are outside the scope of this initial prototype.
  • Other groups in Kepler are focusing on this.
Some related references
• Provenance Framework/Recorder:
  • I. Altintas, O. Barney, E. Jaeger-Frank. Provenance Collection Support in the Kepler Scientific Workflow System. IPAW 2006, Chicago, Illinois, May 2006.
• RWS Model:
  • Shawn Bowers, Timothy McPhillips, Bertram Ludaescher, Shirley Cohen, Susan B. Davidson. A Model for User-Oriented Data Provenance in Pipelined Scientific Workflows. International Provenance and Annotation Workshop (IPAW'06), Chicago, Illinois, USA, May 3-5, 2006.