
Modeling Provenance through User views



Presentation Transcript


  1. Modeling Provenance through User views
     Sarah Cohen-Boulakia, Shirley Cohen, Susan Davidson, Thunyarat (Bam) Amornpetchkul, Olivier Biton
     Database group, University of Pennsylvania
     Provenance Challenge, Sept. 2006

  2. Our approach
     • Model of provenance
       • Based on study of user requirements (CIPRES)
       • Based on careful studies of workflow systems (Kepler, MyGrid, Chimera): minimal information to reason about provenance
       • No workflow system is proposed
     • User views
       • Capability of workflow systems to group steps (forming boxes) and to zoom into boxes → multi-granularity levels of provenance
     • Implemented in Oracle 10g and Java
       • Relational framework augmented with transitive closure
       • Java/Spring/JDBC: object layer and user interface

  3. Workflow Representation
     • Terminology
       • Step-classes (static)
       • An execution of a workflow generates a partial order of steps (dynamic)
         • Steps are instances of step-classes
       • Each step has input and output data
     • [Figure: "reslice" is a step-class; "8.reslice" is a step, shown with its input data and output data]

  4. Provenance Trace
     • Base tables
       • Data(dataid, name, type), DataAttributes(dataid, attribute, value)
         • Example: Data(1, Anatomy Image1, Anatomy Image) and DataAttributes(1, center, UChicago) record that Center=UChicago
       • InstanceOf(Step, Step-Class, ts), StepParams(step, attribute, value), StageInstance(step, stage)
       • Input(stepId, dataId, ts) / Output(stepId, dataId, ts): stepId takes as input / produces dataId at time ts
     • Views
       • Process(stepId, stepClass, input, output, time)
       • …
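The slide does not say how the Process view is derived from the base tables. The following is a minimal in-memory sketch, assuming the view pairs every input of a step with every output of the same step and looks up the step-class in InstanceOf. The class and method names (`ProcessViewSketch`, `IoRow`, `ProcessRow`, `processView`) are hypothetical, and the view's time column is left out because the slides do not state which timestamp it carries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ProcessViewSketch {

    /** One row of Input(stepId, dataId, ts) or Output(stepId, dataId, ts). */
    static final class IoRow {
        final int stepId;
        final int dataId;
        final long ts;

        IoRow(int stepId, int dataId, long ts) {
            this.stepId = stepId;
            this.dataId = dataId;
            this.ts = ts;
        }
    }

    /** One row of the derived view; the time column is omitted in this sketch. */
    static final class ProcessRow {
        final int stepId;
        final String stepClass;
        final int input;
        final int output;

        ProcessRow(int stepId, String stepClass, int input, int output) {
            this.stepId = stepId;
            this.stepClass = stepClass;
            this.input = input;
            this.output = output;
        }
    }

    /**
     * Assumed definition of the view: join Input and Output on stepId and
     * attach the step-class recorded in InstanceOf (stepId -> step-class).
     */
    static List<ProcessRow> processView(List<IoRow> inputs,
                                        List<IoRow> outputs,
                                        Map<Integer, String> instanceOf) {
        List<ProcessRow> rows = new ArrayList<>();
        for (IoRow inRow : inputs) {
            for (IoRow outRow : outputs) {
                if (inRow.stepId == outRow.stepId) {
                    rows.add(new ProcessRow(inRow.stepId,
                            instanceOf.get(inRow.stepId),
                            inRow.dataId,
                            outRow.dataId));
                }
            }
        }
        return rows;
    }
}
```

Under this assumption, a step with two inputs and one output contributes two rows to the view, one per (input, output) pair, which is the shape a recursive query over input and output columns needs.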

  5. Provenance Queries
     • Q1: Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is
         SELECT DISTINCT step, step-class, input, output
         FROM Process
         START WITH output = ( SELECT ID FROM DataID WHERE name = 'Atlas X Graphic' )
         CONNECT BY PRIOR input = output
         ORDER BY step;
     • Implements transitive closure, which is necessary to return all the data used to (recursively) compute Atlas X Graphic
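The backward traversal that `CONNECT BY PRIOR input = output` performs can be sketched outside the database as a breadth-first search over Process rows. This is only an illustration of the transitive closure, not the authors' implementation; the class name `DeepProvenanceSketch`, the method `stepsLeadingTo`, and the encoding of each row as `{stepId, inputDataId, outputDataId}` are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

final class DeepProvenanceSketch {

    /**
     * Returns the ids of all steps that contributed, directly or
     * recursively, to targetDataId. Each row is
     * {stepId, inputDataId, outputDataId} (hypothetical encoding).
     */
    static Set<Integer> stepsLeadingTo(int[][] processRows, int targetDataId) {
        Set<Integer> steps = new LinkedHashSet<>();
        Set<Integer> seenData = new HashSet<>();
        Deque<Integer> frontier = new ArrayDeque<>();
        frontier.add(targetDataId);
        seenData.add(targetDataId);

        while (!frontier.isEmpty()) {
            int data = frontier.remove();
            for (int[] row : processRows) {
                // A row whose output is the current data item is one hop back.
                if (row[2] == data) {
                    steps.add(row[0]);
                    // Follow the row's input unless it was already visited.
                    if (seenData.add(row[1])) {
                        frontier.add(row[1]);
                    }
                }
            }
        }
        return steps;
    }
}
```

The `seenData` set keeps the search from revisiting a data item that is reachable along several paths. Stopping after the first hop would give immediate provenance; running until the frontier is empty gives deep provenance.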

  6. Provenance Queries (Cont.)
     • All the queries can be answered by our system
       • Code available on TWiki
     • Using SQL
       • Connect by operators
       • Joins with several tables (e.g. Parameters, DataAttribute)
       • Minus and Union operators
     • The generalization of Q7 (difference between workflows) is currently not answerable

  7. Workflow Variant: User Views
     • What are User views?
       • Level of detail the user wishes to track
       • Permissions given to the user
       • Ability of the user to see / know the sub-steps (distributed computation)
     • Why use User Views?
       • Throw away unimportant intermediate results
       • Better understanding of the workflow
       • Reduce the amount of work to be redone
     • [Figure: boxes Box1 and Box2; user views UBio, UBlackBox, and UAdmin (UAdmin can see everything)]

  8. Querying within User Views
     • Need information from
       • Workflow: step-class containment and user views
       • Cinput(sid, idid, tsi), Coutput(sid, idid, tso)
       • → View UProcess(usr, step, step-class, input, output)
     • Query: What are all the data items used to produce “Resliced Image1”?
         SELECT * FROM uProcess upc
         WHERE usr = :userName
         START WITH outputName = 'Resliced Image1'
         CONNECT BY PRIOR upc.input = upc.output;
     • Answers
       • UAdmin: Anatomy Header 1, Anatomy Image1, Reference Image, Reference Header, Wrap param1
       • UBio: Anatomy Header 1, Anatomy Image1, Reference Image, Reference Header
       • UBlackBox: empty answer!
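To make the three answers concrete, here is a minimal sketch of how a user view could filter deep provenance. It rests on assumptions the slides do not spell out: a view assigns each step to one box, a data item is hidden when it is produced and consumed only inside a single box, and workflow inputs and final outputs stay visible. The names `UserViewSketch`, `isHidden`, and `visibleInputs`, and the row encoding `{step, input, output}`, are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class UserViewSketch {

    /** True when data is produced and consumed only inside one box of the view. */
    static boolean isHidden(String[][] rows, Map<String, String> boxOf, String data) {
        String producerBox = null;
        for (String[] row : rows) {
            if (row[2].equals(data)) {
                producerBox = boxOf.get(row[0]);
            }
        }
        if (producerBox == null) {
            return false; // workflow inputs have no producer and stay visible
        }
        boolean consumed = false;
        for (String[] row : rows) {
            if (row[1].equals(data)) {
                consumed = true;
                if (!producerBox.equals(boxOf.get(row[0]))) {
                    return false; // crosses a box boundary, so the user sees it
                }
            }
        }
        return consumed; // final outputs (never consumed) stay visible
    }

    /**
     * Data items the user can see among everything used to produce target.
     * Returns an empty set when target itself is hidden inside a box.
     */
    static Set<String> visibleInputs(String[][] rows, Map<String, String> boxOf, String target) {
        Set<String> visible = new TreeSet<>();
        if (isHidden(rows, boxOf, target)) {
            return visible;
        }
        Set<String> seen = new HashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(target);
        seen.add(target);
        while (!frontier.isEmpty()) {
            String data = frontier.remove();
            for (String[] row : rows) {
                if (row[2].equals(data) && seen.add(row[1])) {
                    if (!isHidden(rows, boxOf, row[1])) {
                        visible.add(row[1]);
                    }
                    frontier.add(row[1]); // keep walking back through hidden data
                }
            }
        }
        return visible;
    }
}
```

As a usage example, suppose a hypothetical step `align_warp` turns the four anatomy and reference items into Wrap param1, `reslice` turns Wrap param1 into Resliced Image1, and `softmean` consumes Resliced Image1. With one box per step (an assumed UAdmin view) the sketch returns all five items. With `align_warp` and `reslice` in the same box (an assumed UBio view) Wrap param1 drops out. With all three steps in one box (an assumed UBlackBox view) Resliced Image1 itself is hidden and the result is empty.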

  9. Conclusion, Perspectives
     • Able to answer the queries, including
       • Data and Step provenance
       • Immediate and Deep (recursive) provenance
     • Variation of the workflow and queries considering user views
       • Multi-granularity levels of provenance
       • Only visible and necessary data are kept
     • Open questions
       • What is the meaning of “stage” in a workflow (with respect to user views)?
       • What are we expecting as an answer to the difference between two workflows (cf. query 7)?
       • Are all the procedures of the workflow “biologically significant” (cf. user views)?

  10. Acknowledgements
      • Kepler Group
        • Shawn Bowers
        • Bertram Ludascher
        • Timothy McPhillips
      • Biologists from the CIPRES project
      • Members from the Database group, University of Pennsylvania
      • This work is supported by NSF grants 0513778, 0415810, and 0612177

  11. User interface
