Modeling Provenance through User views
Sarah Cohen-Boulakia, Shirley Cohen, Susan Davidson, Thunyarat (Bam) Amornpetchkul, Olivier Biton
Database group, University of Pennsylvania
Provenance Challenge, Sept. 2006
Our approach
• Model of provenance
  • Based on study of user requirements (CIPRES)
  • Based on careful studies of workflow systems (Kepler, MyGrid, Chimera): minimal information to reason about provenance
  • No workflow system is proposed
• User views
  • Capability of workflow systems to group steps (forming boxes) and to zoom into boxes: multi-granularity levels of provenance
• Implemented in Oracle 10g and Java
  • Relational framework augmented with transitive closure
  • Java/Spring/JDBC: object layer and user interface
Workflow Representation
• Terminology
  • Step-classes (static)
  • An execution of a workflow generates a partial order of steps (dynamic)
    • Instances of step classes
  • Each step has input and output data
[Figure labels: input data; reslice: step-class; 8.reslice: step; output data]
Provenance Trace
• Base tables
  • Data(dataid, name, type), DataAttributes(dataid, attribute, value)
    • Data(1, Anatomy Image1, Anatomy Image)
    • DataAttributes(1, center, UChicago)
    • Center=UChicago
  • InstanceOf(Step, Step-Class, ts), StepParams(step, attribute, value), StageInstance(step, stage)
  • Input(stepId, dataId, ts) / Output(stepId, dataId, ts): stepId takes as input / produces dataId at time ts
• Views
  • Process(stepId, stepClass, input, output, time)
  • …
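To make the relational layout concrete, here is a minimal sketch of the base tables and the Process view. It uses Python's built-in sqlite3 module rather than Oracle 10g, and the view definition (a join of Input, Output and InstanceOf on the step id) is an assumption, since the slide lists only the view's columns. The step 8 / Resliced Image1 rows are hypothetical sample data.

```python
import sqlite3

def build_trace():
    """Create the base tables from the slide in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Data(dataid INTEGER PRIMARY KEY, name TEXT, type TEXT);
        CREATE TABLE DataAttributes(dataid INTEGER, attribute TEXT, value TEXT);
        CREATE TABLE InstanceOf(step INTEGER, stepClass TEXT, ts INTEGER);
        CREATE TABLE Input(stepId INTEGER, dataId INTEGER, ts INTEGER);
        CREATE TABLE Output(stepId INTEGER, dataId INTEGER, ts INTEGER);

        -- ASSUMED definition: one row per (input, output) pair of a step.
        CREATE VIEW Process AS
        SELECT i.stepId    AS stepId,
               c.stepClass AS stepClass,
               i.dataId    AS input,
               o.dataId    AS output,
               o.ts        AS time
        FROM Input i
        JOIN Output o     ON o.stepId = i.stepId
        JOIN InstanceOf c ON c.step   = i.stepId;
    """)
    # Rows taken from the slide.
    conn.execute("INSERT INTO Data VALUES (1, 'Anatomy Image1', 'Anatomy Image')")
    conn.execute("INSERT INTO DataAttributes VALUES (1, 'center', 'UChicago')")
    # Hypothetical rows: step 8 (class reslice) reads data 1 and writes data 2.
    conn.execute("INSERT INTO Data VALUES (2, 'Resliced Image1', 'Resliced Image')")
    conn.execute("INSERT INTO InstanceOf VALUES (8, 'reslice', 100)")
    conn.execute("INSERT INTO Input VALUES (8, 1, 100)")
    conn.execute("INSERT INTO Output VALUES (8, 2, 101)")
    return conn

conn = build_trace()
process_rows = conn.execute(
    "SELECT stepId, stepClass, input, output, time FROM Process"
).fetchall()
print(process_rows)
```

Keeping Input and Output as separate tables means a step with several inputs simply contributes several rows to Process, which is the shape the recursive queries need.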
Provenance Queries
Q1: Find the process that led to Atlas X Graphic / everything that caused Atlas X Graphic to be as it is

    SELECT DISTINCT stepId, stepClass, input, output
    FROM Process
    START WITH output = (SELECT ID FROM DataID WHERE name = 'Atlas X Graphic')
    CONNECT BY PRIOR input = output
    ORDER BY stepId;

CONNECT BY implements the transitive closure, which is necessary to return all the data used (recursively) to compute Atlas X Graphic.
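CONNECT BY is Oracle-specific. The same backward transitive closure can be sketched with a standard recursive common table expression, shown here in SQLite through Python's sqlite3 module. This is a swapped-in technique, not the authors' implementation: Process is a plain table instead of a view, and the three-step chain plus the unrelated step 9 are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Data(dataid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Process(stepId INTEGER, stepClass TEXT, input INTEGER, output INTEGER);
""")
# Hypothetical, simplified chain: (1, 2) -> 3 -> 4 -> 5, plus an unrelated step 9.
conn.executemany("INSERT INTO Data VALUES (?, ?)", [
    (1, "Anatomy Image1"), (2, "Anatomy Header1"), (3, "Warp Params1"),
    (4, "Resliced Image1"), (5, "Atlas X Graphic"),
    (6, "Unrelated In"), (7, "Unrelated Out"),
])
conn.executemany("INSERT INTO Process VALUES (?, ?, ?, ?)", [
    (1, "align_warp", 1, 3), (1, "align_warp", 2, 3),
    (2, "reslice", 3, 4), (3, "convert", 4, 5),
    (9, "other", 6, 7),
])

LINEAGE_SQL = """
WITH RECURSIVE lineage(stepId, stepClass, input, output) AS (
    -- Anchor: the step(s) that produced the target data item.
    SELECT stepId, stepClass, input, output
    FROM Process
    WHERE output = (SELECT dataid FROM Data WHERE name = ?)
  UNION
    -- Recursive part: steps whose output was consumed by a step already found.
    SELECT p.stepId, p.stepClass, p.input, p.output
    FROM Process p
    JOIN lineage l ON p.output = l.input
)
SELECT stepId, stepClass, input, output
FROM lineage
ORDER BY stepId, input
"""

def process_leading_to(conn, data_name):
    """Return every Process row that (recursively) contributed to data_name."""
    return conn.execute(LINEAGE_SQL, (data_name,)).fetchall()

result = process_leading_to(conn, "Atlas X Graphic")
for row in result:
    print(row)
```

UNION (rather than UNION ALL) drops duplicate rows, which plays the role of SELECT DISTINCT in Q1 and also keeps the recursion finite if the data ever contained a cycle.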
Provenance Queries (Cont.)
• All the queries can be answered by our system
  • Code available on TWiki
• Using SQL
  • Connect by operators
  • Joins with several tables (e.g. Parameters, DataAttribute)
  • Minus and Union operators
• The generalization of Q7 (difference between workflows) is currently not answerable
Workflow Variant: User Views
• What are user views?
  • Level of detail the user wishes to track
  • Permissions given to the user
  • Ability of the user to see / know the sub-steps (distributed computation)
• Why use user views?
  • Throw away unimportant intermediate results
  • Better understanding of the workflow
  • Reduce the amount of work to be redone
[Figure labels: Box1, Box2; UBio; UBlackBox; UAdmin can see everything]
Querying within User Views
• Need information from
  • Workflow: step-class containment and user views
  • Cinput(sid,idid,tsi), Coutput(sid,idid,tso)
  • View UProcess(usr, step, step-class, input, output)
• Query: What are all the data items used to produce "Resliced Image1"?

    SELECT *
    FROM uProcess upc
    WHERE usr = :userName
    START WITH outputName = 'Resliced Image1'
    CONNECT BY PRIOR upc.input = upc.output;

• UAdmin: Anatomy Header 1, Anatomy Image1, Reference Image, Reference Header, Wrap param1
• UBio: Anatomy Header 1, Anatomy Image1, Reference Image, Reference Header
• UBlackBox: empty answer!
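The effect of user views on this query can be illustrated with a small sketch, again using SQLite's recursive CTE in place of Oracle's CONNECT BY. Everything about the sample data is an assumption: UProcess is a plain table keyed by data names, UAdmin sees the individual steps, UBio sees one box that hides the intermediate parameter, and UBlackBox sees a single box whose only visible output is the final graphic. The rows were chosen so that the three answers match those listed on the slide; they are not the authors' data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE UProcess(usr TEXT, step TEXT, stepClass TEXT, input TEXT, output TEXT)"
)

raw_inputs = ["Anatomy Header 1", "Anatomy Image1", "Reference Image", "Reference Header"]
rows = []
for item in raw_inputs:
    # UAdmin sees the fine-grained step and its intermediate output (hypothetical step id).
    rows.append(("UAdmin", "1.align_warp", "align_warp", item, "Wrap param1"))
    # UBio sees a box that goes straight from raw inputs to the resliced image.
    rows.append(("UBio", "Box1", "Box1", item, "Resliced Image1"))
    # UBlackBox sees one box whose only visible output is the final graphic.
    rows.append(("UBlackBox", "Box", "Box", item, "Atlas X Graphic"))
rows.append(("UAdmin", "8.reslice", "reslice", "Wrap param1", "Resliced Image1"))
conn.executemany("INSERT INTO UProcess VALUES (?, ?, ?, ?, ?)", rows)

USED_SQL = """
WITH RECURSIVE used(input) AS (
    -- Anchor: inputs of the step or box that produced the target, as this user sees it.
    SELECT input FROM UProcess WHERE usr = :usr AND output = :target
  UNION
    -- Recursive part: stay inside the same user's view while walking backwards.
    SELECT p.input
    FROM UProcess p
    JOIN used ON p.output = used.input
    WHERE p.usr = :usr
)
SELECT input FROM used ORDER BY input
"""

def data_used(conn, usr, target):
    """Data items visible to usr that were (recursively) used to produce target."""
    return [r[0] for r in conn.execute(USED_SQL, {"usr": usr, "target": target})]

for user in ("UAdmin", "UBio", "UBlackBox"):
    print(user, data_used(conn, user, "Resliced Image1"))
```

In this sketch the user filter is applied inside the recursion, so a walk that starts in one user's view cannot pick up rows that belong to another user's view.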
Conclusion, Perspectives
• Able to answer the queries, including
  • Data and Step provenance
  • Immediate and Deep (recursive) provenance
• Variation of the workflow and queries considering user views
  • Multi-granularity levels of provenance
  • Only visible and necessary data are kept
• Open questions
  • What is the meaning of “stage” in a workflow (with respect to user views)?
  • What are we expecting as an answer to the difference between two workflows (cf. query 7)?
  • Are all the procedures of the workflow “biologically significant” (cf. user views)?
Acknowledgements
• Kepler Group
  • Shawn Bowers
  • Bertram Ludascher
  • Timothy McPhillips
• Biologists from the CIPRES project
• Members from the Database group, University of Pennsylvania
• This work is supported by NSF grants 0513778, 0415810, and 0612177
User interface