Capturing provenance data
Dr Alison McKay (in place of Dr Richard Bagshaw)
University of Leeds, School of Mechanical Engineering
Purpose of presentation
• to present the DAME provenance research
• to discuss the experience of deploying this technology in Grid-based systems
Outline of presentation
• What do we mean by “provenance data”?
• What are we aiming for?
• What does achieving this goal entail?
• What progress has been made to date?
• What remains to be done?
Provenance Data
• Recording the history of data and its place of origin
DAME Provenance Architecture
[Figure: architecture diagram. Components shown: Workflow Definition (BPEL), Workflow Script, Workflow Manager, Workflow Advisor, Workflow Instances, Service Instance, Provenance Database, Provenance Viewer.]
RR Integrated Product Development process
[Figure: process diagram. Stage labels: Stage 1 New Project Planning; Stage 2 Full Concept Definition; Stage 3 Propulsion System Realisation; Stage 4 In-Service Monitoring & Technical Support. Other labels: Identify the Need; Business Concept Definition; Preliminary Concept Definition; Capability Acquisition; Engine Launch; Entry into Service.]
DAME provenance data users
[Figure: diagram centred on "Provenance Requirement". Labels: Legal Implications; Contractual Obligations; Audit Trail; Troubleshooting; Re-run diagnosis.]
Potential benefits
• failure mode curves
• position and shape depend on
  • engine type (from PDM/SDM)
  • engine state (e.g., age)
  • events (e.g., from QUOTE data)
[Figure: failure mode curve with axis label "Time" and markers "T" and "extra T". Annotations: "position of an engine, i.e., its current state of health"; "this line shows when failure occurs; its position and shape depend upon its operating environment".]
Specific tasks to be supported
• Create an audit trail (Who, What, Where, Why, When, Which, hoW)
• Re-execute a workflow process
  • repeat a workflow process (same Grid resources & services, sequence and data)
  • rerun a workflow process (same Grid resources & services and sequence on different data)
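The distinction between an audit entry, a repeat and a rerun can be sketched in a few lines of Python. This is only an illustration under assumed names: `AuditRecord`, `WorkflowRun`, `repeat` and `rerun` are hypothetical and are not part of the DAME software.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRecord:
    """One audit-trail entry answering the seven questions (hypothetical layout)."""
    who: str        # user or service that acted
    what: str       # action performed
    where: str      # Grid resource or host used
    why: str        # stated reason for the action
    when: datetime  # time stamp of the action
    which: str      # data set or workflow instance acted upon
    how: str        # tool or service used


@dataclass(frozen=True)
class WorkflowRun:
    """Minimal description of an executed workflow (hypothetical layout)."""
    resources: tuple  # Grid resources and services used
    sequence: tuple   # ordered step names
    data: str         # identifier of the input data


def repeat(run: WorkflowRun) -> WorkflowRun:
    """Repeat: same resources, services, sequence and data."""
    return replace(run)


def rerun(run: WorkflowRun, new_data: str) -> WorkflowRun:
    """Rerun: same resources, services and sequence, but different data."""
    return replace(run, data=new_data)
```

A repeat yields a run equal to the original, whereas a rerun differs only in its `data` field.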
Initial requirements
• Support the re-execution of workflows with new data *
• Provide provenance data for the Workflow Advisor
• Provide a viewer for captured provenance data
* As opposed to repeating a given workflow using the same data and resources
DS&S perspective on requirements
• Origin of data fully traceable
  • (including time and date stamps)
• Processed data traceable through application software
• Any human interaction/annotations must be captured
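One way to read these traceability requirements is as a chain of records, each pointing back to the data it was derived from. The sketch below is an assumption made for illustration only: `DataRecord` and `trace_to_origin` are hypothetical names, not DAME or DS&S interfaces.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DataRecord:
    """A data item plus the facts needed to trace it (hypothetical layout)."""
    data_id: str
    created: datetime                    # time and date stamp
    produced_by: str                     # application or person that produced it
    derived_from: Optional[str] = None   # None marks original (source) data
    annotation: str = ""                 # captured human annotation, if any


def trace_to_origin(records: Dict[str, DataRecord], data_id: str) -> List[DataRecord]:
    """Walk from a processed item back to its origin, newest record first."""
    chain: List[DataRecord] = []
    seen = set()
    current: Optional[str] = data_id
    while current is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        record = records[current]
        chain.append(record)
        current = record.derived_from
    return chain
```

The last element of the returned chain is the original data, and each earlier element names the application that processed it.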
Research issues
[Figure: matrix with columns Specify, Define, Execute / deploy]
• Product: Product Data Management system (Define); Service Data Manager (Execute / deploy)
• Process: Workflow process definition (Define); Workflow execution data (Execute / deploy)
Process definition (as defined)
[Figure: data model diagram. Entities: [GRID] resource; GRID resource usage; process; process element; process definition; process relationship; composition relationship; process element relationship; connection relationship. Attribute and role labels: id, name, description, start, end, date_and_time, caller, callee, outcome, why_used, executed_by, of, related, relating.]
Process definition (as executed)
• Case: Case_id, User_id, Open_date, Close_date, Flight_start_date, Deadline_date, Tail_number, Airline, Airport, Stand, Quote_diagnosis, Quote_status, Engineer, Engineer_active, Engineer_why, Analyst, Analyst_active, Analyst_why, Expert, Expert_active, Expert_why
• Workflow: Workflow_sequence_number, Workflow_id, Workflow_author_id, Workflow_name, Workflow_description, Workflow_start_date, Workflow_end_date, Workflow_ip_data_type, Workflow_op_data_type, Workflow_diagnosis, Workflow_status
• Resource: Resource_sequence_number, Resource_id, Resource_name, Resource_type, Resource_description, Resource_start_time, Resource_end_time, Resource_location, Resource_configuration, Resource_version_number, Resource_status, Resource_req_no_of_processors, Resource_req_memory, Resource_req_operating_system, Resource_req_op_sys_ver_number
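As a rough sketch of how the Case, Workflow and Resource records might be stored and queried, the following uses Python's built-in `sqlite3` module with a small subset of the listed fields. The foreign-key links between the three tables, the column types, the table name `case_record` (chosen because `CASE` is an SQL keyword) and the helper functions are all assumptions; the slide gives only the field names.

```python
import sqlite3

# Subset of the slide's fields; the links between tables are assumed.
SCHEMA = """
CREATE TABLE case_record (
    case_id     TEXT PRIMARY KEY,
    user_id     TEXT,
    open_date   TEXT,
    tail_number TEXT
);
CREATE TABLE workflow (
    workflow_id              TEXT PRIMARY KEY,
    case_id                  TEXT REFERENCES case_record(case_id),  -- assumed link
    workflow_sequence_number INTEGER,
    workflow_name            TEXT,
    workflow_status          TEXT
);
CREATE TABLE resource (
    resource_id              TEXT,
    workflow_id              TEXT REFERENCES workflow(workflow_id),  -- assumed link
    resource_sequence_number INTEGER,
    resource_name            TEXT,
    resource_version_number  TEXT,
    resource_start_time      TEXT,
    resource_end_time        TEXT
);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory provenance store with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def resources_for_case(conn: sqlite3.Connection, case_id: str) -> list:
    """List the resources used for a case, in execution order."""
    cursor = conn.execute(
        """
        SELECT w.workflow_sequence_number,
               r.resource_sequence_number,
               r.resource_name
        FROM workflow AS w
        JOIN resource AS r ON r.workflow_id = w.workflow_id
        WHERE w.case_id = ?
        ORDER BY w.workflow_sequence_number, r.resource_sequence_number
        """,
        (case_id,),
    )
    return cursor.fetchall()
```

Ordering by the two sequence numbers recovers the order in which resources were used, which is the information a re-execution would need.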
MyGrid Workflow Provenance
• Workflow instance capture
• Workflow overview
  • Workflow ID, Status, Start Time, End Time, overall inputs and outputs, Service List
• Service invocations
  • Status, Start Time, End Time, WSDL URI, DataSets x 2
• Inputs and outputs
  • ID, Name, Type, Value
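The three levels listed above (workflow overview, service invocations, inputs and outputs) nest naturally. The sketch below models that nesting with Python dataclasses; it is not the MyGrid schema or API. The class names are hypothetical, and reading "DataSets x 2" as one input set and one output set per invocation is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataItem:
    """One input or output value: ID, Name, Type, Value."""
    id: str
    name: str
    type: str
    value: str


@dataclass
class ServiceInvocation:
    """One service call within a workflow instance."""
    status: str
    start_time: str
    end_time: str
    wsdl_uri: str
    inputs: List[DataItem] = field(default_factory=list)   # assumed data set 1
    outputs: List[DataItem] = field(default_factory=list)  # assumed data set 2


@dataclass
class WorkflowInstance:
    """Overview record for one executed workflow."""
    workflow_id: str
    status: str
    start_time: str
    end_time: str
    invocations: List[ServiceInvocation] = field(default_factory=list)

    def service_list(self) -> List[str]:
        """Services used, in invocation order, identified by WSDL URI."""
        return [inv.wsdl_uri for inv in self.invocations]
```

Deriving the service list from the invocations, rather than storing it separately, keeps the overview consistent with the detailed records.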
[Figure: example workflow diagram. User-executed process steps: "Look at SDM to select an engine"; "Get XTO control files for selected engine"; "Run XTO for selected engine". Resources shown: SDM; MySQL-SDM2; XTO Control Files; XTO; XTO CR1. Legend: Interface (transfer) resource; Interface (search) resource; Data storage resource; Transient data resource; Compute resource; Application resource; Data interface; GRID resource; User executed process step.]
[Figure: software architecture diagram. Components: Graphical user interface; BOM data viewer; Web service: Structure constructor; Web service: Database; Product data database. Implementation labels: Software (Microsoft .Net); Software (Java), shown twice.]
Remaining tasks
• Support the re-execution of workflows with new data
• Provide provenance data for the Workflow Advisor
• Provide a viewer for captured provenance data
• Provide an audit trail for accountability purposes
Provenance research issues
• Provenance requirements and scope
• Provenance data security
• Data storage format
• Centralised provenance data
• Stop points for audit trails
• Repeatability of GRID resources
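Longer term research
[Figure: matrix with columns Specify, Define, Execute / deploy]
• Product: Requirements definition (Specify); Product Data Management system (Define); Service Data Manager (Execute / deploy)
• Process: Workflow process specification (Specify); Workflow process definition (Define); Workflow execution data (Execute / deploy)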