Karma Provenance Framework v2 • Provenance Challenge Workshop/GGF18 • Yogesh L. Simmhan, Beth Plale, Dennis Gannon, Srinath Perera • Indiana University
Outline • Architecture of Karma • Workflow Setup & Collecting Provenance • Provenance Traces • “canonical” Challenge Queries • Suggested Variations
Provenance Collection: Challenges & Uses • Linked Environments for Atmospheric Discovery (LEAD) project • Weather & severe storm prediction applications • Provenance on workflow (process) & data products at fine granularity • Dynamic, long-running workflows • Helps scientists to • search for workflows & data products • estimate data quality • track workflow execution • analyze & mine data products from runs
Karma Provenance Framework • Lightweight – do not duplicate existing metadata cataloging effort • myLEAD personal metadata catalog • ResCat service & data registry • Glue to integrate metadata on data & services with runtime workflow information • Scalability [1] – 500 users, 100s of workflows, 10,000s of data products
[1] Performance Evaluation of the Karma Provenance Framework, Simmhan, Y., et al., IPAW, 2006
Karma Provenance Framework • Key provenance activities generated during the lifetime of a workflow • Workflow | Service Invoked • Data Consumed • Data Produced • Sending Response • Activities modeled as XML messages • Published asynchronously by service | workflow | client • Presently uses the WS-Eventing messaging system • Activities stored in a relational database
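A minimal sketch of how one such activity might be serialized as an XML message before being published. The element names (`serviceID`, `timestamp`, `dataProductID`), the root tag `dataProduced`, and the service name `ExampleService` are hypothetical placeholders, not Karma's actual schema.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def build_activity(activity_type, service_id, data_id=None, when=None):
    """Serialize one provenance activity as an XML string.

    All element names are illustrative assumptions, not the Karma schema.
    """
    when = when or datetime.now(timezone.utc)
    root = ET.Element(activity_type)
    ET.SubElement(root, "serviceID").text = service_id
    ET.SubElement(root, "timestamp").text = when.isoformat()
    if data_id is not None:
        # Only data-related activities carry a data product ID.
        ET.SubElement(root, "dataProductID").text = data_id
    return ET.tostring(root, encoding="unicode")


msg = build_activity(
    "dataProduced",
    "ExampleService",
    data_id="lead:uuid:1157946992-atlas-x.gif",
    when=datetime(2006, 9, 11, tzinfo=timezone.utc),
)
print(msg)
```

A publisher would hand a string like `msg` to the messaging layer; the listener side parses it and stores the fields in the activity database.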
Karma Architecture [1] (diagram) • Workflow engine orchestrates a workflow instance of Service 1 … Service 10; each service consumes & produces 10 data products • Workflow engine publishes WorkflowInvoked & SendingResponse activities • Services publish ServiceInvoked & SendingResponse, DataProduced & DataConsumed activities • Provenance activities are published as async notifications on the message bus (WS-Messenger notification broker, WS-Eventing service API) • Karma Provenance Service: Provenance Listener subscribes & listens to activity notifications, Activity DB, Provenance Query API • Provenance Browser client queries for workflow, process, & data provenance
[1] A Framework for Collecting Provenance in Data-Centric Scientific Workflows, Simmhan, Y., et al., Submitted to ICWS Conference, 2006
Provenance Challenge Workflow • Applications modeled as web-services • Generic Factory toolkit creates web-service wrappers for command-line applications • Service invokes a shell-script/application, passing command-line arguments • Created services automatically instrumented to generate provenance using Karma client library • Workflow composed as GPEL* script • XBaya Workflow composer GUI • Central GPEL workflow engine orchestrates execution *Grid Process Execution Language, an extension of the Business Process Execution Language (BPEL)
Provenance Traces – Building Block Queries • Data Provenance: get[Recursive]DataProvenance • What (ID), where (URL), when (Timestamp) • How (Process, inputs)
Provenance Traces – Building Block Queries • Process Provenance: getProcessProvenance • What (ID), when (Timestamp), who (Invoker) • State (execution/completion status) • Input & Output data products
Provenance Traces – Building Block Queries • Workflow Trace: getWorkflowTrace • What (ID), when (Timestamp), who (Invoker) • State (execution/completion status) • Process provenance of workflow steps
Provenance Challenge Queries • ! Answered by Karma Service API Directly • Answered by Karma Service API, with post-processing by client • ~ Answered by access to backend DB (SQL) • Not answered
Provenance Challenge Queries: Q1 • Find everything that caused Atlas X Graphic to be as it is • ! Answered by Karma Service API Directly • This is the recursive data provenance of the Atlas X Graphic file • A call to getRecursiveDataProvenance( ‘lead:uuid:1157946992-atlas-x.gif’) returns this [www]
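A rough sketch of what a recursive data provenance lookup does, using an in-memory dictionary in place of the Karma service. The shortened data IDs and the service names other than `SoftmeanService` are assumptions made for illustration, and the nested-dict result is not the format Karma returns.

```python
# Hypothetical stand-in for the provenance store:
# data product ID -> (producing service, input data product IDs).
PROVENANCE = {
    "atlas-x.gif": ("ConvertService", ["atlas-x.pgm"]),
    "atlas-x.pgm": ("SlicerService", ["atlas.img"]),
    "atlas.img": ("SoftmeanService", ["reslice1.img", "reslice2.img"]),
    "reslice1.img": ("ResliceService", ["warp1.warp"]),
    "reslice2.img": ("ResliceService", ["warp2.warp"]),
}


def recursive_data_provenance(data_id):
    """Return a nested dict describing how data_id was derived."""
    if data_id not in PROVENANCE:
        # No recorded producer: treat it as an original input.
        return {"data": data_id, "produced_by": None, "inputs": []}
    service, inputs = PROVENANCE[data_id]
    return {
        "data": data_id,
        "produced_by": service,
        "inputs": [recursive_data_provenance(i) for i in inputs],
    }


trace = recursive_data_provenance("atlas-x.gif")
```

Each level of `trace` answers "what produced this, from which inputs", and the recursion stops at data products with no recorded producer.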
Provenance Challenge Queries: Q2 • Find the process that led to Atlas X Graphic, excluding all prior to softmean • Answered by Karma Service API, with post-processing by client • First call getDataProvenance • Then recursively get data provenance until ‘SoftmeanService’ is seen • Returns this [www]
let $dataList := ['lead:uuid:1157946992-atlas-x.gif']
while ($dataList != empty) do
    // get data provenance for this level
    $dataProvenance := karma.getDataProvenance($dataList[0])
    // print process information & remove data from list
    print $dataProvenance
    $dataList.delete(0)
    // found Softmean: stop
    if ($dataProvenance.getProducedBy() == 'SoftmeanService') break
    // get input data used by this data & recurse up the tree
    foreach ($inputData in $dataProvenance.getUsingData()) do
        $dataList.add($inputData)
end
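The client-side loop can be sketched in Python as follows. `StubKarma`, `DataProvenance`, the shortened data IDs, and the service names other than `SoftmeanService` are hypothetical stand-ins; a real client would call the Karma service instead of the stub.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class DataProvenance:
    data_id: str
    produced_by: str
    using_data: list = field(default_factory=list)


class StubKarma:
    """Hypothetical in-memory stand-in for the Karma service client."""

    def __init__(self, records):
        self._records = {r.data_id: r for r in records}

    def get_data_provenance(self, data_id):
        return self._records[data_id]


def processes_back_to(karma, start_id, stop_service):
    """Walk up the derivation chain, stopping once stop_service is seen."""
    pending = deque([start_id])
    seen = []
    while pending:
        prov = karma.get_data_provenance(pending.popleft())
        seen.append(prov.produced_by)
        if prov.produced_by == stop_service:
            break  # everything prior to this service is excluded
        pending.extend(prov.using_data)
    return seen


karma = StubKarma([
    DataProvenance("atlas-x.gif", "ConvertService", ["atlas-x.pgm"]),
    DataProvenance("atlas-x.pgm", "SlicerService", ["atlas.img"]),
    DataProvenance("atlas.img", "SoftmeanService", ["reslice1.img"]),
    DataProvenance("reslice1.img", "ResliceService", []),
])
print(processes_back_to(karma, "atlas-x.gif", "SoftmeanService"))
# prints ['ConvertService', 'SlicerService', 'SoftmeanService']
```

Breaking on the first match is enough here because the chain from the graphic back to softmean is linear; a workflow with several branches would need to stop per branch rather than globally.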
Provenance Challenge: Q4 • Find all invocations of align_warp with parameter "-m 12" that ran on a Monday • ~ Answered by access to backend DB (SQL) • Use SQL query to get matching invocations • Call getProcessProvenance to get description of align_warp • Returns this [www]
SELECT invokee.workflow_id, invokee.service_id,
       invokee.workflow_node_id, invokee.workflow_timestep,
       invoker.workflow_id, invoker.service_id,
       invoker.workflow_node_id, invoker.workflow_timestep
FROM invocation_state_table invocation,
     entity_table invokee,
     entity_table invoker,
     notification_table notifications
WHERE invokee.entity_id = invocation.invokee_id
  AND invoker.entity_id = invocation.invoker_id
  AND notifications.source_id = invocation.invokee_id
  AND notifications.notification_type = 'ServiceInvoked'
  AND invokee.service_id = 'urn:qname:http://www.extreme.indiana.edu/karma/challenge06:AlignWarpService'
  AND notifications.notification_xml LIKE '%<ModelMenuNumber>12</ModelMenuNumber>%'
  AND DayOfWeek(invocation.request_receive_time) = 2; -- 1=Sunday, 2=Monday, ...
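For comparison, the same two conditions (parameter value and day of week) can be checked on the client once the rows are fetched. This is only a sketch: the row layout, the shortened service ID, and the `ServiceInvoked` wrapper element are assumptions, and only `ModelMenuNumber` is taken from the query above.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hypothetical rows: (service_id, request_receive_time, notification_xml).
ROWS = [
    ("AlignWarpService", "2006-09-11T10:00:00",
     "<ServiceInvoked><ModelMenuNumber>12</ModelMenuNumber></ServiceInvoked>"),
    ("AlignWarpService", "2006-09-12T10:00:00",
     "<ServiceInvoked><ModelMenuNumber>12</ModelMenuNumber></ServiceInvoked>"),
    ("AlignWarpService", "2006-09-11T11:00:00",
     "<ServiceInvoked><ModelMenuNumber>3</ModelMenuNumber></ServiceInvoked>"),
]


def ran_on_monday_with_model(row, model="12"):
    """True for align_warp invocations on a Monday with the given model."""
    service_id, received, xml_text = row
    if service_id != "AlignWarpService":
        return False
    # datetime.weekday() counts Monday as 0, unlike DayOfWeek (Monday = 2).
    if datetime.fromisoformat(received).weekday() != 0:
        return False
    # Parse the XML instead of string-matching, so whitespace or attribute
    # differences in the notification do not cause a missed match.
    node = ET.fromstring(xml_text).find(".//ModelMenuNumber")
    return node is not None and node.text == model


matches = [row for row in ROWS if ran_on_monday_with_model(row)]
```

Parsing the notification is slower than a `LIKE` filter in the database, so a practical setup might keep the SQL query as a coarse pre-filter and apply a check like this to the rows it returns.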
Provenance Challenge: Q9 • Find all the graphical atlas sets that have metadata annotation studyModality with values speech, visual or audio, and return all other annotations to these files • Not answered • We do not expect to answer such queries through the provenance system • We push the provenance information to external metadata management systems such as myLEAD, which can answer such “join” queries on data product metadata and provenance
Variations of Workflow • Workflows with loops • Workflows whose structure changes dynamically • or, as a simpler case, workflows with conditional branches • Hierarchical composition of workflows • workflows invoking other workflows • ~Similar to user-views (UPenn), nested-workflows (myGrid), …
Variations of Queries • Find all [workflows | processes] with a particular execution status [completed | failed | waiting for input] • Dynamic attribute of provenance? • Query for client view and service view of the provenance • Check for differences
Acknowledgements • Alek Slominski (GPEL Engine) • Satoshi Shirasuna (XBaya Composer) • LEAD Members • NSF
Questions • www.extreme.indiana.edu/karma
Sample Activities Published • More here [www]