Interactive Composition of Scientific Workflows
Yolanda Gil, USC Information Sciences Institute
Jihie Kim, Varun Ratnakar, Marc Spraragen
Southern California Earthquake Center (SCEC): Community Modeling Environment
User-Guided Creation of Workflows in SCEC
• Problem: to bring sophisticated models to a wide range of users (civil engineers, city planners, disaster response teams), we need to provide assistance and automation while allowing users to guide the process:
  • Choosing appropriate models (e.g., given a site and an earthquake forecast)
  • Setting parameters through valid approximations (e.g., shear-wave velocity)
  • Complying with parameter value constraints (e.g., magnitude)
  • Detecting and resolving interacting constraints
  • Composing valid end-to-end pathways from individual models
  • Executing pathways on grid resources
• The same problems arise for scientists sharing models across SCEC institutions and across disciplines
The Process of Creating an Executable Workflow
• Creating a valid workflow template
  • Selecting simulation models and connecting their inputs and outputs
  • Adding other steps for data conversions/transformations
• Creating an instantiated workflow
  • Providing input data to pathway inputs (logical assignments)
• Creating an executable workflow
  • Given the requirements of each model, finding and assigning adequate resources for each model
  • Selecting physical locations for logical names
  • Including data movement steps, including data deposition steps
(Slide annotations: "This talk"; [Tangmurarunkit, Decker & Kesselman 03])
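The three stages can be illustrated with a minimal Python sketch. Everything in it is a hypothetical assumption for illustration: the `Template` class, the `instantiate` and `make_executable` functions, the component and file names, and the use of plain dictionaries in place of a replica catalog and a resource-selection step. It shows only how logical assignments are bound and then mapped to sites, physical locations, and data movement steps, not how the system described here implements them.

```python
from dataclasses import dataclass

# A link connects an output of one component to an input of another:
# (source_component, source_param, target_component, target_param).
Link = tuple[str, str, str, str]


@dataclass
class Template:
    components: list[str]
    links: list[Link]
    open_inputs: list[tuple[str, str]]  # (component, param) pairs fed from outside


def instantiate(t: Template, bindings: dict[tuple[str, str], str]) -> dict[tuple[str, str], str]:
    """Instantiated workflow: every open input gets a logical file name."""
    missing = [slot for slot in t.open_inputs if slot not in bindings]
    if missing:
        raise ValueError(f"unbound inputs: {missing}")
    return {slot: bindings[slot] for slot in t.open_inputs}


def make_executable(t: Template, logical: dict[tuple[str, str], str],
                    site_of: dict[str, str], replicas: dict[str, str]):
    """Executable workflow: jobs assigned to sites plus data movement steps."""
    jobs = {c: site_of[c] for c in t.components}  # resource assignment (given here)
    transfers = []
    # Stage in external inputs: physical location of each logical name -> job site.
    for (comp, _param), lfn in sorted(logical.items()):
        transfers.append((replicas[lfn], jobs[comp]))
    # Move intermediate data when producer and consumer run on different sites.
    for src, _sp, dst, _dp in t.links:
        if jobs[src] != jobs[dst]:
            transfers.append((jobs[src], jobs[dst]))
    return jobs, transfers
```

Deposition steps (copying final results to persistent storage) would add one more transfer per end result; they are left out to keep the sketch short.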
A Valid Workflow Template
[Figure: workflow diagram whose task result is a hazard curve (SA vs. probability of exceedance). Components include a UTM converter (get-Lat-Long-given-UTM), PEER-Fault, CVM-get-Velocity-at-point, a Basin-Depth Calculator, the Field (2000) IMR, and the Hazard Curve Calculator, connected through parameters such as ruptures, magnitude (min, max, mean), site VS30, basin depth (Basin-Depth-2.5), SA period, and SA exceedance probabilities.]
Challenges for Interactive Composition of Valid Workflow Templates
• Automatic tracking of workflow constraints
  • The user is notified of problems but does not have to keep track of details
• Flexible interaction
  • The user can start from initial data, from data products, or from steps
  • The user can specify abstract descriptions of steps and specialize them later
  • The user can reuse, merge, or build from scratch
• Proactive assistance
  • The system should not just point out problems but always help the user by suggesting fixes
• And… how do we define what “valid” means?
Interactive Composition of Valid Workflow Templates: Approach
A mixed-initiative system that helps users create, reuse, and combine pathways by exploiting:
• Knowledge-based descriptions of components
  • An ontology of components and component types based on common features and parameter constraints
• Analysis of (partially constructed) pathways based on AI planning techniques
  • Relating steps to goals and initial states, and interpreting user actions in terms of incremental plan generation
  • Providing formal definitions of desirable properties of pathways
We develop an algorithm that integrates both techniques to check constraints and properties, guaranteeing correctness.
Ontology of Components
[Figure: two linked ontologies. The domain ontology organizes parameter types, e.g., Parameter, IMR-Input-Parameter, Field-2000-Input-Parameter, Distance, Basin-Depth, Fault-Type, F2-Hazard-Level, F2-SA-Median-wrt-VS30, and Hazard-Level with its specializations Hazard-Level-with-SA, -PGA, -PGV, -Median, and -Std-Dev, and Hazard-Level-with-SA-Median, -SA-Std-Dev, and -SA-Prob-Exc. The component ontology organizes steps such as Compute-Hazard-Level-given-IMR-input-parameters, specialized along dimensions labeled IMT (SA, PGV, PGA) and probability-function (SA-Median, SA-Std-Dev, SA-Prob-Exc), and Compute-F2-Hazard-Level-given-Field-2000-input-parameters, specialized down to F2-operation-SA-Median-Distance-JB and F2-operation-SA-Median-VS30.]
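The subsumption reasoning that such an ontology supports (and that the property definitions and ErrorScan rely on through `subsumes` and `FindDirectSubtypes`) can be sketched with a simple child-to-parent map. The fragment below is an assumption: the parent links are read off the figure's labels and may not match the actual ontology, which lives in a knowledge representation system rather than a Python dictionary. `ancestors`, `subsumes`, and `direct_subtypes` are hypothetical helper names.

```python
# Hypothetical fragment of the parameter-type ontology: child -> parent.
PARENT = {
    "Hazard-Level-with-SA": "Hazard-Level",
    "Hazard-Level-with-PGA": "Hazard-Level",
    "Hazard-Level-with-PGV": "Hazard-Level",
    "Hazard-Level-with-SA-Median": "Hazard-Level-with-SA",
    "Hazard-Level-with-SA-Std-Dev": "Hazard-Level-with-SA",
    "Hazard-Level-with-SA-Prob-Exc": "Hazard-Level-with-SA",
    "Basin-Depth": "IMR-Input-Parameter",
    "Distance": "IMR-Input-Parameter",
    "Fault-Type": "IMR-Input-Parameter",
}


def ancestors(t: str) -> list[str]:
    """t itself followed by its ancestors, most specific first."""
    chain = [t]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def subsumes(general: str, specific: str) -> bool:
    """True if every instance of `specific` is also an instance of `general`."""
    return general in ancestors(specific)


def direct_subtypes(t: str) -> list[str]:
    """Immediate specializations of t (in the spirit of FindDirectSubtypes)."""
    return [child for child, parent in PARENT.items() if parent == t]
```

The same kind of map could hold the component hierarchy, e.g., to specialize Compute-Hazard-Level-given-IMR-input-parameters into its SA variant.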
Year 1: Modeling and Using Simulation Code for Seismic Hazard Analysis with DOCKER [Gil & Ratnakar 02]
• Model developers can easily add simple constraints to model descriptions and document their sources and criticality
• Declarative descriptions of models are linked to ontologies and reasoners
• The system generates formal representations of model constraints in PowerLoom as well as XSD and WSDL
• The user is allowed to override model constraints to accommodate the analysis
• The system reasons about model constraints and suggests alternative models
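A minimal sketch of this style of constraint checking, assuming simple numeric range constraints. The names `RangeConstraint`, `violations`, and `alternatives`, the model names, the magnitude ranges, and the rule that a critical constraint cannot be overridden are all illustrative assumptions, not DOCKER's representation (which, per the slide, is generated as PowerLoom, XSD, and WSDL).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeConstraint:
    parameter: str
    low: float
    high: float
    source: str      # where the constraint comes from (documentation)
    critical: bool   # assumption: critical constraints cannot be overridden

    def holds(self, value: float) -> bool:
        return self.low <= value <= self.high


def violations(constraints, values, overrides=frozenset()):
    """Constraints broken by `values`, minus non-critical ones the user overrode."""
    return [c for c in constraints
            if c.parameter in values
            and not c.holds(values[c.parameter])
            and not (c.parameter in overrides and not c.critical)]


def alternatives(models, values):
    """Models none of whose constraints are violated by `values`."""
    return [name for name, cs in models.items() if not violations(cs, values)]
```

For example, with a hypothetical `model-A` limited to magnitudes 5.0 to 8.0 and `model-B` accepting 4.0 to 9.0, a magnitude of 4.5 violates `model-A` unless the user overrides it, and `alternatives` suggests `model-B`.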
Desirable Properties of Workflow Templates (for a template $\langle C, L, I, G \rangle$ with components $C$, links $L$, initial inputs $I$, and end results $G$)
• Satisfied iff the sources of input parameters for all components are specified
  • A parameter $p \in \text{input-parameters}(c)$ is satisfied iff $\exists\, \langle c_o, p_o, c_i, p_i \rangle \in L$ such that $c_i = c$ and $p_i = p$
• Purposeful iff the workflow template specifies at least one end result
  • A workflow template $\langle C, L, I, G \rangle$ is purposeful iff $G \neq \emptyset$
• Grounded iff each component has a unique assignment to an executable component
  • A workflow template $\langle C, L, I, G \rangle$ is grounded iff $\forall c \in C$, $\text{grounded}(c)$
• Complete iff satisfied, purposeful, and grounded
• Acyclic iff there are no loops
  • A workflow template $\langle C, L, I, G \rangle$ is acyclic iff $\forall c \in C$, $c$ is not Linked to $c$
• Justified iff all components contribute to the end results
  • A component $c \in C$ is justified iff $c \in G$ or $\exists\, c_2 \in G$ such that $c$ is Linked to $c_2$
• Parsimonious iff there are no redundant links or components
  • A link $l = \langle c_o, p_o, c_i, p_i \rangle \in L$ is redundant iff $\exists\, l_2 = \langle c_o', p_o', c_i', p_i' \rangle \in L$ such that $l \neq l_2$, $c_o = c_o'$, $p_o = p_o'$, $c_i = c_i'$, and $p_i = p_i'$
• Well-formed iff acyclic, justified, and parsimonious
• Consistent iff all links satisfy the defined component requirements and constraints
  • A link $\langle c_1, p_1, c_2, p_2 \rangle$ is type-consistent iff $\text{subtype-of}(\text{range}(c_1, p_1), \text{range}(c_2, p_2))$
  • A link $\langle c_1, p_1, c_2, p_2 \rangle$ is semantically consistent iff $\text{subsumes}(\text{range}(c_2, p_2), \text{range}(c_1, p_1))$
• Correct iff complete, well-formed, and consistent
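These definitions translate almost directly into executable checks. The Python sketch below is one possible encoding under stated assumptions: links are kept in a list so duplicate links can be represented, `G` holds the components whose outputs are end results, groundedness is modelled by a hypothetical `executable` map from each component to a single executable, and "Linked" is read as reachability along one or more links. Consistency takes a `range_of` map and a `subsumes` test as parameters and requires the input range to subsume the output range; none of these names come from the original system.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field
from typing import Callable

# Link <c_o, p_o, c_i, p_i>: output p_o of component c_o feeds input p_i of c_i.
Link = tuple[str, str, str, str]


@dataclass
class Template:
    C: set[str]
    L: list[Link]   # a list, so duplicate (redundant) links can be represented
    I: set[str]     # initial inputs (not used by the checks below)
    G: set[str]     # components whose outputs are end results
    input_params: dict[str, list[str]] = field(default_factory=dict)
    executable: dict[str, str] = field(default_factory=dict)  # c -> its executable


def linked(w: Template, a: str, b: str) -> bool:
    """True if a path of one or more links leads from component a to b."""
    succ = defaultdict(set)
    for co, _, ci, _ in w.L:
        succ[co].add(ci)
    stack, seen = list(succ[a]), set()
    while stack:
        c = stack.pop()
        if c == b:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(succ[c])
    return False


def satisfied(w: Template) -> bool:
    fed = {(ci, pi) for _, _, ci, pi in w.L}
    return all((c, p) in fed for c in w.C for p in w.input_params.get(c, []))


def purposeful(w: Template) -> bool:
    return len(w.G) > 0


def grounded(w: Template) -> bool:
    return all(c in w.executable for c in w.C)


def acyclic(w: Template) -> bool:
    return not any(linked(w, c, c) for c in w.C)


def justified(w: Template) -> bool:
    return all(c in w.G or any(linked(w, c, g) for g in w.G) for c in w.C)


def parsimonious(w: Template) -> bool:
    return all(n == 1 for n in Counter(w.L).values())


def consistent(w: Template, range_of: dict, subsumes: Callable[[str, str], bool]) -> bool:
    # The input parameter's range must subsume the output parameter's range.
    return all(subsumes(range_of[(ci, pi)], range_of[(co, po)]) for co, po, ci, pi in w.L)


def correct(w: Template, range_of: dict, subsumes: Callable[[str, str], bool]) -> bool:
    complete = satisfied(w) and purposeful(w) and grounded(w)
    well_formed = acyclic(w) and justified(w) and parsimonious(w)
    return complete and well_formed and consistent(w, range_of, subsumes)
```

Each property is a separate function, so an interactive tool could report exactly which one fails instead of a single yes/no verdict.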
Assisting Users in Creating Workflow Templates
• User interaction results in modifications to workflows:
  • Specify the desired result and external/user-provided inputs
  • Add/remove a step, add/remove a link
  • Specialize a step (e.g., IMR -> IMR-SA)
• As the user creates a workflow, intermediate stages may be incorrect workflows
  • The ErrorScan algorithm detects errors and generates possible fixes
  • Fixes are multi-step and “click-through”
  • Errors and fixes are ranked using heuristics
  • If no errors are detected, the workflow is guaranteed to be correct
Assisting Users in Creating Workflow Templates: the ErrorScan Algorithm
ErrorScan
Input: workflow $W = \langle C, L, I, G \rangle$
Output: list of errors and corresponding fix suggestions
I. If $W$ is not purposeful, return Error.
   Suggestion: define an end result $e$ using types from the KB; AddEndResult($e$).
II. For each component $c \in C$:
   a. If $c$ is not justified, return Error.
      Suggestions: $\forall p \in \text{output-parameters}(c)$, find components $c_j$ in the workflow or the KB with $p_j \in \text{input-parameters}(c_j)$ and $\text{subsumes}(p_j, p)$; AddLink($c$, $p$, $c_j$, $p_j$).
   b. If $c$ is not grounded, return Error.
      Suggestions: $\forall c_j \in \text{FindDirectSubtypes}(c)$, SpecializeComponent($c$, $c_j$).
   c. For each $i \in \text{input-parameters}(c)$:
      1. If $i$ is not satisfied, return Error.
         Suggestions: $\forall c_j \in C$ with output parameter $p_j$ such that $\text{subsumes}(\text{range}(c, i), \text{range}(c_j, p_j))$, AddLink($c_j$, $p_j$, $c$, $i$).
         Suggestions: $\forall c_j \in \text{FindMatchingOutput}(i)$, AddLink($c_j$, $p_j$, $c$, $i$).
         Suggestion: AddAndLinkComponent($W$, AddInitialInput($i$), range($i$), $c$, $i$).
III. For each link $l \in L$:
   a. If $l$ is not consistent, return Error.
      Suggestions: $\forall c_k \in \text{FindInterposingComponent}(l)$, InterposeComponent($c_k$, $l$).
      Suggestion: RemoveLink($l$).
   b. If $l$ is redundant, return Error.
      Suggestion: RemoveLink($l$).
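The ErrorScan steps can be sketched compactly in Python, for illustration only. Assumptions: component types live in a hypothetical `kb` dictionary of `ComponentType` records, data types form a tree given by `type_parent`, each component type appears at most once in the workflow, suggestions are returned as strings named after the slide's operations (with simplified arguments), and the interposing-component search just looks for a KB component that accepts the link's output type and produces a type the link's input accepts. Unlike the pseudocode, which returns at the first error, the sketch collects all errors; heuristic ranking is not shown.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComponentType:
    inputs: dict[str, str]        # input parameter -> data type
    outputs: dict[str, str]       # output parameter -> data type
    parent: Optional[str] = None  # more general component type, if any
    abstract: bool = False        # abstract steps are not yet grounded


def subsumes(general: str, specific: str, type_parent: dict[str, str]) -> bool:
    """True if `general` is `specific` or one of its ancestors."""
    t: Optional[str] = specific
    while t is not None:
        if t == general:
            return True
        t = type_parent.get(t)
    return False


def error_scan(C, L, G, kb, type_parent):
    """Return (error, location, suggestions) triples for the workflow <C, L, G>."""
    def sub(general, specific):
        return subsumes(general, specific, type_parent)

    errors = []
    # I. Purposeful: the workflow needs at least one end result.
    if not G:
        errors.append(("not purposeful", None, ["AddEndResult(e)"]))

    # Components with a link path to some end result (used for II.a).
    reaches, frontier = set(G), list(G)
    while frontier:
        node = frontier.pop()
        for co, _, ci, _ in L:
            if ci == node and co not in reaches:
                reaches.add(co)
                frontier.append(co)
    fed = {(ci, pi) for _, _, ci, pi in L}

    for c in sorted(C):
        ct = kb[c]
        # II.a Justified: suggest linking an output of c to a compatible input.
        if c not in reaches:
            fixes = [f"AddLink({c},{p},{cj},{pj})"
                     for p, pt in ct.outputs.items()
                     for cj in sorted(kb) if cj != c
                     for pj, pjt in kb[cj].inputs.items() if sub(pjt, pt)]
            errors.append(("not justified", c, fixes))
        # II.b Grounded: suggest the direct subtypes of an abstract step.
        if ct.abstract:
            fixes = [f"SpecializeComponent({c},{cj})"
                     for cj in sorted(kb) if kb[cj].parent == c]
            errors.append(("not grounded", c, fixes))
        # II.c Satisfied: each input needs a producer or an initial input.
        for i, it in ct.inputs.items():
            if (c, i) in fed:
                continue
            fixes = []
            for cj in sorted(kb):
                for pj, pjt in kb[cj].outputs.items():
                    if cj != c and sub(it, pjt):
                        op = "AddLink" if cj in C else "AddAndLinkComponent"
                        fixes.append(f"{op}({cj},{pj},{c},{i})")
            fixes.append(f"AddInitialInput({c},{i})")
            errors.append(("input not satisfied", (c, i), fixes))

    # III. Links: consistency, then redundancy.
    for link, count in Counter(L).items():
        co, po, ci, pi = link
        out_t, in_t = kb[co].outputs[po], kb[ci].inputs[pi]
        if not sub(in_t, out_t):
            fixes = [f"InterposeComponent({ck},{link})"
                     for ck in sorted(kb)
                     if any(sub(t, out_t) for t in kb[ck].inputs.values())
                     and any(sub(in_t, t) for t in kb[ck].outputs.values())]
            errors.append(("inconsistent link", link, fixes + [f"RemoveLink({link})"]))
        if count > 1:
            errors.append(("redundant link", link, [f"RemoveLink({link})"]))
    return errors
```

With an abstract `IMR` step whose only subtype in the KB is `IMR-SA`, for instance, the scan reports `IMR` as not grounded and suggests `SpecializeComponent(IMR,IMR-SA)`.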
CAT: Composition Analysis Tool to Create Pathway Templates
• Declarative descriptions of models are linked to ontologies and reasoners
• The system reasons about model constraints and points out errors and fixes
• The user builds a pathway specification from a library of models
• The system guarantees correctness of pathway templates
Results: Scientific Workflow Template Executed as a Workflow
End result: hazard map (around the USC area)
Conclusions and Future Work
• A mixed-initiative approach to creating workflows that incorporates:
  • Knowledge representation and reasoning
  • Planning principles
• Ongoing work to deploy ErrorScan as a grid service
• Plan to integrate with an automatic planning algorithm (Pegasus):
  • To complete the workflow template upon the user's request
  • To provide more sophisticated suggestions
  • To handle resource assignment automatically
• Instantiate the workflow template with input data
  • Using a mixed-initiative query planning system [Tuchinda et al IAAI 04]
• Longer term: iterative closed-loop workflow creation and execution
http://www.isi.edu/ikcap/cat
• http://www.isi.edu/ikcap/cat: online demonstration, access to portal, publications
• Jihie Kim, Marc Spraragen, and Yolanda Gil. An Intelligent Assistant for Interactive Workflow Composition. Proceedings of the International Conference on Intelligent User Interfaces (IUI), 2004.
• Jihie Kim and Yolanda Gil. Towards Interactive Composition of Semantic Web Services. AAAI Spring Symposium on Semantic Web Services, 2004.
• Marc Spraragen. Mixed-Initiative Workflow Composition. AAAI student paper, 2004.
• Jihie Kim and Yolanda Gil. Towards Interactive Composition of Semantic Web Services (poster). 2nd International Semantic Web Conference (ISWC), 2003.
• pegasus.isi.edu, www.isi.edu/ikcap/cognitive-grids
• www.isi.edu/~gil, gil@isi.edu
Pegasus: Fully Automated Workflow Generation for Computational Grids (joint work with E. Deelman, J. Blythe, C. Kesselman, and GriPhyN participants) [Deelman et al JGC’03; Blythe et al IAAI’03; Blythe et al ICAPS’03; Gil et al IEEE IS’04]
• Given: desired result and constraints
  • A desired result (high-level, metadata description)
  • A set of application components described in the Grid
  • A set of resources in the Grid (dynamic, distributed)
  • A set of constraints and preferences on solution quality
• Find: an executable job workflow
  • A configuration of components that generates the desired result
  • A specification of resources where components can be executed and data can be stored
• Approach: use AI planning techniques to search the solution space and evaluate tradeoffs
  • The problem is specified as an initial state, a goal state, and components/steps
  • Heuristics direct the search for solutions and represent optimality and policy criteria
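To make the planning formulation concrete, here is a small Python sketch of backward chaining from a desired result (the goal) to the available data (the initial state). It is not Pegasus's planner: it ignores resources entirely, represents each hypothetical component only by its input and output data types, and uses "fewest steps" as a stand-in for the quality criteria and heuristics mentioned above.

```python
from typing import Optional

# Library entry: component name -> (input data types, output data type).
Library = dict[str, tuple[list[str], str]]


def plan(goal: str, available: set[str], library: Library,
         visiting: frozenset = frozenset()) -> Optional[list[str]]:
    """Backward-chaining search for a component sequence that produces `goal`
    from `available` data types; among the components that produce a goal,
    prefer the one whose plan has the fewest steps. Returns None if none works."""
    if goal in available:
        return []
    if goal in visiting:
        return None  # avoid looping on a goal that is already being expanded
    best = None
    for name, (inputs, output) in sorted(library.items()):
        if output != goal:
            continue
        steps: list[str] = []
        for need in inputs:
            sub = plan(need, available, library, visiting | {goal})
            if sub is None:
                break  # this component cannot be used
            steps += [s for s in sub if s not in steps]
        else:
            candidate = steps + [name]
            if best is None or len(candidate) < len(best):
                best = candidate
    return best
```

Given hypothetical components that convert UTM coordinates, query a velocity model, build ruptures, apply an IMR, and compute a hazard curve, `plan("HazardCurve", {"UTM", "FaultParams"}, library)` returns them in dependency order; without fault parameters it returns None.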