CyberIntegrator: A Meta-Workflow System Designed for Solving Complex Scientific Problems using Heterogeneous Tools. Peter Bajcsy, Rob Kooper, Luigi Marini, Barbara Minsker and Jim Myers National Center for Supercomputing Applications (NCSA) University of Illinois at Urbana-Champaign (UIUC)
POC: Peter Bajcsy, email: pbajcsy@ncsa.uiuc.edu
Outline • Problem Formulation • Meta-Workflow Definitions • Past Work • Design • Workflow Requirements Driven by Environmental Observatories • Architecture of NCSA Meta-workflow Prototype Called CyberIntegrator • Implementation • Key Capabilities of CyberIntegrator • Use Cases • Environmental and Hydrological Engineering • Summary
Meta-Workflow Definition • Meta-workflow (MWF) definitions in the past: • (1) Workflow aspect: a workflow is an aggregation of tasks; a meta-workflow is an aggregation of workflows or a hierarchy of workflows • (2) Process management aspect: large activities have to be integrated, executed and evaluated in the process of conducting electronic commerce • Our meta-workflow definition spans multiple dimensions: • (1) hierarchical structure and organization of software, • combinatorial explosion of module connections • (2) heterogeneity of software tools and computational resources, • the number of different engines and software applications that people use for a reason • (3) usability of tool and workflow interfaces, • (4) community sharing of fragments and user-friendly security, • (5) community knowledge and provenance, • (6) execution and built-in fault tolerance, etc.
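The workflow aspect of the definition above (a workflow aggregates tasks, a meta-workflow aggregates workflows) can be illustrated with a small recursive data structure. This is only a sketch: the class names `Task` and `Workflow`, the methods `flatten` and `depth`, and the example step names are all hypothetical and are not taken from CyberIntegrator.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Task:
    """A single unit of work (hypothetical name, for illustration only)."""
    name: str


@dataclass
class Workflow:
    """An aggregation of tasks and/or nested workflows."""
    name: str
    steps: List[Union[Task, "Workflow"]] = field(default_factory=list)

    def flatten(self) -> List[str]:
        """Return task names in execution order, depth first."""
        names: List[str] = []
        for step in self.steps:
            if isinstance(step, Workflow):
                names.extend(step.flatten())
            else:
                names.append(step.name)
        return names

    def depth(self) -> int:
        """Nesting depth: 1 for a flat workflow, 2 or more for a meta-workflow."""
        nested = [s.depth() for s in self.steps if isinstance(s, Workflow)]
        return 1 + max(nested, default=0)


# Two ordinary workflows combined into one meta-workflow (made-up steps).
load = Workflow("load", [Task("read_raster"), Task("reproject")])
analyze = Workflow("analyze", [Task("regress")])
meta = Workflow("meta", [load, analyze, Task("report")])

print(meta.flatten())
print(meta.depth())
```

Under this reading, a meta-workflow is simply a workflow whose depth exceeds 1; the other dimensions listed on the slide (heterogeneity, usability, sharing, provenance, fault tolerance) are not captured by this structure.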
Previous Work • Other efforts: • Business process workflow architectures - FlowMark, WSFL and BPEL: serving the business community • Scientific workflow architectures - DAGMan, Taverna, SciFlo, Kepler, D2K, OGRE, CCA, Pegasus, GridFlow, GridAnt, Triana and GSFL • Comparison: • Our work focuses on the simplicity of end-user interactions with information technologies while utilizing all execution mechanisms transparently (workflow by example). • Our work creates provenance-to-recommendation pipelines for the benefit of a community (recommendations based on provenance information).
Research Topics • Data Translations: Semantic and syntactic mapping of data structures • Provenance Information: Granularity of gathered provenance information for recommendations, auditing and reconstruction • HCI: User interface design issues and community dependencies • Meta-Data: Federation of distributed (data, tool, computational resource) registries • Execution: Just-in-time data delivery with respect to remote computing; cost-benefit analysis of data transfer vs. CPU requirements; execution triggered by streaming data
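The cost-benefit question under "Execution" (is it worth moving data to a faster remote machine?) can be made concrete with a back-of-the-envelope model. The sketch below assumes a deliberately simple cost model, transfer time plus compute time, and every function name, parameter and number in it is hypothetical; the slides do not specify how CyberIntegrator weighs these costs.

```python
def local_seconds(work_gflop: float, local_gflops: float) -> float:
    """Estimated run time when the data stays on the local machine."""
    return work_gflop / local_gflops


def remote_seconds(work_gflop: float, remote_gflops: float,
                   data_mb: float, bandwidth_mb_s: float) -> float:
    """Estimated run time when the data is first shipped to a remote machine."""
    transfer = data_mb / bandwidth_mb_s
    compute = work_gflop / remote_gflops
    return transfer + compute


def choose_site(work_gflop: float, local_gflops: float, remote_gflops: float,
                data_mb: float, bandwidth_mb_s: float) -> str:
    """Pick the cheaper site under the assumed model (ties go to local)."""
    local = local_seconds(work_gflop, local_gflops)
    remote = remote_seconds(work_gflop, remote_gflops, data_mb, bandwidth_mb_s)
    return "remote" if remote < local else "local"


# Small data set: the faster remote CPU outweighs the transfer cost.
small = choose_site(work_gflop=1000, local_gflops=10, remote_gflops=100,
                    data_mb=500, bandwidth_mb_s=10)

# Large data set: the transfer dominates, so local execution is cheaper.
large = choose_site(work_gflop=1000, local_gflops=10, remote_gflops=100,
                    data_mb=5000, bandwidth_mb_s=10)

print(small, large)
```

A realistic model would also account for queue wait times, result transfer and streaming inputs, which this sketch ignores.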
Design Goals • Make scientific discoveries easier • Workflow by example (step-by-step experimentation) • Design friendly user interfaces • Build seamless access to heterogeneous data/tools/resources • Provide data and process provenance information • Recommend data, tools and computational resources • Derive higher level semantic tools
Meta-Workflow Features • Workflow by example • Support of heterogeneous executors • Workflows: GeoLearn, D2K, Kepler/Ptolemy • Applications: MS Excel, Im2Learn, ArcGIS • Web services: D2KWS • Provenance • Gathering & Meta-data repositories • Recommendations
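The three features above (heterogeneous executors, provenance gathering, recommendations) can be tied together in a toy model. This is a sketch under stated assumptions: the `MetaWorkflow` class, its methods, and the `fake_*` executors are hypothetical stand-ins written in plain Python. They borrow engine names from the slide but do not call Im2Learn or MS Excel, and they do not reflect the actual CyberIntegrator API.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Optional, Tuple

# An executor receives a tool name and input data and returns a result.
Executor = Callable[[str, Any], Any]


class MetaWorkflow:
    """Toy dispatcher that records provenance for every executed step."""

    def __init__(self) -> None:
        self.executors: Dict[str, Executor] = {}
        # (engine, tool) pairs in the order they were executed.
        self.provenance: List[Tuple[str, str]] = []

    def register(self, engine: str, executor: Executor) -> None:
        self.executors[engine] = executor

    def run(self, engine: str, tool: str, data: Any) -> Any:
        """Dispatch to the engine's executor and log the step."""
        result = self.executors[engine](tool, data)
        self.provenance.append((engine, tool))
        return result

    def recommend_next(self, tool: str) -> Optional[str]:
        """Suggest the tool that most often followed `tool` in the log."""
        followers = Counter(
            nxt
            for (_, cur), (_, nxt) in zip(self.provenance, self.provenance[1:])
            if cur == tool
        )
        return followers.most_common(1)[0][0] if followers else None


def fake_im2learn(tool: str, data: Any) -> Any:
    """Stand-in for an image engine: 'scale' doubles every value."""
    return [x * 2 for x in data] if tool == "scale" else data


def fake_excel(tool: str, data: Any) -> Any:
    """Stand-in for a spreadsheet engine: 'sum' adds the values."""
    return sum(data) if tool == "sum" else data


mwf = MetaWorkflow()
mwf.register("Im2Learn", fake_im2learn)
mwf.register("MS Excel", fake_excel)

scaled = mwf.run("Im2Learn", "scale", [1, 2, 3])
total = mwf.run("MS Excel", "sum", scaled)
suggestion = mwf.recommend_next("scale")
print(total, suggestion)
```

The recommendation here is a simple frequency count over the recorded steps. It is only meant to show how a provenance log could feed a recommender, not how the prototype ranks its suggestions.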
Meta-Workflow R&D Drivers • Community drivers: • Environmental Science: CLEANER • Hydrological Science: CUAHSI • Science drivers: • Environmental Modeling of Nutrient Distribution • Monte Carlo simulations of maximum amount of pollution that a water body can receive each day and still retain its uses • Understanding the Dynamic Evolution of Land-Surface Variables in the Illinois River Basin • Data-driven analyses of multi-variable relationships from remote sensing data • Technology drivers: • Collaboratory Cyberenvironments
Summary • The problem of designing a highly interactive scientific meta-workflow system is very complex • Key capabilities of our meta-workflow prototype implementation called CyberIntegrator were demonstrated with two use cases. • We plan on building and deploying a practical tool for multiple communities. • Publications: • Image Spatial Data Analysis Group at NCSA: • URL: http://isda.ncsa.uiuc.edu • Questions: • Peter Bajcsy; Email: pbajcsy@ncsa.uiuc.edu
Terminology • Engines are stand-alone environments and applications that are used by many tools • Examples: Matlab, MS Excel, D2K, Im2Learn, ArcGIS, Kepler • Tools are solutions specific to a problem and consist of several algorithms • Examples: Image Calculator in Im2Learn, Pie chart visualization in MS Excel, … • Algorithms are code fragments that perform a specific operation in a tool • Examples: image addition operation in Image Calculator
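The engine / tool / algorithm terminology maps naturally onto a three-level containment model. The sketch below uses the slide's own example (image addition in the Image Calculator tool of Im2Learn), but the `Engine` and `Tool` classes, the `call` method, and the list-based `add_images` function are hypothetical illustrations, not Im2Learn code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Image = List[List[int]]


def add_images(a: Image, b: Image) -> Image:
    """Algorithm: pixel-wise addition of two equally sized images."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


@dataclass
class Tool:
    """A problem-specific solution made of several algorithms."""
    name: str
    algorithms: Dict[str, Callable[..., object]] = field(default_factory=dict)


@dataclass
class Engine:
    """A stand-alone environment that hosts many tools."""
    name: str
    tools: Dict[str, Tool] = field(default_factory=dict)

    def call(self, tool: str, algorithm: str, *args: object) -> object:
        """Look up an algorithm by tool and name, then run it."""
        return self.tools[tool].algorithms[algorithm](*args)


calculator = Tool("Image Calculator", {"add": add_images})
im2learn = Engine("Im2Learn", {"Image Calculator": calculator})

result = im2learn.call("Image Calculator", "add",
                       [[1, 2], [3, 4]], [[10, 20], [30, 40]])
print(result)
```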