10 likes | 147 Views
How the NCSA/LEAD Workflow Engine Manages Complex Workflows. Jay C. Alameda 1 , Albert L. Rossi 1 , Shawn D. Hampton 1 , Brian F. Jewett 2 , and Robert B. Wilhelmson 1,2. Univ. of Illinois: 1 Nat’l Center for Supercomputing Applications (NCSA) 2 Department of Atmospheric Sciences.
E N D
How the NCSA/LEADWorkflow Engine Manages Complex Workflows Jay C. Alameda1, Albert L. Rossi1, Shawn D. Hampton1, Brian F. Jewett2, and Robert B. Wilhelmson1,2 Univ. of Illinois: 1Nat’l Center for Supercomputing Applications (NCSA) 2Department of Atmospheric Sciences System Architecture Siege (I) After submitting a workflow, the PWE perspective allows drilling down to get individual node-state information. Includes Rich Client Platform front-end (SIEGE), workflow and Information services (PWE/VIZIER), resource-resident application container and scripting language (ELF/Ogrescript), message bus and relay agents. PWE Internals The Parametric Workflow Engine manages workflow state via a fixed number of threaded queues; state is persisted to a database for asynchronous handling. PWE receives status updates directly from ELF as well as through the monitoring or polling of the computational resources. Example of the Workflow Description XML (partial) submitted to PWE. These are stored locally on the user’s machine and can be edited through Siege. Siege (II) Information concerning users and resources can be conveniently configured through the Vizier perspective show below. ELF “Glide-In” PWE expands, configures and submits parametric nodes through the glide-in container mechanism, distributing parametric configurations via a tuple-space service. When this container begins to run, it retrieves these configurations and launches their corresponding members (e.g., WRF jobs). There are several event monitor views available in Siege; shown above is the one for events produced by a single node. Full details of event properties are visible through the familiar idiom of double-clicking an item line in the table. Workflow engine development and testing has been supported by LEAD, a NSF ITR (ATM-0331578 and others), and SCI03-30554, SCI04-38712, and SCI96-19019. Research efforts within the LEAD (Linked Environments for Atmospheric Discovery) program include workflow orchestration and fault tolerance for use with WRF; data mining; and on-demand and adaptive computing. For more info: < leadproject.org >