Converged Web/Grid Services for Content-based Networking, Event Notification, Resource Management and Workflow
Short Title: Event-Driven Workflows
Craig A. Lee, The Aerospace Corporation
Introduction • Goals • Automatically detect, ingest, and disseminate input data events • Automatically analyze the events and known data with minimal human-in-the-loop interaction • Automatically plan responses • Execute distributed workflows to enact the response • Focus: Event-Driven Workflows • aka Dynamic Workflows • Events delivered to decision-making elements that need to know • Decision makers plan responses as determined by policy • Responses executed as distributed workflows
Outline and Approach • Motivation • DDDAS – Dynamic Data-Driven Application Systems • Present Top-Level Concept and Scenario • Event Notification • Workflow Management • Event-Driven Workflows • Discuss Required Technologies • General Concepts • State of current implementations • Outstanding Issues
What is DDDAS? (NSF) [Diagram: the old paradigm (serialized and static) links Theory (First Principles) and Simulations (Math. Modeling, Phenomenology) to Experiment/Measurements/Field-Data and then to the User in one direction; the new paradigm (Dynamic Data-Driven Simulation Systems) adds Observation Modeling and Design to the simulations and closes a Dynamic Feedback & Control Loop between simulations and measurements. Challenges: application simulations development, algorithms, computing systems support.] Frederica Darema, NSF
NSF Examples of Applications benefiting from the new paradigm • Engineering (Design and Control) • aircraft design, oil exploration, semiconductor manufacturing, structural engineering • computing systems hardware and software design (performance engineering) • Crisis Management • transportation systems (planning, accident response) • weather, hurricanes/tornadoes, floods, fire propagation • Medical • customized surgery, radiation treatment, etc. • BioMechanics/BioEngineering • Manufacturing/Business/Finance • Supply Chain (Production Planning and Control) • Financial Trading (Stock Mkt, Portfolio Analysis) DDDAS has the potential to revolutionize science, engineering, & management systems
NSF Fire Model • Sensible and latent heat fluxes from the ground and canopy fire → heat fluxes in the atmospheric model • The fire's heat fluxes are absorbed by the air over a specified extinction depth • 56% of fuel mass → H2O vapor • 3% of sensible heat used to dry ground fuel • Ground heat flux used to dry and ignite the canopy [Photo: Kirk Complex Fire, U.S.F.S.] Slide Courtesy of Cohen/NCAR
NSF Coupled atmospheric and wildfire models Slide Courtesy of Cohen/NCAR
Gas Phase Reactions (NSF)
SiCl3H → HCl + SiCl2
SiCl2H2 → SiCl2 + H2
SiCl2H2 → HSiCl + HCl
H2ClSiSiCl3 → SiCl4 + SiH2
H2ClSiSiCl3 → SiCl3H + HSiCl
H2ClSiSiCl3 → SiCl2H2 + SiCl2
Si2Cl5H → SiCl4 + HSiCl
Si2Cl5H → SiCl3H + SiCl2
Si2Cl6 → SiCl4 + SiCl2
Surface Reactions
SiCl3H + 4s → Si(B) + sH + 3sCl
SiCl2H2 + 4s → Si(B) + 2sH + 2sCl
SiCl4 + 4s → Si(B) + 4sCl
HSiCl + 2s → Si(B) + sH + sCl
SiCl2 + 2s → Si(B) + 2sCl
2sCl + Si(B) → SiCl2 + 2s
H2 + 2s → 2sH
2sH → 2s + H2
HCl + 2s → sH + sCl
sH + sCl → 2s + HCl
AMAT Centura Chemical Vapor Deposition Reactor Operating Conditions: Reactor Pressure 1 atm; Inlet Gas Temperature 698 K; Surface Temperature 1173 K; Inlet Gas-Phase Velocity 46.6 cm/sec
Slide Courtesy of McRae/MIT
A DDDAS Model (Dynamic, Data-Driven Application Systems) [Diagram: models and computations discover, ingest, and interact with a computational infrastructure (grids, perhaps?), which loads behaviors into sensors & actuators attached to a spectrum of physical systems ranging from subatomic (10e+20 Hz) through humans (3 Hz) to cosmological (10e-20 Hz).] Craig Lee, IPDPS panel, 2003
Top-Level Concept • A Combined Event Notification and Workflow Management System • A highly flexible Event Notification system automatically delivers all manner of events to the necessary recipients • A Workflow Management system schedules and coordinates all necessary actions in response to known events
Top-Level Concept [Diagram, built up across five slides: Sensed Events enter a Communication Domain, realized as a Content-Based Routing Domain, which delivers them to Decision Makers: persistent decision-making computations determined by Policy. The Decision Makers form an Abstract Plan, discover resources through a Resource Information Service (a Grid Information Service) with which resources register, and issue the Response as Concrete Actions via Dynamic Grid Workflow Management.]
Required Technologies • Events delivered to decision-making elements that need to know • Event Notification Service Managed by Publish/Subscribe • Pre-defined Topics • Publication Advertisements • User-defined Attributes • Content-Based Routing • Distributed Hash Tables • Composable Name Spaces • Decision makers plan responses as determined by policy • Event analysis could require rule-based or other systems for deducing the "meaning" of sets of events • Planning requires "path construction" from the current state to the goal state • Semantic Analysis and Planning are out of scope for this briefing • Responses executed as distributed workflows • Workflow Engine independently manages • Scheduling of Data Transfer • Scheduling of Process Execution
What Must an Event Service Provide? • Standard event representation • What events look like • Extensible Metadata Schema • Delivery Properties • Security, Reliability • Registry and Discovery Services • Enable event producers and consumers to find each other • Registries directly support Service-Oriented Architectures • Direct Addressing • When producers/consumers are well-known to each other • Publish/Subscribe • When consumers need certain types of events rather than events from a particular producer
Managing Publish/Subscribe • Pre-defined Topics • Topic is a well-known, named channel carrying events of a pre-defined type • Producers must know on which channel to publish an event • Consumers must know which channel carries desired events • Well-known, named channel is similar to a multicast group • Channel creation and termination relatively infrequent • Publication Advertisements • Event Producer advertises that it produces events of type X • Event Consumers must discover Producers based on interesting event type • Consumers subscribe to interesting Producers by making direct connection • Consumers and Producers are explicitly known to each other • User-defined Attributes • User specifies desired events by specifying their attributes, or content • Attributes must be sufficiently specified to get what you want • Requires well-known attribute or meta-data schema • Producers and Consumers do not know each other • AKA, Content-Based Routing
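Since the slide stays at the concept level, here is a minimal in-process sketch of two of the publish/subscribe styles above (pre-defined topics and user-defined attributes). The Broker class and all names are illustrative assumptions, not part of any real event service.

```python
# Toy broker supporting topic-based and attribute-based (content-based)
# delivery; illustrative only, not a real event notification service.

class Broker:
    def __init__(self):
        self.topic_subs = {}   # topic name -> list of callbacks
        self.attr_subs = []    # (attribute predicate, callback) pairs

    def subscribe_topic(self, topic, callback):
        # Pre-defined topics: the consumer must know the channel name.
        self.topic_subs.setdefault(topic, []).append(callback)

    def subscribe_attrs(self, predicate, callback):
        # User-defined attributes: the consumer specifies desired content.
        self.attr_subs.append((predicate, callback))

    def publish(self, topic, event):
        for cb in self.topic_subs.get(topic, []):
            cb(event)
        # Content-based delivery: match event attributes against predicates.
        for predicate, cb in self.attr_subs:
            if all(event.get(k) == v for k, v in predicate.items()):
                cb(event)

broker = Broker()
broker.subscribe_topic("weather", print)
broker.subscribe_attrs({"severity": "high"}, print)
# Both subscriptions match, so this event is delivered (printed) twice.
broker.publish("weather", {"severity": "high", "region": "CA"})
```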
Content-Based Routing • AKA, Message-Passing with Associative Addressing • Requires an associative matching operation • A fundamental and powerful capability • Enables a number of very useful capabilities and services (as we shall see) • But notoriously expensive to implement • How can matching be done efficiently in a wide-area grid environment? • Can users and apps find a "sweet spot" where content-based routing is constrained enough to be practical and provides capabilities that can't be accomplished any other way? • Scale of Deployability
Uses for Content-Based Events • A general Grid Event Service • Resource Discovery • Fault Tolerance • Topology-Enhanced Communication • Distributed Simulations
General Architecture for a Content-Based Event Notification System [Diagram: a peer-to-peer network; events are published into the P2P network, which then routes them to subscribers, while subscription "signals" propagate through the P2P network.]
Example: Distributed Simulations • DMSO HLA • Defense Modeling and Simulation Office • High Level Architecture • Defines several services to enable federation of distributed simulations • Data Distribution Management • AKA Interest Management • Events are only distributed to those simulated entities that need the event, i.e., are interested in it • Can greatly reduce communication volume by not broadcasting all data to all hosts • Based on hyper-box intersection over some set of dimensions • Content-Based Routing -- Filtering
Interest Management Based on Hyperbox Intersection [Diagram: update region U1 and subscriptions S1 and S2 plotted over attributes X and Y; events produced by U1 are consumed by S1, whose region intersects U1, but not by S2.] • Events defined over a set of n attributes • Interest defined by a set of n attribute subranges • These subranges form an n-dimensional hyperbox • Events are conveyed when hyperboxes intersect (see the sketch below)
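A small sketch of the hyperbox-intersection test described above, assuming each interest and update region is given as a list of (lo, hi) subranges, one per attribute; the regions and values are invented for the example.

```python
# Hyperbox-intersection interest management: an event's update region is
# conveyed to a subscriber only if their n-dimensional hyperboxes intersect.

def hyperboxes_intersect(box_a, box_b):
    """Each box is a list of (lo, hi) subranges, one per attribute."""
    return all(lo_a <= hi_b and lo_b <= hi_a
               for (lo_a, hi_a), (lo_b, hi_b) in zip(box_a, box_b))

# Two attributes (X, Y): update region U1 vs. subscriptions S1 and S2.
u1 = [(0, 10), (5, 15)]
s1 = [(8, 20), (0, 12)]   # overlaps U1 in both dimensions -> delivered
s2 = [(30, 40), (0, 12)]  # disjoint in X -> filtered out

print(hyperboxes_intersect(u1, s1))  # True
print(hyperboxes_intersect(u1, s2))  # False
```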
Tank/Jet Fighter Engagement [Diagram: Red Tank Platoon A, Red Tank Platoon B, and a Blue Airstrike, with a color key indicating which events each entity got vs. wanted.] DARPA Active Networks Demo, Dec. 2000, Zabele, Braden, Murphy, Lee
Fundamental Design Issues • Two Major Issues (1) Local Matching Problem (2) Peer Network Design • Local Matching as a Database Query problem • Subscriptions are data • Events are queries on subscriptions • Many one- and multi-dimensional indexing schemes • Peer Network Design • Propagation of subscriptions, publication advertisements, and events throughout the network • Previous systems to study • CORBA Event Service • Java Event Service
Middleware Design Elements • Middleware separates • Application message-passing and logic • Topology construction, routing protocol management, and forwarding • Topology Construction • Builds interconnected peer groups and the peer network • Routing Protocols • Distribute resource-name and peer associations across the peer network • Dynamic resource discovery • Forwarding Engine • Hop-by-hop request (or message) forwarding from source peer to destination peer through the peer network over paths established by the routing protocols
CBR Implementation Approaches • Content-based routing attempts to deliver data using a publish/subscribe paradigm • Define data in different packet types • Apply different routing based on these packet types • Avoids the complexity of traditional methods, while retaining the power of the publish/subscribe paradigm • No complex frameworks • Uses events • Two Major Implementation Approaches • Distributed Hash Tables (DHTs) • Composable Name Spaces
CBR Using DHTs • Routing and construction • d-dimensional Cartesian coordinate space on a d-torus • This space is dynamically partitioned by nodes • Space holds (key,value) pairs hashed to points within it • Nodes are added by splitting existing zones, removed by joining zones • Pros • Scalable routing • Scalable indexing • Cons • Resulting overlay may or may not observe important features of the underlying physical network • The hashing function in use must be pre-defined • No security considerations • Malicious nodes can act as malicious client, router AND server
Distributed Hashes • Distributed hashes redistribute nodes based on their IDs • Breaks physical and administrative locality • Resultant structure is dependent on logical ID assignment • Element addressed by logical ID • Independent of physical location [Diagram: a hashing function maps elements into the logical ID space.]
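To make the logical-ID idea concrete, here is a hedged sketch of hashing both node names and data keys into one ID space, with each key assigned to the next node clockwise on the ring. This is a generic consistent-hashing scheme in the spirit of the systems that follow, not any one project's algorithm; all names are illustrative.

```python
# Placement depends only on logical IDs, not on physical location:
# node names and data keys hash into the same ID space.

import hashlib
from bisect import bisect_right

def logical_id(name, bits=16):
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (1 << bits)

nodes = sorted(logical_id(n) for n in ["node-a", "node-b", "node-c"])

def responsible_node(key):
    kid = logical_id(key)
    idx = bisect_right(nodes, kid) % len(nodes)  # wrap around the ring
    return nodes[idx]

print(responsible_node("weather.CA"))  # the owning node's logical ID
```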
Current Work in DHT-based CBR • Pastry • Rowstron and Druschel • Rice University • Chord • Stoica, Morris, Karger, Kaashoek and Balakrishnan • MIT • Tapestry • Zhao, Kubiatowicz and Joseph • UC Berkeley
Pastry • Routing and construction • Nodes are assigned 128-bit nodeID indicating position in a circular space • Random assignment • Nodes maintain a routing table, neighborhood set and leaf set information • Node addition involves initializing node tables and announcing arrival • Node removal involves using neighborhood nodes to fill in missing routing information • Pros • Employs locality properties • Self-organizing • Cons • Security
Tapestry • Routing and construction • Based on Plaxton mesh using neighbor maps • Original Plaxton mesh is a small, static data structure enabling routing across an arbitrarily-sized network • Pros • Fault-tolerant through redundancy • Scalable • Cons • Security
Chord • Routing and construction • Keys are mapped to nodes using a distributed hash function • Ring organization • Queries are forwarded along the ring • Node additions and removals require successors to be notified and finger tables to be updated • Pros • Operates even with incorrect tables and missing nodes • Cons • All nodes must explicitly know each other • Security
Chord Look-up Illustration [Diagram: nodes n0, n6, n9, n10, and n15 arranged around the ring of the DHT index space; a query for a data key is forwarded along the ring to the node responsible for it.]
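A toy version of the lookup pictured above, assuming a 16-entry ID space and the nodes n0, n6, n9, n10, and n15. Real Chord routes via O(log N) finger tables; this sketch simply walks successor pointers, which gives the same answer.

```python
# Chord-style lookup: forward a query around the ring until it reaches
# the key's successor (the node responsible for storing the key).

class Node:
    def __init__(self, nid):
        self.id = nid
        self.successor = None  # next node clockwise on the ring

def in_interval(x, a, b):
    """True if x lies in the half-open ring interval (a, b]."""
    return (a < x <= b) if a < b else (x > a or x <= b)

def find_successor(start, key):
    node = start
    while not in_interval(key, node.id, node.successor.id):
        node = node.successor  # real Chord jumps via finger tables
    return node.successor

ids = [0, 6, 9, 10, 15]
ring = [Node(i) for i in ids]
for a, b in zip(ring, ring[1:] + ring[:1]):
    a.successor = b

print(find_successor(ring[0], key=7).id)  # key 7 lives on node n9
```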
Comparison • Common Features • Systems are overlays on existing networks (logical, location-independent organization) • "Dog leg" paths are possible • Use distributed hash tables in construction • Try to provide scalable wide-area infrastructure • Differences • Each system targets a slightly different set of optimizations to solve the general P2P problem • Each system has slightly different strengths
CBR Based on Composable Name Spaces • Location Independent Name Space • Maps resource names to a set of equivalent peers • A peer system's name space is unique to itself • Every resource in the peer system is uniquely named (globally unique w/in the peer system) • Two types of names: • Complete name: The name of a single resource • http://www.aero.org/CSRD/ActiveNets/wombats.html • Name space region: The name of a group of resources, indicated by a trailing wildcard • http://www.aero.org/CSRD/* • http://*.org/ • Name space could be as general as an XML DTD
FLAPPS:Forwarding Layer for Application-level Peer-to-Peer Services • Each peer sends, receives requests using service’s own resource name space • Multiple peer services operate on top of FLAPPS application sublayer • Framework approach to toolkit implementation • Individual peer’s needs determine deployed topology construction protocols, routing protocol, forwarding behaviors and directives
FLAPPS Design Elements (1/2) • Location Independent, Service-specific Name Space • Decomposable name space used to represent resources • A resource is an object or function offered by a remote peer • Name is a concatenation of name components • Name: n1 n2 n3 ... ni • Prefix name: n1 n2 n3 ... * • Service provides the name decomposition function • Peer Network and Topology Construction • Exploits overlay network systems so that: • Local peers organize into peer groups • Interconnected peer groups create the peer network • Variety in overlay network systems allows service-specific peer network topology construction
FLAPPS Design Elements (2/2) • Routing Protocols • Establishes local peer reachability, forwarding path to remote peer resources • Reachability builds name to equivalent next-hop peer sets over time, dynamic resource discovery • Reachability updates customizable, provide data for forwarding behaviors • Forwarding • Hop-by-hop request, message relay from source peer through transit peers to remote peer • Next hop peer determined by longest prefix matching over decomposable name • Forwarding behavior: Next-hop peer selection function • Forwarding directive: sequentially applied forwarding behaviors
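As a rough illustration of the forwarding step above, the sketch below does longest-prefix matching of a dot-separated resource name against a routing table of name prefixes. The table contents and peer names are invented for the example; a real forwarding behavior would then select one peer from the returned next-hop set.

```python
# FLAPPS-style next-hop selection: longest prefix match over a
# decomposable resource name (n1.n2.n3...). Routes map name prefixes
# to sets of equivalent next-hop peers; all entries are illustrative.

ROUTES = {
    ("weather",): {"peer-a"},
    ("weather", "us", "ca"): {"peer-b", "peer-c"},
    ("tracking",): {"peer-d"},
}

def next_hops(resource_name):
    parts = tuple(resource_name.split("."))
    # Try the full name first, then successively shorter prefixes.
    for length in range(len(parts), 0, -1):
        hops = ROUTES.get(parts[:length])
        if hops:
            return hops  # a forwarding behavior picks one peer from here
    return set()

print(next_hops("weather.us.ca.la"))  # {'peer-b', 'peer-c'}
print(next_hops("weather.eu"))        # falls back to {'peer-a'}
```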
Managing a Wired and Ad Hoc Grid (in the field) with a FLAPPS Namespace [Diagram: an Ad Hoc Grid of sensors and actuators named by resources such as weather.<lat,lon:lat,lon>[1], weather.<lat,lon:lat,lon>[2], weather.<lat,lon:lat,lon>[12], and tracking.<lat,lon>[obj_id], bridged through a bastion peer to a Persistent Grid with an MDS.] • Edge peers interface with the persistent grid • Utilize MDS to manage the ad hoc configuration • Hoard ad hoc information based on activity • Understand interest-based routing • The bastion peer advertises aggregated resource names • Manages power-aware routing and forwarding • Understands ad hoc topology management
Grid Workflow Management • Dynamic organization of computing services • Applications typically built with task organization "hard-coded" • Workflow enables this to be decided "on-the-fly" • Independent scheduling of data transfer and process execution • Key capability for all workflow tools • A subsequent task may not exist when the previous task completes • Where a subsequent task is to execute may not even be decided • Output data may have to be buffered until it is needed/can be used • "Process programming" in a distributed environment
Workflow Design Considerations • Representation? • DAGs • XML • Creation • Eager vs. lazy binding of service to physical resources • Discovery • Eager vs. lazy binding of workflow to service • Data Persistence and Lifetime • How long does the data live where it is? • Workflow Engine – manages the workflow • Centralized? • Decentralized?
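To ground these design choices, here is a minimal sketch of a centralized workflow engine with lazy binding: each task names an abstract service, and a stand-in discovery function binds it to a concrete host only once its predecessors finish. The task graph, service names, and discover function are all hypothetical.

```python
# A workflow as a DAG of tasks; "after" lists each task's predecessors.
workflow = {
    "ingest":  {"service": "ingest-svc",   "after": []},
    "analyze": {"service": "analysis-svc", "after": ["ingest"]},
    "plan":    {"service": "planner-svc",  "after": ["analyze"]},
}

def discover(service):
    # Stand-in for a Grid Information Service query (hypothetical).
    return f"host-for-{service}"

def run(workflow):
    done = set()
    while len(done) < len(workflow):
        for task, spec in workflow.items():
            if task not in done and all(p in done for p in spec["after"]):
                host = discover(spec["service"])  # lazy binding happens here
                print(f"running {task} on {host}")
                done.add(task)

run(workflow)
```

With eager binding, discover() would instead run once for every task before execution starts; deferring it lets the engine react to resources that appear or disappear while the workflow runs.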
Survey of Grid Projects Involving Workflowhttp://www.extreme.indiana.edu/swf-survey
Workflow in GridRPC • GridRPC is a grid-enabled Remote Procedure Call • RPC is an established, widely used distributed programming tool • The GridRPC API supports service discovery • A discovered service is represented by a function handle • Data transferred between client and service or between services is represented by a data handle • Function and data handles allow data transfer and service execution to be managed independently • This is a centralized approach • GridRPC is under standardization at the Global Grid Forum • GGF is the international standards body for grid computing • www.ggf.org, forge.gridforum.org/projects/gridrpc-wg
Data Handle Proposal • A data handle is a reference to data that may reside anywhere • Data and data handles may be created separately • Binding is the fundamental operation on DHs • DHs could be bound to existing data • DHs could be bound to where you want the data to be • Binding can be: • Explicit: user explicitly specifies bind operations and lifetime • Implicit: user specifies use modes where the run-time system manages the bind (GRAAL approach)
Operations on Data Handles, v2 (General, operational semantics without using exact function signatures)
• create() – Create a new DH and bind it to a specific machine. DHs are always created bound. DHs may be bound to the local host or to a remote host. The data referenced by the DH is not valid after this call completes.
• write() – Write data to the machine, referenced by the DH, that is maintaining storage for it. This storage does not necessarily have to be pre-allocated, nor does the length have to be known in advance. If the DH is bound to the local host, then an actual data copy is not necessary. If the DH is bound to a remote host, then the data is copied to the remote host. The data referenced by the DH is valid after this call completes.
• read() – Read the data referenced by the DH from whatever machine is maintaining the data. While reading remote data implicitly makes a copy of this data, this copy is not guaranteed to have any persistence properties or to be remotely accessible itself. Reading on an invalid DH is an error.
• inspect() – Allow the user to determine if the data referenced by the DH is valid, what machine is referenced, the length of the data, and possibly its structure. Could be returned as XML.
• delete_data() – Free the data (storage) referenced by the DH.
• delete_handle() – Free just the DH.
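A minimal in-process model of these operational semantics, assuming local storage in place of remote machines. The class and its methods mirror the operations above for illustration only; this is not the proposed GridRPC API.

```python
# Illustrative model of the data-handle lifecycle: a handle is always
# bound to a host, and its data moves between invalid and valid states.

class DataHandle:
    def __init__(self, host):        # create(): handle bound, data invalid
        self.host, self.data, self.valid = host, None, False

    def write(self, data):           # write(): data becomes valid
        self.data, self.valid = data, True

    def read(self):                  # read(): error on an invalid handle
        if not self.valid:
            raise RuntimeError("read on invalid data handle")
        return self.data

    def inspect(self):               # inspect(): validity, host, length
        return {"valid": self.valid, "host": self.host,
                "length": len(self.data) if self.valid else 0}

    def delete_data(self):           # delete_data(): frees storage only
        self.data, self.valid = None, False

dh = DataHandle("svc-a")
dh.write(b"input bytes")
print(dh.inspect())   # {'valid': True, 'host': 'svc-a', 'length': 11}
dh.delete_data()      # the handle survives; its data is again invalid
```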
Generic Lifecycle of Data and a Data Handle [Timeline on Machine X: create and bind the DH (data is invalid) → write to the data handle (data is valid) → read the data → delete the data (data is again invalid) → delete the data handle.]
Simple RPC with Data on Client (In Data on Client, Out Data on Client) [Sequence, Client vs. Svc A: the client creates the input data, creates input_DH bound to the local host, writes the input data to input_DH, creates output_DH bound to the local host, and issues call( input_DH, output_DH ); Svc A reads input_DH (data sent), executes the service, and writes the output data on output_DH (neither input nor output data is subsequently available on this server); the client then deletes the input data, input_DH, the output data, and output_DH.]
Simple RPC where the Input and Output Data Remain on the Server (In Data on Svc A, Out Data on Svc A) [Sequence, Client vs. Svc A: the client creates the input data, creates input_DH and output_DH bound to Svc A, and writes the input data to input_DH (input data sent); on call( input_DH, output_DH ), Svc A executes the service, writes the data on output_DH, and returns output_DH (input and output data still available on this server); the client reads output_DH (data sent back), then deletes the input data, input_DH, the output data, and output_DH (data no longer available).]
Two Successive RPCs on the Same Server (In Data on Client, Out1 Data on Svc A, Out2 Data on Client) [Sequence, Client vs. Svc A: the client creates the input data, creates input_DH bound to the local host, creates output1_DH bound to Svc A, and issues call( input_DH, output1_DH ); Svc A reads input_DH (data sent), executes the service, and writes the data on output1_DH (the output data remains available on this server); the client then creates output2_DH bound to the local host and issues call( output1_DH, output2_DH ); Svc A executes the service and writes the data on output2_DH; finally, the client deletes all data and all data handles.]
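Reusing the illustrative DataHandle class from the earlier sketch, the following shows the two-successive-RPC pattern: output1 stays bound to the server-side handle, so the second call consumes it without routing the intermediate data through the client. The call function is a hypothetical stand-in for a GridRPC invocation, not a real API.

```python
def call(in_dh, out_dh):
    # Hypothetical stand-in for a GridRPC call: read input, write output.
    out_dh.write(b"result of " + in_dh.read())

input_dh = DataHandle("client")
input_dh.write(b"raw data")
output1_dh = DataHandle("svc-a")     # intermediate result stays server-side
call(input_dh, output1_dh)           # first RPC
output2_dh = DataHandle("client")    # final result comes back to the client
call(output1_dh, output2_dh)         # second RPC reuses the server's data
print(output2_dh.read())
```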