HPSearch and NaradaBrokering: Workflow Scripting and Stream Management

This talk discusses the workflow scripting and stream management capabilities of HPSearch and NaradaBrokering. It covers the components of workflow systems, standards and programming models, and the features of NaradaBrokering.

  1. HPSearch and NaradaBrokering: Workflow Scripting and Stream Management • Edinburgh, December 3, 2003 • PTLIU Laboratory for Community Grids • Geoffrey Fox, Harshawardhan S. Gadgil, Shrideep Pallickara • Indiana University, Bloomington IN 47404 • http://www.hpsearch.org • http://www.naradabrokering.org • gcf@indiana.edu

  2. Backdrop • Workflow systems have several components • Development Environment – Graphical User Interface • Specification Language such as BPEL4WS • Some interface between specification and runtime (compiler?) • Run-time managing linkage of services, error handling and notification • This project contributes to • Procedural specification of workflow • Stream management part of run-time • Workflow is sufficiently complex that we ought to agree on a general architecture so we can each build parts and link together

  3. Comments on Standards • In this talk, workflow is synonymous with “Programming the Grid/Internet” • We never agreed on a programming model for the simpler case of “Programming a CPU” so not very likely we will agree on standards for workflow • We did roughly agree on standards within a particular language Fortran v. Java v C++ v C# v Lisp • We also had a little more agreement on common run-time than on languages but not complete

  4. What to remember about this talk • All streams (flow between ports of a web service) are handled by publish-subscribe messaging infrastructure • Allows robust data transfer with adaptive routing, e.g. allows use of GridFTP • Supports full concurrency between and within streams • Data, Streams, Files, Web Services are manipulated by a scripting language analogous to Shell and Perl in UNIX • http://www.hpsearch.org has the details • Software will be included in open-source release of http://www.naradabrokering.org • NB Version 0.93 today; 1.0 February 04; 2.0 for SC04 includes HPSearch

  5. Scripting Environment I • HPSearch is designed as a scripting interface to the Internet (Grid) using currently the Rhino implementation of Javascript • Could use Python, Perl • Called HPSearch because could access variables either by URI or by search interface (To Google Web Service) but this is not relevant here • x = ‘wsdl:, 6)‘; • Or x = ‘wsdl:WSDL for WS/function(arguments)‘; • Returns x=11 if function adds its arguments • Could follow by y=x+1 setting y=12 etc. • Can access any data in this fashion and support normal capabilities supported in most languages (set x=data and use I/O) • Perhaps prefer all I/O to go through Web Services
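A minimal sketch of how a `wsdl:` expression of the form "WSDL for WS/function(arguments)" could be resolved, assuming a local registry in place of real Web service calls. The URL `http://example.org/calc?wsdl`, the `add` function, and the helper `evalWsdl` are hypothetical names used only for illustration; they are not the HPSearch implementation.

```javascript
// Hypothetical local registry standing in for real Web services.
const services = {
  "http://example.org/calc?wsdl": {
    add: (...args) => args.reduce((sum, n) => sum + n, 0),
  },
};

// Parse "wsdl:<WSDL URL>/<function>(<numeric args>)" and call the matching function.
function evalWsdl(expr) {
  const match = /^wsdl:(.+)\/([A-Za-z_]\w*)\((.*)\)$/.exec(expr);
  if (!match) throw new Error("not a wsdl: expression: " + expr);
  const [, wsdlUrl, fn, rawArgs] = match;
  const service = services[wsdlUrl];
  if (!service || typeof service[fn] !== "function") {
    throw new Error("unknown service or function: " + wsdlUrl + "/" + fn);
  }
  const args = rawArgs.trim() === "" ? [] : rawArgs.split(",").map(Number);
  return service[fn](...args);
}

// The result is an ordinary script variable and can be used in later expressions.
const x = evalWsdl("wsdl:http://example.org/calc?wsdl/add(5, 6)");
const y = x + 1;
console.log(x, y);
```

The point of the sketch is that the value returned by the service call behaves like any other variable in the script.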

  6. Scripting Environment II • So scripting environment can manipulate its own variables and methods as usual but can also invoke any web service with the wsdl: primitive • xpath: primitive evaluates an XPath query against a local variable defined (say by return from a Web service) as an XML instance • Can have multiple communicating scripting engines • This is scripting control; workflow is between web services • [Figure labels: Script1, Script2, Script3; WS1, WS2, WS3, WS4, WS5, WS6]

  7. NaradaBrokering • [Figure labels: NaradaBrokering Broker Network; Audio/Video Conferencing Clients; Peers; Server; Computer; Modem; Minicomputer; Workstation; Laptop computer; PDA; Firewall; Web Service A; Web Service B; Queues; Stream]

  8. Standard client-server style communication: Service and Consumer linked directly by SOAP+HTTP, GridFTP, RTP …. • Substrate mediated communication removes transport protocol dependence: Consumer and Service each talk SOAP+HTTP, GridFTP, RTP …. to the Grid Messaging Substrate, which can use any protocol satisfying QoS • Protocols have become overloaded, e.g. MUST use UDP for A/V latency requirements but MUST NOT use UDP as firewall will not support it ………

  9. NaradaBrokering • Based on a network of cooperating broker nodes • Cluster based architecture allows system to scale in size • Originally designed to provide uniform software multicast to support real-time collaboration linked to publish-subscribe for asynchronous systems. • Now has several core functions • Reliable order-preserving “Optimized” Message transport (based on performance measurement) in heterogeneous multi-link fashion with TCP, UDP, SSL, HTTP, and will add GridFTP • General publish-subscribe including JMS & JXTA and support for RTP-based audio/video conferencing • Distributed XML event selection using XPATH metaphor • QoS, Security profiles for sent and received messages • Interface with reliable storage for persistent events

  10. Laudable Features of NaradaBrokering • Is open source: http://www.naradabrokering.org • Has end-point “plug-in” as well as standalone brokers • Will have a discovery service to find nearest brokers and manage topics • Does tunnel through many firewalls without requiring ports to be opened • Supports JXTA, JMS (Java Message Service) and more powerful native mode • Transit time < 1 millisecond per broker • Initial version of setup and broker network administration module • Currently expect to use HPSearch scripts to specify setup

  11. NaradaBrokering Naturally Supports • Filtering of events to support different client requirements (e.g. PDA versus desktop, slow lines, different A/V codecs) • Virtualization of addressing, routing, interfaces • Federation and Mediation of multiple instances of Grid services as illustrated by • Composition of Gridlets into full Grids (Gridlets are single computers in P2P case) • JXTA with peer-group forming a Gridlet • Monitoring of messages for Service management and general autonomic functions • Fault tolerant data transport • Virtual Private Grid with fine-grain Security model

  12. NaradaBrokering Communication • Applications interface to NaradaBrokering through UserChannels which NB constructs as a set of links between NB Brokers acting as “waystations” which may need to be dynamically instantiated • UserChannels have publish/subscribe semantics with XML topics • Links implement a single conventional “data” protocol • Interface to add new transport protocols within the Framework • Administrative channel negotiates the best available communication protocol for each link • Different links can have different underlying transport implementations • Implementations in the current release include support for TCP, UDP, Multicast, SSL, RTP and HTTP • GridFTP most interesting new protocol • Supports communication through proxies and firewalls such as iPlanet, Netscape, Apache, Microsoft ISA and Checkpoint
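The per-link negotiation described on this slide can be pictured with a small sketch: each end of a link advertises the transports it supports and the link uses the most preferred one they share. The preference order, the broker and client names, and the `negotiate` helper are assumptions made for illustration; the slides do not specify how NaradaBrokering ranks protocols.

```javascript
// Assumed preference order, most preferred first (not taken from NaradaBrokering).
const PREFERENCE = ["UDP", "TCP", "SSL", "HTTP"];

// Return the most preferred protocol supported by both ends, or null if none.
function negotiate(supportedA, supportedB) {
  const common = PREFERENCE.filter(
    (p) => supportedA.includes(p) && supportedB.includes(p)
  );
  return common.length > 0 ? common[0] : null;
}

// Hypothetical links; the second end-point is assumed to sit behind an HTTP-only firewall.
const links = [
  { from: "broker1", to: "broker2", a: ["TCP", "UDP", "SSL"], b: ["UDP", "TCP"] },
  { from: "broker2", to: "client", a: ["TCP", "UDP", "HTTP"], b: ["HTTP"] },
];

// Different links can end up with different underlying transports.
const chosen = links.map((l) => ({
  link: l.from + "->" + l.to,
  protocol: negotiate(l.a, l.b),
}));
console.log(chosen);
```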

  13. Manipulating Streams • flow: primitive manages streams between Web services • There is service-oriented workflow where streams are typically implicit. Here HPSearch supports UNIX style pipe and tee and we have trivial examples • For stream-oriented, the streams are explicit. We have built a sophisticated system GlobalMMCS but it is currently not supported in HPSearch • HPSearch will become control engine for NaradaBrokering when streams are “just” message flows on the Grid. Here one would use NB discovery services – find streams – and monitor • In this view a client talking to a Web Service is workflow

  14. HPSearch Flow Example • // The input file • x = "file:///u/hgadgil/datafile.txt"; • // Reverses every line in the i/p e.g. abcd becomes dcba • y1 = ""; • // Computes the length of each line minus the last (\n or \r) • y2 = ""; • // And finally the outputs... • z1 = "file:///u/hgadgil/reversed.txt"; • z2 = "file:///u/hgadgil/length.txt"; • `flow: x &> (y1 | z1), (y2 | z2)`; • [Figure labels: x, y1, y2, z1, z2; NaradaBrokering Queue; T; Pipe; Pipe]
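A rough in-memory sketch of what the expression `x &> (y1 | z1), (y2 | z2)` describes, assuming that tee means several services subscribe to the same topic and pipe means one service publishes to the topic the next one reads. The `Broker` class, the topic names, and the `service` and `sink` helpers are hypothetical stand-ins, not the NaradaBrokering or HPSearch API; arrays take the place of the output files.

```javascript
// Minimal in-memory publish-subscribe broker (a stand-in only).
class Broker {
  constructor() {
    this.subscribers = new Map(); // topic -> list of handlers
  }
  subscribe(topic, handler) {
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, []);
    this.subscribers.get(topic).push(handler);
  }
  publish(topic, message) {
    for (const handler of this.subscribers.get(topic) || []) handler(message);
  }
}

const broker = new Broker();

// A service reads from one topic and publishes transformed lines to another.
function service(inTopic, outTopic, transform) {
  broker.subscribe(inTopic, (line) => broker.publish(outTopic, transform(line)));
}

// A sink collects everything that arrives on a topic (stands in for z1 and z2).
function sink(topic) {
  const lines = [];
  broker.subscribe(topic, (line) => lines.push(line));
  return lines;
}

// Tee: y1 and y2 both subscribe to x's topic. Pipe: each feeds its own sink.
service("x", "y1.out", (line) => line.split("").reverse().join("")); // y1 reverses
service("x", "y2.out", (line) => String(line.length)); // y2 computes length
const z1 = sink("y1.out");
const z2 = sink("y2.out");

// x: publish the input lines one message at a time.
for (const line of ["abcd", "hello"]) broker.publish("x", line);
console.log(z1, z2);
```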

  15. Another Example • `flow: x &> (y1|z1 &> p,(q|storage1)), (y2|z2|storage2)`; • Note this approach allows, for example, all workflow streams to use RMI, GridFTP, RTP: your, or rather NaradaBrokering’s, choice • [Figure labels: x, y1, z1, p, q, storage1, y2, z2, storage2; NaradaBrokering Topic (Queue)]

  16. Stream-oriented Workflow • As in audio-video conferencing and multimedia file delivery where it’s the media streams that are the “point” • Services generate and transform streams but one thinks of streams going through services rather than services generating streams • Multicast streams, where video from one client is sent to all other participants in a collaborative session, are common • One thinks of a stream being published and participants subscribing to it • [Figure labels: Publish; Pub/Sub Queue; Subscribe]

  17. XGSP Web Service MCU Architecture • Use multiple media servers to scale to many codecs and many versions of audio/video mixing; should allow all e-Scientists to be connected • NB scales as distributed Web Services • NaradaBrokering: all messaging • High Performance (RTP) and XML/SOAP and .. • Gateways convert to uniform XGSP Messaging • [Figure labels: Session Server; XGSP-based Control; Media Servers; Filters; Admire; SIP; H323; Access Grid; Native XGSP; NaradaBrokering]

  18. Service-oriented Workflow I • As in the flow of data between different simulation programs, where one has a program (which becomes a Web service) view and the data flow between programs is often not explicitly interesting • [Figure labels: Elastic Dislocation Inversion; Viscoelastic FEM; Viscoelastic Layered BEM; Elastic Dislocation; Pattern Recognizers; Fault Model BEM]

  19. Service-oriented Workflow II • Initial input and output files identified, with perhaps a visualization as output • In many implementations, such as ours in the earthquake example, one writes and reads files for the stream interface • Sometimes one wants the intermediate output files • AVS and similar visualization and image processing systems have such a model using streams • Multicast not important per se; use a publish/subscribe mechanism because it is fault-tolerant and higher performance, not because of multicast support

  20. Streams and Data • Scripting engine can either define topics or find them out from NaradaBrokering discovery service • Run-time ensures that all I/O goes through NaradaBrokering • Note one either uses a proxy or builds NaradaBrokering interface into Web service • Proxy should be near Web Service as only NaradaBrokering “guarantees” firewall penetration, fault-tolerance, performance • NaradaBrokering needs improved discovery system • NaradaBrokering and Scripts are distributed so no central bottlenecks

  21. NaradaBrokering in practice • One can “best” insert NaradaBrokering end-point interface into each client or web service • But proxy model easiest for existing applications • “Native Communication”: cannot use added value of NB including fault tolerance • Current GridFTP Implementation • [Figure labels: Client; WS; Proxy; NBEndpt; NB Streams; NB Broker Network]

  22. Entities in HPSearch • Each Script is a Web Service • Each Web Service, File, Web Page has a URI and can be accessed by a Script • HPSearch at its heart was URI’s bound to Javascript • Publish/Subscribe system defines topics which are the URI of streams. Note syntax is often • topic://Session URI/stream1 with classic hierarchical labeling • Scripts need discovery system to keep track of URI’s and in particular the session URI (which plays role of context) -- currently this is same as NaradaBrokering Discovery System • Pub/Sub Streams typically support conversations with related streams topic://Session URI/stream1/WS-A and topic://Session URI/stream1/WS-B to allow Web services A and B to interact
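The hierarchical topic labeling on this slide can be sketched with two small helpers: one that builds `topic://` strings from a session URI and path segments, and one that checks whether a topic lies under another. The session name `example-session` and the helper names `topicFor` and `isUnder` are hypothetical; the slides give only the general `topic://Session URI/stream1` shape.

```javascript
// Build a hierarchical topic string from a session URI and path segments.
function topicFor(sessionUri, ...segments) {
  return ["topic://" + sessionUri, ...segments].join("/");
}

// True if `topic` equals `ancestor` or sits below it in the hierarchy.
function isUnder(topic, ancestor) {
  return topic === ancestor || topic.startsWith(ancestor + "/");
}

const session = "example-session"; // hypothetical session URI (the "context")
const stream1 = topicFor(session, "stream1");
const toA = topicFor(session, "stream1", "WS-A"); // conversation leg for service A
const toB = topicFor(session, "stream1", "WS-B"); // conversation leg for service B

console.log(toA, isUnder(toA, stream1), isUnder(toB, toA));
```

Keeping the session URI as the common prefix is what lets a script treat all related streams as belonging to one context.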

  23. Publish/Subscribe Topics • One has “data” which has perhaps an intrinsic URI • For files and web pages, we have as well the location URL • I think Publish/Subscribe topic is like the URI for streams and it is instantiated as a particular queue (or set of queues) in NB • In NB Topics are integers (for performance), URI style or general XML instances • Note that session topic can be thought of as “context” for messages sent to topic as it provides intrinsic information as to meaning of stream (cf. OGSI; WS-Addressing, WS-Context, WS-Reliable Messaging and WS-Routing) • Topics for streams and sessions virtualize destination, routing and context

  24. Role of Pub/Sub Queues • One can think of NB as providing an operating service to transmit streams between end-points with various value-added capabilities • Messages are the units of a stream • Events are messages with time-stamps (which could be absent); so events are messages and vice versa • Streams are ordered collections of messages • NB manipulates streams and collections of streams • Delivery is guaranteed and order-preserving • NB provides a virtual stream desktop which you can use to manipulate streams in the same way you manipulate files in a conventional O/S
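One common way to obtain order-preserving delivery over links that may reorder or duplicate messages is a receiver-side buffer keyed by sequence number. The sketch below illustrates that general technique under the assumption that every message carries a sequence number starting at 0; the `OrderedReceiver` class is hypothetical and is not a description of how NaradaBrokering implements its guarantee.

```javascript
// Buffers out-of-order messages and delivers them strictly by sequence number.
class OrderedReceiver {
  constructor(deliver) {
    this.deliver = deliver; // callback invoked in sequence order
    this.nextSeq = 0; // next sequence number we are allowed to deliver
    this.pending = new Map(); // seq -> payload, waiting for earlier messages
  }

  receive(seq, payload) {
    // Ignore duplicates: already delivered or already buffered.
    if (seq < this.nextSeq || this.pending.has(seq)) return;
    this.pending.set(seq, payload);
    // Flush every message that is now contiguous with what was delivered.
    while (this.pending.has(this.nextSeq)) {
      this.deliver(this.pending.get(this.nextSeq));
      this.pending.delete(this.nextSeq);
      this.nextSeq += 1;
    }
  }
}

const delivered = [];
const receiver = new OrderedReceiver((message) => delivered.push(message));

// Messages arrive out of order and with one duplicate.
receiver.receive(1, "b");
receiver.receive(0, "a");
receiver.receive(0, "a");
receiver.receive(3, "d");
receiver.receive(2, "c");
console.log(delivered.join(""));
```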

  25. Multiple Input and Output Ports • We can deal with Web Services with multiple inputs and outputs using an array notation, but the &> Tee and | Pipe notation gets clumsy • So can use explicit notation such as • x.port[0].publish = NBTopicA; • y1.port[0].subscribe = NBTopicA; • y2.port[0].subscribe = NBTopicA; • This would also be natural way of implementing stream-oriented workflow • Errors and notifications would be easy in this syntax • notifyTOPIC = SessionTOPIC + '/notify'; • x.notify.publish = notifyTOPIC; • scriptasaWS.port[1].subscribe = notifyTOPIC;
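The explicit port notation can be mimicked in plain JavaScript with property setters, which shows how assignments such as `x.port[0].publish = topic` could bind ports to topics. This is a sketch under assumptions: `makePort`, the in-memory `subscribers` map, the `send` method, and the topic strings are all hypothetical and stand in for whatever HPSearch and NaradaBrokering actually do.

```javascript
const subscribers = new Map(); // topic -> handlers (in-memory stand-in for the broker)

// A port can be bound to a topic for publishing, subscribing, or both.
function makePort(onMessage) {
  let publishTopic = null;
  return {
    set publish(topic) {
      publishTopic = topic;
    },
    set subscribe(topic) {
      if (!subscribers.has(topic)) subscribers.set(topic, []);
      subscribers.get(topic).push(onMessage);
    },
    // Hypothetical helper: emit a message on the bound publish topic.
    send(message) {
      for (const handler of subscribers.get(publishTopic) || []) handler(message);
    },
  };
}

const sessionTopic = "topic://example-session"; // hypothetical session topic
const topicA = sessionTopic + "/stream1";
const notifyTopic = sessionTopic + "/notify";

const received = { y1: [], y2: [], script: [] };
const x = { port: [makePort(() => {})], notify: makePort(() => {}) };
const y1 = { port: [makePort((m) => received.y1.push(m))] };
const y2 = { port: [makePort((m) => received.y2.push(m))] };
const scriptAsWS = {
  port: [makePort(() => {}), makePort((m) => received.script.push(m))],
};

// Same shape as the slide's notation.
x.port[0].publish = topicA;
y1.port[0].subscribe = topicA;
y2.port[0].subscribe = topicA;
x.notify.publish = notifyTopic;
scriptAsWS.port[1].subscribe = notifyTopic;

x.port[0].send("data-1"); // reaches both y1 and y2
x.notify.send("done"); // reaches only the controlling script
console.log(received);
```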

  26. HPSearch Administrative Interface to NB • One can build administrative policies and procedures by flowing administrative and monitoring data to appropriate scripting engines • performanceTOPIC = SessionTOPIC + '/performance'; • nbws = NBDiscover("aggregateperformancews"); • nbws.performancedata.publish = performanceTOPIC; • scriptasaWS.port[2].subscribe = performanceTOPIC; • Niftyperformanceanalyser(scriptasaWS.port[2]); • ……. • This example pipes performance data from NaradaBrokering and spawns some analysis • NB provides for each link (broker to broker, broker to end-point) available bandwidth, used bandwidth, latency etc.

  27. Other NB Features to be added to HPSearch • Full details of available Brokers and Stable storage • Pending queue sizes • Message statistics – size, number per second, time since last message – at brokers and end-points • Current stream sequence number at different parts of pipeline from source to destination • Heartbeat Information • Active Topics and list of publishers and subscribers (subject to security restrictions) • Fault tolerance statistics including those subscribed end-points which are “down”

  28. Mean transit delay for message samples in NaradaBrokering: Different communication hops • [Figure: Transit Delay (Milliseconds) versus Message Payload Size (Bytes), with curves for 2, 3, 5 and 7 hops] • Test setup: Pentium-3, 1 GHz, 256 MB RAM, 100 Mbps LAN, JRE 1.3, Linux

  29. Average delays per packet for 50 video clients • NaradaBrokering Avg = 2.23 ms, JMF Avg = 3.08 ms • [Figure: Delay (Milliseconds) versus Packet Number, with curves for NaradaBrokering-RTP and JMF-RTP]

  30. Average jitter (std. dev) for 50 video clients • NaradaBrokering Avg = 0.95 ms, JMF Avg = 1.10 ms • [Figure: Jitter (Milliseconds) versus Packet Number, with curves for NaradaBrokering-RTP and JMF-RTP]

