Web Service based Community Grids for Research and Education

Web Service based Community Grids for Research and Education Fermilab August 17 2004 Geoffrey Fox Community Grids Lab Indiana University gcf@indiana.edu

Once I was on E110, E260, E350 … • [Figure: diagram with labels E350, E260, -t, 200 GeV, hp, “infinite”] • Progress in Jets but not so much in understanding low pT hadron-hadron collisions

Today’s Talk • Most of my work is in “e-Science” or perhaps “e-Lotsofotherthings” for which “Grid Technology” is an appropriate software model • I spend my time either developing “core Grid Software” or applying it to various applications • For example, I have worked in e-learning since 1997 and taught most of my recent classes since then over the Internet • My experience is that “sustainability” (life-cycle costs) is the most important characteristic of software so I tend to focus on broad based software with broad based applications • I will start with a few general remarks on Grid software and then discuss my work on technology and applications

Philosophy of Web Service Grids • Much of Distributed Computing was built by natural extensions of computing models developed for sequential machines • This leads to the distributed object (DO) model represented by Java and CORBA • RPC (Remote Procedure Call) or RMI (Remote Method Invocation) for Java • Key people think this is not a good idea as it scales badly and ties distributed entities together too tightly • Distributed Objects Replaced by Services • Note CORBA was considered too complicated in both organization and proposed infrastructure • and Java was considered as “tightly coupled to Sun” • So there were other reasons to discard • Thus replace distributed objects by services connected by “one-way” messages and not by request-response messages

Web services • Web Services build loosely-coupled, distributed applications, based on the SOA principles. • Web Services interact by exchanging messages in SOAP format • The contracts for the message exchanges that implement those interactions are described via WSDL interfaces.

Importance of SOAP • SOAP defines a very obvious message structure with a header and a body • The header contains information used by the “Internet operating system” • Destination, Source, Routing, Context, Sequence Number … • The message body is only used by the application and will never be looked at by the “operating system” except to encrypt or compress it, etc. • Much discussion in the field revolves around what is in the header! • e.g. WSRF adds a lot to the header
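The header/body split can be sketched in a few lines of Python. This is only an illustration: the SOAP 1.1 envelope namespace is the real one, but the header namespace `urn:example:routing`, the field names and the `payload` element are assumptions made up for the example, not part of any WS-* specification.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
HDR_NS = "urn:example:routing"  # hypothetical namespace for the illustrative header fields


def build_envelope(destination, source, sequence_number, payload):
    """Return a SOAP 1.1 envelope (as a string) with routing data in the header."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")

    # Header: the part read by intermediaries (the "Internet operating system").
    for name, value in (("Destination", destination),
                        ("Source", source),
                        ("SequenceNumber", str(sequence_number))):
        ET.SubElement(header, f"{{{HDR_NS}}}{name}").text = value

    # Body: opaque to intermediaries; only the application interprets it.
    ET.SubElement(body, "payload").text = payload
    return ET.tostring(envelope, encoding="unicode")


def read_header(xml_text):
    """What a router would do: read the header fields and leave the body alone."""
    root = ET.fromstring(xml_text)
    header = root.find(f"{{{SOAP_NS}}}Header")
    return {child.tag.split("}")[1]: child.text for child in header}
```

A router built this way can forward or sequence a message using `read_header` alone, which is why so much of the specification debate is about header contents.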

Web Services • Java is very powerful partly due to its many “frameworks” that generalize libraries e.g. • Java Media Framework • Java Database Connectivity JDBC • Web Services have a corresponding collection of specifications that represent critical features of the distributed operating systems for “Grids of Simple Services” • Some 60 active WS-* specifications for areas such as • a. Core Infrastructure Specifications • b. Service Discovery • c. Security • d. Messaging • e. Notification • f. Workflow and Coordination • g. Characteristics • h. Metadata and State

WS-I Interoperability • Critical underpinning of Grids and Web Services is the gradually growing set of specifications in the Web Service Interoperability Profiles • Web Services Interoperability (WS-I) Interoperability Profile 1.0a (http://www.ws-i.org) gives us XSD, WSDL 1.1, SOAP 1.1, UDDI in the basic profile and parts of WS-Security in their first security profile. • We imagine the “60 Specifications” being checked out and evolved in the cauldron of the real world and occasionally best practice identifies a new specification to be added to WS-I which gradually increases in scope

Web Services Grids and WS-I+ • WS-I Interoperability doesn’t cover all the capabilities needed to support Grids • WS-I+ is designed as a minimal extension of WS-I to support “most current” Grids: it adds support for • Enhanced SOAP Addressing (WS-Addressing) • Fault tolerant (reliable) messaging • Workflow as in IBM-Microsoft standard BPEL • Security and Notification best practice and support will probably get added soon • There are Web Service frameworks here but various IBM v Microsoft v Globus differences to be resolved • Portlet-based User Interfaces could be added • UK OMII Open Middleware Infrastructure Institute is adopting this approach to support UK e-Science program • Currently UK e-Science largely either uses GT2 (as in EDG) or Simple Web Services for “database Grids” • http://www.omii.ac.uk/

Layered Architecture for Web Services and Grids • Higher Level Services: Application Specific Grids; Generally Useful Services and Grids; Workflow (WSFL/BPEL) • Service Context: Service Management (“Context etc.”); Service Discovery (UDDI) / Information • Service Internet: Service Internet Transport Protocol; Service Interfaces (WSDL) • Base Hosting Environment • Bit level Internet: Protocol (HTTP, FTP, DNS …); Presentation (XDR …); Session (SSH …); Transport (TCP, UDP …); Network (IP …); Data Link / Physical

Working up from the Bottom • We have the classic (CISCO) Internet routing the flood of ordinary packets • Web Services build the “Service Internet” with • Fault Tolerance (WS-RM not TCP) • Security (WS-Security not IPSec/SSL etc.) • Information Services (UDDI/WS-Context not DNS/Configuration files) • At message/web service level and not packet/IP address level • Software-based Service Internet useful as computers “fast” • Familiar from Peer-to-peer networks and built as a software overlay network defining Grid (analogy is VPN becomes

4 Overlay Networks With a 5th superimposed • [Figure labels: Enterprise Grid; Campus Grid; Compute Grid; Information Grid; Dynamic light-weight Peer-to-peer Collaboration Training Grid; Teacher; Students; R1, R2]

What do we do at CGL? • 1) Build the Service Internet supporting Web Services and a variety of types of Grids • 2) Build the overlay networks to build and compose (federate) Grids http://www.naradabrokering.org • Build application Grids – especially in Earthquake Science http://www.servogrid.org • Building portlets as user interface components http://www.ogce.org • Building technology for high performance streams linking Web and Grid Services • Build community Grids with tools to support interactions • http://www.undergroundfilm.org for filmmakers • http://www.globalmmcs.org for collaboration • http://www.anabas.com for distance education • Building component Grid for Geographical Information Systems • Working on Grids for Sports Training

Consequences of Rule of the Millisecond • Useful to remember critical time scales • 1) 0.000001 ms – CPU does a calculation • 2) 0.001 to 0.01 ms – MPI latency • 3) 1 to 10 ms – wake up a thread or process • 4) 10 to 1000 ms – Internet delay • 4) implies geographically distributed metacomputing can’t in general compete with parallel systems (OK for some cases) • 3) << 4) implies RPC is not a critical programming abstraction as it ties distributed entities together and gains a time that is typically only 1% of the inevitable network delay • However many service interactions are at their heart RPC but implemented differently at times e.g. asynchronously • 2) says MPI is not relevant for a distributed environment as low latency cannot be exploited • Even more serious than using RMI/RPC, current Object paradigms also lead to mixed up services with unclear boundaries and autonomy • Web Services are the only interesting model for services today
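The arithmetic behind these time scales can be written down directly. The sketch below stores the four ranges from the slide and computes two ratios; the function names are hypothetical, and pairing a 1 ms wake-up with a 100 ms network delay is just one illustrative point inside the quoted ranges.

```python
# Time scales from the slide, in milliseconds, as (low, high) ranges.
TIME_SCALES_MS = {
    "cpu_calculation": (0.000001, 0.000001),
    "mpi_latency": (0.001, 0.01),
    "thread_wakeup": (1.0, 10.0),
    "internet_delay": (10.0, 1000.0),
}


def overhead_fraction(local_cost_ms, network_delay_ms):
    """Fraction of the network delay that a local cost represents."""
    return local_cost_ms / network_delay_ms


def calculations_per_delay(delay_ms):
    """How many single CPU calculations fit into one delay of this length."""
    return delay_ms / TIME_SCALES_MS["cpu_calculation"][0]


# A 1 ms thread wake-up against a 100 ms Internet delay: the time RPC could
# save is about 1% of the delay that cannot be avoided.
example_fraction = overhead_fraction(1.0, 100.0)
```

Even the shortest Internet delay in the table (10 ms) is worth about ten million CPU calculations, which is the quantitative reason tightly coupled parallel codes do not spread well over a wide-area Grid.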

Linking Modules • From method based to RPC to message based to event-based • [Figure labels: Closely coupled Java/Python: Module A and Module B linked by Method Calls, .001 to 1 millisecond; Coarse Grain Service Model: Service A and Service B linked by Messages, 0.1 to 1000 millisecond latency; event-based: Publisher, Post Events, “Listener”, Subscribe to Events, Message Queue in the Sky]

What is a Simple Service? • Take any system – it has multiple functionalities • We can implement each functionality as an independent distributed service • Or we can bundle multiple functionalities in a single service • Whether a functionality is an independent service or one of many method calls into a “glob of software”, we can always make it a Web service by converting its interface to WSDL • Simple services are obtained by taking functionalities and making them as small as possible subject to the “rule of millisecond” • Distributed services incur a messaging overhead of one (local) to 100’s (far apart) of milliseconds to use a message rather than a method call • Use scripting or compiled integration of functionalities ONLY when you require <1 millisecond interaction latency • Apache web site has many projects that are multiple functionalities presented as (Java) globs and NOT (Java) Simple Services • Makes it hard to integrate sharing common security, user profile, file access .. services
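The packaging rule on this slide amounts to a one-line decision. A minimal sketch, assuming each functionality can be tagged with the interaction latency it needs; the function names and the example functionalities are hypothetical.

```python
METHOD_CALL_THRESHOLD_MS = 1.0  # the "<1 millisecond" bound quoted on the slide


def choose_linkage(required_latency_ms):
    """Pick how a functionality should be linked to its callers."""
    if required_latency_ms < METHOD_CALL_THRESHOLD_MS:
        return "method call"   # keep inside one service (scripted or compiled)
    return "message"           # expose as an independent simple service


def plan_services(functionalities):
    """Split a {name: required_latency_ms} mapping into the two groups."""
    plan = {"method call": [], "message": []}
    for name, latency_ms in sorted(functionalities.items()):
        plan[choose_linkage(latency_ms)].append(name)
    return plan
```

For example, `plan_services({"matrix_kernel": 0.01, "user_profile": 50.0})` keeps the kernel as a method call and makes the user profile a separate service.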

What is a Grid I? • You won’t find a clear description of what a Grid is and how it differs from a collection of Web Services • I see no essential reason that Grid Services have different requirements than Web Services • Geoffrey Fox, David Walker, e-Science Gap Analysis, June 30 2003. Report UKeS-2003-01, http://www.nesc.ac.uk/technical_papers/UKeS-2003-01/index.html. • Notice “service-building model” is like programming language – very personal! • Grids were once defined as “Internet Scale Distributed Computing” but this isn’t good as Grids depend as much if not more on data as well as simulations • So Grids can be termed “Internet Scale Distributed Simple Services” and represent a way of collecting services together in the same way that a program (package) collects methods and objects together.

What is a Grid II? • So we build collections of Web Services which we package as component Grids • Visualization Grid • Sensor Grid • Utility Computing Grid • Person (Community) Grid • Physics Analysis Grid • Control Room Grid • We build bigger Grids by composing component Grids using the Service Internet

Grids of Grids of Simple Services • Link via methods → messages → streams • Services and Grids are linked by messages • Internally to service, functionalities are linked by methods • A simple service is the smallest Grid • We are familiar with method-linked hierarchy: Lines of Code → Methods → Objects → Programs → Packages • [Figure labels: Overlay and Compose Grids of Grids; Methods, Services, Functional Grids; CPUs, Clusters, MPPs, Compute Resource Grids; Databases, Federated Databases, Data Resource Grids; Sensor, Sensor Nets]

Critical Infrastructure (CI) Grids built as Grids of Grids • [Figure labels: Gas CIGrid, Flood CIGrid, Electricity CIGrid, …; Gas Services and Filters, Flood Services and Filters, …; Portals; Collaboration Grid; Visualization Grid; Sensor Grid; GIS Grid; Compute Grid; Core Grid Services: Data Access/Storage, Registry, Metadata, Security, Notification, Workflow, Messaging; Physical Network]

Education Grids • Education Grids can be considered from at least two points of view • 1) Exploiting e-Science and other relevant research, government or business grids whose resources can be adapted for use in education • Opportunity to make education more “real” and to give students an idea of what scientific research is like • 2) Support the virtual organization that is the teacher and learner community • Actually this community is heterogeneous with teachers, learners, parents, employers, publishers, informal education, university staff …. • Build the Education Grid as a Grid of Grids

Education as a Grid of Grids • [Figure labels: Science Grids (Bioinformatics, Particle Physics, Earth Science, …); Typical Science Grid Service such as Research Database or simulation; Transformed by Grid Filter to form suitable for education; Campus or Enterprise Administrative Grid; Education Grid; Publisher Grid; Learning Management or LMS Grid; Student/Parent … Community Grid; Digital Library Grid; Informal Education (Museum) Grid; Teacher Educator Grids (Inservice Teachers, Preservice Teachers, School of Education)]

Education Grid View of e-Science Resource e-Science Resource Filter Education Grid of Grids • Services in an Education Grid fall into three classes • 1) Those that are special to Education such as quiz (as in IMS), learning plan or grading services • 2) Those that are important but can be taken from other Grids such as collaboration and security • 3) Those that come from other Grids and are refactored for education • The simulation is reduced in size • The bioinformatics database interface is simplified Education Grid

Geoscience Research and Education Grids • [Figure labels: Field Trip Data; Streaming Data; Sensors; Sensor Grid; Database; Database Grid; Repositories, Federated Databases; GIS Grid; Discovery Services; Research: SERVOGrid, Compute Grid, Computer Farm, Research Simulations, Analysis and Visualization Portal; From Research to Education: Customization Services, Data Filter Services; Education: Education Grid]

What to do? • Develop a planning grid of interested parties • Grow a teacher and teacher education grid • This would largely be a community/collaboration Grid • Develop prototypes such as Quarknet separating science and teaching side into separate grids • Develop interface/transformational material • Note we do not try to make a single seamless grid but rather multiple federated grids • Use BitTorrent not GridFTP (or rather transform between them) • Supply education compute resources on demand • Make a deal with Google for free searches • Develop the online instruments, databases, web pages, physics-based games, simulations that are science grids with educational transforms • Videos and MP3’s of Scientists in action • Develop collaborative whiteboards/ video/ imagery/ chats/ white-papers/ experts-on-demand that form a community grid • Instant Messenger, audio/video conferencing • Content annotation critical • Develop a hub linking multiple education-transformed science grids together

Undergroundfilm.org is/will be a community grid for educational film makers (run by Community Grids Laboratory) • Has viewer evaluation of content • Will offer services such as transforming formats • Digital object archives for animation etc.

http://www.yafro.com supports digital camera images (as on modern cell phones) and builds community around discussion of this

e-Sports and e-Textile manufacturing • These are hopefully being set up as collaborations between • Indiana University and Beijing Sport University • Basketball coaches (teacher) interact with aspiring NBA players in China • Martial Arts masters in China train neophytes in Indiana • Faculty recreational sports adviser works from university with faculty exercising at home • Clothes designers in USA and manufacturers in Hong Kong • Each supported by collaborative annotatable multimedia tools for images and real-time video streams • Test in Undergroundfilm.org site • Analogous tools for discussing any streaming data (such as from physics and environmental science)

NaradaBrokering can Support • Service level Internet with Software Overlay (ad-hoc) Network • Virtualize inter-service communication • Federate different Grids Together • Scalable pervasive audio-video conferencing – “Video over IP” • General collaborative Applications and Web services • Build next generation clients interacting with messages not method-based user interrupts (Message-based MVC) • Unify peer-to-peer networks and Grids • Handle streams as in “media or sensor Grids” • Handle events as in WS-Notification • Agent protocols as XML Messaging (Semantic Web)

NaradaBrokering • NB supports messages and streams • [Figure labels: NaradaBrokering Broker Network; Queues; Stream; Server-enhanced Messaging; Web Service B; Audio/Video Conferencing Client (two); Peers (two groups); Firewall; devices: Computer, Modem, Server, Minicomputer, Laptop computer, Workstation, PDA]

Current NaradaBrokering Features

NaradaBrokering Service Integration • Internal to Service: SOAP Handlers/Extensions/Plug-ins: Java (JAX-RPC), .NET Indigo and special cases: PDA's gSOAP, Axis C++ • [Figure labels: services S1, S2; proxies P1, P2; legend: S? Service, P? Proxy; Any Transport; NB Transport; Standard SOAP Transport; Proxy Messaging; Handler Messaging; Notification]

Mechanisms for Reliable Messaging I • [Figure: messages M(n), M(n+1) passing between Service A and Service B] • There are essentially sequence numbers on each message • Unreliable transmission detected by non-arrival of a message with a particular sequence number • Remember this is “just some TCP reliability” built at application level • One can either use ACK’s – Receiver (service B) positively acknowledges messages when received • Service A fully responsible for reliability • Or NAK’s – Service B is partially responsible and tracks message numbers – sends a NAK if sequence number missing
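The NAK variant can be sketched as a receiver that tracks sequence numbers and reports gaps. This is a minimal illustration with a hypothetical class name and API; it is not the WS-RM or NaradaBrokering implementation, and it ignores timers and transport.

```python
class NakReceiver:
    """Receiver (Service B) that tracks sequence numbers and reports gaps."""

    def __init__(self):
        self.next_expected = 0   # lowest sequence number not yet delivered
        self.buffer = {}         # out-of-order messages keyed by sequence number
        self.delivered = []      # payloads handed to the application, in order

    def receive(self, seq, payload):
        """Accept one message and return the sequence numbers to NAK."""
        if seq < self.next_expected or seq in self.buffer:
            return []            # duplicate: ignore it
        self.buffer[seq] = payload
        # Deliver any contiguous run that starts at next_expected.
        while self.next_expected in self.buffer:
            self.delivered.append(self.buffer.pop(self.next_expected))
            self.next_expected += 1
        if not self.buffer:
            return []            # nothing is waiting behind a gap
        # Numbers below the highest buffered message that we do not hold are missing.
        highest = max(self.buffer)
        return [n for n in range(self.next_expected, highest)
                if n not in self.buffer]
```

Receiving messages 0, 2, 3 in that order yields a NAK for 1 on each of the last two arrivals; once 1 arrives, all four payloads are delivered in order.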

Mechanisms for Reliable Messaging II • Each message has a retransmission time; messages are retransmitted if ACK’s not received in time • Uses some increasing time delay if retransmit fails • Note need to be informed (eventually) that OK to throw away messages at sender; pure NAK insufficient • Note this is reliability from final end-point to beginning end-point: TCP reliability is for each link and has different grain size and less flexible reliability mechanisms • There are several efficiency issues • Divide messages into groups and sequence within groups • Do not ACK each message but rather sequences of messages • NAK based system attractive if high latency (some mobile devices) on messaging from receiver back to sender
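The sender side of the ACK scheme (retransmission timers, an increasing delay on each retry, and cumulative ACKs that release stored messages) might look like the sketch below. The class, its parameters and the doubling backoff are assumptions for illustration; time is passed in explicitly so the logic can be followed without real timers.

```python
class AckSender:
    """Sender (Service A) that keeps each message until it is acknowledged."""

    def __init__(self, send, base_timeout=1.0, backoff=2.0):
        self.send = send                  # callable(seq, payload) that transmits
        self.base_timeout = base_timeout  # first retransmission time, in seconds
        self.backoff = backoff            # multiplier applied after each retry
        self.next_seq = 0
        self.pending = {}                 # seq -> [payload, timeout, deadline]

    def publish(self, payload, now):
        """Send a new message and remember it until it is acknowledged."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending[seq] = [payload, self.base_timeout, now + self.base_timeout]
        self.send(seq, payload)
        return seq

    def on_ack(self, acked_through):
        """Cumulative ACK: everything up to and including acked_through arrived."""
        for seq in [s for s in self.pending if s <= acked_through]:
            del self.pending[seq]         # now safe to throw the message away

    def tick(self, now):
        """Retransmit every pending message whose deadline has passed."""
        for seq, entry in sorted(self.pending.items()):
            payload, timeout, deadline = entry
            if now >= deadline:
                timeout *= self.backoff   # increasing delay on each retry
                entry[1] = timeout
                entry[2] = now + timeout
                self.send(seq, payload)
```

Acknowledging a whole sequence with one `on_ack` call is the "do not ACK each message" efficiency point; it also tells the sender when stored messages can be discarded, which a pure NAK scheme cannot do.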

Custom Message Reliability • Different endpoints may well need different reliability schemes. • Another reason to use application layer. • NaradaBrokering offers universal support • [Figure labels: NaradaBroker; Filter 1; Filter 2; WS-RM; WS-Reliability; Wireless Optimized WS-RM; 2 second PDA reply latency!]

Virtualizing Communication • Communication specified in terms of user goal and Quality of Service – not in choice of port number and protocol • Protocols have become overloaded e.g. MUST use UDP for A/V latency requirements but CAN’T use UDP as firewall will not support ……… • A given communication can involve multiple transport protocols and multiple destinations – the latter possibly determined dynamically • [Figure labels: NB Brokers; NB Broker; A; B1, B2, B3; Fast Link; Firewall HTTP; Satellite UDP; Hand-Held Protocol; Software Multicast; Dial-up Filter; Client Filtering]

Performance Monitoring • Every broker incorporates a Monitoring service that monitors links originating from the node. • Every link measures and exposes a set of metrics • Average delays, jitters, loss rates, throughput. • Individual links can disable measurements for individual metrics or for the entire set. • Measurement intervals can also be varied • Monitoring Service returns measured metrics to Performance Aggregator.
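The four metrics named above can be computed from per-message samples as in the sketch below. The function name, the input shape and the jitter definition (mean absolute difference between consecutive delays) are assumptions; the slide does not say how NaradaBrokering defines or computes them.

```python
from statistics import mean


def link_metrics(samples, sent_count, interval_s):
    """Summarize one measurement interval on a link.

    samples:    list of (delay_ms, size_bytes) for messages that arrived
    sent_count: number of messages sent during the interval
    interval_s: length of the interval in seconds
    """
    delays = [delay for delay, _ in samples]
    # Jitter as the mean absolute difference between consecutive delays.
    diffs = [abs(b - a) for a, b in zip(delays, delays[1:])]
    return {
        "average_delay_ms": mean(delays) if delays else None,
        "jitter_ms": mean(diffs) if diffs else 0.0,
        "loss_rate": 1.0 - len(samples) / sent_count if sent_count else 0.0,
        "throughput_bytes_per_s": sum(size for _, size in samples) / interval_s,
    }
```

A monitoring service could call this once per measurement interval for each link and forward the resulting dictionary to an aggregator.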

NaradaBrokering and Fault Tolerance GridFTP plus NaradaBrokering • As well as reliable messaging, NaradaBrokering supports performance based dynamic routing • Choose both route and protocol (UDP, Parallel TCP ..) • It will also support automatic fail-over among replicated services subscribing to same message stream • Provides scriptable control of streams for custom management schemes • Saves ALL messages in fault-tolerant storage for either session replay or recovery • Will support reliable BitTorrent P2P file swapping model (better than GridFTP?)

Mirror Mirror on the wall: Who is the fastest, most reliable of them all? Web Services!!! • Application layer “Internet” allows one to optimize message streams and the cost of “startup time”, Web Services can deliver the fastest possible interconnections with or without reliable messaging • Typical results from Grossman (UIC) comparing Slow SOAP over TCP with binary and UDP transport (latter gains a factor of 1000) • [Figure: comparison of Pure SOAP, SOAP over UDP and Binary over UDP; surviving values: 7020, 5.60]

SOAP Tortoise and UDP Hare II • Mechanism only works for streams – sets of related messages • SOAP header in streams is constant except for sequence number (Message ID), time-stamp .. • So negotiate stream in Tortoise SOAP – ASCII XML over HTTP and TCP – • Deposit basic SOAP header through connection • Agree on firewall penetration, reliability mechanism, binary representation and fast transport protocol • Typically transport UDP plus WS-RM • Fast transport (On a different port) with messages just having “FastMessagingContextToken”, Sequence Number, Time stamp if needed • RTP packets have essentially this • Could add stream termination status • Can monitor and control with original negotiation stream • Can generate different streams optimized for different end-points
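Once the stream has been negotiated over slow SOAP, each fast message only needs the token, a sequence number and a timestamp in front of the payload. The sketch below shows one possible binary framing; the field widths (16-byte token, 32-bit sequence number, 64-bit millisecond timestamp) are assumptions chosen for illustration, not a layout defined by the talk or by any specification.

```python
import struct

# Assumed layout: 16-byte context token, 32-bit sequence number,
# 64-bit timestamp in milliseconds, all in network byte order (no padding).
FAST_HEADER = struct.Struct("!16sIQ")


def pack_fast_message(context_token, sequence_number, timestamp_ms, payload):
    """Prefix the binary payload with the small per-message header."""
    return FAST_HEADER.pack(context_token, sequence_number, timestamp_ms) + payload


def unpack_fast_message(datagram):
    """Split a datagram back into (token, sequence number, timestamp, payload)."""
    token, seq, ts = FAST_HEADER.unpack_from(datagram)
    return token, seq, ts, datagram[FAST_HEADER.size:]
```

The receiver would use the token to look up the full SOAP header deposited during negotiation, so the 28-byte prefix stands in for the constant part of the header on every message.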

NaradaBrokering and Common Service Information and Metadata • WS-RF and WS-GAF approach state with different approaches to contextualization – supplying a common “context” (Shared token or more generally (resource) metadata) • NaradaBrokering supports such a common context either as pool of messages or as message-based access to a “database” (Context Service) • Notification Service simplest such information tool

Web Service Notification I • WS-Eventing: Web Services Eventing (BEA, Microsoft, TIBCO), January 2004, http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnglobspec/html/WS-Eventing.asp • WS-Notification: Framework for Web Services Notification with WS-Topics, WS-BaseNotification, and WS-BrokeredNotification (OASIS); OASIS Web Services Notification TC set up March 2004, http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=wsn and http://www-106.ibm.com/developerworks/library/specification/ws-notification/ • JMS: Java Message Service V1.1, March 2002, http://java.sun.com/products/jms/docs.html

Notification Architecture • Point-to-Point • Or Brokered • Note that MOM (Message Oriented Middleware) uses brokered messaging for ALL transmission and not just “special” notification messages • [Figure labels: point-to-point: Service A, Service B, Publish, Subscribe; brokered: Service A, Service B, Broker, Queues, Messages, Publish, Subscribe; Supports creation and subscription of topics]

Classic Publish-Subscribe

Web Service Notification II • WS-Eventing is quite similar to WS-BaseNotification and provides service to service notification • WS-Notification is similar to CORBA event service and adds brokers to mediate notification which has several advantages • Don’t need queues and lists of subscribers on each service • Solution scales to any number of publishers/subscribers • JMS well known successful non Web Service brokered notification system • Topics defined in WS-Topics can also provide contextualization • Expect this area to clarify reasonably soon
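The advantage of brokered notification, that subscriber lists live in the broker and not in each service, shows up even in a toy in-process broker. The class and method names below are hypothetical and do not follow the WS-Notification or JMS APIs.

```python
from collections import defaultdict


class Broker:
    """In-process stand-in for a notification broker (hypothetical API)."""

    def __init__(self):
        self._subscribers = defaultdict(list)   # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register callback(message) for every message published on topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver message to every subscriber of topic; return how many got it."""
        callbacks = list(self._subscribers.get(topic, []))
        for callback in callbacks:
            callback(message)
        return len(callbacks)
```

A publisher calls `broker.publish("quakes", event)` without knowing who is listening, so adding subscribers requires no change to the publishing service.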

Mean transit delay for message samples in NaradaBrokering: Different communication hops • [Plot: Transit Delay (Milliseconds), axis 0 to 9, versus Message Payload Size (Bytes), axis 100 to 1000; curves for hop-2, hop-3, hop-5, hop-7] • Pentium-3, 1GHz, 256 MB RAM • 100 Mbps LAN • JRE 1.3 Linux

NaradaBrokering and NTP • NaradaBrokering includes an implementation of the Network Time Protocol (NTP) • All entities within the system use NTP to communicate with atomic time servers maintained by organizations like NIST and USNO to compute offsets • Offset is the computed difference between global time and the local time. • The offset is computed based on the time returned from multiple atomic time servers. • The NTP algorithm weighs results from individual clocks based on the distance of the atomic server from the entity. • This ensures that all entities are within 1 millisecond of each other. • The timestamps account for clock drifts that take place on machines • Time returned is based on software clocks which can slow down with increased computing load on the machine.
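The offset computation can be illustrated with the standard NTP four-timestamp formulas. The combination step is a deliberate simplification: weighting each server by the inverse of its round-trip delay is an assumption standing in for "distance", and real NTP uses a more elaborate selection and filtering procedure.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Standard NTP exchange arithmetic.

    t0: client transmit time (client clock)
    t1: server receive time  (server clock)
    t2: server transmit time (server clock)
    t3: client receive time  (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0   # server clock minus client clock
    delay = (t3 - t0) - (t2 - t1)            # round trip minus server processing
    return offset, delay


def combined_offset(measurements):
    """Weighted mean of offsets; nearer servers (smaller delay) count for more.

    measurements: list of (offset, delay) pairs with delay > 0.
    """
    weights = [1.0 / delay for _, delay in measurements]
    weighted = sum(w * offset for w, (offset, _) in zip(weights, measurements))
    return weighted / sum(weights)
```

With t0=100, t1=160, t2=161, t3=121 (all in ms) the client clock is 50 ms behind the server and the round-trip delay is 20 ms; adding the offset to local time gives the global timestamp.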
