Grids for Real-time and Streaming Applications. PPAM 2005 – 6th International Conference on Parallel Processing and Applied Mathematics Poznan Poland September 11-14 2005 Geoffrey Fox Computer Science, Informatics, Physics Pervasive Technology Laboratories
Grids for Real-time and Streaming Applications PPAM 2005 – 6th International Conference on Parallel Processing and Applied Mathematics Poznan Poland September 11-14 2005 Geoffrey Fox Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington IN 47401 http://grids.ucs.indiana.edu/ptliupages/presentations/PPAMPoznanSep12-05.ppt gcf@indiana.edu http://www.infomall.org
What to Remember • Grids are Services exchanging Messages • Developing messaging paradigm for Grids using Message Oriented Middleware or Software Overlay Network • Just as MPI is messaging/programming model for parallel computing • Web Service container replaces computer • Service replaces process • A stream is an ordered set of messages • NaradaBrokering replaces MPI with different applications and different requirements • Service Internet replaces Internet: messages replace packets • (Sub)Grids replace Libraries
Four Application Areas • Data Assimilation applied to link the data deluge (satellites, sensors, seismometers) in real time to large scale parallel simulations • Department of Defense (and Homeland Security) have built the Global Information Grid with a target architecture NCOW (Network Centric Operations and Warfare) • They submit no jobs; rather they stream data to brokers, from which it is filtered and distributed • Includes their rather dated distributed simulation HLA • Audio-Video Conferencing implemented with services and Grid messaging • Hand-held Grid linking PDA/cell-phones to Grids
NaradaBrokering • NB supports messages and streams • NB role for Grid is similar to MPI role for MPP • Brokers are like Routers or Network Handlers [Diagram labels: Queues; Stream]
Typical use of Grid Messaging [Diagram labels: Filter or Datamining; Sensor Grid; Post after Processing; Post before Processing; Web Feature Service; NaradaBrokering; Notify; WFS (GIS data); Database; Archives; Subscribe; HPSearch Manages; GIS Grid; WS-Context Stores dynamic data; Geographical Information System]
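The post/subscribe flow in this diagram can be illustrated with a toy in-process broker. This is only a sketch under stated assumptions: `ToyBroker`, `attach_filter` and the topic names are hypothetical and are not the NaradaBrokering API, which the slides do not show.

```python
from typing import Callable, Dict, List

Callback = Callable[[str], None]


class ToyBroker:
    """Hypothetical in-process stand-in for a topic-based message broker."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = {}

    def subscribe(self, topic: str, callback: Callback) -> None:
        # Register a callback that receives every message posted to the topic.
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic: str, message: str) -> None:
        # Deliver the message, in order, to each current subscriber.
        for callback in list(self._subscribers.get(topic, [])):
            callback(message)


def attach_filter(broker: ToyBroker, source: str, sink: str,
                  transform: Callable[[str], str]) -> None:
    """Subscribe to `source`, transform each message, and post it to `sink`."""
    broker.subscribe(source, lambda msg: broker.publish(sink, transform(msg)))


# Usage: a sensor posts before processing, a filter posts after processing,
# and an archive (standing in for the WFS database) subscribes to the result.
broker = ToyBroker()
archive: List[str] = []
attach_filter(broker, "sensor/raw", "sensor/processed", str.strip)
broker.subscribe("sensor/processed", archive.append)
broker.publish("sensor/raw", "  34.243 ")
```

The filter is itself just another subscriber and publisher, which is why filters can be chained or replaced without the sensor or the archive knowing about them.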
Mean transit delay for message samples in NaradaBrokering: Different communication hops [Plot: Transit Delay (Milliseconds), axis 0 to 9, versus Message Payload Size (Bytes), axis 100 to 1000; curves for hop-2, hop-3, hop-5 and hop-7. Test setup: Pentium-3, 1GHz, 256 MB RAM, 100 Mbps LAN, JRE 1.3, Linux]
Consequences of Rule of the Millisecond • Useful to remember critical time scales • 1) 0.000001 ms – CPU does a calculation • 2a) 0.001 to 0.01 ms – Parallel Computing MPI latency • 2b) 0.001 to 0.01 ms – Overhead of a Method Call • 3) 1 ms – wake up a thread or process (do simple things on a PC) • 4) 10 to 1000 ms – Internet delay • 2a), 4) implies geographically distributed metacomputing can’t in general compete with parallel systems • 3) << 4) implies a software overlay network is possible without significant overhead • We need to explain why it adds value of course! • 2b) versus 3) and 4) describes regions where method and message based programming paradigms are important
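The two implications above follow from simple ratios of the listed time scales. The sketch below uses only the numbers on the slide; the function names and the assumption that each overlay hop costs about one thread wake-up (1 ms) are illustrative, not measurements.

```python
# Time scales from the slide, in milliseconds, stored as (low, high) ranges.
TIME_SCALES_MS = {
    "cpu_calculation": (1e-6, 1e-6),
    "mpi_latency": (1e-3, 1e-2),
    "method_call": (1e-3, 1e-2),
    "thread_wakeup": (1.0, 1.0),
    "internet_delay": (10.0, 1000.0),
}


def overlay_overhead_fraction(hops: int, network_delay_ms: float) -> float:
    """Fraction of end-to-end time spent in software overlay hops.

    Assumes (hypothetically) that each hop costs one thread wake-up.
    """
    per_hop_ms = TIME_SCALES_MS["thread_wakeup"][1]
    overlay_ms = hops * per_hop_ms
    return overlay_ms / (overlay_ms + network_delay_ms)


def slowdown_vs_mpi(network_delay_ms: float) -> float:
    """Ratio of one wide-area message delay to the slowest quoted MPI latency."""
    return network_delay_ms / TIME_SCALES_MS["mpi_latency"][1]


# 3) << 4): two overlay hops on a 100 ms path add roughly 2% overhead.
fraction = overlay_overhead_fraction(hops=2, network_delay_ms=100.0)

# 2a) versus 4): even the best-case Internet delay is about three orders of
# magnitude slower than MPI latency, so metacomputing cannot compete.
ratio = slowdown_vs_mpi(network_delay_ms=10.0)
```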
Grid of Grids: Research Grid and Education Grid [Diagram labels: Field Trip Data; Database; GIS Grid; Discovery Services; Repositories; Federated Databases; Streaming Data; Sensors; Sensor Grid; Database Grid; Research; Education; SERVOGrid; Compute Grid; Customization Services (From Research to Education); Data Filter Services; Research Simulations; Analysis and Visualization Portal; Education Grid; Computer Farm]
SERVOGrid (Complexity) Computing Model • This type of Grid integrates with parallel computing • Multiple HPC facilities but only use one at a time • Many simultaneous data sources and sinks • Distributed Filters massage data for simulation [Diagram labels: Databases and/or Sensors; Data; Filter; OGSA-DAI Grid Services; Analysis; Control; Visualize; Grid Data Filter; HPC Simulation; Grid Data Assimilation; Other Grid and Web Services]
GIS and Sensor Grids • OGC has defined a suite of data structures and services to support Geographical Information Systems and Sensors • GML (Geography Markup Language) defines the specification of geo-referenced data • SensorML and O&M (Observations and Measurements) define meta-data and data structure for sensors • Services like Web Map Service, Web Feature Service, Sensor Collection Service define service interfaces to access GIS and sensor information • Grid workflow links services that are designed to support streaming input and output messages • We are building Grid (Web) service implementations of these specifications for NASA’s SERVOGrid
WMS uses WFS that uses data sources <gml:featureMember> <fault> <name> Northridge2 </name> <segment> Northridge2 </segment> <author> Wald D. J.</author> <gml:lineStringProperty> <gml:LineString srsName="null"> <gml:coordinates> -118.72,34.243 -118.591,34.176 </gml:coordinates> </gml:LineString> </gml:lineStringProperty> </fault> </gml:featureMember>
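As an illustration of how a client such as a WMS might consume this feature, the sketch below parses the fragment with Python's standard library. Two things are assumptions: the slide omits the namespace declaration, so the `xmlns:gml` URI is added here to make the fragment parseable, and `parse_fault` is a hypothetical helper, not part of any WFS implementation.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the gml prefix; the slide does not declare one.
GML_NS = "http://www.opengis.net/gml"

FEATURE_XML = f"""
<gml:featureMember xmlns:gml="{GML_NS}">
  <fault>
    <name> Northridge2 </name>
    <segment> Northridge2 </segment>
    <author> Wald D. J.</author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates> -118.72,34.243 -118.591,34.176 </gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>
"""


def parse_fault(xml_text: str) -> dict:
    """Extract the fault name, author and line-string points from one feature."""
    root = ET.fromstring(xml_text.strip())
    fault = root.find("fault")
    coords_text = fault.find(f".//{{{GML_NS}}}coordinates").text
    # Each whitespace-separated token is one "x,y" pair.
    points = [
        tuple(float(value) for value in pair.split(","))
        for pair in coords_text.split()
    ]
    return {
        "name": fault.find("name").text.strip(),
        "author": fault.find("author").text.strip(),
        "points": points,
    }


fault_record = parse_fault(FEATURE_XML)
```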
Example of Data Mining and GIS Grid Data Mining Grid Databases with NASA, USGS features SERVOGrid Faults NASA WMS WFS3 WFS1 WFS2 WMS handling Client requests UDDI SOAP HTTP WMS Client WMS Client
Filter PI Data Mining Filter WS-Context WFS3 GIS Grid Databases with NASA, USGS features SERVOGrid Faults Data Mining Grid from Grid of Grids WFS4 SOAP Pipeline UDDI HPSearch “Workflow” Traditional Execution Grid NaradaBrokering System Services
Hot-spot calculations (areas of increased earthquake probability in the forecast period) are re-plotted on the map as features.
Raw to GML via NaradaBrokering • The Scripps Orbit and Permanent Array Center (SOPAC) GPS station network data published in RYO format is converted to ASCII and GML
Typical use of Grid Messaging in NASA Sensor Grid GIS Grid Grid Eventing Datamining Grid
Google Map Client Archived Real Time Databases with SERVOGrid Faults Sensor Grid Google Central HTTP WFS2 WFS1 Google Map Client Helper Services SOAP UDDI • DoD and Homeland Security can in a crisis combine custom geo-referenced data with that available from hundreds of thousands of computers from Microsoft, Yahoo and Google • Just build simple services using Interoperability standards!
Real Time GPS and Google Maps Subscribe to live GPS station. Position data from SOPAC is combined with Google map clients. Select and zoom to GPS station location, click icons for more information.
Integrating Archived Web Feature Services and Google Maps Google Maps can be integrated with Web Feature Service Archives to filter and browse seismic records.
Grid Principles Needed I • Data deluge and Data Assimilation • Web Services used for all capabilities to achieve interoperability and sustainability • High performance Service containers and handlers • Service Architectures: OGSA (GGF) or NCOW (DoD) • Grids composed hierarchically in Grid of Grids approach to Grid libraries • Gateways linking Grids to “legacy” systems or to other Grids • Sessions which are the dynamic grouping of (10-1000) services involved in solving a problem to be distinguished from the huge Grid-world over which information is slowly varying • Registries and metadata Services • Need to be optimized for both Grid-world (worldwide scaling) and Sessions (update times of a few milliseconds)
Grid Principles Needed II • Interoperability through protocols and interfaces • A major reason we are doing this and unlike MPI • Difference between semantics and representation • and consequence for interoperability • Law of the Millisecond • Use Grid messaging if latencies are inevitably > 1ms • Distributed management of Streams (messages) for performance and QoS • Must not centralize streams or their management • Workflow of Services and Composition of Streams • Services and Messages are both “first class” entities • Our workflow challenges are simple compared to other projects
WFS OGSA-DAI etc. • The Web Feature Service WFS from OGC (Open Geospatial Consortium) is a “domain specific database” holding data or meta-data • It provides a GML (Geography Markup Language) interface to a MySQL database • It filters GML store and GML query requests into SQL • XML databases are currently much slower than this strategy • Example of Semantics (XML) versus representation (SQL) difference • OGSA-DAI offers Grid interface to databases – we could use but don’t as we only need to expose WFS and not MySQL to Grid
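The idea of answering a GML query from a relational store can be sketched as follows. This is a minimal illustration under stated assumptions: the slide's WFS uses MySQL, while the sketch uses the standard-library `sqlite3` module as a stand-in; the table `fault_segments`, its columns, the bounding-box query form and the `FarAway` row are hypothetical placeholders. Only the Northridge2 coordinates come from the GML slide.

```python
import sqlite3
from xml.sax.saxutils import escape


def bbox_to_sql(min_lon, min_lat, max_lon, max_lat):
    """Translate a bounding-box request into a parameterized SQL query."""
    sql = (
        "SELECT name, lon1, lat1, lon2, lat2 FROM fault_segments "
        "WHERE lon1 BETWEEN ? AND ? AND lat1 BETWEEN ? AND ? "
        "AND lon2 BETWEEN ? AND ? AND lat2 BETWEEN ? AND ? "
        "ORDER BY name"
    )
    params = (min_lon, max_lon, min_lat, max_lat,
              min_lon, max_lon, min_lat, max_lat)
    return sql, params


def row_to_gml(row):
    """Render one SQL row as a GML-style feature (representation of the result)."""
    name, lon1, lat1, lon2, lat2 = row
    return (
        f"<gml:featureMember><fault><name>{escape(name)}</name>"
        f"<gml:coordinates>{lon1},{lat1} {lon2},{lat2}</gml:coordinates>"
        "</fault></gml:featureMember>"
    )


def query_features(conn, bbox):
    """Run the translated query and return the matching features as GML text."""
    sql, params = bbox_to_sql(*bbox)
    return [row_to_gml(row) for row in conn.execute(sql, params)]


def demo_connection():
    """Build an in-memory database with one real and one placeholder row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fault_segments "
        "(name TEXT, lon1 REAL, lat1 REAL, lon2 REAL, lat2 REAL)"
    )
    rows = [
        ("Northridge2", -118.72, 34.243, -118.591, 34.176),
        ("FarAway", 10.0, 10.0, 11.0, 11.0),  # placeholder, not real data
    ]
    conn.executemany("INSERT INTO fault_segments VALUES (?, ?, ?, ?, ?)", rows)
    return conn


features = query_features(demo_connection(), (-119.0, 34.0, -118.0, 35.0))
```

The client only ever sees GML (the semantics); the SQL schema (the representation) stays private to the service, which is the distinction the slide draws.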
Role of WS-Context • There are many WS-* specifications addressing meta-data, with many approaches and many trade-offs • We hear about Distributed Hash Tables (Chord) to achieve scalability in large scale networks • Managed dynamic workflows as in sensor integration and collaboration require • Fault-tolerance and ability to support dynamic changes with few millisecond delay • But only a modest number of involved services (up to 1000’s in a session) • Need Session NOT Service/Resource meta-data so don’t use WS-RF • We are building a WS-Context compliant metadata catalog supporting distributed or central paradigms • Use for OGC Web catalog service with UDDI for slowly varying meta-data • 3 XML Databases: UDDI WS-Context WFS stored as SQL
Controlling Streaming Data • NaradaBrokering capabilities can be created by messages (as in WS-*) and by a scripting interface that allows topics to be created and linked to external services • Firewall traversal algorithms and network link performance data can be accessed • HPSearch offers this via JavaScript • This scripting engine provides a simple workflow environment that is useful for setting up Sensor Grids • Should be made compatible with Web Service workflow (BPEL) and streaming workflow models Triana and Kepler • Using WS-Management as interaction protocol
SOAP Message Structure I [Diagram labels: H1 H2 H3 H4 Body; Service; F1 F2 F3 F4; Container Workflow; Container Handlers] • SOAP Message consists of headers and a body • Headers could be for Addressing, WSRM, Security, Eventing etc. • Headers are processed by handlers or filters controlled by container as message enters or leaves a service • Body processed by Service itself • The header processing defines the “Web Service Distributed Operating System” • Containers queue messages; control processing of headers and offer convenient (for particular languages) service interfaces • Handlers are really the core Operating system services as they receive and give back messages like services; they just process and perhaps modify different elements of SOAP Message – WS standards specify handler structure
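The split between header processing in the container and body processing in the service can be sketched as a handler chain. All names here (`SoapMessage`, `Container`, the two handlers, the `Token` header and its value) are hypothetical and chosen for illustration; real containers follow the handler structure set by the WS-* standards.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SoapMessage:
    headers: Dict[str, str]
    body: str
    log: List[str] = field(default_factory=list)


Handler = Callable[[SoapMessage], SoapMessage]


def addressing_handler(msg: SoapMessage) -> SoapMessage:
    # Reads only the addressing header; never touches the body.
    msg.log.append(f"route to {msg.headers.get('To', 'unknown')}")
    return msg


def security_handler(msg: SoapMessage) -> SoapMessage:
    # Placeholder check standing in for real WS-Security processing.
    if msg.headers.get("Token") != "secret":
        raise PermissionError("security header rejected")
    msg.log.append("security ok")
    return msg


class Container:
    """Runs header handlers in order, then hands the body to the service."""

    def __init__(self, handlers: List[Handler],
                 service: Callable[[str], str]) -> None:
        self.handlers = handlers
        self.service = service

    def deliver(self, msg: SoapMessage) -> str:
        for handler in self.handlers:
            msg = handler(msg)
        return self.service(msg.body)


container = Container([addressing_handler, security_handler], str.upper)
message = SoapMessage({"To": "wfs", "Token": "secret"}, "get features")
reply = container.deliver(message)
```

Because handlers receive and return messages just as services do, a handler can be moved out of the container and run as a separate service without changing the service that processes the body.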
Layered Architecture for Web Services and Grids [Diagram layers: Application Specific Grids; Generally Useful Services and Grids; Workflow WSFL/BPEL; Service Management (“Context etc.”); Service Discovery (UDDI) / Information; Service Internet Transport Protocol; Service Interfaces WSDL; Base Hosting Environment; Protocol HTTP FTP DNS …; Presentation XDR …; Session SSH …; Transport TCP UDP …; Network IP …; Data Link / Physical. Side labels: Higher Level Services; Service Context; Service Internet; Bit level Internet (OSI Stack)]
WS-* implies the Service Internet • We have the classic (CISCO, Juniper ….) Internet routing the flood of ordinary packets in OSI stack architecture • Web Services build the “Service Internet” or IOI (Internet on Internet) with • Routing via WS-Addressing not IP header • Fault Tolerance (WS-RM not TCP) • Security (WS-Security/SecureConversation not IPSec/SSL) • Data Transmission by WS-Transfer not HTTP • Information Services (UDDI/WS-Context not DNS/Configuration files) • At message/web service level and not packet/IP address level • Software-based Service Internet possible as computers are “fast” • Familiar from Peer-to-peer networks and built as a software overlay network defining Grid (analogy is VPN) • SOAP Header contains all information needed for the “Service Internet” (Grid Operating System) with SOAP Body containing information for Grid application service
Merging the OSI Levels • All messages pass through multiple operating systems and each O/S thinks of message as a header and a body • Important message processing is done at • Network • Client (UNIX, Windows, J2ME etc) • Web Service Header • Application • EACH is < 1ms (except for small sensor clients and except for complex security) • But network transmission time is often 100ms or worse • Thus no performance reason not to mix up the places where processing is done [Diagram labels: IP TCP App SOAP]
What is a Simple Service? • Take any system – it has multiple functionalities • We can implement each functionality as an independent distributed service • Or we can bundle multiple functionalities in a single service • Whether functionality is an independent service or one of many method calls into a “glob of software”, we can always expose it as a Web service by converting its interface to WSDL • Simple services are obtained by taking functionalities and making them as small as possible subject to the “rule of millisecond” • Distributed services incur messaging overhead of one (local) to 100’s (far apart) of milliseconds to use a message rather than a method call • Use scripting or compiled integration of functionalities ONLY when you require <1 millisecond interaction latency • Apache web site has many (pre Web Service) projects that are multiple functionalities presented as (Java) globs and NOT (Java) Simple Services • Makes it hard to integrate “globs” sharing common security, user profile, file access .. services
Grids of Grids of Simple Services [Diagram labels: CPUs; Clusters; Compute Resource Grids; Overlay and Compose Grids of Grids; MPPs; Methods; Services; Component Grids; Federated Databases; Databases; Data Resource Grids; Sensor; Sensor Nets] • Link via methods, messages, streams • Services and Grids are linked by messages • Internally to service, functionalities are linked by methods • A simple service is the smallest Grid • We are familiar with method-linked hierarchy: Lines of Code, Methods, Objects, Programs, Packages
Component Grids • So we build collections of Web Services which we package as component Grids • Visualization Grid • Sensor Grid • Management Grid • Utility Computing Grid • Collaboration Grid • Earthquake Simulation Grid • Control Room Grid • Crisis Management Grid • Intelligence Data-mining Grid • We build bigger Grids by composing component Grids using the Service Internet
Critical Infrastructure (CI) Grids built as Grids of Grids [Diagram labels: Electricity CIGrid; Gas CIGrid; Earthquake Grid; …; Gas Services and Filters; Earthquake Data & Simulation Services; Portals; Collaboration Grid; Visualization Grid; Sensor Grid; GIS Grid; Compute Grid; Security; Notification; Workflow; Messaging; Data Access/Storage; Registry; Metadata; Core Grid Services; Physical Network]
Google plus GIS Grid Integrated with Los Alamos CI Simulations for DHS Natural Gas Layer Energy Power Layer
Mediation and Transformation in a Grid of Grids and Simple Services [Diagram labels: Mediation and Transformation Services; External facing Interfaces; Port (repeated); Internal Interfaces (repeated); Messaging; Subgrid or service (repeated)]
SOAP Message Structure II [Diagram labels: H1 H2 H3 H4 Body; hp1 hp2 hp3 hp4 hp5; bp1 bp2 bp3] • Content of individual headers and the body is defined by XML Schema associated with WS-* headers and the service WSDL • SOAP Infoset captures header and body structure • XML Infoset for individual headers and the body captures the details of each message part • Web Service Architecture requires that we capture Infoset structure but does not require that we represent XML in angle bracket <content>value</content> notation • Infoset represents semantic structure of message and its parts
High Performance XML I • There are many approaches to efficient “binary” representations of XML Infosets • MTOM, XOP, Attachments, Fast Web Services • DFDL is one approach to specifying a binary format • Assume URI-S labels Scheme and URI-R labels realization of Scheme for a particular message i.e. URI-R defines specific layout of information in each message • Assume we are interested in conversations where a stream of messages is exchanged between two services or between a client and a service i.e. two end-points • Assume that we need to communicate fast between end-points that understand scheme URI-S but must support conventional representation if one end-point does not understand URI-S
High Performance XML II [Diagram labels: F1 F2 F3 F4; Container Handlers] • First Handler Ft=F1 handles Transport protocol; it negotiates with other end-point to establish a transport conversation which uses either HTTP (default) or a different transport such as UDP with WSRM implementing reliability • URI-T specifies transport choice • Second Handler Fr=F2 handles representation and it negotiates a representation conversation with scheme URI-S and realization URI-R • Negotiation identifies parts of SOAP header that are present in all messages in a stream and are ONLY transmitted ONCE • Fr needs to negotiate with Service and other handlers illustrated by F3 and F4 below to decide what representation they will process
High Performance XML III [Diagram labels: H1 H2 H3 H4 Body; Ft Fr F3 F4; Container Handlers; Conversation Context URI-S, URI-R, URI-T; Replicated Message Header; Transported Message; Handler Message View; Service Message View; Service] • Filters controlled by Conversation Context convert messages between representations using permanent context (metadata) catalog to hold conversation context • Different message views for each end point or even for individual handlers and service within one end point • Conversation Context is fast dynamic metadata service to enable conversions • NaradaBrokering will implement Fr and Ft using its support of multiple transports, fast filters and message queuing
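The rule that header parts common to every message in a stream are transmitted only once can be sketched with a pair of end-point filters sharing a conversation context. This is an illustration only: `StreamSender`, `StreamReceiver` and the header names are hypothetical, the negotiation step is reduced to "the first message establishes the context", and the sketch assumes the set of header names stays fixed for the life of the stream.

```python
from typing import Dict, Optional, Tuple

Headers = Dict[str, str]


class StreamSender:
    """Sends full headers once, then only the headers that changed."""

    def __init__(self) -> None:
        self._context: Optional[Headers] = None

    def encode(self, headers: Headers, body: str) -> Tuple[Headers, str]:
        if self._context is None:
            # First message of the conversation: establish the shared context.
            self._context = dict(headers)
            return dict(headers), body
        # Later messages: transmit only headers that differ from the context.
        delta = {k: v for k, v in headers.items() if self._context.get(k) != v}
        return delta, body


class StreamReceiver:
    """Rebuilds the full header view from the stored context plus the delta."""

    def __init__(self) -> None:
        self._context: Optional[Headers] = None

    def decode(self, wire_headers: Headers, body: str) -> Tuple[Headers, str]:
        if self._context is None:
            self._context = dict(wire_headers)
        # Replicated headers come from the context; the delta overrides them.
        return {**self._context, **wire_headers}, body


sender, receiver = StreamSender(), StreamReceiver()
first = {"To": "svc", "Security": "tok", "MessageID": "1"}
second = {"To": "svc", "Security": "tok", "MessageID": "2"}

wire1, _ = sender.encode(first, "a")
view1, _ = receiver.decode(wire1, "a")
wire2, _ = sender.encode(second, "b")
view2, _ = receiver.decode(wire2, "b")
```

The service on the receiving side still sees the complete header set (its message view), while the transported message carries only what changed.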
NaradaBrokering Web Service WS1 WS2 WS3 NaradaBrokering Web Service Collaboration Shared Output port with replicated recipients Shared Input Port with replicated services
NaradaBrokering WS1 WS4 WS2 WS5 WS3 WS6 Pipelined Web Service Collaboration • In a workflow, one can invoke collaborative streams on any flow and this splitting is between output port of one and input of next Web Service in chain WS-A WS-B Shared Output Port Shared Input Port
Gateway Gateway Gateway Gateway XGSP Media Service WS-Context Collaboration Grid NaradaBroker Audio Mixer HPSearch Video Mixer UDDI NaradaBroker Transcoder Thumbnail WS-Security Replay NaradaBroker Record Annotate Shared WS Shared Display WhiteBoard