Explore the various types and visions of grids, including utility computing, e-Science, and peer-to-peer grids. Learn about the international grid infrastructure and the challenges of managing large-scale distributed data sources. Discover how grids support diverse applications such as crisis simulations, financial modeling, and media servers.
Remarks on Grids in CGL and Worldwide Tsinghua Dec 5 2005 Geoffrey Fox Computer Science, Informatics, Physics Pervasive Technology Laboratories Indiana University Bloomington IN 47401 gcf@indiana.edu http://www.infomall.org
Internet Scale Distributed Services • Grids use Internet technology and are distinguished by managing or organizing sets of network connected resources • Classic Web allows independent one-to-one access to individual resources • Grids integrate together and manage multiple Internet-connected resources: People, Sensors, computers, data systems • Organization can be explicit as in • TeraGrid which federates many supercomputers; • Deep Web Technologies IR Grid which federates multiple data resources; • CrisisGrid which federates first responders, commanders, sensors, GIS, (Tsunami) simulations, science/public data • Organization can be implicit as in Internet resources such as curated databases and simulation resources that “harmonize a community”
Different Visions of the Grid • Grid just refers to the technologies • Or Grids represent the full system/Applications • DoD’s vision of Network Centric Computing can be considered a Grid (linking sensors, warfighters, commanders, backend resources) and they are building the GiG (Global Information Grid) • Utility Computing or X-on-demand (X=data, computer ..) is the major computer industry interest in Grids and this is a key part of enterprise or campus Grids • e-Science or Cyberinfrastructure are virtual organization Grids supporting global distributed science (note sensors, instruments and people are all distributed) • Skype (Kazaa) VOIP system is a Peer-to-peer Grid (and VRVS/GlobalMMCS like Internet A/V conferencing are Collaboration Grids) • Commercial 3G Cell-phones and DoD ad-hoc network initiative are forming mobile Grids
Types of Computing Grids • Running “Pleasingly Parallel Jobs” as in United Devices, Entropia (Desktop Grid) “cycle stealing systems” • Can be managed (“inside” the enterprise as in Condor) or more informal (as in SETI@Home) • Computing-on-demand in Industry where jobs spawned are perhaps very large (SAP, Oracle …) • Support distributed file systems as in Legion (Avaki), Globus with (web-enhanced) UNIX programming paradigm • Particle Physics will run some 30,000 simultaneous jobs • Distributed Simulation HLA/RTI style Grids • Linking Supercomputers as in TeraGrid • Pipelined applications linking data/instruments, compute, visualization • Seamless Access where Grid portals allow one to choose one of multiple resources with a common interface • Parallel Computing typically NOT suited for a Grid (latency)
Old Style Metacomputing Grid: Spread a single large problem over multiple supercomputers [Figure labels: Analysis and Visualization; Large Disks; Large Scale Parallel Computers]
Utility and Service Computing • An important business application of Grids is believed to be utility computing • Namely support a pool of computers to be assigned as needed to take-up extra demand • Pool shared between multiple applications • Natural architecture is not a cluster of computers connected to each other but rather a “Farm of Grid Services” connected to Internet and supporting services such as • Web Servers • Financial Modeling • Run SAP • Data-mining • Simulation response to crisis like forest fire or earthquake • Media Servers for Video-over-IP • Note classic Supercomputer use is to allow full access to do “anything” via ssh etc. • In service model, one pre-configures services for all programs and you access portal to run job with less security issues
Towards an International Grid Infrastructure [Figure labels: UK NGS; Leeds; Manchester; Starlight (Chicago); US TeraGrid; Netherlight (Amsterdam); Oxford; RAL; SDSC; NCSA; PSC; UCL; UKLight; SC05; Local laptops in Seattle and UK. Caption: All sites connected by production network (not all shown). Legend: Computation; Steering clients; Service Registry; Network PoP]
Information/Knowledge Grids • Distributed (10’s to 1000’s) of data sources (instruments, file systems, curated databases …) • Data Deluge: 1 (now) to 100’s petabytes/year (2012) • Moore’s law for Sensors • Possible filters assigned dynamically (on-demand) • Run image processing algorithm on telescope image • Run Gene sequencing algorithm on compiled data • Needs decision support front end with “what-if” simulations • Metadata (provenance) critical to annotate data • Integrate across experiments as in multi-wavelength astronomy Data Deluge comes from pixels/year available
Data Deluged Science • Now particle physics will get 100 petabytes from CERN using around 30,000 CPU’s simultaneously 24X7 • Exponential growth in data and compare to: • The Bible = 5 Megabytes • Annual refereed papers = 1 Terabyte • Library of Congress = 20 Terabytes • Internet Archive (1996 – 2002) = 100 Terabytes • Weather, climate, solid earth (EarthScope) • Bioinformatics curated databases (Biocomplexity only 1000’s of data points at present) • Virtual Observatory and SkyServer in Astronomy • Environmental Sensor nets • In the past, HPCC community worried about data in the form of parallel I/O or MPI-IO, but we didn’t consider it as an enabler of new science and new ways of computing • Data assimilation was not central to HPCC • DoE ASCI set up because didn’t want test data!
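To put the sizes quoted on this slide on a single scale, here is a minimal sketch. It assumes decimal prefixes (1 TB = 10^12 bytes) and takes the slide's estimates at face value; the function name `ratio_to` is only illustrative.

```python
# Sizes quoted on the slide, in bytes (decimal prefixes assumed).
MB, TB, PB = 10**6, 10**12, 10**15

sizes = {
    "The Bible": 5 * MB,
    "Annual refereed papers": 1 * TB,
    "Library of Congress": 20 * TB,
    "Internet Archive (1996-2002)": 100 * TB,
    "CERN particle physics": 100 * PB,
}


def ratio_to(reference: str) -> dict[str, float]:
    """Express every size as a multiple of the chosen reference collection."""
    base = sizes[reference]
    return {name: size / base for name, size in sizes.items()}


if __name__ == "__main__":
    for name, multiple in ratio_to("Library of Congress").items():
        print(f"{name}: {multiple:g} x Library of Congress")
```

On these figures the CERN data volume is several thousand times the Library of Congress estimate, which is the point of the comparison.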
Virtual Observatory Astronomy Grid: Integrate Experiments [Figure panels: Radio; Far-Infrared; Visible; Dust Map; Visible + X-ray; Galaxy Density Map]
International Virtual Observatory Alliance • Reached international agreements on Astronomical Data Query Language, VOTable 1.1, UCD 1+, Resource Metadata Schema • Image Access Protocol, Spectral Access Protocol and Spectral Data Model, Space-Time Coordinates definitions and schema • Interoperable registries by Jan 2005 (NVO, AstroGrid, AVO, JVO) using OAI publishing and harvesting • So each Community of Interest builds data AND service standards that build on GS-* and WS-*
myGrid Project • Imminent ‘deluge’ of data • Highly heterogeneous • Highly complex and inter-related • Convergence of data and literature archives
The Williams Workflows [Figure: workflows A, B, C] • A: Identification of overlapping sequence • B: Characterisation of nucleotide sequence • C: Characterisation of protein sequence
Grid of Grids: Research Grid and Education Grid [Figure labels: Field Trip Data; Database; ?; GIS Grid; Discovery Services; Repositories; Federated Databases; Streaming Data; Sensors; Database; Sensor Grid; Database Grid; Research; Education; SERVOGrid; Compute Grid; Customization Services; From Research to Education; Data Filter Services; Research Simulations; Analysis and Visualization Portal; Education Grid; Computer Farm]
SERVOGrid Requirements • Seamless Access to Data repositories and large scale computers • Integration of multiple data sources including sensors, databases, file systems with analysis system • Including filtered OGSA-DAI (Grid database access) • Rich meta-data generation and access with SERVOGrid specific Schema extending openGIS (Geography as a Web service) standards and using Semantic Grid • Portals with component model for user interfaces and web control of all capabilities • Collaboration to support world-wide work • Basic Grid tools: workflow and notification • NOT metacomputing
Portal Architecture [Figure: hierarchical arrangement with layers Clients, Portal, Portlets, Libraries, Services, Resources. Labels: Clients (Pure HTML, Java Applet ..); Aggregation and Rendering; Portal Internal Services; Local Portlets; Remote or Proxy Portlets; Portlet Class: WebForm; Portlet Class; SERVOGrid (IU); GridPort etc.; (Java) COG Kit; Web/Grid service; Computing; Data Stores; Instruments]
Each Service has its own portlet Individual portlet for the Proxy Manager Use tabs or choose different portlets to navigate through interfaces to different services 2 Other Portlets
SERVOGrid (Complexity) Computing Model • This type of Grid integrates with parallel computing • Multiple HPC facilities but only use one at a time • Many simultaneous data sources and sinks • Distributed filters massage data for simulation [Figure labels: Databases and/or Sensors; Data; Filter; OGC or OGSA-DAI Grid Services; Analysis; Control; Visualize; HPC Simulation; Grid Data Assimilation; Other Grid and Web Services]
Simulation and the Grid • Simulation on the Grid is distributed but it is rarely classical distributed simulation • It is either managing multiple jobs that are identical except for parameters controlling simulation – SETI@Home style of “desktop grid” • Or workflow that roughly corresponds to federation • The workflow is designed to support the integration of distributed entities • Simulations (maybe parallel) and Filters, for example GCF General Coupling Framework from Manchester • Databases and Sensors • Visualization and user interfaces • RTI should be built on workflow and inherit WS-*/GS-* and NCOW CES built on same
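The first case on this slide, many jobs that are identical except for their parameters, can be sketched as a parameter sweep. This is a toy illustration only: `simulate` is a hypothetical stand-in for a real simulation, and a local thread pool stands in for the desktop-grid scheduler.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(friction: float) -> float:
    """Hypothetical stand-in for one simulation run; returns a toy result."""
    return round(1.0 / (1.0 + friction), 3)


def parameter_sweep(values: list[float]) -> dict[float, float]:
    """Run the same job once per parameter value; the runs never communicate."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(simulate, values))
    return dict(zip(values, results))


if __name__ == "__main__":
    print(parameter_sweep([0.0, 1.0, 3.0]))
```

Because the runs are independent, network latency between them does not matter, which is why this style suits a Grid.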
Two-level Programming I Service Data • The Web Service (Grid) paradigm implicitly assumes a two-level Programming Model • We make a Service (same as a “distributed object” or “computer program” running on a remote computer) using conventional technologies • C++ Java or Fortran Monte Carlo module • Data streaming from a sensor or Satellite • Specialized (JDBC) database access • Such services accept and produce data from users files and databases • The Grid is built by coordinating such services assuming we have solved problem of programming the service
Two-level Programming II [Figure: Service1, Service2, Service3, Service4] • The Grid is discussing the composition of distributed services with the runtime interfaces to Grid as opposed to UNIX pipes/data streams • Familiar from use of UNIX Shell, PERL or Python scripts to produce real applications from core programs • Such interpretative environments are the single processor analog of Grid Programming • Some projects like GrADS from Rice University are looking at integration between service and composition levels but dominant effort looks at each level separately
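The two levels can be sketched on a single processor, in the spirit of the shell-script analogy above. All names here are hypothetical: each "service" is an ordinary function (level 1), and `compose` is the workflow layer that wires services together (level 2). A real Grid would exchange messages between remote services instead of making local calls.

```python
from typing import Callable, Iterable

Service = Callable[[list[float]], list[float]]


# Level 1: each service is written with conventional technology.
def sensor_service(_: list[float]) -> list[float]:
    """Hypothetical data source; ignores its input and emits readings."""
    return [3.0, -1.0, 4.0, -1.5, 9.0]


def filter_service(data: list[float]) -> list[float]:
    """Drop negative readings."""
    return [x for x in data if x >= 0]


def scale_service(data: list[float]) -> list[float]:
    """Rescale readings by a factor of two."""
    return [2 * x for x in data]


# Level 2: composition of services, the analog of a shell pipeline.
def compose(services: Iterable[Service]) -> Service:
    stages = list(services)

    def workflow(data: list[float]) -> list[float]:
        for stage in stages:
            data = stage(data)
        return data

    return workflow


if __name__ == "__main__":
    pipeline = compose([sensor_service, filter_service, scale_service])
    print(pipeline([]))
```

The composition code never looks inside a service, which mirrors the slide's point that the two levels are usually treated separately.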
3 Layer Programming Model [Figure: Web Service 1, WS 2, WS N-1, Web Service N] • Level 1 Programming inside services: Application expressed in Java, Fortran, C++, MPI etc. • WS-* Infrastructure • Level 2 Programming choosing services by virtualization: Application Semantics (Metadata, Ontology), Semantic Grid • Level 3 Grid Programming composing multiple services: Service Workflow, Transactions, Mediation • Substantial work in UK e-Science program, international semantic web community
Consequences of Rule of the Millisecond Classic Programming • Useful to remember critical time scales • 1) 0.000001 ms – CPU does a calculation • 2a) 0.001 to 0.01 ms – Parallel Computing MPI latency • 2b) 0.001 to 0.01 ms – Overhead of a Method Call • 3) 1 ms – wake-up a thread or process either? • 4) 10 to 1000 ms – Internet delay: Workflow • 2a), 4) implies geographically distributed metacomputing can’t in general compete with parallel systems • 3) << 4) implies a software overlay network is possible without significant overhead • We need to explain why it adds value of course! • 2b) versus 3) and 4) describes regions where method and message based programming paradigms important
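The two inequalities on this slide can be checked with simple arithmetic. The sketch below stores the slide's time scales as (low, high) ranges in integer nanoseconds so the ratios are exact; the dictionary keys and the helper `min_ratio` are illustrative names, not part of the original material.

```python
# Time scales from the slide as (low, high) ranges, in nanoseconds.
scales_ns = {
    "cpu_calculation": (1, 1),                      # 1) 0.000001 ms
    "mpi_latency": (1_000, 10_000),                 # 2a) 0.001 to 0.01 ms
    "method_call": (1_000, 10_000),                 # 2b) 0.001 to 0.01 ms
    "thread_wakeup": (1_000_000, 1_000_000),        # 3) 1 ms
    "internet_delay": (10_000_000, 1_000_000_000),  # 4) 10 to 1000 ms
}


def min_ratio(slow: str, fast: str) -> int:
    """Smallest possible ratio: fastest case of `slow` over slowest case of `fast`."""
    return scales_ns[slow][0] // scales_ns[fast][1]


if __name__ == "__main__":
    # 2a) versus 4): even the best Internet delay dwarfs the worst MPI latency.
    print("Internet / MPI latency >=", min_ratio("internet_delay", "mpi_latency"))
    # 3) << 4): a 1 ms software hop is small next to the Internet delay itself.
    print("Internet / thread wake-up >=", min_ratio("internet_delay", "thread_wakeup"))
```

Even in the most favourable case the Internet delay is three orders of magnitude above MPI latency, while a 1 ms software hop is at most a tenth of the Internet delay.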
Four Data Streaming Application Areas • Data Assimilation applied to link the data deluge (satellites, sensors, seismometers) in real time to small and large scale parallel simulations • Use in Earthquake Science • Department of Defense (and Homeland Security) have built the Global Information Grid with a target architecture NCOW (Network Centric Operations and warfare) • They submit no jobs; rather stream data to brokers from which they are filtered and distributed • Includes their rather dated distributed simulation HLA • Audio-Video Conferencing implemented with services and Grid messaging • Hand-held Grid linking PDA/cell-phones to Grids
[Figure: network of services exchanging SOAP Messages. Legend: SS = Sensor Service; FS = Filter Service; OS = Other Service; MD = MetaData. Other labels: Database; Portal; Another Grid; Another Service. Flow labels: Raw Data; Data; Information; Knowledge; Wisdom; Decisions]
Key Concepts • Grid of Grids (System of Systems) allows “library” approach to composing Grids • Service Oriented architectures (Web or Grid services) are attractive for many/most distributed systems • There are many applications that are NOT best considered as jobs and files (classic Grid) but rather as streams and filters (services) • Services exchanging messages becomes Services exchanging streams (sets of messages) • Publish-Subscribe messaging gives better QoS and management than point to point messaging with negligible performance loss • Always use standards including those for GIS
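The "streams and filters" idea with publish-subscribe messaging can be sketched with an in-memory broker. This is a toy under stated assumptions: `Broker`, the topic names and the threshold are all hypothetical, and nothing here reflects the API of any real messaging system.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Minimal in-memory topic broker (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[float], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[float], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: float) -> None:
        # Deliver to a snapshot of the subscriber list for this topic.
        for handler in list(self._subscribers[topic]):
            handler(message)


def run_demo() -> list[float]:
    broker = Broker()
    archive: list[float] = []

    # Filter service: consumes the raw stream, republishes selected messages.
    def threshold_filter(reading: float) -> None:
        if reading >= 5.0:
            broker.publish("filtered", reading)

    broker.subscribe("raw", threshold_filter)
    broker.subscribe("filtered", archive.append)

    # Sensor: publishes a stream without knowing who is listening.
    for reading in [1.0, 7.5, 5.0, 2.2]:
        broker.publish("raw", reading)
    return archive


if __name__ == "__main__":
    print(run_demo())
```

The sensor and the archive never reference each other; only the broker knows the subscriptions, which is what makes management and rerouting easier than with point to point links.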
What Type of Services are there? • There are a horde of support services supplying security, collaboration, database access, user interfaces • The support services are either associated with system or application • We will study the WS-* and GS-* which implicitly or explicitly define many support services • There are generalized filter services which are applications that accept messages and produce new messages with some data derived from that in input • Simulations (including PDE’s and reactive systems) • Data-mining • Transformations • Agents • Reasoning are all termed filters here • There are services like “author ontology”, “parse RDF” or “attach provenance” that directly support Semantic Grid • But all services and their interactions are bathed in sea of meta-data and so implicitly need and support the Semantic Grid
It’s a Composite Hierarchical World • Filters can be a workflow which means they are “just collections of other simpler services” • One needs meta-data to control the workflow • Services are programs that accept messages and produce messages • Grids are a distributed collection of services supporting managed shared resources • Management requires meta-data • Grids are distributed systems that accept distributed messages and produce distributed result messages • Can always talk about Grids and view a service or a workflow as a special case of a Grid • It just requires meta-data to send a message to a Grid and have it routed to the “correct computer” holding the “requested service” • Meta-data allows mapping of virtual to real addresses
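The last two bullets, meta-data mapping a virtual address to the real computer holding the service, can be sketched as a lookup table. `MetadataRegistry`, `route` and the example URLs are hypothetical names chosen for illustration.

```python
class MetadataRegistry:
    """Maps virtual service names to real endpoints (hypothetical sketch)."""

    def __init__(self) -> None:
        self._endpoints: dict[str, str] = {}

    def register(self, virtual_name: str, real_endpoint: str) -> None:
        self._endpoints[virtual_name] = real_endpoint

    def resolve(self, virtual_name: str) -> str:
        return self._endpoints[virtual_name]


def route(registry: MetadataRegistry, virtual_name: str, message: str) -> tuple[str, str]:
    """Return (real endpoint, message); a real system would now send the message."""
    return registry.resolve(virtual_name), message


if __name__ == "__main__":
    registry = MetadataRegistry()
    registry.register("servo/wfs", "http://node7.example.org/wfs")
    print(route(registry, "servo/wfs", "<GetFeature/>"))
    # Moving the service only changes meta-data; senders keep the virtual name.
    registry.register("servo/wfs", "http://node9.example.org/wfs")
    print(route(registry, "servo/wfs", "<GetFeature/>"))
```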
GIS Grids and Sensor Grids • OGC has defined a suite of data structures and services to support Geographical Information Systems and Sensors • GML Geography Markup language defines specification of geo-referenced data • SensorML and O&M (Observation and Measurements) define meta-data and data structure for sensors • Services like Web Map Service, Web Feature Service, Sensor Collection Service define services interfaces to access GIS and sensor information • Grid workflow links services that are designed to support streaming input and output messages • We are building Grid (Web) service implementations of these specifications for NASA’s SERVOGrid
WMS uses WFS that uses data sources
<gml:featureMember>
  <fault>
    <name> Northridge2 </name>
    <segment> Northridge2 </segment>
    <author> Wald D. J.</author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates> -118.72,34.243 -118.591,34.176 </gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>
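A client of the feature service would typically parse such a fragment to recover the fault name and its coordinates. The sketch below uses Python's standard `xml.etree.ElementTree`. One assumption is made loudly: the slide's fragment omits the namespace declaration, so the code adds `xmlns:gml="http://www.opengis.net/gml"` to make it parseable on its own. The function name `fault_segment` is illustrative.

```python
import xml.etree.ElementTree as ET

# Assumption: the slide omits the namespace declaration; it is added here
# so that the fragment is well-formed on its own.
GML_NS = "http://www.opengis.net/gml"

FEATURE = """<gml:featureMember xmlns:gml="http://www.opengis.net/gml">
  <fault>
    <name> Northridge2 </name>
    <segment> Northridge2 </segment>
    <author> Wald D. J.</author>
    <gml:lineStringProperty>
      <gml:LineString srsName="null">
        <gml:coordinates> -118.72,34.243 -118.591,34.176 </gml:coordinates>
      </gml:LineString>
    </gml:lineStringProperty>
  </fault>
</gml:featureMember>"""


def fault_segment(xml_text: str) -> tuple[str, list[tuple[float, ...]]]:
    """Return the fault name and its list of (longitude, latitude) points."""
    root = ET.fromstring(xml_text)
    fault = root.find("fault")
    name = fault.findtext("name").strip()
    coords_text = fault.findtext(f".//{{{GML_NS}}}coordinates")
    # Points are separated by whitespace, components by commas.
    points = [tuple(float(v) for v in pair.split(",")) for pair in coords_text.split()]
    return name, points


if __name__ == "__main__":
    print(fault_segment(FEATURE))
```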
Electric Power and Natural Gas data from LANL Interdependent Critical Infrastructure Simulations [Screenshot controls: Zoom-in; Zoom-out; FeatureInfo mode; Measure distance mode; Clear Distance; Drag and Drop mode; Refresh to initial map]
Typical use of Grid Messaging in NASA [Figure labels: Sensor Grid; GIS Grid; Grid Eventing; Datamining Grid]
Typical use of Grid Messaging [Figure labels: Filter or Datamining; Sensor Grid; Post after Processing; Post before Processing; Web Feature Service; NaradaBrokering; Notify; WFS (GIS data); Grid Database Archives; Subscribe; HPSearch Manages; GIS Grid; WS-Context Stores dynamic data; Geographical Information System]
Real Time GPS and Google Maps Subscribe to live GPS station. Position data from SOPAC is combined with Google map clients. Select and zoom to GPS station location, click icons for more information.
Integrating Archived Web Feature Services and Google Maps Google maps can be integrated with Web Feature Service Archives to filter and browse seismic records.
What is Happening? • Grid ideas are being developed in (at least) four communities • Web Service – W3C, OASIS, (DMTF) • Grid Forum (High Performance Computing, e-Science) • Enterprise Grid Alliance (Commercial “Grid Forum” with a near term focus) • Service Standards are being debated • Grid Operational Infrastructure is being deployed • Grid Architecture and core software being developed • Apache has several important projects as do academia; large and small companies • Particular System Services are being developed “centrally” – OGSA or GS-* framework for this in GGF; WS-* for OASIS/W3C/Microsoft-IBM • Lots of fields are setting domain specific standards and building domain specific services • USA started but now Europe is probably in the lead and Asia will soon catch USA if momentum (roughly zero for USA) continues
The Grid and Web Service Institutional Hierarchy • 4: Application or Community of Interest Specific Services such as “Run BLAST” or “Look at Houses for sale” • 3: Generally Useful Services and Features such as “Access a Database” or “Submit a Job” or “Manage Cluster” or “Support a Portal” or “Collaborative Visualization” • 2: System Services and Features: Handlers like WS-RM, Security, Programming Models like BPEL or Registries like UDDI • 1: Container and Run Time (Hosting) Environment • Must set standards to get interoperability [Side labels: OGSA GS-* and some WS-*, GGF/W3C/….; WS-* from OASIS/W3C/Industry; Apache Axis, .NET etc.]
Philosophy of Web Service Grids • Much of Distributed Computing was built by natural extensions of computing models developed for sequential machines • This leads to the distributed object (DO) model represented by Java and CORBA • RPC (Remote Procedure Call) or RMI (Remote Method Invocation) for Java • Key people think this is not a good idea as it scales badly and ties distributed entities together too tightly • Distributed Objects Replaced by Services • Note CORBA was considered too complicated in both organization and proposed infrastructure • and Java was considered as “tightly coupled to Sun” • So there were other reasons to discard • Thus replace distributed objects by services connected by “one-way” messages and not by request-response messages
Stateful Interactions • There are (at least) four approaches to specifying state • OGSI uses factories to generate separate services for each session in standard distributed object fashion • Globus GT-4 and WSRF use metadata of a resource to identify state associated with particular session • WS-GAF uses WS-Context to provide abstract context defining state. Has strength and weakness that it reveals less about nature of session • WS-I+ “Pure Web Service” leaves state specification to the application – e.g. put a context in the SOAP body • I think we should smile and write a great metadata service hiding all these different models for state and metadata
3 XML Databases of Importance • WS-Context controlling a workflow • (Extended) UDDI supporting semantic service discovery • WFS or ASFS (see later) provides application specific data/meta-data repository • These have different performance, scalability and data unit size requirements • In our implementation, each is currently “just an Oracle/MySQL” database front ended by filters that convert between XML (GML for WFS) and object-relational Schema • Example of Semantics (XML) versus representation (SQL) difference • OGSA-DAI offers Grid interface to databases – we could use but don’t as we only need to expose WFS and not MySQL to Grid
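The "database front ended by filters" pattern can be sketched as a pair of functions that convert between an XML message and a relational row. This is not the authors' implementation: SQLite (from the Python standard library) stands in for Oracle/MySQL, and the table layout and function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def store_fault(conn: sqlite3.Connection, xml_text: str) -> None:
    """Inbound filter: XML message -> relational row."""
    elem = ET.fromstring(xml_text)
    conn.execute(
        "INSERT INTO fault (name, author) VALUES (?, ?)",
        (elem.findtext("name").strip(), elem.findtext("author").strip()),
    )


def load_fault(conn: sqlite3.Connection, name: str) -> str:
    """Outbound filter: relational row -> XML message."""
    row = conn.execute(
        "SELECT name, author FROM fault WHERE name = ?", (name,)
    ).fetchone()
    fault = ET.Element("fault")
    ET.SubElement(fault, "name").text = row[0]
    ET.SubElement(fault, "author").text = row[1]
    return ET.tostring(fault, encoding="unicode")


def demo() -> str:
    conn = sqlite3.connect(":memory:")  # SQLite stands in for Oracle/MySQL
    conn.execute("CREATE TABLE fault (name TEXT PRIMARY KEY, author TEXT)")
    store_fault(conn, "<fault><name> Northridge2 </name><author> Wald D. J.</author></fault>")
    return load_fault(conn, "Northridge2")


if __name__ == "__main__":
    print(demo())
```

Clients see only XML in and XML out (the semantics); the SQL schema (the representation) stays hidden behind the filters.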
Information Management/Processing • SOAP messages transport information expressed in a semantically rich fashion between sources and services that enhance and transform information so that complete system provides Data → Information → Knowledge transformation • Semantic Web technologies like RDF and OWL help us have rich expressivity • We build application specific information management/transformation systems ASIS for each application domain • One special domain is the system itself where the metadata associated with services, sessions, Grids, messages, streams and workflow is itself managed and supported by an SIIS
Generalizing a GIS • Geographical Information Systems GIS have been hugely successful in all fields that study the earth and related worlds • They define Geography Syntax (GML) and ways to store, access, query, manipulate and display geographical features • In SOA, GIS corresponds to a domain specific XML language and a suite of services for different functions above • However such a universal information model has not been developed in other areas even though there are many fields in which it appears possible • BIS Biological Information System • MIS Military Information System • IRIS Information Retrieval Information System • PAIS Physics Analysis Information System • SIIS Service Infrastructure Information System
ASIS Application Specific Information System I • a) Discovery capabilities that are best done using WS-* standards • b) Domain specific metadata and data including search/store/access interface (cf WFS). Let’s call generalization ASFS (Application Specific Feature Service) • Language to express domain specific features (cf GML). Let’s call this ASL (Application Specific Language) • Tools to manipulate information expressed in language and key data of application (cf coordinate transformations). Let’s call this ASTT (Application Specific Tools and Transformations) • ASL must support Data sources such as sensors (cf OGC metadata and data sensor standards) and repositories. Sensors need (common across applications) support of streams of data • Queries need to support archived (find all relevant data in past) and streaming (find all data in future with given properties) • Note all AS Services behave like Sensors and all sensors are wrapped as services • Any domain will have “raw data” (binary) and that which has been filtered to ASL. Let’s call the former ASBD (Application Specific Binary Data)
ASIS Application Specific Information System II [Figure labels: Filter, Transformation, Reasoning, Data-mining, Analysis; AS Repository; AS Tool (generic); AS Service (user defined); AS Tool (generic); ASVS Display; AS “Sensor”; Messages using ASL] • Let’s call this ASVS (Application Specific Visualization Services) generalizing WMS for GIS • The ASVS should both visualize information and provide a way of navigating (cf GetFeatureInfo) database (the ASFS) • The ASVS can itself be federated and presents an ASFS output interface • d) There should be application service interface for ASIS from which all ASIS services inherit • e) There will be other user services interfacing to ASIS • All user and system services will input and output data in ASL using filters to cope with ASBD
WS-* implies the Service Internet • We have the classic (CISCO, Juniper ….) Internet routing the flood of ordinary packets in OSI stack architecture • Web Services build the “Service Internet” or IOI (Internet on Internet) with • Routing via WS-Addressing not IP header • Fault Tolerance (WS-RM not TCP) • Security (WS-Security/SecureConversation not IPSec/SSL) • Data Transmission by WS-Transfer not HTTP • Information Services (UDDI/WS-Context not DNS/Configuration files) • At message/web service level and not packet/IP address level • Software-based Service Internet possible as computers “fast” • Familiar from Peer-to-peer networks and built as a software overlay network defining Grid (analogy is VPN) • SOAP Header contains all information needed for the “Service Internet” (Grid Operating System) with SOAP Body containing information for Grid application service
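The split described in the last bullet, routing information in the SOAP Header and application content in the SOAP Body, can be sketched with the standard library. The namespaces are those of SOAP 1.1 and WS-Addressing 1.0; the endpoint URL, the action URI and the `GetFeature` payload are placeholders, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
WSA_NS = "http://www.w3.org/2005/08/addressing"        # WS-Addressing 1.0


def build_envelope(to: str, action: str, payload: ET.Element) -> ET.Element:
    """Put routing data in the Header and the application payload in the Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    ET.SubElement(header, f"{{{WSA_NS}}}To").text = to
    ET.SubElement(header, f"{{{WSA_NS}}}Action").text = action
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    body.append(payload)
    return envelope


def destination(envelope: ET.Element) -> str:
    """What a message-level router reads: the Header only, never the Body."""
    return envelope.findtext(f"{{{SOAP_NS}}}Header/{{{WSA_NS}}}To")


if __name__ == "__main__":
    env = build_envelope(
        "http://example.org/wfs",      # placeholder endpoint
        "urn:example:GetFeature",      # placeholder action URI
        ET.Element("GetFeature"),      # placeholder application payload
    )
    print(destination(env))
    print(ET.tostring(env, encoding="unicode"))
```

A router in the "Service Internet" can forward the message using `destination` alone, just as an IP router reads the packet header without inspecting the payload.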
WS-I Interoperability • Critical underpinning of Grids and Web Services is the gradually growing set of specifications in the Web Service Interoperability Profiles • Web Services Interoperability (WS-I) Interoperability Profile 1.0a (http://www.ws-i.org) gives us XSD, WSDL1.1, SOAP1.1, UDDI in basic profile and parts of WS-Security in their first security profile • We imagine the “60 Specifications” being checked out and evolved in the cauldron of the real world and occasionally best practice identifies a new specification to be added to WS-I which gradually increases in scope • Note only 4.5 out of 60 specifications have “made it” in this definition