
The Grid: Beyond the Hype


Presentation Transcript


  1. The Grid: Beyond the Hype
  Ian Foster, Argonne National Laboratory & University of Chicago, Globus Alliance
  www.mcs.anl.gov/~foster
  Seminar, Duke, September 14, 2004

  2. Grid Hype

  3. The Shape of Grids to Come? Energy Internet. Internet Hype?

  4. eScience & Grid: 6 Theses • Scientific progress depends increasingly on large-scale distributed collaborative work • Such distributed collaborative work raises challenging problems of broad importance • Any effective attack on those problems must involve close engagement with applications • Open software & standards are key to producing & disseminating required solutions • Shared software & service infrastructure are essential application enablers • A cross-disciplinary community of technology producers & consumers is needed

  5. Global Knowledge Communities: E.g., High Energy Physics

  6. The Grid
  “Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations”
  • Enable integration of distributed resources
  • Using general-purpose protocols & infrastructure
  • To achieve better-than-best-effort service

  7. The Grid (2)
  • Dynamically link resources/services
  • From collaborators, customers, eUtilities, … (members of an evolving “virtual organization”)
  • Into a “virtual computing system”: a dynamic, multi-faceted system spanning institutions and industries
  • Configured to meet instantaneous needs, for multi-faceted QoX for demanding workloads: security, performance, reliability, …

  8. Problem-Driven, Collaborative Research Methodology
  [Diagram: a cycle of Design → Build → Deploy → Apply → Analyze, linking computer science, software & standards, infrastructure, a global community, and discipline advances.]

  9. Problem-Driven, Collaborative Research Methodology
  [Diagram repeated from slide 8: the Design → Build → Deploy → Apply → Analyze cycle.]

  10. Resource/Service Integration as a Fundamental Challenge
  • Many sources of data, services, computation
  • Security & policy must underlie access & management decisions
  • Discovery: registries organize services of interest to a community
  • Access: resource management is needed to ensure progress & arbitrate competing demands
  • Data integration activities may require access to, & exploration/analysis of, data at many locations
  • Exploration & analysis may involve complex, multi-step workflows
  [Diagram: registries (R), resource managers (RM), and security & policy services mediating access to distributed resources.]
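
The discovery-and-policy pattern on this slide can be made concrete with a small sketch. The following Python (all class and function names are hypothetical, not any Globus API) shows a community registry that organizes services and a policy check that underlies every access decision:

```python
# Minimal sketch (hypothetical names) of the slide's pattern: registries
# organize services for a community, and a security/policy check gates
# every access decision.
from dataclasses import dataclass, field

@dataclass
class Service:
    name: str
    owner: str            # institution operating the resource
    kind: str             # e.g. "compute", "data"

@dataclass
class Registry:
    services: list = field(default_factory=list)

    def register(self, svc: Service) -> None:
        self.services.append(svc)

    def discover(self, kind: str) -> list:
        return [s for s in self.services if s.kind == kind]

def policy_allows(user: str, vo_members: set, svc: Service) -> bool:
    # Stand-in for the security & policy services on the slide: only
    # members of the virtual organization may use its resources.
    return user in vo_members

if __name__ == "__main__":
    registry = Registry()
    registry.register(Service("anl-cluster", "ANL", "compute"))
    registry.register(Service("survey-archive", "SDSC", "data"))
    vo = {"alice", "bob"}
    for svc in registry.discover("compute"):
        print(svc.name, "accessible to alice:", policy_allows("alice", vo, svc))
```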

  11. Scale Metrics: Participants, Data, Tasks, Performance, Interactions, …
  [Chart comparing communities by scale: Earth Simulator, LHC experiments, current accelerator experiments, gravitational wave experiments, nuclear experiments, astronomy, atmospheric chemistry groups.]

  12. Profound Technical Challenges
  How do we, in dynamic, scalable, multi-institutional, computationally & data-rich settings:
  • Negotiate & manage trust
  • Access & integrate data
  • Construct & reuse workflows
  • Plan complex computations
  • Detect & recover from failures
  • Capture & share knowledge
  • Represent & enforce policies
  • Achieve end-to-end QoX
  • Move data rapidly & reliably
  • Support collaborative work
  • Define primitive protocols
  • Build reusable software
  • Package & deliver software
  • Deploy & operate services
  • Operate infrastructure
  • Upgrade infrastructure
  • Perform troubleshooting
  • Etc., etc., etc.

  13. Grid Technologies Address Key Requirements
  • Infrastructure (“middleware”) for establishing, managing, and evolving multi-organizational federations: dynamic, autonomous, domain independent
  • On-demand, ubiquitous access to computing, data, and services
  • Mechanisms for creating and managing workflow within such federations
  • New capabilities constructed dynamically and transparently from distributed services: service orientation, virtualization

  14. Computer Science Contributions
  Protocols and/or tools for use in dynamic, scalable, multi-institutional, computationally & data-rich settings, for:
  • Large-scale distributed system architecture
  • Cross-org authentication
  • Scalable community-based policy enforcement
  • Robust & scalable discovery
  • Wide-area scheduling
  • High-performance, robust, wide-area data management
  • Knowledge-based workflow generation
  • High-end collaboration
  • Resource & service virtualization
  • Distributed monitoring & manageability
  • Application development
  • Wide-area fault tolerance
  • Infrastructure deployment & management
  • Resource provisioning & quality of service
  • Performance monitoring & modeling

  15. Collaborative Workflow: Virtual Data (www.griphyn.org/chimera)
  • “I’ve come across some interesting data, but I need to understand the nature of the corrections applied when it was constructed before I can trust it for my purposes.”
  • “I’ve detected a calibration error in an instrument and want to know which derived data to recompute.”
  • “I want to apply an astronomical analysis program to millions of objects. If the results already exist, I’ll save weeks of computation.”
  • “I want to search an astronomical database for galaxies with certain characteristics. If a program that performs this analysis exists, I won’t have to write one from scratch.”
  [Diagram: a virtual data system relating Data (consumed-by/generated-by), Transformation (execution-of), and Derivation (created-by).]
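
The core idea on this slide is that recording how each dataset was derived lets a system both reuse existing results and answer provenance questions. A toy Python illustration of that idea (not the Chimera API; the catalog structure and names are assumptions for illustration):

```python
# Illustrative sketch (not Chimera code) of virtual data: record
# transformation/derivation metadata so existing results are reused
# instead of recomputed, and provenance can be queried.
catalog = {}   # maps (transformation name, inputs) -> derived dataset

def derive(transformation, inputs):
    key = (transformation.__name__, tuple(inputs))
    if key in catalog:                     # result already exists:
        return catalog[key]                # reuse, saving recomputation
    result = transformation(*inputs)       # otherwise execute and record
    catalog[key] = result
    return result

def provenance(dataset):
    # Which transformation and inputs generated this dataset?
    return [key for key, value in catalog.items() if value == dataset]

def calibrate(raw):
    return f"calibrated({raw})"

print(derive(calibrate, ["scan42"]))       # computed and recorded
print(derive(calibrate, ["scan42"]))       # reused from the catalog
print(provenance("calibrated(scan42)"))    # [('calibrate', ('scan42',))]
```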

  16. Adaptive Unstructured Multicast
  [Plot: maximum link stress and 90%/95% relative delay penalty (RDP) vs. time (0–3840 s) for an experiment in which 10 nodes fail and rejoin 900 s later; RDP=1 and RDP=2 shown as reference levels.]
  [Diagram: an application overlay (A″–E″) and base overlay (A′–E′) adapting over a physical topology (A–E).]
  “UMM: A dynamically adaptive, unstructured multicast overlay,” M. Ripeanu et al.
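
The behavior being measured above can be illustrated with a toy sketch (my own simplification, not the UMM implementation): nodes in an unstructured overlay flood messages to their neighbors with duplicate suppression, and the overlay is rewired when nodes fail or rejoin:

```python
# Toy sketch of unstructured overlay multicast: flood to overlay
# neighbors with duplicate suppression; rewiring models adaptation
# to node failure and rejoin.
class Node:
    def __init__(self, name):
        self.name, self.neighbors, self.seen = name, set(), set()

    def receive(self, msg_id, payload):
        if msg_id in self.seen:
            return                          # duplicate: drop
        self.seen.add(msg_id)
        print(f"{self.name} received {payload}")
        for nbr in set(self.neighbors):     # flood onward
            nbr.receive(msg_id, payload)

def link(a, b):
    a.neighbors.add(b)
    b.neighbors.add(a)

a, b, c, d = (Node(n) for n in "ABCD")
link(a, b); link(b, c); link(c, d); link(a, c)  # redundant paths
a.receive(1, "update-1")                    # reaches A, B, C, D once each
# Model a failure: disconnect C, then patch the overlay around it.
for n in (a, b, d):
    n.neighbors.discard(c)
link(b, d)
a.receive(2, "update-2")                    # still reaches A, B, D
```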

  17. Problem-Driven, Collaborative Research Methodology
  [Diagram repeated from slide 8: the Design → Build → Deploy → Apply → Analyze cycle.]

  18. Open Standards & Software
  • Standardized & interoperable mechanisms for secure & reliable:
  • Authentication, authorization, policy, …
  • Representation & management of state
  • Initiation & management of computation
  • Data access & movement
  • Communication & notification
  • Good-quality open source implementations to accelerate adoption & development, e.g., the Globus Toolkit

  19. Evolution of Open Grid Standards and Software
  [Timeline, 1990–2010, showing increasing functionality and standardization: custom solutions; then the Globus Toolkit (Internet standards; de facto standard; single implementation); then the Open Grid Services Architecture (Web services, etc.; real standards; multiple implementations); then, as research, managed shared virtual systems.]

  20. WS Core Enables Frameworks: E.g., Resource Management
  [Layered stack, top to bottom:]
  • Applications of the framework (compute, network, storage provisioning; job reservation & submission; data management; application service QoS; …)
  • WS-Agreement (agreement negotiation)
  • WS Distributed Management (lifecycle, monitoring, …)
  • WS-Resource Framework & WS-Notification (resource identity, lifetime, inspection, subscription, …)
  • Web services (WSDL, SOAP, WS-Security, WS-ReliableMessaging, …)

  21. WSRF & WS-Notification
  • Naming and bindings (basis for virtualization): every resource can be uniquely referenced, and has one or more associated services for interacting with it
  • Lifecycle (basis for fault-resilient state management): resources created by services following the factory pattern; resources destroyed immediately or on a schedule
  • Information model (basis for monitoring, discovery): resource properties associated with resources; operations for querying and setting this info; asynchronous notification of changes to properties
  • Service groups (basis for registries, collective services): group membership rules & membership management
  • Base Fault type
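
A rough Python analogue of the constructs listed above can help fix the ideas. This is plain in-process code, not the WSRF/WS-Notification wire protocols or any Globus Toolkit API; every name here is an illustrative assumption:

```python
# Conceptual sketch of WSRF-style constructs: factory-created
# resources, bounded lifetime, queryable resource properties, and
# asynchronous notification of property changes.
import itertools
import time

class WSResource:
    _ids = itertools.count(1)

    def __init__(self, properties, lifetime=None):
        self.ref = f"resource-{next(self._ids)}"   # unique reference
        self.properties = dict(properties)          # information model
        self.expires = time.time() + lifetime if lifetime else None
        self.subscribers = []                       # notification sinks

    def get_property(self, name):                   # query operation
        return self.properties[name]

    def set_property(self, name, value):            # set + notify
        self.properties[name] = value
        for callback in self.subscribers:
            callback(self.ref, name, value)

    def subscribe(self, callback):                  # WS-Notification analogue
        self.subscribers.append(callback)

class Factory:                                      # factory pattern
    def create(self, properties, lifetime=None):
        return WSResource(properties, lifetime)

job = Factory().create({"state": "pending"}, lifetime=3600)
job.subscribe(lambda ref, key, val: print(f"{ref}: {key} -> {val}"))
job.set_property("state", "running")   # prints: resource-1: state -> running
```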

  22. Bringing It All Together
  Scenario: resource management & scheduling.
  • The Grid Scheduler is a Web service
  • WS-Resources are used to “model” physical processor resources; the local processor manager is “front-ended” with a Web service interface
  • Grid “jobs” and “tasks” are also modeled using WS-Resources and resource properties
  • Other kinds of resources (network, storage, blades) are also “modeled” as WS-Resources
  • A service level agreement (SLA) is modeled as a WS-Resource; the lifetime of the SLA resource is tied to the duration of the agreement
  • WS-Resource properties “project” processor status (like utilization)
  • WS-Notification can be used to “inform” the scheduler when processor utilization changes
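
A toy sketch of this scenario (hypothetical classes, not Globus Toolkit code): processor resources expose a utilization property, the scheduler is notified when it changes and dispatches queued jobs, and each placement creates an SLA object whose lifetime is tied to the agreement:

```python
# Sketch of the slide's scenario: utilization properties, notification
# to the scheduler on change, and SLA objects with bounded lifetime.
class ProcessorResource:
    def __init__(self, name):
        self.name, self.utilization, self.watchers = name, 0.0, []

    def set_utilization(self, value):       # resource property update
        self.utilization = value
        for notify in self.watchers:        # WS-Notification analogue
            notify(self)

class SLAResource:
    def __init__(self, job, duration_s):
        self.job = job
        self.remaining = duration_s         # lifetime tied to agreement

class GridScheduler:
    def __init__(self, processors):
        self.processors = processors
        self.queue, self.slas = [], []
        for proc in processors:
            proc.watchers.append(self.on_utilization_change)

    def submit(self, job):
        self.queue.append(job)
        self.dispatch()

    def dispatch(self):
        idle = [p for p in self.processors if p.utilization < 0.5]
        while self.queue and idle:
            job, proc = self.queue.pop(0), idle.pop(0)
            proc.set_utilization(0.9)       # job occupies the processor
            self.slas.append(SLAResource(job, duration_s=3600))
            print(f"{job} -> {proc.name}")

    def on_utilization_change(self, proc):
        if proc.utilization < 0.5:          # capacity freed: reschedule
            self.dispatch()

cpus = [ProcessorResource("node1"), ProcessorResource("node2")]
sched = GridScheduler(cpus)
sched.submit("job-A"); sched.submit("job-B"); sched.submit("job-C")
cpus[0].set_utilization(0.1)                # notification triggers job-C
```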

  23. The Globus Alliance & Toolkit (Argonne, USC/ISI, Edinburgh, PDC)
  • An international partnership dedicated to creating & disseminating high-quality open source Grid technology: the Globus Toolkit
  • Design, engineering, support, governance
  • Academic Affiliates make major contributions
  • EU: CERN, Imperial, MPI, Poznan
  • AP: AIST, TIT, Monash
  • US: NCSA, SDSC, TACC, UCSB, UW, etc.
  • Significant industrial contributions
  • 1000s of users worldwide, many of whom contribute

  24. Globus Toolkit History: An Unreliable Memoir
  [Timeline of milestones, including:] DARPA & NSF begin funding Grid work; NASA initiates the Information Power Grid; The Grid: Blueprint for a New Computing Infrastructure published; GT 1.0.0 released; Globus Project wins the Global Information Infrastructure Award; GT 1.1.1 and GT 1.1.2 released; early application successes reported; MPICH-G released; GT 1.1.3 released; first EuroGlobus conference held in Lecce; NSF & the European Commission initiate many new Grid projects; "Anatomy of the Grid" paper released; GT 1.1.4 and MPICH-G2 released; NSF GRIDS Center initiated; significant commercial interest in Grids; GT 2.0 beta released; "Physiology of the Grid" paper released; GT 2.0 released; GT 2.2 released.
  [Download chart covers Globus.org only, not downloads from NMI, UK eScience, EU DataGrid, IBM, Platform, etc.]

  25. GlobusToolkitContributorsInclude • Grid Packaging Technology (GPT) NCSA • Persistent GRAM Jobmanager Condor • GSI/Kerberos interchangeability Sandia • Documentation NASA, NCSA • Ports IBM, HP, Sun, SDSC, … • MDS stress testing EU DataGrid • Support IBM, Platform, UK eScience • Testing and patches Many • Interoperable tools Many • Replica location service EU DataGrid • Python hosting environment LBNL • Data access & integration UK eScience • Data mediation services SDSC • Tooling, Xindice, JMS IBM • Brokering framework Platform • Management framework HP • $$ DARPA, DOE, NSF, NASA, Microsoft, EU

  26. GT-Based Grid Tools & Solutions
  Built on the Globus Toolkit: BIRN Biomedical Grid, NSF Middleware Initiative, Virtual Data Toolkit, Earth System Grid, IBM Grid Toolbox, UK eScience Grid, Platform Globus, Butterfly Grid, EU DataGrid, Access Grid, Fusion Grid, MPICH-G2, NEESgrid, TeraGrid, …

  27. Problem-Driven, Collaborative Research Methodology
  [Diagram repeated from slide 8: the Design → Build → Deploy → Apply → Analyze cycle.]

  28. Infrastructure
  • Broadly deployed services in support of virtual organization formation and operation: authentication, authorization, discovery, …
  • Services, software, and policies enabling on-demand access to important resources: computers, databases, networks, storage, software services, …
  • Operational support for 24x7 availability
  • Integration with campus infrastructures
  • Distributed, heterogeneous, instrumented systems can be wonderful CS testbeds

  29. Infrastructure Status
  • Many infrastructure deployments worldwide: community-specific & general-purpose, from campus to international scale
  • Most based on GT technology
  • U.S. examples: TeraGrid, Grid2003, NEESgrid, Earth System Grid, BIRN
  • Major open issues include practical aspects of operations and federation
  • Scalability issues (number of users, sites, resources, files, jobs, etc.) are also arising

  30. NSF Network for Earthquake Engineering Simulation (NEES)
  Transform our ability to carry out research vital to reducing vulnerability to catastrophic earthquakes.

  31. NEESgrid User Perspective
  Secure, reliable, on-demand access to data, software, people, and other resources (ideally all via a Web browser!).

  32. How It Really Happens (with the Globus Toolkit)
  • Users work with client applications
  • Application services organize VOs & enable access to other services
  • Collective services aggregate &/or virtualize resources
  • Resources implement standard access & management interfaces
  [Diagram components: Web browser; CHEF with chat teamlet; data viewer tool & telepresence monitor; simulation tool; cameras; Globus Index Service; MyProxy; certificate authority; Globus MCS/RLS; Globus GRAM compute servers; Globus DAI database services.]

  33. Grid2003: An Operational Grid
  • 28 sites (2100–2800 CPUs) & growing, including a site in Korea
  • 400–1300 concurrent jobs
  • 7 substantial applications + CS experiments
  • Running since October 2003
  http://www.ivdgl.org/grid2003

  34. Open Science Grid Components
  • Computers & storage at 28 sites (to date), 2800+ CPUs
  • Uniform service environment at each site: the Globus Toolkit provides basic authentication, execution management, and data movement; the Pacman installation system enables installation of numerous other VDT and application services
  • Global & virtual organization services: certification & registration authorities, VO membership services, monitoring services
  • Client-side tools for data access & analysis: virtual data, execution planning, DAG management, execution management, monitoring
  • IGOC: iVDGL Grid Operations Center

  35. DOE Earth System Grid
  Goal: address technical obstacles to the sharing & analysis of high-volume data from advanced earth system models.
  www.earthsystemgrid.org

  36. Earth System Grid

  37. Problem-Driven, Collaborative Research Methodology
  [Diagram repeated from slide 8: the Design → Build → Deploy → Apply → Analyze cycle.]

  38. NEESgrid Multi-site Online Simulation Test
  [Diagram: a structural model with masses (m1) and forces (f1, f2) partitioned across a UIUC experimental model, a U. Colorado experimental model, and an NCSA computational model.]
  All computational models written in Matlab.

  39. NEESgrid Multi-site Online Simulation Test (July 2003)
  [Photos of the Illinois and Colorado experimental sites and the Illinois-based simulation.]

  40. MOST: A Grid Perspective
  [Diagram: a simulation coordinator linking NTCP servers that front the UIUC experimental model, the U. Colorado experimental model, and the NCSA computational model, exchanging forces and displacements (F1, F2; f1, f2; m1, q1; x1) between substructures.]
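
The coordination pattern in this diagram can be sketched briefly. The following Python (a simplification with hypothetical names and a trivial linear model, not the actual NTCP protocol) shows a coordinator stepping distributed substructure models in lockstep, commanding a displacement at each pseudo-time step and collecting the measured forces:

```python
# Illustrative sketch of the MOST pattern: a simulation coordinator
# steps distributed substructure models in lockstep, exchanging
# boundary displacements and measured forces each pseudo-time step.
class SubstructureModel:
    """Stands in for an experimental rig or computational model
    behind an NTCP server at UIUC, Colorado, or NCSA."""
    def __init__(self, name, stiffness):
        self.name, self.stiffness = name, stiffness

    def propose_and_execute(self, displacement):
        # Apply the commanded displacement; return the measured force
        # (here a toy linear spring response).
        force = self.stiffness * displacement
        print(f"{self.name}: x={displacement:.3f} -> f={force:.3f}")
        return force

class SimulationCoordinator:
    def __init__(self, models):
        self.models = models

    def run(self, steps, dx=0.01):
        displacement = 0.0
        for step in range(steps):
            displacement += dx                 # pseudo-time stepping
            forces = [m.propose_and_execute(displacement)
                      for m in self.models]    # lockstep across sites
            print(f"step {step}: total force {sum(forces):.3f}")

coordinator = SimulationCoordinator([
    SubstructureModel("UIUC-experimental", stiffness=2.0),
    SubstructureModel("Colorado-experimental", stiffness=1.5),
    SubstructureModel("NCSA-computational", stiffness=3.0),
])
coordinator.run(steps=3)
```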

  41. Grid2003 Applications To Date
  • CMS proton-proton collision simulation
  • ATLAS proton-proton collision simulation
  • LIGO gravitational wave search
  • SDSS galaxy cluster detection
  • ATLAS interactive analysis
  • BTeV proton-antiproton collision simulation
  • SnB biomolecular analysis
  • GADU/Gnare genome analysis
  • Various computer science experiments
  www.ivdgl.org/grid2003/applications

  42. Example Grid2003 Workflows
  [Workflow graphs: genome sequence analysis, Sloan Digital Sky Survey, physics data analysis.]

  43. Example Grid3 Application: NVO Mosaic Construction
  • Construct custom mosaics on demand from multiple data sources
  • User specifies projection, coordinates, size, rotation, spatial sampling
  NVO/NASA Montage: a small (1200-node) workflow. Work by Ewa Deelman et al., USC/ISI and Caltech.
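
The workflow shape behind this kind of mosaic job is a DAG: independent per-image tasks feeding a final combination step. A minimal sketch of the idea (not Montage or Pegasus code; task names and file names are illustrative) using Python's standard-library topological sorter:

```python
# Small sketch of the DAG-workflow idea behind mosaic construction:
# run one reprojection task per input image, then a final coadd task
# once every reprojection it depends on has finished.
from graphlib import TopologicalSorter   # standard library, Python 3.9+

images = ["img1.fits", "img2.fits", "img3.fits"]
dag = {f"reproject:{img}": set() for img in images}
dag["coadd:mosaic"] = set(dag)           # coadd depends on every reproject

def execute(task):
    kind, target = task.split(":")
    print(f"running {kind} on {target}")

# static_order() yields tasks so that dependencies always come first.
for task in TopologicalSorter(dag).static_order():
    execute(task)                        # reprojections first, coadd last
```

In a real planner the same dependency structure is what lets independent reprojections run concurrently on different Grid sites; only the coadd is a synchronization point.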

  44. Concluding Remarks
  [Diagram repeated from slide 8: the Design → Build → Deploy → Apply → Analyze cycle.]

  45. eScience & Grid: 6 Theses • Scientific progress depends increasingly on large-scale distributed collaborative work • Such distributed collaborative work raises challenging problems of broad importance • Any effective attack on those problems must involve close engagement with applications • Open software & standards are key to producing & disseminating required solutions • Shared software & service infrastructure are essential application enablers • A cross-disciplinary community of technology producers & consumers is needed

  46. Global Community

  47. Utility Computing is One of Several Commercial Drivers
  • Utility computing
  • On-demand
  • Service orientation
  • Virtualization
  [Diagram, based on a slide from HP: evolution from today's clusters (Tru64, HP-UX, Linux clusters; OpenVMS clusters, TruCluster, MC ServiceGuard) through grid-enabled systems and the programmable data center (UDC) to a computing utility or "GRID" and a virtual data center (switch fabric, compute, storage) of shared, traded resources, with increasing value.]

  48. Significant Challenges Remain
  • Scaling in multiple dimensions: ambition and complexity of applications; number of users, datasets, services, …
  • Moving from technologies to solutions
  • The need for persistent infrastructure: software and people as well as hardware; currently no long-term commitment
  • Institutionalizing the multidisciplinary approach: understanding the implications for the practice of computer science research

  49. Thanks, in particular, to:
  • Carl Kesselman and Steve Tuecke, my long-time Globus co-conspirators
  • Gregor von Laszewski, Kate Keahey, Jennifer Schopf, Mike Wilde, and other Argonne colleagues
  • Globus Alliance members at Argonne, U. Chicago, USC/ISI, Edinburgh, PDC
  • Miron Livny and the U. Wisconsin Condor project; Rick Stevens, Argonne & U. Chicago
  • Other partners in Grid technology, application, & infrastructure projects
  • DOE, NSF, NASA, IBM for generous support

  50. For More Information
  • Globus Alliance: www.globus.org
  • Global Grid Forum: www.ggf.org
  • Open Science Grid: www.opensciencegrid.org
  • Background information: www.mcs.anl.gov/~foster
  • GlobusWORLD 2005: Feb 7-11, Boston
  • The Grid: Blueprint for a New Computing Infrastructure, 2nd Edition: www.mkp.com/grid2
