1 / 118

Goals

Goals. Provide a comprehensive survey of the state of the art in Grids & Grid technologies Present concepts that can help categorize diverse Grid technologies and projects Review major Grid projects & contributors Explain relationships with major industrial technologies

clea
Download Presentation

Goals

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Goals • Provide a comprehensive survey of the state of the art in Grids & Grid technologies • Present concepts that can help categorize diverse Grid technologies and projects • Review major Grid projects & contributors • Explain relationships with major industrial technologies • Introduce the activities of the Global Grid Forum

  2. Tutorial Overview • The opportunity • Requirements • The systems problem • The programming problem • Major Grid R&D projects • Grids and industry • Organization • Recap and conclusions

  3. The Opportunity

  4. Why Grids? • A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour • 1,000 physicists worldwide pool resources for petaop analyses of petabytes of data • Civil engineers collaborate to design, execute, & analyze shake table experiments • Climate scientists visualize, annotate, & analyze terabyte simulation datasets • An emergency response team couples real time data, weather model, population data

  5. Why Grids? (contd) • A multidisciplinary analysis in aerospace couples code and data in four companies • A home user invokes architectural design functions at an application service provider • An application service provider purchases cycles from compute cycle providers • Scientists working for a multinational soap company design a new product • A community group pools members’ PCs to analyze alternative designs for a local road

  6. The Fundamental Concept Enable communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals—in the absence of central control, omniscience, trust relationships

  7. Why Now? • Moore’s law improvements in computing produce highly functional endsystems • The Internet and burgeoning wired and wireless provide universal connectivity • Changing modes of working and problem solving emphasize teamwork, computation • Network exponentials produce dramatic changes in geometry and geography

  8. Network Exponentials • Network vs. computer performance • Computer speed doubles every 18 months • Network speed doubles every 9 months • Difference = order of magnitude per 5 years • 1986 to 2000 • Computers: x 500 • Networks: x 340,000 • 2001 to 2010 • Computers: x 60 • Networks: x 4000 Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan-2001) by Cleo Vilett, source Vined Khoslan, Kleiner, Caufield and Perkins.

  9. Requirements

  10. One View of Requirements • Identity & authentication • Authorization & policy • Resource discovery • Resource characterization • Resource allocation • (Co-)reservation, workflow • Distributed algorithms • Remote data access • High-speed data transfer • Performance guarantees • Monitoring • Adaptation • Intrusion detection • Resource management • Accounting & payment • Fault management • System evolution • Etc. • Etc. • …

  11. Programming Problem Systems Problem Another View: “Three Obstaclesto Making Grid Computing Routine” • New approaches to problem solving • Data Grids, distributed computing, peer-to-peer, collaboration grids, … • Structuring and writing programs • Abstractions, tools • Enabling resource sharing across distinct institutions • Resource discovery, access, reservation, allocation; authentication, authorization, policy; communication; fault detection and notification; …

  12. Resource • An entity that is to be shared • E.g., computers, storage, data, software • Does not have to be a physical entity • E.g., Condor pool, distributed file system, … • Defined in terms of interfaces, not devices • E.g. scheduler such as LSF and PBS define a compute resource • Open/close/read/write define access to a distributed file system, e.g. NFS, AFS, DFS

  13. Network Protocol • A formal description of message formats and a set of rules for message exchange • Rules may define sequence of message exchanges • Protocol may define state-change in endpoint, e.g., file system state change • Good protocols designed to do one thing • Protocols can be layered • Examples of protocols • IP, TCP, TLS (was SSL), HTTP, Kerberos

  14. FTP Server Web Server HTTP Protocol FTP Protocol Telnet Protocol TLS Protocol TCP Protocol TCP Protocol IP Protocol IP Protocol Network Enabled Service • A protocol impln defining a set of capabilities • Protocol defines interaction with service • All services require protocols • Not all protocols are used to provide services (e.g. IP, TLS) • Examples: FTP and Web servers

  15. Application Programmer Interface • A specification for a set of routines to facilitate application development • Refers to definition, not implementation • E.g., there are many MPI implementations • Spec often language-specific (or IDL) • Routine name, number, order and type of arguments; mapping to language constructs • Behavior or function of routine • Examples • GSS API (security), MPI (message passing)

  16. Software Development Kit • A particular instantiation of an API • SDK consists of libraries and tools • Provides implementation of API specification • Can have multiple SDKs for an API • Examples of SDKs • MPICH, Motif Widgets

  17. Syntax • Rules for encoding information, e.g. • XML, Condor ClassAds, Globus RSL • X.509 certificate format (RFC 2459) • Cryptographic Message Syntax (RFC 2630) • Distinct from protocols • One syntax may be used by many protocols (e.g., XML); & useful for other purposes • Syntaxes may be layered • E.g., Condor ClassAds -> XML -> ASCII • Important to understand layerings when comparing or evaluating syntaxes

  18. A Protocol can have Multiple APIsE.g., TCP/IP • TCP/IP APIs include BSD sockets, Winsock, System V streams, … • The protocol provides interoperability: programs using different APIs can exchange information • I don’t need to know remote user’s API Application Application WinSock API Berkeley Sockets API TCP/IP Protocol: Reliable byte streams

  19. Application Application MPI API MPI API LAM SDK MPICH-P4 SDK LAM protocol MPICH-P4 protocol Different message formats, exchange sequences, etc. TCP/IP TCP/IP An API can have Multiple ProtocolsE.g., Message Passing Interface • MPI provides portability: any correct program compiles & runs on a platform • Does not provide interoperability: all processes must link against same SDK • E.g., MPICH and LAM versions of MPI

  20. The Systems Problem

  21. The Systems Problem:Resource Sharing Mechanisms That … • Address security and policy concerns of resource owners and users • Are flexible enough to deal with many resource types and sharing modalities • Scale to large number of resources, many participants, many program components • Operate efficiently when dealing with large amounts of data & computation

  22. Aspects of the Systems Problem • Need for interoperability when different groups want to share resources • Diverse components, policies, mechanisms • E.g., standard notions of identity, means of communication, resource descriptions • Need for shared infrastructure services to avoid repeated development, installation • E.g., one port/service/protocol for remote access to computing, not one per tool/appln • E.g., Certificate Authorities: expensive to run • A common need for protocols & services

  23. Hence, a Protocol-Oriented View of Grid Architecture, that Emphasises … • Development of Grid protocols & services • Protocol-mediated access to remote resources • New services: e.g., resource brokering • “On the Grid” = speak Intergrid protocols • Mostly (extensions to) existing protocols • Development of Grid APIs & SDKs • Facilitate application development by supplying higher-level abstractions • The (hugely successful) model is the Internet

  24. Application Application Internet Protocol Architecture “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services Collective “Sharing single resources”: negotiating access, controlling use Resource “Talking to things”: communication (Internet protocols) & security Connectivity Transport Internet “Controlling things locally”: Access to, & control of, resources Fabric Link Layered Grid Architecture(By Analogy to Internet Architecture)

  25. Grid Architecture Review • We now illustrate this architecture by describing a representative set of protocols • Several provided by Globus Toolkit, which • Defines, and provides quality reference implementations of key Grid protocols • Has been adopted as infrastructure by majority of major Grid projects

  26. Grid Services Architecture:Fabric Layer Protocols & Services • Just what you would expect: the diverse mix of resources that may be shared • Individual computers, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc., etc. • Few constraints on low-level technology: connectivity and resource level protocols form the “neck in the hourglass” • Defined by interfaces not physical characteristics

  27. Grid Services Architecture:Connectivity Layer Protocols & Services • Communication • Internet protocols: IP, DNS, routing, etc. • Security: Grid Security Infrastructure (GSI) • Uniform authentication & authorization mechanisms in multi-institutional setting • Single sign-on, delegation, identity mapping • Public key technology, SSL, X.509, GSS-API • Supporting infrastructure: Certificate Authorities, key management, etc. GSI: www.globus.org

  28. Why Grid Security is Hard • Resources being used may be extremely valuable & the problems being solved extremely sensitive • Resources are often located in distinct administrative domains • Each resource may have own policies & procedures • The set of resources used by a single computation may be large, dynamic, and/or unpredictable • Not just client/server • It must be broadly available & applicable • Standard, well-tested, well-understood protocols • Integration with wide variety of tools

  29. Grid Security Requirements User View Resource Owner View 1) Specify local access control 2) Auditing, accounting, etc. 3) Integration w/ local systemKerberos, AFS, license mgr. 4) Protection from compromisedresources 1) Easy to use 2) Single sign-on 3) Run applicationsftp,ssh,MPI,Condor,Web,… 4) User based trust model 5) Proxies/agents (delegation) Developer View API/SDK with authentication, flexible message protection, flexible communication, delegation, ...Direct calls to various security functions (e.g. GSS-API)Or security integrated into higher-level SDKs: E.g. GlobusIO, Condor-G, MPICH-G2, HDF5, etc.

  30. Grid Security Infrastructure (GSI) • Extensions to existing standard protocols & APIs • Standards: SSL/TLS, X.509 & CA, GSS-API • Extensions for single sign-on and delegation • Globus Toolkit reference implementation of GSI • SSLeay/OpenSSL + GSS-API + delegation • Tools and services to interface to local security • Simple ACLs; SSLK5 & PKINIT for access to K5, AFS, etc. • Tools for credential management • Login, logout, etc. • Smartcards • MyProxy: Web portal login and delegation • K5cert: Automatic X.509 certificate creation

  31. Single sign-on via “grid-id” & generation of proxy cred. Or: retrieval of proxy cred. from online repository Remote process creation requests* GSI-enabled GRAM server Authorize Map to local id Create process Generate credentials Ditto GSI-enabled GRAM server Process Process Communication* Local id Local id Kerberos ticket Restricted proxy Remote file access request* Restricted proxy User Proxy GSI-enabled FTP server Proxy credential Authorize Map to local id Access file * With mutual authentication GSI in Action: “Create Processes at A and B that Communicate & Access Files at C” User Site B (Unix) Site A (Kerberos) Computer Computer Site C (Kerberos) Storage system

  32. GSI Working Group Documents • Grid Security Infrastructure (GSI) Roadmap • Informational draft overview of working group activities and documents • Grid Security Protocols & Syntax • X.509 Proxy Certificates • X.509 Proxy Delegation Protocol • The GSI GSS-API Mechanism • Grid Security APIs • GSS-API Extensions for the Grid • GSI Shell API

  33. GSI Futures • Scalability in numbers of users & resources • Credential management • Online credential repositories (“MyProxy”) • Account management • Authorization • Policy languages • Community authorization • Protection against compromised resources • Restricted delegation, smartcards

  34. 1. CAS request, with user/group CAS resource names membership Does the and operations collective policy resource/collective authorize this 2. CAS reply, with membership request for this capability and resource CA info user? collective policy information Resource 3. Resource request, authenticated with Is this request capability authorized by the local policy capability? information 4. Resource reply Is this request authorized for the CAS? Community Authorization(Prototype shown August 2001) User Laura Pearlman, Steve Tuecke, Von Welch, others

  35. Grid Services Architecture:Resource Layer Protocols & Services • Grid Resource Access and Mgmt (GRAM) • Remote allocation, reservation, monitoring, control of compute resources • GridFTP protocol (FTP extensions) • High-performance data access & transport • Grid Resource Information Service (GRIS) • Access to structure & state information • Network reservation, monitoring, control • All integrated with GSI: authentication, authorization, policy, delegation

  36. Resource Management Problem • Enabling secure, controlled remote access to computational resources and management of remote computation • Authentication and authorization • Resource discovery & characterization • Reservation and allocation • Computation monitoring and control • Addressed by new protocols & services • GRAM protocol as a basic building block • Resource brokering & co-allocation services • GSI for security, MDS for discovery

  37. GRAM Protocol • Simple HTTP-based RPC • Job request • Returns a “job contact”: Opaque string that can be passed between clients, for access to job • Job cancel • Job status • Job signal • Event notification (callbacks) for state changes • Pending, active, done, failed, suspended • Moving to SOAP (more later…)

  38. GridFTP: Basic Approach • FTP is defined by several IETF RFCs • Start with most commonly used subset • Standard FTP: get/put etc., 3rd-party transfer • Implement standard but often unused features • GSS binding, extended directory listing, simple restart • Extend in various ways, while preserving interoperability with existing servers • Parameter set/negotiate, parallel transfers (multiple TCP streams), striped transfers (multiple hosts), partial file transfers, automatic & manual TCP buffer setting, progress monitoring, extended restart (via plug-ins)

  39. Grid Information Service:Discovery & Monitoring • Provide access to static and dynamic information regarding system components • Large numbers of “sensors”, in resources, services, applications, etc. • A basis for configuration and adaptation in heterogeneous, dynamic environments • Requirements and characteristics • Uniform, flexible access to information • Scalable, efficient access to dynamic data • Access to multiple information sources • Decentralized maintenance

  40. The GIS Problem: Many Information Sources, Many Views

  41. MDS Structure: Overview

  42. MDS Protocols

  43. Grid Services Architecture:Collective Layer Protocols & Services • Index servers aka metadirectory services • Custom views on dynamic resource collections assembled by a community • Resource brokers (e.g., Condor Matchmaker) • Resource discovery and allocation • Replica catalogs • Co-reservation and co-allocation services • Etc., etc.

  44. The Programming Problem

  45. The Programming Problem • How the @#$@ do I develop robust, secure, long-lived applications for dynamic, heterogeneous, Grids? • I need, presumably: • Abstractions and models to add to speed/robustness/etc. of development • Tools to ease application development and diagnose common problems • Code/tool sharing to allow reuse of code components developed by others

  46. Grid Programming Technologies • “Grid applications” are incredibly diverse (data, collaboration, computing, sensors, …) • Seems unlikely there is one solution • Most applications have been written “from scratch,” with or without Grid services • Application-specific libraries have been shown to provide significant benefits • No new language, programming model, etc., has yet emerged that transforms things • But certainly still quite possible

  47. Examples of GridProgramming Technologies • MPICH-G2: Grid-enabled message passing • CoG Kits, GridPort: Portal construction, based on N-tier architectures • GDMP, Data Grid Tools, SRB: replica management, collection management • Condor-G: simple workflow management • Legion: object models for Grid computing • Cactus: Grid-aware numerical solver framework • Note tremendous variety, application focus

  48. MPICH-G2: A Grid-Enabled MPI • A complete implementation of the Message Passing Interface (MPI) for heterogeneous, wide area environments • Based on the Argonne MPICH implementation of MPI (Gropp and Lusk) • Globus services for authentication, resource allocation, executable staging, output, etc. • Programs run in wide area without change • See also: MetaMPI, PACX, STAMPI, MAGPIE www.globus.org/mpi

  49. Grid-based Computation: Challenges • Locate “suitable” computers • Authenticate with appropriate sites • Allocate resources on those computers • Initiate computation on those computers • Configure those computations • Select “appropriate” communication methods • Compute with “suitable” algorithms • Access data files, return output • Respond “appropriately” to resource changes

  50. Locates hosts MDS Submits multiple jobs globusrun Stages executables GASS Coordinates startup DUROC Authenticates GRAM GRAM GRAM Detects termination Initiates job fork LSF LoadLeveler Monitors/controls P2 P2 P2 P1 P1 P1 Communicates via vendor-MPI and TCP/IP (globus-io) MPICH-G2 Use of Grid Services % grid-proxy-init % mpirun -np 256 myprog Generates resource specification mpirun

More Related