Observations on Architecture, Protocols, Services, APIs, SDKs, and the Role of the Grid Forum Ian Foster With: Carl Kesselman, Steven Tuecke Thanks also to: Bill Johnston, Marty Humphrey, Rusty Lusk, Reagan Moore, and others
Overview • The Grid problem: controlled resource sharing in multi-institutional settings • Standards as a means of enabling sharing of code, resources, services • Aside: definition, role, and importance of protocols, services, SDKs, APIs, etc. • A “Grid Architecture”: a categorization of protocols, services, SDKs, and APIs • Questions for the Grid Forum
The Grid Problem • Grid R&D has its origins in high-end computing & metacomputing, but… • In practice, the “Grid problem” is about resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations • Lack of central control, omniscience, trust • Primary challenge: to enable, maintain, and control the sharing of resources to achieve a common goal
Examples of Virtual Organizations • Members of a scientific collaboration • E.g., NSF PACIs, IPG, NEESgrid, GriPhyN • Sharing: computers, storage, software, … • Application service provider + customers • Sharing: ASP computers • Participants in peer-to-peer network • E.g., Gnutella, Napster, Entropia, … • Sharing: resources on individual PCs Tremendous variety in scope, timescale, types of sharing, etc.
Universal Nature of the Grid Problem • “Sharing” fundamental in many settings • Application Service Providers, Storage Service Providers, etc.; Peer-to-peer computing; Distributed computing; Business to business; … • Sharing issues not adequately addressed by existing technologies • Sharing at a deep level, across broad ranges of resources and in a general way • E.g., user provides ASP with controlled access to their data on an SSP: how?? • Grid community has unique experience
Creating Usable Grids: What are the Challenges? • Approaches to problem solving • Data Grids, distributed computing, peer-to-peer, collaboration grids, … • Structuring and writing programs • Abstractions, tools • Enabling resource sharing across distinct institutions • Resource discovery, access, reservation, allocation; authentication, authorization, policy; communication; fault detection and notification; …
What is the Role of Grid Forum in Enabling Grid Computing? • Information exchange, of course • Experiences, patterns, structures • Useful even if every application & Grid is a vertical “stovepipe” • Advocacy • Enabler of shared effort • In code development: libraries, tools, … • Via resource sharing: shared Grids • In infrastructure • Opinion: Long term, only the third is sufficiently compelling to justify GF
Q: How do we Enable Shared Effort? A: “Standards” are Required • To enable portability/sharing of code • E.g., MPI lets me write portable parallel programs • To enable resource sharing • E.g., IP lets my computer speak to yours • To enable shared infrastructure • E.g., X.509 lets me share Certificate Authorities • But what sorts of “standards”? • Variously, APIs/SDKs, protocols, syntax, … • Observe that these are sometimes confused, so let’s spend some time on definitions …
Some Important Definitions • Resource • Network protocol • Network enabled service • Application Programmer Interface (API) • Software Development Kit (SDK) • Syntax • Not discussed, but important: policies
Resource • An entity that is to be shared • E.g., computers, storage, data, software • Does not have to be a physical entity • E.g., Condor pool, distributed file system, … • Defined in terms of interfaces, not devices • E.g., schedulers such as LSF and PBS define a compute resource • Open/close/read/write define access to a distributed file system, e.g., NFS, AFS, DFS
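The point that a resource is "defined in terms of interfaces, not devices" can be sketched in a few lines of Python. This is an illustration only: `ComputeResource`, `BatchScheduler`, `WorkstationPool`, and `run_everywhere` are hypothetical names, and the two classes are stand-ins, not the real LSF, PBS, or Condor interfaces.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ComputeResource(Protocol):
    """Hypothetical interface: anything that accepts a job counts as a compute resource."""

    def submit(self, command: str) -> int: ...


class BatchScheduler:
    """Stand-in for a scheduler front end (not a real LSF or PBS API)."""

    def __init__(self) -> None:
        self._next_id = 1
        self.queue: dict[int, str] = {}

    def submit(self, command: str) -> int:
        job_id = self._next_id
        self.queue[job_id] = command
        self._next_id += 1
        return job_id


class WorkstationPool:
    """Stand-in for a non-physical resource, e.g. a pool of idle machines."""

    def __init__(self, hosts: list[str]) -> None:
        self.hosts = hosts
        self.assigned: list[tuple[str, str]] = []

    def submit(self, command: str) -> int:
        # Round-robin placement; the caller never sees which host was chosen.
        host = self.hosts[len(self.assigned) % len(self.hosts)]
        self.assigned.append((host, command))
        return len(self.assigned)


def run_everywhere(resources: list[ComputeResource], command: str) -> list[int]:
    # The caller depends only on the interface, never on the device behind it.
    return [r.submit(command) for r in resources]
```

Both classes satisfy the interface without sharing a base class, which is roughly what it means for a scheduler or a pool to "define" a compute resource.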
Network Protocol • A formal description of message formats and a set of rules for message exchange • Rules may define sequence of message exchanges • Protocol may define state-change in endpoint, e.g., file system state change • Good protocols designed to do one thing • Protocols can be layered • Examples of protocols • IP, TCP, TLS (was SSL), HTTP, Kerberos
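A toy protocol makes the two halves of this definition concrete: a message format, and rules for the order of exchange that change endpoint state. The format below (1-byte type, 2-byte length, payload) and the names `HELLO`, `DATA`, `BYE`, and `Endpoint` are invented for illustration and do not correspond to any of the protocols listed on the slide.

```python
import struct

HELLO, DATA, BYE = 1, 2, 3
# Message format: type (1 byte) and payload length (2 bytes), network byte order.
HEADER = struct.Struct("!BH")


def encode(msg_type: int, payload: bytes = b"") -> bytes:
    return HEADER.pack(msg_type, len(payload)) + payload


def decode(frame: bytes) -> tuple[int, bytes]:
    msg_type, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return msg_type, payload


class Endpoint:
    """Enforces the exchange rules: HELLO first, then any number of DATA, then BYE."""

    def __init__(self) -> None:
        self.state = "idle"
        self.received: list[bytes] = []

    def handle(self, frame: bytes) -> None:
        msg_type, payload = decode(frame)
        if self.state == "idle" and msg_type == HELLO:
            self.state = "open"
        elif self.state == "open" and msg_type == DATA:
            self.received.append(payload)  # state change in the endpoint
        elif self.state == "open" and msg_type == BYE:
            self.state = "closed"
        else:
            raise ValueError(f"message type {msg_type} not allowed in state {self.state}")
```

Sending `DATA` before `HELLO` is rejected even though the frame itself is well formed: the format and the sequencing rules are separate parts of the protocol.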
Network Enabled Services • Implementation of a protocol that defines a set of capabilities • Protocol defines interaction with service • All services require protocols • Not all protocols are used to provide services (e.g., IP, TLS) • Examples: FTP and Web servers
[Figure: protocol stacks for an FTP server and a Web server; labels: FTP Protocol, Telnet Protocol, HTTP Protocol, TLS Protocol, TCP Protocol, IP Protocol]
Application Programmer Interface • A specification for a set of routines to facilitate application development • Refers to definition, not implementation • E.g., there are many implementations of MPI • Spec often language-specific (or IDL) • Routine name, number, order and type of arguments; mapping to language constructs • Behavior or function of routine • Examples • GSS API (security), MPI (message passing)
Software Development Kit • A particular instantiation of an API • SDK consists of libraries and tools • Provides implementation of API specification • Can have multiple SDKs for an API • Examples of SDKs • MPICH, Motif Widgets
Syntax • Rules for encoding information, e.g. • XML, Condor ClassAds, Globus RSL • X.509 certificate format (RFC 2459) • Cryptographic Message Syntax (RFC 2630) • Distinct from protocols • One syntax may be used by many protocols (e.g., XML); & useful for other purposes • Syntaxes may be layered • E.g., Condor ClassAds -> XML -> ASCII • Important to understand layerings when comparing or evaluating syntaxes
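The layering of syntaxes can be sketched with the Python standard library. The attribute/value dictionary below is only a stand-in for an ad-like description (it is not real ClassAd syntax), and the element names `ad` and `attr` are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def ad_to_xml(ad: dict[str, str]) -> ET.Element:
    """Layer 1 -> 2: express attribute/value pairs (a hypothetical ad) as XML."""
    root = ET.Element("ad")
    for name, value in ad.items():
        attr = ET.SubElement(root, "attr", {"name": name})
        attr.text = value
    return root


def xml_to_ascii(root: ET.Element) -> bytes:
    """Layer 2 -> 3: serialize XML as ASCII bytes (non-ASCII becomes character references)."""
    return ET.tostring(root, encoding="us-ascii")


def ascii_to_ad(data: bytes) -> dict[str, str]:
    """Undo both layers in reverse order."""
    root = ET.fromstring(data)
    return {a.get("name"): a.text or "" for a in root.findall("attr")}
```

Each layer can be swapped independently: the same attribute/value pairs could be carried in a different outer syntax, and the same XML could carry entirely different content, which is why comparing syntaxes requires knowing which layer is under discussion.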
A Protocol can have Multiple APIs, E.g., TCP/IP • TCP/IP APIs include BSD sockets, Winsock, System V streams, … • The protocol provides interoperability: programs using different APIs can exchange information • I don’t need to know remote user’s API
[Figure: two applications, one using the WinSock API and one using the Berkeley Sockets API, exchanging data over the TCP/IP protocol (reliable byte streams)]
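As a rough sketch of why a shared protocol gives interoperability across APIs, the Python below defines one wire format and two unrelated programming interfaces on top of it. Everything here is hypothetical: the length-prefixed format, the function names `msg_send`/`msg_recv`, the `Channel` class, and the use of a plain list as a stand-in for a network connection.

```python
import struct


# Hypothetical wire protocol: 4-byte big-endian length, then UTF-8 text.
def _frame(text: str) -> bytes:
    body = text.encode("utf-8")
    return struct.pack("!I", len(body)) + body


def _unframe(data: bytes) -> str:
    (length,) = struct.unpack_from("!I", data)
    return data[4:4 + length].decode("utf-8")


# API 1: procedural style.
def msg_send(wire: list[bytes], text: str) -> None:
    wire.append(_frame(text))


def msg_recv(wire: list[bytes]) -> str:
    return _unframe(wire.pop(0))


# API 2: object style, with different names and calling conventions.
class Channel:
    def __init__(self, wire: list[bytes]) -> None:
        self._wire = wire

    def write_line(self, text: str) -> None:
        self._wire.append(_frame(text))

    def read_line(self) -> str:
        return _unframe(self._wire.pop(0))
```

A message written with `msg_send` can be read with `Channel.read_line`, and vice versa, because only the bytes on the wire have to agree; neither side needs to know which API the other used.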
An API can have Multiple Protocols, E.g., Message Passing Interface • MPI provides portability: any correct program compiles & runs on a platform • Does not provide interoperability: all processes must link against same SDK • E.g., MPICH and LAM versions of MPI
[Figure: two applications, each written to the MPI API; one uses the LAM SDK and speaks the LAM protocol over TCP/IP, the other uses the MPICH-P4 SDK and speaks the MPICH-P4 protocol over TCP/IP. The two protocols have different message formats, exchange sequences, etc.]
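The converse case, one API with several incompatible protocols, can be sketched the same way. The abstract class `MessagePassing` and the two implementations `BinarySDK` and `JsonSDK` are invented for this example; they mimic the situation described for MPI but are not modelled on MPICH or LAM internals.

```python
import json
import struct
from abc import ABC, abstractmethod


class MessagePassing(ABC):
    """Hypothetical API: the specification that applications program against."""

    @abstractmethod
    def pack(self, rank: int, data: str) -> bytes: ...

    @abstractmethod
    def unpack(self, frame: bytes) -> tuple[int, str]: ...


class BinarySDK(MessagePassing):
    """One implementation: 4-byte rank followed by UTF-8 text."""

    def pack(self, rank: int, data: str) -> bytes:
        return struct.pack("!i", rank) + data.encode("utf-8")

    def unpack(self, frame: bytes) -> tuple[int, str]:
        (rank,) = struct.unpack_from("!i", frame)
        return rank, frame[4:].decode("utf-8")


class JsonSDK(MessagePassing):
    """Another implementation of the same API with a different message format."""

    def pack(self, rank: int, data: str) -> bytes:
        return json.dumps({"rank": rank, "data": data}).encode("utf-8")

    def unpack(self, frame: bytes) -> tuple[int, str]:
        obj = json.loads(frame.decode("utf-8"))
        return obj["rank"], obj["data"]


def round_trip(sender: MessagePassing, receiver: MessagePassing) -> tuple[int, str]:
    # Application code is identical whichever SDK is linked in (portability).
    return receiver.unpack(sender.pack(3, "ping"))
```

`round_trip` works unchanged with either SDK as long as both ends use the same one. Mixing them either raises an error or returns garbage, because the API fixes the calls but says nothing about the bytes exchanged.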
Back to Grids: The Programming & Systems Problems • Approaches to problem solving • Data Grids, distributed computing, peer-to-peer, collaboration grids, … • Structuring and writing programs • Abstractions, tools • Enabling resource sharing across distinct institutions • Resource discovery, access, reservation, allocation; authentication, authorization, policy; communication; fault detection and notification; …
[Slide annotations: “Programming Problem”, “Systems Problem”]
Aspects of the Programming Problem • Need for abstractions and models to add to speed/robustness/etc. of development • E.g., OO abstractions, MPI for messaging • Need for code/tool sharing to allow reuse of code components developed by others • E.g., MPI allows reuse of message passing • E.g., standard profilers, debuggers • Primary need is for standard programming environments: APIs and SDKs
Aspects of the Systems Problem • Need for interoperability when different groups want to share resources • Diverse components, policies, mechanisms • E.g., standard notions of identity, means of communication, resource descriptions • Need for shared infrastructure services to avoid repeated development, installation • E.g., one port/service for remote access to computing, not one per tool/application • E.g., Certificate Authorities: expensive to run • Need standard protocols, services, syntax
I.e., Standard APIs and Protocols are Both Important: For Different Reasons • Standard APIs/SDKs are important • They enable application portability • But w/o standard protocols, interoperability is hard (every SDK speaks every protocol?) • Standard protocols are important • Enable cross-site interoperability • Enable shared infrastructure • But w/o standard APIs/SDKs, application portability is hard (different platforms access protocols in different ways)
Grid “Architecture” • We now proceed to analyze Grid systems with respect to standards • Identify key areas where protocols, services, APIs, and SDKs can occur • Result is a layered protocol architecture • We assert this can be useful as a means of describing and structuring Grid Forum activities
Layered Grid Architecture (By Analogy to Internet Architecture) • Application • User: “Specialized services”: user- or appln-specific distributed services • Collective: “Managing multiple resources”: ubiquitous infrastructure services • Resource: “Sharing single resources”: negotiating access, controlling use • Connectivity: “Talking to things”: communication (Internet protocols) & security • Fabric: “Controlling things locally”: access to, & control of, resources
[Figure: the Grid layers drawn beside the Internet Protocol Architecture layers (Application, Transport, Internet, Link)]
Protocols, Services, and Interfaces Occur at Each Level • Applications • Languages/Frameworks • User: User Service APIs and SDKs; User Service Protocols; User Services • Collective: Collective Service APIs and SDKs; Collective Service Protocols; Collective Services • Resource: Resource APIs and SDKs; Resource Service Protocols; Resource Services • Connectivity: Connectivity APIs; Connectivity Protocols • Fabric Layer: Local Access APIs and Protocols
An Aside on Terminology • Is this an “architecture” or just a “categorization” or “taxonomy”? • A matter of opinion (cf. IAB: “Many members of the Internet community would argue that there is no architecture”) • Our opinion: it is somewhere in between, but is useful regardless • Becomes more architectural if/as we define “necessary” pieces at each level • Note that the protocol architecture says nothing about the SDK/API architecture (& vice versa)
Important Points • We build on Internet protocols • Communication, routing, name resolution, etc. • “Layering” here is conceptual, does not imply constraints on who can call what • Protocols/services/APIs/SDKs will, ideally, be largely self-contained • But some things are fundamental: e.g., communication and security • But, advantageous for higher-level functions to use common lower-level functions
Example: User Portal • Appln: Web portal • User: Source code discovery, application configuration • Collective: Brokering, co-allocation, certificate authorities • Resource: Access to data, access to computers, access to network performance data • Connect: Communication, service discovery (DNS), authentication, authorization, delegation • Fabric: Storage systems, schedulers
[Figure: one API/SDK uses a Lookup Protocol to reach a Source Code Repository; another API/SDK uses an Access Protocol to reach a Compute Resource]
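One way to make "where in the stack does this piece fit" a concrete question is to write the portal example down as data. The sketch below copies the layer assignments from the slide; the numeric ordering of `Layer` and the helper names `layer_of` and `at_or_below` are assumptions added for illustration, and the ordering is not meant to constrain who may call what.

```python
from enum import IntEnum


class Layer(IntEnum):
    # Bottom-up order; the numbers only support sorting and comparison.
    FABRIC = 1
    CONNECTIVITY = 2
    RESOURCE = 3
    COLLECTIVE = 4
    USER = 5
    APPLICATION = 6


# Functions named in the user portal example, keyed by the layer they appear in.
USER_PORTAL: dict[Layer, list[str]] = {
    Layer.APPLICATION: ["web portal"],
    Layer.USER: ["source code discovery", "application configuration"],
    Layer.COLLECTIVE: ["brokering", "co-allocation", "certificate authorities"],
    Layer.RESOURCE: [
        "access to data",
        "access to computers",
        "access to network performance data",
    ],
    Layer.CONNECTIVITY: [
        "communication",
        "service discovery (DNS)",
        "authentication",
        "authorization",
        "delegation",
    ],
    Layer.FABRIC: ["storage systems", "schedulers"],
}


def layer_of(function: str, system: dict[Layer, list[str]]) -> Layer:
    """Return the layer in which a named function is placed."""
    for layer, functions in system.items():
        if function in functions:
            return layer
    raise KeyError(function)


def at_or_below(system: dict[Layer, list[str]], layer: Layer) -> list[str]:
    """List the functions at or below a layer, bottom-up."""
    return [f for lyr in sorted(system) if lyr <= layer for f in system[lyr]]
```

Writing a second system in the same form makes it easy to see which lower-layer functions two applications have in common.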
Example: High-Throughput Computing System • Appln: High Throughput Computing System • User: Dynamic checkpoint, job management, failover, staging • Collective: Brokering, certificate authorities • Resource: Access to data, access to computers, access to network performance data • Connect: Communication, service discovery (DNS), authentication, authorization, delegation • Fabric: Storage systems, schedulers
[Figure: one API/SDK uses a C-point Protocol to reach a Checkpoint Repository; another API/SDK uses an Access Protocol to reach a Compute Resource]
Standards, Again: Intergrid Protocols and Grid APIs • One or many protocols? • No one “right” protocol for any one function • But: interoperability requires that we define and commit to core “Intergrid” protocols • Definition: “A resource is Grid-enabled if it speaks Intergrid protocols” • One or many APIs and SDKs? • Many APIs, SDKs, programming models can target Intergrid protocols • But: code sharing requires standards • So, e.g., “standard Grid collaboration APIs”
Questions for the Grid Forum • Is the “Grid architecture” described here a useful framework? • Could it be made more useful? • Are there things that it fails to capture or misrepresents? • Would it be a useful discipline for us to try to place GF efforts in this context? • E.g., be clear whether we are defining a protocol, service, API, SDK, syntax (or something else: which is fine, too) • E.g., explain (and argue about) where in the stack different pieces fit
Questions for the Grid Forum • Are some things easier, or more important, to standardize than others? • Protocols vs. APIs vs. syntax • Connectivity vs. resource vs. collective vs. user layer protocols/services/APIs/SDKs • I would suggest that • Items lower in the stack tend to have broader impact, but standards are useful at all levels • Size of community affected (e.g., number of adopters) is the key figure of merit • We should ask explicitly for such an analysis as part of a WG charter
Questions for the Grid Forum • Can we define core “intergrid protocols”? • I.e., instantiate (lower) layers in the diagram • We have avoided it until now (implies choice) • Until we do, interoperability is difficult • Possible approaches • Avoid seeking consensus, instead standardize where it makes sense and where we can; rely on sense of “best practice” emerging • Or, create an architecture WG, charged with defining requirements for “core protocols”? • I think the latter is better, but am unsure if it can work
Summary • Grids are about [large-scale] sharing • Hence require standard protocols to enable interoperability and shared infrastructure • And, of course, standard APIs and SDKs to enable portability & code sharing • Both important, but very different • Well-defined architecture can help understanding & progress • Provides a framework for figuring out where the pieces fit • Facilitates asking questions such as “where are standards particularly important?”