GRIDS Center Overview John McGee, USC/ISI NSF Middleware Initiative June 26, 2002 Internet2 – Base CAMP Boulder, Colorado
GRIDS Center Grid Research Integration Development & Support http://www.grids-center.org USC/ISI - Chicago - NCSA - SDSC - Wisconsin
Agenda • Vision for Grid Technology • GRIDS Center Operations • Software Components • Packaging and Testing • Documentation and Support • Testbed • Globus Security and Resource Discovery • Campus Enterprise Integration
Enabling Seamless Collaboration • Grids help distributed communities pursue common goals • Scientific research • Engineering design • Education • Artistic creation • Focus is on the enabling mechanisms required for collaboration • Resource sharing as a fundamental concept
Grid Computing Rationale • The need for flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources See “The Anatomy of the Grid: Enabling Scalable Virtual Organizations” by Foster, Kesselman, Tuecke at http://www.globus.org (in the “Publications” section) • The need for communities (“virtual organizations”) to share geographically distributed resources as they pursue common goals while assuming the absence of: • central location • central control • omniscience • existing trust relationships
Elements of Grid Computing • Resource sharing • Computers, storage, sensors, networks • Sharing is always conditional, based on issues of trust, policy, negotiation, payment, etc. • Coordinated problem solving • Beyond client-server: distributed data analysis, computation, collaboration, etc. • Dynamic, multi-institutional virtual organizations • Community overlays on classic org structures • Large or small, static or dynamic
Resource-Sharing Mechanisms • Should address security and policy concerns of resource owners and users • Should be flexible and interoperable enough to deal with many resource types and sharing modes • Should scale to large numbers of resources, participants, and/or program components • Should operate efficiently when dealing with large amounts of data & computational power
Grid Applications • Science portals • Help scientists overcome steep learning curves of installing and using new software • Solve advanced problems by invoking sophisticated packages remotely from Web browsers or thin clients • Portals are currently being developed in biology, fusion, computational chemistry, and other disciplines • Distributed computing • High-speed workstations and networks can yoke together an organization's PCs to form a substantial computational resource
Mathematicians Solve NUG30 • Looking for the solution to the NUG30 quadratic assignment problem • An informal collaboration of mathematicians and computer scientists • Condor-G delivered 3.46E8 CPU seconds in 7 days (peak 1009 processors) across 8 sites in the U.S. and Italy • Solution: 14, 5, 28, 24, 1, 3, 16, 15, 10, 9, 21, 2, 4, 29, 25, 22, 13, 26, 17, 30, 6, 20, 19, 8, 18, 7, 27, 12, 11, 23 • MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
More Grid Applications • Large-scale data analysis • Science increasingly relies on large datasets that benefit from distributed computing and storage • Computer-in-the-loop instrumentation • Data from telescopes, synchrotrons, and electron microscopes are traditionally archived for batch processing • Grids are permitting quasi-real-time analysis that enhances the instruments’ capabilities • E.g., with sophisticated “on-demand” software, astronomers may be able to use automated detection techniques to zoom in on solar flares as they occur
Data Grids for High Energy Physics [Tiered data-grid diagram: the online system at CERN (Tier 0) reduces a ~PByte/s detector stream (one "bunch crossing" every 25 ns, ~100 triggers/s, ~1 MByte per triggered event) to ~100 MByte/s into the CERN Computer Centre and an offline processor farm (~20 TIPS); Tier 1 regional centres (FermiLab ~4 TIPS, France, Germany, Italy) connect at ~622 Mbit/s (or air freight, deprecated); Tier 2 centres (~1 TIPS each, e.g. Caltech) feed institute servers (~0.25 TIPS) and physics data caches (~1 MByte/s); Tier 4 is physicist workstations (Pentium II 300 MHz class). 1 TIPS is approximately 25,000 SpecInt95 equivalents. Physicists work on analysis "channels"; each institute will have ~10 physicists working on one or more channels, and data for these channels should be cached by the institute server.]
Still More Grid Applications • Collaborative work • Researchers often want to aggregate not only data and computing power, but also human expertise • Grids enable collaborative problem formulation and data analysis • E.g., an astrophysicist who has performed a large, multi-terabyte simulation could let colleagues around the world simultaneously visualize the results, permitting real-time group discussion • E.g., civil engineers collaborate to design, execute, & analyze shake table experiments
iVDGL: International Virtual Data Grid Laboratory [World map of planned facilities, with a legend distinguishing Tier 0/1, Tier 2, and Tier 3 facilities and 10 Gbps, 2.5 Gbps, 622 Mbps, and other links.] U.S. PIs: Avery, Foster, Gardner, Newman, Szalay www.ivdgl.org
The 13.6 TF TeraGrid: Computing at 40 Gb/s [Network diagram: four sites, NCSA/PACI (8 TF, 240 TB), SDSC (4.1 TF, 225 TB), Caltech, and Argonne, each with site resources and HPSS or UniTree archival storage, interconnected through external networks.] TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne www.teragrid.org
Portal Example • NPACI HotPage • https://hotpage.npaci.edu/
General Approach • Define Grid protocols & APIs • Protocol-mediated access to remote resources • Integrate and extend existing standards • “On the Grid” = speak “Intergrid” protocols • Develop a reference implementation • Open source Globus Toolkit • Client and server SDKs, services, tools, etc. • Grid-enable wide variety of tools • Globus Toolkit, FTP, SSH, Condor, SRB, MPI, … • Learn through deployment and applications
Software Components • GRIDS Center software is a collection of packages developed in the academic research community • Protocol and architecture approach • Reference implementations • Each package has at least two production-level implementations before inclusion in the GRIDS Center software suite
The Hourglass Model • Focus on architecture issues • Propose set of core services as basic infrastructure • Use to construct high-level, domain-specific solutions • Design principles • Keep participation cost low • Enable local control • Support for adaptation • "IP hourglass" model [Hourglass diagram: Applications at the top, then diverse global services, a narrow waist of core services, and the local OS at the base.]
Software Components • Globus Toolkit • Core Grid computing toolkit • Condor-G • Advanced job submission and management infrastructure • Network Weather Service • Network capability prediction • KX.509 / KCA (NMI-EDIT) • Kerberos to PKI
The Globus Toolkit™ • The de facto standard for Grid computing • A modular "bag of technologies" addressing key technical problems facing Grid tools, services, and applications • Made available under a liberal open-source license • Simplifies collaboration across virtual organizations • Authentication • Grid Security Infrastructure (GSI) • Scheduling • Globus Resource Allocation Manager (GRAM) • Dynamically Updated Request Online Coallocator (DUROC) • Resource description • Monitoring and Discovery Service (MDS) • File transfer • Global Access to Secondary Storage (GASS) • GridFTP
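As a concrete illustration, the sketch below drives two of these components from Python via the GT2 command-line tools: grid-proxy-init (GSI) to create a proxy credential, and globus-job-run (GRAM) to run a command on a remote resource. This is a minimal sketch assuming the Globus Toolkit 2 client tools are installed and on the PATH; the gatekeeper contact "grid.example.edu/jobmanager" is a placeholder.

```python
# Minimal sketch: driving Globus Toolkit 2 client tools from Python.
# Assumes grid-proxy-init and globus-job-run (GT2 CLI tools) are on
# PATH; "grid.example.edu/jobmanager" is a placeholder GRAM contact.
import subprocess

# 1. Create a short-lived GSI proxy credential (prompts for the
#    private-key passphrase; -hours limits the proxy lifetime).
subprocess.run(["grid-proxy-init", "-hours", "12"], check=True)

# 2. Submit a simple job through GRAM and print its stdout.
result = subprocess.run(
    ["globus-job-run", "grid.example.edu/jobmanager", "/bin/hostname"],
    check=True, capture_output=True, text=True)
print(result.stdout)
```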
Condor-G • NMI-R1 will include Condor-G, an enhanced version of the core Condor software, optimized to work with the Globus Toolkit™ for managing Grid jobs
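A sketch of what submitting such a Grid job looks like: a Condor-G submit description names a GRAM gatekeeper and is handed to condor_submit. The attribute names follow Condor-G documentation of this era (the "globus" universe); the host and paths are placeholders, so treat the details as illustrative rather than authoritative.

```python
# Minimal sketch: submitting a Grid job through Condor-G.
# Assumes condor_submit is installed; the gatekeeper contact
# "grid.example.edu/jobmanager" is a placeholder. Attribute names
# ("globus" universe, globusscheduler) follow era Condor-G docs.
import subprocess
import textwrap

submit_description = textwrap.dedent("""\
    universe        = globus
    globusscheduler = grid.example.edu/jobmanager
    executable      = /bin/hostname
    output          = job.out
    error           = job.err
    log             = job.log
    queue
""")

with open("grid_job.sub", "w") as f:
    f.write(submit_description)

# Hand the description to Condor-G; it manages the GRAM submission,
# credential handling, and fault recovery from then on.
subprocess.run(["condor_submit", "grid_job.sub"], check=True)
```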
Network Weather Service • From UC Santa Barbara, NWS monitors and dynamically forecasts performance of network and computational resources • Uses a distributed set of performance sensors (network monitors, CPU monitors, etc.) for instantaneous readings • Numerical models’ ability to predict conditions is analogous to weather forecasting – hence the name • For use with the Globus Toolkit and Condor, allowing dynamic schedulers to provide statistical Quality-of-Service readings • NWS forecasts end-to-end TCP/IP performance (bandwidth and latency), available CPU percentage and available non-paged memory • NWS automatically identifies the best forecasting technique for any given resource
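The last point, picking the best forecasting technique per resource, can be illustrated with a toy predictor (not NWS code): score several simple forecasters on their one-step-ahead historical error and report the one that has tracked the measurement series most closely.

```python
# Toy illustration (not NWS code): choose the forecaster with the
# lowest cumulative absolute error on a measurement series, the way
# NWS picks its best technique per resource.
from statistics import mean, median

FORECASTERS = {
    "last_value":     lambda hist: hist[-1],
    "running_mean":   lambda hist: mean(hist),
    "running_median": lambda hist: median(hist),
}

def best_forecast(history):
    """Return (name, prediction) of the forecaster that would have
    tracked `history` most closely, one step ahead."""
    errors = {name: 0.0 for name in FORECASTERS}
    for t in range(1, len(history)):
        for name, f in FORECASTERS.items():
            errors[name] += abs(f(history[:t]) - history[t])
    name = min(errors, key=errors.get)
    return name, FORECASTERS[name](history)

# Example: noisy bandwidth readings in Mbit/s.
readings = [92.1, 88.4, 95.0, 90.2, 89.7, 93.5]
print(best_forecast(readings))
```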
KX.509 for Converting Kerberos Credentials to PKI • Stand-alone client program from the University of Michigan • For a Kerberos-authenticated user, KX.509 acquires a short-term X.509 certificate that can be used by PKI applications • Stores the certificate in the local user's Kerberos ticket file • Systems that already have a mechanism for removing unused Kerberos credentials may also automatically remove the X.509 credentials • A Web browser may then load a library (PKCS#11) to use these credentials for HTTPS • The client reads the X.509 credentials from the user's Kerberos cache and converts them to PEM, the format used by the Globus Toolkit
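The essential idea, a short-term certificate derived from an existing login, can be sketched with the pyca/cryptography library (an assumption for this context; it is not part of KX.509). The sketch issues a 12-hour self-signed X.509 certificate to show the short lifetime; real KX.509 instead has a Kerberized CA (KCA) sign the certificate after verifying the user's Kerberos ticket.

```python
# Sketch of the "short-term certificate" idea behind KX.509 (not the
# actual protocol): issue an X.509 certificate valid for only 12 hours.
# Uses the pyca/cryptography library; self-signed here for brevity,
# whereas KX.509 has a Kerberized CA (KCA) sign the certificate.
import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa

key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "jdoe")])

now = datetime.datetime.now(datetime.timezone.utc)
cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)                # the KCA's name in real KX.509
    .public_key(key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(hours=12))  # short-lived
    .sign(key, hashes.SHA256())
)
print(cert.public_bytes(serialization.Encoding.PEM).decode())
```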
GRIDS Software Packaging • GRIDS Center software uses Grid Packaging Technology (GPT) 2.0 • Perl-based tool eases user installation and setup • GPT 2.0 enables creation of RPMs • Lets users install from binaries with familiar packaging • Includes database of all packages, useful for verifying installations • Packaging enables: • Dependency checking • User customization of configuration • Easy upgrades, patches
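Dependency checking of the kind GPT performs can be illustrated with a toy resolver (not GPT code): given each package's declared dependencies, a topological sort yields a safe install order and flags circular dependencies. The package names below are hypothetical.

```python
# Toy illustration of package dependency checking (not GPT code):
# compute an install order from declared dependencies.
from graphlib import TopologicalSorter, CycleError  # Python 3.9+

# Hypothetical package -> dependencies map.
deps = {
    "globus-gram": {"globus-gsi", "globus-io"},
    "globus-io":   {"globus-gsi"},
    "globus-gsi":  set(),
    "condor-g":    {"globus-gram"},
}

try:
    order = list(TopologicalSorter(deps).static_order())
    print("install order:", order)  # dependencies before dependents
except CycleError as err:
    print("broken package set, circular dependency:", err)
```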
Software Testing • University of Wisconsin is in charge of testing the GRIDS software for NMI releases • Platforms to date: • Red Hat 7.2 on IA-32 • Solaris 8.0 on SPARC • Release 2 additions: • Red Hat 7.2 on IA-64 • AIX-L • Testing includes: • Builds • Quality assurance • Interoperability of GRIDS components
Technical Support • First-level tech support handled at NCSA • One-stop-shop address for users: • nmi-support@nsf-middleware.org • All queries go to NCSA, which responds within 24 hours • Help requests that NCSA can't answer are forwarded to the people responsible for each component: • Globus Toolkit (U. of Chicago/Argonne/ISI) • Condor-G (U. of Wisconsin) • Network Weather Service (UC Santa Barbara) • KX.509 (Michigan) • PubCookie (U. Washington) • CPM
Integration Issues • NMI testbed sites will be early adopters, seeking integration of campus infrastructure and Grid computing • Via NMI partnerships, GRIDS will help identify points of intersection and divergence between Grid and enterprise computing • Authorization, authentication and security • Directory services • Emphasis is on open standards and architectures as the route to successful collaboration
Grid Security Infrastructure (GSI) • Globus Toolkit implements GSI protocols and APIs to address Grid security needs • GSI protocols extend standard public-key protocols • Standards: X.509 & SSL/TLS • Extensions: X.509 Proxy Certificates & Delegation • GSI extends the standard GSS-API
Generic Security Service API • The GSS-API is the IETF draft standard for adding authentication, delegation, message integrity, and message protection to apps • For secure communication between two parties over a reliable channel (e.g. TCP) • GSS-API separates security from communication, which allows security to be easily added to existing communication code. • Effectively placing transformation filters on each end of the communication link • Globus Toolkit components all use GSS-API
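The "transformation filters" idea can be shown with a toy stand-in (not the GSS-API itself): a wrap step adds an integrity tag before a message goes on the wire, an unwrap step verifies it on the other side, and the transport in between needs no changes.

```python
# Toy stand-in for GSS-API wrap/unwrap (not the real API): add a
# message-integrity tag on one end of the channel, verify it on the
# other. The communication code in between is untouched.
import hashlib
import hmac

SHARED_KEY = b"session-key-from-authentication"  # placeholder

def wrap(message: bytes) -> bytes:
    """Sender-side filter: prepend an HMAC integrity tag."""
    tag = hmac.new(SHARED_KEY, message, hashlib.sha256).digest()
    return tag + message

def unwrap(token: bytes) -> bytes:
    """Receiver-side filter: verify the tag, return the message."""
    tag, message = token[:32], token[32:]
    expected = hmac.new(SHARED_KEY, message, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message integrity check failed")
    return message

token = wrap(b"submit job 42")   # before send()
print(unwrap(token))             # after recv()
```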
Delegation • Delegation = remote creation of a (second-level) proxy credential • New key pair generated remotely on server • Proxy cert and public key sent to client • Client signs proxy cert and returns it • Server (usually) puts proxy in /tmp • Allows remote process to authenticate on behalf of the user • Remote process "impersonates" the user
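The exchange above can be sketched with the pyca/cryptography library (a sketch of the message flow, not the actual GSI wire protocol): the remote side generates a key pair and a signing request carrying the new public key, and the client signs a short-lived proxy certificate with its own key. Names and lifetimes are placeholders.

```python
# Sketch of the delegation message flow (not the GSI wire protocol):
# the remote side makes a key pair plus request, the client signs a
# short-lived second-level proxy certificate. Uses pyca/cryptography.
import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa

def make_name(cn):
    return x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, cn)])

# Client's existing proxy key (placeholder, generated locally here).
client_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# Steps 1-2: server generates a new key pair and sends a request
# carrying the new public key (a CSR stands in for that message).
server_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
csr = (x509.CertificateSigningRequestBuilder()
       .subject_name(make_name("jdoe/proxy"))
       .sign(server_key, hashes.SHA256()))

# Step 3: client signs a short-lived proxy cert and returns it.
now = datetime.datetime.now(datetime.timezone.utc)
proxy_cert = (x509.CertificateBuilder()
              .subject_name(csr.subject)
              .issuer_name(make_name("jdoe"))  # the delegating identity
              .public_key(csr.public_key())
              .serial_number(x509.random_serial_number())
              .not_valid_before(now)
              .not_valid_after(now + datetime.timedelta(hours=12))
              .sign(client_key, hashes.SHA256()))

# Step 4: the server now holds (server_key, proxy_cert) and can act
# on the user's behalf; the user's private key never left the client.
print(proxy_cert.subject)
```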
Limited Proxy • During delegation, the client can elect to delegate only a “limited proxy”, rather than a “full” proxy • GRAM (job submission) client does this • Each service decides whether it will allow authentication with a limited proxy • Job manager service requires a full proxy • GridFTP server allows either full or limited proxy to be used
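That per-service decision can be written down as a tiny policy check (an illustrative sketch, not Globus code): each service declares whether limited proxies are acceptable, and authentication consults that policy.

```python
# Toy sketch (not Globus code) of the per-service proxy policy:
# the job manager insists on a full proxy, GridFTP accepts either.
ACCEPTS_LIMITED = {
    "job-manager": False,  # full proxy required
    "gridftp":     True,   # full or limited proxy allowed
}

def authorize(service: str, proxy_type: str) -> bool:
    """proxy_type is 'full' or 'limited'."""
    if proxy_type == "full":
        return True
    return ACCEPTS_LIMITED.get(service, False)  # default: reject limited

assert authorize("gridftp", "limited")
assert not authorize("job-manager", "limited")
```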
Sample Gridmap File • Gridmap file maintained by the Globus administrator • Each entry maps a Grid identity (distinguished name) to local user name(s):
# Distinguished name                                Local username
"/C=US/O=Globus/O=NPACI/OU=SDSC/CN=Rich Gallup"     rpg
"/C=US/O=Globus/O=NPACI/OU=SDSC/CN=Richard Frost"   frost
"/C=US/O=Globus/O=USC/OU=ISI/CN=Carl Kesselman"     u14543
"/C=US/O=Globus/O=ANL/OU=MCS/CN=Ian Foster"         itf
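A sketch of how such a file is consumed (illustrative, not the Globus implementation): shlex handles the quoted distinguished name, and the remaining tokens are the local account names.

```python
# Illustrative gridmap-file parser (not the Globus implementation):
# map each quoted distinguished name to its local username(s).
import shlex

def parse_gridmap(path):
    mapping = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            dn, *users = shlex.split(line)
            mapping[dn] = users
    return mapping

# Example: parse_gridmap("/etc/grid-security/grid-mapfile")
# -> {"/C=US/O=Globus/O=ANL/OU=MCS/CN=Ian Foster": ["itf"], ...}
```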
Security Issues • GSI handles authentication, but authorization is a separate issue. • Management of authorization on a multi-organization grid is still an interesting problem. • The grid-mapfile doesn’t scale well, and works only at the resource level, not the collective level. • Data access exacerbates authorization issues, which has led us to CAS…