Grid Technologies Research and Development: Ian Foster, Argonne National Laboratory and The University of Chicago
Credits • Globus project Co-PI: Carl Kesselman, USC • Globus researchers and developers at ANL, USC/ISI, NCSA, and elsewhere • Steve Tuecke, Randy Butler, Steve Fitzgerald, Brian Toonen, Gregor von Laszewski, and many others • Research supported by DARPA, DOE, NSF, NASA; equipment from Cisco Systems
Grid Services Architecture: An Emerging Grid Computing Framework (layered figure) • Applications: a rich variety of applications ... • Application toolkits: remote data toolkit, async. collab. toolkit, remote sensors toolkit, remote comp. toolkit, remote viz toolkit, ... • Grid services: protocols, authentication, policy, resource management, instrumentation, discovery, etc. • Grid fabric: archives, networks, computers, display devices, etc.; associated local services
Overview • Why Grid Services? • Review of existing Grid services • Security • Information/directory • Resource management • Data access • Our current research focus areas • Grid Forum and β-Grid project
Creating a Usable Grid: Grid Services (“Middleware”) • Standard grid services that • Provide uniform, high-level access to a wide range of resources (including networks) • Address interdomain issues of security, policy, etc. • Permit application-level management and monitoring of end-to-end performance • Middleware-level and higher-level APIs and tools targeted at application programmers • Map between application and Grid
The Challenge of Heterogeneity • Group • Institutions, people; policies • Resources • Hardware: computers, archives, networks, ... • Interface • Software, mechanisms • Distance • Local, campus, metropolitan, wide area • Scale • Single CPU, cluster, supercomputer, ...
Grid Services Approach • Define and deploy standard Grid services that encapsulate heterogeneity • Simple: Cost of joining Grid is low • Noncoercive: Sites retain local control • Uniform: Cost of using Grid is low • Use a Grid information service to represent structure and status of Grid elements • Resource discovery • Application configuration and optimization • Build Grid-enabled tools to enable applications
Grid Services • Security: authentication, authorization • Information: publication, delivery • Resource management: reservation, allocation, monitoring, control • Data: data access, replica management, metadata access • + fault detection, executable management, accounting, others
Grid Services (1): Grid Security Infrastructure • Define uniform authentication and authorization mechanisms that allow cooperating sites to accept credentials while retaining local control • Benefit: Only one A/A infrastructure needs to be maintained at each site; enables inter-site resource sharing & interoperability • Requires • Authentication/authorization standards • Certification authority policies
Authentication • “Grid Security Infrastructure” • Single sign-on via global credential, PKI mechanisms, mapping to local credentials • Delegation • No plaintext passwords • Retains local control over policy • Deployed across PACI and NASA sites • GSS-API binding, used by ssh, SecureCRT, gsiftp, Globus, Condor, others • GAA (Generic Authorization & Access Control) interface provides hooks for policy
Security Architecture (figure) • Protocol 1: user proxy creation (user host → user proxy) • Protocol 2: resource allocation (user proxy → resource proxy at each site) • Protocol 3: resource allocation from a process • At each site (Site 1, Site 2): a resource proxy with credential Crp, a global-to-local mapping table, processes with credential Cp, and local policy and mechanisms
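The global-to-local mapping table that each site consults can be sketched as a lookup from a global certificate subject to a local account. The file format below (a quoted subject followed by an account name) is loosely modeled on the Globus grid-mapfile convention but is an assumption here, and all identities and account names are hypothetical. A missing entry means the site grants no access, which is how local control is retained.

```python
import shlex

def parse_mapfile(text: str) -> dict:
    """Parse lines of the assumed form: "<certificate subject>" local_account."""
    table = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # keeps a quoted subject with spaces together
        if len(parts) != 2:
            raise ValueError(f"malformed mapping line: {raw!r}")
        subject, account = parts
        table[subject] = account
    return table

def map_to_local(table: dict, subject: str):
    """Return the local account for a global identity, or None (no mapping, no access)."""
    return table.get(subject)

# Hypothetical identities and accounts, for illustration only.
MAPFILE = '''
# global identity -> local account
"/O=Grid/O=Globus/OU=anl.gov/CN=Jane Doe" jdoe
"/O=Grid/O=Globus/OU=isi.edu/CN=Bob Smith" bsmith
'''

table = parse_mapfile(MAPFILE)
print(map_to_local(table, "/O=Grid/O=Globus/OU=anl.gov/CN=Jane Doe"))  # jdoe
print(map_to_local(table, "/O=Grid/CN=Unknown User"))  # None
```

Keeping the table per site means each site decides independently which global identities it honors.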
Grid Services (2): Grid Information Service • Effective resource use is predicated on knowledge of system components • Publish structure and state info, dynamic performance info, software info, etc. • Selection and scheduling of resources • Resource discovery: “find me an X with property Y available at time T” • Auto-configuration: “tell me what I need to know to use A efficiently/securely/...” • Gateways to other data sources required
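A resource discovery query of the form “find me an X with property Y available at time T” can be sketched as a filter over published resource records. The record layout (a kind, a property dictionary, and a list of free time windows in hours) and the resource names are hypothetical; a real information service would obtain these records through its query protocol rather than from an in-memory list.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str
    kind: str                                          # the "X", e.g. "computer"
    properties: dict = field(default_factory=dict)     # the "Y" is tested against these
    free_windows: list = field(default_factory=list)   # (start, end) hours, half-open

def discover(resources, kind, predicate, at_time):
    """Names of resources of `kind` satisfying `predicate` and free at `at_time` (the "T")."""
    return [
        r.name
        for r in resources
        if r.kind == kind
        and predicate(r.properties)
        and any(start <= at_time < end for start, end in r.free_windows)
    ]

# Hypothetical published records.
pool = [
    Resource("origin-a", "computer", {"cpus": 128}, [(20, 24)]),
    Resource("sp-b", "computer", {"cpus": 64}, [(8, 18)]),
    Resource("hpss-c", "archive", {"tb": 10}, [(0, 24)]),
]
print(discover(pool, "computer", lambda p: p["cpus"] >= 64, at_time=21))  # ['origin-a']
```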
Information Services: Technical Approaches • Infrastructure based on common protocols • LDAP as unifying communication protocol • Gateways to alternative information sources and organization • Research questions include • Unifying metadata representation • How to support a range of access modes • Scalability of collection and publication methods • Index methods and discovery
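Because LDAP serves as the unifying protocol, clients express queries as LDAP search filters. The sketch below builds an equality-only conjunction filter and escapes assertion values per RFC 4515 (`*`, `(`, `)`, `\` and NUL become `\2a`, `\28`, `\29`, `\5c`, `\00`). The attribute names in the usage line are hypothetical, not taken from the Globus directory schema, and values are treated literally, so `*` is never a wildcard here.

```python
def escape_value(value: str) -> str:
    """Escape an LDAP filter assertion value per RFC 4515, one character at a time."""
    out = []
    for ch in value:
        if ch in "\\*()\x00":
            out.append("\\%02x" % ord(ch))  # e.g. "*" -> "\2a"
        else:
            out.append(ch)
    return "".join(out)

def and_filter(**criteria) -> str:
    """Build an equality conjunction such as (&(objectclass=computer)(os=Linux))."""
    terms = "".join(f"({attr}={escape_value(str(val))})" for attr, val in criteria.items())
    return f"(&{terms})"

# Hypothetical attribute names.
print(and_filter(objectclass="computer", os="Linux"))
# (&(objectclass=computer)(os=Linux))
```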
Distributed Information Services (figure) • Root servers: replicated servers at mds.globus.org:389 • Referral server • Organization servers and index server(s) at sites including ISI, NCSA, U.Tenn, NASA, DOE, and NPACI • Information sources such as Remos, NWS, and SNMP
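A hierarchy of root, referral, and organization servers implies that a lookup may be redirected before it reaches the server holding an entry. The sketch shows generic referral chasing over an in-memory model: each server either holds the entry or refers the client onward based on a DN suffix. Server names, DNs, and attributes are hypothetical, and this is not claimed to be how MDS itself routes queries.

```python
def resolve(servers, start, dn, max_hops=8):
    """Follow referrals from server `start` until some server holds entry `dn`.

    servers maps a server name to (entries, referrals): entries maps
    DN -> attributes, referrals maps a DN suffix -> another server name.
    """
    current = start
    for _ in range(max_hops):
        entries, referrals = servers[current]
        if dn in entries:
            return current, entries[dn]
        # Plain string suffix match; a real directory compares parsed DN components.
        matches = [suffix for suffix in referrals if dn.endswith(suffix)]
        if not matches:
            raise KeyError(dn)
        current = referrals[max(matches, key=len)]  # most specific referral wins
    raise RuntimeError("referral limit exceeded")

# Hypothetical servers: a root that refers to two organization servers.
servers = {
    "root": ({}, {"o=NCSA, o=Grid": "ncsa", "o=ISI, o=Grid": "isi"}),
    "ncsa": ({"hn=origin, o=NCSA, o=Grid": {"cpus": 128}}, {}),
    "isi": ({}, {}),
}
print(resolve(servers, "root", "hn=origin, o=NCSA, o=Grid"))  # ('ncsa', {'cpus': 128})
```

The hop limit guards against referral loops between misconfigured servers.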
Grid Services (3): Resource Management • Issues include: • Locating and selecting resources • Allocating resources • Authentication, process creation • Other activities required to prepare a resource for use; monitoring, control • End-to-end management/co-allocation • Diverse resources: CPU, disk, network • Reservation
Resource Management Services • Globus Resource Allocation Manager (GRAM) • Uniform interface to resource management • Integration with security, policy • Co-allocation services • Coordinated allocation across multiple resources • Globus Arch. for Reservation and Allocation • Network and CPU quality of service • Immediate and advance reservations • Resource brokers: e.g., Condor
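Co-allocation means a multi-resource request is useful only if every part is granted. One simple way to coordinate this is all-or-nothing allocation with rollback, sketched below. The `Allocatable` class and its capacities are hypothetical stand-ins for real resource managers, and the Globus co-allocation services are not claimed to use exactly this strategy.

```python
class Allocatable:
    """A hypothetical resource with a single divisible capacity."""

    def __init__(self, name, capacity):
        self.name, self.free = name, capacity

    def try_allocate(self, amount):
        if amount <= self.free:
            self.free -= amount
            return True
        return False

    def release(self, amount):
        self.free += amount

def co_allocate(requests):
    """Grant every (resource, amount) pair, or none of them."""
    granted = []
    for resource, amount in requests:
        if not resource.try_allocate(amount):
            # One piece failed: roll back everything granted so far.
            for r, a in reversed(granted):
                r.release(a)
            return False
        granted.append((resource, amount))
    return True

cpu = Allocatable("cluster", 64)   # processors
net = Allocatable("link", 100)     # Mb/s
print(co_allocate([(cpu, 50), (net, 20)]))   # True
print(co_allocate([(cpu, 10), (net, 200)]))  # False, and cpu is left untouched
print(cpu.free, net.free)                    # 14 80
```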
Resource Management Architecture (figure) • An application states an abstract request, e.g. “10 GFlops, EOS data, 20 Mb/sec -- for 20 mins” • A resource broker consults the info service (Metacomputing Directory Service) for location and selection: “What computers?” “What speed?” “When available?” • The broker issues concrete requests, e.g. “50 processors + storage from 10:20 to 10:40 pm” and “20 Mb/sec”, to the Globus Resource Allocation Managers (GRAM) • Each GRAM fronts a local resource manager: Fork, LSF, EASY-LL, Condor, etc.
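The broker's job of turning “10 GFlops for 20 mins” into “50 processors from 10:20 to 10:40 pm” can be sketched as a ceiling division plus a time window. The per-processor rate of 200 MFlops is an assumption chosen to reproduce the slide's numbers, and the machine names, free-CPU counts, and date are hypothetical; a real broker would take them from the information service.

```python
from datetime import datetime, timedelta

def ceil_div(a: int, b: int) -> int:
    return -(-a // b)  # integer ceiling, avoids floating-point rounding

def broker(mflops_needed, minutes, start, candidates):
    """Ground an abstract compute request onto one concrete machine.

    candidates: list of (machine_name, free_cpus, mflops_per_cpu).
    Picks the machine needing the fewest CPUs; returns None if none fits.
    """
    best = None
    for name, free_cpus, per_cpu in candidates:
        cpus = ceil_div(mflops_needed, per_cpu)
        if cpus <= free_cpus and (best is None or cpus < best[1]):
            best = (name, cpus)
    if best is None:
        return None
    name, cpus = best
    return {"machine": name, "cpus": cpus,
            "start": start, "end": start + timedelta(minutes=minutes)}

# Hypothetical machines; 10 GFlops = 10,000 MFlops, for 20 minutes.
machines = [("cluster-x", 32, 250), ("origin-y", 128, 200)]
plan = broker(10_000, 20, datetime(2000, 1, 1, 22, 20), machines)
print(plan["machine"], plan["cpus"], plan["start"].strftime("%H:%M"), plan["end"].strftime("%H:%M"))
# origin-y 50 22:20 22:40
```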
GRAM Architecture (figure) • GRAM client: MDS client API calls to locate resources; GRAM client API calls to request resource allocation and process creation • Across the site boundary, the Gatekeeper authenticates the request via the Globus Security Infrastructure and creates a Job Manager • The Job Manager parses the request with the RSL library, asks the local resource manager to allocate & create processes, and monitors & controls them • The GRAM Reporter queries the current status of the resource and updates MDS with resource state information
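Requests handed to the Job Manager are written in RSL, which the RSL library parses. The sketch assembles RSL strings of the familiar form `&(executable=/bin/hostname)(count=4)`, with `+` joining several sub-requests. The quoting rule (quote any value outside a conservative safe character set) and the refusal to handle embedded double quotes are simplifying assumptions of this sketch, not a full implementation of the RSL grammar.

```python
import re

_SAFE = re.compile(r"^[A-Za-z0-9_./-]+$")  # assumed safe to leave unquoted

def rsl_value(value) -> str:
    s = str(value)
    if _SAFE.match(s):
        return s
    if '"' in s:
        raise ValueError("embedded double quotes are not handled in this sketch")
    return f'"{s}"'

def rsl_relation(attr: str, value) -> str:
    """One (attribute=value ...) relation; lists become space-separated values."""
    values = value if isinstance(value, (list, tuple)) else [value]
    return f"({attr}=" + " ".join(rsl_value(v) for v in values) + ")"

def rsl_request(**attrs) -> str:
    """A single conjunctive request: &(a=b)(c=d)..."""
    return "&" + "".join(rsl_relation(k, v) for k, v in attrs.items())

def rsl_multirequest(*requests: str) -> str:
    """Several requests to be handled together: +(&...)(&...)"""
    return "+" + "".join(f"({r})" for r in requests)

job = rsl_request(executable="/bin/hostname", count=4, arguments=["-f", "two words"])
print(job)
# &(executable=/bin/hostname)(count=4)(arguments=-f "two words")
```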
Advanced Resource Management • Provide end-to-end Quality of Service to applications. This requires: • Discovery and selection of resources • Allocation of resources • Advance reservation of resources • (Figure: workstations and supercomputers)
GARA and Differentiated Services (figure): client and server using the GARA API, with two Diffserv resource managers
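Advance reservation requires admission control: a new reservation is accepted only if, throughout its time window, existing bookings plus the new request stay within capacity. The slot-table sketch below does this for bandwidth on a single link. Times are integer minutes and the capacity is hypothetical; GARA's actual admission logic is not claimed to match this.

```python
class BandwidthReservations:
    """Admission control for advance bandwidth reservations on one link (sketch)."""

    def __init__(self, capacity_mbps):
        self.capacity = capacity_mbps
        self.booked = []  # (start, end, mbps) with half-open windows [start, end)

    def load_at(self, t):
        return sum(m for s, e, m in self.booked if s <= t < e)

    def reserve(self, start, end, mbps):
        # Load only rises where a reservation starts, so the peak inside
        # [start, end) occurs at `start` or at an existing start within it.
        points = {start} | {s for s, _, _ in self.booked if start <= s < end}
        if all(self.load_at(t) + mbps <= self.capacity for t in points):
            self.booked.append((start, end, mbps))
            return True
        return False

link = BandwidthReservations(100)  # hypothetical 100 Mb/s link
print(link.reserve(0, 60, 70))     # True
print(link.reserve(30, 90, 40))    # False: 70 + 40 > 100 at t = 30
print(link.reserve(60, 90, 40))    # True: the first booking has ended
```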
Integrated Policy Management • Required to control reservation and scheduling • Determine who can do what to whom • Integral part of resource management • Resource → application, application → resource • Next step after authentication • Need to integrate with and augment existing approaches • Access control lists, capabilities, usage certificates
Policy: Technical Approaches • Single API to alternative mechanisms • Similar to security infrastructure • Integration with Globus security model and Globus resource management components • Basic policy mechanism in current system • Research questions • Reusable policy structures for resource specification/management • Policy aware resource discovery/scheduling
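“Who can do what to whom” maps naturally onto rules with three fields. The default-deny check below is an access-control-list style sketch with a `*` wildcard; it is not the GAA interface, and the identities, actions, and resource names are hypothetical.

```python
from typing import NamedTuple

class Rule(NamedTuple):
    subject: str  # who: a grid identity, or "*" for anyone
    action: str   # can do what: e.g. "reserve", "query"
    target: str   # to whom: a resource name, or "*"

def _match(pattern: str, value: str) -> bool:
    return pattern == "*" or pattern == value

def allowed(rules, subject, action, target) -> bool:
    """Default deny: permit only if some rule matches all three fields."""
    return any(
        _match(r.subject, subject) and _match(r.action, action) and _match(r.target, target)
        for r in rules
    )

# Hypothetical policy for one site.
rules = [
    Rule("/CN=Jane Doe", "reserve", "link-anl-isi"),
    Rule("*", "query", "*"),
]
print(allowed(rules, "/CN=Jane Doe", "reserve", "link-anl-isi"))  # True
print(allowed(rules, "/CN=Bob Smith", "reserve", "link-anl-isi")) # False
```

Placing the check after authentication matches the slide's ordering: the subject is an already-authenticated identity.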
Grid Services (4): Storage and I/O Services • Access to remote data (GASS) • Uniform access to diverse storage management systems • Cache management • High-speed, secure transport: gsiftp • Integration with metadata & storage systems • Communication (Nexus, GlobusIO) • Application-level interfaces to comm services • Multiple methods: reliable/unreliable, IP/other, unicast/multicast • Quality of service interfaces
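Uniform access to diverse storage systems plus cache management can be sketched as one `read(url)` call that dispatches on the URL scheme and caches results so repeated reads avoid another transfer. Only `file:` is wired to a real handler; `fake_gsiftp` is a hypothetical stand-in for a GridFTP client, and nothing here reflects the actual GASS API.

```python
from pathlib import Path
from urllib.parse import urlparse

class DataAccess:
    """Uniform read access keyed by URL scheme, with a simple local cache (sketch)."""

    def __init__(self):
        self.fetchers = {}  # scheme -> function(url) -> bytes
        self.cache = {}     # url -> bytes
        self.register("file", lambda url: Path(urlparse(url).path).read_bytes())

    def register(self, scheme, fetch):
        self.fetchers[scheme] = fetch

    def read(self, url):
        if url in self.cache:
            return self.cache[url]  # cache hit: no transfer
        scheme = urlparse(url).scheme
        if scheme not in self.fetchers:
            raise ValueError(f"no handler for scheme {scheme!r}")
        data = self.fetchers[scheme](url)
        self.cache[url] = data
        return data

calls = []

def fake_gsiftp(url):
    """Hypothetical remote fetcher that records each transfer."""
    calls.append(url)
    return b"remote bytes"

da = DataAccess()
da.register("gsiftp", fake_gsiftp)
da.read("gsiftp://host.example.org/data/run1")
da.read("gsiftp://host.example.org/data/run1")
print(len(calls))  # 1: the second read was served from the cache
```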
Current Technology Focus Areas • Advanced resource management techniques • GARA: Globus Arch. for Resv. & Allocation • High-end data-intensive applications • “Data Grid” • Interfaces to commodity technologies • CoG Kit: Commodity Grid Toolkits • Distance visualization • NOVA: Network Optimized Visualization Arch. • With supporting work on info/instr., policy, accounting, authentication/authorization, etc.
The Grid Forum (http://www.gridforum.org) • IETF-like community forum for discussion & definition of Grid infrastructure • First two meetings (June 16-18, Oct 18-20) attracted 150 people • 9 working groups established in security, information infrastructure, resource management, accounting, etc. • Next mtg: San Diego March 22-24 2000 • See also European Grid Forum • www.egrid.org
β-Grid (“Broadband Experimental Terascale Access”) • A proposal to NSF to plan (& build) a national infrastructure for computer systems research • dedicated to research • of a scale that permits realistic experimentation • of a scale that encourages participation by adventurous applications groups • a place for computer and application scientists to tackle problems together • Initial plan is for O(20) Linux clusters, each with O(30) nodes, O(2 TB) disk, Gb/s network • http://dsl.cs.uchicago.edu/beta
Summary: Where We Are • Solid technology base for security, resource management, information services • Globus v1.1 completed, with all core services complete, robust, and documented • Many tool projects are leveraging this considerable investment in infrastructure • Substantial deployment activities and application experiments • New R&D in commodity grids, resource management, distance viz, data grids http://www.globus.org
Case Study 1: Online Instrumentation (figure) • Advanced Photon Source (DOE X-ray source) • Real-time collection, tomographic reconstruction, archival storage • Wide-area dissemination to desktop & VR clients with shared controls • DOE X-ray grand challenge: ANL, USC/ISI, NIST, U.Chicago
Case Study 2: Distributed Supercomputing • Starting point: SF-Express parallel simulation code • Globus mechanisms for • Resource allocation • Distributed startup • I/O and configuration • Fault detection • 100K vehicles (2002 goal) using 13 computers, 1386 nodes, 9 sites • Machines shown include NCSA Origin, Caltech Exemplar, CEWES SP, Maui SP • SF-Express Distributed Interactive Simulation: Caltech, USC/ISI
OVERFLOW simulation: NASA Ames (figure) • OVERFLOW with latency-tolerant algorithms • MPICH-G “Grid-enabled” message passing • Globus services: security, directory, scheduling, process mgmt, communication • Running across ARC SGI O2000 (California) and Argonne SGI O2000 (Illinois)
Case Study 3: Collaborative Engineering • Manipulate shared virtual space, with • Simulation components • Multiple flows: Control, Text, Video, Audio, Database, Simulation, Tracking, Haptics, Rendering • Uses Globus comms: (un)reliable uni/multicast • Future: Security, QoS, allocation, reservation • CAVERNsoft: UIC Electronic Visualization Laboratory
Case Study 4: High-Throughput Computing • Schedule many independent tasks (e.g., parameter study) • Uses Globus security, discovery, data access, scheduling • Future: Reservation, accounting, code management, etc. • (Figure: deadline, cost, available machines) • Nimrod-G: Monash University
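Scheduling a parameter study against a deadline and a cost can be sketched as a greedy plan that fills the cheapest machines first and adds pricier ones only as needed to finish in time. This assumes equal-length independent tasks and a per-CPU-minute price; machine names and prices are hypothetical, and Nimrod-G's actual scheduling algorithms are not claimed to be this simple.

```python
def plan(machines, tasks, task_minutes, deadline_minutes):
    """Greedy cost-minimizing plan under a deadline.

    machines: list of (name, cpus, cost_per_cpu_minute).
    Returns (assignment {name: task_count}, total_cost), or None if the
    deadline cannot be met even using every machine.
    """
    slots_per_cpu = deadline_minutes // task_minutes  # tasks one CPU finishes in time
    remaining, assignment, cost = tasks, {}, 0
    for name, cpus, price in sorted(machines, key=lambda m: m[2]):  # cheapest first
        if remaining == 0:
            break
        take = min(remaining, cpus * slots_per_cpu)
        if take:
            assignment[name] = take
            cost += take * task_minutes * price
            remaining -= take
    return (assignment, cost) if remaining == 0 else None

# Hypothetical machines: (name, CPUs, cost units per CPU-minute).
machines = [("cluster-a", 10, 3), ("super-b", 50, 10), ("lab-c", 5, 1)]
print(plan(machines, tasks=100, task_minutes=30, deadline_minutes=60))
# ({'lab-c': 10, 'cluster-a': 20, 'super-b': 70}, 23100)
print(plan(machines, tasks=100, task_minutes=30, deadline_minutes=30))  # None
```

Tightening the deadline pushes work onto more expensive machines, which is the deadline/cost trade-off the case study highlights.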
Case Study 5: Problem Solving Environment • Problem solving environment for comp. chemistry • Globus services used for authentication, remote job submission, monitoring, and control • Future: distributed data archive, resource discovery, charging • ECCE’: Pacific Northwest National Laboratory