Grid Technology: Introduction & Overview
Ian Foster
Argonne National Laboratory / University of Chicago
LCG 13.3.2002
Including New Zealand!
Grid Technologies: Expanding the Horizons of HEP Computing
Enabling thousands of physicists to harness the resources of hundreds of institutions in pursuit of knowledge
The Grid Problem
Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
Some “Large” Grid Issues (H. Newman)
• Consistent transaction management
• Query (task completion time) estimation
• Queuing and co-scheduling strategies
• Load balancing (e.g., Self-Organizing Neural Network)
• Error recovery: fallback and redirection strategies
• Strategy for use of tapes
• Extraction, transport and caching of physicists’ object collections; Grid/database integration
• Policy-driven strategies for resource sharing among sites and activities; policy/capability tradeoffs
• Network performance and problem handling
• Monitoring and response to bottlenecks
• Configuration and use of new-technology networks, e.g., dynamic wavelength scheduling or switching
• Fault tolerance and performance of the Grid services architecture
How Large is “Large”?
• Is the LHC Grid
  • Just the O(10) Tier 0/1 sites and O(20,000) CPUs?
  • + the O(50) Tier 2 sites: O(40,000) CPUs?
  • + the collective computing power of O(300) LHC institutions: perhaps O(60,000) CPUs in total?
• Are the LHC Grid users
  • The experiments and their relatively few, well-structured “production” computing activities?
  • The curiosity-driven work of 1000s of physicists?
• Depending on our answer, the LHC Grid is
  • A relatively simple deployment of today’s technology
  • A significant information technology challenge
The Problem: Resource-Sharing Mechanisms That …
• Address security and policy concerns of resource owners and users
• Are flexible enough to deal with many resource types and sharing modalities
• Scale to large numbers of resources, many participants, many program components
• Operate efficiently when dealing with large amounts of data & computation
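The first bullet, resource owners and users retaining control over policy, can be sketched as a simple authorization check. This is a conceptual illustration only (the names and policy structure are invented, not GSI or the Community Authorization Service): an owner grants rights per virtual organization, and a request succeeds only if the requester's VO holds that right.

```python
from dataclasses import dataclass, field

@dataclass
class Resource:
    name: str
    # Owner-controlled policy: VO name -> set of operations that VO may perform.
    policy: dict = field(default_factory=dict)

def authorized(resource, user_vo, operation):
    """True only if the user's VO holds the requested right on this resource."""
    return operation in resource.policy.get(user_vo, set())

# Illustrative policy: ATLAS may run and monitor; CMS may only monitor.
cluster = Resource("cern-batch", {"atlas": {"run", "monitor"}, "cms": {"monitor"}})
print(authorized(cluster, "atlas", "run"))   # True
print(authorized(cluster, "cms", "run"))     # False
```

The point of the sketch is that the decision combines two inputs, the owner's policy and the user's community membership, which is exactly what makes Grid sharing different from simple remote login.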
Aspects of the Problem
• Need for interoperability when different groups want to share resources
  • Diverse components, policies, mechanisms
  • E.g., standard notions of identity, means of communication, resource descriptions
• Need for shared infrastructure services to avoid repeated development, installation
  • E.g., one port/service/protocol for remote access to computing, not one per tool/application
  • E.g., Certificate Authorities: expensive to run
• A common need for protocols & services
Hence, Grid Architecture Must Address
• Development of Grid protocols & services
  • Protocol-mediated access to remote resources
  • New services: e.g., resource brokering
  • “On the Grid” = speak Intergrid protocols
  • Mostly (extensions to) existing protocols
• Development of Grid APIs & SDKs
  • Interfaces to Grid protocols & services
  • Facilitate application development by supplying higher-level abstractions
• The (hugely successful) model is the Internet
Grid Architecture
• Application
• Collective (“Coordinating multiple resources”): ubiquitous infrastructure services, app-specific distributed services
• Resource (“Sharing single resources”): negotiating access, controlling use
• Connectivity (“Talking to things”): communication (Internet protocols) & security
• Fabric (“Controlling things locally”): access to, & control of, resources
For more info: www.globus.org/research/papers/anatomy.pdf
HENP Grid Architecture (H. Newman)
• Physicists’ Application Codes
  • Reconstruction, Calibration, Analysis
• Experiments’ Software Framework Layer
  • Modular and Grid-aware: architecture able to interact effectively with the lower layers (above)
• Grid Applications Layer (parameters and algorithms that govern system operations)
  • Policy and priority metrics
  • Workflow evaluation metrics
  • Task-site coupling proximity metrics
• Global End-to-End System Services Layer
  • Monitoring and tracking of component performance
  • Workflow monitoring and evaluation mechanisms
  • System self-monitoring, evaluation and optimization mechanisms
Architecture (1): Fabric Layer
• Diverse resources that may be shared
  • Computers, clusters, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc., etc.
• Speak connectivity, resource protocols
  • The neck of the protocol hourglass
• May implement standard behaviors
  • Reservation, pre-emption, virtualization
• Grid operation can have profound implications for resource behavior
[Diagram: a Grid resource exposing registration, enquiry, management, and access protocol(s)]
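One of the "standard behaviors" named above, advance reservation, can be sketched in a few lines. This is a minimal model (the class and interval representation are assumptions, not any real fabric-layer API): a resource accepts a reservation only if the requested time window does not overlap an existing one.

```python
class ReservableResource:
    """Toy model of a fabric-layer resource that supports advance reservation."""

    def __init__(self):
        self.reservations = []          # list of accepted (start, end) windows

    def reserve(self, start, end):
        # Two half-open intervals [a, b) and [c, d) overlap iff a < d and c < b.
        for s, e in self.reservations:
            if start < e and s < end:
                return False            # conflict: reject the request
        self.reservations.append((start, end))
        return True

cpu = ReservableResource()
print(cpu.reserve(0, 10))    # True: window is free
print(cpu.reserve(5, 15))    # False: overlaps the first reservation
print(cpu.reserve(10, 20))   # True: starts exactly when the first ends
```

Even this toy version shows why reservation has "profound implications for resource behavior": the local scheduler must now honor commitments made to remote parties.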
Architecture (2): Connectivity Layer Protocols & Services
• Communication
  • Internet protocols: IP, DNS, routing, etc.
• Security: Grid Security Infrastructure (GSI)
  • Uniform authentication & authorization mechanisms in a multi-institutional setting
  • Single sign-on, delegation, identity mapping
  • Public-key technology, SSL, X.509, GSS-API (several Internet drafts document extensions)
  • Supporting infrastructure: Certificate Authorities, key management, etc.
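The single sign-on and delegation bullets rest on GSI's proxy-credential idea. The sketch below is purely conceptual, not real X.509 or GSS-API (no cryptography; the `Credential` class and functions are invented for illustration): a user credential issues a short-lived proxy, and a verifier accepts the proxy only if every link in the chain back to the identity credential is unexpired.

```python
import time
from dataclasses import dataclass

@dataclass
class Credential:
    subject: str
    issuer: object      # parent Credential, or None for a CA-issued identity
    expires: float      # expiry time as a Unix timestamp

def delegate(cred, lifetime):
    """Issue a short-lived proxy credential chained to `cred` (no real signing here)."""
    return Credential(cred.subject + "/proxy", cred, time.time() + lifetime)

def valid(cred, now=None):
    """A credential is valid only if every link in its chain is unexpired."""
    now = time.time() if now is None else now
    while cred is not None:
        if cred.expires < now:
            return False
        cred = cred.issuer
    return True

user = Credential("/O=Grid/CN=Alice", None, time.time() + 3600)
proxy = delegate(user, 600)   # single sign-on: one proxy reused at many sites
print(valid(proxy))           # True while the whole chain is unexpired
print(valid(proxy, now=time.time() + 7200))   # False: chain has expired
```

The short proxy lifetime is the key design choice: it limits the damage if a delegated credential is stolen, while the long-lived identity certificate stays offline.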
Architecture (3): Resource Layer Protocols & Services
• Resource management: GRAM
  • Remote allocation, reservation, monitoring, control of [compute => arbitrary] resources
• Data access: GridFTP
  • High-performance data access & transport
• Information/monitoring
  • MDS: access to structure & state information
  • GMA
  • & others: database access, code repository access, virtual data, …
• All integrated with GSI
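The GRAM bullet embodies the hourglass idea from earlier slides: one submission interface in front of many local schedulers. The sketch below illustrates only that dispatch pattern; the contact-string format is modeled loosely on GRAM's `host/jobmanager-<scheduler>` convention, and the backends are stand-ins, not real PBS or fork job managers.

```python
# Stand-in local schedulers hiding behind one uniform submission interface.
class LocalFork:
    def run(self, cmd):
        return f"fork: ran {cmd}"

class BatchQueue:
    def run(self, cmd):
        return f"queued: {cmd}"

SCHEDULERS = {"fork": LocalFork(), "pbs": BatchQueue()}

def gram_submit(contact, cmd):
    """Dispatch a job via a GRAM-style contact string like 'host/jobmanager-pbs'."""
    manager = contact.split("jobmanager-")[-1]
    return SCHEDULERS[manager].run(cmd)

print(gram_submit("cern.ch/jobmanager-pbs", "reco.sh"))    # queued: reco.sh
print(gram_submit("anl.gov/jobmanager-fork", "calib.sh"))  # fork: ran calib.sh
```

The payoff is the slide's earlier point: tools speak one protocol, and each site maps it onto whatever scheduler it already runs.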
Architecture (4): Collective Layer Protocols & Services
• Community membership & policy
  • E.g., Community Authorization Service
• Index/metadirectory/brokering services
  • E.g., Globus GIIS, Condor Matchmaker, DAGMan
• Replica management and replica selection
  • E.g., GDMP
  • Optimize aggregate data access performance
• Co-reservation and co-allocation services
  • End-to-end performance
• Middle-tier services
  • MyProxy credential repository, portal services
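Replica selection is the easiest collective-layer service to make concrete. The sketch below is a simplification (real selectors also weigh load, cost, and policy; the sites and bandwidth figures are illustrative): among known replica locations, pick the one minimizing estimated transfer time for a file of given size.

```python
def select_replica(replicas, size_gb):
    """Pick the replica site with the smallest estimated transfer time.

    replicas: mapping of site name -> measured bandwidth in MB/s (illustrative).
    """
    return min(replicas, key=lambda site: size_gb * 1024 / replicas[site])

# Hypothetical measurements for three replica sites.
replicas = {"cern": 40.0, "fnal": 90.0, "in2p3": 25.0}
print(select_replica(replicas, size_gb=2.0))   # fnal (highest bandwidth wins)
```

This is also where the "optimize aggregate data access performance" bullet bites: a real service must balance many concurrent selections so everyone does not pile onto the single fastest site.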
Evolution of Grid Architecture
• Up to 1998
  • Basic mechanisms: authentication, virtualization, resource management, information/monitoring
  • Condor, Globus Toolkit, SRB, etc.
  • Early application experiments on O(60)-site testbeds
• 1999-2001
  • Data Grid protocols and services: GDMP, GridFTP, DRM, etc.
  • First experiences with production operation
• 2002-
  • Further evolution of the protocol base (Web services)
  • Higher-level services, reliability, scalability
The Grid Information Problem
• Large numbers of distributed “sensors” with different properties
• Need for different “views” of this information, depending on community membership, security constraints, intended purpose, sensor type
Grid Information Architecture
Registration & enquiry protocols, information models, query languages
• Provides standard interfaces to sensors
• Supports different “directory” structures for various discovery/access strategies
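The registration and enquiry roles above can be sketched with an in-memory directory. This is not MDS (which uses LDAP) or GMA; the class and entry schema are invented to show only the protocol roles: sensors register self-descriptions, and clients query with predicate "views" of their own choosing.

```python
class Directory:
    """Toy information directory: sensors register, clients enquire."""

    def __init__(self):
        self.entries = []

    def register(self, entry):
        """A sensor publishes a description of itself (registration protocol)."""
        self.entries.append(entry)

    def enquire(self, predicate):
        """A client retrieves a filtered 'view' of the data (enquiry protocol)."""
        return [e for e in self.entries if predicate(e)]

mds = Directory()
mds.register({"type": "compute", "site": "cern", "free_cpus": 120})
mds.register({"type": "storage", "site": "fnal", "free_tb": 8})

# A scheduler's view: compute resources with plenty of free CPUs.
big = mds.enquire(lambda e: e.get("free_cpus", 0) > 100)
print([e["site"] for e in big])   # ['cern']
```

The predicate argument is the point: different communities can extract different views from the same registered data without the sensors knowing or caring.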
Web Services
• “Web services” provide
  • A standard interface definition language (WSDL)
  • A standard RPC protocol (SOAP) [but not required]
  • Emerging higher-level services (e.g., workflow)
  • Nothing to do with the Web
• Useful framework/toolset for Grid applications?
  • See proposed Open Grid Services Architecture
• Represent a natural evolution of current technology
  • No need to change any existing plans
  • Introduce in phased fashion when available
  • Maintain focus on hard issues: how to structure services, build applications, operate Grids
For more info: www.globus.org/research/papers/physiology.pdf
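To make the SOAP bullet concrete, the sketch below builds and re-parses a minimal SOAP 1.1 envelope with the standard library. The `submitJob` operation and its parameter are invented for illustration; only the envelope/body structure and namespace URI are standard SOAP.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"   # SOAP 1.1 envelope namespace

# Build a minimal RPC-style request: an Envelope containing a Body,
# which carries one application-defined call (hypothetical operation).
envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
call = ET.SubElement(body, "submitJob")                 # made-up operation name
ET.SubElement(call, "executable").text = "reco.sh"

# Serialize to XML text, as it would travel over HTTP, then parse it back.
xml_text = ET.tostring(envelope, encoding="unicode")
parsed = ET.fromstring(xml_text)
print(parsed.find(f"{{{SOAP_NS}}}Body/submitJob/executable").text)   # reco.sh
```

The "nothing to do with the Web" quip holds up here: SOAP is just namespaced XML over some transport, which is why it could plausibly carry Grid service interactions as the OGSA proposal suggested.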
Identifying and Addressing Technology Challenges
1) Identify and correct critical technology challenges
  • We don’t know all of the problems yet
2) Develop a coherent Grid technology architecture
  • To conserve scarce resources; for experiments
• Both challenges can be addressed by a pragmatic, experiential strategy
  • Build and run joint testbeds of increasing size
  • Gain experience “at scale”
  • Mix and match technologies
  • Coordinated projects to resolve problems
Summary
• We have a solid base on which we can build
  • Still learning how to deploy and operate
• Success of LCG (and EDG, GriPhyN, PPDG, …) requires
  • Focused, methodical effort to deploy and operate
  • Continued iteration on core components
  • Collaborative design and development of higher-level services
  • Early adoption and experimentation by experiments
• We are not alone in these endeavors
  • Dozens of other Grid projects worldwide
  • Significant and growing industrial participation