370 likes | 522 Views
The Anatomy of the Grid Enabling Scalable Virtual Organizations. David S. Angulo Dept. of Computer Science The University of Chicago and Mathematics & Computer Science Division Argonne National Laboratory http://www.cs.uchicago.edu/~dangulo. Ian Foster
E N D
The Anatomy of the GridEnabling Scalable Virtual Organizations David S. Angulo Dept. of Computer Science The University of Chicago and Mathematics & Computer Science Division Argonne National Laboratory http://www.cs.uchicago.edu/~dangulo Ian Foster Mathematics & Computer Science Division Argonne National Laboratory and Dept. of Computer Science The University of Chicago http://www.mcs.anl.gov/~foster 2nd US-Hungarian Workshop on Cluster and Grid Computing, February 6, 2002
Partial Acknowledgements • Globus ToolkitTM • R&D involves • many fine scientists & engineers at ANL/UofC, USC/ISI, and elsewhere (see www.globus.org) • Led by • Ian Foster @ Argonne/UofC • Carl Kesselman @ USC/ISI • Open Grid Services Architecture work performed by • Ian Foster, Globus Co-PI @ Argonne/UofC • Carl Kesselman, Globus Co-PI @ USC/ISI • Steve Tuecke, Globus Toolkit Architect @ANL • Jeff Nick, Steve Graham, Jeff Frey @ IBM • Strong collaborations with many outstanding EU, UK, US Grid projects • Support from DOE, NASA, NSF, Microsoft, IBM
The Grid Problem Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
Why Grids? • A biochemist exploits 10,000 computers to screen 100,000 compounds in an hour • 1,000 physicists worldwide pool resources for petaflop analyses of petabytes of data • Civil engineers collaborate to design, execute, & analyze shake table experiments • Climate scientists visualize, annotate, & analyze terabyte simulation datasets • A home user invokes architectural design functions at an application service provider • An application service provider purchases cycles from compute cycle providers
Elements of the Problem • Resource sharing • Computers, storage, sensors, networks, … • Sharing always conditional: issues of trust, policy, payment, … • Coordinated problem solving • Beyond client-server: distributed data analysis, computation, … • Dynamic, multi-institutional virtual orgs • Community overlays on classic org structures • Large or small, static or dynamic
Grids: Why Now? • Moore’s law improvements in computing produce highly functional end systems • The Internet and burgeoning wired and wireless provide universal connectivity • Network exponentials produce dramatic changes in geometry and geography
Grids: Why Now? • Moore’s law improvements in computing produce highly functional endsystems • The Internet and burgeoning wired and wireless provide universal connectivity • Network exponentials produce dramatic changes in geometry and geography • 9-month doubling: double Moore’s law! • 1986-2001: x340,000; 2001-2010: x4000?
A Little History • Early 90s • Gigabit testbeds, metacomputing • Mid to late 90s • Early experiments (e.g., I-WAY), software projects (e.g., Globus), application experiments • 2002 • Major application communities emerging • Major infrastructure deployments are underway • Rich technology base has been constructed • Global Grid Forum: >1000 people on mailing lists, 192 orgs at last meeting, 28 countries
The Grid World: Current Status • Dozens of major Grid projects in scientific & technical computing/research & education • Deployment, application, technology • Considerable consensus on key concepts and technologies • Globus Toolkit™ has emerged as de facto standard for major protocols & services • Global Grid Forum has emerged as a significant force • And first “Grid” proposals at IETF
g g g g g g Selected Major Grid Projects New New
g g g g g g Selected Major Grid Projects New New New New New
g g g g g g Selected Major Grid Projects New New
g g Selected Major Grid Projects New New Also many technology R&D projects: e.g., Condor, NetSolve, Ninf, NWS See also www.gridforum.org
~PBytes/sec ~100 MBytes/sec Offline Processor Farm ~20 TIPS There is a “bunch crossing” every 25 nsecs. There are 100 “triggers” per second Each triggered event is ~1 MByte in size ~100 MBytes/sec Online System Tier 0 CERN Computer Centre ~622 Mbits/sec or Air Freight (deprecated) Tier 1 FermiLab ~4 TIPS France Regional Centre Germany Regional Centre Italy Regional Centre ~622 Mbits/sec Tier 2 Tier2 Centre ~1 TIPS Caltech ~1 TIPS Tier2 Centre ~1 TIPS Tier2 Centre ~1 TIPS Tier2 Centre ~1 TIPS HPSS HPSS HPSS HPSS HPSS ~622 Mbits/sec Institute ~0.25TIPS Institute Institute Institute Physics data cache ~1 MBytes/sec 1 TIPS is approximately 25,000 SpecInt95 equivalents Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server Pentium II 300 MHz Pentium II 300 MHz Pentium II 300 MHz Pentium II 300 MHz Tier 4 Physicist workstations Grid Communities & Applications:Data Grids for High Energy Physics www.griphyn.org www.ppdg.net www.eu-datagrid.org
Grid Communities and Applications:Mathematicians Solve NUG30 • Community=an informal collaboration of mathematicians and computer scientists • Condor-G delivers 3.46E8 CPU seconds in 7 days (peak 1009 processors) in U.S. and Italy (8 sites) • Solves NUG30 quadratic assignment problem • 14,5,28,24,1,3,16,15, • 10,9,21,2,4,29,25,22, • 13,26,17,30,6,20,19, • 8,18,7,27,12,11,23 www.mcs.anl.gov/metaneos: Argonne, Iowa, NWU, Wisconsin
Grid Communities and Applications:Network for Earthquake Eng. Simulation • NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other • On-demand access to experiments, data streams, computing, archives, collaboration NEESgrid: Argonne, Michigan, NCSA, UIUC, USC www.neesgrid.org
The 13.6 TF TeraGrid:Computing at 40 Gb/s Site Resources Site Resources 26 HPSS HPSS 4 24 External Networks External Networks 8 5 Caltech Argonne External Networks External Networks NCSA/PACI 8 TF 240 TB SDSC 4.1 TF 225 TB Site Resources Site Resources HPSS UniTree TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne www.teragrid.org
Tier0/1 facility Tier2 facility Tier3 facility 10+ Gbps link 2.5 Gbps link 622 Mbps link Other link Intl. Virtual Data Grid Lab. www.ivdgl.org
Presenter mic Presenter camera Ambient mic (tabletop) Audience camera Access Grid • Collaborative work among large groups • ~50 sites worldwide • Use Grid services for discovery, security • www.scglobal.org Access Grid: Argonne, others www.accessgrid.org
Grid Architecture & Globus Toolkit™ • The question: • What is needed for resource sharing & coordinated problem solving in dynamic virtual organizations (VOs)? • The answer: • Major issues identified: membership, resource discovery & access, …, … • Grid architecture captures core elements, emphasizing pre-eminent role of protocols • Globus Toolkit™ has emerged as de facto standard for major protocols & services
The Critical Role of Protocols • Need for interoperability when different groups want to share resources • E.g., IP lets me talk to your computer, but how do we establish & maintain sharing? • How do I discover, authenticate, authorize, describe what I want to do, etc., etc.? • Need for shared infrastructure services to avoid repeated development, installation, e.g. • One port/service for remote access to computing, not one per tool/application • X.509 enables sharing of Certificate Authorities
Application Application Internet Protocol Architecture “Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services Collective “Sharing single resources”: negotiating access, controlling use Resource “Talking to things”: communication (Internet protocols) & security Connectivity Transport Internet “Controlling things locally”: Access to, & control of, resources Fabric Link Grid Architecture For more info: www.globus.org/research/papers/anatomy.pdf
Globus Project and Toolkit • Globus Project™ • R&D project at ANL, U.Chicago, USC/ISI • Emphasis on identifying and defining core protocols and services • O(40) researchers & developers • Globus Toolkit™ • A major product of the Globus Project • Open source software: reference implementation of core protocols & services • Growing open source developer community
Globus & Architecture (1):Fabric Layer • Diverse resources that may be shared • Computers, clusters, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc., etc. • Speak connectivity, resource protocols • The neck of the protocol hourglass • May implement standard behaviors • Reservation, pre-emption, virtualization • Grid operation can have profound implications for resource behavior Registration, enquiry, management, access protocol(s) Grid resource
Globus & Architecture (2):Connectivity Layer Protocols & Services • Communication • Internet protocols: IP, DNS, routing, etc. • Security: Grid Security Infrastructure (GSI) • Uniform authentication & authorization mechanisms in multi-institutional setting • Single sign-on, delegation, identity mapping • Public key technology, SSL, X.509, GSS-API (several Internet drafts document extensions) • Supporting infrastructure: Certificate Authorities, key management, etc.
Single sign-on via “grid-id” & generation of proxy cred. Or: retrieval of proxy cred. from online repository Remote process creation requests* GSI-enabled GRAM server Authorize Map to local id Create process Generate credentials Ditto GSI-enabled GRAM server Process Process Communication* Local id Local id Kerberos ticket Restricted proxy Remote file access request* Restricted proxy User Proxy GSI-enabled FTP server Proxy credential Authorize Map to local id Access file * With mutual authentication GSI in Action: “Create Processes at A and B that Communicate & Access Files at C” User Site B (Unix) Site A (Kerberos) Computer Computer Site C (Kerberos) Storage system
Globus & Architecture (3):Resource Layer Protocols & Services • Resource management: GRAM • Remote allocation, reservation, monitoring, control of [compute] resources • Data access: GridFTP • High-performance data access & transport • Information: MDS (GRRP, GRIP) • Access to structure & state information • & others emerging: database access, code repository access, accounting, … • All integrated with GSI
GRAM ResourceManagement Protocol • Grid Resource Allocation & Management • Allocation, monitoring, control of computations • Secure remote access to diverse schedulers • Current evolution • Immediate and advance reservation • Multiple resource types: manage anything • Recoverable requests, timeout, etc. • Evolve to Web Services • Policy evaluation points for restricted proxies Karl Czajkowski, Steve Tuecke, others
Data Access & Transfer • GridFTP: extended version of popular FTP protocol for Grid data access and transfer • Secure, efficient, reliable, flexible, extensible, parallel, concurrent, e.g.: • Third-party data transfers, partial file transfers • Parallelism, striping (e.g., on PVFS) • Reliable, recoverable data transfers • Reference implementations • Existing clients and servers: wuftpd, nicftp • Flexible, extensible libraries Bill Allcock, Joe Bester, John Bresnahan, Steve Tuecke, others
Grid Services Architecture (4):Collective Layer Protocols & Services • Community membership & policy • E.g., Community Authorization Service • Index/metadirectory/ brokering services • E.g., Globus GIIS, Condor Matchmaker • Replica management and replica selection • Optimize aggregate data access performance • Co-reservation and co-allocation services • End-to-end performance • Middle tier services • MyProxy credential repository, portal services
Data Grids • Grid infrastructures, tools, and applications focused on enabling distributed access to, & analysis of, large amounts of data • A specialization and extension of standard Grid technologies • Current application domains include high energy & nuclear physics, climate data analysis, astronomy, bioinformatics
Grid Physics Network (GriPhyN) Enabling R&D for advanced data grid systems, focusing in particular on Virtual Data concept ATLAS CMS LIGO SDSS Paul Avery, Ian Foster, Co-PIs www.griphyn.org
Future Directions • Initial exploration (1996-1999; Globus 1.0) • Extensive appln experiments; core protocols • Data Grids (1999-??; Globus 2.0+) • Large-scale data management and analysis • Open Grid Services Architecture (2001-??, Globus 3.0) • Integration w/ Web services, hosting envs. • Integration with databases • Integrated set of higher-level services • Scalable systems (2003-??) • Sensors, wireless, ubiquitous computing
Summary • The Grid problem: Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations • Grid architecture: Protocol, service definition for interoperability & resource sharing • Globus Toolkit™ a source of protocol and API definitions—and reference implementations • And many projects applying Grid concepts (& Globus technologies) to important problems • Timely to start applying technologies to industrial problems, within & outside STC
For More Information • The Globus Project™ • www.globus.org • Global Grid Forum • www.gridforum.org • Grid architecture • www.globus.org/research/papers/anatomy.pdf • Open Grid Services Architecture (soon) • www.globus.org/research/papers/ogsa.pdf • www.globus.org/research/papers/gsspec.pdf