
The Anatomy of the Grid: Enabling Scalable Virtual Organizations

The Anatomy of the Grid: Enabling Scalable Virtual Organizations. John DYER, TERENA, john.dyer@terena.nl. Acknowledgement to: Ian Foster, Mathematics and Computer Science Division, Argonne National Laboratory. Grids are “hot” … but what are they really about? Presentation Agenda.


Presentation Transcript


  1. The Anatomy of the Grid: Enabling Scalable Virtual Organizations • John DYER, TERENA, john.dyer@terena.nl • Acknowledgement to: Ian Foster, Mathematics and Computer Science Division, Argonne National Laboratory

  2. Grids are “hot” … but what are they really about?

  3. Presentation Agenda • Problem statement • Architecture • Globus Toolkit • Futures

  4. The Grid Problem • Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations

  5. Elements of the Problem • Resource sharing • Computers, storage, sensors, networks, … • Sharing always conditional: issues of trust, policy, negotiation, payment, … (Cost v Performance) • Coordinated problem solving • Beyond client-server: distributed data analysis, computation, collaboration, … • Dynamic, multi-institutional virtual orgs • Community overlays on classic org structures • Large or small, static or dynamic

  6. Computational Astrophysics • Solved Einstein equations (EEs) for gravitational waves • Tightly coupled, communications required • Must communicate 30 MB/step between machines • [Figure: SDSC IBM SP, 1024 procs (5x12x17 = 1020); NCSA Origin Array, 256+128+128 procs (5x12x(4+2+2) = 480); Gig-E: 100 MB/sec; OC-12 line (but only 2.5 MB/sec)]
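
The numbers on this slide can be checked with a little arithmetic. The Python sketch below recomputes the two processor counts from the decompositions shown and estimates how long one 30 MB exchange takes at each quoted bandwidth. Treating the transfer time as a plain size/bandwidth division, with no latency or protocol overhead, is an assumption made here for illustration and not something the slide states.

```python
# Processor counts from the decompositions quoted on the slide.
sdsc_procs = 5 * 12 * 17            # SDSC IBM SP
ncsa_procs = 5 * 12 * (4 + 2 + 2)   # NCSA Origin array

# Data exchanged between the machines per step (from the slide).
MB_PER_STEP = 30.0


def seconds_per_step(bandwidth_mb_per_s: float) -> float:
    """Naive transfer time: size / bandwidth (assumes no latency or overhead)."""
    if bandwidth_mb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return MB_PER_STEP / bandwidth_mb_per_s


gige_seconds = seconds_per_step(100.0)   # Gig-E figure quoted on the slide
oc12_seconds = seconds_per_step(2.5)     # OC-12 figure quoted on the slide

print(f"processors: {sdsc_procs} + {ncsa_procs} = {sdsc_procs + ncsa_procs}")
print(f"per-step exchange: {gige_seconds:.1f} s at 100 MB/s, "
      f"{oc12_seconds:.1f} s at 2.5 MB/s")
```

Under these assumptions the wide-area exchange is 40 times slower than the Gig-E figure, which is why a tightly coupled code is sensitive to the inter-site link.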

  7. Data Grids for High Energy Physics • There is a “bunch crossing” every 25 nsecs • There are 100 “triggers” per second • Each triggered event is ~1 MByte in size • 1 TIPS is approximately 25,000 SpecInt95 equivalents • Physicists work on analysis “channels”. Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server • [Figure: tiered data flow. Online System (~PBytes/sec in, ~100 MBytes/sec out); Offline Processor Farm (~20 TIPS); Tier 0: CERN Computer Centre; ~622 Mbits/sec or Air Freight (deprecated); Tier 1: FermiLab (~4 TIPS), France, Germany and Italy Regional Centres; ~622 Mbits/sec; Tier 2: Tier2 Centres (~1 TIPS each), Caltech (~1 TIPS); HPSS (x5); ~622 Mbits/sec; Institutes (~0.25 TIPS) with physics data cache; ~1 MBytes/sec; Tier 4: physicist workstations (Pentium II 300 MHz)] • Image courtesy Harvey Newman, Caltech
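
The rates quoted on this slide are related by simple arithmetic, sketched below in Python. The input values come from the slide; the decimal units (1 TByte = 10^6 MBytes), 8 bits per byte, and the per-day extrapolation are assumptions added for illustration.

```python
# Figures quoted on the slide.
bunch_crossing_interval_s = 25e-9   # one bunch crossing every 25 ns
triggers_per_second = 100           # triggered events per second
event_size_mbytes = 1.0             # ~1 MByte per triggered event
tier_link_mbits_per_s = 622.0       # ~622 Mbits/sec links between tiers

# Derived quantities (assumption: decimal units, 8 bits per byte).
crossings_per_second = 1.0 / bunch_crossing_interval_s
crossings_per_trigger = crossings_per_second / triggers_per_second
recorded_mbytes_per_s = triggers_per_second * event_size_mbytes
tier_link_mbytes_per_s = tier_link_mbits_per_s / 8.0
recorded_tbytes_per_day = recorded_mbytes_per_s * 86_400 / 1e6

print(f"crossings per second:      {crossings_per_second:.3g}")
print(f"crossings per trigger:     {crossings_per_trigger:.3g}")
print(f"recorded rate (MBytes/s):  {recorded_mbytes_per_s:.0f}")
print(f"tier link (MBytes/s):      {tier_link_mbytes_per_s:.2f}")
print(f"recorded per day (TBytes): {recorded_tbytes_per_day:.2f}")
```

The recorded rate of 100 MBytes/sec matches the figure on the slide, and under these assumptions a single ~622 Mbits/sec link carries somewhat less than that rate.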

  8. Network for Earthquake Engineering Simulation • NEESgrid: national infrastructure to couple earthquake engineers with experimental facilities, databases, computers, & each other • On-demand access to experiments, data streams, computing, archives, collaboration • NEESgrid: Argonne, Michigan, NCSA, UIUC, USC

  9. Grid Applications: Mathematicians • Community = an informal collaboration of mathematicians and computer scientists • Condor-G delivers 3.46E8 CPU seconds in 7 days (600E3 seconds real-time) • Peak 1009 processors in U.S. and Italy (8 sites) • MetaNEOS: Argonne, Iowa, Northwestern, Wisconsin
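
The Condor-G figures imply an average level of parallelism that the slide does not state directly. The Python sketch below derives it from the quoted numbers; the interpretation of "CPU seconds / real-time seconds" as the average number of busy processors is an assumption made here.

```python
# Figures quoted on the slide.
cpu_seconds = 3.46e8       # CPU seconds delivered
wall_seconds = 600e3       # real-time seconds (about 7 days)
peak_processors = 1009     # peak processor count

# Assumption: average busy processors = CPU time / wall-clock time.
average_processors = cpu_seconds / wall_seconds
average_to_peak = average_processors / peak_processors

print(f"average processors in use: {average_processors:.0f}")
print(f"average as fraction of peak: {average_to_peak:.2f}")
```

On these numbers the run kept several hundred processors busy on average, a bit more than half of the peak.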

  10. Grid Architecture • Isn’t it just the Next Generation Internet, so why bother?

  11. Why Discuss Architecture? • Descriptive • Provide a common vocabulary for use when describing Grid systems • Guidance • Identify key areas in which services are required - FRAMEWORK • Prescriptive • Define standards • But in the existing standards framework • GGF working with IETF, Internet2 etc.

  12. What Sorts of Standards? • Need for interoperability when different groups want to share resources • E.g., IP lets me talk to your computer, but how do we establish & maintain sharing? • How do I discover, authenticate, authorize, describe what I want to do, etc., etc.? • Need for shared infrastructure services to avoid repeated development, installation, e.g. • One port/service for remote access to computing, not one per tool/application • X.509 enables sharing of Certificate Authorities • MIDDLEWARE !

  13. In Defining Grid Architecture, We Must Address . . . • Development of Grid protocols & services • Protocol-mediated access to remote resources • New services: e.g., resource brokering • Mostly (extensions to) existing protocols • Development of Grid APIs & SDKs • Facilitate application development by supplying higher-level abstractions • The model is the Internet and Web

  14. The Role of Grid Services (Middleware) and Tools • [Figure: Tools: Collaboration Tools, Data Mgmt Tools, Distributed simulation, . . .; Grid services: Information services, Resource mgmt, Fault detection, . . .; net]

  15. Grid Architecture Status • No “official” standards exist • But: • Globus Toolkit has emerged as the de facto standard for several important Connectivity, Resource, and Collective protocols • GGF has an architecture working group • Technical specifications are being developed for architecture elements: e.g., security, data, resource management, information • Internet drafts submitted in security area

  16. Layered Grid Architecture • Application: DOES THE SCIENCE /…. • Collective: JOB MANAGEMENT (Directory, Discovery, Monitoring) • Resource: NEGOTIATION & CONTROL (Sharing resources, controlling) • Connectivity: COMMS & AUTHENTICATION (Single Sign On, Trust . . .) • Fabric: ALL PHYSICAL RESOURCES (Net, CPUs, Storage, Sensors) • [Figure: shown alongside the Internet Protocol Architecture: Application, Transport, Internet, Link]

  17. Toolkits & Components • CONDOR - Harnessing the processing capacity of idle workstations • www.cs.wisc.edu/condor/ • LEGION - Developing an object-oriented framework for grid applications • www.cs.virginia.edu/~legion • Globus Toolkit SDK - APIs • www.globus.org/

  18. Architecture: Fabric Layer • Just what you would expect: the diverse mix of resources that may be shared • Individual computers, Condor pools, file systems, archives, metadata catalogs, networks, sensors, etc., etc. • Few constraints on low-level technology: connectivity and resource level protocols • Globus toolkit provides a few selected components (e.g., bandwidth broker)

  19. Architecture: Connectivity • Communication • Internet protocols: IP, DNS, routing, etc. • Security: Grid Security Infrastructure (GSI) • Uniform authentication & authorization mechanisms in multi-institutional setting • Single sign-on, delegation, identity mapping • Public key technology, SSL, X.509, GSS-API (several Internet drafts document extensions) • Supporting infrastructure: Certificate Authorities, key management, etc.
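
The "identity mapping" item can be made concrete with a small sketch. The Python below maps an authenticated X.509 subject name to a local account using a quoted-name-plus-account text format. The format, the distinguished names, and the account names are all hypothetical and chosen for illustration; this is not the GSI implementation or its API.

```python
import shlex
from typing import Dict, Optional


def parse_identity_map(text: str) -> Dict[str, str]:
    """Parse lines of the form: "<certificate subject>" <local account>."""
    mapping: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # honours the quotes around the subject
        if len(parts) != 2:
            raise ValueError(f"malformed mapping line: {raw!r}")
        subject, account = parts
        mapping[subject] = account
    return mapping


def local_account(mapping: Dict[str, str], subject: str) -> Optional[str]:
    """Return the local account for a subject, or None if it is not mapped."""
    return mapping.get(subject)


# Hypothetical entries, for illustration only.
EXAMPLE_MAP = '''
# subject name                                local account
"/O=Grid/OU=Example Org/CN=Alice Example"     alice
"/O=Grid/OU=Example Org/CN=Bob Example"       bexample
'''

if __name__ == "__main__":
    table = parse_identity_map(EXAMPLE_MAP)
    print(local_account(table, "/O=Grid/OU=Example Org/CN=Alice Example"))
    print(local_account(table, "/O=Grid/OU=Other Org/CN=Unknown"))
```

The point of the sketch is the separation of concerns: a user holds one global identity, and each site decides locally which account (if any) that identity maps to.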

  20. GSI Futures • Scalability in numbers of users & resources • Credential management • Online credential repositories • Account management • Authorization • Policy languages • Community authorization • Protection against compromised resources • Restricted delegation, smartcards

  21. Architecture: Resources • Resource management: Remote allocation, reservation, monitoring, control of [compute] resources - GRAM (access & management) • Data access: GridFTP • High-performance data access & transport • Information: • GRIP cf LDAP • GRRP – Registration Protocol • Access to structure & state information • & others emerging: catalog access, code repository access, accounting, … • All integrated with GSI

  22. GRAM Resource Management Protocol • Grid Resource Allocation & Management • Allocation, monitoring, control of computations • Simple HTTP-based RPC • Job request: Returns opaque, transferable “job contact” string for access to job • Job cancel, Job status, Job signal • Event notification (callbacks) for state changes • Protocol/server address robustness (exactly once execution), authentication, authorization • Servers for most schedulers; C and Java APIs
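
The job lifecycle described on this slide (submit returns an opaque contact string; status, cancel, and callbacks operate on that contact) can be sketched with an in-process toy. Everything in the Python below is hypothetical: the class, method names, state names, and the `toy-job://` contact format are illustrative assumptions, and nothing here is the Globus GRAM API or its HTTP-based wire protocol.

```python
import uuid
from typing import Callable, Dict, List, Optional

Callback = Callable[[str, str], None]  # (job_contact, new_state)


class ToyJobManager:
    """In-memory stand-in for a job manager (illustrative only)."""

    def __init__(self) -> None:
        self._state: Dict[str, str] = {}
        self._callbacks: Dict[str, List[Callback]] = {}

    def submit(self, description: str, callback: Optional[Callback] = None) -> str:
        """Accept a job and return an opaque, transferable contact string."""
        contact = f"toy-job://{uuid.uuid4()}"
        self._state[contact] = "PENDING"
        self._callbacks[contact] = [callback] if callback else []
        return contact

    def status(self, contact: str) -> str:
        return self._state[contact]

    def _transition(self, contact: str, new_state: str) -> None:
        # Record the change, then notify every registered callback.
        self._state[contact] = new_state
        for cb in self._callbacks[contact]:
            cb(contact, new_state)

    def start(self, contact: str) -> None:
        self._transition(contact, "ACTIVE")

    def cancel(self, contact: str) -> bool:
        """Cancel a job; returns False if it had already finished."""
        if self._state[contact] in ("DONE", "FAILED"):
            return False
        self._transition(contact, "FAILED")
        return True


if __name__ == "__main__":
    events: List[str] = []
    manager = ToyJobManager()
    job = manager.submit("/bin/hostname", lambda c, s: events.append(s))
    manager.start(job)
    manager.cancel(job)
    print(manager.status(job), events)
```

Because the contact is just a string, it can be handed to another process or user, which is the sense in which the slide calls it "transferable".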

  23. Data Access & Transfer • GridFTP: extended version of popular FTP protocol for Grid data access and transfer • Secure, efficient, reliable, flexible, extensible, parallel, concurrent, e.g.: • Third-party data transfers, partial file transfers • Parallelism, striping (e.g., on PVFS) • Reliable, recoverable data transfers • Reference implementations • Existing clients and servers: wuftpd, nicftp • Flexible, extensible libraries
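
Partial file transfers and parallelism both rest on the same idea: a file is addressed as byte ranges that can be moved independently. The Python sketch below splits a payload into contiguous ranges and copies them concurrently into a destination buffer. It is a local simulation with hypothetical function names, under the assumption that each range stands in for one data stream; it does not speak the GridFTP protocol.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def split_ranges(size: int, streams: int) -> List[Tuple[int, int]]:
    """Split [0, size) into at most `streams` contiguous (offset, length) ranges."""
    if size < 0 or streams < 1:
        raise ValueError("size must be >= 0 and streams >= 1")
    base, extra = divmod(size, streams)
    ranges: List[Tuple[int, int]] = []
    offset = 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)  # spread the remainder
        if length == 0:
            continue
        ranges.append((offset, length))
        offset += length
    return ranges


def parallel_copy(source: bytes, streams: int) -> bytes:
    """Copy `source` range by range, one worker per simulated stream."""
    ranges = split_ranges(len(source), streams)
    dest = bytearray(len(source))

    def copy_range(rng: Tuple[int, int]) -> None:
        offset, length = rng
        # Each worker writes only its own disjoint slice of the destination.
        dest[offset:offset + length] = source[offset:offset + length]

    with ThreadPoolExecutor(max_workers=streams) as pool:
        list(pool.map(copy_range, ranges))  # consume to surface any errors
    return bytes(dest)


if __name__ == "__main__":
    payload = bytes(range(256)) * 40
    print(split_ranges(10, 3))
    print(parallel_copy(payload, 4) == payload)
```

A recoverable transfer follows from the same structure: if one range fails, only that (offset, length) pair needs to be requested again.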

  24. Architecture: Collective • Bringing the underlying resources together to provide the requested services • Resource brokers (e.g., Condor Matchmaker) • Resource discovery and allocation • Replica management and replica selection • Optimize aggregate data access performance • Co-reservation and co-allocation services • End-to-end performance • Etc., etc.
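
A resource broker, reduced to its core, filters the available resources by a request's requirements and then picks the best remaining candidate by some ranking. The Python sketch below shows that two-step pattern. The resource attributes, site names, and function signature are hypothetical, and this is a generic illustration rather than the Condor Matchmaker's actual mechanism.

```python
from typing import Callable, Dict, List, Optional

Resource = Dict[str, object]


def match(
    resources: List[Resource],
    requirements: Callable[[Resource], bool],
    rank: Callable[[Resource], float],
) -> Optional[Resource]:
    """Return the highest-ranked resource that meets the requirements, if any."""
    candidates = [r for r in resources if requirements(r)]
    if not candidates:
        return None
    return max(candidates, key=rank)


# Hypothetical resource advertisements, for illustration only.
RESOURCES: List[Resource] = [
    {"name": "site-a", "cpus": 64, "memory_gb": 128, "load": 0.70},
    {"name": "site-b", "cpus": 256, "memory_gb": 512, "load": 0.95},
    {"name": "site-c", "cpus": 128, "memory_gb": 256, "load": 0.20},
]

if __name__ == "__main__":
    chosen = match(
        RESOURCES,
        requirements=lambda r: r["cpus"] >= 100 and r["memory_gb"] >= 200,
        rank=lambda r: -r["load"],  # prefer the least loaded candidate
    )
    print(chosen["name"] if chosen else "no match")
```

Keeping requirements and rank as separate functions mirrors the split between what a request must have and what it would prefer.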

  25. Globus Toolkit Solution • Registration & enquiry protocols, information models, query languages • Provides standard interfaces to sensors • Supports different “directory” structures supporting various discovery/access strategies • Karl Czajkowski, Steve Fitzgerald, others

  26. Grid Futures

  27. Major Grid Projects [table]

  28. Major Grid Projects [table]

  29. Major Grid Projects [table]

  30. Major Grid Projects [table] • Also many technology R&D projects: e.g., Condor, NetSolve, Ninf, NWS • See also www.gridforum.org

  31. The 13.6 TF TeraGrid: Computing at 40 Gb/s • [Figure: four sites, each with site resources and external networks: NCSA/PACI (8 TF, 240 TB), SDSC (4.1 TF, 225 TB), Caltech, Argonne; HPSS and UniTree storage] • TeraGrid/DTF: NCSA, SDSC, Caltech, Argonne • www.teragrid.org

  32. International Virtual Data Grid Lab • [Figure legend: Tier0/1 facility, Tier2 facility, Tier3 facility; 10 Gbps link, 2.5 Gbps link, 622 Mbps link, other link] • U.S. PIs: Avery, Foster, Gardner, Newman, Szalay • www.ivdgl.org

  33. Problem Evolution • Past-present: O(10^2) high-end systems; Mb/s networks; centralized (or entirely local) control • I-WAY (1995): 17 sites, week-long; 155 Mb/s • GUSTO (1998): 80 sites, long-term experiment • NASA IPG, NSF NTG: O(10) sites, production • Present: O(10^4-10^6) data systems, computers; Gb/s networks; scaling, decentralized control • Scalable resource discovery; restricted delegation; community policy; GriPhyN Data Grid: 100s of sites, O(10^4) computers; complex policies • Future: O(10^6-10^9) data, sensors, computers; Tb/s networks; highly flexible policy, control

  34. The Future • We don’t build or buy “computers” anymore, we borrow or lease required resources • When I walk into a room, need to solve a problem, need to communicate • A “computer” is a dynamically, often collaboratively constructed collection of processors, data sources, sensors, networks • Similar observations apply for software

  35. And Thus … • Reduced barriers to access mean that we do much more computing, and more interesting computing, than today => Many more components (& services); massive parallelism • All resources are owned by others => Sharing (for fun or profit) is fundamental; trust, policy, negotiation, payment • All computing is performed on unfamiliar systems => Dynamic behaviors, discovery, adaptivity, failure

  36. The Global Grid Forum • Merger of (US) GridForum & EuroGRID • Cooperative Forum of Working Groups • Open to all who show up • Meets every four months • Alternate – US and Europe • GGF1 – Amsterdam, NL • GGF2 – Washington, US • GGF3 – Frascati, IT • http://www.gridforum.org

  37. Global Grid Forum History • [Timeline, 1998-2002] • GF BOF (Orlando) • GF1 (San Jose, NASA Ames) • GF2 (Chicago, Northwestern) • eGrid and GF BOFs (Portland) • GF3 (San Diego, SDSC) • eGrid1 (Poznan, PSNC) • GF4 (Redmond, Microsoft) • Asia-Pacific GF Planning (Yokohama) • eGrid2 (Munich, Europar) • GF5 (Boston, Sun) • Global GF BOF (Dallas) • GGF-1 (Amsterdam, WTCW) • GGF-2 (Washington, DC, DOD-MSRC) • GGF3 (Rome, INFN), 7-10 October 2001 • GGF4 (Toronto, NRC), 17-20 February 2002 • GGF5 (Edinburgh), 21-24 July 2002, jointly with HPDC (24-26 July)

  38. Summary • The Grid problem: Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations • Grid architecture: Emphasize protocol and service definition to enable interoperability and resource sharing • Globus Toolkit a source of protocol and API definitions, reference implementations • See: globus.org, griphyn.org, gridforum.org
