
The Grid - Multi-Domain Distributed Computing



  1. The Grid - Multi-Domain Distributed Computing. Kai Rasmussen, Paul Ruggieri

  2. Topic Overview • The Grid • Types • Virtual Organizations • Security • Real Examples • Grid Tools • Condor • Cactus • Cactus-G • Globus • OGSA

  3. The Grid • What is a Grid system? • A highly heterogeneous set of resources that may or may not be maintained by multiple administrative domains • Early idea: computational resources would be as universally available as electric power

  4. “A hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities” - Ian Foster • Resources are distributed across sites and organizations with no centralized point of control • What constitutes a Grid? • Resources are coordinated without being subjected to centralized control • Uses standard, open protocols and interfaces • Delivers non-trivial qualities of service

  5. Grid Types • Computational Grids • Resource: pure CPU • Strength: computationally intensive applications • Data Grids • Shared storage and data • Terabytes of storage space • Sharing of data among collaborators • Fault tolerance • Equipment Grids • Set of resources that surround shared equipment, such as a telescope

  6. Virtual Organizations • Grids are multi-domain • Resources are administered by separate departments or institutions • All wish to maintain individual control • A cross-site grouping of collaborators sharing resources forms a “Virtual Organization”

  7. Virtual Organizations • Users of a VO share a common goal and trust • A collection of resources, users and rules governing sharing • Highly controlled - What is shared? Who is sharing? How can resources be used? • One global domain acting over individual collaborating domains

  8. Grid Security • Highly distributed nature • VOs spread over many security domains • Authentication • Proving identity • Authorization • Obtaining privileges • Confidentiality & Integrity • Identity and privileges can be trusted

  9. Authentication • Certificate Authority (CA) • Entity that signs a certificate proving a user's identity • The certificate is then used as a credential to use the system • Typically several CAs to prevent a single point of failure/attack • Globus Grid Security Infrastructure (GSI) • Globus's authentication component • A global security credential is later mapped to a local one • Kerberos tickets or a local username and password • Typically a short-term proxy certificate is generated from the long-term certificate

  10. Authentication • Certification Authority Coordination Group • Maintains a global infrastructure of trusted CA agents • A CA must meet standards • Physically secure • Must validate identity with Registration Authorities using official documents or photographic identification • Private keys must be a minimum of 1024 bits and have at most a 1-year lifetime • 28 approved CAs in the European Union

  11. Security Issues • Delegation • The user entrusts a separate entity to perform a task • The entity must be given credentials and trusted to behave • Limit the proxy's strength • Endow the proxy with a specific purpose
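
To make the short-term proxy idea concrete, here is a minimal sketch that derives a short-lived credential from a long-term key using the Python cryptography package. This is not the real GSI proxy-certificate format (RFC 3820); the names, lifetime, and key sizes are illustrative assumptions only.

```python
# Minimal sketch of delegation via a short-lived credential: a fresh key pair
# is signed by the user's long-term key, so the proxy can act for the user for
# a limited time. NOT the actual GSI/RFC 3820 proxy format; values are illustrative.
import datetime

from cryptography import x509
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

# Long-term identity key (in practice this comes with the user's CA-signed certificate).
long_term_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
user_name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, u"Grid User")])

# Fresh key pair for the proxy; the long-term key never has to leave the user's machine.
proxy_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

now = datetime.datetime.utcnow()
proxy_cert = (
    x509.CertificateBuilder()
    .subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, u"Grid User proxy")]))
    .issuer_name(user_name)                               # issued by the user's identity
    .public_key(proxy_key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(hours=12))  # short lifetime limits misuse
    .sign(long_term_key, hashes.SHA256())
)
print("proxy valid until", proxy_cert.not_valid_after)
```

A delegated entity would hold only the proxy key and certificate, so the damage from a compromised proxy is bounded by its lifetime and whatever purpose restrictions are embedded in it.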

  12. Grid Projects • EGEE - Enabling Grids for E-sciencE • 70 sites in over 27 countries • Mostly European • 40 Virtual Organizations • The GENIUS grid portal is used for submission • Individual collaborators use their own middleware tools to group resources

  13. LCG • Large Hadron Collider Computing Grid • Developed the distributed systems needed to support the computation and data needs of the LHC physics experiments • EGEE collaborator • 100 sites • World's largest Grid

  14. Grid 2003 • US effort • 27 national sites • 28000 processors, 13000 simultaneous jobs • Infrastructure for • Particle Physics Grid • Virtual Data Grid Laboratory • Developed an application grid laboratory - Grid3 • Platform for experimental CS research • Built on the Virtual Data Toolkit • A collection of Globus, Condor and other middleware tools

  15. TeraGrid • 40 teraflops of computational power • 8 national sites with a strong network backbone • Used for NSF-sponsored high-performance computing • Mapping the human arterial tree model • TeraShake - earthquake simulation

  16. Applications • Climate monitoring + simulation • Network Weather Service • Climate Data-Analysis Tool • Both run on the Earth System Grid, which runs on Globus • MEANDER nowcast meteorology • Runs on the Hungarian SuperGrid • ATLAS Challenge • Simulates high-energy proton-proton collisions • Computational science simulations • Biology, fluid dynamics

  17. Grid Tools • Many middleware implementations • Globus • Condor • Condor-G • Cactus-G • OGSA • These solve common Grid problems • Resource discovery/management/allocation • Security/authentication

  18. Condor • Initially developed in 1983 at the University of Wisconsin • A pre-Grid tool • A local resource management system • Allows creation of communities with distributed resources • Communities should grow naturally • Sharing as much or as little as they care to • Sounds like Virtual Organizations

  19. Condor • Responsibilities • Job management, scheduling • Resource monitoring and management • Checkpointing and migration • Utilizes idle CPUs • Cycle “scavenging”

  20. Condor Pool • The full set of users and resources in a community • Composed of three entities • Agent • Finds resources and executes jobs • Resource • Advertises itself and how it can be used in the pool • Matchmaker • Knows of all agents and resources • Puts together compatible pairs • A pool is defined by a single matchmaker

  21. Matchmaking • The problem with centralized scheduling: resources have multiple owners with unique usage requirements • Matchmaking finds a balance between user and resource needs • ClassAds • Agents advertise their requirements • Resources advertise how they can be used

  22. Matchmaking • The matchmaker scans all known ClassAds • Creates matching pairs of agents and resources • Informs both parties • Each party is then responsible for negotiating and initiating execution of the job • Separation of matching and claiming • The matchmaker is unaware of complicated allocation details • Stale information may exist; a resource can deny a match
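
As a rough illustration of ClassAd-style matchmaking, the sketch below pairs agent (job) ads with resource (machine) ads whose requirement predicates accept each other. The attribute names and the Python dict/lambda encoding are invented for the example; real Condor uses the ClassAd language.

```python
# Toy matchmaker: pair agent (job) ads with resource (machine) ads that accept
# each other. Attribute names are illustrative; real Condor uses ClassAds.
agent_ads = [
    {"owner": "alice", "memory_needed": 512,
     "requirements": lambda res: res["memory"] >= 512 and res["arch"] == "x86_64"},
]
resource_ads = [
    {"name": "node01", "memory": 256, "arch": "x86_64",
     "requirements": lambda job: True},
    {"name": "node02", "memory": 1024, "arch": "x86_64",
     "requirements": lambda job: job["memory_needed"] <= 1024},
]

def matchmake(agents, resources):
    """Scan all ads and yield (agent, resource) pairs that accept each other.

    The matchmaker only proposes matches; the two parties still claim the
    resource themselves, so a match based on stale information can be denied.
    """
    claimed = set()
    for agent in agents:
        for resource in resources:
            if resource["name"] in claimed:
                continue
            if agent["requirements"](resource) and resource["requirements"](agent):
                claimed.add(resource["name"])
                yield agent, resource
                break

for agent, resource in matchmake(agent_ads, resource_ads):
    print(f"match: {agent['owner']} -> {resource['name']}")   # alice -> node02
```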

  23. Condor Flocking • Linking Condor pools is necessary for collaboration • Sharing of resources beyond the organizational level • Individuals belonging to multiple communities • Gateway Flocking • Entire communities are linked • Direct Flocking • Individual collaborators belong to many pools

  24. Gateway Flocking • A gateway entity serves as a single point of access for cross-pool communication • Matchmakers talk to gateways • Gateways talk to gateways • Transparent to the user • Organizational-level sharing • Powerful, but difficult to set up and maintain

  25. Gateway Flocking

  26. Direct Flocking • Agents report to multiple matchmakers • Individual collaboration • A natural idea for users • Less powerful but simpler to build and deploy • Eventually adopted in favor of Gateway Flocking
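
Continuing the toy model above, direct flocking amounts to an agent submitting the same ad to more than one matchmaker; the pool names below are made up for illustration.

```python
# Toy view of direct flocking: one agent advertises to several matchmakers, so
# its jobs can be matched against resources in any of those pools. Pool names
# are invented; this is a conceptual sketch, not Condor's actual mechanism.
class Matchmaker:
    def __init__(self, pool_name):
        self.pool_name = pool_name
        self.agent_ads = []

    def advertise(self, ad):
        self.agent_ads.append(ad)

home_pool = Matchmaker("cs.example.edu")
remote_pool = Matchmaker("physics.example.edu")

job_ad = {"owner": "alice", "memory_needed": 512}

# The agent reports to both pools; each matchmaker schedules independently,
# which is why direct flocking is simpler to deploy than gateway flocking.
for matchmaker in (home_pool, remote_pool):
    matchmaker.advertise(job_ad)

print([m.pool_name for m in (home_pool, remote_pool) if job_ad in m.agent_ads])
```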

  27. Direct Flocking

  28. Cactus • General-purpose, open-source parallel computation framework • Developed for the numerical solution of Einstein's equations • Two main components: flesh and thorns • Flesh – the central core • Thorns – application modules • Provides a simple abstract API • Hides the MPI parallel driver, I/O, etc. (themselves thorns)
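
The flesh/thorn split is essentially a plugin architecture: a thin core schedules independently written modules. The following is a conceptual sketch in Python with invented names; Cactus itself is written in C/Fortran with its own configuration language, so this is not its real API.

```python
# Conceptual flesh/thorn sketch: the "flesh" is a thin core that schedules
# registered "thorns" (modules); thorns do not call each other directly.
# Illustration only, not Cactus's actual (C/Fortran) API.
class Flesh:
    def __init__(self):
        self.thorns = []

    def register_thorn(self, name, step_fn):
        # A real thorn also declares variables, parameters, and schedule bins;
        # here a thorn is reduced to a named callback over shared state.
        self.thorns.append((name, step_fn))

    def evolve(self, state, steps):
        for _ in range(steps):
            for name, fn in self.thorns:
                state = fn(state)
        return state

def wave_solver(state):
    return {**state, "u": state["u"] * 0.99}   # stand-in for a PDE update step

def io_thorn(state):
    print("u =", state["u"])                   # stand-in for an I/O thorn
    return state

flesh = Flesh()
flesh.register_thorn("wave_solver", wave_solver)
flesh.register_thorn("io_thorn", io_thorn)
flesh.evolve({"u": 1.0}, steps=3)
```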

  29. Cactus-G • “Grid-enabled” Cactus • Combines Cactus and MPICH-G2 (more later) • Layered approach • Application thorns • Grid-aware infrastructure thorns • Grid-enabled communication library (MPICH-G2 in this case)

  30. Globus • Condor • A pre-Grid tool applied to Grid systems • Multi-domain use is possible but limited • No security; the focus is primarily on resource management • Globus • A set of Grid-specific tools • Extensible and hierarchical

  31. The Toolkit • Globus Toolkit • Components for basic security, resource management, etc. • Well-defined interfaces - an “hourglass” architecture • Local services sit behind the API • Global services are built on top of these local services • Interfaces are useful for managing heterogeneity • The Information Service is an integral component • An information-rich environment is needed

  32. Globus Services

  33. Resource Management • Globus Resource Allocation Manager (GRAM) • Responsible for a set of local resources • Single domain • Implemented with a set of local RM tools • Condor, NQE, Fork, Easy-LL, etc. • Resource requests are expressed in the Resource Specification Language (RSL)

  34. Resource Broker • Manages RSL requests • Uses information services to discover GRAMs • Transforms abstract RSL into more specific requirements • Sends allocation requests to the appropriate GRAM
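
To give a feel for the flow, here is a rough sketch of a broker rendering an abstract request as an RSL-style string (the general &(attribute=value) form) and choosing a GRAM endpoint. The attributes, host names, and selection heuristic are assumptions made for the example; in Globus the endpoint list would come from the information service.

```python
# Rough broker sketch: refine an abstract request into an RSL-style string and
# pick a GRAM to send it to. Hosts, attributes, and the selection policy are
# invented for illustration; real deployments discover GRAMs via MDS.
abstract_request = {"executable": "/bin/hostname", "count": 32}

discovered_grams = [                       # hypothetical GRAM endpoints
    {"contact": "gram.site-a.example.org", "free_cpus": 16},
    {"contact": "gram.site-b.example.org", "free_cpus": 64},
]

def to_rsl(request):
    """Render a dict as an RSL-style '&(attr=value)(attr=value)' string."""
    return "&" + "".join(f"({key}={value})" for key, value in request.items())

def choose_gram(request, grams):
    """Pick the first GRAM with enough free CPUs (a toy brokering policy)."""
    for gram in grams:
        if gram["free_cpus"] >= request["count"]:
            return gram
    raise RuntimeError("no suitable GRAM found")

gram = choose_gram(abstract_request, discovered_grams)
print("submit", to_rsl(abstract_request), "to", gram["contact"])
# -> submit &(executable=/bin/hostname)(count=32) to gram.site-b.example.org
```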

  35. Information Service • The Grid is always in flux • An information-rich system produces information users find useful • Enhances flexibility and performance • A necessity for administration • Globus Metacomputing Directory Service (MDS) • Stores and makes accessible Grid information • Lightweight Directory Access Protocol (LDAP) • Extensible representation for information • Stores component information in a directory information tree
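
Since MDS exposes its data through LDAP, it can in principle be queried with any LDAP client. The sketch below uses the python-ldap package; the host, port, base DN, and filter are placeholders rather than a real MDS deployment.

```python
# Query an LDAP-based information service (MDS exposes Grid information via
# LDAP). The host, port, base DN, and filter are placeholders, not a real
# MDS installation. Requires the python-ldap package.
import ldap

conn = ldap.initialize("ldap://mds.example.org:2135")   # hypothetical MDS server
base_dn = "o=Grid"                                      # placeholder base of the tree
results = conn.search_s(base_dn, ldap.SCOPE_SUBTREE, "(objectClass=*)")

# Each result is a node of the directory information tree: a distinguished
# name plus a dict of attributes describing some Grid component.
for dn, attributes in results:
    print(dn, attributes)
```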

  36. Security • Local heterogeneity • Resources operate in multiple security domains • All may use different authentication techniques • N-way authentication • A job may be any number of processes on any number of resources • It is one logical entity; the user should only authenticate once

  37. Security • Grid Security Infrastructure (GSI) • Modular design constructed on top of local services • Solves local heterogeneity • Globus identity • Mapped into local user identities by the local GSI • Allows for n-way authentication
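
The mapping from a global Globus identity to a local account is traditionally a simple lookup in the style of a grid-mapfile, where each line pairs a quoted certificate subject (DN) with a local username. The entries below are made up, and the parsing is a sketch, not Globus code.

```python
# Sketch: map a global certificate identity (DN) to a local username, in the
# spirit of a Globus grid-mapfile. The DNs and usernames are invented.
GRIDMAP_TEXT = '''
"/C=US/O=Example Org/CN=Alice User" alice
"/C=US/O=Example Org/CN=Bob Researcher" bob
'''

def parse_gridmap(text):
    """Parse lines of the form: "<certificate DN>" <local username>."""
    mapping = {}
    for line in text.strip().splitlines():
        dn, _, local_user = line.rpartition(" ")
        mapping[dn.strip().strip('"')] = local_user
    return mapping

gridmap = parse_gridmap(GRIDMAP_TEXT)
print(gridmap["/C=US/O=Example Org/CN=Alice User"])   # -> alice
```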

  38. OGSA • Open Grid Services Architecture • Defines a Grid Service • Provides standard interfaces for naming, creating, and discovering Grid Services • Location transparent • Globus Toolkit • GRAM – resource allocation/management • MDS-2 – information discovery • GSI – authentication (single sign-on) • Web services • Widely used • Language/system independent

  39. OGSA – Grid Service Interface

  40. OGSA – VO Structure

  41. Condor-G • Hybrid Condor-Globus system • Local Condor agent (Condor-G) • Communicates with Globus GRAM, MDS, GSI, etc. • Globus's GRAM was optimized to work better with Condor

  42. Specific Testbed • Grid2003 • Organized into 6 VOs (one for each application) • At each VO site, middleware is installed along with grid certificate databases • GSI, GRAM, and GridFTP are used from Globus • MDS • MonALISA • Agent-based monitoring used in conjunction with MDS

  43. MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface Nicholas Karonis, Brian Toonen, Ian Foster

  44. Abstract • Grid-enabled MPI implementation • Extends MPICH • Utilizes the Globus Toolkit • Authentication, authorization, resource allocation, executable staging, I/O, process creation, monitoring and control • Hides or exposes critical aspects of the heterogeneous environment

  45. The Problem • Grids are difficult to program for… • heterogeneous, highly distributed • Build on an existing MPI API • MPICH specifically • Can we implement MPI constructs in a highly heterogeneous environment efficiently and transparently? • Yes, use Globus! • Can we also allow users to manage heterogeneity? • Yes, with the existing MPI communicator construct!
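
As a hint of how the communicator construct lets applications manage heterogeneity, the sketch below (written with mpi4py, not MPICH-G2 itself) splits MPI_COMM_WORLD into per-site communicators using a hypothetical SITE_ID environment variable; MPICH-G2 instead attaches topology information to communicators so the grouping can be discovered by the program.

```python
# Sketch: group processes by "site" with MPI communicators so communication
# can be staged inside a site before crossing slow wide-area links.
# Uses mpi4py; SITE_ID is a hypothetical variable a launcher would set per site.
import os
from mpi4py import MPI

world = MPI.COMM_WORLD
rank = world.Get_rank()

site_id = int(os.environ.get("SITE_ID", "0"))       # hypothetical per-site id
site_comm = world.Split(color=site_id, key=rank)    # one communicator per site

# Reduce within each site first, then combine the per-site totals through one
# representative (local rank 0) from each site.
site_total = site_comm.allreduce(rank + 1, op=MPI.SUM)

leader_color = 0 if site_comm.Get_rank() == 0 else MPI.UNDEFINED
leaders = world.Split(color=leader_color, key=rank)
if leaders != MPI.COMM_NULL:
    grand_total = leaders.allreduce(site_total, op=MPI.SUM)
    print(f"site {site_id}: site_total={site_total}, grand_total={grand_total}")
```

Run, for example, with mpiexec -n 4 python wide_area_sum.py after exporting SITE_ID for each group of processes (script name and variable are illustrative).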
