The Grid - Multi-Domain Distributed Computing • Kai Rasmussen • Paul Ruggieri
Topic Overview • The Grid • Types • Virtual Organizations • Security • Real Examples • Grid Tools • Condor • Cactus • Cactus-G • Globus • OGSA
The Grid • What is a Grid system? • Highly heterogeneous set of resources that may or may not be maintained by multiple administrative domains • Early idea • Computational resources would be as universally available as electric power
“A hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities” - Ian Foster • Resources are distributed across sites and organizations with no centralized point of control • What constitutes a Grid? • Resources are coordinated without being subject to centralized control • Uses standard, open protocols and interfaces • Delivers non-trivial qualities of service
Grid Types • Computational Grids • Resource is pure CPU • Strength: computationally intensive applications • Data Grids • Shared storage and data • Terabytes of storage space • Sharing of data among collaborators • Fault tolerance • Equipment Grids • Set of resources that surround shared equipment, such as a telescope
Virtual Organizations • Grids are multi-domain • Resources are administered by separate departments or institutions • All wish to maintain individual control • There is a cross-site grouping of collaborators sharing resources • “Virtual Organization”
Virtual Organizations • Users of VOs share a common goal and trust • Collection of resources, users and rules governing sharing • Highly controlled - What is shared? Who is sharing? How can resources be used? • One global domain acting over individual collaborating domains
Grid Security • Highly distributed nature • VOs spread over many security domains • Authentication • Proving identity • Authorization • Obtaining privileges • Confidentiality & Integrity • Identity and privileges can be trusted
Authentication • Certificate Authority (CA) • Entity that signs a certificate proving a user’s identity • Certificate is then used as the credential to use the system • Typically several CAs, to prevent a single point of failure/attack • Globus Grid Security Infrastructure (GSI) • Globus’s authentication component • Global security credential is later mapped to a local one • Kerberos tickets or local username and password • Typically a short-term proxy certificate is generated from the long-term certificate
Authentication • Certification Authority Coordination Group • Maintains a global infrastructure of trusted CA agents • CA must meet standards • Physically secure • Must validate identity with Registration Authorities using official documents or photographic identification • Private keys must be a minimum of 1024 bits and have a maximum lifetime of 1 year • 28 approved CAs in the European Union
Security Issues • Delegation • User entrusts a separate entity to perform a task • Entity must be given certification and trusted to behave • Limit the proxy’s strength • Endow the proxy with a specific purpose
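The idea of a limited proxy can be sketched in a few lines. This is only a toy model under stated assumptions: the `Credential` class, the operation names and the `/CN=proxy` suffix are illustrative, and no real cryptography or GSI API is involved. It shows that a delegated credential is never stronger or longer-lived than its parent.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Credential:
    """Hypothetical stand-in for a certificate: who, what, until when."""
    subject: str
    allowed: frozenset   # operations the holder may perform
    expires: datetime

    def delegate(self, purpose, lifetime, now):
        """Issue a proxy restricted to `purpose` and capped by the parent's expiry."""
        return Credential(
            subject=self.subject + "/CN=proxy",
            allowed=self.allowed & frozenset(purpose),   # cannot gain new rights
            expires=min(self.expires, now + lifetime),   # cannot outlive the parent
        )

    def permits(self, operation, now):
        return operation in self.allowed and now < self.expires


now = datetime(2005, 1, 1)
user = Credential(
    "/O=Grid/CN=Alice",
    frozenset({"submit_job", "read_data", "write_data"}),
    now + timedelta(days=365),
)
# The proxy asks for one right the user has and one the user lacks.
proxy = user.delegate({"read_data", "delete_data"}, timedelta(hours=12), now)
```

Here the proxy may read data for twelve hours and nothing else; the request for `delete_data` is dropped because the parent never held that right.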
Grid Projects • EGEE - Enabling Grids for E-sciencE • 70 sites in over 27 countries • Mostly European • 40 Virtual Organizations • GENIUS Grid-Portal is used for submission • Individual collaborators use their own middleware tools to group resources
LCG • Large Hadron Collider Computing Grid • Develops the distributed systems needed to support the computation and data needs of LHC physics experiments • EGEE collaborator • 100 sites • World’s largest Grid
Grid 2003 • US effort • 27 National sites • 28000 Processors, 13000 Simultaneous Jobs • Infrastructure for • Particle Physics Grid • Virtual Data Grid Laboratory • Develop Application Grid Laboratory - Grid3 • Platform for experimental CS Research • Built on Virtual Data Toolkit • Collection of Globus, Condor and other middleware tools
TeraGrid • 40 Teraflops of Computational Power • 8 National Sites with strong backbone • Used for NSF sponsored High Performance Computing • Mapping the human arterial tree model • TeraShake - Earthquake simulation
Applications • Climate Monitoring + Simulation • Network Weather Service • Climate Data-Analysis Tool • Both run on the Earth System Grid running on Globus • MEANDER nowcast meteorology • Run on Hungarian Supergrid • ATLAS Challenge • Simulate high energy proton-proton collisions • Computational Science Simulations • Biology, Fluid Dynamics
Grid Tools • Many middleware implementations • Globus • Condor • Condor-G • Cactus-G • OGSA • Solves common Grid problems • Resource discovery/management/allocation • Security/Authentication
Condor • Initially developed in 1983 at University of Wisconsin • Pre-Grid tool • A Local Resource Management System • Allows creation of communities with distributed resources • Communities should grow naturally • Sharing as much or as little as they care to • Sounds like Virtual Organizations
Condor • Responsibilities • Job management, scheduling • Resource monitoring and management • Checkpointing and migration • Utilizes idle CPU • Cycle “scavenging”
Condor Pool • Full set of users and resources in a community • Composed of three entities • Agent • Finds resources and executes jobs • Resource • Advertises itself and how it can be used in the pool • Matchmaker • Knows of all agents and resources • Puts together compatible pairs • Pool is defined by a single matchmaker
Matchmaking • Problem of centralized scheduling • Resources have multiple owners • Unique use requirements • Matchmaking finds a balance between user and resource needs • ClassAds • Agents advertise their requirements • Resources advertise how they can be used
Matchmaking • Matchmaker scans all known ClassAds • Creates matching pairs of agents and resources • Informs both parties • The two parties are individually responsible for negotiating and initiating execution of the job • Separation of matching and claiming • Matchmaker is unaware of complicated allocation • Stale information may exist, so a resource can deny a match
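The two-sided matching and the separate claiming step can be sketched as follows. This is a minimal illustration, not the ClassAd language itself: ads are plain Python dicts, requirements are lambdas, and the names `matchmake`, `claim`, and the `busy` flag are assumptions made for the example.

```python
def matches(job, machine):
    """A pair is compatible only if each side's requirements accept the other."""
    return job["requirements"](machine) and machine["requirements"](job)


def matchmake(jobs, machines):
    """Pair each job ad with the first still-free machine ad that is mutually acceptable."""
    free = list(machines)
    pairs = []
    for job in jobs:
        for machine in free:
            if matches(job, machine):
                pairs.append((job["name"], machine["name"]))
                free.remove(machine)
                break
    return pairs


def claim(machine, job):
    """Claiming: the resource re-checks its current state and may refuse a stale match."""
    return not machine.get("busy", False) and machine["requirements"](job)


jobs = [
    {"name": "sim", "owner": "alice",
     "requirements": lambda m: m["os"] == "LINUX" and m["memory_mb"] >= 512},
    {"name": "render", "owner": "bob",
     "requirements": lambda m: m["memory_mb"] >= 2048},
]
machines = [
    {"name": "ws1", "os": "LINUX", "memory_mb": 1024,
     "requirements": lambda j: j["owner"] != "bob"},   # owner policy
    {"name": "ws2", "os": "SOLARIS", "memory_mb": 4096,
     "requirements": lambda j: True},
]

pairs = matchmake(jobs, machines)

# Between matching and claiming, ws1's owner returns to the keyboard.
machines[0]["busy"] = True
claimed = claim(machines[0], jobs[0])
```

The matchmaker only proposes pairs. Because `claim` runs later against the machine's current state, a match built on stale information is simply refused.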
Condor Flocking • Linking Condor pools is necessary for collaboration • Sharing of resources beyond the organizational level • Individuals belonging to multiple communities • Gateway Flocking • Entire communities are linked • Direct Flocking • Individual collaborators belong to many pools
Gateway Flocking • Gateway entity serves as a single point of access for cross-pool communication • Matchmakers talk to Gateways • Gateways talk to Gateways • Transparent to the user • Organizational-level sharing • Powerful, but difficult to set up and maintain
Direct Flocking • Agents report to multiple matchmakers • Individual collaboration • Natural idea for users • Less powerful but simpler to build and deploy • Eventually adopted in preference to Gateway Flocking
Cactus • General-purpose, open-source parallel computation framework • Developed for the numerical solution of Einstein’s equations • Two main components: flesh and thorns • Flesh – central core • Thorns – application modules • Provides a simple abstract API • Hides the MPI parallel driver and I/O (themselves thorns)
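The flesh/thorn split is essentially a small core that schedules pluggable modules. The sketch below is a loose analogy in Python, not Cactus's actual C/Fortran API: the `Flesh` class, the `evolve` and `output` thorns, and the shared `state` dict are all hypothetical names chosen for illustration.

```python
class Flesh:
    """Central core: holds a registry of thorns and calls them in order each step."""

    def __init__(self):
        self.thorns = []

    def register(self, thorn):
        self.thorns.append(thorn)

    def run(self, state, steps):
        for _ in range(steps):
            for thorn in self.thorns:
                thorn(state)
        return state


def evolve(state):
    """Application thorn: one explicit Euler step of du/dt = -u."""
    state["u"] = [x + state["dt"] * (-x) for x in state["u"]]
    state["iteration"] += 1


def output(state):
    """Infrastructure thorn: records a summary instead of doing real I/O."""
    state["log"].append((state["iteration"], max(state["u"])))


flesh = Flesh()
flesh.register(evolve)
flesh.register(output)
state = flesh.run({"u": [1.0, 2.0], "dt": 0.5, "iteration": 0, "log": []}, steps=2)
```

The application thorn never touches I/O or parallelism directly. Swapping `output` for a different infrastructure thorn leaves `evolve` unchanged, which is the property Cactus-G relies on when it substitutes Grid-aware thorns.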
Cactus-G • “Grid-enabled” Cactus • Combines Cactus and MPICH-G2 (more later) • Layered approach • Application thorns • Grid-aware infrastructure thorns • Grid-enabled communication library (MPICH-G2 in this case)
Globus • Condor • Pre-Grid tool applied to Grid Systems • Multi-domain possible but limited • No security. Focus primarily on resource management • Globus • Set of Grid specific tools • Extendable and Hierarchical
The Toolkit • Globus Toolkit • Components for basic security, resource management, etc • Well defined interfaces - “Hour-glass” architecture • Local services sit behind API • Global services built on top of these local services • Interfaces useful to manage heterogeneity • Information Service integral component • Information-rich environment needed
Resource Management • Globus Resource Allocation Manager (GRAM) • Responsible for a set of local resources • Single domain • Implemented with a set of local RM tools • Condor, NQE, Fork, Easy-LL, etc… • Resource requests expressed in Resource Specification Language (RSL)
Resource Broker • Manages RSL requests • Uses Information Services to discover GRAMs • Transforms abstract RSLs into more specific requirements • Sends allocation requests to the appropriate GRAM
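A broker's refinement step can be pictured roughly as below. This is a sketch under assumptions: the directory records, the contact host names and the `broker` function are hypothetical, and the output only imitates the `&(attribute=value)` conjunction style of RSL rather than covering the full language.

```python
def to_rsl(attrs):
    """Render attribute/value pairs as an RSL-style conjunction string."""
    return "&" + "".join(f"({key}={value})" for key, value in attrs.items())


def broker(request, directory):
    """Pick the first GRAM whose advertised state satisfies the abstract request.

    Returns (gram_contact, concrete_rsl), or None if no GRAM qualifies.
    """
    for gram in directory:
        if gram["arch"] == request["arch"] and gram["free_cpus"] >= request["count"]:
            concrete = {
                "executable": request["executable"],
                "count": request["count"],
                "minMemory": request["min_memory_mb"],
            }
            return gram["contact"], to_rsl(concrete)
    return None


# What an information service might report (illustrative values only).
directory = [
    {"contact": "gram.site-a.example", "arch": "sparc", "free_cpus": 64},
    {"contact": "gram.site-b.example", "arch": "x86", "free_cpus": 8},
    {"contact": "gram.site-c.example", "arch": "x86", "free_cpus": 32},
]
request = {"executable": "/bin/sim", "arch": "x86", "count": 16, "min_memory_mb": 256}

result = broker(request, directory)
```

The abstract request names an architecture and a processor count. The broker resolves it to one specific GRAM and a concrete request that GRAM can act on.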
Information Service • Grid is always in flux • Information-rich system produces information users find useful • Enhances flexibility and performance • Necessary for administration • Globus Metacomputing Directory Service (MDS) • Stores Grid information and makes it accessible • Lightweight Directory Access Protocol (LDAP) • Extensible representation for information • Stores component information in a directory information tree
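A directory information tree can be approximated as entries keyed by distinguished names, searched from a base node downward. The sketch below is an in-memory toy, not an LDAP client: the `DirectoryTree` class, the attribute names and the example DNs are assumptions for illustration.

```python
class DirectoryTree:
    """Toy directory information tree: DN -> attributes, with subtree search."""

    def __init__(self):
        # Keys are tuples of RDNs, most specific first, as in an LDAP DN.
        self.entries = {}

    def add(self, dn, **attrs):
        self.entries[tuple(dn.split(", "))] = attrs

    def search(self, base, **wanted):
        """Return DNs at or below `base` whose attributes equal every `wanted` value."""
        base_rdns = tuple(base.split(", "))
        hits = []
        for dn, attrs in self.entries.items():
            under_base = dn[-len(base_rdns):] == base_rdns
            if under_base and all(attrs.get(k) == v for k, v in wanted.items()):
                hits.append(", ".join(dn))
        return hits


tree = DirectoryTree()
tree.add("hn=node1, o=SiteA, o=Grid", cpus=4, os="linux")
tree.add("hn=node2, o=SiteA, o=Grid", cpus=16, os="linux")
tree.add("hn=node3, o=SiteB, o=Grid", cpus=16, os="solaris")

site_a_linux = tree.search("o=SiteA, o=Grid", os="linux")
all_16_cpu = tree.search("o=Grid", cpus=16)
```

Because each site owns its own subtree, a query can be scoped to one organization or widened to the whole Grid just by changing the search base.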
Security • Local Heterogeneity • Resources operated in multiple security domains • All use different authentication techniques • N-Way authentication • Job may be any number of processes on any number of resources • One logical entity. User should only authenticate once.
Security • Globus Grid Security Infrastructure (GSI) • Modular design constructed on top of local services • Solves local heterogeneity • Globus Identity • Mapped into local user identities by the local GSI • Allows for n-way authentication
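The mapping from a global identity to a local account can be sketched with a grid-mapfile-style table, where each line pairs a quoted certificate subject with a local user name. This is a simplified parser written for illustration (one local user per line, no wildcard or multi-user entries); the subjects and account names are made up.

```python
import shlex


def parse_gridmap(text):
    """Parse lines of the form: "<certificate subject>" <local user>."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        subject, local_user = shlex.split(line)   # shlex handles the quoted subject
        mapping[subject] = local_user
    return mapping


def local_identity(mapping, subject):
    """Return the local account for a global identity, or None if unmapped."""
    return mapping.get(subject)


# Two sites map the same global identity to different local accounts.
site_a = parse_gridmap('''
# Site A
"/O=Grid/OU=Physics/CN=Alice Smith" asmith
''')
site_b = parse_gridmap('''
# Site B
"/O=Grid/OU=Physics/CN=Alice Smith" grid007
"/O=Grid/OU=Physics/CN=Bob Jones" grid008
''')

alice = "/O=Grid/OU=Physics/CN=Alice Smith"
accounts = (local_identity(site_a, alice), local_identity(site_b, alice))
```

The user authenticates once with the global identity. Each site then applies its own table, so local account naming stays under local control.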
OGSA • Open Grid Services Architecture • Defines a Grid Service • Provides standard interface for naming, creating, discovering a Grid Service • Location Transparent • Globus Toolkit • GRAM – resource allocation/management • MDS-2 – information discovery • GSI – authentication (single sign-on) • Web services • Widely used • Language/system independent
Condor-G • Hybrid Condor-Globus System • Local Condor agent (Condor-G) • Communicates with Globus GRAM, MDS, GSI, etc • Optimized Globus’s GRAM to work with Condor better
Specific Testbed • Grid2003 • Organized into 6 VOs (one for each application) • At each VO site, middleware installed with grid certificate databases • GSI, GRAM, and GridFTP used from Globus • MDS • MonALISA • Agent-based monitoring used in conjunction with MDS
MPICH-G2: A Grid-Enabled Implementation of the Message Passing Interface Nicholas Karonis, Brian Toonen, Ian Foster
Abstract • Grid-enabled MPI implementation • Extends MPICH • Utilizes the Globus Toolkit • Authentication, authorization, resource allocation, executable staging, I/O, process creation, management and control • Hides or exposes critical aspects of the heterogeneous environment
The Problem • Grids difficult to program for… • heterogeneous, highly distributed • Build on existing MPI API • MPICH specifically • Can we implement MPI constructs in a highly heterogeneous environment efficiently and transparently? • Yes, use Globus! • Can we also allow users to manage heterogeneity? • Yes, existing MPI Communicator Construct!
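The way a communicator can expose heterogeneity is easiest to see through the grouping rule of `MPI_Comm_split`: ranks that pass the same color end up in the same new communicator. The sketch below imitates only that rule in plain Python so it runs without an MPI installation; the `comm_split` helper and the site names are hypothetical, and it does not call MPICH-G2.

```python
def comm_split(colors, keys=None):
    """Imitate MPI_Comm_split's grouping: equal colors share a new communicator.

    Within each group, new ranks are ordered by key, with ties broken by old rank.
    Returns {old_rank: (color, new_rank)}.
    """
    n = len(colors)
    keys = keys if keys is not None else [0] * n
    groups = {}
    for rank in range(n):
        groups.setdefault(colors[rank], []).append(rank)
    layout = {}
    for color, members in groups.items():
        members.sort(key=lambda r: (keys[r], r))
        for new_rank, old_rank in enumerate(members):
            layout[old_rank] = (color, new_rank)
    return layout


# Five processes spread over two sites; the site serves as the color.
site_of_rank = ["siteA", "siteA", "siteB", "siteA", "siteB"]
layout = comm_split(site_of_rank)
```

With one communicator per site, an application can keep heavy collective traffic on the local network and cross the wide-area link only when it has to.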