Learn how Condor leverages idle workstations to maximize job throughput without dedicated hardware. Understand Condor architecture, matchmaking, and checkpointing to optimize resource allocation.
Grid Computing 2 http://www.globus.org http://www.cs.virginia.edu/~legion/ http://www.cs.wisc.edu/condor/ (thanks to shava and holly [see notes for CSE 225]) CSE 160/Berman
Outline • Today: • Condor • Globus • Legion • Next class: • Talk by Marc Snir, Architect of IBM’s Blue Gene • Tuesday June 6, AP&M 4301 1:00-2:00
Condor • Condor is a high-throughput scheduler • Main idea is to leverage free cycles on very large collections of privately owned, non-dedicated desktop workstations • Performance measure is job throughput • Not how fast a particular job can run, but how many jobs can complete over a long period of time • Developed by Miron Livny et al. at the University of Wisconsin
Condor Basics • Condor = “hunter of idle workstations” • Condor pool consists of a large number of privately controlled UNIX workstations (WS) • (Condor now being ported to NT) • WS owners define the conditions under which the WS can be allocated by Condor to an external user • External Condor jobs run while machines are idle • User does not need a login on participating machines • Uses remote system calls back to the submitting WS
Condor Architecture (all machines in same Condor Pool) • Diagram components: Central Manager; Submission Machine (Schedd, Shadow); Execution Machine (Startd, Starter, User Process) • Each WS runs Schedd and Startd daemons • Startd monitors and terminates jobs assigned by CM • Schedd queues jobs submitted to Condor at that WS and seeks resources for them • Central Manager (CM) WS controls allocation and execution for all jobs
Standard Condor Protocol (all machines in same Condor Pool) • Diagram components: Central Manager; Submission Machine (Schedd, Shadow); Execution Machine (Startd, Starter, User Process) • Schedd (submitting machine) sends job context to CM; execution machine sends machine context to CM • CM identifies a match between job requirements and execution machine resources • CM sends to Schedd the execution machine ID • Schedd forks a Shadow process on submission machine • Shadow passes job requirements to Startd on execution machine and gets acknowledgement that execution machine is still idle • Shadow sends executable to execution machine where it executes until completion or migration
More Condor Basics • Participating Condor machines are not required to share file systems • No source code changes are required to use Condor, but users must re-link their programs in order to use checkpointing and migration • Vanilla jobs vs. Condor jobs • Condor jobs are allocated to a good target resource using a matchmaker • Single Condor jobs are automatically checkpointed, migrated between WSs, and restarted as needed
Condor Remote System Call Strategy • Diagram, before allocation: the submitted process and its file both reside on the submission WS • Diagram, after allocation: the allocated process runs on the execution WS, while the shadow process and the file remain on the submission WS • Job must be able to read and write files on its submit workstation
Condor Matchmaking • Matchmaking mechanism matches job specs to machine characteristics • Matchmaking done using classads • Resources produce resource offer ads • Include information such as available RAM, CPU type and speed, virtual memory size, physical location, current load average, etc. • Jobs provide a resource request ad which defines the required and desired set of resources to run on • Condor acts as a broker which matches and ranks resource offer ads against resource request ads • Condor makes sure that all requirements in both ads are satisfied • Priorities of users and certain types of ads also taken into consideration
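The two-sided matching described on this slide can be sketched in a few lines of Python. This is a toy model, not Condor's classad language: the `ClassAd` dataclass, the `match` function, and all attribute names (`memory_mb`, `arch`, `owner`, ...) are assumptions made up for illustration. Requirements are modeled as predicates over the other party's attributes, and rank as a preference score.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ClassAd:
    """Toy ad: attributes plus a constraint and a preference over the other party."""
    attrs: dict
    requirements: Callable[[dict], bool]
    rank: Callable[[dict], float] = lambda other: 0.0


def match(job: ClassAd, offers: List[ClassAd]) -> Optional[ClassAd]:
    """Return the offer the job ranks highest among those where BOTH sides' requirements hold."""
    compatible = [m for m in offers
                  if job.requirements(m.attrs) and m.requirements(job.attrs)]
    if not compatible:
        return None
    return max(compatible, key=lambda m: job.rank(m.attrs))


# Resource offer ads (hypothetical machines and policies).
machines = [
    ClassAd({"name": "ws1", "memory_mb": 64, "arch": "INTEL", "load_avg": 0.1},
            requirements=lambda job: job["owner"] != "guest"),
    ClassAd({"name": "ws2", "memory_mb": 128, "arch": "INTEL", "load_avg": 0.9},
            requirements=lambda job: True),
    ClassAd({"name": "ws3", "memory_mb": 256, "arch": "SPARC", "load_avg": 0.0},
            requirements=lambda job: True),
]

# Resource request ad: requires an INTEL machine with enough memory, prefers more memory.
job = ClassAd({"owner": "alice", "image_size_mb": 48},
              requirements=lambda m: m["arch"] == "INTEL" and m["memory_mb"] >= 48,
              rank=lambda m: m["memory_mb"])

best = match(job, machines)
print(best.attrs["name"])
```

Here ws1 and ws2 both satisfy the job, and the job's rank prefers the machine with more memory. User priorities, which the slide also mentions, are left out of the sketch.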
Condor Checkpointing • When WS owner returns, job can be checkpointed and restarted on another WS • Periodic checkpoint feature can periodically checkpoint the job so that work is not lost should the job be migrated • Condor jobs vs. “vanilla” jobs • Condor job executables must be relinked and can be checkpointed, migrated and restarted • Vanilla jobs are not relinked and cannot be checkpointed and migrated
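A minimal sketch of periodic checkpointing and restart, assuming an application-level analogue: the job saves its own state with `pickle`. Condor itself checkpoints the process image of a relinked executable, which is a different mechanism. The names `run_job`, `save_checkpoint`, `vacate_at`, and the interval `CHECKPOINT_EVERY` are hypothetical.

```python
import os
import pickle
import tempfile

CHECKPOINT_EVERY = 10  # steps between periodic checkpoints (illustrative value)


def save_checkpoint(state, path):
    """Write to a temp file, then rename, so a crash never leaves a torn checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, path)


def run_job(total_steps, ckpt_path, vacate_at=None):
    """Sum 0..total_steps-1, resuming from ckpt_path if a checkpoint exists.

    vacate_at simulates the WS owner returning: the job stops at that step
    and returns None, leaving only the last periodic checkpoint behind.
    """
    if os.path.exists(ckpt_path):
        with open(ckpt_path, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if vacate_at is not None and state["step"] == vacate_at:
            return None
        state["acc"] += state["step"]
        state["step"] += 1
        if state["step"] % CHECKPOINT_EVERY == 0:
            save_checkpoint(state, ckpt_path)
    return state["acc"]


with tempfile.TemporaryDirectory() as workdir:
    ckpt = os.path.join(workdir, "job.ckpt")
    first = run_job(100, ckpt, vacate_at=25)   # vacated; steps 20-24 are lost
    second = run_job(100, ckpt)                # restarts from step 20 and finishes

print(first, second)
```

Only the work done since the last checkpoint is repeated after the restart, which is the point of the periodic checkpoint feature.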
Condor Checkpointing Limitations • Only single process jobs supported • Inter-process communication not supported (socket, send, recv, etc. not implemented) • All file operations must be idempotent (read-only and write-only access work correctly; reading and writing the same file may not) • Disk space must be available to store the checkpoint file on the submitting machine • Each checkpointed job has an associated checkpoint file which is approximately the size of the address space of the process
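The idempotency limitation is easiest to see by replaying a file operation, which is what happens when a job rolls back to a checkpoint taken before the operation. The sketch below is an illustration only; the helper names `increment_in_place` and `write_result` are made up.

```python
import os
import tempfile


def increment_in_place(path):
    """Read-modify-write on the same file: NOT idempotent."""
    with open(path) as f:
        value = int(f.read())
    with open(path, "w") as f:
        f.write(str(value + 1))


def write_result(path, value):
    """Write-only output: replaying it leaves the same file contents."""
    with open(path, "w") as f:
        f.write(str(value))


with tempfile.TemporaryDirectory() as workdir:
    counter = os.path.join(workdir, "counter.txt")
    with open(counter, "w") as f:
        f.write("0")
    increment_in_place(counter)   # executed after the last checkpoint
    increment_in_place(counter)   # replayed after rollback to that checkpoint
    with open(counter) as f:
        non_idempotent = int(f.read())   # intended 1

    out = os.path.join(workdir, "out.txt")
    write_result(out, 1)          # executed after the last checkpoint
    write_result(out, 1)          # replayed after rollback
    with open(out) as f:
        idempotent = int(f.read())

print(non_idempotent, idempotent)
```

The replayed read-modify-write leaves the counter at 2 instead of 1, while the replayed write-only operation is harmless.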
Condor-PVM and Parallel Jobs • PVM master/slave jobs can be submitted to a Condor pool (special condor-pvm universe) • Master is run on machine where the job was submitted • Slaves pulled from the Condor pool as they become available • Condor acts as resource manager for PVM daemon • Whenever PVM program asks for nodes, request is remapped to Condor • Condor finds machine in Condor pool and adds it to PVM virtual machine
Condor and the Grid • Condor and the Alliance • Condor one of the Grid technologies deployed by the Alliance • Used for production high-throughput computing by partners • Condor and Globus • Globus can use Condor as a local resource manager. • Globus RSL specs translated into matchmaker classads
Condor and the Grid • Flock of Condors • Aggregation of Condor pools into a “flock” enables Condor pools to cross load-sharing and protection boundaries • Condor flock may include Condor pools connected by wide-area networks • Infrastructure • Idea is to add a Gateway (GW) machine for every pool • Gateway machines act as resource brokers for machines external to a pool • In published description, GW machine presents randomly chosen external pools/machines • CM does not need to know about flocking • Each GW machine runs GW-startd and GW-schedd as with a single Condor pool
Flocking Protocol (machines in different pools) • Diagram labels: Submission Pool and Execution Pool, each with a Central Manager and a Gateway Machine • Submission Machine (Schedd, Shadow) • Execution Machine (Startd, Starter, User Process) • Gateway daemons: GW-Startd, GW-Startd child, GW-Schedd, GW-Simulate-Shadow
Globus • Globus -- integrated toolkit of Grid services • Developed by Ian Foster (ANL/UC) and Carl Kesselman (USC/ISI) • Bag of services model – applications can use Grid services without having to adopt a particular programming model
Core Globus Services • Resource allocation and process management (GRAM, DUROC, RSL) • Information Infrastructure (MDS) • Security (GSI) • Communication (Nexus) • Remote Access (GASS, GEM) • Fault Detection (HBM) • QoS (GARA, Gloperf)
Globus Layered Architecture • Applications • High-level Services and Tools: GlobusView, Testbed Status, DUROC, MPI, MPI-IO, CC++, Nimrod/G, globusrun • Core Services: Nexus, GRAM, Metacomputing Directory Service, Globus Security Interface, Heartbeat Monitor, Gloperf, GASS • Local Services: Condor, MPI, TCP, UDP, LSF, Easy, NQE, AIX, Irix, Solaris
Globus Resource Management Services • Resource Management services provide mechanism for remote job submission and management • 3 low level services: • GRAM (Globus Resource Allocation Manager) • Provides remote job submission and management • DUROC (Dynamically Updated Request Online Co-allocator) • Provides simultaneous job submission • Layers on top of GRAM • RSL (Resource Specification Language) • Language used to communicate resource requests
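To give a feel for what an RSL request looks like, here is a small sketch that builds request strings from Python dicts. It assumes the conjunction form `&(attribute=value)...` for a single request and the `+(...)(...)` form for a multi-request of the kind a co-allocator such as DUROC would handle. The helpers `to_rsl` and `to_multirequest` are hypothetical, the attribute names are illustrative, and details such as quoting rules should be checked against the RSL specification.

```python
def rsl_value(value):
    """Quote strings; leave numbers bare (a simplifying assumption about RSL quoting)."""
    return f'"{value}"' if isinstance(value, str) else str(value)


def to_rsl(request):
    """Render one request as a conjunction of (attribute=value) relations."""
    return "&" + "".join(f"({k}={rsl_value(v)})" for k, v in request.items())


def to_multirequest(requests):
    """Render several requests as one multi-request for simultaneous submission."""
    return "+" + "".join(f"({to_rsl(r)})" for r in requests)


single = to_rsl({"executable": "/bin/hostname", "count": 4})
multi = to_multirequest([
    {"executable": "/bin/hostname", "count": 2},
    {"executable": "/bin/hostname", "count": 8},
])
print(single)
print(multi)
```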
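Globus Resource Management Architecture • Diagram: the Application passes RSL to a Broker, which performs RSL specialization using Queries & Info exchanged with the Information Service • The resulting Ground RSL goes to a Co-allocator, which sends Simple ground RSL to the GRAMs • Each GRAM fronts a local resource manager (LSF, EASY-LL, NQE)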
Globus Information Infrastructure • MDS (Metacomputing Directory Service) • MDS stores information as entries; each entry describes some type of object (organization, person, network, computer, etc.) • Object class associated with each entry describes a set of entry attributes • LDAP (Lightweight Directory Access Protocol) used to store and access information about resources • LDAP = hierarchical, tree-structured information model defining form and character of information
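The hierarchical entry model can be sketched as a tiny in-memory directory. This is not an LDAP client and uses no LDAP library; the `Directory` class, the distinguished names, and the attributes (`cpus`, `hn=...`) are all made up for illustration. Each entry has an object class and attributes, and a search is scoped to the subtree under a base name.

```python
class Directory:
    """Toy tree-structured directory keyed by comma-separated distinguished names."""

    def __init__(self):
        self.entries = {}

    @staticmethod
    def _parse(dn):
        # "hn=ws1, ou=UCSD, o=Grid" -> ("hn=ws1", "ou=UCSD", "o=Grid"), leaf first
        return tuple(part.strip() for part in dn.split(","))

    def add(self, dn, object_class, **attrs):
        self.entries[self._parse(dn)] = {"objectclass": object_class, **attrs}

    def search(self, base, **wanted):
        """Return names of entries under `base` whose attributes equal `wanted`."""
        base_rdns = self._parse(base)
        hits = []
        for rdns, attrs in self.entries.items():
            in_subtree = rdns[-len(base_rdns):] == base_rdns
            if in_subtree and all(attrs.get(k) == v for k, v in wanted.items()):
                hits.append(", ".join(rdns))
        return sorted(hits)


mds = Directory()
mds.add("o=Grid", "organization")
mds.add("ou=UCSD, o=Grid", "organizationalUnit")
mds.add("ou=ANL, o=Grid", "organizationalUnit")
mds.add("hn=ws1.example.org, ou=UCSD, o=Grid", "computer", cpus=2)
mds.add("hn=ws2.example.org, ou=UCSD, o=Grid", "computer", cpus=4)
mds.add("hn=node.example.net, ou=ANL, o=Grid", "computer", cpus=4)

ucsd_hosts = mds.search("ou=UCSD, o=Grid", objectclass="computer")
four_cpu_hosts = mds.search("o=Grid", objectclass="computer", cpus=4)
print(ucsd_hosts)
print(four_cpu_hosts)
```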
Globus Security Service • GSI (Grid Security Infrastructure) • Provides public key-based security system that layers on top of local site security • User identified to system using X.509 certificate containing info about the duration of permissions, public key, signature of certificate authority • User also has private key • Provides users with a single sign-on access to the various sites to which they are authorized
More GSI • Resource management system uses GSI to establish which machines user may have access to • GSI system allows for proxies so that the user only needs to log on once, as opposed to logging on to every machine involved in a distributed computation • Proxies used for short-term authentication, rather than long-term use
Globus Communication Services • Nexus • Communication library which provides asynchronous RPC, multi-method communication, data conversion and multi-threading facilities • I/O • Low level communication library which provides a thin wrapper around TCP, UDP, IP multicast and file I/O • Integrates GSI into TCP communication
Globus Remote Access Services • GASS (Global Access to Secondary Storage) • Provides secure remote access to files • GEM (Globus Executable Management) • Intended to support identification, location, and creation of executables in a heterogeneous environment
Globus Fault Detection Services • HBM (Heartbeat Monitor) • Provides mechanisms for monitoring multiple remote processes in a job and enabling application to respond to failures • Nexus Fault Detection: • Notifies applications using Nexus when a communicating process fails (but not which one)
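The heartbeat idea can be sketched as a table of last-seen timestamps with a timeout. This is a generic illustration, not the HBM API: the `HeartbeatMonitor` class, its method names, and the 5-second timeout are assumptions. Timestamps are passed in explicitly so the example is deterministic.

```python
class HeartbeatMonitor:
    """Declare a process failed if no heartbeat arrived within `timeout` seconds."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = {}

    def beat(self, process_id, now):
        """Record a heartbeat from a monitored process."""
        self.last_seen[process_id] = now

    def failed(self, now):
        """Return the processes whose last heartbeat is older than the timeout."""
        return sorted(pid for pid, seen in self.last_seen.items()
                      if now - seen > self.timeout)


monitor = HeartbeatMonitor(timeout=5.0)
monitor.beat("worker-a", now=0.0)
monitor.beat("worker-b", now=0.0)
monitor.beat("worker-a", now=4.0)    # worker-b goes silent after t=0

suspects = monitor.failed(now=6.0)   # the application can now react to the failure
print(suspects)
```

Unlike the Nexus notification described on the slide, a per-process table like this identifies which process failed.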
Globus QoS Services • GARA (Globus Architecture for Reservation and Allocation) • Provides dedicated access to collections of resources via reservations • Gloperf • Provides bandwidth and latency information • Wolski’s NWS being integrated with Globus • NWS provides monitoring and predictive information
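One way to turn monitoring data into predictive information is to run several simple forecasters over the measurement history and trust whichever has had the lowest error so far. The sketch below follows that idea under stated assumptions: the three forecasters, the absolute-error score, and the `forecast` function are illustrative choices, not a description of NWS internals.

```python
from statistics import mean, median

# Candidate forecasters: each maps the history seen so far to a prediction.
FORECASTERS = {
    "last": lambda history: history[-1],
    "mean": lambda history: mean(history),
    "median": lambda history: median(history),
}


def forecast(history):
    """Pick the forecaster with the lowest cumulative absolute error, then predict."""
    errors = {name: 0.0 for name in FORECASTERS}
    for i in range(1, len(history)):
        past, actual = history[:i], history[i]
        for name, predict in FORECASTERS.items():
            errors[name] += abs(predict(past) - actual)
    best = min(errors, key=errors.get)
    return best, FORECASTERS[best](history)


# Bandwidth samples (hypothetical, in Mb/s) with one transient spike.
samples = [10, 10, 10, 50, 10, 10]
method, prediction = forecast(samples)
print(method, prediction)
```

With a single spike in otherwise steady data, the median forecaster accumulates the least error and is chosen for the next prediction.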
Globus and the Grid • Major player in Grid Infrastructure development • Currently deployed widely • User community strong • Infrastructure supported by IPG, Alliance and NPACI • Exclusive infrastructure of Alliance and IPG
Legion • Developed by Andrew Grimshaw (UVA) • Provides single, coherent virtual machine model that addresses grid issues within a reflective, object-based metasystem • Everything is an object in Legion model – HW resources, SW resources, etc.
Legion Goals • Site autonomy • Each organization maintains control over their own resources • Extensibility • Users can construct own mechanisms and policies within Legion • Scalability • No centralized structures or servers; full distribution
Legion Goals • Easy to use / seamless • System must hide complexity of environment • “Ninja users” must be able to tune applications • High performance via parallelism • Coarse-grained applications should perform well • Single, persistent object space • Single name space, transparent of location or replication • Security • “do no harm” – Legion should not weaken local security policies
Legion Object Model • Every Legion object is defined and managed by its class object; class objects act as managers and make policy, as well as define instances • Legion defines the interface and basic functionality of a set of core object types which support basic services • Users may also define and build their own class objects
Legion Object Model • Core Objects: • Host objects • Encapsulate machine capabilities in Legion (processors and memory) • Currently represent single host systems (uniprocessor and multiprocessor shared memory) • Vault objects • Represent persistent storage • Implementation objects • Generally an executable file that a host object can execute when it receives a request to activate or create an object
Legion Object Model • Basic system services provided by core objects • Naming and binding, object creation, activation, deactivation and deletion • Responsibility for system-level functionality endowed on classes • Classes (which are also objects) define and manage objects associated with them • Classes create new instances, schedule them for execution, activate and deactivate them, and provide current location info for contacting them • Users can define and build own class objects
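The manager role of class objects can be sketched as follows. This is a loose Python analogy, not Legion's interface: the `ClassObject` type, its method names, the object identifiers, and the host names are all hypothetical. The class object creates instances, activates and deactivates them, and answers binding queries about where an instance currently lives.

```python
class ClassObject:
    """Toy class object: creates its instances and tracks their state and location."""

    def __init__(self, name, factory):
        self.name = name
        self.factory = factory      # builds the instance's state
        self.instances = {}         # object id -> record

    def create(self, host, *args):
        """Create a new instance, place it on `host`, and return its identifier."""
        oid = f"{self.name}#{len(self.instances)}"
        self.instances[oid] = {"state": self.factory(*args), "host": host, "active": True}
        return oid

    def deactivate(self, oid):
        """Take the instance off its host; its state persists with the class object."""
        record = self.instances[oid]
        record["active"], record["host"] = False, None

    def activate(self, oid, host):
        """Bring a deactivated instance back, possibly on a different host."""
        record = self.instances[oid]
        record["active"], record["host"] = True, host

    def binding(self, oid):
        """Return the current location of an instance, or None if it is inactive."""
        return self.instances[oid]["host"]


matrix_class = ClassObject("Matrix", factory=lambda rows, cols: {"shape": (rows, cols)})
oid = matrix_class.create("host-a", 4, 4)
before = matrix_class.binding(oid)
matrix_class.deactivate(oid)
inactive = matrix_class.binding(oid)
matrix_class.activate(oid, "host-b")
after = matrix_class.binding(oid)
print(oid, before, inactive, after)
```

Callers hold only the identifier and ask the class object for the current binding, so an instance can move between hosts without its users changing.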
Legion Programming • Legion supports MPI and PVM libraries via “emulation libraries” (which use runtime Legion library) • Applications need to be recompiled and relinked • Legion supports BFS (Basic Fortran Support) and Java • Legion OO programming language = Mentat (MPL)
Legion and the Grid • Major Grid player with Globus • Legion infrastructure deployed at NPACI and Department of Defense Modernization sites • Being considered as infrastructure for Boeing’s distributed product data management and manufacturing resource control systems • Large-scale implementations of molecular dynamics applications [Charmm and Amber] at NPACI
Still other Infrastructure Approaches • CORBA • Globe (Europe) • Suma (Venezuela) • Web-based approaches (Geoffrey Fox) • Jini (Sun) • DCOM (MS) etc.
What’s Missing? • How do we ensure application performance? • Performance-efficient application development and execution: • Ninja programming • AppLeS, Nimrod, Mars, Prophet/Gallop, MSHN, etc. • GrADS
GrADS – Grid Application Development and Execution Environment • Prototype system which facilitates end-to-end “grid-aware” program development • Based on the idea of a performance economy in which negotiated contracts bind application to resources • Joint project with large team of researchers: Ken Kennedy, Jack Dongarra, Dennis Gannon, Dan Reed, Lennart Johnsson, Andrew Chien, Rich Wolski, Ian Foster, Carl Kesselman, Fran Berman • Diagram (Grid Application Development System) components: source application, PSE, whole program compiler, libraries, software components, config. object program, scheduler/service negotiator, dynamic optimizer, realtime perf monitor, Grid runtime system (Globus) • Diagram flows: negotiation, performance feedback, perf problem
Cool GrADS Ideas • Performance Contracts • Vehicle for sharing complex, multi-dimensional performance information between components • Performance Economy • Framework in which to negotiate services and promote performance. • Performance contracts play fundamental role in exchange of information and binding of resources • Resource allocation and performance steering using fuzzy logic (“AppLePilot”) • Mechanism for describing quality of information • Allows for performance steering based on evaluation of application progress
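A performance contract check with a fuzzy flavor can be sketched as a graded violation degree instead of a yes/no test. This is only an illustration of the idea: the function `violation_degree`, the thresholds 1.2 and 2.0, and the linear ramp between them are assumptions, not the GrADS design.

```python
def violation_degree(predicted, measured, ok_ratio=1.2, bad_ratio=2.0):
    """Return a degree in [0, 1] to which `measured` time violates the contract.

    Up to ok_ratio times the predicted time counts as fully satisfied (0.0),
    bad_ratio or more as fully violated (1.0), with a linear ramp in between.
    """
    ratio = measured / predicted
    if ratio <= ok_ratio:
        return 0.0
    if ratio >= bad_ratio:
        return 1.0
    return (ratio - ok_ratio) / (bad_ratio - ok_ratio)


# Contract: an iteration was predicted to take 10 s on the negotiated resources.
on_track = violation_degree(predicted=10.0, measured=11.0)
drifting = violation_degree(predicted=10.0, measured=16.0)
broken = violation_degree(predicted=10.0, measured=25.0)
print(on_track, drifting, broken)
```

A steering component could ignore small degrees, watch intermediate ones more closely, and renegotiate resources when the degree approaches 1.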
Next Time • Talk by Marc Snir, Architect of IBM’s Blue Gene • Tuesday June 6, AP&M 4301 1:00-2:00 • Abstract: IBM Research announced in December a 5 year, $100M research project aimed at developing a petaop computer and using it for research in computational biology. The talk will discuss the architectural choices involved in the design of a petaop computer, and will present the design point pursued by the Blue Gene project. We shall discuss the mapping of molecular dynamics computations onto the Blue Gene architecture and outline research problems in Computer Science and Computational Biology that such a project motivates.