Part 7: CondorG • A: Condor-G • B: Laboratory: CondorG
Condor-G • A client-side job management system for the grid • General-purpose • Can manage large numbers of jobs • Handles many failures gracefully
Condor-G • Condor-G can manage a large number of jobs • You specify the jobs in a file and submit them to Condor, which runs them all and keeps you notified of their progress • Mechanisms to help you manage huge numbers of jobs (thousands), all the data, etc. • Condor-G can handle inter-job dependencies (DAGMan) • You can set job priorities
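To make "you specify the jobs in a file" concrete, here is a minimal sketch that builds the text of a submit description for a batch of grid jobs. The helper name `make_submit_description`, the gatekeeper `gatekeeper.example.edu/jobmanager-fork`, the `gt2` grid type, and the file names are assumptions for illustration, not taken from the slides. The keywords themselves (`universe`, `grid_resource`, `priority`, `queue`) are standard submit commands, but which ones apply depends on the Condor version installed for the lab.

```python
def make_submit_description(executable, gatekeeper, n_jobs, priority=0):
    """Return the text of a submit description for n_jobs grid jobs.

    All argument values are placeholders chosen by the caller; nothing here
    contacts Condor or a gatekeeper.
    """
    lines = [
        "universe = grid",
        # Assumed GT2 gatekeeper contact string, e.g. host/jobmanager-fork
        f"grid_resource = gt2 {gatekeeper}",
        f"executable = {executable}",
        # $(Process) expands to 0, 1, 2, ... so each job gets its own files
        "output = job.$(Process).out",
        "error = job.$(Process).err",
        # One shared log lets Condor report progress for the whole batch
        "log = jobs.log",
        f"priority = {priority}",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(make_submit_description(
        "/bin/hostname",
        "gatekeeper.example.edu/jobmanager-fork",  # hypothetical host
        1000,
        priority=5,
    ))
```

Saving the printed text to a file and passing it to `condor_submit` would queue all the jobs at once; a single `queue N` line is what keeps thousands of jobs manageable from one description.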
Condor-G • Condor-G handles many failures gracefully • Condor-G does whatever it takes to run your jobs, even if… • The gatekeeper is temporarily unavailable • The job manager crashes • The network goes down • Your machine crashes
Condor-G Fault-Tolerance: Lost Contact with Remote Jobmanager • Can we contact gatekeeper? • Yes: jobmanager crashed • No: retry until we can talk to gatekeeper again • Can we reconnect to jobmanager? • Yes: network was down • No: machine crashed or job completed • Restart jobmanager • Has job completed? • Yes: update queue • No: is job still running?
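The recovery questions on this slide can be sketched as a small decision function. This is only one reading of the flowchart: the order in which the checks are chained, the function name `recovery_steps`, and the boolean inputs are assumptions made for illustration, and the code does not reflect how Condor-G is actually implemented.

```python
def recovery_steps(gatekeeper_reachable, reconnect_ok, job_completed):
    """Return the recovery steps taken after losing contact with a jobmanager.

    Assumed inputs (all hypothetical probes, passed in as booleans):
      gatekeeper_reachable: the gatekeeper answers right away
      reconnect_ok: after the gatekeeper becomes reachable again, the old
                    jobmanager can be reconnected to
      job_completed: a restarted jobmanager reports the job as finished
    """
    steps = []
    if gatekeeper_reachable:
        # Gatekeeper is up but the jobmanager is silent
        steps.append("jobmanager crashed")
    else:
        steps.append("retry until gatekeeper is reachable")
        if reconnect_ok:
            # The jobmanager was alive all along
            steps.append("network was down")
            return steps
        steps.append("machine crashed or job completed")

    # In both remaining cases a new jobmanager is needed to learn job status
    steps.append("restart jobmanager")
    if job_completed:
        steps.append("update queue")
    else:
        steps.append("check whether job is still running")
    return steps


if __name__ == "__main__":
    for case in [(True, False, True), (False, True, False), (False, False, False)]:
        print(case, "->", recovery_steps(*case))
```

Writing the flow as a pure function of three observations makes each branch of the slide easy to check by hand.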
Other Condor-G Features • Credential Management: pull refreshed credentials from MyProxy; push refreshed credentials to remote systems • Job Scheduling: use Matchmaking to select resources for jobs • WS-GRAM: support for GT4 • GlideIn: allows late binding of resources and job checkpoint/migration
Lab 7: CondorG • In this lab, you’ll: • Configure and start Condor • Display Condor information • Submit jobs: a single job, multiple jobs, and multiple jobs with separate directories • Diagnose and release a held job • Shut Condor down
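For the "multiple jobs with separate directories" step, a rough sketch of the preparation might look like the following. The directory naming scheme `run_<n>`, the helper names, and the gatekeeper host are assumptions for illustration; the lab's own files may be laid out differently. `initialdir` is a standard submit command that sets each job's working directory on the submit side.

```python
import os
import tempfile


def prepare_run_dirs(base_dir, n_jobs):
    """Create run_0 .. run_{n_jobs-1} under base_dir and return their paths."""
    dirs = []
    for i in range(n_jobs):
        path = os.path.join(base_dir, f"run_{i}")
        os.makedirs(path, exist_ok=True)
        dirs.append(path)
    return dirs


def separate_dirs_submit_text(n_jobs):
    """Return a submit description that runs each job in its own directory."""
    return "\n".join([
        "universe = grid",
        # Hypothetical gatekeeper; replace with the one used in the lab
        "grid_resource = gt2 gatekeeper.example.edu/jobmanager-fork",
        "executable = /bin/hostname",
        # Relative file names below are resolved inside each run_<n> directory
        "initialdir = run_$(Process)",
        "output = job.out",
        "error = job.err",
        "log = job.log",
        f"queue {n_jobs}",
    ]) + "\n"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        for d in prepare_run_dirs(base, 3):
            print("created", os.path.basename(d))
        print(separate_dirs_submit_text(3))
```

Because every job writes `job.out`, `job.err`, and `job.log` inside its own directory, the file names can stay identical across jobs without collisions.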
Credits • Portions of this presentation were adapted from the following sources: • Jaime Frey, UW-Madison