Condor-G: Making Condor Grid Enabled
Outline • Why use Condor-G • Globus Universe • GlideIn • Status & Future Work
What is Condor-G? • Extensions to Condor to allow access to the Grid through Globus • Two parts: Globus Universe and GlideIn
Why Use Condor-G • Condor: designed to run jobs within a single administrative domain • Globus: designed to run jobs across many administrative domains • Condor-G: combines the strengths of both
Condor-G Helps Condor Users • Machines available to Condor users are limited to the local Condor pool and friendly Condor pools (via flocking) • Through Globus, many more machines become available to run your jobs
Condor-G Helps Globus Users • Globus is primarily an infrastructure upon which to develop distributed applications • Command-line tools are limited • Some users don’t want to rewrite their applications to use Globus • Condor-G provides them a powerful interface to the Grid to run their existing applications
Globus Universe • Advantages of using Condor as a front-end to Globus • Full-featured queuing service • Fault-tolerance • Credential Management
Full-Featured Queue • Persistent queue • Many queue-manipulation tools • Job dependencies can be set up (DAGMan) • E-mail notification of events • Log files
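The job-dependency bullet above can be illustrated with a small sketch. This is not DAGMan's input syntax or implementation; it only shows the underlying idea of ordering jobs so that each one runs after the jobs it depends on. The job names and the `run_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_order(dependencies):
    """Return one valid execution order for a set of jobs.

    `dependencies` maps each job name to the set of jobs that must
    finish before it may start. The names are hypothetical and this
    is not DAGMan syntax.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical four-job workflow: B and C depend on A, D depends on both.
example = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
order = run_order(example)
print(order)
```

Any order that respects the dependencies is valid, so B and C may appear in either sequence between A and D.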
Fault-Tolerance • Local crash: queue state is kept on disk, and the Condor Master restarts other daemons • Remote crash: Condor will resubmit jobs, and the Globus jobmanager has been enhanced to improve recoverability
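The local-crash case above rests on keeping queue state on disk. A minimal sketch of that idea follows, assuming a JSON file as the on-disk format; the `PersistentQueue` class, its methods, and the file layout are hypothetical and do not describe how Condor actually stores its queue.

```python
import json
import os
import tempfile


class PersistentQueue:
    """Toy job queue whose state survives a restart (hypothetical design)."""

    def __init__(self, path):
        self.path = path
        self.jobs = {}
        # On startup, reload whatever state was saved before a crash.
        if os.path.exists(path):
            with open(path) as f:
                self.jobs = json.load(f)

    def _save(self):
        # Write to a temporary file, then rename it over the old one,
        # so a crash mid-write never leaves a half-written queue file.
        directory = os.path.dirname(self.path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.jobs, f)
        os.replace(tmp, self.path)

    def submit(self, job_id, command):
        self.jobs[job_id] = {"command": command, "state": "idle"}
        self._save()

    def set_state(self, job_id, state):
        self.jobs[job_id]["state"] = state
        self._save()
```

Creating a second `PersistentQueue` on the same path plays the role of a restarted daemon: it sees every job and state recorded before the restart.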
Credential Management • Authentication in Globus is done with limited-lifetime X.509 proxies • A proxy may expire before jobs finish executing • Condor can put jobs on hold and e-mail the user to refresh the proxy
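The proxy-expiry handling above amounts to a time comparison. The sketch below shows one way such a check could look; the `proxy_action` function, its return values, and the one-hour safety margin are assumptions for illustration, not Condor-G's actual behavior or API.

```python
from datetime import datetime, timedelta, timezone


def proxy_action(proxy_expires_at, now, margin=timedelta(hours=1)):
    """Decide what to do with a user's jobs based on proxy lifetime.

    Returns "hold_and_notify" when the proxy has expired or will expire
    within `margin` (a hypothetical safety window), otherwise "run".
    """
    remaining = proxy_expires_at - now
    if remaining <= margin:
        return "hold_and_notify"
    return "run"


# Usage with a fixed, made-up timestamp:
now = datetime(2001, 1, 1, 12, 0, tzinfo=timezone.utc)
print(proxy_action(now + timedelta(hours=8), now))
print(proxy_action(now + timedelta(minutes=10), now))
```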
How It Works • [Diagram, built up over five slides. Labels: 600 Globus jobs, Personal Condor, Schedd, GridManager, Globus Resource, JobManager, LSF, User Job]
Globus Universe: Disadvantages • No matchmaking or dynamic scheduling of jobs • No job checkpoint or migration • No remote system calls
Solution: GlideIn • Use the Globus Universe to run the Condor daemons on Globus resources • When the resources run these GlideIn jobs, they will join your Condor Pool • Submit your jobs as Standard or Vanilla Universe jobs and they will be matched and run on the Globus resources
How It Works • [Diagram, built up over seven slides. Labels: 600 Condor jobs, glide-ins, Personal Condor, Schedd, Collector, GridManager, Globus Resource, JobManager, LSF, Startd, User Job]
GlideIn Concerns • What if a Globus resource kills my GlideIn? That resource will disappear from your pool and your jobs will be rescheduled on other machines • What if all my jobs are done before a GlideIn runs? If the glided-in Condor daemons are not matched with a job within 10 minutes, they terminate
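The 10-minute rule above can be sketched as a simple idle check. The function name and the use of plain seconds are hypothetical. Measuring idleness from the moment the daemons started (or last became idle) is an assumption; the slides only state the 10-minute limit.

```python
IDLE_LIMIT_SECONDS = 10 * 60  # the 10-minute limit stated above


def glidein_should_exit(idle_since, now, idle_limit=IDLE_LIMIT_SECONDS):
    """Return True once a glide-in has gone unmatched for the idle limit.

    `idle_since` and `now` are timestamps in seconds. `idle_since` is
    when the glided-in daemons started waiting for a match (assumed).
    """
    return now - idle_since >= idle_limit


# A glide-in that started at t=0 keeps running at 5 minutes
# and exits once 10 minutes have passed without a match.
print(glidein_should_exit(0, 5 * 60))
print(glidein_should_exit(0, 10 * 60))
```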
[Diagram, built up over eight slides. Labels: your workstation, personal Condor, 600 Condor jobs, Group Condor, Globus Grid, LSF, PBS, Condor, glide-ins]
Current Status • First version of GridManager ready: runs jobs using Globus GRAM and stages the executable and standard I/O using Globus GASS • Jobmanager changes will be folded into a future release of Globus • Credential management in progress
Future Work • GridManager: stage user jobs’ data files • Automatic GlideIn: Condor creates GlideIn jobs when more resources are needed • Matchmaking in Globus Universe: use Globus GRIS to create ClassAds for Globus resources and match them to job ClassAds
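The matchmaking item above pairs resource descriptions with job descriptions. The toy sketch below conveys the idea using plain Python dictionaries and predicates; it is not the ClassAd language, and real matchmaking also lets resources state requirements of their own. The attribute names, site names, and helper functions are hypothetical.

```python
def matches(job_ad, resource_ad):
    """True if the resource satisfies every requirement in the job ad."""
    return all(check(resource_ad) for check in job_ad["requirements"])


def first_match(job_ad, resource_ads):
    """Return the first resource ad that satisfies the job, or None."""
    for ad in resource_ads:
        if matches(job_ad, ad):
            return ad
    return None


# Hypothetical resource ads, as might be built from published resource data.
resources = [
    {"name": "site-a", "arch": "INTEL", "memory_mb": 128},
    {"name": "site-b", "arch": "INTEL", "memory_mb": 512},
]

# Hypothetical job ad: needs an INTEL machine with at least 256 MB.
job = {
    "requirements": [
        lambda r: r.get("arch") == "INTEL",
        lambda r: r.get("memory_mb", 0) >= 256,
    ]
}

chosen = first_match(job, resources)
print(chosen["name"] if chosen else "no match")
```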