Condor-G: An Update
Outline
• What is Condor-G
• Past
• Present
• Future
What Is Condor-G
• Use Condor to run jobs on the Grid
• Uses the Globus Toolkit
  • GRAM (submit a remote job)
  • GASS (transfer the job's files)
• Two components
  • Globus Universe
  • GlideIn
Globus Universe
• Run a job on a Grid resource
• Features
  • Job management
  • Fault tolerance
  • Credential management
• Disadvantages
  • No remote system calls, checkpointing/migration, or dynamic resource selection
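As a rough illustration of running a job in the Globus universe, the Python sketch below writes a Condor-G submit description. It assumes the Condor 6.x-era keywords `universe = globus` and `globusscheduler = <host>/<jobmanager>` (current HTCondor spells this `universe = grid` with `grid_resource`); the gatekeeper host, executable, and file names are hypothetical placeholders.

```python
def globus_submit_description(executable: str, gatekeeper: str,
                              jobmanager: str = "jobmanager-lsf",
                              count: int = 1) -> str:
    """Return the text of a Globus-universe submit description.

    Assumes Condor 6.x-era keywords; `gatekeeper` and `jobmanager`
    are hypothetical values supplied by the caller.
    """
    lines = [
        "universe        = globus",
        # Which Grid resource (gatekeeper + local scheduler) runs the job.
        f"globusscheduler = {gatekeeper}/{jobmanager}",
        f"executable      = {executable}",
        # Standard output/error are staged back to the submit machine.
        "output          = job.$(Process).out",
        "error           = job.$(Process).err",
        # The user log records job events such as submit, execute, and hold.
        "log             = condor_g.log",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(globus_submit_description("my_analysis", "gatekeeper.example.edu", count=600))
```

Handing the resulting file to `condor_submit` would queue the jobs; from then on Condor-G, not the user, is responsible for submitting, monitoring, and recovering them.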
How It Works
[Diagram: 600 Globus jobs are queued in the Condor-G Schedd; a GridManager submits each job to the JobManager on a Grid resource, which runs the User Job through LSF.]
GlideIn
• Create your own personal Condor pool from temporarily acquired Grid resources
• Brings the full power of Condor to the Grid
• Run a Condor startd on a Grid resource
• The startd reports back to your machine and runs Vanilla and Standard Universe jobs
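The glide-in idea can be modeled in a few lines: startds launched on borrowed Grid nodes advertise themselves to a collector you control, and your schedd then matches queued jobs to them. This is a toy, in-memory sketch; `Startd`, `PersonalCollector`, `glide_in`, `schedule`, and the resource names are hypothetical and not Condor APIs.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List, Optional


@dataclass
class Startd:
    """A glided-in startd on a temporarily acquired Grid node."""
    name: str
    running: Optional[str] = None  # job currently assigned, if any


@dataclass
class PersonalCollector:
    """Toy collector for your personal pool: tracks advertised startds."""
    startds: Dict[str, Startd] = field(default_factory=dict)

    def advertise(self, startd: Startd) -> None:
        # A glide-in "reports back to your machine" by advertising itself here.
        self.startds[startd.name] = startd

    def idle(self) -> List[Startd]:
        return [s for s in self.startds.values() if s.running is None]


def glide_in(collector: PersonalCollector, resource: str, count: int) -> None:
    """Pretend the Grid resource started `count` startds for us."""
    for i in range(count):
        collector.advertise(Startd(name=f"startd{i}@{resource}"))


def schedule(jobs: Deque[str], collector: PersonalCollector) -> Dict[str, str]:
    """Match queued jobs to idle startds; return {job: startd name}."""
    placements: Dict[str, str] = {}
    for startd in collector.idle():
        if not jobs:
            break
        job = jobs.popleft()
        startd.running = job
        placements[job] = startd.name
    return placements
```

For example, gliding in four startds on one cluster and two on another, then calling `schedule` on a queue of 600 jobs, places six jobs and leaves 594 waiting for more glide-ins.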
How It Works
[Diagram: 600 Condor jobs are queued in the Condor-G Schedd; the GridManager submits glide-ins to the JobManager on a Grid resource, which starts a Startd through LSF; the Startd reports to your Collector, and the User Job then runs on it.]
[Diagram: Condor-G, holding 600 Condor jobs, sends glide-ins across the Globus Grid to LSF, PBS, and Condor resources.]
Past
• GridManager daemon
  • Runs Grid jobs using the GRAM protocol
  • Stages the executable and standard I/O using the GASS protocol
• Globus GRAM 1.5
  • We added fault tolerance to the GRAM protocol
  • Changes included in the Globus Toolkit 2.0 release
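The slide does not say how fault tolerance was added to GRAM. One standard technique for this kind of problem, assumed here purely for illustration, is a two-phase commit keyed by a client-chosen request ID: a submission retried after a lost reply is recognized as a duplicate, and the job only starts once the client commits. All class and method names are hypothetical, and this is not the GRAM wire protocol.

```python
import uuid
from typing import Dict


class ToyJobManager:
    """Hypothetical remote job manager that supports two-phase submission."""

    def __init__(self) -> None:
        self.jobs: Dict[str, str] = {}  # request id -> state

    def prepare(self, request_id: str) -> None:
        # Phase 1: record the job without starting it. A repeated prepare
        # with the same id is a no-op, so a retry cannot create a duplicate.
        self.jobs.setdefault(request_id, "PENDING")

    def commit(self, request_id: str) -> None:
        # Phase 2: the job may run only after the client confirms.
        if self.jobs.get(request_id) == "PENDING":
            self.jobs[request_id] = "RUNNING"


class LossyLink:
    """Delivers every call but loses the reply to the first one."""

    def __init__(self, manager: ToyJobManager) -> None:
        self.manager = manager
        self.lost_one = False

    def call(self, method: str, request_id: str) -> None:
        getattr(self.manager, method)(request_id)  # the remote side acts...
        if not self.lost_one:
            self.lost_one = True
            raise ConnectionError("reply lost")    # ...but the client never hears


def submit_exactly_once(link: LossyLink, attempts: int = 3) -> str:
    request_id = str(uuid.uuid4())  # chosen by the client, reused on retry
    for phase in ("prepare", "commit"):
        for _ in range(attempts):
            try:
                link.call(phase, request_id)
                break
            except ConnectionError:
                continue  # safe to retry: both phases are idempotent
        else:
            raise RuntimeError(f"{phase} failed after {attempts} attempts")
    return request_id
```

If the client instead generated a fresh ID for each retry, the lost reply would leave an orphaned PENDING entry behind; because uncommitted entries never run, the job still would not execute twice.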
Present
• Updated Condor-G to Globus Toolkit 2.0
• Enhanced GridManager
• GAHP
Enhanced GridManager
• Put problem jobs on hold
  • No more stuck jobs
• Increase concurrency with GAHP
  • Almost ready
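Putting problem jobs on hold can be pictured as a small state machine: transient failures send a job back to idle for another attempt, but a job that keeps failing is parked in a held state with a reason, so it no longer sits stuck in the queue. The threshold `MAX_FAILURES` and the function names are assumptions for this sketch; in Condor itself, a user releases a held job with `condor_release`.

```python
from dataclasses import dataclass
from enum import Enum

MAX_FAILURES = 3  # hypothetical limit before a job is held


class JobState(Enum):
    IDLE = "idle"        # waiting to be (re)submitted
    RUNNING = "running"  # submitted and active on a Grid resource
    HELD = "held"        # parked until a user releases it


@dataclass
class GridJob:
    job_id: int
    state: JobState = JobState.IDLE
    failures: int = 0
    hold_reason: str = ""


def record_failure(job: GridJob, reason: str) -> None:
    """Retry transient failures, but hold a job that keeps failing."""
    job.failures += 1
    if job.failures >= MAX_FAILURES:
        job.state = JobState.HELD
        job.hold_reason = reason
    else:
        job.state = JobState.IDLE  # eligible for another submission attempt


def release(job: GridJob) -> None:
    """Clear the hold so the job can be tried again (cf. condor_release)."""
    job.state = JobState.IDLE
    job.failures = 0
    job.hold_reason = ""
```

The hold reason gives the user something concrete to fix before releasing the job, instead of a job that silently retries forever.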
Single-Threaded Execution
[Diagram: the GridManager handles Jobs 1 to 4, each on a different Grid resource, one at a time.]
Multi-Threaded Execution
[Diagram: the GridManager handles Jobs 1 to 4 on their Grid resources concurrently.]
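The difference between the single- and multi-threaded pictures comes down to blocking round trips. In the sketch below, `contact_resource` is a hypothetical stand-in for a blocking call to a Grid resource with an assumed 0.2 s latency; handling four jobs one at a time costs about four round trips, while a thread pool overlaps them into roughly one.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

LATENCY = 0.2  # assumed round-trip time to a Grid resource, in seconds


def contact_resource(job: str) -> str:
    """Hypothetical stand-in for a blocking request to a Grid resource."""
    time.sleep(LATENCY)
    return f"{job}: submitted"


def run_single_threaded(jobs: List[str]) -> Tuple[List[str], float]:
    """Handle jobs one after another; each waits for the previous reply."""
    start = time.perf_counter()
    results = [contact_resource(job) for job in jobs]
    return results, time.perf_counter() - start


def run_multi_threaded(jobs: List[str]) -> Tuple[List[str], float]:
    """Handle all jobs at once, one thread per outstanding request."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        results = list(pool.map(contact_resource, jobs))  # keeps input order
    return results, time.perf_counter() - start


if __name__ == "__main__":
    jobs = [f"Job {n}" for n in range(1, 5)]
    _, t_serial = run_single_threaded(jobs)
    _, t_parallel = run_multi_threaded(jobs)
    print(f"single-threaded: {t_serial:.2f} s, multi-threaded: {t_parallel:.2f} s")
```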
Globus Application Helper Protocol (GAHP)
• Condor is non-threaded
• Want to use multi-threaded libraries
  • Increased concurrency
• Put the libraries in an external helper process
  • Simple interface over pipes/sockets
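A minimal sketch of the helper-process idea: the client starts a separate process and exchanges one text command and one text reply per line over its stdin/stdout pipes, so any threading stays inside the helper. The commands (`ECHO`, `QUIT`) and the `S`/`E` reply prefixes are illustrative assumptions, not the actual GAHP command set.

```python
import subprocess
import sys

# Hypothetical line-oriented helper. The commands and the S/E reply prefixes
# are illustrative only; they are NOT the real GAHP command set.
HELPER_SOURCE = r'''
import sys

while True:
    line = sys.stdin.readline()
    if not line:                      # EOF: the client has gone away
        break
    words = line.split()
    if not words:
        print("E empty_command", flush=True)
    elif words[0] == "ECHO":
        print("S " + " ".join(words[1:]), flush=True)
    elif words[0] == "QUIT":
        print("S", flush=True)
        break
    else:
        print("E unknown_command", flush=True)
'''


class HelperClient:
    """Talks to the helper process over its stdin/stdout pipes."""

    def __init__(self) -> None:
        self.proc = subprocess.Popen(
            [sys.executable, "-c", HELPER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def command(self, line: str) -> str:
        """Send one command line and block until its one-line reply arrives."""
        assert self.proc.stdin is not None and self.proc.stdout is not None
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self) -> None:
        """Ask the helper to exit, then reap the process."""
        self.command("QUIT")
        assert self.proc.stdin is not None
        self.proc.stdin.close()
        self.proc.wait()
```

Because the interface is just text over pipes, the helper can be written in any language with the needed libraries while the client stays single-threaded.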
Multi-Threaded Execution with GAHP
[Diagram: a GAHP Client inside the GridManager talks to a separate GAHP Server, which drives Jobs 1 to 4 on their Grid resources concurrently.]
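The key point of the GAHP picture is that the GridManager stays single-threaded while the GAHP server does the concurrent work. The sketch below models that split inside one process, an assumption made for brevity: the client enqueues requests tagged with IDs and returns immediately, worker threads in a hypothetical `ToyGahpServer` perform the blocking calls, and the client collects results by ID in whatever order they finish.

```python
import queue
import threading
import time
from typing import Dict, List, Tuple


class ToyGahpServer:
    """Hypothetical in-process stand-in for a multi-threaded GAHP server."""

    def __init__(self, workers: int = 4) -> None:
        self.requests = queue.Queue()
        self.results = queue.Queue()
        self.threads = [threading.Thread(target=self._work, daemon=True)
                        for _ in range(workers)]
        for t in self.threads:
            t.start()

    def _work(self) -> None:
        while True:
            item = self.requests.get()
            if item is None:  # shutdown sentinel
                break
            req_id, job, delay = item
            time.sleep(delay)  # stands in for a blocking Grid library call
            self.results.put((req_id, f"{job} done"))

    def submit(self, req_id: int, job: str, delay: float) -> None:
        """Queue a request and return at once; the caller never blocks."""
        self.requests.put((req_id, job, delay))

    def shutdown(self) -> None:
        for _ in self.threads:
            self.requests.put(None)
        for t in self.threads:
            t.join()


def run_client(server: ToyGahpServer,
               jobs: Dict[str, float]) -> Tuple[Dict[int, str], List[int]]:
    """Single-threaded client: issue every request, then gather results by ID."""
    pending: Dict[int, str] = {}
    for req_id, (job, delay) in enumerate(jobs.items(), start=1):
        server.submit(req_id, job, delay)
        pending[req_id] = job
    completed: Dict[int, str] = {}
    finish_order: List[int] = []
    while pending:
        req_id, result = server.results.get()  # wait for the next finished request
        del pending[req_id]
        completed[req_id] = result
        finish_order.append(req_id)
    return completed, finish_order
```

Tagging each request with an ID is what lets replies arrive out of order without the client needing threads of its own.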
Future
• GRAM 1.6
• Condor-G on Windows
• Condor-G Grid service
Globus GRAM 1.6
• Working with the Globus team to add features to the GRAM protocol
  • Credential refresh
  • File staging
  • Scheduler-specific options
Condor-G for Windows
• Condor
  • Windows implementation available
• GRAM and GASS APIs
  • No C implementation for Windows (yet)
  • Java implementation (Java CoG)
• Condor-G
  • Windows version possible by writing the GAHP server in Java