Condor and the NGS
John Kewley, NGS Support Centre Manager
Outline • What is Condor? • What is High Throughput Computing? • Condor and the NGS
NGS Innovation Forum, Manchester
What is Condor? • A job submission framework that utilises the spare computing power within a heterogeneous computer network • Desktop PCs, Linux workstations, servers, clusters and teaching lab resources can all be included in a Condor pool • It supports High Throughput Computing (HTC), maximising the amount of processing capacity that is utilised over long periods of time • Developed over the past 20 years at the University of Wisconsin-Madison
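Condor decides where to run each job by matching what the job asks for against what the machines in the pool advertise. The following is a much-simplified sketch of that idea, assuming a pool where each machine reports its operating system, its memory and whether its owner is currently idle. The class names, attribute values and the `match` function are hypothetical; they do not reproduce Condor's ClassAd language or its negotiator.

```python
"""Toy model of Condor-style matchmaking (a simplification, not Condor's code)."""
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Machine:
    name: str
    opsys: str       # illustrative values such as "LINUX" or "WINDOWS"
    memory_mb: int
    idle: bool       # True when the desktop's owner is not using it


@dataclass
class Job:
    job_id: int
    requirements: Callable[[Machine], bool]  # must be true for a match
    rank: Callable[[Machine], float]         # higher is preferred


def match(job: Job, pool: list[Machine]) -> Optional[Machine]:
    """Return the best-ranked idle machine meeting the job's requirements, or None."""
    candidates = [m for m in pool if m.idle and job.requirements(m)]
    if not candidates:
        return None
    return max(candidates, key=job.rank)


pool = [
    Machine("lab-pc-01", "WINDOWS", 1024, idle=True),
    Machine("lab-pc-02", "WINDOWS", 2048, idle=False),  # owner is using it
    Machine("linux-ws-01", "LINUX", 4096, idle=True),
]
job = Job(
    1,
    requirements=lambda m: m.opsys == "WINDOWS" and m.memory_mb >= 512,
    rank=lambda m: m.memory_mb,
)
print(match(job, pool).name)  # lab-pc-01, because lab-pc-02 is busy
```

Only idle machines are considered, which is what lets a pool harvest spare cycles without disturbing desktop owners.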
Terminology
HPC (High Performance Computing) • Large amounts of [simultaneous] computing power for short periods of time
HTC (High Throughput Computing) • Large amounts of computing over longer periods, not necessarily all at once
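The HPC/HTC distinction can be made concrete with a little arithmetic. This sketch assumes 1000 independent one-hour jobs and compares a machine offering 1000 processors at once with a pool in which about 50 desktops happen to be idle at any time. All figures are hypothetical, and the model ignores queueing, failures and changing availability.

```python
"""Same total work, different delivery: an illustrative HPC vs HTC comparison."""
import math

N_JOBS = 1000          # hypothetical number of independent jobs
HOURS_PER_JOB = 1.0    # each job needs one CPU-hour


def wall_clock_hours(n_jobs: int, hours_per_job: float, slots: int) -> float:
    """Wall-clock time if `slots` processors stay available throughout."""
    waves = math.ceil(n_jobs / slots)  # rounds of jobs running side by side
    return waves * hours_per_job


cpu_hours = N_JOBS * HOURS_PER_JOB
hpc = wall_clock_hours(N_JOBS, HOURS_PER_JOB, slots=1000)  # all processors at once
htc = wall_clock_hours(N_JOBS, HOURS_PER_JOB, slots=50)    # ~50 idle desktops

print(f"CPU-hours: {cpu_hours:.0f}, HPC wall-clock: {hpc:.0f} h, HTC wall-clock: {htc:.0f} h")
```

Both cases deliver the same 1000 CPU-hours; HTC accepts a longer wall-clock time in exchange for using capacity that would otherwise sit idle.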
Useful Features • Automatic resubmission when jobs fail • Ability to cluster groups of jobs • Checkpointing / migration • DAGMan: Directed Acyclic Graph (workflow) manager • Integration with Grid resources, especially through Condor-G • Staging and retrieval of data • Glide-in: dynamically add Grid worker nodes to your Condor pool
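DAGMan reads a plain-text input file in which `JOB` lines name each node's submit description file, `PARENT ... CHILD ...` lines give the dependencies, and `RETRY` lines ask DAGMan to rerun a node that fails. The sketch below generates such a file for a hypothetical diamond-shaped workflow (A before B and C, both before D). It uses Python's `graphlib` (3.9+) to refuse a dependency graph containing a cycle. The node and file names are made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter


def write_dag(jobs: dict[str, str], parents: dict[str, set[str]], retries: int = 0) -> str:
    """Return the text of a DAGMan input file.

    jobs:    node name -> submit description file for that node
    parents: node name -> set of nodes that must finish first
    retries: if > 0, add a RETRY line for every node
    """
    # A DAG must be acyclic; prepare() raises CycleError if it is not.
    TopologicalSorter(parents).prepare()

    lines = [f"JOB {name} {submit}" for name, submit in jobs.items()]
    for child, deps in parents.items():
        for parent in sorted(deps):
            lines.append(f"PARENT {parent} CHILD {child}")
    if retries > 0:
        lines.extend(f"RETRY {name} {retries}" for name in jobs)
    return "\n".join(lines) + "\n"


# Hypothetical diamond workflow: A first, then B and C, then D.
dag_text = write_dag(
    jobs={"A": "a.sub", "B": "b.sub", "C": "c.sub", "D": "d.sub"},
    parents={"B": {"A"}, "C": {"A"}, "D": {"B", "C"}},
    retries=3,
)
print(dag_text)

# A workflow where X and Y wait on each other is rejected before any file is written.
try:
    write_dag({"X": "x.sub", "Y": "y.sub"}, {"X": {"Y"}, "Y": {"X"}})
except CycleError:
    print("cycle rejected")
```

The resulting file would normally be handed to `condor_submit_dag`, which then submits each node only once all of its parents have completed.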
Various job types • Parameter Studies • OpenMP • Master-worker • Parallel • Parameter Search • Serial • Embarrassingly Parallel • Sequential • Parameter Sweep • Monte Carlo • MPI • PVM
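Parameter sweeps and Monte Carlo runs map naturally onto Condor's `queue N` statement, which queues N copies of a job, each with its own `$(Process)` number starting at 0. This sketch assumes a hypothetical executable `sweep.exe` that takes its process number as its only argument. It writes a vanilla-universe submit description and shows one possible way for the program to turn that number into a parameter value; a Monte Carlo code could use the same number as its random seed instead.

```python
def sweep_submit(executable: str, n_points: int, log: str = "sweep.log") -> str:
    """Build a submit description that queues n_points independent jobs.

    Each job gets its own $(Process) number (0 .. n_points-1) as its argument
    and writes to its own output and error files.
    """
    if n_points < 1:
        raise ValueError("n_points must be at least 1")
    return "\n".join([
        "universe   = vanilla",
        f"executable = {executable}",
        "arguments  = $(Process)",
        "output     = out.$(Process)",
        "error      = err.$(Process)",
        f"log        = {log}",
        "should_transfer_files   = YES",
        "when_to_transfer_output = ON_EXIT",
        f"queue {n_points}",
    ]) + "\n"


def parameter_for(process: int, lo: float, hi: float, n_points: int) -> float:
    """Map a job's $(Process) number onto an evenly spaced grid over [lo, hi]."""
    if n_points == 1:
        return lo
    return lo + (hi - lo) * process / (n_points - 1)


submit_text = sweep_submit("sweep.exe", n_points=11)
print(submit_text)
# Inside the job, process 5 of 11 would study the midpoint of [0, 1]:
print(parameter_for(5, 0.0, 1.0, 11))  # 0.5
```

Because every job is independent, Condor can scatter the 11 points across whichever machines are idle, in any order.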
Terminology
Parallel • Tightly-coupled processes • Need synchronisation • Information sharing • Message passing • Shared memory • If one process fails, the whole job fails • Single large homogeneous resource • Processors used simultaneously
Independent • Unordered (so not serial/sequential) • Nothing embarrassing about it • No communication once the job starts • Might not need all results • Could run on different machines with different operating systems
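One practical consequence of the independent model is failure handling: a failed task can simply be rerun while the others keep their results, unlike a tightly-coupled parallel job. This sketch imitates that behaviour with Python threads standing in for separate machines; `simulate` and `run_independent` are hypothetical, and task 3 is made to fail deliberately.

```python
from concurrent.futures import ThreadPoolExecutor


def simulate(task_id: int) -> int:
    """Stand-in for an independent job; task 3 fails to show isolation."""
    if task_id == 3:
        raise RuntimeError(f"task {task_id} failed")
    return task_id * task_id


def run_independent(task_ids):
    """Run tasks with no communication; collect results and failed task ids."""
    results, failed = {}, []
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {tid: pool.submit(simulate, tid) for tid in task_ids}
        for tid, fut in futures.items():
            try:
                results[tid] = fut.result()
            except RuntimeError:
                failed.append(tid)  # only this task needs resubmitting
    return results, failed


results, failed = run_independent(range(6))
print(results)  # {0: 0, 1: 1, 2: 4, 4: 16, 5: 25}
print(failed)   # [3]
```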
Condor on the NGS • Cardiff: in addition to their SGI cluster, a Condor pool of 1000 Windows XP machines (~200 available to the NGS) • Bristol: ~50 Windows XP machines in a Condor pool fronted by a Linux server • Reading: ~400 Linux machines (CoLinux under Windows XP)
Condor with the NGS
University of Manchester, Research Computing Services • 100 cores (plus an additional 400 in a second pool) • Condor used as backfill for the SGE queues • IP tunnelling used to let Condor connect to the NW-Grid back-end nodes (instead of Condor's Generic Connection Broker, GCB)
OxGrid: Overview [Diagram components: departments/colleges; Oxford e-Research Centre; storage (SRB); BDII, VOMS, SSO, CA; resource broker/login (Condor); Condor pools; departmental clusters; other universities/institutions; National Grid Service resource; Microsoft cluster; National Grid Service cluster; supercomputing centre]
[Diagram components: user login; Condor-G portal; MyProxy server; Condor-G central manager; Condor-G submit host; CSD-Physics cluster (ulgbc2); CSD AMD cluster (ulgbc1); NW-GRID cluster (ulgbc3); NW-GRID/POL cluster (ulgp4); linked via Condor ClassAds and Globus file staging]
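Condor-G submits to Globus-fronted clusters like those in the diagram through the grid universe. A minimal sketch, assuming a pre-WS GRAM (`gt2`) gatekeeper and a proxy certificate already on disk (for instance one retrieved from a MyProxy server). The gatekeeper host, job manager name, executable and proxy path are placeholders, not the actual configuration of the clusters named above.

```python
def condor_g_submit(gatekeeper: str, jobmanager: str, executable: str,
                    proxy_path: str) -> str:
    """Build a grid-universe submit description for a gt2 (pre-WS GRAM) resource.

    All argument values used below are hypothetical placeholders.
    """
    return "\n".join([
        "universe      = grid",
        f"grid_resource = gt2 {gatekeeper}/{jobmanager}",
        f"executable    = {executable}",
        "output        = job.out",
        "error         = job.err",
        "log           = job.log",
        f"x509userproxy = {proxy_path}",
        "queue",
    ]) + "\n"


print(condor_g_submit("cluster.example.ac.uk", "jobmanager-pbs",
                      "analyse", "/tmp/x509up_u1000"))
```

From the user's point of view this looks like any other Condor submission; Condor-G handles the GRAM protocol and the Globus file staging behind the scenes.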
Novel Architecture !? • Condor itself is not that new • Some NGS users request Windows resources, but most previous NGS nodes use PBS, LSF or SGE on Linux • Condor can provide access to Windows resources
Windows on the NGS Many users are looking for Windows resources on which to run their computations. • Cardiff: Windows XP on Condor • Bristol: Windows XP on Condor • Southampton: 100 processors running under Windows Compute Cluster Server
Other work • Jean-Alain Grunchec of the University of Edinburgh is trying Condor Glide-in to add NGS resources to his Condor pool • The e-Minerals project used a Condor submission mechanism to submit jobs to both local Condor pools and Grid resources such as the NGS and NW-Grid http://www3.interscience.wiley.com/journal/117909340/abstract?CRETRY=1
Summary • Condor can be part of the NGS • Condor can be used with the NGS • Condor is being combined with the NGS in many campus grids • Condor enables the incorporation of Windows into the NGS
Acknowledgements • Some slides are based on material from the University of Wisconsin-Madison Condor team. • Slides describing the UK university Condor work are based on ones provided by those universities.