Heterogeneous Pools John Kewley j.kewley@dl.ac.uk CCLRC e-Science Centre http://www.e-science.clrc.ac.uk/web/staff/john_kewley
Outline • Building/Growing a Condor pool for opportunistic computation • Some issues • Uses for a Heterogeneous pool
Let's build a Condor Pool Our site has many machines that spend much of their time idling. We should harness their power to provide a reasonably sized computing resource. Let's use Condor!
Abundance of machines Under • Windows workstations (but centrally administered) • Linux desktops (but administered by “owners”) • Commodity Clusters (unavailable, many being decommissioned, no access to root) • Servers for CVS, backup, external web access, access grid (production systems – mission critical) • Training machines (turned off when not in use – only 4 at present) • HPCx (top500 HPC) (No comment!)
How do we grow a usefully sized pool? • How do we attract users without a large pool? • How do we encourage more desktop owners to install Condor on their machines without "push" from users, PC Support or Management? • How can I convince Management or PC Support without any users? Chicken and egg: which comes first? We need to grow the pool organically.
Approach “Community” approach: allow any users to participate if they provide at least one execute machine. If you want us to trust you to run jobs on our machines – you should trust us to run jobs on yours. Typically, most nodes are both submit and execute nodes
True Cycle Stealing (Scavenging) Allow any machine to join the pool • Dual-boot • Low-spec machines • Laptops • Machines that are not always on
Some Pool Statistics • 15 resource “Owners” at 2 sites in 3 departments • 14 OS Variants • 32 Processors on 24 execute machines (including central node) • 6 Windows (3 variants) • 23 Linux (11 variants) • 1 submit-only node (head-node of the e-HTPX cluster) • 2 laptops and 2 dual-boots (only main included above)
Compute Farm • Homogeneous: large numbers of (almost) identical resources • Often co-located physically: a training room, lab workstations or a large cluster • Centrally managed, often by dedicated staff • Typical of many Condor Pools: excellent for High Throughput Computing (HTC)
Compute Zoo • Heterogeneous: resources span many different operating systems and architectures • Located across a site • Individually or variously managed • Of use for HTC?
The CCLRC Compute Zoo • 3x Windows XP Professional • 2x Windows 2000 Professional • 1x Windows NT 4.0 Workstation • 7x SuSE Linux 9.0 • 2x SuSE Linux 8.0 • 1x SuSE Linux 9.1 • 5x White Box Enterprise Linux 3.0 • 1x Red Hat Enterprise Linux AS release 3.0 • 1x Red Hat Enterprise Linux WS release 3.0 • 3x Red Hat Linux 9 • 2x Red Hat Linux 8.0 • 2x Red Hat Linux 7.3 • 1x Mandrake Linux 10.1 • 1x Gentoo Linux 1.4
Outline • Building/Growing a Condor pool for opportunistic computation • Some issues • Uses for a Heterogeneous pool
Profile varies Pool composition varies over time • Machines such as laptops and dual-boots are ephemeral • Even many desktops get turned off at night in an attempt to be more eco-friendly. There is no guarantee that resources of a particular type will be available, and jobs can be terminated at any time.
Management Most machines have a different administrator, which makes updating software tricky. However: • On Linux, Condor has an account on all nodes in the pool and owns condor_config.local (although not condor_config.root, which is kept separate so that DAEMON_LIST cannot be updated) • The central node is listed in HOSTALLOW_ADMINISTRATOR on every node, so condor_reconfig can be issued from it.
What jobs? • Great for HTC jobs that are short or have minimal requirements on memory or processor speed. • Executables must be available for different platforms. • How can we get the code appropriate to each platform?
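One common answer to the last question is to keep one executable per platform and pick it by name. The sketch below is illustrative only: the naming scheme `base.OPSYS.ARCH`, the function names and the sample platform strings (`LINUX`, `WINNT51`, `INTEL`) are assumptions, not taken from the CCLRC pool.

```python
def executable_for(base_name, opsys, arch):
    """Return a per-platform executable name such as 'sim.LINUX.INTEL'.

    The naming convention is hypothetical; Windows builds get a '.exe' suffix.
    """
    suffix = ".exe" if opsys.upper().startswith("WIN") else ""
    return f"{base_name}.{opsys}.{arch}{suffix}"


def missing_builds(base_name, pool_platforms, available_files):
    """List the (opsys, arch) pairs in the pool that have no executable yet."""
    available = set(available_files)
    return [
        (opsys, arch)
        for opsys, arch in pool_platforms
        if executable_for(base_name, opsys, arch) not in available
    ]
```

A helper like `missing_builds` makes it easy to see which platforms in the zoo a job cannot yet run on before anything is submitted.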
Outline • Building/Growing a Condor pool for opportunistic computation • Some issues • Uses for a Heterogeneous pool
Computer Science use Maybe Condor can be used to solve Computer Science rather than Computational Science problems? • Build software • Test software • Release Software
“Build and Test” • The CCLRC pool was part of the UK Grid Engineering Task Force “Build and Test” project. • Software bundles were distributed to a variety of OS types around the flocked pool for building and testing. • This type of (flocked) pool relies on heterogeneity; only small numbers of machines of each type are required. http://polaris.ecs.soton.ac.uk:65000/ http://wiki.nesc.ac.uk/read/sfct?HomePage
Building tarballs • Any tarball that, after uncompressing and untarring, builds with the usual configure, make, make install sequence is suitable for the framework • Software was built with its install directory inside the job directory, so that Condor returned it automatically • Unix software would also build under Cygwin by the same mechanism
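The build recipe above can be sketched as a small helper that produces the three commands with the install prefix placed inside the job directory. This is not the project's actual build script; the function names and the `install` subdirectory are assumptions made for illustration.

```python
import shlex
from pathlib import PurePosixPath


def build_commands(job_dir, install_subdir="install"):
    """Return the configure/make/make install commands for one unpacked tarball.

    The prefix lives under the job directory so the results can be
    transferred back with the job's output files.
    """
    prefix = PurePosixPath(job_dir) / install_subdir
    return [
        ["./configure", f"--prefix={prefix}"],
        ["make"],
        ["make", "install"],
    ]


def as_shell_script(commands):
    """Render the commands as a POSIX shell script that stops on first failure."""
    lines = ["#!/bin/sh", "set -e"]
    lines += [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]
    return "\n".join(lines) + "\n"
```

Rendering the steps as a shell script keeps the same recipe usable on Linux nodes and, under the stated Cygwin assumption, on Windows nodes.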
test_each This script takes the name of a ClassAd attribute (including a user-defined one) and generates a submit file that creates one job for EVERY distinct value of that attribute found in the pool.
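The idea behind test_each can be sketched as follows. This is a rough reconstruction of the behaviour described, not the original script: the function name, the `run_test.sh` executable and the output file names are hypothetical, and the attribute values are assumed to be strings that have already been collected from the pool.

```python
def make_submit_file(attribute, values, executable="run_test.sh"):
    """Build a Condor submit description with one job per distinct value.

    `values` holds the attribute value advertised by each machine;
    duplicates are collapsed so each value is tested exactly once.
    """
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        "log = test_each.log",
    ]
    for index, value in enumerate(sorted(set(values))):
        # Pin this job to machines advertising this particular value.
        lines.append(f'requirements = ({attribute} == "{value}")')
        lines.append(f"output = test_each.{index}.out")
        lines.append(f"error = test_each.{index}.err")
        lines.append("queue")
    return "\n".join(lines) + "\n"
```

For example, `make_submit_file("OpSys", ["LINUX", "WINNT51", "LINUX"])` yields a description with two `queue` statements, one per operating system value.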
Other CS Uses • I want to ensure my code compiles without warnings and/or runs its basic tests on • As many OSs as possible • With as many different compilers as possible • With various compilation/linking options such as debugging, memory and performance tools
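The combinations listed above form a build matrix. A minimal sketch of enumerating it is shown below; the function name and the sample operating systems, compilers and flags in the usage note are placeholders, not a record of what was run on the CCLRC pool.

```python
from itertools import product


def build_matrix(opsys_list, compilers, option_sets):
    """Enumerate every (OS, compiler, options) combination to be tested."""
    return [
        {"opsys": opsys, "compiler": compiler, "options": list(options)}
        for opsys, compiler, options in product(opsys_list, compilers, option_sets)
    ]
```

Calling `build_matrix(["LINUX", "WINNT51"], ["gcc", "icc"], [["-g"], ["-O2"]])` gives eight combinations, each of which could become one job in the pool.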
Other CS Uses • I want to perform a release build of my product for platform X, but I only have accounts on A, B and C • I have several server-licensed products and many potential occasional users. How can these be made available to them more easily (within the bounds of the licence, of course)?
Summary • Setting up a Condor pool of personal workstations requires considerable coaxing, convincing, coercion and cajoling. • A heterogeneous pool (Compute Zoo!) is probably only usable for short opportunistic jobs that can scavenge cycles effectively. • Such pools are great from a software engineering perspective.