
Heterogeneous Pools John Kewley j.kewley@dl.ac.uk CCLRC e-Science Centre e-science.clrc.ac.uk/web/staff/john_kewley



  1. Heterogeneous Pools John Kewley j.kewley@dl.ac.uk CCLRC e-Science Centre http://www.e-science.clrc.ac.uk/web/staff/john_kewley

  2. Outline • Building/Growing a Condor pool for opportunistic computation • Some issues • Uses for a Heterogeneous pool

  3. Let's build a Condor Pool Our site has many machines that spend much of their time idling. We should harness their power to provide a reasonably sized computing resource. Let's use Condor!

  4. Abundance of machines Under • Windows workstations (but centrally administered) • Linux desktops (but administered by “owners”) • Commodity Clusters (unavailable, many being decommissioned, no access to root) • Servers for CVS, backup, external web access, access grid (production systems – mission critical) • Training machines (turned off when not in use – only 4 at present) • HPCx (top500 HPC) (No comment!)

  5. How do we grow a useful sized pool? • How do we attract users without a large pool? • How do we encourage more desktop owners to install Condor on their machines without "push" from users, PC Support or Management? • How can I convince Management or PC support without any users? Chicken and egg – which comes first? We need to grow the pool organically

  6. Approach “Community” approach: allow any users to participate if they provide at least one execute machine. If you want us to trust you to run jobs on our machines – you should trust us to run jobs on yours. Typically, most nodes are both submit and execute nodes

  7. True Cycle Stealing Scavenging Allow any machines to join pool • Dual-boot • Low-spec machines • Laptops • Machines that are not always on

  8. Some Pool Statistics • 15 resource “Owners” at 2 sites in 3 departments • 14 OS Variants • 32 Processors on 24 execution Machines (including central node) • 6 Windows (3 variants) • 23 Linux (11 variants) • 1 submit-only node (head-node of the e-HTPX cluster) • 2 laptops and 2 dual-boots (only main included above)

  9. Compute Farm • Homogeneous: large numbers of (almost) identical resources • Often co-located physically: a training room, lab workstations or a large cluster • Centrally managed, often by dedicated staff • Typical of many Condor Pools: excellent for High Throughput Computing

  10. Compute Farm

  11. Compute Zoo • Heterogeneous: resources are of many different operating system types and architectures • Located across a site • Individually or variously managed • Of use for HTC?

  12. Compute Zoo

  13. The CCLRC Compute Zoo • 3x Windows XP Professional • 2x Windows 2000 Professional • 1x Windows NT 4.0 Workstation • 7x SuSE Linux 9.0 • 2x SuSE Linux 8.0 • 1x SuSE Linux 9.1 • 5x White Box Enterprise Linux 3.0 • 1x Red Hat Enterprise Linux AS release 3.0 • 1x Red Hat Enterprise Linux WS release 3.0 • 3x Red Hat Linux 9 • 2x Red Hat Linux 8.0 • 2x Red Hat Linux 7.3 • 1x Mandrake Linux 10.1 • 1x Gentoo Linux 1.4

  14. Outline • Building/Growing a Condor pool for opportunistic computation • Some issues • Uses for a Heterogeneous pool

  15. Profile varies Pool composition varies over time • Machines such as laptops and dual-boots are ephemeral • Even many desktops get turned off at night in an attempt to be more eco-friendly No guarantee that resources of a particular type will be there. Jobs can be terminated at any time

  16. Management Most machines have a different administrator, which makes updating software tricky. However: • On Linux, Condor has an account on all nodes in the pool and owns condor_config.local (although not condor_config.root – this is needed so that DAEMON_LIST cannot be updated) • The central node is included as an administrator for all nodes in HOSTALLOW_ADMINISTRATOR, so condor_reconfig can be used.

  17. What jobs? • Great for HTC jobs where jobs are short or have minimal requirements on memory or processor speed. • Executables must be available for different platforms. • How can we get the code appropriate to each platform?
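One possible answer to the per-platform question, sketched in Python: keep one binary per platform and let the submit description pick the right one. This assumes binaries are named `<stem>.<OpSys>.<Arch>` and that the installed Condor supports the `$$(OpSys)` and `$$(Arch)` substitution in submit files; the stem `mysim` and the platform list are hypothetical.

```python
def platform_submit(base_name, platforms):
    """Return a submit description that selects a per-platform binary.

    base_name: hypothetical executable stem, e.g. "mysim"
    platforms: list of (OpSys, Arch) pairs for which a binary exists
    """
    # One requirements clause per platform we actually have a binary for.
    clauses = [
        '(OpSys == "{0}" && Arch == "{1}")'.format(opsys, arch)
        for opsys, arch in platforms
    ]
    lines = [
        "universe = vanilla",
        # $$(OpSys) and $$(Arch) are taken from the matched machine's ClassAd.
        "executable = {0}.$$(OpSys).$$(Arch)".format(base_name),
        "requirements = " + " || ".join(clauses),
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical platform names; use the values your pool advertises.
    print(platform_submit("mysim", [("LINUX", "INTEL"), ("WINNT51", "INTEL")]))
```

The requirements expression restricts matching to platforms that have a binary, so a job never lands on a machine it cannot run on.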

  18. Outline • Building/Growing a Condor pool for opportunistic computation • Some issues • Uses for a Heterogeneous pool

  19. Computer Science use Maybe Condor can be used to solve Computer Science rather than Computational Science problems? • Build software • Test software • Release Software

  20. “Build and Test” • The CCLRC pool was part of the UK Grid Engineering Task Force “Build and Test” project. • Software bundles were distributed to a variety of OS types around the flocked pool for building and testing. • This type of (flocked) pool relies on heterogeneity and small numbers of each type are all that are required. http://polaris.ecs.soton.ac.uk:65000/ http://wiki.nesc.ac.uk/read/sfct?HomePage

  21. Building tarballs • Any tarball that, after uncompressing and untarring, builds with the sequence configure, make, make install is suitable for the framework • Software was then built with an install directory in the job directory, for automatic return by Condor • Unix software would also build under cygwin by the same mechanism
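The build step described on this slide could look roughly like the following Python sketch. It is not the project's actual script: the `install` directory name, the function names and the example paths are assumptions, and it presumes a Unix-like environment (or cygwin) with `make` available.

```python
import os
import subprocess
import tarfile


def unpack(tarball, dest):
    """Uncompress and untar in one step (handles .tar, .tar.gz, .tar.bz2)."""
    with tarfile.open(tarball) as tf:
        tf.extractall(dest)


def build_commands(job_dir):
    """Commands for a configure / make / make install build.

    The install prefix lives inside the job directory so that the
    results sit where the batch system can return them.
    """
    prefix = os.path.join(job_dir, "install")  # assumed directory name
    return [
        ["./configure", "--prefix=" + prefix],
        ["make"],
        ["make", "install"],
    ]


def build(source_dir, job_dir):
    """Run each build step in the unpacked source tree, stopping on failure."""
    for cmd in build_commands(job_dir):
        subprocess.check_call(cmd, cwd=source_dir)


if __name__ == "__main__":
    # Show the commands that would run for a job started in the current directory.
    for cmd in build_commands(os.getcwd()):
        print(" ".join(cmd))
```

Installing under the job directory avoids any need for root access on the execute machine.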

  22. test_each This is a script which, when given a ClassAd attribute (including a user-defined one), generates a submit file that creates a job for EVERY different value it finds for that attribute.
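The idea behind test_each can be illustrated with a short Python sketch. This is not the original script: the function name is hypothetical, the attribute is assumed to be string-valued, and the list of values is assumed to have been collected already (for example from the pool's machine ClassAds).

```python
import re


def submit_for_each(attribute, values, executable):
    """Build a submit description with one job per distinct attribute value."""
    distinct = sorted(set(values))
    lines = ["universe = vanilla", "executable = " + executable, ""]
    for value in distinct:
        # Make the value safe to use in a file name.
        tag = re.sub(r"[^A-Za-z0-9_.-]", "_", value)
        # Pin this job to machines advertising exactly this value.
        lines.append('requirements = ({0} == "{1}")'.format(attribute, value))
        lines.append("output = out.{0}".format(tag))
        lines.append("queue")
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical values; duplicates collapse to one job each.
    print(submit_for_each("OpSys", ["LINUX", "WINNT51", "LINUX"], "run_tests.sh"))
```

Because each job is pinned to one value, a job for a platform that is currently absent simply waits in the queue until a matching machine appears.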

  23. Other CS Uses • I want to ensure my code compiles without warnings and/or runs its basic tests on • As many OSs as possible • With as many different compilers as possible • With various compilation/linking options such as debugging, memory and performance tools
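A wish list like this amounts to a build matrix: every combination of OS, compiler and option set becomes one job. A minimal Python sketch, with hypothetical OS names and flag sets, and assuming for simplicity that every compiler is available on every OS:

```python
import itertools


def build_matrix(os_names, compilers, option_sets):
    """Return one job description per (OS, compiler, options) combination."""
    return [
        {"os": os_name, "compiler": compiler, "flags": flags}
        for os_name, compiler, flags in itertools.product(
            os_names, compilers, option_sets
        )
    ]


if __name__ == "__main__":
    jobs = build_matrix(
        ["LINUX", "WINNT51"],             # hypothetical platform names
        ["gcc", "g++"],
        ["-Wall -Werror", "-g", "-O2"],   # warnings, debugging, optimised
    )
    for job in jobs:
        print(job["os"], job["compiler"], job["flags"])
```

In a real pool the matrix would be filtered to the combinations that actually exist before any jobs are generated.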

  24. Other CS Uses • I want to perform a release build of my product for platform X, but I only have accounts on A, B and C • I have several server-licensed products and many potential occasional users. How can these be made available to them more easily (within the bounds of the licence, of course)?

  25. Summary • Setting up a Condor pool of personal workstations requires considerable coaxing, convincing, coercion and cajoling. • A heterogeneous pool (Compute Zoo!) is probably only usable for short opportunistic jobs which can cycle scavenge effectively. • Such pools are great from a software engineering perspective
