Putting Existing Farms on the Testbed

Presentation Transcript


  1. Putting Existing Farms on the Testbed
     • Manchester DZero/Atlas and BaBar farms are available via the Testbed.
     • Done with a handful of modifications to the Testbed site and to the existing farms.
     • This talk describes what we did and how you can do it too...
     Andrew McNab - Manchester HEP - 17 September 2002

  2. Farms at Manchester HEP
     • BaBar: 80 * 0.8GHz
     • GridFarm: 16 * 1.0GHz
     • DZero / Atlas: 60 * 1.5GHz
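For a rough sense of aggregate capacity, the figures on this slide can be totalled. The sketch below only restates the slide's numbers; the dictionary layout and variable names are illustrative. It gives 156 nodes at a node-weighted mean clock of roughly 1.09 GHz, consistent with the "150 * ~1 GHz" quoted in the summary.

```python
# Node counts and clock speeds (GHz) as listed on the slide.
FARMS = {
    "BaBar": (80, 0.8),
    "GridFarm": (16, 1.0),
    "DZero / Atlas": (60, 1.5),
}

total_nodes = sum(nodes for nodes, _ in FARMS.values())
total_ghz = sum(nodes * ghz for nodes, ghz in FARMS.values())
mean_ghz = total_ghz / total_nodes  # node-weighted mean clock speed

print(f"{total_nodes} nodes, {total_ghz:.0f} GHz aggregate, {mean_ghz:.2f} GHz mean")
```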

  3. The problem
     • We want to make existing farms available on the Testbed.
     • But we don’t want to massively reconfigure/reinstall farms
         • they’re in production so need to be kept stable
         • they are already configured the way their owners need
     • We might want to keep reinstalling as EDG software is updated.
         • this is labour intensive unless we install from scratch with LCFG install
         • don’t want to have to make many manual changes to CE etc every time we install/upgrade
     • Solution that has been mentioned several times is to have a standard EDG Testbed Site as a front end to the Existing Farm
     • So want to find the minimal set of changes to Farm and Testbed Site that will put the Farm on the Testbed.

  4. Standard Testbed Site
     • All elements installed from LCFG server
     • Computing Element shares /home directories by NFS
     • Storage Element shares /flatfiles with data by NFS
     • PBS Server on CE talks to PBS on Worker Nodes.
     [Diagram: LCFG server; CE running the PBS Server and sharing /home; SE sharing /flatfiles; three WNs, each a PBS Node, connected to the CE by PBS]

  5. What we want
     [Diagram: the Grid Farm / Testbed Site (LCFG, CE with PBS Server and /home, SE with /flatfiles, three WNs as PBS Nodes) shown next to the BaBar or DZero/Atlas Farm (its own PBS Server and PBS Nodes), with qsub linking the two]

  6. Reconfigure Existing Farm
     • PBS Server must allow access from CE, but only for the right users.
         • Add CE to list of valid job submission clients (eg in hosts.equiv)
         • Create special queue (bfq or dfq) for Testbed jobs.
         • Limit queues so desired pool of accounts (eg atlas001 etc) can submit jobs to the bfq/dfq but other queues/pools forbidden.
     • PBS Nodes need access to pool accounts, home directories on CE, and /flatfiles area on SE.
         • If already using NFS automount, then easy to add /home on CE and /flatfiles on SE (eg as /nfs/gf-home and /nfs/gf-flatfiles)
         • Add pool accounts to /etc/passwd (or NIS)
         • Make symbolic links in /home to automount CE /home directories.
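The last step on this slide (symbolic links in /home pointing at the automounted CE home directories) can be sketched as a dry-run planner. This is an illustration under assumptions, not the Manchester procedure: the helper names, the three-digit numbering inferred from "atlas001", and the pool size of 3 are hypothetical; only /home and /nfs/gf-home come from the slide.

```python
from pathlib import PurePosixPath


def pool_accounts(prefix, count, width=3):
    """Generate pool account names such as atlas001, atlas002, ...

    The zero-padded three-digit suffix is inferred from the 'atlas001'
    example on the slide; the pool size is up to the site.
    """
    return [f"{prefix}{i:0{width}d}" for i in range(1, count + 1)]


def home_symlinks(accounts, automount_root="/nfs/gf-home", home_root="/home"):
    """Return (link, target) pairs: /home/<acct> -> /nfs/gf-home/<acct>."""
    return [
        (str(PurePosixPath(home_root) / acct),
         str(PurePosixPath(automount_root) / acct))
        for acct in accounts
    ]


# Dry run: print the commands instead of touching the filesystem.
for link, target in home_symlinks(pool_accounts("atlas", 3)):
    print(f"ln -s {target} {link}")
```

Printing the commands rather than creating the links keeps the sketch safe to run anywhere; on a real PBS Node the same pairs could be reviewed first and then applied by hand.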

  7. Software on PBS Nodes
     • For current EDG job submissions to work, need to install globus-url-copy RPMs on PBS Nodes.
     • PBS Nodes currently need to make outgoing GridFTP connections to the Resource Broker.
     • GridFTP possible with NAT, but difficult.
     • Other middleware RPMs will be needed if also intending to manipulate SE and RC during jobs.
     • For use with EDG Testbed, should also install relevant application RPMs

  8. Changes to Testbed Site
     • Have attempted to minimise changes:
         • easier to document and support
         • easier to maintain as EDG software changes
     • Basic philosophy: modify EDG scripts to make remote qsub and qstat calls to PBS Server machines on the farms.
     • Only need to edit 3 scripts on the CE
         • /opt/globus/libexec/globus-script-pbs-queue
         • /opt/edg/info/mds/sbin/skel/ce-globus.skel
         • /opt/edg/info/mds/bin/ce-pbs
     • Create grid-mapfile and ce-static.ldif for each queue.
     • Include farm queue and PBS nodes in LCFG site-cfg.h
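The slides do not show the actual edits to the three scripts. As a minimal sketch of what "remote qsub and qstat calls" could look like, the helpers below build command lines using the PBS client's queue@server destination syntax, which is one plausible mechanism given that the CE is added to the farm's hosts.equiv. The function names and the host name pbs.farm.example are hypothetical; the commands are only printed, never executed.

```python
def remote_qsub_argv(queue, server, job_script):
    """Command line to submit job_script to a queue on a remote PBS server.

    Assumes the PBS client's 'queue@server' destination form for -q.
    """
    return ["qsub", "-q", f"{queue}@{server}", job_script]


def remote_qstat_argv(queue, server):
    """Command line to ask a remote PBS server for full queue status."""
    return ["qstat", "-Q", "-f", f"{queue}@{server}"]


# Hypothetical farm PBS Server host; dfq is the DZero/Atlas Testbed queue.
server = "pbs.farm.example"
print(" ".join(remote_qsub_argv("dfq", server, "job.sh")))
print(" ".join(remote_qstat_argv("dfq", server)))
```

Returning argument lists rather than shell strings means a wrapper could later pass them to a process launcher without quoting problems.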

  9. New behaviour
     • Modified ce-pbs queries PBS Server using remote qstat
     • Publishes edited grid-mapfile listing only the right users.
     • Jobs can be submitted using Resource Broker, based on published information.
     • When received by CE, globus-script-pbs-queue submits job to remote PBS Server
     • EDG Globus jobmanager on CE monitors job status via remote qstat and transmits to Logging as normal.
     • Job runs on PBS Node with access to pool account /home
     • Job completes and returns files to RB via gridftp
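The "edited grid-mapfile listing only the right users" could be produced by filtering the site grid-mapfile per queue. The sketch below is an assumption about how that might be done, not the method used at Manchester: it assumes the usual one-entry-per-line layout of a quoted certificate DN followed by a local account, with pool mappings written with a leading dot (for example .atlas). The sample DNs and the function name are made up for illustration.

```python
import re

# One grid-mapfile entry: "certificate DN" followed by the mapped account.
ENTRY = re.compile(r'^\s*"(?P<dn>[^"]+)"\s+(?P<account>\S+)\s*$')


def filter_gridmap(text, allowed_accounts):
    """Keep only entries whose mapped account is in allowed_accounts."""
    kept = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match and match.group("account") in allowed_accounts:
            kept.append(line.strip())
    return "\n".join(kept)


# Made-up sample entries; real DNs come from the site grid-mapfile.
SAMPLE = '''
"/O=Example/CN=Alice" .atlas
"/O=Example/CN=Bob" .babar
"/O=Example/CN=Carol" .atlas
'''

# A per-queue file for the DZero/Atlas queue would keep only .atlas users.
print(filter_gridmap(SAMPLE, {".atlas"}))
```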

  10. Example logs
     • Three jobmanagers visible to GridPP MDS and RB:
         • gf18.hep.man.ac.uk:2119/jobmanager-pbs-gfq (Grid Farm/Testbed)
         • gf18.hep.man.ac.uk:2119/jobmanager-pbs-dfq (DZero/Atlas farm)
         • gf18.hep.man.ac.uk:2119/jobmanager-pbs-bfq (BaBar farm)
     • Different operating system, grid-mapfile lists of users etc for each queue.
     • Can submit job to RB and have it matchmake the requirements
         • including dynamic properties like free nodes
     • Example log shows submitting a job from UI at RAL via RB at IC, which decides which farm at Manchester matches and sends the job there.
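To illustrate matchmaking on both static and dynamic properties, here is a toy selector over the three jobmanager contact strings listed above. It is not the Resource Broker's algorithm: the per-queue experiment sets and free-node counts are invented for the example, and the "most free nodes wins" rule is just one simple ranking.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Queue:
    contact: str            # jobmanager contact string from the slide
    experiments: frozenset  # who may run there (hypothetical values)
    free_nodes: int         # dynamic property (hypothetical values)


QUEUES = [
    Queue("gf18.hep.man.ac.uk:2119/jobmanager-pbs-gfq",
          frozenset({"atlas", "dzero", "babar"}), 4),
    Queue("gf18.hep.man.ac.uk:2119/jobmanager-pbs-dfq",
          frozenset({"atlas", "dzero"}), 10),
    Queue("gf18.hep.man.ac.uk:2119/jobmanager-pbs-bfq",
          frozenset({"babar"}), 0),
]


def matchmake(queues, experiment, min_free=1):
    """Pick the eligible queue with the most free nodes, or None."""
    candidates = [q for q in queues
                  if experiment in q.experiments and q.free_nodes >= min_free]
    return max(candidates, key=lambda q: q.free_nodes, default=None)


for experiment in ("atlas", "babar"):
    chosen = matchmake(QUEUES, experiment)
    print(experiment, "->", chosen.contact if chosen else "no match")
```

With these invented numbers, a babar job falls back to the Grid Farm queue because the BaBar queue has no free nodes, which is the kind of decision the dynamic information makes possible.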

  11. Applying this to other sites
     • This recipe is being written up for http://www.gridpp.ac.uk/tb-support/
     • With current EDG release, the PBS Nodes need outgoing direct internet access (not NAT).
     • You need to be able to make minor changes to PBS Server permissions, NFS mounts etc as described.
     • You should have some (3?) dedicated Testbed machines, or add it to an existing GridPP/EDG Testbed setup.
     • We use Microdirect.co.uk boxes (1.5GHz/256MB/40GB) at £250 per box …
     • If you don’t use an EDG-supported batch system (PBS etc), you need to modify ce-pbs and globus-script-pbs-* scripts to use your job submission commands.

  12. Summary
     • It’s not at all difficult to access existing PBS farms via an EDG Testbed site.
         • include CE + SE in NFS and PBS configuration of farm
         • include pool accounts in farm’s passwd file
         • enforce security by account pools
     • Only need to modify a handful of files on the Testbed CE.
     • Should be relatively straightforward to apply this to other batch queue systems even if you don’t use PBS.
     • We’ve demonstrated putting our 150 * ~1 GHz nodes on the current Testbed and submitting jobs via GridPP RB
     • You can too.