Putting Existing Farms on the Testbed
• The Manchester DZero/Atlas and BaBar farms are available via the Testbed.
• This was done with a handful of modifications to the Testbed site and to the existing farms.
• This talk describes what we did and how you can do it too.
Andrew McNab - Manchester HEP - 17 September 2002
Farms at Manchester HEP
• BaBar: 80 * 0.8GHz
• GridFarm: 16 * 1.0GHz
• DZero / Atlas: 60 * 1.5GHz
The problem
• We want to make existing farms available on the Testbed.
• But we don’t want to massively reconfigure or reinstall the farms:
  • they’re in production, so they need to be kept stable
  • they are already configured the way their owners need
• We might want to keep reinstalling as EDG software is updated:
  • this is labour intensive unless we install from scratch with LCFG
  • we don’t want to make many manual changes to the CE etc. every time we install or upgrade
• A solution that has been mentioned several times is to have a standard EDG Testbed Site as a front end to the existing farm.
• So we want to find the minimal set of changes to the Farm and the Testbed Site that will put the Farm on the Testbed.
Standard Testbed Site
• All elements are installed from the LCFG server.
• The Computing Element (CE) shares /home directories by NFS.
• The Storage Element (SE) shares /flatfiles with data by NFS.
• The PBS Server on the CE talks to PBS on the Worker Nodes (WN).
[Diagram: LCFG server; CE running the PBS Server and exporting /home; SE exporting /flatfiles; three WNs, each a PBS Node.]
What we want
[Diagram: the Grid Farm / Testbed Site (LCFG, CE with PBS Server and /home, SE with /flatfiles, WN PBS Nodes) next to the BaBar or DZero/Atlas Farm (its own PBS Server and PBS Nodes), with qsub going from the CE to the farm’s PBS Server.]
Reconfigure Existing Farm
• The PBS Server must allow access from the CE, but only for the right users:
  • add the CE to the list of valid job submission clients (e.g. in hosts.equiv)
  • create a special queue (bfq or dfq) for Testbed jobs
  • limit queues so that the desired pool of accounts (e.g. atlas001 etc.) can submit jobs to bfq/dfq, but other queues and pools are forbidden
• PBS Nodes need access to the pool accounts, the home directories on the CE, and the /flatfiles area on the SE:
  • if already using NFS automount, it is easy to add /home on the CE and /flatfiles on the SE (e.g. as /nfs/gf-home and /nfs/gf-flatfiles)
  • add the pool accounts to /etc/passwd (or NIS)
  • make symbolic links in /home to the automounted CE /home directories
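The last two steps (pool accounts in /etc/passwd, symbolic links in /home) are repetitive enough to generate. The following is a minimal sketch rather than the procedure actually used at Manchester: the pool size, UID range, GID, shell and GECOS text are all hypothetical, and only the atlas001 naming pattern and the /nfs/gf-home mount point come from the slide. The UIDs chosen on the farm would have to match those on the CE for NFS file ownership to line up. The code only prints what it would add; it changes nothing on the system.

```python
def pool_accounts(prefix, count, first_uid, gid, home_root="/home"):
    """Return /etc/passwd lines for pool accounts prefix001..prefixNNN.

    first_uid and gid are hypothetical values chosen by the site admin.
    """
    lines = []
    for i in range(1, count + 1):
        name = f"{prefix}{i:03d}"
        uid = first_uid + i - 1
        # passwd(5) fields: name:password:UID:GID:GECOS:home:shell
        lines.append(
            f"{name}:x:{uid}:{gid}:Testbed pool account:{home_root}/{name}:/bin/bash"
        )
    return lines


def home_symlinks(prefix, count, automount_root="/nfs/gf-home", home_root="/home"):
    """Return (link, target) pairs, e.g. /home/atlas001 -> /nfs/gf-home/atlas001."""
    pairs = []
    for i in range(1, count + 1):
        name = f"{prefix}{i:03d}"
        pairs.append((f"{home_root}/{name}", f"{automount_root}/{name}"))
    return pairs


if __name__ == "__main__":
    # Hypothetical pool of three accounts starting at UID 20001, GID 2000.
    for line in pool_accounts("atlas", 3, 20001, 2000):
        print(line)
    for link, target in home_symlinks("atlas", 3):
        print(f"ln -s {target} {link}")
```

Printing the passwd lines and `ln -s` commands, instead of applying them, leaves the decision about what goes onto a production farm with its administrator.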
Software on PBS Nodes
• For current EDG job submissions to work, the globus-url-copy RPMs need to be installed on the PBS Nodes.
• PBS Nodes currently need to make outgoing GridFTP connections to the Resource Broker.
  • GridFTP is possible with NAT, but difficult.
• Other middleware RPMs will be needed if jobs are also intended to manipulate the SE and RC.
• For use with the EDG Testbed, the relevant application RPMs should also be installed.
Changes to Testbed Site
• We have attempted to minimise changes:
  • easier to document and support
  • easier to maintain as EDG software changes
• Basic philosophy: modify EDG scripts to make remote qsub and qstat calls to the PBS Server machines on the farms.
• Only 3 scripts on the CE need editing:
  • /opt/globus/libexec/globus-script-pbs-queue
  • /opt/edg/info/mds/sbin/skel/ce-globus.skel
  • /opt/edg/info/mds/bin/ce-pbs
• Create a grid-mapfile and ce-static.ldif for each queue.
• Include the farm queue and PBS nodes in the LCFG site-cfg.h.
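The slides do not say how the three scripts were edited. One plausible way to make qsub and qstat "remote" is PBS's own `queue@server` destination syntax, which both commands accept. The sketch below only builds the command lines under that assumption; the server host name `dzero-pbs.example.org` and the helper names are hypothetical, and only the dfq queue name comes from the talk.

```python
def remote_qsub_command(queue, server, script_path):
    """Build a qsub command that submits script_path to queue on a remote PBS Server."""
    return ["qsub", "-q", f"{queue}@{server}", script_path]


def remote_qstat_command(queue, server):
    """Build a qstat command that asks a remote PBS Server for full queue status."""
    return ["qstat", "-Q", "-f", f"{queue}@{server}"]


if __name__ == "__main__":
    # Hypothetical PBS Server host for the DZero/Atlas farm.
    server = "dzero-pbs.example.org"
    print(" ".join(remote_qsub_command("dfq", server, "job.sh")))
    print(" ".join(remote_qstat_command("dfq", server)))
```

Returning the commands as argument lists means they could be passed unchanged to `subprocess.run` on a machine with the PBS client tools installed, with no shell quoting involved.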
New behaviour
• The modified ce-pbs queries the PBS Server using remote qstat.
• An edited grid-mapfile listing only the right users is published.
• Jobs can be submitted using the Resource Broker (RB), based on the published information.
• When a job is received by the CE, globus-script-pbs-queue submits it to the remote PBS Server.
• The EDG Globus jobmanager on the CE monitors job status via remote qstat and transmits it to Logging as normal.
• The job runs on a PBS Node with access to the pool account /home.
• The job completes and returns files to the RB via GridFTP.
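An "edited grid-mapfile listing only the right users" can be produced by filtering the site's full grid-mapfile per queue. This is a minimal sketch, assuming the usual layout of one quoted certificate subject followed by a local account name per line. The DNs, the account names and the function name are hypothetical; this is not the script used at Manchester.

```python
import shlex


def filter_grid_mapfile(text, allowed_accounts):
    """Keep only grid-mapfile entries whose local account is in allowed_accounts."""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        # shlex.split keeps the quoted subject DN, spaces included, as one field
        fields = shlex.split(stripped)
        if len(fields) == 2 and fields[1] in allowed_accounts:
            kept.append(stripped)
    return "\n".join(kept)


if __name__ == "__main__":
    # Hypothetical entries: one mapped to an Atlas pool, one to a BaBar pool.
    full_map = (
        '"/O=Grid/OU=example.org/CN=Alice Example" .atlas\n'
        '"/O=Grid/OU=example.org/CN=Bob Example" .babar\n'
    )
    print(filter_grid_mapfile(full_map, {".atlas"}))
```

Running the filter once per queue, each time with that queue's allowed accounts, would give the per-queue grid-mapfiles the earlier slide calls for.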
Example logs
• Three jobmanagers are visible to the GridPP MDS and RB:
  • gf18.hep.man.ac.uk:2119/jobmanager-pbs-gfq (Grid Farm/Testbed)
  • gf18.hep.man.ac.uk:2119/jobmanager-pbs-dfq (DZero/Atlas farm)
  • gf18.hep.man.ac.uk:2119/jobmanager-pbs-bfq (BaBar farm)
• Each queue has a different operating system, grid-mapfile list of users, etc.
• A job can be submitted to the RB, which matchmakes its requirements
  • including dynamic properties like free nodes
• The example log shows a job being submitted from a UI at RAL via the RB at IC, which decides which farm at Manchester matches and sends the job there.
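The three contact strings share one host and port and differ only in the queue suffix, so the farm a job landed on can be read off the contact string. The sketch below shows that decomposition. The function name and the lookup table are illustrative only; the strings and farm labels are the ones listed on the slide.

```python
FARMS = {
    "gfq": "Grid Farm/Testbed",
    "dfq": "DZero/Atlas farm",
    "bfq": "BaBar farm",
}

PREFIX = "jobmanager-pbs-"


def parse_contact(contact):
    """Split 'host:port/jobmanager-pbs-QUEUE' into (host, port, queue)."""
    hostport, _, service = contact.partition("/")
    host, _, port = hostport.partition(":")
    if not service.startswith(PREFIX):
        raise ValueError(f"not a PBS jobmanager contact: {contact!r}")
    return host, int(port), service[len(PREFIX):]


if __name__ == "__main__":
    for contact in (
        "gf18.hep.man.ac.uk:2119/jobmanager-pbs-gfq",
        "gf18.hep.man.ac.uk:2119/jobmanager-pbs-dfq",
        "gf18.hep.man.ac.uk:2119/jobmanager-pbs-bfq",
    ):
        host, port, queue = parse_contact(contact)
        print(f"{queue} on {host}:{port} is the {FARMS[queue]}")
```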
Applying this to other sites
• This recipe is being written up for http://www.gridpp.ac.uk/tb-support/
• With the current EDG release, the PBS Nodes need direct outgoing internet access (not NAT).
• You need to be able to make minor changes to PBS Server permissions, NFS mounts etc. as described.
• You should have some (3?) dedicated Testbed machines, or add this to an existing GridPP/EDG Testbed setup.
• We use Microdirect.co.uk boxes: 1.5GHz/256MB/40GB per box for £250 …
• If you don’t use an EDG-supported batch system (PBS etc.), you need to modify the ce-pbs and globus-script-pbs-* scripts to use your job submission commands.
Summary
• It’s not at all difficult to access existing PBS farms via an EDG Testbed site:
  • include the CE + SE in the NFS and PBS configuration of the farm
  • include the pool accounts in the farm’s passwd file
  • enforce security by account pools
• Only a handful of files on the Testbed CE need to be modified.
• It should be relatively straightforward to apply this to other batch queue systems if you don’t use PBS.
• We’ve demonstrated putting our 150 * ~1 GHz nodes on the current Testbed and submitting jobs via the GridPP RB.
• You can too.