This presentation describes the migration from CREAM CE to ARC CE at RAL, including the reasons for choosing ARC CE and the steps involved, along with the modifications made to the information system and job management needed to run ARC CE efficiently.
Moving from CREAM CE to ARC CE
Andrew Lahiff
andrew.lahiff@stfc.ac.uk
The short version
• Install ARC CE
• Test ARC CE
• Move ARC CE(s) into production
• Drain CREAM CE(s)
• Switch off CREAM CE(s)
Migration at RAL
• In 2013 we combined
  • migration from Torque to HTCondor
  • migration from CREAM CE to ARC CE
• Initial reasons for the choice of ARC CE
  • we didn't like CREAM
  • HTCondor-CE was still very new, even in OSG
  • had heard good things about ARC
    • Glasgow & Imperial College in the UK had already tried it
  • looked much simpler than CREAM
    • YAIM not required
  • ATLAS use it a lot
Migration at RAL
• Initially had CREAM CEs + Torque
  [diagram: CREAM CEs (Torque), Torque server / Maui, worker nodes (Torque), APEL, glite-CLUSTER]
Migration at RAL
• Added HTCondor pool + ARC & CREAM CEs
  [diagram: existing Torque side (CREAM CEs, Torque server, worker nodes, APEL, glite-CLUSTER) alongside ARC CEs and CREAM CEs running condor_schedd, HA HTCondor central managers, and worker nodes running condor_startd]
Migration at RAL
• Torque batch system decommissioned
  [diagram: APEL, glite-CLUSTER, ARC CEs and CREAM CEs (condor_schedd), HA HTCondor central managers, worker nodes (condor_startd)]
Migration at RAL
• CREAM CEs & APEL publisher decommissioned - once all LHC VOs & non-LHC VOs could submit to ARC
  [diagram: glite-CLUSTER, ARC CEs (condor_schedd), HA HTCondor central managers, worker nodes (condor_startd)]
Migration at RAL
• glite-CLUSTER decommissioned
  [diagram: ARC CEs (condor_schedd), HA HTCondor central managers, worker nodes (condor_startd)]
ARC CEs at RAL
• 4 ARC CEs - each is a VM with
  • 4 CPUs
  • 32 GB memory
    • most memory usage comes from the condor_shadow processes (sketch below)
      • we use 32-bit HTCondor rpms; will move to static shadows soon
    • see slapd using up to ~1 GB
    • we wanted to have lots of headroom! (we were new to both ARC and HTCondor)
• Using multiple ARC CEs for redundancy & scalability
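Not from the slides: a quick way to see how much memory the condor_shadow processes are actually using on a CE, which is what drives the sizing above. A minimal sketch assuming standard Linux procps:

    # total resident memory used by all condor_shadow processes on this CE
    ps -C condor_shadow -o rss= | awk '{ total += $1 } END { printf "%.0f MB across %d shadows\n", total/1024, NR }'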
ARC CEs at RAL
• Example from today - 5.5K running jobs on a single CE
Usage since Oct 2013
• Generally have 2-3K running jobs per CE
  [plot: running jobs per ARC CE over time, with one monitoring glitch annotated]
glite-WMS support
• Some non-LHC VOs still use glite-WMS
  • getting less & less important
• In order for the WMS job wrapper to work with ARC CEs, need an empty file /usr/etc/globus-user-env.sh on all worker nodes (sketch below)
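A minimal sketch of the fix, assuming you roll it out with your configuration management system rather than by hand:

    # create the empty file the glite-WMS job wrapper expects (on every worker node)
    mkdir -p /usr/etc
    touch /usr/etc/globus-user-env.sh
    chmod 644 /usr/etc/globus-user-env.sh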
Software tags
• Software tags (almost) no longer needed due to CVMFS
  • some non-LHC VOs may need them however
  • again, probably getting less & less important
• ARC
  • runtime environments appear in the BDII in the same way as software tags
  • unless you have a shared filesystem (worker nodes, CEs), no way for VOs to update tags themselves
  • our configuration management system manages the runtime environments
    • mostly just empty files (sketch below)
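A sketch of what "mostly just empty files" means in practice; the directory must match whatever the runtimedir option in arc.conf points at, and the tag name below is only an example:

    # publish a software tag as an (empty) ARC runtime environment
    RTE_DIR=/etc/arc/runtime      # assumption: matches runtimedir in arc.conf
    mkdir -p "$RTE_DIR/APPS"
    touch "$RTE_DIR/APPS/EXAMPLE-TAG-1.0"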
Information system
• Max CPU & wall time not published correctly
  • only a problem for the HTCondor backend
  • no way for ARC to determine this from HTCondor
  • could try to extract it from SYSTEM_PERIODIC_REMOVE? (sketch below)
  • but what if a site enforces the limits on the worker nodes instead, e.g. via WANT_HOLD?
• We modified /usr/share/arc/glue-generator.pl
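A sketch of what "extract from SYSTEM_PERIODIC_REMOVE" could look like, assuming the site expresses its limit as a comparison against RemoteWallClockTime (which is site-specific, hence the question mark above):

    # show the expression the schedd uses to remove over-limit jobs
    condor_config_val SYSTEM_PERIODIC_REMOVE
    # pull out a wall-clock limit in seconds, if one appears as "RemoteWallClockTime > <N>"
    condor_config_val SYSTEM_PERIODIC_REMOVE | grep -oP 'RemoteWallClockTime\s*>\s*\K[0-9]+'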
Information system - VO views
• ARC reports the same number of running & idle jobs for all VOs
• We modified /usr/share/arc/glue-generator.pl
  • a cron job running every 10 mins queries HTCondor & creates files listing the numbers of jobs by VO (sketch below)
  • glue-generator.pl modified to read these files
• Some VOs still need this information (incl. LHC VOs)
  • hopefully the need for this will slowly go away
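A sketch of the kind of query such a cron job can run (assuming jobs carry the x509UserProxyVOName attribute, as proxy-based submission gives them; the output file paths are just illustrative):

    # running (JobStatus==2) and idle (JobStatus==1) job counts per VO, for glue-generator.pl to read
    condor_q -allusers -constraint 'JobStatus == 2' -af x509UserProxyVOName | sort | uniq -c > /var/local/vo-running-jobs
    condor_q -allusers -constraint 'JobStatus == 1' -af x509UserProxyVOName | sort | uniq -c > /var/local/vo-idle-jobs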
Information system - VO shares
• VO shares not published
• Added some lines into /usr/share/arc/glue-generator.pl:
  GlueCECapability: Share=cms:20
  GlueCECapability: Share=lhcb:27
  GlueCECapability: Share=atlas:49
  GlueCECapability: Share=alice:2
  GlueCECapability: Share=other:2
• Not sure why this information is needed anyway
LHCb
• DIRAC can't specify runtime environments
  • we use an auth plugin to specify a default runtime environment
  • we put all essential things in here (grid-related env variables etc.)
• The default runtime environment needs to set
  • NORDUGRID_ARC_QUEUE=<queue name>
  (sketch below)
https://github.com/alahiff/ral-arc-ce-rte/blob/master/GLITE
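A minimal sketch of the shape such a default runtime environment script takes (the real one used at RAL is the GLITE RTE linked above; the stage numbers follow ARC's convention of 0 = on the CE while the job is prepared, 1 = on the worker node before the job runs, and the queue name is a placeholder):

    # default RTE sketch: sourced by ARC with the stage number as the first argument
    case "$1" in
      0)
        # settings A-REX needs while preparing the job go here
        ;;
      1)
        export NORDUGRID_ARC_QUEUE=grid    # placeholder queue name
        # ...plus the grid-related environment variables the VOs expect
        ;;
    esac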
Multi-core jobs
• In order for stdout/err to be available to the VO, need to set RUNTIME_ENABLE_MULTICORE_SCRATCH=1 in a runtime environment
• In ours we have (amongst other things):
  if [ "x$1" = "x0" ]; then
      export RUNTIME_ENABLE_MULTICORE_SCRATCH=1
  fi
https://github.com/alahiff/ral-arc-ce-rte/blob/master/GLITE
Auth plugins
• Can configure an external executable to run every time a job is about to switch to a different state
  • ACCEPTED, PREPARING, SUBMIT, FINISHING, FINISHED, DELETED
• Very useful! Our standard uses:
  • setting a default runtime environment for all jobs
  • scaling CPU & wall time for completed jobs
  • occasionally for debugging
    • keep all stdout/err files for completed jobs for a particular VO
  (sketch below)
https://github.com/alahiff/ral-arc-ce-plugins
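A sketch of the shape an auth plugin can take (the real ones are in the repository linked above; the assumptions here are that the authplugin line in arc.conf passes the job ID as the first argument and that a zero exit code lets the state change go ahead):

    #!/bin/bash
    # called by A-REX when a job is about to enter the configured state
    JOBID="$1"
    logger -t arc-authplugin "job $JOBID entering FINISHED"
    # ...per-job work here, e.g. scale recorded CPU/wall time or copy stdout/err aside...
    exit 0    # a non-zero exit would block the state change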
User mapping
• Argus for mapping to local pool accounts (via lcmaps)
• In /etc/arc.conf:
  [gridftpd]
  ...
  unixmap="* lcmaps liblcmaps.so /usr/lib64 /usr/etc/lcmaps/lcmaps.db arc"
  unixmap="banned:banned all"
  ...
• Set up Argus policies to allow all supported VOs to submit jobs
Monitoring - alerts
• ARC Nagios tests
  • Check proc a-rex
  • Check proc gridftp
  • Check proc nordugrid-arc-bdii
  • Check proc nordugrid-arc-slapd
  • Check ARC APEL consistency
    • check that an SSM message was sent successfully to APEL < 24 hours ago
  • Check HTCondor-ARC consistency
    • check that HTCondor & ARC agree on the number of running + idle jobs
  (sketch of a simple process check below)
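A sketch of what one of the simple process checks can look like as a Nagios plugin (the process name is passed as an argument; exit codes follow the usual Nagios convention):

    #!/bin/bash
    # usage: check_proc <process-name>, e.g. check_proc gridftpd
    PROC="$1"
    if pgrep -x "$PROC" > /dev/null; then
        echo "OK: $PROC is running"
        exit 0
    else
        echo "CRITICAL: $PROC is not running"
        exit 2
    fi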
Monitoring - alerts
• HTCondor Nagios tests
  • Check HTCondor CE Schedd
    • check that the schedd ClassAd is available (sketch below)
    • we found that a check for condor_master is not enough, e.g. if you have a corrupt HTCondor config file
  • Check job submission HTCondor
    • check that Nagios can successfully submit a job to HTCondor
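A sketch of the schedd ClassAd check, simply verifying that condor_status can retrieve the schedd ad for this host (hostname handling is site-specific):

    #!/bin/bash
    # CRITICAL if the collector has no schedd ClassAd for this CE
    if condor_status -schedd "$(hostname -f)" -af Name | grep -q .; then
        echo "OK: schedd ClassAd present"
        exit 0
    else
        echo "CRITICAL: no schedd ClassAd for $(hostname -f)"
        exit 2
    fi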
Monitoring - Ganglia
• Ganglia metrics
  • standard host metrics
• Gangliarc: http://wiki.nordugrid.org/wiki/Gangliarc
  • ARC-specific metrics
• condor_gangliad
  • HTCondor-specific metrics
Monitoring - InfluxDB
• 1-min time resolution
• ARC CE metrics
  • job states, time since last A-REX heartbeat
• HTCondor metrics include:
  • shadow exit codes
  • numbers of jobs run more than once (sketch below)
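A sketch of where the "jobs run more than once" number can come from; NumJobStarts is a standard job ClassAd attribute, and any time-window handling is left out:

    # completed jobs that started more than once (candidates for wasted wall time)
    condor_history -constraint 'NumJobStarts > 1' -af ClusterId ProcId NumJobStarts | wc -l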
Problems we've had
• APEL central message broker hardwired in config
  • when the hostname of the message broker changed once, APEL publishing stopped
  • we now have a Nagios check for APEL publishing
• ARC-HTCondor running+idle job consistency
  • before scan-condor-job was optimized, we had ~2 incidents in the past couple of years where ARC lost track of jobs
  • best to use an ARC version > 5.0.0