CASTOR 2 Setup at CNAF
Experience about the deployment of the new stager at CNAF TIER1
Giuseppe Lo Presti - Sebastien Ponce
CERN/IT/FIO & INFN/CNAF
Castor External Operation Meeting, RAL, January 24-25, 2006
Outline
• General deployment
• Main issues and fixes
  • stager
  • scheduler
  • disk servers
• Enhancements
  • Quattor-based components
• Conclusions and plans
CNAF deployment
• DNS aliases as at CERN
  • castorrh
  • castorstager
  • castorscheduler
  • castorrtcpcld
  • castordlf
  • castorrmmaster
• LSF
  • master exports /lsf via NFS
  • rmmaster and diskserver(s) use it as clients
• DLF
  • Both dlfserver and Oracle database on the same machine
• Castor1 services
  • Shared with current Castor1 production instance
  • Tape part unchanged
[Deployment diagram. Recoverable labels: stagerdb (Oracle/RHE); dlfdb (Oracle/SL); disk cache; RH, stager, MigHunter, rtcpclientd; DLFserver, rmmaster, expertd, LSF, DLF GUI; LSF master; Castor1 services (vdqm, vmgr, ns, cupvd); hosts castor-6, diskserv-san-13, castorlsf01, castor, sc11, oracle01; same tapesrvs as for Castor1]
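Since every service is reached through a DNS alias, a quick resolution check can catch a missing alias before any daemon is started. The following is a minimal sketch, not part of the CNAF procedure: the alias names come from the slide, while the function name `resolve_aliases` and the injectable `resolver` argument are assumptions made for illustration.

```python
import socket
from typing import Callable, Dict, Iterable, Optional

# Alias names as listed on the slide; the DNS domain they live in is site-specific.
CASTOR_ALIASES = [
    "castorrh", "castorstager", "castorscheduler",
    "castorrtcpcld", "castordlf", "castorrmmaster",
]


def resolve_aliases(
    aliases: Iterable[str],
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Map each alias to its IPv4 address, or to None if it does not resolve.

    `resolver` is a hypothetical hook so the lookup can be replaced in tests.
    """
    result: Dict[str, Optional[str]] = {}
    for alias in aliases:
        try:
            result[alias] = resolver(alias)
        except OSError:  # socket.gaierror is a subclass of OSError
            result[alias] = None
    return result
```

Calling `resolve_aliases(CASTOR_ALIASES)` on a machine that uses the site's resolver returns a dictionary in which any alias mapped to `None` still needs to be defined.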
Followed procedure
• Quattor
  • Based on profiles used at CERN for the test instance
  • Modified to fit CNAF needs (for instance, all environment-related configuration went to /etc/profile)
• Hand-made hacks
  • Mainly to fix wrong scripts
  • Manual installation of DLF GUI
• Complete deployment done in 2.5 days of joint work with the operation team
  • starting from servers with RPMs
  • LSF roughly configured
Main issues 1/3
• Startup scripts
  • Most of them made wrong assumptions about the Oracle environment
  • Modified by hand
  • /etc/sysconfig/rmmaster was set not to use LSF
  • Mostly fixed in the latest release, 2.0.2-9
  • Some other fixes are still to be delivered with the next release
• DLF GUI
  • Installed by hand (actually copied from CERN servers) on castordlf
  • Needed Apache + PHP + Oracle interface
  • Going to be packaged
• LSF
  • Thanks to the LSF people at CNAF, only a few files had to be modified
  • All of them are on the wiki page Castor2InstanceForExternalInstitutes
  • However, the castor plugin didn’t log because of wrong permissions on /var/spool/scheduler
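The silent logging failure of the castor plugin came down to directory permissions, which can be inspected before LSF is started. The sketch below is one possible check, not the fix applied at CNAF; the helper name `describe_dir_access` is hypothetical, and the check only reflects the account that runs the script, so it would need to be run as whichever account the scheduler plugin uses (not stated in the slides).

```python
import os
import stat


def describe_dir_access(path: str) -> dict:
    """Report type, mode, ownership and write access for `path`.

    `writable` is evaluated for the account running this script.
    """
    info = os.stat(path)
    return {
        "is_dir": stat.S_ISDIR(info.st_mode),
        "mode": stat.filemode(info.st_mode),  # e.g. 'drwxr-xr-x'
        "uid": info.st_uid,
        "gid": info.st_gid,
        # Creating files in a directory needs both write and search permission.
        "writable": os.access(path, os.W_OK | os.X_OK),
    }
```

For the case on the slide, the call would be `describe_dir_access("/var/spool/scheduler")`; a `False` under `writable` points at the same problem that stopped the plugin from logging.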
Main issues 2/3
• Castor main configuration file castor.conf
  • Modified by hand
  • Still some parts to be understood
• Disk servers
  • No setup procedure existed; one was created from direct inspection of CERN diskservers
  • The wiki page now includes a procedure for them
• User stage:st
  • Not properly configured before RPM installation
  • CNAF uses LDAP to handle users
  • This led to misbehaviour of RMMaster and stagerJob
  • Fixed by hand; the RPMs were fine
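Because the RPMs assume the stage:st account already exists, it may help to verify it on each node before installing. The following sketch uses Python's standard `pwd` and `grp` modules, which go through the system's name service and therefore also see LDAP-managed accounts when the node is configured for them. The function name `account_status` is hypothetical.

```python
import grp
import pwd


def account_status(user: str, group: str) -> dict:
    """Check that `user` and `group` exist and that the user belongs to the group."""
    try:
        pw = pwd.getpwnam(user)
    except KeyError:
        pw = None
    try:
        gr = grp.getgrnam(group)
    except KeyError:
        gr = None

    # Membership holds if the group is the user's primary group
    # or the user is listed as a supplementary member.
    is_primary = pw is not None and gr is not None and pw.pw_gid == gr.gr_gid
    is_member = gr is not None and user in gr.gr_mem
    return {
        "user_exists": pw is not None,
        "group_exists": gr is not None,
        "user_in_group": is_primary or is_member,
    }
```

On a node meant to run Castor2 daemons, `account_status("stage", "st")` should report `True` for all three fields before the RPMs are installed.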
Main issues 3/3
• Migration
  • Didn’t work because the authorizations for rtcpclientd had not been granted in Cupvdaemon
  • Cf. wiki page
  • Lack of time in December left this part unconfigured
• DLF database
  • Because of the large amount of logging needed to figure out early problems, the DLF db table space filled up quickly
  • Issue at CERN too, will be solved soon by Oracle partitioning
  • It remains an issue for the MySQL implementation
• Scheduler/RMMaster
  • FQNs vs. simple names of diskservers: the rmnode gives the FQN to the rmmaster to describe the filesystem resource, while the scheduler knows only the simple name
  • This leads to jobs not being scheduled for lack of resources
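The FQN versus simple name mismatch is easy to reproduce: an exact string comparison between the two forms never matches, so the resource looks absent. The sketch below illustrates the effect and one way to compare names after normalization. It is not how Castor2 or LSF resolve the issue; the helper names and the `example.org` domain are made up for illustration, and the normalization is meant for hostnames only, not IP addresses.

```python
from typing import Iterable, List


def short_name(host: str) -> str:
    """Reduce a hostname or FQN to its lower-case first label."""
    return host.strip().rstrip(".").split(".", 1)[0].lower()


def unmatched_nodes(reported: Iterable[str], known: Iterable[str]) -> List[str]:
    """Return the reported node names that match no known name after normalization."""
    known_short = {short_name(h) for h in known}
    return [h for h in reported if short_name(h) not in known_short]


# What the rmnode reports (FQN) versus what the scheduler knows (simple name).
reported_by_rmnode = ["diskserv-san-13.example.org"]
known_to_scheduler = ["diskserv-san-13"]

# Exact comparison fails, which mirrors the "lack of resources" symptom.
exact_match = reported_by_rmnode[0] in known_to_scheduler

# After normalization, every reported node is accounted for.
leftover = unmatched_nodes(reported_by_rmnode, known_to_scheduler)
```

Here `exact_match` is `False` while `leftover` is empty, showing that the two services refer to the same diskserver under different names.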
Enhancements since Dec
• Fixed several bugs in the Castor2 codebase, and improved automatic configuration with Quattor
  • NCM component for castor.conf and LSF configuration
• One more deployment made at CERN for the SC3 rerun by the operation team: only 1 more day of work needed
• A new production instance for Alice is currently being deployed
• Updated installation procedure in the wiki pages
  • Some “hacks” still have to be fixed/implemented in the RPMs
• Installation at RAL (last week + yesterday)
Conclusions
• First Castor2 instance outside CERN almost running
  • So far we have not been able to achieve the full transfer client -> diskserver -> tape; still working on it
• New deployments expected to be (much) smoother
  • Use RPMs version 2.0.2-9 or newer
  • This exercise has been (and will be) done at CERN for the planned production instances of Castor2
• Plans and future directions
  • Further improvements to the procedure in the short term
  • A plan is needed for upgrades whenever new releases are made available (see plans tomorrow)
  • Quattor-based installations have some advantage here
• Questions & Discussions