CASTOR 2 Setup at CNAF

Experience with the deployment of the new stager at the CNAF TIER1. Giuseppe Lo Presti, Sebastien Ponce (CERN/IT/FIO & INFN/CNAF). Castor External Operation Meeting, RAL, January 24-25, 2006.

Presentation Transcript


  1. CASTOR 2 Setup at CNAF: Experience with the deployment of the new stager at the CNAF TIER1. Giuseppe Lo Presti, Sebastien Ponce (CERN/IT/FIO & INFN/CNAF). Castor External Operation Meeting, RAL, January 24-25, 2006

  2. Outline
     • General deployment
     • Main issues and fixes
        • stager
        • scheduler
        • disk servers
     • Enhancements
        • Quattor-based components
     • Conclusions and plans

  3. CNAF deployment
     • DNS aliases as at CERN
        • castorrh
        • castorstager
        • castorscheduler
        • castorrtcpcld
        • castordlf
        • castorrmmaster
     • LSF
        • master exports /lsf via NFS
        • rmmaster and diskserver(s) use it as NFS clients
     • DLF
        • both dlfserver and its Oracle database on the same machine
     • Castor1 services
        • shared with the current Castor1 production instance
        • tape part unchanged
     [Deployment diagram, labels only: stagerdb (Oracle/RHE); dlfdb (Oracle/SL); disk cache; RH, stager, MigHunter, rtcpclientd; DLFserver, rmmaster, expertd, LSF, DLF GUI; LSF master; Castor1 services (vdqm, vmgr, ns, cupvd); same tape servers as for Castor1; hosts castor-6, castorlsf01, castor, sc11, oracle01, diskserv-san-13]
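A quick sanity check when bringing up such an instance is that every service alias resolves from the nodes that need it. The Python sketch below only checks that each name resolves, not that it points at the intended host; the `domain` argument and the injectable `resolve` function are illustrative assumptions, not part of CASTOR.

```python
import socket
from typing import Callable, Dict, Iterable, Optional

# Service aliases listed on the slide; each should point at the host
# that runs the corresponding CASTOR2 daemon.
CASTOR2_ALIASES = (
    "castorrh",
    "castorstager",
    "castorscheduler",
    "castorrtcpcld",
    "castordlf",
    "castorrmmaster",
)


def check_aliases(
    aliases: Iterable[str],
    domain: str = "",
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Dict[str, Optional[str]]:
    """Map each alias to the IPv4 address it resolves to, or None.

    `domain` is a hypothetical site domain appended to every alias; leave
    it empty to rely on the resolver's search list instead.
    """
    results: Dict[str, Optional[str]] = {}
    for alias in aliases:
        name = f"{alias}.{domain}" if domain else alias
        try:
            results[alias] = resolve(name)
        except OSError:  # socket.gaierror and socket.herror derive from OSError
            results[alias] = None
    return results
```

On a node, `check_aliases(CASTOR2_ALIASES)` returns each alias with its address; a `None` value marks an alias that still needs a DNS entry.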

  4. Procedure followed
     • Quattor
        • based on profiles used at CERN for the test instance
        • modified to fit CNAF needs (for instance, all environment-related configuration went to /etc/profile)
     • Hand-made hacks
        • mainly to fix wrong scripts
        • manual installation of the DLF GUI
     • Complete deployment done in 2.5 days, joint work with the operations team
        • starting from servers with the RPMs installed
        • LSF roughly configured
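Because the environment configuration lives in /etc/profile, which is typically read by login shells rather than by init scripts, it can be worth checking the Oracle client variables from inside the environment a daemon is actually started with. A minimal Python sketch; the list of required variables is an assumption (`ORACLE_HOME` is the standard Oracle variable, anything else depends on the local installation).

```python
import os
from typing import Iterable, List, Mapping

# Hypothetical list: variables an Oracle client process is assumed to need.
# Adjust to what the local Oracle installation actually requires.
REQUIRED_ORACLE_VARS = ("ORACLE_HOME", "LD_LIBRARY_PATH")


def missing_vars(
    env: Mapping[str, str], required: Iterable[str] = REQUIRED_ORACLE_VARS
) -> List[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in required if not env.get(name)]


def oracle_libs_on_path(env: Mapping[str, str]) -> bool:
    """Check that $ORACLE_HOME/lib is one of the LD_LIBRARY_PATH entries."""
    home = env.get("ORACLE_HOME", "")
    if not home:
        return False
    lib_dir = os.path.normpath(os.path.join(home, "lib"))
    entries = env.get("LD_LIBRARY_PATH", "").split(os.pathsep)
    return any(os.path.normpath(e) == lib_dir for e in entries if e)
```

Calling `missing_vars(os.environ)` from a startup script (rather than from an interactive shell) shows what the daemon will really see.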

  5. Main issues 1/3
     • Startup scripts
        • most of them made wrong assumptions about the Oracle environment
        • modified by hand
        • /etc/sysconfig/rmmaster was set not to use LSF
        • mostly fixed in the latest release, 2.0.2-9
        • some other fixes still to be delivered with the next release
     • DLF GUI
        • installed by hand (actually copied from CERN servers) on castordlf
        • needed Apache + PHP + Oracle interface
        • going to be packaged
     • LSF
        • thanks to the LSF people at CNAF, only a few files had to be modified
        • all of them are on the wiki page Castor2InstanceForExternalInstitutes
        • but the castor plugin didn't log because of wrong permissions on /var/spool/scheduler
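Since the plugin's silent failure came down to directory permissions, a pre-flight check on /var/spool/scheduler can catch it early. A minimal Python sketch that reads only the classic mode bits (ACLs and root's override are ignored); the account the plugin runs under is not named on the slide, so the uid and gids are parameters.

```python
import os
import stat
from typing import Sequence


def writable_by(path: str, uid: int, gids: Sequence[int]) -> bool:
    """Decide from the mode bits whether (uid, gids) may create files in `path`.

    Follows the Unix rule that only one class applies: owner bits if the
    uid owns the directory, else group bits if a gid matches, else other
    bits. Creating a file in a directory needs both write and search (x).
    """
    st = os.stat(path)
    if st.st_uid == uid:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif st.st_gid in gids:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    return (st.st_mode & need) == need
```

For example, with `pw = pwd.getpwnam(name)` for the (site-specific) plugin account, `writable_by("/var/spool/scheduler", pw.pw_uid, [pw.pw_gid])` should be true before LSF is started.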

  6. Main issues 2/3
     • Castor main configuration file castor.conf
        • modified by hand
        • some parts still to be understood
     • Disk servers
        • no procedure to set them up; created from direct inspection of the CERN disk servers
        • the wiki page now includes a procedure for them
     • User stage:st
        • not properly configured before RPM installation
        • CNAF uses LDAP to handle users
        • this led to misbehaviour of RMMaster and stagerJob
        • fixed by hand; the RPMs were fine
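Since stage:st has to be in place before the RPMs are installed, a small check can be run on each node first. This Python sketch uses the standard `pwd` and `grp` modules, which go through the system's name-service switch, so accounts served from LDAP are seen the same way the daemons will see them; what counts as a "problem" here is our own assumption.

```python
import grp
import pwd
from typing import List


def check_account(user: str = "stage", group: str = "st") -> List[str]:
    """Return a list of problems with the user:group pair; empty means OK."""
    problems: List[str] = []
    try:
        pw = pwd.getpwnam(user)
    except KeyError:
        problems.append(f"user {user!r} not found")
        pw = None
    try:
        gr = grp.getgrnam(group)
    except KeyError:
        problems.append(f"group {group!r} not found")
        gr = None
    # The user belongs to the group either as its primary group
    # or as an explicit secondary member.
    if pw is not None and gr is not None:
        if pw.pw_gid != gr.gr_gid and user not in gr.gr_mem:
            problems.append(f"user {user!r} is not a member of group {group!r}")
    return problems
```

An empty result from `check_account()` on every node is a reasonable precondition before installing the RPMs.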

  7. Main issues 3/3
     • Migration
        • didn't work because rtcpclientd lacked the required authorizations in the Cupv daemon
        • cf. wiki page
        • lack of time in December left this part unconfigured
     • DLF database
        • due to the large amount of logging needed to figure out early problems, the DLF db tablespace filled up quickly
        • an issue at CERN too, to be solved soon by Oracle partitioning
        • it remains an issue for the MySQL implementation
     • Scheduler/RMMaster
        • FQDNs vs. simple names of disk servers: the rmnode gives the FQDN to the rmmaster to describe the filesystem resource, while the scheduler knows only the simple name
        • as a result, jobs are not scheduled for lack of resources
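The scheduling failure is a key mismatch: resources are recorded under one form of the host name and looked up under another. A minimal Python sketch of normalizing both sides to short names; choosing short names as the canonical key, and representing the resource as a plain integer (e.g. a count of filesystems), are illustrative assumptions, since the slide does not say how CASTOR itself resolved it.

```python
from typing import Dict


def normalize(host: str) -> str:
    """Lower-case a host name and drop any trailing dot (DNS names are case-insensitive)."""
    return host.strip().rstrip(".").lower()


def short_name(host: str) -> str:
    """'diskserv-san-13.example.org' -> 'diskserv-san-13'."""
    return normalize(host).split(".", 1)[0]


def index_by_short_name(reported: Dict[str, int]) -> Dict[str, int]:
    """Re-key per-host values reported under FQDNs by their short names.

    Raises ValueError if two distinct FQDNs collapse to the same short
    name, since silently merging them would hide a naming conflict.
    """
    result: Dict[str, int] = {}
    seen: Dict[str, str] = {}  # short name -> normalized FQDN it came from
    for host, value in reported.items():
        fqdn = normalize(host)
        key = short_name(fqdn)
        if key in seen and seen[key] != fqdn:
            raise ValueError(f"{fqdn} and {seen[key]} share short name {key!r}")
        seen[key] = fqdn
        result[key] = value
    return result
```

With this in place, a lookup by the simple name `diskserv-san-13` finds the resource that was reported under its FQDN (the domain in such examples is hypothetical).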

  8. Enhancements since December
     • Fixed several bugs in the Castor2 codebase, and improved automatic configuration with Quattor
        • NCM component for castor.conf and the LSF configuration
     • One more deployment made at CERN for the SC3 rerun by the operations team: only 1.0 more day of work needed
     • A new production instance for Alice is currently being deployed
     • Updated installation procedure in the wiki pages
        • some "hacks" still have to be fixed/implemented in the RPMs
     • Installation at RAL (last week + yesterday)
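Generating castor.conf from a structured profile, as the NCM component does, replaces hand edits with a reproducible rendering step. The Python sketch below is not that component; it only illustrates the idea, assuming the file consists of whitespace-separated `CATEGORY KEY VALUE` lines with `#` comments. The category/key/value names in the usage are hypothetical placeholders.

```python
from typing import Dict, Tuple

# {(category, key): value}, e.g. {("STAGER", "HOST"): "castorstager"} (hypothetical entry)
Config = Dict[Tuple[str, str], str]


def render(config: Config) -> str:
    """Render the profile as 'CATEGORY KEY VALUE' lines.

    Sorting makes the output deterministic, so regenerating the file from
    an unchanged profile produces an identical file.
    """
    lines = [f"{cat} {key} {value}" for (cat, key), value in sorted(config.items())]
    return "\n".join(lines) + "\n"


def parse(text: str) -> Config:
    """Inverse of render(); skips blank lines and '#' comments."""
    config: Config = {}
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 2)  # the value may itself contain spaces
        if len(parts) != 3:
            raise ValueError(f"line {lineno}: expected 'CATEGORY KEY VALUE', got {raw!r}")
        cat, key, value = parts
        config[(cat, key)] = value
    return config
```

Comparing `parse()` of a hand-edited file against the profile is one way to find the parts that still differ from the automatically generated configuration.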

  9. Conclusions
     • First Castor2 instance outside CERN almost running
        • actually, so far we have not been able to achieve the full transfer client -> diskserver -> tape; still working on it…
     • New deployments expected to be (much) smoother
        • use RPMs version 2.0.2-9 or newer
        • this exercise has been (and will be) done at CERN for the planned production instances of Castor2
     • Plans and future directions
        • further improvements to the procedure in the short term
        • a plan is needed for upgrades whenever new releases become available (see plans tomorrow)
        • Quattor-based installations have some advantage here
     • Questions & Discussions
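Enforcing "2.0.2-9 or newer" means comparing the version first and the release second, numerically rather than as strings. A minimal Python sketch; it is not a full RPM comparison (no epochs, no alphanumeric segments such as dist tags), so it rejects strings it cannot handle instead of guessing.

```python
import re
from typing import Tuple

MIN_RELEASE = "2.0.2-9"  # minimum recommended on the slide
_NUMERIC = r"\d+(?:\.\d+)*"

Release = Tuple[Tuple[int, ...], Tuple[int, ...]]


def _to_ints(field: str) -> Tuple[int, ...]:
    """'2.0.2' -> (2, 0, 2); empty string -> ()."""
    return tuple(int(part) for part in field.split(".")) if field else ()


def parse_release(text: str) -> Release:
    """Split 'VERSION-RELEASE' into two integer tuples: '2.0.2-9' -> ((2, 0, 2), (9,))."""
    match = re.fullmatch(rf"({_NUMERIC})(?:-({_NUMERIC}))?", text.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    return _to_ints(match.group(1)), _to_ints(match.group(2) or "")


def is_supported(installed: str, minimum: str = MIN_RELEASE) -> bool:
    """Tuple comparison orders by version first, then by release."""
    return parse_release(installed) >= parse_release(minimum)
```

The installed string could come from `rpm -q --queryformat '%{VERSION}-%{RELEASE}\n'` on the relevant package (the package name is site-specific and not given here); `is_supported("2.0.2-10")` is true while `is_supported("2.0.1-12")` is false.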