GridPP 0x15th (21st) collaboration meeting, Swansea, 3-4 Sep 0x07D8 (2008). Jens Jensen: CASTOR at RAL
RAL CASTOR
2.1.7-15: allows disk pools shared between service classes (a CMS request, with GridFTP2)
2.1.7-16-1: fixes SEGV on a non-existent service class; prepareToGet bugfix for xrootd; scheduling and GC bugfixes
CERN is on -16 (or -14); the trailing -1 contains the mighunter DB procedure hotfix
The Bug
Caused a lot of downtime: first Atlas, then LHCb, then CMS
id2type.id suddenly "6.022×10²³": the sequence number comes back as a real rather than an int
Related to the stager "bulk" code (but not consistently; also seen with bulk==1)
Related to RAC? Related to CASTOR instances sharing RAC clusters?
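To illustrate the symptom (not the diagnostic procedure actually used at RAL), a check along the following lines could flag id2type rows whose id is no longer a sane integer sequence value. The table name id2type comes from the slide; the connection details and the size threshold are placeholders invented for this sketch.

# Sketch only: flag id2type rows whose id looks like a corrupted sequence value.
# Connection details and the sanity threshold are placeholders, not RAL's real setup.
import cx_Oracle

conn = cx_Oracle.connect("stager", "password", "castor-db-example")  # hypothetical credentials/DSN
cur = conn.cursor()

# An id returned as a real (e.g. 6.022e23) will either carry a fractional part
# or be absurdly larger than anything the sequence could plausibly have reached.
cur.execute("""
    SELECT id, type FROM id2type
    WHERE id <> TRUNC(id) OR id > 1e15
""")
for row_id, row_type in cur:
    print("suspicious id2type row: id=%s type=%s" % (row_id, row_type))

cur.close()
conn.close()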
The Bug
Seen in all RAC instances, but not on certification
Shotgun workaround: restart the RH (request handler) automatically when the error appears; the same approach applied to mighunter
A database parameter was added; it needs a database restart (completed Wednesday for the remaining DB, CMS)
Seems to have fixed the problem? Being watched. Also used at CERN, apparently with no side effects
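A minimal sketch of what such a "shotgun" watchdog could look like: tail the daemon's log and restart the service whenever a known error signature appears. The log path, error pattern and restart command below are placeholders, not the actual RAL scripts.

# Sketch of a "shotgun" watchdog: restart a daemon whenever a known error shows up.
# LOG_FILE, ERROR_PATTERN and RESTART_CMD are placeholders, not RAL's actual setup.
import subprocess
import time

LOG_FILE = "/var/log/castor/rhd.log"          # hypothetical request-handler log
ERROR_PATTERN = "ORA-01722"                   # hypothetical error signature
RESTART_CMD = ["service", "rhd", "restart"]   # hypothetical restart command

def follow(path):
    # Yield new lines appended to the file, like `tail -f`.
    with open(path) as f:
        f.seek(0, 2)  # start at the end of the file
        while True:
            line = f.readline()
            if not line:
                time.sleep(1)
                continue
            yield line

for line in follow(LOG_FILE):
    if ERROR_PATTERN in line:
        print("error seen, restarting request handler")
        subprocess.call(RESTART_CMD)
        time.sleep(60)  # back off so one incident triggers only one restart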
The Bug
As of 11:25 today, Sebastien reports a possible memory corruption in the variable passing the length to Oracle
Affecting ASGC; thought to be unrelated to RAL's problem. However, it is still in the bulk code?
Fixed in 2.1.8, backported to 2.1.7, in the next release
LHCb down...
Yesterday morning: LSF logs filling up, then not rotating
Workaround first, then an LSF fix
Not affecting CMS and Atlas
Repack
It was now working... sort of
Occasionally files get stuck in stage-in and have to be unstuck "manually"
High stager load with many tapes (as seen at INFN)
Stuck again as of a few minutes ago...
The repack instance shares a RAC with CMS
Data transfer and access
xrootd doesn't work at RAL; not clear why not
GridFTP v2: forks resources on demand
Dark Data
Dark storage: storage which is there but cannot be reached
Now also published via the BDII (not in production yet, though)
Published as "Reserved" (but also for non-space-token space); not yet fully WLCG compliant, but WLCG may change
Need nearline space published (we could do that in the past)
Dark data: orphaned data
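As a rough illustration of how the published numbers could be inspected, the sketch below queries a site BDII over LDAP for the storage-area sizes. The BDII endpoint is a made-up placeholder, and the GLUE 1.3 attribute names are given as commonly published; treat both as assumptions rather than the exact setup described here.

# Sketch: list the online/nearline sizes a BDII publishes for the SE's storage areas.
# The endpoint is hypothetical; attribute names follow the GLUE 1.3 schema as usually published.
import subprocess

BDII = "site-bdii.example.ac.uk:2170"   # hypothetical site BDII endpoint
cmd = [
    "ldapsearch", "-x", "-LLL",
    "-H", "ldap://%s" % BDII,
    "-b", "o=grid",
    "(objectClass=GlueSA)",
    "GlueSAReservedOnlineSize", "GlueSATotalNearlineSize", "GlueSAUsedNearlineSize",
]
subprocess.call(cmd)  # prints the matching GlueSA entries to stdout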
Releases
Release management: an extremely complex system; CERN do testing, and we do testing too (need to track CERN)
Differences between the labs (also INFN, ASGC)
Good support from CERN, often with RAL-specific patches
CASTOR 2.1.8: expected "mid September"; secure RFIO (here be dragons)
Status
CASTOR instances at RAL: Atlas, CMS, LHCb, gen, preprod, certification
24/7 support via callouts (see Andrew's talk)
Communication is important: the CASTOR-Experiments meeting; the CASTORPP-L (announce) and CASTOR-SUPPORT lists; "CASTOR external" meetings; the CASTOR team's own CASTOR meeting; CASTOR-OP
CASTOR team overview
Bonny: "benevolent coordinator"
Chris: LSF, disk server deployment
Tim: tapes, robot
Shaun: SRM, CASTOR debugging
Cheney: monitoring, Nagios, servers
Guy: servers, setup
Jens: SRM info, occasional debugging/support
And of course the T1 team and the DB team
Final words
"The log files never lie (well, hardly ever)" (Shaun)
"The Grid is an experimental science" (me)
Conclusion
High priority at RAL: CASTOR is obviously critical to UKI; this is understood, and lots of effort goes into it
Extremely complex system: three teams at RAL working together; no single person knows everything
Communication is important; testing is important
Differences between setups/labs