Persistency Framework News for ATLAS. Andrea Valassi (IT-ES) For the Persistency Framework team ATLAS Database Meeting, 19th April 2010. Outline. Recent developments and releases POOL, CORAL, COOL Oracle client Open issues and priorities
Persistency Framework News for ATLAS
Andrea Valassi (IT-ES)
For the Persistency Framework team
ATLAS Database Meeting, 19th April 2010
Outline
• Recent developments and releases
  • POOL, CORAL, COOL
  • Oracle client
• Open issues and priorities
  • CORAL reconnections to database after a network glitch
Latest release 56f (April 2010)
• POOL 2.9.8 (main motivation for a new ATLAS release)
  • Fix for CollSplitByGuid with non-primary token (bug #63860)
  • Improved handling of RelationalCollection metadata
• CORAL 2.3.8 (only listing the main ATLAS-related fixes)
  • Fix TAG problem at DESY Oracle 11g server (bug #63994)
  • Fix setMemoryCacheSize in OracleAccess queries (bug #64215)
  • Fix (CMS) for DDL/DML executed in R/O transaction (bug #17873)
• COOL 2.8.6
  • Fix the CLOB optimization patch: “ORA-00972 identifier is too long” (bug #64710), memory leak (bug #64835)
  • Fix handling of null string fields: "f1==f2 and f2==f1 return different results" (bug #62634), setNull for wrapped Attribute (bug #65100)
• For full details see the release notes on Drupal
• Also: port to osx10.6/64bit, gcc4.5, icc, llvm, upgrade to Xerces 3.1
Oracle client – summary
• ATLAS downgraded from 11g back to 10g in 56e (Feb 2010)
  • 11g client bug #62194 on AMD Opteron quadcore at many sites
  • ATLAS is presently using 10.2.0.4p1 with light instant client
  • See also the minutes of the February 11 T1SCM
• 10g client is OK for now, but 11g is desirable eventually
  • Light instant client (smaller memory) available on both 10g/11g
  • SELinux bug #45238 is mostly fixed in 11g, workaround for 10g
  • OCIEnvCreate bug #31554 fixed in 11g, backport in 10.2.0.4p1
  • LOB MT hang bug #47435 fixed in 11g, workaround in CORAL
  • LOB MT crash bug #49662 solved only by workaround in CORAL
  • OCIStmtRelease bug #61090 only in 11g, workaround in CORAL
• Apr 2010: installed 11.2.0.1.0p1 with three new patches
  • Fix for AMD Opteron: seems OK, waiting for ATLAS validation
  • SELinux fixes for OCI 64bit (and OCCI 32bit)
Oracle client – details (1): AMD
• Bug in 11g client on AMD Opteron quadcore
  • Symptom: sqlplus/CORAL crash in SHATransformI32_3()
  • Bug in Oracle library libnnz11.so
  • Blocker for Tier1 reconstruction at NDGF (bug #62194)
  • Also observed in Ljubljana, Chicago…
  • Triggered the downgrade back to 10g for ATLAS in 56e
• New patch was received in April
  • Installed on AFS as ‘11.2.0.1.0p1’
  • Basic validation by PF team in Ljubljana: seems OK at last!
  • Thanks to Andrej Filipcic for the test account!
  • Waiting for more complete validation by ATLAS sites
  • If this is OK, I would suggest upgrading back to 11g eventually…
Oracle client – details (2): SELinux
• Bug in 10g and 11g client on SLC5 if SELinux is enabled
  • Symptom: ‘cannot restore segment prot after reloc’ (bug #45238)
  • Bug in Oracle library libclntsh.so (OCI)
  • Same bug in libocci.so (OCCI) – used by CMS, ROOT (not CORAL)
  • Causes may include a missing ‘-fPIC’ during the build
  • Bug open with Oracle Support since March 2009…
  • Seems to affect only local file systems (not AFS or NFS)
• Several workarounds available:
  • On individual files: ‘chcon -t textrel_shlib_t libclntsh.so’
  • System-wide as root: ‘setsebool allow_execmod=on’
• Four patches are necessary
  • OCI 32bit: included in first 11g release 11.2.0.1.0
  • OCI 64bit: received April 2010 – included in ‘11.2.0.1.0p1’
  • OCCI 32bit: received March 2010 – included in ‘11.2.0.1.0p1’
  • OCCI 64bit: not yet received – but not used in CORAL anyway
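The two SELinux workarounds quoted on this slide could be scripted for a site installation. The following is a minimal sketch only: the helper names (`per_file_workaround`, `system_wide_workaround`, `run_commands`) are hypothetical, the command lines are copied from the slide, and the sketch prints the commands by default instead of executing them, since both need suitable privileges.

```python
import shlex
import subprocess
from typing import List, Sequence


def per_file_workaround(library_paths: Sequence[str]) -> List[List[str]]:
    """Build one 'chcon -t textrel_shlib_t <file>' command per library."""
    return [["chcon", "-t", "textrel_shlib_t", path] for path in library_paths]


def system_wide_workaround() -> List[str]:
    """Build the system-wide command from the slide (to be run as root)."""
    return ["setsebool", "allow_execmod=on"]


def run_commands(commands: Sequence[Sequence[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(" ".join(shlex.quote(arg) for arg in argv))
        if not dry_run:
            subprocess.run(list(argv), check=True)


if __name__ == "__main__":
    # Hypothetical file names: adjust to the local Oracle client install.
    run_commands(per_file_workaround(["libclntsh.so", "libocci.so"]))
    run_commands([system_wide_workaround()])
```

The per-file form is the narrower of the two, since it relabels only the affected Oracle libraries rather than relaxing the policy for the whole system.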
Open issues
• Immediate priority
  • Performance fixes in LFC replica service (task #9774)
    • Patch prepared/validated last week, new release for LHCb this week?
  • Non-serializable R/O transactions in reconnections (bug #65597)
    • CORAL server incident in ATLAS online on April 11th
  • ORA-24327 “need explicit attach” while reconnecting (bug #24327)
    • Reported several times by both CMS and ATLAS
• Other items on the list
  • Inconsistent libexpat.so in gfal at SARA (Frontier bug #65398)
  • New functionalities
    • Partitioning (CMS, ATLAS), sequences, new COOL schema
    • API extensions break binary compatibility online/offline
  • PVSS2COOL review
  • CORAL server extensions (monitoring, security, R/W access…)
CORAL server incident
• ATONR intervention (Oracle Streams bug) on April 11th
  • CORAL server got stuck and had to be restarted
  • Connections from CORAL server to ATONR were re-established automatically, but transactions were left in a mixed state…
• General issue: transactions in DB reconnections
  • Fixed for RW transactions (bug #57639)
    • Might be the cause of COOL data corruption (bug #57487)
  • Fixed for serializable RO transactions (bug #57117)
    • Queries do not find data added after start of RO transaction
  • Not fixed for non-serializable RO transactions (bug #65597)
    • Only used in the CORAL server so far (not exposed in the API)
    • Queries find data added after start of RO transaction
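The difference between the two kinds of read-only transaction on this slide can be illustrated with a toy model. This is a sketch under stated assumptions, not CORAL or Oracle code: `ToyDatabase` and `ReadOnlyTransaction` are hypothetical names, and the "database" is just an in-memory list in which every row remembers the commit counter at which it was added.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyDatabase:
    """Append-only store; each row is tagged with the commit that added it."""
    commit_counter: int = 0
    rows: List[Tuple[int, str]] = field(default_factory=list)

    def insert(self, value: str) -> None:
        """Commit one new row."""
        self.commit_counter += 1
        self.rows.append((self.commit_counter, value))


class ReadOnlyTransaction:
    """Toy RO transaction; remembers the commit counter at its start."""

    def __init__(self, db: ToyDatabase, serializable: bool) -> None:
        self.db = db
        self.serializable = serializable
        self.snapshot = db.commit_counter

    def query(self) -> List[str]:
        if self.serializable:
            # Serializable: only rows committed before the transaction started.
            return [v for commit, v in self.db.rows if commit <= self.snapshot]
        # Non-serializable: every row committed so far is visible.
        return [v for _, v in self.db.rows]


db = ToyDatabase()
db.insert("row-1")
serializable_ro = ReadOnlyTransaction(db, serializable=True)
non_serializable_ro = ReadOnlyTransaction(db, serializable=False)
db.insert("row-2")  # added after both RO transactions started

print(serializable_ro.query())      # ['row-1']
print(non_serializable_ro.query())  # ['row-1', 'row-2']
```

In this model the serializable transaction carries state (its snapshot point) that any re-established connection would have to honour, whereas the non-serializable one simply reads whatever is currently committed.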
ATLAS online-offline issues
• Binary compatibility for TDAQ software
  • TDAQ runs both online and offline software
  • Using different LCGCMT versions
  • Requires CORAL and COOL APIs to be frozen
• API extensions: a chicken-and-egg problem?
  • Will online and offline upgrade LCGCMT at the same time?
  • No urgency, but worth planning…
  • e.g. partitioning, sequences, new COOL schemas…
  • Even bug fixes (may) require changes in the API
MySQL support in CORAL?
• No production users of the MySQLAccess plugin at LHC
  • Neither in ATLAS nor in CMS or LHCb
  • Maintenance has a cost (e.g. also operate the server for tests)
• General feeling that it would be a pity to drop it completely
  • MySQLAccess was very useful in the ATLAS online system
  • Support for several backends is one of the strengths of CORAL
  • Interest from communities outside LHC too (e.g. Fermilab)
• For the moment: will continue to build it, at lowest priority
  • It is very useful to know for sure that it is not used in production
  • Thanks for your feedback!
Conclusions
• Most of the effort on maintenance and support
  • CORAL, COOL, POOL, Oracle client
• New developments in parallel
  • CORAL server monitoring
• Priorities for the immediate future
  • Transactions in CORAL server in database reconnections
  • Other issues in database reconnections (e.g. ORA-24327)