
Persistency Framework News for ATLAS

Andrea Valassi (IT-ES), for the Persistency Framework team. ATLAS Database Meeting, 19th April 2010. Outline: recent developments and releases (POOL, CORAL, COOL, Oracle client); open issues and priorities.


Presentation Transcript


  1. Persistency Framework News for ATLAS
     Andrea Valassi (IT-ES), for the Persistency Framework team
     ATLAS Database Meeting, 19th April 2010

  2. Outline
     • Recent developments and releases
       • POOL, CORAL, COOL
       • Oracle client
     • Open issues and priorities
       • CORAL reconnections to database after a network glitch

  3. Latest release 56f (April 2010)
     • POOL 2.9.8 (main motivation for a new ATLAS release)
       • Fix for CollSplitByGuid with non-primary token (bug #63860)
       • Improved handling of RelationalCollection metadata
     • CORAL 2.3.8 (only listing the main ATLAS-related fixes)
       • Fix TAG problem at DESY Oracle 11g server (bug #63994)
       • Fix setMemoryCacheSize in OracleAccess queries (bug #64215)
       • Fix (CMS) for DDL/DML executed in R/O transaction (bug #17873)
     • COOL 2.8.6
       • Fix the CLOB optimization patch: “ORA-00972 identifier is too long” (bug #64710), memory leak (bug #64835)
       • Fix handling of null string fields: "f1==f2 and f2==f1 return different results" (bug #62634), setNull for wrapped Attribute (bug #65100)
     • For full details see the release notes on Drupal
     • Also: port to osx10.6/64bit, gcc4.5, icc, llvm, upgrade to Xerces 3.1

  4. Oracle client – summary
     • ATLAS downgraded back from 11g to 10g in 56e (Feb 2010)
       • 11g client bug #62194 on AMD Opteron quadcore at many sites
       • ATLAS is presently using 10.2.0.4p1 with light instant client
       • See also the minutes of the February 11 T1SCM
     • 10g client is OK for now, but 11g is desirable eventually
       • Light instant client (smaller memory) available on both 10g/11g
       • SELinux bug #45238 is mostly fixed in 11g, workaround for 10g
       • OCIEnvCreate bug #31554 fixed in 11g, backport in 10.2.0.4p1
       • LOB MT hang bug #47435 fixed in 11g, workaround in CORAL
       • LOB MT crash bug #49662 solved only by workaround in CORAL
       • OCIStmtRelease bug #61090 only in 11g, workaround in CORAL
     • Apr 2010: installed 11.2.0.1.0p1 with three new patches
       • Fix for AMD Opteron: seems OK, waiting for ATLAS validation
       • SELinux fixes for OCI 64bit (and OCCI 32bit)

  5. Oracle client – details (1): AMD
     • Bug in 11g client on AMD Opteron quadcore
       • Symptom: sqlplus/CORAL crash in SHATransformI32_3()
       • Bug in Oracle library libnnz11.so
       • Blocker for Tier1 reconstruction at NDGF (bug #62194)
       • Also observed in Ljubljana, Chicago…
       • Triggered the downgrade back to 10g for ATLAS in 56e
     • New patch was received in April
       • Installed on AFS as ‘11.2.0.1.0p1’
       • Basic validation by PF team in Ljubljana: seems OK at last!
       • Thanks to Andrej Filipcic for the test account!
       • Waiting for more complete validation by ATLAS sites
     • If this is OK, I would suggest upgrading back to 11g eventually…

  6. Oracle client – details (2): SELinux
     • Bug in 10g and 11g client on SLC5 if SELinux is enabled
       • Symptom: ‘cannot restore segment prot after reloc’ (bug #45238)
       • Bug in Oracle library libclntsh.so (OCI)
       • Same bug in libocci.so (OCCI), used by CMS and ROOT (not CORAL)
       • Causes may include a missing ‘-fPIC’ during the build
       • Bug open with Oracle Support since March 2009…
       • Seems to affect only local file systems (not AFS or NFS)
     • Several workarounds available:
       • On individual files: ‘chcon -t textrel_shlib_t libclntsh.so’
       • System-wide as root: ‘setsebool allow_execmod=on’
     • Four patches are necessary
       • OCI 32bit: included in first 11g release 11.2.0.1.0
       • OCI 64bit: received April 2010, included in ‘11.2.0.1.0p1’
       • OCCI 32bit: received March 2010, included in ‘11.2.0.1.0p1’
       • OCCI 64bit: not yet received, but not used in CORAL anyway
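As an illustration of the two SELinux workarounds quoted on this slide, the following is a minimal sketch that only assembles the command lines and prints them; it does not execute anything. The function name `selinux_workaround_commands` and the library path `/opt/oracle/lib/libclntsh.so` are hypothetical and chosen for the example; only the `chcon` and `setsebool` invocations themselves come from the slide.

```python
import shlex


def selinux_workaround_commands(library_paths=(), system_wide=False):
    """Return the slide's workaround commands as argument lists (nothing is run)."""
    commands = []
    for path in library_paths:
        # Per-file workaround from the slide: relabel one shared library.
        commands.append(["chcon", "-t", "textrel_shlib_t", path])
    if system_wide:
        # System-wide workaround from the slide; it has to be run as root.
        commands.append(["setsebool", "allow_execmod=on"])
    return commands


# Hypothetical install path, used only for illustration.
cmds = selinux_workaround_commands(["/opt/oracle/lib/libclntsh.so"], system_wide=True)
for cmd in cmds:
    print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it straightforward to review them before passing them to a process launcher on a node where SELinux is actually enabled.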

  7. Open issues
     • Immediate priority
       • Performance fixes in LFC replica service (task #9774)
         • Patch prepared/validated last week, new release for LHCb this week?
       • Non-serializable R/O transactions in reconnections (bug #65597)
         • CORAL server incident in ATLAS online on April 11th
       • ORA-24327 “need explicit attach” while reconnecting (bug #24327)
         • Reported several times by both CMS and ATLAS
     • Other items on the list
       • Inconsistent libexpat.so in gfal at SARA (Frontier bug #65398)
       • New functionalities
         • Partitioning (CMS, ATLAS), sequences, new COOL schema
         • API extensions break binary compatibility online/offline
       • PVSS2COOL review
       • CORAL server extensions (monitoring, security, R/W access…)

  8. CORAL server incident
     • ATONR intervention (Oracle Streams bug) on April 11th
       • CORAL server got stuck and had to be restarted
       • Connections from CORAL server to ATONR were re-established automatically, but transactions were left in a mixed state…
     • General issue: transactions in DB reconnections
       • Fixed for RW transactions (bug #57639)
         • Might be the cause of COOL data corruption (bug #57487)
       • Fixed for serializable RO transactions (bug #57117)
         • Queries do not find data added after start of RO transaction
       • Not fixed for non-serializable RO transactions (bug #65597)
         • Only used in the CORAL server so far (not exposed in the API)
         • Queries find data added after start of RO transaction
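The difference between the two kinds of read-only transaction mentioned on this slide can be illustrated with a toy in-memory model. This is only a sketch of the visibility rule stated in the bullets, not CORAL or Oracle code: the classes `ToyTable` and `ReadOnlyTransaction`, and the row values, are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToyTable:
    """In-memory stand-in for a table: each row is tagged with a commit number."""
    rows: list = field(default_factory=list)  # list of (commit_no, value)
    commit_no: int = 0

    def insert(self, value):
        # In this toy model every insert is committed immediately.
        self.commit_no += 1
        self.rows.append((self.commit_no, value))


class ReadOnlyTransaction:
    """Hypothetical read-only transaction over a ToyTable."""

    def __init__(self, table, serializable):
        self.table = table
        self.serializable = serializable
        # Remember the last commit that existed when the transaction started.
        self.snapshot = table.commit_no

    def query(self):
        if self.serializable:
            # Serializable: only rows committed before the start are visible.
            return [v for n, v in self.table.rows if n <= self.snapshot]
        # Non-serializable: every row committed so far is visible.
        return [v for _, v in self.table.rows]


table = ToyTable()
table.insert("iov-1")
serializable_ro = ReadOnlyTransaction(table, serializable=True)
plain_ro = ReadOnlyTransaction(table, serializable=False)
table.insert("iov-2")  # added after both transactions started

print(serializable_ro.query())  # ['iov-1']
print(plain_ro.query())         # ['iov-1', 'iov-2']
```

In this model, the state that has to survive a reconnection differs between the two modes: the serializable transaction carries a snapshot point, while the non-serializable one carries none.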

  9. ATLAS online-offline issues
     • Binary compatibility for TDAQ software
       • TDAQ runs both online and offline software
       • Using different LCGCMT versions
       • Requires CORAL and COOL APIs to be frozen
     • API extensions: a chicken-and-egg problem?
       • Will online and offline upgrade LCGCMT at the same time?
       • No urgency, but worth planning…
       • e.g. partitioning, sequences, new COOL schemas…
       • Even bug fixes (may) require changes in the API

  10. MySQL support in CORAL?
      • No production users of the MySQLAccess plugin at LHC
        • Neither in ATLAS nor in CMS or LHCb
        • Maintenance has a cost (e.g. also operate the server for tests)
      • General feeling that it would be a pity to drop it completely
        • MySQLAccess was very useful in the ATLAS online system
        • Support for several backends is one of the strengths of CORAL
        • Interest from communities outside LHC too (e.g. Fermilab)
      • For the moment: will continue to build it, at lowest priority
        • It is very useful to know for sure that it is not used in production
        • Thanks for your feedback!

  11. Conclusions
      • Most of the effort on maintenance and support
        • CORAL, COOL, POOL, Oracle client
      • New developments in parallel
        • CORAL server monitoring
      • Priorities for the immediate future
        • Transactions in CORAL server in database reconnections
        • Other issues in database reconnections (e.g. ORA-24327)

  12. Reserve slides
