Persistency Framework News for ATLAS. Andrea Valassi (IT-ES), for the Persistency Framework team. ATLAS Database Meeting, 14th March 2012. Outline and summary. COOL validation on Oracle 11g. Recent releases (since Oct 2011 talk at ATLAS sw week)
Persistency Framework News for ATLAS • Andrea Valassi (IT-ES), for the Persistency Framework team • ATLAS Database Meeting, 14th March 2012
Outline and summary • COOL validation on Oracle 11g • Recent releases (since Oct 2011 talk at ATLAS sw week) • LCG 60x (COOL, CORAL, POOL): oldest supported version • LCG 61x (COOL, CORAL, POOL): 2012 production • Maintain binary compatibility to COOL 2.8 and CORAL 2.3 • LCG 62x (COOL, CORAL; no POOL): 2012 development • Eventually break binary compatibility in COOL 2.9 and CORAL 2.4 • WLCG Technical Evolution Groups • DB (CORAL/COOL) and DM (POOL) TEGs • Work in progress
COOL on Oracle 11g servers • Validation of COOL performance has been completed • Problem seen in Oct 2011 (task #23366) is now understood to be caused by a bug in Oracle 11.2.0.2 server (Oracle bug 10405897) • Confirmed by enabling/disabling Oracle patch in a private 11.2.0.2 DB • Introduced in 11.2.0.2 (was not there in 11.2.0.1!) • Earliest COOL tests on 11g had seen no issue because they used 11.1.0.7… • Bad news: even a minor server patch can break performance! • Fixed in 11.2.0.3 – good news: we should no longer worry now • No need to downgrade to 10.2.0.5 optimizer (see my Oct 2011 talk) • Performance on 11.2.0.3 is as good as on 10.2.0.5 • Note that the corresponding best execution plans look different (but most likely the algorithm is exactly the same – only their display differs) • Thanks a lot to IT-DB and ATLAS DBAs for their help! • [Figure panels: 11.2.0.2, 11.2.0.3]
COOL performance tests on Oracle • Test script was improved and procedure was documented • See https://twiki.cern.ch/twiki/bin/view/Persistency/CoolPerformanceTests • A detailed performance report can be created with one command • Covering 9 common use cases for querying COOL data • Showing queries, hints, performance plots and execution plans • e.g. https://twiki.cern.ch/twiki/pub/Persistency/CoolPerformanceTests/ALL-11.2.0.3-full.pdf • [Figure panels: 11.2.0.3, 11.2.0.2]
10g vs. 11g execution plans • With hints: only one (good) plan on 10.2.0.5 and 11.2.0.3 • Plan looks different but is probably exactly the same algorithm • Using INDEX RANGE SCAN (MIN/MAX) • With hints: 3 (bad) plans on 11.2.0.2 • Depending on statistics and bind variables (no exec plan stability!) • All of them are bad and involve INDEX FULL SCAN • [Figure panels: 11.2.0.3 (good), 10.2.0.5 (good), 11.2.0.2 (bad) – 1st of 3 plans]
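The rule of thumb on this slide (good plans use INDEX RANGE SCAN (MIN/MAX), bad plans involve INDEX FULL SCAN) can be turned into a quick check over plan text. The following is only a minimal sketch, not part of the COOL test suite: the function name `classify_plan` and the plan-line layout in the usage note are assumptions, and it deliberately applies the slide's two markers without distinguishing any other Oracle operation variants.

```python
# Hypothetical helper: applies the slide's rule of thumb to a plan dump.
# The two operation names come from the slide; everything else is assumed.
GOOD_OP = "INDEX RANGE SCAN (MIN/MAX)"
BAD_OP = "INDEX FULL SCAN"


def classify_plan(plan_text: str) -> str:
    """Return 'bad', 'good' or 'unknown' for an execution plan given as text."""
    # Normalise case and collapse runs of whitespace so that column
    # padding in the plan table does not hide an operation name.
    text = " ".join(plan_text.upper().split())
    if BAD_OP in text:
        return "bad"       # any full index scan marks the plan as bad
    if GOOD_OP in text:
        return "good"
    return "unknown"       # neither marker found: inspect by hand
```

For example, `classify_plan("| 3 | INDEX RANGE SCAN (MIN/MAX) | SOME_INDEX |")` is classified as good, where `SOME_INDEX` and the column layout are made up for illustration. Running such a check over repeated executions with different bind variables would be one way to spot the lack of plan stability seen on 11.2.0.2.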
Other issues on 11g servers • New discoveries and switches/defaults in CORAL & COOL • Disabled SQL plan baselines by default in CORAL • Disabled adaptive cursor sharing in COOL queries • COOL uses hints to stabilize execution plans, whereas the two features above were found to lead to very confusing results… • One problem specific to nightly tests has also been fixed • ORA-01466 errors when querying (in RO transactions) test tables that have just been created (bug #87935) • Also present in 9i and 10g servers, more frequent in 11g servers • Workaround: sleep 1s if DDL on the test tables has just happened
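The ORA-01466 workaround amounts to keeping a minimum gap between DDL on a test table and the first read-only query on it. A minimal sketch of that idea follows; it is not the actual COOL test code. The class `DdlGuard` and its methods are hypothetical, and as a small variation on the slide it sleeps only for the remainder of the 1 s window rather than a full second.

```python
import time


class DdlGuard:
    """Hypothetical helper: keep a minimum gap between DDL and later reads."""

    def __init__(self, min_gap_s=1.0, clock=time.monotonic, sleep=time.sleep):
        self._min_gap_s = min_gap_s
        self._clock = clock      # injectable so the logic can be tested
        self._sleep = sleep
        self._last_ddl = None    # time of the most recent DDL, if any

    def note_ddl(self):
        """Call right after creating or altering a test table."""
        self._last_ddl = self._clock()

    def wait_before_read(self):
        """Sleep if DDL happened too recently; return the seconds slept."""
        if self._last_ddl is None:
            return 0.0
        remaining = self._min_gap_s - (self._clock() - self._last_ddl)
        if remaining > 0:
            self._sleep(remaining)
            return remaining
        return 0.0
```

A test would call `note_ddl()` after each table creation and `wait_before_read()` before opening the read-only transaction.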
Oracle client – “11.2.0.1.0p3” • Many bugs relevant to CORAL are fixed in 11.2.0.1.0p3 • Production version for ATLAS and LHCb since June 2011 • See /afs/cern.ch/sw/lcg/external/oracle/11.2.0.1.0p3/doc/README_11.2.0.1.0p3.txt • SELinux issues on SLC5 (bug #45238) • 11.2.0.1 libraries rebuilt with Oracle patches (completed in 11.2.0.1.0p2) • Crash on AMD Opteron quadcore (bug #62194) • 11.2.0.1 libraries rebuilt with Oracle patch (as of 11.2.0.1.0p2) • Remove ~/oradiag directory dump (bug #58917) and workaround for bug in Oracle libraries redefining Kerberos5 symbols (bug #76988) • Custom sqlnet.ora configuration file (completed in 11.2.0.1.0p3) • A newer client 11.2.0.3 is available but I have not tested it yet • IIRC I had tested 11.2.0.2 but the SELinux and AMD fixes were missing • Upgrading to the 11.2.0.3 client is on my (low priority) to-do list • If I discover that the patches are not there, I would stick to 11.2.0.1.0p3 rather than reapplying the patches on top of 11.2.0.3.0 (to be discussed)
LCG 61b for LHCb (Oct 2011) • Motivation: urgent Xrootd bug fix in ROOT (5.30.04) • Not sure if this was used by ATLAS (which requested 61c later on) • POOL 2.9.18 • Minor fixes in PersistencySvc • CORAL 2.3.19 • Minor fixes to help in the analysis of Oracle 11g server performance • Environment variable CORAL_ORA_OPTIMIZER_FEATURES • COOL 2.8.11a • Rebuild of previous COOL 2.8.11 (for ATLAS in LCG 61a) • For full details see the release notes on TWiki
LCG 60e for ATLAS (Nov 2011) • Motivation: urgent CINT bug fix in ROOT (5.28.00h) • Essentially no other changes • POOL 2.9.16a • Rebuild of previous POOL 2.9.16 (for ATLAS in LCG 60d) • CORAL 2.3.17a • Rebuild of previous CORAL 2.3.17 (for ATLAS in LCG 60d) • COOL 2.8.10c • Rebuild of previous COOL 2.8.10b (for ATLAS in LCG 60d) • For full details see the release notes on TWiki
LCG 62 for ATLAS (Dec 2011) • Motivation: major upgrade in ROOT (5.32) and Boost (1.48) • This is the first release without POOL! • First release on gcc46 (on SLC5 – prepare for the move to SLC6) • Complete 11g move (and other changes) in CORAL and COOL • CORAL 2.3.20 • Useful changes to improve analysis on 11g servers • Disable SQL plan baselines by default (unless CORAL_ORA_USE_SQL_PLAN_BASELINES is set) • Allow selective control over optimizer features and bug fixes by the CORAL_ORA_FIX_CONTROL environment variable • Fixes in the simple expression parser • COOL 2.8.12 • Useful changes to improve analysis on 11g servers • Environment variable COOL_ORA_OPTIMIZER_FEATURES • Disable adaptive cursor sharing (add the NO_BIND_AWARE hint) • For full details see the release notes on TWiki
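Since these 11g switches are driven by environment variables, a comparison run can be set up by preparing a modified environment for a child process. The sketch below is illustrative only: the variable names are the ones listed in these slides, but the helper `tuned_environment` is hypothetical, and the value formats the variables accept are not given here, so any value passed in (such as a version string) is an assumption to be checked against the release notes.

```python
import os

# Variable names as listed in the slides; accepted values are NOT
# documented here, so callers must check the release notes.
KNOWN_SWITCHES = (
    "CORAL_ORA_OPTIMIZER_FEATURES",
    "CORAL_ORA_USE_SQL_PLAN_BASELINES",
    "CORAL_ORA_FIX_CONTROL",
    "COOL_ORA_OPTIMIZER_FEATURES",
)


def tuned_environment(overrides, base=None):
    """Return a copy of the environment with the given switches set.

    Hypothetical helper: rejects names that are not in KNOWN_SWITCHES
    so that a typo does not silently produce an untuned run.
    """
    env = dict(os.environ if base is None else base)
    for name, value in overrides.items():
        if name not in KNOWN_SWITCHES:
            raise KeyError(f"unknown switch: {name}")
        env[name] = value
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching a test, leaving the parent process environment untouched.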
LCG 61c for ATLAS (Dec 2011) • Motivation: urgent CINT bug fix in ROOT (5.30.05) • Complete port to 11g in CORAL and COOL (rebuild LCG62 tags) • POOL 2.9.19 • Fixes and improvements in collection packages • CORAL 2.3.20a • Rebuild of previous CORAL 2.3.20 (for ATLAS in LCG 62) • COOL 2.8.12a • Rebuild of previous COOL 2.8.12 (for ATLAS in LCG 62) • For full details see the release notes on TWiki
LCG 62a for ATLAS (Dec 2011) • Motivation: fix LCG62 installation procedure • Also move to frontier client 2.8.5 (with several bug fixes) • CORAL 2.3.20 • No rebuild, move last good installation (for ATLAS in LCG 62) • COOL 2.8.12 • No rebuild, move last good installation (for ATLAS in LCG 62) • For full details see the release notes on TWiki
CORAL 2.3.21 for CMS (Feb 2012) • Upgrade from CORAL 2.3.12 • Pick up many changes prepared over roughly the last year • CORAL 2.3.21 • Fix for ORA-25408 during a transaction rollback (bug #87164) • More fixes in the simple expression parser (bug #91075) • Please remember: the usage of lowercase names and of reserved Oracle words (e.g. SELECT) as column names is strongly discouraged! • Fix memory leaks in OracleAccess • For full details see the release notes on TWiki
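The column-naming advice above is easy to check before a schema is created. The sketch below is a hypothetical lint helper, not part of CORAL: the function name `column_name_issues` is made up, and `RESERVED_SUBSET` is only a small illustrative subset of Oracle reserved words, not the full list.

```python
# Illustrative subset only; the full Oracle reserved-word list is longer.
RESERVED_SUBSET = frozenset({
    "SELECT", "FROM", "WHERE", "TABLE", "ORDER", "GROUP",
    "DATE", "NUMBER", "LEVEL", "SIZE", "COMMENT", "USER",
})


def column_name_issues(name: str) -> list:
    """Return the reasons why a column name is discouraged (empty if none)."""
    issues = []
    if any(ch.islower() for ch in name):
        issues.append("lowercase characters")
    # Compare case-insensitively: 'select' is as problematic as 'SELECT'.
    if name.upper() in RESERVED_SUBSET:
        issues.append("reserved word")
    return issues
```

For instance, `column_name_issues("select")` reports both problems, while an uppercase, non-reserved name yields an empty list.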
WLCG TEGs • Data Management TEG – POOL support • LHCb (like CMS previously) is essentially no longer using POOL • Replaced by direct ROOT; only Gaudi (not for long) still needs POOL • ATLAS will no longer need POOL support as of LCG62, where a custom package derived from it is built and maintained by ATLAS • Database TEG – CORAL and COOL usage and support • Review of conditions data handling in ATLAS, CMS and LHCb • Survey of COOL usage in ATLAS will be useful (AndreaF’s talk) • Experiment requests for COOL, CORAL and CoralServer support • More details in Dario’s talk
Other issues and work in progress • More work on CORAL connection management • Work on network glitch handling should converge quite soon • Better understanding of CORAL interaction with Oracle TAF (Transparent Application Failover) – e.g. ORA-25408 errors • Port to gcc46 and clang30 on SLC6 is now complete • Will eventually be included in the releases • Cleaning up CORAL and (mainly) COOL API extensions • Will be in LCG62x – or more likely in LCG63 with ROOT 5.34 • Not for ATLAS production usage in 2012 • Example: COOL “vector payload”
Conclusions • COOL migration to 11g servers is now complete • Performance affected by bug in 11.2.0.2, fixed in 11.2.0.3 • Detailed performance reports can now be easily produced • Several COOL and CORAL releases • LCG61 will be the ATLAS production version in 2012 • LCG62 will be the ATLAS development version in 2012 • With API extensions in CORAL & COOL; and without POOL • WLCG Technical Evolution Groups • DM: agreement to move POOL to ATLAS as of LCG62 • DB: review of usage and support model for CORAL & COOL