Physics Services Support — Relational Databases for the LHC Computing Grid: The LCG Distributed Database Deployment (3D) and Conditions Database (COOL) Projects. Andrea Valassi (CERN IT-PSS-DP), NEC2007, Varna, Bulgaria, 14th September 2007
Acknowledgements • Several people have ‘lent’ me their slides or contributed useful suggestions for this talk • Dirk Duellmann and the 3D team • Maria Girone and the CERN Physics DB team • The COOL and CORAL teams • Several users in the experiments • Many thanks to all of them!
Outline • Relational databases for LHC computing • Reliable services at CERN and other LCG sites • The 3D project: distributed database deployment • COOL and other conditions data • COOL development and deployment status • Conclusions
Relational databases for LHC • In LHC computing, relational databases will be crucial for storing the metadata of both physics applications and grid services • Detector conditions (calibration, geometry…) • Experiment data production bookkeeping • Core grid services for cataloguing, monitoring and distributing LHC data (e.g. the LFC file catalog) • Key features of relational database services • High availability, backup and recovery, performance and scalability, security…
The 3D Project • Distributed Database Deployment • LCG initially provided tools only for distributed access to and replication of file-based data • The aim of the 3D project is to provide a similar infrastructure for data stored in RDBMS services • CERN and several other LCG sites already have long experience in running RDBMS services • Goals of the 3D project as part of LCG • Increase database availability and scalability • Allow applications to access databases in a consistent and location-independent way • Provide database replication between sites • Coordinate the setup and deployment of the database and replication infrastructure
3D Service Architecture
Building block – db cluster • CERN db services use Oracle 10g RAC • High availability – redundant storage and network • Scalability – CPUs and storage can be scaled independently • Cost reduction – commodity hardware on Linux • Homogeneous hardware and software setup for all physics DBs • A similar setup is used by most T1 sites as well
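In practice, a RAC-based service means clients connect to a cluster-wide database service name rather than to an individual node. A minimal sketch of such a client connection, assuming hypothetical host, service and account names (the real connection parameters are site- and experiment-specific):

```python
# Minimal sketch of a client connecting to a clustered Oracle service.
# Host, service name and credentials below are hypothetical placeholders.
import cx_Oracle  # Oracle client bindings for Python

# The DSN points at the cluster's service name, not at an individual node;
# the listener distributes sessions across the RAC nodes.
dsn = cx_Oracle.makedsn("lcg-db-cluster.example.cern.ch", 1521,
                        service_name="lhcb_conddb")

conn = cx_Oracle.connect(user="reader", password="secret", dsn=dsn)
cur = conn.cursor()
cur.execute("SELECT sysdate FROM dual")  # trivial query to verify access
print(cur.fetchone())
conn.close()
```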
Physics DB services at CERN • Size of Oracle services for physics • 110 mid-range servers, 110 disk arrays • i.e. 220 CPUs, 440 GB RAM, 300 TB disk space • Several production clusters • One offline RAC per LHC experiment (up to 8 nodes), Atlas online RAC, COMPASS RAC • In addition: development and validation services, following the application release cycle (development service → validation service → production service)
Frontier and CMS • Read-only access to Oracle data via http • Oracle server at T0, Tomcat server at T0, Squid web caches at T0/T1/T2 • Frontier is used in CMS and under evaluation in Atlas (integrated in CORAL/COOL) • Successfully tested in CMS CSA’06, with many improvements in 2007 • CMS are confident that they have ways to avoid stale-cache issues
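The essence of the Frontier approach is that read-only queries travel over HTTP, so any intermediate Squid cache can answer repeated identical requests without touching the Oracle server. The sketch below only illustrates that idea; the servlet URL and query encoding are hypothetical and do not reproduce the real Frontier protocol:

```python
# Illustrative sketch of read-only database access over HTTP with a web cache.
# The URL, servlet path and query encoding are hypothetical, not real Frontier.
import urllib.parse
import urllib.request

FRONTIER_SERVLET = "http://frontier.example.cern.ch:8000/Frontier/query"  # hypothetical

def cached_query(sql: str) -> bytes:
    """Send a read-only query as an HTTP GET; any Squid proxy between the
    client and the server can serve the same URL to other clients from cache."""
    url = FRONTIER_SERVLET + "?" + urllib.parse.urlencode({"q": sql})
    with urllib.request.urlopen(url) as resp:  # request goes via the site Squid
        return resp.read()

# rows = cached_query("SELECT tag, value FROM conditions WHERE run = 1234")
```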
Replication – Oracle Streams (Capture, Propagation, Apply) (diagram: Barbara Martelli, INFN T1/T2 Workshop, Nov. 2006)
Replication – T0 to T1 • CERN data are replicated to ten T1 sites • Streams replication is used by Atlas (10 T1 sites) and LHCb (6 T1 sites) • More details in the slides about COOL deployment • The present setup can sustain 2 GB/day to the T1 sites • This is the Atlas requirement for COOL user data
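As a back-of-envelope check, 2 GB/day corresponds to a modest average network rate if the load is spread evenly over the day (an assumption; real replication traffic is bursty):

```python
# Back-of-envelope conversion of the 2 GB/day Streams requirement into an
# average rate, assuming the load is spread evenly over 24 hours.
gb_per_day = 2.0
bytes_per_day = gb_per_day * 1024**3
avg_rate_kib_s = bytes_per_day / (24 * 3600) / 1024
print(f"{avg_rate_kib_s:.1f} KiB/s average")  # ~24.3 KiB/s
```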
Streams downstream capture • This technology isolates the source database from problems with the network or with the destination databases • In 3D, this shields the CERN T0 services from problems in the replication to T1 sites • The redo log retention on the downstream database is tuned (e.g. 5 days) to allow re-synchronisation without recalling logs from tape
Replication – online to offline • Streams replication is used by Atlas, LHCb and CMS • For LHCb also offline to online (see COOL slides) • Work in progress with Atlas to test replication of the full PVSS archive • This allows detector expert analysis without impacting the performance of the online production server • Data rates (6 GB/day) are much higher than for COOL • Tests over the last two months are promising
3D service operation • DB service level according to the WLCG MoU • At T0: a piquet service is being set up to replace the current 24x7 best-effort operation • Streams interventions are covered 8x5 for now • At T1: more experience is needed to confirm coverage • Some policies proposed by the CERN T0 have also been accepted by the T1 sites • Backup and recovery (Oracle RMAN) • Security patch application (frequency, procedure) • Database and Streams monitoring, usage reports • Integration with WLCG procedures • GGUS tickets, intervention announcements
Outline • Relational databases for LHC computing • Reliable services at CERN and other LCG sites • The 3D project: distributed database deployment • COOL and other conditions data • COOL development and deployment status • Conclusions
What are conditions data? • Non-event detector data that vary with time • And may also exist in different versions • Data produced both online and offline • Geometry, detector control, alignment, calibration... • Data used for event processing and more • Detector experts • Alignment and calibration • Event reconstruction and analysis
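Conceptually, each conditions "object" is a payload attached to a validity interval, possibly in several tagged versions. A minimal sketch of that shape with illustrative names (not the actual COOL classes):

```python
# Minimal sketch of a conditions "object": a payload valid for a time interval,
# possibly in several tagged versions. Names are illustrative, not COOL's.
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class ConditionsObject:
    channel: int                 # e.g. one detector channel or device
    since: int                   # start of the interval of validity (IOV)
    until: int                   # end of the interval of validity
    payload: Dict[str, float]    # user-defined data (calibration constants, ...)
    tag: Optional[str] = None    # version tag; None for single-version (DCS) data

def find_valid(objects: List[ConditionsObject], channel: int,
               time: int, tag: Optional[str] = None) -> Optional[ConditionsObject]:
    """Return the object whose IOV contains 'time' for the given channel and tag."""
    for obj in objects:
        if obj.channel == channel and obj.tag == tag and obj.since <= time < obj.until:
            return obj
    return None
```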
CondDB in the 4 experiments • ALICE • Alice-specific software for time/version handling • ROOT files with AliEn file catalog • ALICE-managed deployment (AliEn MySQL at T0) • CMS • CMS-specific software for time/version handling • Oracle (via POOL-ORA) with Frontier web cache • 3D/CMS deployment: Oracle/Frontier (T0), Squid (T1/T2) • ATLAS and LHCb • COOL common software for time/version handling • Common development of Atlas, LHCb and CERN IT • Oracle, MySQL, SQLite, Frontier (via COOL API) • 3D/Atlas/LHCb deployment: Oracle (T0/T1) with Streams
COOL software overview • Consistent approach to many use cases • Single-version (DCS) and multi-version (calib/align) data • Technology-neutral C++ API • The API is not relational – no direct SQL user access • The same user code can be used on all backends • Maximize reuse of other LCG AA software • CORAL and SEAL for the C++ implementation • ROOT/Reflex for the Python bindings (PyCool) • Single relational implementation via CORAL • Same code for Oracle, MySQL, SQLite, Frontier • Same relational schema for all backends • Emphasis on read and write performance • Best practices (bulk operations, bind variables; see the sketch below) • Detailed performance studies and optimizations
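To make the "bulk operations, bind variables" point concrete, the sketch below shows both practices using Python's built-in sqlite3 module; COOL itself applies them in C++ through CORAL, so this is an illustration of the technique, not of the COOL code:

```python
# Two write-performance practices mentioned on the slide, illustrated with
# sqlite3: bind variables let the server reuse one prepared statement, and
# bulk operations send many rows per call instead of one statement per row.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE iovs (channel INTEGER, since INTEGER, until INTEGER, value REAL)")

rows = [(ch, 1000, 2000, 0.1 * ch) for ch in range(10000)]

# Bind variables (the '?' placeholders) plus a bulk insert in a single call:
conn.executemany("INSERT INTO iovs (channel, since, until, value) VALUES (?, ?, ?, ?)", rows)
conn.commit()
```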
COOL relational implementation • Modeling of conditions data “objects” • System-managed common “metadata” • Data items: many tables, each with many “channels” • Interval of validity – IOV: since, until • Versioning information with handling of interval overlaps • User-defined schema for the “data payload” • Support for simple C++ types
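One possible relational mapping of this model, shown as a simplified SQLite sketch: the system-managed columns carry channel, IOV and version information, while the payload columns are user-defined. This is only an illustration, not the actual COOL schema:

```python
# Illustrative relational mapping of the model described above: system-managed
# channel/IOV/version columns plus user-defined payload columns.
# A simplified sketch, not the actual COOL schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE iov_table (
    object_id   INTEGER PRIMARY KEY,
    channel_id  INTEGER NOT NULL,      -- system-managed "metadata"
    iov_since   INTEGER NOT NULL,      -- start of validity
    iov_until   INTEGER NOT NULL,      -- end of validity
    tag_id      INTEGER NOT NULL,      -- versioning information
    temperature REAL,                  -- user-defined payload columns
    pressure    REAL                   --   (simple types only)
);
""")

# Typical read pattern: the payload valid at a given time, channel and tag.
query = """SELECT temperature, pressure FROM iov_table
           WHERE channel_id = ? AND tag_id = ? AND iov_since <= ? AND iov_until > ?"""
print(conn.execute(query, (7, 1, 150000, 150000)).fetchall())
```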
Development summary • Milestones • COOL 1.0 released in April 2005 • Basic functionality (development started in Nov. 2004) • COOL 2.0 released in January 2007 • Major backward-incompatible API and schema changes • Current focus is performance optimization • Separate optimizations for different use cases • Several performance issues solved in 2007 • Feedback from and for Atlas/LHCb stress tests • Work in progress also on support for new platforms and a few functional enhancements
COOL data distribution • Replication at the database backend level • Oracle Streams (see next slides) • Cross-technology replication is possible (same schema for all backends), but has not really been attempted yet • Oracle remote access via Frontier • Under evaluation in Atlas • Replication tools based on the COOL API • Static (copy once) or dynamic (copy then update) • Data slicing/selection is also possible • Cross-technology replication is possible • Many use cases for SQLite files in Atlas and LHCb
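A sketch of what API-level, static (copy once) replication with data slicing amounts to: read the IOVs overlapping a chosen validity range from a source database and insert them into a destination that may use a different backend. Table and column names follow the illustrative schema above and are not those of the real COOL tools:

```python
# Sketch of API-level replication: copy a selected slice of conditions from a
# source database into a destination, mimicking the idea of the COOL-based
# copy tools with plain DB-API calls. Schema names are illustrative.
import sqlite3

def copy_slice(src_conn, dst_conn, since, until):
    """Static (copy once) replication of the IOVs overlapping [since, until)."""
    rows = src_conn.execute(
        "SELECT channel_id, iov_since, iov_until, tag_id, temperature, pressure "
        "FROM iov_table WHERE iov_until > ? AND iov_since < ?", (since, until)).fetchall()
    dst_conn.executemany(
        "INSERT INTO iov_table (channel_id, iov_since, iov_until, tag_id, temperature, pressure) "
        "VALUES (?, ?, ?, ?, ?, ?)", rows)
    dst_conn.commit()
    return len(rows)

# e.g. produce an SQLite slice for a T2 site or a Monte Carlo production
# (assumes both databases already contain the iov_table schema):
# n = copy_slice(sqlite3.connect("master.db"), sqlite3.connect("slice_for_T2.db"), 0, 200000)
```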
Deployment in LHCb • Computing model • Reconstruction at T0/T1 • Only MC production at T2 • COOL stores only the conditions data needed for event reconstruction • Oracle at the PIT, T0 and T1 with replication via Streams • Geometry and conditions for MC are sent to T2 as an SQLite file • Online db master at the PIT • Replicated forward to T0 and T1 via Streams • Data from PVSS processes • Offline db master at T0 • Replicated back to the PIT and forward to T1 via Streams • Data computed in offline calibration/alignment jobs • (Marco Clemencic, COOL meeting, 3 July 2006)
Deployment in Atlas • The largest COOL data set comes from DCS • Via the PVSS2COOL data transfer (1.5 GB/day) • From the online RAC in the T0 computer centre • For offline reconstruction and detector experts • Many options are open for T2 replication • Many use cases (simulation, calibration, analysis) • Static/dynamic replication to SQLite/MySQL, or Frontier • (Florbela Viegas, CHEP 2007)
COOL deployment status • The T0 setup is (almost) complete • The LHCb online server is being set up these days • The Atlas and LHCb T1 sites are all connected • SARA, RAL, PIC, IN2P3, GridKa, CNAF (both experiments) • Plus NorduGrid, TRIUMF, BNL, Taiwan (Atlas only) • Distributed tests are underway in both experiments • Much larger data rates in ATLAS!
Atlas scalability tests (1)
Atlas scalability tests (2)
Outline • Relational databases for LHC computing • Reliable services at CERN and other LCG sites • The 3D project: distributed database deployment • COOL and other conditions data • COOL development and deployment status • Conclusions
Conclusions • The 3D project has set up a world-wide distributed database infrastructure for LHC • This is one of the largest distributed deployments of the Oracle database worldwide (over 100 nodes at CERN and a few nodes at each of ten T1 sites) • T0/T1 are ready for the ramp-up to LHC production • The COOL software is used by both Atlas and LHCb to store their conditions data • The COOL deployment is one of the largest users of 3D • First results from the Atlas scalability tests confirm that the allocated resources should match the required number of jobs per hour
For more information • Physics database services at CERN • http://cern.ch/phydb • The 3D project • https://twiki.cern.ch/twiki/bin/view/PSSGroup/LCG3DWiki • The COOL project • http://cern.ch/cool • The CORAL project • http://pool.cern.ch/coral