Status of the Accelerator Online Operational Databases
Accelerators and Beams Department, Controls Group
Ronny Billen, Chris Roderick
LTC – 7 March 2008
Outline
• The Accelerator Online Operational Databases
• Current Database Server Situation
• Evolution of the Provided Services
• Performance: Hitting the Limits
• 2008: Planned Upgrade and Migration
• Implications, Policy and Constraints for Applications
• Logging Data: Expected vs. Acceptable
• The Future
• Conclusions
LTC - Controls session - Databases
The Accelerator Online Operational Databases
• Data needed instantaneously to interact with the accelerator
• The database sits between the accelerator equipment and the client (operator, equipment specialist, software developer)
• Many database services, including APIs and applications:
  • LSA – Accelerator Settings database
  • MDB – Measurement database
  • LDB – Logging database
  • CCDB – Controls Configuration database
  • E-Logbook – Electronic Logbooks
  • CESAR – SPS-EA Controls
  • LASER – Alarms database
  • TIM – Technical Infrastructure Monitoring database
• 3-tier deployment of services for resource optimization: Client → Application Server → Database Server
Current Database Server Situation – SUNLHCLOG (often referred to as the "LHC Logging Database")
• Technical
  • 2-node cluster: 2 x SUN Fire V240 {single-core 1GHz CPU, 4GB RAM, 2 x 36GB disks, 2 power supplies}
  • External storage: 9TB RAID 1+0 / RAID 5, mirrored & striped (~60% usable)
• History
  • Purchased original setup: March 2004
  • Purchased extra disks: October 2006
• Main accounts – data
  • Logging: LHC HWC, Injectors, Technical Services
  • Measurements: LHC HWC, Injectors
  • Settings: LSA for LHC, SPS, LEIR, PS, PSB, AD
• Today's specifics
  • 150 simultaneous user sessions
  • Oracle data-files: 4.7 TB
Current Database Server Situation – SUNSLPS (often referred to as the "Controls Configuration Database")
• Technical
  • Server: SUN E420R {450MHz CPU, 4GB RAM, 2 x 36GB disks}
  • External storage: 218GB
• History
  • Installed in January 2001
• Main accounts – data
  • AB-Controls, FESA, CMW, RBAC, OASIS
  • CESAR, PO-Controls, INTERLOCK
  • e-Logbooks, ABS-cache
  • Historical SPS and TZ data
  • LSA Test
• Today's specifics
  • 200-300 simultaneous user sessions
  • Oracle data-files: 32GB
Evolution of the Provided Services
• LSA Settings: operationally used since 2006
  • Deployed on SUNLHCLOG to get best performance
  • Used for LEIR, SPS, SPS & LHC transfer lines, LHC HWC
  • Continuously evolving due to requirements from LHC and PS
• Measurement Service: operationally used since mid-2005
  • Satisfying central short-term persistence for Java clients
  • Provides data filtering and transfer to the long-term logging service
  • Generates accelerator statistics
  • Increasingly used for the complete accelerator complex
• Logging Service: operationally used since mid-2003
  • Scope extended to all accelerators and technical data of experiments
  • Equipment expert data for LHC HWC accounts for >90% of the volume
  • Largest consumer of database and application server resources
Evolution of the Logging – Data Volume
[Chart: growth of logged data volume over time]
Evolution of the Logging – Data Rates
[Chart: logging data rates over time; main contributors: CIET, CRYO, QPS]
Performance Hitting the Limits
• I/O Limits
  • The I/O subsystem is used for reading and writing data
  • Recent samples: 4 to 37 clients waiting for the I/O subsystem
[Chart: number of active sessions waiting for the I/O subsystem]
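The number of sessions blocked on I/O can be sampled from Oracle's dynamic performance views. A minimal sketch, assuming SELECT privilege on `v$session` and a driver-provided connection object (the connection itself is not shown; only the query is standard Oracle):

```python
# Sketch: count active sessions currently waiting on user I/O.
# The wait_class column of v$session classifies the current wait event;
# 'User I/O' covers reads/writes done on behalf of user sessions.
IO_WAIT_QUERY = (
    "SELECT COUNT(*) FROM v$session "
    "WHERE status = 'ACTIVE' AND wait_class = 'User I/O'"
)

def sessions_waiting_on_io(connection):
    """Return the current count of active sessions blocked on I/O."""
    cursor = connection.cursor()
    cursor.execute(IO_WAIT_QUERY)
    (count,) = cursor.fetchone()
    return count
```

Sampling this count periodically is one simple way to reproduce the "4 to 37 clients waiting" figures quoted above.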
Performance Hitting the Limits
• CPU Limits
  • CPU is always needed to do anything:
    • Data writing and extraction
    • Data filtering (CPU intensive) and migration from MDB → LDB
    • Exporting archive log files to tape, incremental back-ups
    • Migrating historic data to dedicated read-only storage
  • Hitting the I/O limits burns CPU
[Chart: percentage of CPU used on I/O wait events]
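The MDB → LDB filtering step is what makes this migration CPU intensive: every time series must be reduced before long-term storage. A minimal sketch of one common reduction, deadband filtering (the threshold and data shapes here are illustrative, not the actual filter used by the Measurement Service):

```python
def deadband_filter(samples, deadband):
    """Keep only samples that differ from the last kept value by more
    than the deadband; always keep the first sample.
    `samples` is a list of (timestamp, value) pairs."""
    kept = []
    last_value = None
    for timestamp, value in samples:
        if last_value is None or abs(value - last_value) > deadband:
            kept.append((timestamp, value))
            last_value = value
    return kept

# A flat signal with one excursion collapses to its changes only.
data = [(0, 1.00), (1, 1.01), (2, 1.00), (3, 5.00), (4, 5.01)]
print(deadband_filter(data, deadband=0.1))  # [(0, 1.0), (3, 5.0)]
```

Applied to millions of values per day, even this simple comparison loop translates into a measurable CPU load on the database server.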
Performance Hitting the Limits
• Storage Limits
  • Pre-defined allocated data-files are difficult to manage (due to size)
  • Monthly allocations repeatedly insufficient (manual extension necessary)
  • Archive log file space insufficient (when the backup service is down)
[Chart: storage utilisation]
2008: Planned Upgrade and Migration
Separate into 3 high-availability database services; deploy each service on a dedicated Oracle Real Application Cluster:
• Settings & Controls Configuration (including logbooks)
  • Highest availability, fast response
  • Low CPU usage, low disk I/O
  • ~20GB data
• Measurement Service
  • Highest availability
  • CPU intensive (data filtering MDB → LDB), very high disk I/O
  • ~100GB (1 week latency), or much more for HWC / LHC operation
• Logging Service
  • High availability
  • CPU intensive (data extraction), high disk I/O
  • ~10TB per year
2008: Planned Upgrade and Migration
[Diagram: three dedicated clusters]
• Oracle RAC 1: LSA Settings, Controls Configuration, E-Logbook, CESAR
• Oracle RAC 2: Measurements, HWC Measurements
• Oracle RAC 3: Logging
• Per node: 2 x quad-core 2.8GHz CPU, 8GB RAM
• Storage: clustered NAS shelves (14 x 146GB FC disks; 14 x 300GB SATA disks), 11.4TB usable
• Additional server for DataGuard testing: standby database for LSA
2008: Planned Upgrade and Migration
• Dell PowerEdge 1950 server specifications:
  • 2 x Intel Xeon quad-core 2.33 GHz CPU
  • 2 x 4 MB L2 cache
  • 8GB RAM
  • 2 x power supplies, network cards (10Gb Ethernet), 2 x 72GB system disks
• NetApp Clustered NAS FAS3040 storage specifications:
  • 2 x disk controllers (support for 336 disks (24 shelves))
  • 2 x disk shelves (14 x 146GB Fibre Channel, 10,000rpm)
  • 8GB RAM (cache)
  • RAID-DP
  • Redundant hot-swappable: controllers, cooling fans, power supplies, optics, and network cards
  • Certified >3,000 I/O operations per second
2008: Planned Upgrade and Migration launched Sep-2007 launched Oct-2007 arrived at CERN Nov-2007 arrived at CERN Jan-2008 ordered Jan-2008 stress-tested Jan-2008 liberated Feb-2008 fully installed 7-Mar-2008 installed, configured 14-Mar-2008 deployed (AB/CO/DM) ready for switch-over (1-day stop) 21-Mar-2008? (later) (Sep-2008) • Purchase order for storage (2/11) • Purchase order for servers (7/122) • NetApps NAS storage shelves • Dell servers • Additional mounting rails for servers • Servers • Rack space • Server and storage • Oracle system software • Database structures • Database services • Switch to services of new platform • Migration of existing 5TB logging data to new platform • Purchase additional logging storage for beyond 2008 LTC - Controls session - Databases
Implications, Policy and Constraints for Applications
Foreseen for all services, already implemented for a few:
• Implications
  • All applications should be cluster-aware:
    • Database load-balancing / fail-over (connection modifications)
    • Application fail-over (application modifications)
• Policy
  • Follow naming conventions for data objects
• Constraints
  • Use APIs for data transfer (no direct table access)
  • Enforce controlled data access
  • Register authorized applications (purpose, responsible)
  • Implement application instrumentation
  • Provide details of all database operations (who, what, where)
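At the connection level, being cluster-aware means the Oracle connect descriptor lists all RAC nodes and enables load balancing and fail-over. A hedged sketch of building such a descriptor (the host names and service name are hypothetical placeholders, not the actual CERN configuration):

```python
# Sketch of a RAC-aware Oracle connect descriptor. Host names and the
# service name used below (rac1-node1, ab_lsa, ...) are placeholders.
def rac_descriptor(hosts, service, port=1521):
    """Build a TNS connect descriptor with connection load balancing
    and transparent application fail-over across all cluster nodes."""
    addresses = "".join(
        f"(ADDRESS=(PROTOCOL=TCP)(HOST={h})(PORT={port}))" for h in hosts
    )
    return (
        "(DESCRIPTION=(LOAD_BALANCE=ON)(FAILOVER=ON)"
        f"(ADDRESS_LIST={addresses})"
        f"(CONNECT_DATA=(SERVICE_NAME={service})"
        "(FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC))))"
    )

print(rac_descriptor(["rac1-node1", "rac1-node2"], "ab_lsa"))
```

With `LOAD_BALANCE=ON` new connections are spread across the listed nodes, and `FAILOVER_MODE` lets in-flight SELECTs resume on a surviving node; application-level fail-over (re-running transactions) still requires the application modifications noted above.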
Logging Data: Expected vs. Acceptable
• Beam-related equipment starting to produce data
  • BLM: 6,400 monitors * 12 * 2 (losses & thresholds) + crate status = ~154,000 values per second (filtered by concentrator & MDB)
  • XPOC
  • More to come…
• Limits
  • Maximum 1 Hz data frequency in the Logging database
  • The Logging database is not a data dump
  • Consider final data usage before logging – only log what is needed
  • Logging noise will have a negative impact on data extraction performance and analysis
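The BLM figure above, and the effect of the 1 Hz cap, can be checked with simple arithmetic (the crate-status contribution is the remainder up to ~154,000; the slide does not give it explicitly):

```python
# Raw BLM production rate quoted on the slide.
monitors = 6_400
values_per_second = monitors * 12 * 2   # losses & thresholds
print(values_per_second)                # 153600, before crate status

# Even filtered down to the 1 Hz logging limit, one value per monitor
# per second is still over half a billion rows per day:
rows_per_day = monitors * 24 * 3600
print(rows_per_day)                     # 552960000
```

This is why the concentrator and MDB filtering stages exist, and why "only log what is needed" is a hard constraint rather than a style preference.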
The Future
• Logging Data
  • Original idea: keep data available online indefinitely
  • Data rates estimated at ~10TB/year
  • Closely monitor the evolution of storage usage
  • Order new disks for 2009 data (in Sep-2008)
  • Migrate existing data (~4TB) to new disks
• Service Availability
  • New infrastructure has high redundancy for high availability
  • Scheduled interventions will still need to be planned
  • Use of a standby database will be investigated, with the objective of reaching 100% uptime for small databases
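A quick sanity check of the storage plan, using the ~10TB/year estimate, the ~4TB of historical data, and the 11.4TB usable capacity quoted on the hardware slide:

```python
# Estimated logging growth vs. installed usable capacity.
rate_tb_per_year = 10.0   # ~10TB/year estimated logging rate
usable_tb = 11.4          # usable capacity of the new NAS storage
existing_tb = 4.0         # historical data to migrate (~4TB)

headroom_years = (usable_tb - existing_tb) / rate_tb_per_year
print(f"{headroom_years:.2f} years of headroom")  # 0.74 years of headroom
```

Well under a year of headroom: consistent with needing to order the disks for 2009 data as early as Sep-2008.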
Conclusions
• Databases play a vital role in the commissioning and operation of the accelerators
• Database performance and availability have a direct impact on operations
• Today, the main server SUNLHCLOG is heavily overloaded
• Based on experience and the evolution of the existing services, the new database infrastructure has been carefully planned to:
  • Address performance issues
  • Provide maximum availability
  • Provide independence between the key services
  • Scale as a function of data volumes and future requirements
• The new database infrastructure should be operational ahead of injector chain start-up and LHC parallel-sector HWC