This presentation discusses the LHC Logging Service, which captures and stores time-series data from the Large Hadron Collider (LHC), the world's largest scientific instrument. It covers the architecture, design, implementation, and database techniques used, as well as lessons learned from managing and analyzing the immense data volumes. The LHC Logging Service plays a crucial role in monitoring and understanding the behavior of the LHC, enabling groundbreaking scientific research.
Capturing, storing and using time-series data for the world's largest scientific instrument: The LHC Logging Service. By Chris Roderick, UKOUG Conference 2006, Birmingham
Outline • Introduction to CERN • The LHC Logging Project • LHC Logging Service Architecture • LHC Logging Service Design & Implementation • Database Techniques Used • Lessons Learned • Conclusion • Questions The LHC Logging Service - UKOUG Conference 2006
CERN: European Organization for Nuclear Research • Founded in 1954 by 12 countries • Today: 20 member states • CERN does pure scientific research into the laws of nature • Providing to >7000 users (particle physicists) from all over the world: • Accelerators accelerate particles to almost the speed of light • Detectors make the particles visible • The birthplace of the World Wide Web
[Map: the CERN accelerator complex on the French–Swiss border near Geneva Airport, showing the PS accelerator complex, the SPS and LHC accelerators, the CERN main site (CH) and the CERN 2nd site (F)]
The Large Hadron Collider (LHC) • Most powerful instrument ever built to investigate particle properties • 27 km circumference, 100 m underground • 4 cathedral-sized underground caverns hosting huge detectors • The highest energy of any accelerator in the world • The most intense beams of colliding particles • Operating at a temperature just above absolute zero
The LHC Logging Project • Need to know what's going on in and around the LHC • Log heterogeneous time-series data: cryogenics temperatures, magnetic field strengths, power dissipation, vacuum pressures, beam intensities and positions, etc. • Integration with other services (e.g. alarms): • Coherent data time-stamping • Consistent identification of the data • Oracle has a long history at CERN (since 1983, version 2.3) • Decision to go with the latest available Oracle technology: • Oracle9i Database • Oracle Application Server • Oracle Enterprise Manager • Logging Project started in 2001 • First operational implementation used in autumn 2003
The LHC Logging Project • Exponential increase in data volumes • Expected to stabilise after the 1st year of LHC operation at ~5 TB per year
Architecture

Measurement Database • Contains raw time-series data, at up to a 2 Hz rate • Stores data for 7 days • Derives new values based on measured values • Sends filtered data of interest to the Logging DB • Generates statistics on accelerator performance

Logging Database • Contains relatively slow time-series data • Stores data online for at least 20 years

Today's data rates (across the various data paths): 60,000,000 / 40,000,000 / 4,000,000 / 135,000 records per day

Oracle RAC Database host • 2 x SUN Fire V240: dual 1 GHz CPU, 4 GB RAM, dual internal disks, dual power supply • SUN StorEdge 3510FC disk array: 2 Gb fibre-channel bus, dual RAID controllers with 1 GB cache, 12 x 146 GB 10k rpm FC disks, 24 x 300 GB 10k rpm FC disks (2007) • SUN Solaris 9 • Veritas Volume Manager 3.5 • Oracle9i Enterprise Edition Release 9.2.0.7.0

Oracle Application Server hosts • HP Proliant DL380: dual 2.8 GHz CPU, 2.5 GB RAM, dual power supplies and Ethernet cards • RedHat Linux ES release 3 • Oracle Application Server 10g (9.0.4.3.0), hosting custom Java applications; clients and the Technical Services DB connect via JDBC, PL/SQL and HTTP

Java data loading • Oracle-specific JDBC 3.0 • OraclePreparedStatement • Batch inserts • Statement caching

Java data reading • Oracle-specific JDBC 3.0 • OraclePreparedStatement • Statement caching

XML data sending • SAX parser extracts data as objects • Java API loads data

PL/SQL data loading • Intermediate 'data loading' account • Database link • PL/SQL collections • PL/SQL bulk operations: BULK COLLECT and FORALL
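The PL/SQL data-loading path above (collections shipped over a database link, then bulk-inserted) might be sketched as follows. The package and procedure names echo those used later in the slides, but the code, the record layout and the staging table `data_to_load` are assumptions for illustration, not the actual CERN implementation:

```sql
-- Hypothetical sketch of the receiving side of the PL/SQL loading API.
CREATE OR REPLACE PACKAGE logging_data_loading IS
  TYPE data_numeric_rec IS RECORD (
    variable_name VARCHAR2(255),
    utc_stamp     TIMESTAMP(9),
    value         NUMBER
  );
  TYPE data_numeric_tab IS TABLE OF data_numeric_rec INDEX BY PLS_INTEGER;
  PROCEDURE log_numeric_data (p_data IN data_numeric_tab);
END logging_data_loading;
/
CREATE OR REPLACE PACKAGE BODY logging_data_loading IS
  PROCEDURE log_numeric_data (p_data IN data_numeric_tab) IS
  BEGIN
    -- FORALL hands the whole collection to the SQL engine in a single
    -- round trip (record-based FORALL inserts work from 9i Release 2).
    FORALL i IN 1 .. p_data.COUNT
      INSERT INTO data_to_load VALUES p_data(i);
  END log_numeric_data;
END logging_data_loading;
```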
Time Series Data

Metadata tables • Variables for which time-series data will be logged • Users • User-variable relations • Variable groupings

Time-series tables • Index-organized • Range-partitioned • Tablespace per range • TIMESTAMP data type • Table per data type • Vector data stored as a range-partitioned VARRAY of NUMBER

Audit & error tables • Error definitions • Error logging • Administrator notifications • Audit data: who did what, when, and how long it took
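As an illustration of the table design described above, a per-data-type time-series table might be declared roughly as follows. The DDL is a sketch with assumed column sizes, tablespace names and VARRAY limit; only the table, column and partition naming follow the conventions visible elsewhere in the slides:

```sql
-- Hypothetical sketch: an index-organized, range-partitioned
-- time-series table, one tablespace per daily partition.
CREATE TABLE data_numeric (
  variable_id  NUMBER        NOT NULL,
  utc_stamp    TIMESTAMP(9)  NOT NULL,
  value        NUMBER,
  CONSTRAINT data_numeric_pk PRIMARY KEY (variable_id, utc_stamp)
)
ORGANIZATION INDEX
PARTITION BY RANGE (utc_stamp) (
  PARTITION dn_part20061113
    VALUES LESS THAN (TIMESTAMP '2006-11-14 00:00:00')
    TABLESPACE ts_dn_20061113,
  PARTITION dn_part20061114
    VALUES LESS THAN (TIMESTAMP '2006-11-15 00:00:00')
    TABLESPACE ts_dn_20061114
);

-- Vector data: a VARRAY preserves the original element order.
CREATE TYPE number_array AS VARRAY(4096) OF NUMBER;
```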
Partitioned Measurement Data • Time-series data tables in the Measurement Database are also implemented as range-partitioned IOTs • Benefits administration and query performance • 7 days of data are kept; a table may contain several billion rows • Data is organized in daily range partitions • A daily DBMS_JOB executes a procedure: • adding a new partition • dropping the oldest partition, using UPDATE GLOBAL INDEXES (avoids the need to rebuild the global table index as an offline operation) [Diagram: range-partitioned IOT time-series data table with daily partitions P-8 … P+5 around today's data; a new partition is added ahead while the oldest is dropped]
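A minimal sketch of such a daily maintenance procedure; the slides describe only the behaviour, so the procedure name and exact DDL here are assumptions:

```sql
-- Hypothetical sketch: add tomorrow's partition, drop the oldest.
-- UPDATE GLOBAL INDEXES keeps any global index usable during the drop.
CREATE OR REPLACE PROCEDURE maintain_partitions IS
  l_new DATE := TRUNC(SYSDATE) + 1;
  l_old VARCHAR2(30);
BEGIN
  EXECUTE IMMEDIATE
    'ALTER TABLE data_numeric ADD PARTITION dn_part'
    || TO_CHAR(l_new, 'YYYYMMDD')
    || ' VALUES LESS THAN (TIMESTAMP '''
    || TO_CHAR(l_new + 1, 'YYYY-MM-DD') || ' 00:00:00'')';

  -- Oldest partition sorts first under the DN_PARTYYYYMMDD convention.
  SELECT MIN(partition_name) INTO l_old
  FROM user_tab_partitions
  WHERE table_name = 'DATA_NUMERIC';

  EXECUTE IMMEDIATE
    'ALTER TABLE data_numeric DROP PARTITION ' || l_old
    || ' UPDATE GLOBAL INDEXES';
END;
```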
Automated Tasks • Check available storage (used space and partition creation) • Check for broken jobs • Gather CBO statistics • Calculate system usage statistics: • No. of records logged, per user, per data type, during the previous day • No. of repeating values

Repeating values per variable:

SELECT variable_id, COUNT(*) repeating_cnt
FROM (
  SELECT variable_id, value,
         LAG(value) OVER (PARTITION BY variable_id ORDER BY utc_stamp) last_value
  FROM data_numeric
  WHERE variable_id IN (
          SELECT variable_id
          FROM meta_variable
          WHERE user_name = p_user_name
            AND datatype_name = l_dn_datatype )
    AND utc_stamp >= p_start_stamp
    AND utc_stamp < p_end_stamp
)
WHERE value = last_value
GROUP BY variable_id;

Total repeating values across all of a user's variables:

SELECT COUNT(*) repeating_dn_cnt
FROM (
  SELECT value,
         LAG(value) OVER (PARTITION BY variable_id ORDER BY utc_stamp) last_value
  FROM data_numeric
  WHERE variable_id IN (
          SELECT variable_id
          FROM meta_variable
          WHERE user_name = p_user_name
            AND datatype_name = l_dn_datatype )
    AND utc_stamp >= p_start_stamp
    AND utc_stamp < p_end_stamp
)
WHERE value = last_value;
Data Extraction • Correlation of logged data from different systems is a key functional requirement • Dedicated Java Web applications based on the Apache Struts framework provide: • Time-series data statistics • Data extraction to file, in various formats • Data visualization in interactive charts
Time Scaling • Need to correlate asynchronous data from different systems • Scale up summary information from finer-granularity information • Scale down finer-granularity information from coarser-granularity information • Since 9i, no built-in time scaling (formerly the TimeSeries cartridge) • Time-scaling functions implemented in PL/SQL using the analytic functions LAG and LEAD: • Scale-down Repeat • Scale-down Interpolate • Scale-up MIN/MAX/AVG
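The scale-up variant can be sketched as a pipelined table function built on simple aggregates. This is a simplified illustration with assumed type and function names; the real SCALEUP function is parameterised by table, column and interval:

```sql
-- Hypothetical skeleton of a pipelined time-scaling function.
-- Rows are piped to the caller as they are produced, so the client
-- need not wait for the complete scaled data set.
CREATE TYPE scaled_point AS OBJECT (
  stamp      DATE,
  min_value  NUMBER,
  max_value  NUMBER,
  avg_value  NUMBER
);
/
CREATE TYPE scaled_point_tab AS TABLE OF scaled_point;
/
CREATE OR REPLACE FUNCTION scaleup_sketch (
  p_variable_id IN NUMBER,
  p_start       IN TIMESTAMP,
  p_end         IN TIMESTAMP
) RETURN scaled_point_tab PIPELINED IS
BEGIN
  FOR r IN (
    SELECT TRUNC(utc_stamp, 'HH') stamp,         -- hourly buckets
           MIN(value) min_value,
           MAX(value) max_value,
           AVG(value) avg_value
    FROM data_numeric
    WHERE variable_id = p_variable_id
      AND utc_stamp >= p_start AND utc_stamp < p_end
    GROUP BY TRUNC(utc_stamp, 'HH')
  ) LOOP
    PIPE ROW (scaled_point(r.stamp, r.min_value, r.max_value, r.avg_value));
  END LOOP;
  RETURN;
END;
```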
Time Scaling • Accessible via PL/SQL functions, returning a TABLE (user-defined SQL type) containing the time-scaled data • TABLE data is returned row by row using the PIPELINED clause (no need to wait for the complete set of time-scaled data to be derived) • Example SCALEUP function usage:

SELECT stamp, min_value, max_value, avg_value
FROM TABLE(
  TIME_SCALE.SCALEUP(
    24288,                  -- ID of variable for which data is to be scaled
    1,                      -- No. of intervals between generated timestamps
    'HOUR',                 -- Type of interval between generated timestamps
    '2006-08-22 00:00:00',  -- Desired start time for generated data
    '2006-08-22 23:59:59',  -- Desired end time for generated data
    'utc_stamp',            -- Name of column containing the raw timestamps
    'value',                -- Name of column containing the raw values
    'data_numeric',         -- Name of table containing time series data
    'variable_id'           -- Name of column containing variable ID
  )
)
ORDER BY stamp;
Show Time Scaling in TIMBER
Exception Handling • Extremely important – deals with abnormal situations • System administrators become quickly aware of problems • The events and conditions which caused a problem are captured • Allows APIs to behave in a predictable manner • The Logging Service uses an adaptation of the exception-handling framework proposed by S. Feuerstein in Oracle Magazine (May – August 2003) • Deals with: • Exception definitions (name, description, severity) • Exception capturing (what, who, when, parameter values) • Exception treatment (log, mask, raise) • Administrator notifications (depending on severity)
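A minimal sketch of the capture side of such a framework. This is an assumption based on the slide, not the actual CERN code; the table and procedure names are invented for illustration:

```sql
-- Hypothetical sketch: log an exception in an autonomous transaction
-- so the error record survives the caller's rollback.
CREATE OR REPLACE PROCEDURE log_exception (
  p_who    IN VARCHAR2,
  p_params IN VARCHAR2
) IS
  PRAGMA AUTONOMOUS_TRANSACTION;
BEGIN
  INSERT INTO error_log (logged_at, logged_by, err_code, err_msg, param_values)
  VALUES (SYSTIMESTAMP, p_who, SQLCODE, SQLERRM, p_params);
  COMMIT;
END;
/
-- Typical use inside an API procedure:
-- EXCEPTION
--   WHEN OTHERS THEN
--     log_exception(USER, 'p_var_name=' || p_var_name);
--     RAISE;  -- log, then re-raise (or mask, depending on severity)
```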
Database Techniques Used • MERGE • Used to simultaneously insert and update batches of metadata • Used for time-series data loading on ORA-00001 events • Combined with UPDATE triggers – capture client software or hardware problems without fatal exceptions

MERGE INTO meta_variables
USING ( SELECT :var_name vn, :var_desc vd, :var_units vu FROM dual )
ON ( variable_name = vn )
WHEN MATCHED THEN
  UPDATE SET description = vd, units = vu
WHEN NOT MATCHED THEN
  INSERT ( variable_id, variable_name, description, units )
  VALUES ( meta_variable_seq.nextval, vn, vd, vu );
Database Techniques Used • SQL Types • Used as tools for loading and querying data • Custom time-scaling functions return SQL table types • INSERT – SELECT statements based on data in SQL table types

CREATE TYPE table_of_varchar AS TABLE OF VARCHAR2(4000);

SELECT column_value || ' unregistered'
FROM TABLE( CAST( :p_var_names AS table_of_varchar ) ) v
WHERE column_value NOT IN (
  SELECT variable_name FROM meta_variables
);
Database Techniques Used • BULK PL/SQL • Used by the PL/SQL data loading API • BULK COLLECT for loading collections • FORALL for inserting data • Significant performance gains (depending on the LIMIT clause value)

PROCEDURE log_logging_dn IS
  l_data logging_data_loading.data_numeric_tab;  -- Collection of logging data
  CURSOR c IS
    SELECT variable_name, utc_stamp, value
    FROM to_log_data_numeric;
BEGIN
  OPEN c;
  LOOP
    FETCH c BULK COLLECT INTO l_data LIMIT 100;
    -- Test the collection, not c%NOTFOUND, so the final
    -- partial batch of fewer than 100 rows is not lost
    EXIT WHEN l_data.COUNT = 0;
    /* Send the data to the Logging DB 100 records at a time */
    logging_data_loading.log_numeric_data(l_data);
  END LOOP;
  CLOSE c;
  …
Database Techniques Used • INTERVAL Data Types • Used in PL/SQL, SQL queries, and DBMS_JOB scheduling • Add clarity to code • Allow arithmetic to be carried out upon them – simplifies establishing the time elapsed between consecutive client data loading sessions

DBMS_JOB.INTERVAL(:job_no, 'SYSDATE + 1/24/4');
vs
DBMS_JOB.INTERVAL(:job_no, 'SYSDATE + INTERVAL ''15'' MINUTE');

CASE
  /* Log the value if it is outside the fixed logging interval */
  WHEN l_vars(i).last_logged + NUMTODSINTERVAL(
         l_vars(i).fixed_log_interval_size, l_vars(i).fixed_logging_interval_type
       ) < l_data(j).utc_stamp
  THEN
    INSERT INTO to_log_data_numeric (variable_name, utc_stamp, value)
    VALUES ( l_vars(i).variable_name, l_data(j).utc_stamp,
             ROUND(l_data(j).value, l_vars(i).rounding) );
  WHEN …
Database Techniques Used • Multi-table Inserts • Used when inserting certain metadata • Remove the need for procedural logic, and multiple INSERT statements, in the data loading API

INSERT FIRST
  WHEN dtype IN ('NUMERIC', 'NUMSTATUS', 'TEXTUAL', 'VECTORNUMERIC') THEN
    INTO prof_variables (user_prof_id, variable_id) VALUES (profid, varid)
  WHEN dtype = 'FUNDAMENTAL' THEN
    INTO prof_fundamentals (user_prof_id, fundamental_id) VALUES (profid, varid)
SELECT :pid profid, :vid varid, :dt dtype FROM DUAL;
Database Techniques Used • Analytic Functions • Used for time scaling, auditing, data derivation and filtering • Bring a new power to SQL – obtain results previously only possible using procedural code • Give better performance, and reduce development time

Find old range partitions to be dropped, based on the convention of naming partitions as <table_alias>_PARTYYYYMMDD, e.g. DN_PART20061114:

SELECT table_name, partition_name
FROM (
  SELECT partition_name, table_name,
         RANK() OVER (PARTITION BY table_name ORDER BY partition_name DESC) p_rank
  FROM user_tab_partitions
)
WHERE p_rank > p_max_parts
ORDER BY table_name, p_rank DESC;
Lessons Learned

Handling large data volumes • Range-partitioned, index-organized tables • Good query performance • Simplified administration • Storing arrays of numeric data using VARRAY • Advantage of storing the data in its original order • Not supported over database links (need for PL/SQL type conversion) • Problematic for bulk data processing operations, where the VARRAY size can vary greatly between records • No native means of comparing the content of VARRAYs
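To illustrate the last point: assuming a SQL type such as CREATE TYPE number_array AS VARRAY(4096) OF NUMBER, equality of two VARRAY values has to be checked element by element, for example with a helper function like this sketch (the function itself is an assumption, not part of the CERN code):

```sql
-- Hypothetical sketch: no native VARRAY comparison exists, so
-- equality is checked element by element in PL/SQL.
CREATE OR REPLACE FUNCTION varrays_equal (
  p_a IN number_array,
  p_b IN number_array
) RETURN NUMBER IS  -- 1 = equal, 0 = different
BEGIN
  IF p_a.COUNT != p_b.COUNT THEN
    RETURN 0;
  END IF;
  FOR i IN 1 .. p_a.COUNT LOOP
    IF p_a(i) IS NULL AND p_b(i) IS NULL THEN
      NULL;  -- both NULL: treat as equal
    ELSIF p_a(i) IS NULL OR p_b(i) IS NULL
          OR p_a(i) != p_b(i) THEN
      RETURN 0;
    END IF;
  END LOOP;
  RETURN 1;
END;
```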
Lessons Learned

Getting the data access right • Variety of data providers, with varying data volumes and rates – keeping accurate CBO statistics is an ongoing challenge • Combinations of skewed data and bind variable peeking • Temporary fix: forcing execution paths • Impedance mismatch between OO developers and database developers • Controlled database access via APIs is essential • Analytic functions – simplify development, improve performance

Knowing what's going on, and keeping control • Enterprise Manager Application Server Control 10g (9.0.4.3.0) • Intuitive, reliable, fulfils requirements for OAS administration • Functionality removed / added in the subsequent version • Client and system audit data is invaluable
Conclusion • The Logging Project has delivered an operational Logging Service, used by an increasing number of clients • Vital to monitor client behaviour and overall service performance • The design has been validated, but the implementation continues to evolve in sync with Oracle technology • Scalability has been anticipated within the design, but additional hardware needs to be foreseen and plugged in to meet future demands • Database tuning can be improved, particularly the CBO statistics strategy • With hindsight – re-consider data storage using objects • Convinced that we have followed the right path, we look forward to the full exploitation of the Oracle-powered logging service when CERN's flagship LHC produces its first scientific results
Questions
Additional Slides
The LHC Logging Project

The need for time-series data logging • The LHC will be an installation of unprecedented complexity, with dense high-energy particle beams that can potentially damage the equipment of the machine itself. • Hundreds of thousands of signals coming from equipment surveillance and beam observation will be monitored to carefully examine the behaviour of this machine. • Cryogenics temperatures, magnetic field strengths, power dissipation, vacuum pressures, beam intensities and positions are all parameters that will influence the correct functioning and performance of the installation as a whole. • Each subsystem has its own data acquisition system to track parameter values over time. In order to be able to correlate this heterogeneous information, the data is "logged" in a central database, allowing end-users to: • Visualize and extract any data • Compare over time • Examine trends • Find patterns • Discover unforeseen parameter relationships • The need for and value of such centralized logging was already demonstrated on previous occasions: • LEP Logging: data logging for CERN's previous flagship accelerator, the Large Electron-Positron Collider (LEP), operated from 1989 till 2000. • LHC-string Logging: data logging from the prototype LHC magnet test stand, operated from 2001 till 2003. • The LHC Logging project was launched in October 2001, and the first operational implementation was used in autumn 2003.
The LHC Logging Project

Initial system requirements • Two types of clients acting as data providers: • Equipment hardware related – data is progressively and increasingly available before LHC start-up, as individual subsystems go through their hardware commissioning • Particle beam related – data will only become available when the LHC is fully commissioned, with the green light to receive the first particles from the injecting accelerator chain • Rough figures on data volumes – an exponential increase is foreseen, stabilising after the first year of beam operation at around 5 TB per year • Historic data of previous years will be kept online in the same data store, in order to facilitate data comparison over the lifetime of the LHC • Throughput and latency performance requirements could not be quantified at project start, so the design had to anticipate scalability

Overall software environment • Java, J2EE • Supervisory control and data acquisition (SCADA) systems on top of programmable logic controllers (PLCs) • Heterogeneous data providers due to the specificities of the individual data acquisition systems
The LHC Logging Project

Integration with other projects • The logging service is not an isolated tool to monitor the performance of the LHC and its subsystems • Its functionality needs to be combined with other services to get a more complete picture of the machine behaviour • The Logging service is designed to be a steady, continuous, long-term data capturing tool, for a wide range of clients • The following services have complementary functionality: • Post-Mortem service: collects and analyzes transient recordings, i.e. buffers of rapid data points (kilohertz range) from a set of crucial LHC subsystems, triggered by special events • Alarm service: centralizes notifications that indicate malfunctioning of any component, to inform control-room operators and prompt them to take immediate action • Two prerequisites to allow a smooth integration of the services have been enforced: • Coherent time-stamping • Consistent identification of the data
Oracle Environment at CERN • CERN was one of Oracle's first European customers (1983, version 2.3) • Many operational systems developed with the Oracle database as a foundation • Oracle is the standard RDBMS at CERN • Database administration is managed by the CERN IT department • Decentralized software development teams are responsible for database & software design • Hardware and system software infrastructure is established by IT according to specific database service requirements • In general, only terminal releases of Oracle versions are deployed as operational production platforms
Oracle Environment for LHC Logging • Starting the logging project five years before the full-blown operational use of the service meant the evolution of the technology needed to be taken into account • Strategic decisions were made from the outset – going for the latest technology, as soon as possible: • Database technology • Design and implementation details: based on Oracle9i technology • Early 2007: migration to Oracle 10g Release 2, Enterprise Edition • Application Servers • Java is the application software platform for all new projects • An in-depth evaluation was made in 2003-2004 to compare J2EE deployment solutions • Final choice: standalone Oracle Container for Java (OC4J) or Oracle Application Server (OAS) • The Logging project opted for OAS • Enterprise Manager • Allows simple monitoring, diagnostics and repair of the deployed service • We anticipate that these actions will be partially performed by control-room operators in the future • Only OAS instances are currently monitored; the complete service (including the database) could potentially be managed in the future
Development & Runtime Environments • Development Tools • JDeveloper 10.1.3 is used for all Java development, and database modelling (replacing Oracle Designer). • Oracle SQLDeveloper is used for SQL and PL/SQL development. • Three runtime environments, each containing the complete setup of: • Database schemas • Data loading APIs • Data reading APIs • Application Server(s) • Development: • Developing and testing features to satisfy new requirements • Upgrading existing implementations to use new techniques or technologies • Test: • Identical to the production environment in terms of hardware and software deployment. • Used by new clients of the logging service – allowing them to validate their software against our system. • Opportunity to validate behaviour of new clients – ensuring they conform to logging service requirements: • Send data in batches • Send data at appropriate frequencies • Filter unchanged or uninteresting data • Round numeric values to appropriate precisions • Logged data is either deleted after an elapsed time, or transferred to the production environment. • Production: • For clients who have passed 'validation' in the TEST environment, and wish to continue logging data, to be stored beyond the lifetime of the LHC machine.
Diagnostics • Database – ViewDB (in-house tool): • Database activity • User sessions • Resource consumption • Application Server – Oracle Enterprise Manager Application Server Control: • Monitor application server usage and performance • Access server and application log files • Control application server components • Edit configuration files • Modify Web application runtime parameters in various OC4Js • System Usage – Audit tables, populated by the various data loading APIs, database triggers, and automated procedures which analyse logged time series data: • Monitor evolution of system usage • Plan for new system resources in advance • Identify bad clients • Track system performance • Identify the causes of bursts of activity
Evolution and Future Direction

Scalability – difficult to estimate final throughput and volume rates • Not all subsystems are currently available. • Fine details of the data pushing capabilities of each subsystem are not known yet. • Rationalization of information (e.g. filtering, data reduction) is of low priority to data providers. • Peak loads may occur unexpectedly from inexperienced clients.

We can assume the eventual need to increase the resources of the service • Number of application servers. • Number of nodes in the database RAC cluster. • Number of disks, and disk controllers.

No fundamental changes to the overall system design or the database design are foreseen.

Planned upgrades • For the year 2007, an extra raw disk capacity of 7.2 TB will be installed.

Unexplored possibilities • Clustering of application server hosts at software or hardware level. • Automatic load balancing between the database servers. • Data replication on a redundant system to ensure 100% uptime. • Application notification of data modifications to avoid polling by external applications. • Bringing in an Oracle consultant for system tuning.