240 likes | 496 Views
ASE101: Building a Real-Time Database Performance Repository Using the Sybase MDA Tables. Jeffrey Wong Consultant, Anton Ventures wongjtf@aol.com August 15-19, 2004. Building a Real-Time Database Performance Repository Using the Sybase MDA Tables. Overview
E N D
ASE101: Building a Real-Time Database Performance Repository Using the Sybase MDA Tables Jeffrey Wong Consultant, Anton Ventures wongjtf@aol.com August 15-19, 2004
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Overview • Limitations of other database performance monitoring approaches • Limitations of this approach to database performance monitoring • Typical uses for the real-time (RT) MDA database system • Developmental timeline • Architecture of the RT MDA database system • Barrier synchronized time base and surrogate key equivalence • Fast insertion mechanism for capturing MDA table output • 2NF normalization and surrogate key substitution of foreign keys • Stored procedure generation from specification files • Restart of capture or archiving DML after deadlock victim termination • Visualization of archived MDA performance data • Visualization examples (3) • Futures • References • Questions and answers
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Limitations of other database performance monitoring approaches • Network Packet Sniffing – Limited to request-response measurements. No information on Sybase internals efficiency • Sampling via Separate Monitoring Process – Separate process introduces latency effects. Can be highly invasive for short sampling periods (typically, anything less than 10 seconds). IPC (inter process communication) overheads can be significant. Typically, no synchronized time base for collected samples • System Performance Report – Typically, output is not in data-mining-friendly (i.e.: tabular) format. Highly invasive for any sampling period less than two minutes • Ad-hoc Measurements – May not be monitoring the correct indicators. Very heavily dependent on the analyst’s skill level
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Limitations of this approach to database performance monitoring • MDA Table Queries “Hang” at High CPU Utilization Levels – This is a current limitation of the CIS proxy table mechanism upon which the MDA tables are based • Specialized Resource Needs – Ideally, both tempdb and the RT data capture database need to be on RAM disk, with the RT data capture database also requiring battery backup of its’ RAM disk. Next best would be configuring a named cache for tempdb and RAM disk hosting for the tempdb transaction log • Lower Limit to Sampling Granularity – Due to various technical considerations (including excessive resource utilization of CPU and disk), the current lower limit for the sampling periodicity is 5 seconds (c.f.: MDA table minimum sampling periodicity of 1 second) • Counter Wraparound – Steady-state “averaging” technique was employed to “estimate” a counter value where that counter value had overflowed and wrapped around. This may or may not always be the correct technique to use
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Typical uses for the real-time (RT) MDA database system • System Instability Root Cause Analysis – The RT MDA database system was first deployed on a warm-standby replication reporting subsystem which had been experiencing intermittent stability issues. The two root causes of system instability were rapidly identified using this analysis system • Capacity Planning and System Performance Monitoring – Actual resource utilization of a business intelligence and data mining vendor application was quantified by the RT MDA database system, prior to its’ introduction on a 7x24 OLTP production system. The new application was subsequently monitored by the RT MDA database system to certify that it exhibited appropriate resource utilization in production • System Performance Characterization – System performance characteristics of 8 7x24 OLTP production systems are being collected by the RT MDA database system, prior to these 8 systems being consolidated on a 4 node cluster hardware system. The data collected by the RT MDA database system will be used to appropriately size and configure HA resource groups for that cluster
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Developmental Timeline • April, 2003 – Release of ASE 12.5.0.3 ESD #1 (MDA Tables release). Commenced private R&D into MDA Tables functionality • July, 2003 – Commenced private development of RT MDA database system V1.0. This version of RT was single channel with a capture database and time base / surrogate key generator • September, 2003 – Deployment of RT MDA V1.0 to a production site (first commercial use). Commenced private development of RT MDA V1.1: Multi-channel for both capture and archiving databases, bit-map barrier synchronization for both capture and archiving, monitoring profiles, and capture/archiving stored procedure generation from specification templates • January, 2004 – Deployment of RT MDA V1.1 to a production site • March, 2004 – Commenced joint development of RT MDA V1.2 (visualization and capacity planning/system performance characterization) • April, 2004 – Deployment of RT MDA V1.2 (visualization) to a production site • June, 2004 – Deployment of RT MDA V1.2 (capacity planning/system performance characterization) to a production site
MDA Tables J Time Base Trigger K Capture Database I H E Archive Database A D Archive / Purge Stored Procs, G Capture Sync Archive / Purge Sync F L B Visualization Construction N Capture Stored Procs M tempdb Visualization C Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Architecture of the RT MDA database system
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Architecture of the RT MDA database system (continued) Transitions Description: • Transition A – Parallel capture of MDA table data synchronized on a standardized time base / sample number identification pair • Transitions B, C, D – Capture stored procedures perform parallel inserts into the capture database. Tempdb is used to perform fast enumeration of capture data, such that for any capture data set, each row contains a monotonically increasing unique identifier. Each capture stored procedure schedules itself in a sleep-retry loop, if there is no data to process • Transition E – Parallel archiving and purge of captured MDA table data synchronized on user configured archiving and purging time intervals • Transitions F, G, H, I – Archive / purge stored procedures perform parallel inserts into the archive database when archiving, and parallel deletes into the capture and archive databases when purging. Tempdb is used to perform fast enumeration of filtered capture data, such that for any archive data set, each row contains a monotonically increasing unique identifier. Each archive / purge stored procedure schedules itself in a sleep-retry loop, if there is no data to process
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Architecture of the RT MDA database system (continued) Transitions Description (continued): • Transitions J, K – Time base trigger fires for each generated time base / sample number identification pair stored in the capture database, thereby ensuring that the time base history tables in the capture and archive databases are kept completely in sync • Transition L – Every day at 1 AM, a batch job is executed that encapsulates the previous day’s archive data into a set of time-stamped, time-series visualization tables and views • Transitions M, N – Data mining is performed on extracted time-series visualization data at the operations analyst’s convenience. Currently, unsophisticated visualization is carried out using Microsoft Excel (using its’ in-built charting module). It is intended in the future to extend the scope of the visualization to encompass other OLAP/BI tools (such as Impromptu)
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Barrier Synchronized Time Base and Surrogate Key Equivalence • Barrier synchronization can handle up to 64 parallel channels • Most capture barrier synchronization is at one hierarchical level only, except for process monitoring which is at two hierarchical levels (this was done to enforce strict correlation between MDA process monitoring tables, albeit at the expense of introducing a one second time skew between sampling the process lookup table, and sampling the other process monitoring tables) • The time base generation is also barrier synchronized, and is currently set for 15 second sampling intervals • At the instant the current time base is synthesized, a monotonically increasing, unique surrogate key (identifier) is assigned to that time base. This surrogate key is the sample number for this particular parallel sampling activity • Also at the instant the current time base is synthesized, it, along with the sample number, initial time base indicator and sampling interval, are written to the capture time base history table (NB: This table forms the time-series frame of reference for the parallel capture activities of the RT MDA database system)
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Fast Insertion Mechanism for Capturing MDA Table Output • The quantity of data output from MDA tables in a short period of time can be quite significant. For example, a trial deployment of RT MDA V1.0 filled up 3 GB of disk in only one hour. There was only moderate activity on this target system, sampling intervals were 15 seconds, and only 4 MDA tables were being monitored • Consequently, the following techniques were employed to ensure that capture database table insertion was as fast as possible: • Most RT MDA tables are defined to use allpage locking, with SQL command retry used to recover from non-clustered index deadlocks • All DML to the capture or archive databases is done through highly-optimized (TransactSQL) stored procedures • Surrogate key updating of the control table is done once, at the end of the set insert composite transaction • Minimally logged insert commands (select into) are used to insert data into the tempdb staging tables • Set insert commands are used to insert the staging table rows into the respective capture database tables • Tempdb has been configured on each database server that is being monitored, as having both a bound named data cache and a transaction log hosted on RAM disk
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • 2NF Normalization and Surrogate Key Substitution of Foreign Keys • As discussed earlier, the volume of information that needs to be stored when building a database repository from sampled MDA table data, can be significant • Hence, for V1.1 of the RT MDA database system, an archive database was introduced, along with a parallel time base synchronization mechanism • This archive database contains 2NF versions of the data tables of the capture database, along with associated domain tables • The domain tables contain the original column data as captured from the MDA tables, along with associated surrogate keys • Part of the parallel archiving process involves substitution of the original MDA column data with the associated surrogate key. These surrogate keys now act as foreign keys for the 2NF data tables of the archive database • The space savings accomplished by the above strategy are substantial. Currently, RT MDA V1.1 can handle 7x24 capture and archiving with up to 30 parallel channels on a 15 second cyclical sampling duration and a rotating 8 day archiving window with minimal invasiveness to the host OLTP system and with modest disk needs (3GB capture db, 6GB archive db)
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Stored Procedure Generation from Specification Files • Given the performance and non-invasive requirements for the RT MDA database system, it is mandatory that the capture, archive, normalization and purge operations be performed via stored procedure executions • However, the tedious and error-prone nature with manually encoding these stored procedures necessitated a more accurate and efficient process for generating these database objects • Thus initially, the manual encoding process was performed for a representative MDA table, and for capture, archive, normalization and purge activities • Then, the finished set of four stored procedures was parameterized and encoded as three perl template programs (NB: archive and normalization activities were merged together) • Three configuration files were then constructed, with each non-blank line of the files containing the parameter information for a particular stored procedure
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Stored Procedure Generation from Specification Files (continued) • The above mentioned perl programs were next run with the respective configuration files, thereby producing the deployment source code for the RT MDA database system stored procedures • Installation and unit testing of the generated stored procedures was then carried out on a development system • Finally, the tested stored procedures were deployed on target production database servers and carefully monitored for possible malfunction
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Restart of Capture or Archiving DML After Deadlock Victim Termination • Given that most tables in the RT MDA capture database are of allpage locking (for performance reasons), the inclusion of deadlock retry logic is mandatory due to the distinct possibility of non-clustered index deadlocks occurring during normal operation • Initial implementation involved using a “C” or sybperl program per channel to re-submit the RT MDA stored procedure for execution automatically upon detection of the deadlock. However, this approach was abandoned due to logistic reasons (i.e.: installing system software on multiple production systems requires significant commitment from outside groups for any large organization) • Current implementation involves a perl program per channel performing re-submittal of a shell script for execution automatically upon detection of the deadlock. The shell script in turn calls up the RT MDA stored procedure via isql • Future plans involve adapting Jean-Paul Martin’s asemon-logger Java program to provide the same functionality that was initially implemented to handle the non-clustered deadlock issue
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Visualization of Archived MDA Performance Data • Current methodology is simply applying Microsoft Excel’s charting module to RT MDA archive data extracted using custom visualization views • Major limitation of this approach is that there is no easy way to prevent the charting module from auto scaling the chart axes (especially the Y-axis), thus distorting the relevancy of the visualization • Advantages of this approach include easy auto-filtering of column data, easy chart type composition, and fast charting performance (when the chart data is taken from a flat file rather than dynamically through a database connection) • A recent discovery has been the existence of Jean-Paul Martin’s asemon Java GUI program for display of MDA table data in a tabular format. Consequently, the current focus for visualization has been temporarily shifted toadaptation of this program to access RT MDA capture and archive databases (rather than further development of the Microsoft Excel approach) • Three visualization examples have been included in this presentation to show the effectiveness of the Microsoft Excel approach to date
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Visualization Example #1 (disk activity)
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Visualization Example #2 (engine activity)
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Visualization Example #3 (buffer pool activity)
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Visualization Example #3 (buffer pool activity, continued)
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Futures (in no particular order) • Adaptation of the RT MDA database system to utilize other data mining/BI GUI tools (such as Impromptu) • Adaptation of Jean-Paul Martin’s asemon Java GUI program to visualize data contained in both the RT MDA capture and archive databases • Adaptation of Jean-Paul Martin’s asemon-logger Java program to utilize the barrier synchronization and capture/archive stored procedures of the RT MDA database system • Implementation of missing value simulation and time axis normalization in the RT MDA archive database • Design and implementation of event notification triggers in the RT MDA capture database. It is intended to integrate these notification triggers into the resource group failover mechanism of a well-known hardware vendor’s cluster computing system • Integration of Replication Server into the RT MDA database, so that the RT MDA archive, normalization and purge operations can be handled by DSI invocation of custom function strings • Integration of Replication Server 12.5+ performance counters into the RT MDA database system
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • References • What’s new in ASE 12.5.0.3 – Sybase Incorporated (the original reference for MDA Tables functionality) • http://www.sypron.nl (Rob Verschoor’s Web site – more information of MDA Tables setup and usage) • http://www.sybase.com/developer/codexchange (source code for Jean-Paul Martin’s asemon and David Owen’s sybmon Java GUI monitoring programs are available at this Web site) • Tips, Tricks & Recipes for Sybase ASE – Rob Verschoor (where some components of the fast insertion mechanism for MDA table data capture were initially discussed) • Predictive Data Mining, A Practical Guide – Sholom M. Weiss, Nitin Indurkhya (background information on data mining techniques) • Data Mining and Statistical Analysis Using SQL – Robert P. Trueblood, John N. Lovett, Jr. (data mining statistical analysis queries using Transact-SQL)
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables • Questions and Answers • We have 10 minutes for Q&A. Please feel free to ask. Thank you • I will defer the difficult questions for when the 10 minute period has expired (your schedule permitting) • Any questions that I cannot answer I will try to do so later via email, time permitting. Please be so kind as to leave me your email address in this case • My email address is wongjtf@aol.com. If you still have questions regarding this presentation, kindly send me a note and I will respond as soon as I can. Thanks
Building a Real-Time Database Performance Repository Using the Sybase MDA Tables Jeffrey Wong Consultant, Anton Ventures wongjtf@aol.com August 17, 2004