ASE 15 migration: 2 case studies with prepared statements. Rev. 7.2012. Andrew Melkonyan, Senior Database Architect, Ness Pro Division, NESS Professional Services, Israel
ASE 15 upgrade fiasco • In the past two years I witnessed two large production OLTP servers fail to migrate from ASE 12.5.x to ASE 15.x. • One upgrade (Customer#1) was performed on an HP Integrity BL890c i2 host with 8 Intel Itanium 9340 processors (1.6 GHz), 32 logical processors (4 per socket), and 256 GB RAM. • The other upgrade (Customer#2) was performed on an M5000 server with 8 SPARC VII processors (2.4 GHz), 32 logical processors (4 per socket), and 128 GB RAM. • A disproportionately high degree of engine utilization, coupled with a significant drop in throughput, turned each migration attempt into a failure.
Customer#1: BASE-line Customer #1: an OLTP server (12.5.3) accessed by a mixture of clients – mostly PowerBuilder CTLIB software. ASE 12.5.3 runs about 300 transactions per second at an average of 25% engine utilization. The system runs about 600 procedure requests and 1,700 statements per second (Total Rows Affected: ~6.5K, Total Index Scans: ~72K, Total Lock Requests: ~300K, ~1.4 MB received/sent per second).
Customer#1: migration Early during the migration ASE started to exhibit higher than expected engine utilization while throughput sank. ASE 15.5 ran about 100 transactions per second at an average of 30% engine utilization; the system ran ~600 procedure and ~500 statement requests per second. There were no apparent problems with the ASE configuration and no apparent reason for the slowdown. TF753 did not help. The migration was aborted.
Customer#1: migration Sybase TS's initial direction: the procedure cache was configured too small (12 GB for a 24-engine ASE) and the statement cache too large (2 GB). Together the two were thought to result in high spinlock contention on ASE resources (the SSQLCACHE and RPROCMGR spinlocks).
Customer#2: BASE-line Customer #2: an OLTP server (12.5.4) accessed by a mixture of clients – JDBC, BDE and native CTLIB software. ASE 12.5.4 runs about 900 transactions per second at an average of 40% engine utilization. The system runs approximately 1,400 procedure and 800 statement requests per second (Total Rows Affected: ~15K, Total Index Scans: ~65K, Total Lock Requests: ~250K, ~1 MB received/sent per second).
Customer#2: migration(s) Early during the migration ASE started to exhibit extremely high engine utilization while throughput sank. ASE 15.x ran about 200 transactions per second at an average of 95% engine utilization; the system ran ~650 procedure requests per second. Again, there were no apparent problems with the ASE configuration and no apparent reason for the slowdown. TF753 did not help. The migration was aborted – twice (with a massive code review in between).
Customer#2: migration Sybase TS's initial direction: procedure cache fragmentation (4 GB for a 16-engine ASE). In the aftermath of the work done for Customer #1 it became clear that here too the problem was around RPROCMGR (visible neither in a regular sp_sysmon invocation nor through the MDA tables).
Migration aftermath: Stress tests • Following the failed migrations, the customers' teams wrote simulators in order to reproduce the failures. Neither customer succeeded in reproducing a comparable throughput drop. • Only under extreme stress did the same degree of spinlock contention or drop in throughput begin to surface. • Narrowing the checks down, it was found that what causes the high degree of contention is running multiple prepared statements simultaneously.
Migration: next step In order to deconstruct the migration failures I had to: • Learn the difference in prepared statement impact on ASE 12.5.x versus ASE 15.x. • See what is so peculiar about these customers' ASE environments from the prepared statement API angle. • Learn the right way to handle prepared statement calls from the DBMS side.
1. prepared statements: faq When application code uses the prepared statement API, both the client host and the DBMS host prepare internal memory structures designed for subsequent reuse. For ASE this structure is a lightweight procedure – an LWP. Depending on connection settings ASE may generate: • Fully prepared statements. • Partially prepared statements. The footprint of each on ASE is very different. When these structures are created without the purpose of being reused, they may do real damage to ASE. The distribution of prepared statement types may be inspected either with dbcc cis trace flags or by inspecting monSysSQLText (DYNPs are identified as "DYNAMIC_SQL"); see the sketch below.
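A minimal JDBC sketch of the monSysSQLText inspection, assuming the MDA tables are enabled, the login holds mon_role, and the jConnect driver is on the classpath; host, port and credentials are placeholders, not the customers' actual settings:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sample monSysSQLText and count how many captured batches are dynamic SQL
// (i.e. fully prepared statements turned into LWPs on ASE).
public class InspectDynamicSql {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:sybase:Tds:asehost:5000", "sa", "password"); // placeholders
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                 "select SPID, SQLText from master..monSysSQLText")) {
            int dynamic = 0, total = 0;
            while (rs.next()) {
                total++;
                // Dynamic prepares (DYNPs) are labeled DYNAMIC_SQL in the text.
                if (rs.getString("SQLText").contains("DYNAMIC_SQL")) {
                    dynamic++;
                }
            }
            System.out.printf("%d of %d sampled batches are dynamic SQL%n",
                              dynamic, total);
        }
    }
}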
1a. Fully prepared statements Consider the table below: we use three JDBC clients running fully prepared statements in a loop. Instead of reusing them in client code, we create them and drop them right after a single use. All versions up to 15.7 produce significant spinlock contention and throughput drops. Only the "streamlined SQL" option introduced in 15.7 fixes the issue (the last column: note the change in procedure removals and statement reuse). For ASE 12.5.x this is not an issue at all. The client loop is sketched below.
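A hedged sketch of the load pattern these clients generate, assuming a jConnect connection with DYNAMIC_PREPARE=true; host, credentials and the table "orders" are illustrative:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Properties;

// Anti-pattern: a fully prepared statement created, executed once, and
// dropped on every iteration. With DYNAMIC_PREPARE=true each
// prepareStatement() creates an LWP on ASE that is thrown away immediately.
public class FullyPreparedChurn {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("user", "appuser");          // placeholder
        props.put("password", "secret");       // placeholder
        props.put("DYNAMIC_PREPARE", "true");  // send real dynamic SQL prepares
        try (Connection con = DriverManager.getConnection(
                "jdbc:sybase:Tds:asehost:5000", props)) {
            for (int i = 0; i < 100000; i++) {
                // create -> execute once -> free: the footprint measured above
                try (PreparedStatement ps = con.prepareStatement(
                        "select qty from orders where id = ?")) {
                    ps.setInt(1, i);
                    ps.executeQuery().close();
                }
            }
        }
    }
}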
1a. Fully prepared statements Given the change in the fully prepared statement footprint on ASE, the following recommendations apply to applications generating fully prepared statements at a high rate without aiming at reuse: • If the application layer must use prepared statement semantics and there is a high volume of fully prepared statements generated and dropped immediately rather than reused, run ASE 15.7 ESD#1 or later and turn the "streamlined SQL" option on. • ASE 15.5 cannot handle a high rate of fully prepared statements well. ASE 15.5 can only be considered an option if there are very few non-reused fully prepared statement calls from the client, or if an intervention at the driver/connection level is possible (e.g. DYNAMIC_PREPARE = false for a JDBC/ODBC client), as sketched below.
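Where the driver-level intervention is possible, the change is a single connection property. A minimal sketch for jConnect, using the documented DYNAMIC_PREPARE setting (host and credentials are placeholders):

import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

// Disable fully prepared statements at the connection level. With
// DYNAMIC_PREPARE=false, jConnect sends PreparedStatements as language
// batches instead of creating an LWP per prepare on ASE.
public class SafeConnection {
    static Connection open() throws Exception {
        Properties props = new Properties();
        props.put("user", "appuser");           // placeholder
        props.put("password", "secret");        // placeholder
        props.put("DYNAMIC_PREPARE", "false");
        return DriverManager.getConnection(
                "jdbc:sybase:Tds:asehost:5000", props);
    }
}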
1b. PARTIALLY prepared statements With the statement cache sized properly, ASE 15.x handles the load generated by partially prepared statements much better than ASE 12.5.x does. The problem arises only when client code generates a high volume of unique SQL statements forced by the application layer into the prepared statement API – again generated and dropped rather than reused. In this case the statement cache becomes inefficient: statements come in and out of the statement cache at a high rate, causing statement cache turnover and excessive LWP recompilation. Throughput goes down and engine utilization goes up. In fact, in this case fully and partially prepared statements become almost identical from the DBMS point of view.
1b. PARTIALLY prepared statements Consider the table below: we use three JDBC clients running partially prepared statements in a loop – again with no reuse: create -> execute once -> free. One client generates unique SQL statements; statement cache turnover and LWP cleanup go up. ASE 15.5 can handle this situation only if the "bad" unique code is executed with the statement cache turned off. ASE 15.7 handles the situation gracefully – with or without the streamlined SQL option. For ASE 12.5.x this is not an issue at all. The unique-SQL client is sketched below.
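A hedged sketch of the unique-SQL pattern: each iteration bakes a changing literal into the statement text instead of binding a ? parameter, so every prepare hashes to a new statement cache entry (the table "orders" is illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;

// Anti-pattern: each iteration submits a textually unique statement through
// the prepared statement API. Every statement hashes to a new statement
// cache entry, driving cache turnover and LWP recompilation.
public class UniqueSqlChurn {
    static void run(Connection con) throws Exception {
        for (int i = 0; i < 100000; i++) {
            String sql = "select qty from orders where id = " + i; // unique text
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.executeQuery().close(); // create -> execute once -> free
            }
        }
    }
}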
1b. Partially prepared statements Given the change in the partially prepared statement footprint on ASE, the following recommendations apply to applications generating partially prepared statements at a high rate without aiming at reuse: • If the application layer must use prepared statement logic and there is a high volume of unique prepared statements generated and dropped immediately rather than reused (either due to bad coding or to third-party application components – e.g. the data-window), the recommendation is to run ASE 15.7 ESD#1 or later. • ASE 15.5 may be considered a migration option only if the poorly reused unique code submitted through the prepared statement API is isolated at the connection level and the statement cache is turned off (e.g., set statement_cache off in a login trigger), as sketched below.
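One way to isolate such connections, sketched below under the assumption that the connection pool exposes a checkout hook; in production this would more likely live in an ASE login trigger, but the session-level command is the same:

import java.sql.Connection;
import java.sql.Statement;

// Isolate connections that carry the poorly reused unique SQL by switching
// the statement cache off for the session. Call this when the pool hands
// out a connection destined for the "bad" code path.
public class IsolateBadConnections {
    static void disableStatementCache(Connection con) throws Exception {
        try (Statement st = con.createStatement()) {
            st.execute("set statement_cache off"); // session-level ASE setting
        }
    }
}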
Customer#1: base-line revisited Let's inspect the prepared statement situation in 12.5.3 for Customer#1. We can see that there is a medium rate of procedure requests, yet almost no procedure removals. The client application here makes few or no fully prepared statement calls (confirmed through monSysSQLText inspection). In addition, we see ~2,000 statement requests per second.
Customer#1: migration revisited Let's inspect Customer#1's migration data: the [2 GB] statement cache is not large enough. We see the same high rate of procedure removals and statements cached. Note: the statement cache holds only 440 statements yet occupies 2 GB!
Customer#1: 2g statement cache We experimented with the statement cache size during the simulation sessions and found that the greater the cache, the lower the engine utilization – all the way up to a 2 GB statement cache. So the 2 GB statement cache by itself did not constitute a problem here (compare the throughput below vs. the migration data).
Customer#1: cached statements The simulation also gave us a chance to inspect the statement cache distribution. Cache buckets were found to contain from 1 to ~2,000 hashed statements in a very uneven distribution. Below is a sample bucket's content (with 1,961 SSQLs in it): what we see is the same SQL, slightly modified each time by the PB client's data-window component.
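A sketch of how such an inspection can be scripted, assuming an ASE 15.x server with the MDA tables enabled; monCachedStatement and the show_cached_text() builtin are the standard 15.x interfaces, but verify the column names against your version:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;

// Dump the cached statements and their reuse counts, then fetch the SQL
// text of each entry to spot near-duplicate statements thrashing the cache.
public class InspectStatementCache {
    static void dump(Connection con) throws Exception {
        // First pass: collect cached statement ids and their reuse counts.
        Map<Integer, Integer> useCounts = new LinkedHashMap<>();
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                 "select SSQLID, UseCount from master..monCachedStatement")) {
            while (rs.next()) {
                useCounts.put(rs.getInt("SSQLID"), rs.getInt("UseCount"));
            }
        }
        // Second pass: fetch the cached SQL text for each entry.
        for (Map.Entry<Integer, Integer> e : useCounts.entrySet()) {
            try (Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery(
                     "select show_cached_text(" + e.getKey() + ")")) {
                if (rs.next()) {
                    System.out.println(e.getKey() + " (used " + e.getValue()
                        + "x): " + rs.getString(1));
                }
            }
        }
    }
}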
Cust#1: migration fiasco reason Now we have all the information we need to put things together and understand what went wrong during Customer#1's migration to ASE 15.5. We have learned that both fully and partially prepared statements used in code without the intention of being reused have a very heavy footprint on ASE 15.x. To make things worse, Customer#1's client API (the data-window component) generated a huge number of large, unique partially prepared statements that thrashed the statement cache and resulted in a high rate of statement cache turnover and LWP cleanup. This was the root cause of the disproportionately high engine utilization and the drop in throughput.
Customer#1: migration paths Given Customer#1's client code, the only migration path is to use 15.7 with the latest ESD. This ASE version can handle this type of "bad" code and protect against fully or partially prepared statement misuse. If migration to 15.7 is not possible, then in order to migrate to 15.5 it is absolutely necessary to: • Identify and clean up the "bad" code (avoid using prepared statements where they are not needed). • Insulate connections that may generate this type of code with the "set statement_cache off" connection setting.
Customer#2: base-line revisited Let's inspect the prepared statement situation in 12.5.4 for Customer#2. We can see that there is a high rate of procedure requests and, in addition, a high rate of procedure removals. The client application here uses the fully prepared statement API extensively (confirmed through monSysSQLText inspection). We may also note ~800 statement requests per second.
Customer#2: migration revisited This is Customer#2's migration data: here too we can see a high rate of procedure removals. This is the footprint of fully prepared statements. The statement cache was sized at 20 MB and did not seem to be an issue. It was discovered that during the migration the JDBC connection pool was configured with DYNAMIC_PREPARE = TRUE, generating a huge number of fully prepared statements.
Fully prepared statements & JDBC Consider the impact of the JDBC connection setting on ASE. On the left is simple Java code running fully prepared statements (create -> run once -> free) with DYNAMIC_PREPARE set to TRUE: ASE engine load is ~16%. On the right is the same code running with DYNAMIC_PREPARE set to FALSE: ASE engine load is ~11%. There is a huge overhead on ASE caused by a JDBC connection set to DYNAMIC_PREPARE = true.
Cust#2: migration fiasco reason Now we have most of the information we need to put things together and understand what went wrong during Customer#2's migrations to ASE 15.0 and ASE 15.5. We have learned that both fully and partially prepared statements used in code without the intention of being reused have a very heavy footprint on ASE 15.x. To make things worse, this customer's client API generated a large number of fully prepared statements due to the JDBC connection pool's DYNAMIC_PREPARE = true configuration (create -> execute once -> drop). This was the root cause of the disproportionately high engine utilization and the drop in throughput here.
Customer#2: migration paths For Customer#2 too, the best option is to migrate to 15.7. As we have seen, this version can both handle unique "bad" code sent as partially prepared statements and protect against fully prepared statement misuse. In fact, for misused fully prepared statements ASE 15.7 gives better throughput than 12.5.x. If migration to 15.7 is not possible, then in order to migrate to 15.5 it is absolutely necessary to: • Turn the JDBC DYNAMIC_PREPARE setting to false. • Inspect the code for unique SQLs submitted as prepared statements. • Insulate connections that may generate this type of code with "set statement_cache off".
Prepared statements and ASE It is NOT recommended to use prepared statement semantics where they are not necessary – especially not to create prepared statements that are not designed for later reuse. The overhead they impose on ASE is very high, especially in ASE 15.x. Prepared statements are designed for reuse and must be treated accordingly. ASE 15.7 with the "streamlined SQL" option turned on makes things work; still, SQL code submitted to ASE as "non-prepared" will yield much higher throughput than the same code submitted as a prepared statement (fully or partially). Although this seems obvious, it is very often neglected. Conclusion: if prepared statements are a must (e.g. a JDBC protocol using them for security reasons, or data-window components encapsulating customer business logic without the customer's awareness) – run ASE 15.7 and turn "streamlined SQL" on. Anything else will bring your throughput down and your ASE engine utilization way up.
Q&A: assistance requests Most of the tests exhibited here were performed in a controlled environment and were verified to produce repeatable results. This study was performed using custom tools available at http://andrewmeph.wordpress.com. I am available for questions related to the tests described here, either through the blog or directly at andrew.me.ph@gmail.com. Don't forget to test things before going to production. It may hurt.