490 likes | 696 Views
Session: A11. DB2 War Stories and Scary Tales (Part 1). Robert Goodman Certified DB2 Professional Certified Business Continuity Planner. May 9, 2007 3:00 p.m. – 4:00 p.m. Platform: DB2 for z/OS. DB2 War Stories & Scary Tales. Major Areas Database Foundation Stones
E N D
Session: A11 DB2 War Stories and Scary Tales (Part 1) Robert Goodman Certified DB2 Professional Certified Business Continuity Planner May 9, 2007 3:00 p.m. – 4:00 p.m. Platform: DB2 for z/OS
DB2 War Stories & Scary Tales Major Areas • Database Foundation Stones • SQL Exposures • Questionable Strategies • DDL Exposures • Operational Exposures
DB2 on z/OS – A World Class Act! The Fastest Most Dependable Most Secure Highest Performance Most Scalable Database on the Planet! DB2 for z/OS We're Bullish on DB2!
Data Recoverability Data Security Performance & Scalability Data Concurrency Data Integrity Database Foundation Stones New Features & Combinations Can Undermine These Foundations! Understanding the Risks Is Critical!
DB2 War Stories And Scary Tales By DATABASE BOB Think Tank Publications Where Shall We Begin? Chapters • Fear of Commitment • Trigger Happy • Rotate Roulette • Extents Wide Open
Fear of Commitment Chapter 1
Cycling DB2 Start DB2 Apps Up Apps Up DB2 Stop DB2 Crash When DB2 Does Lots of Recovery Start: IRLM DM DDF Recover: Unresolved URs Doesn't deferred restart fix all issues? Deferred Restart - Gives Consistent Startup Times
Once Upon A Time In the kingdom of DB2, the time came for a systems downtime. Many tasks had been busy during the week. It came time to cycle DB2. It was cancelled, but was naughty and wouldn’t come down. After two hours, the operator forcefully killed DB2. Maintenance was applied and the system was brought up. DB2 lingered and wouldn’t wake up. After three hours, the operator killed DB2 for a second time. We tried once again to wake up DB2. This time we asked the wizards at IBM how to slay the problem. They declared that we should just let it run. Twelve hours later DB2 came up. Why did it take so long? And can this happen even today?
What Happened and Why? Force DB2 DB2 Up Sparse Updating Long URs X X X X X X X DAY1 DAY2 DAY3 DB2 Restart: Recover Incomplete URs Log1 Log2 Log3 Log1 Log2 A1 A2 A Long Running UR Had to Roll Back Using Archive Tapes & Active Logs - DB2 Log Records - 1) Do Records 2) Undo Records
Without Deferred Restart Applications Access DB Log Log A1 Forward Recovery Backward Recovery Start DB2 Restart Complete App App App App App App App App All Recoveries Must Complete Deferred Restart Applications Access DB (Quicker) Log Log A1 Forward Recovery Backward Recovery Start DB2 Restart Complete X X App App App App App App App RECP DB2 Restart Deferred Restart Gets the Subsystem Up Quicker
Turning On Deferred Restart • LIMIT BACKOUT Parameter • AUTO – Automatically recovers once DB2 is up • YES – RECOVER POSTPONED command resumes recovery • NO – Process all inflight and inabort URs • BACKOUT DURATION Parameter • log records during backward recovery before deferring Are These Parameters Set Correctly?
Does deferred restart always work? In rare cases it fails Won’t deferred restart fix all my problems? Normally Deferred pagesets still need recovery What is status of pagesets after restart? Most pagesets are available Deferred pagesets are unavailable What is the exposure? Applications which use deferred pagesets will fail DB2 Restart Questions • How can we detect long running URs? • DB2 log message - DSNR035I UNCOMMITTED UR AFTER ### CHECKPOINTS • Can I automatically cancel long running URs? • Netview can be used to do this
Trigger Happy Chapter 2
The Trigger Concept SQL SQL With Trigger Program1 Program1 SQL1 SQL1 • Trigger Facts: • Programs are unaware • Synchronous • Part of UR • Firing cost of a FETCH • Add in trigger SQL • Plus cost of trigger work • Adds to SQL elapsed Master SQL SQL Database Invokes Trigger Create Trigger After Trigger Firing A Trigger - About the Cost of A FETCH
Once Upon A Time The king declared that dashboards would help rule the kingdom. This was a daunting task for many programs had to be changed. Triggers came to the rescue, they were quick and easy. Since 1 trigger was good, many were even better. They multiplied like rabbits and soon the whole kingdom was full of triggers. The word came down from on high that things were slow. Many programs were dragging but no changes had been made. They noticed that when triggers were added, darkness descended upon the kingdom. What had gone wrong? And how could it be fixed?
Multiple Triggers Multiple Triggers • Multiple Triggers • All are synchronous • All in UR • Multiple trigger’s SQL • Fire one after another • Fired in timestamp order • Serially add to elapsed time Program SQL SQL Master Database Invokes Trigger Before Trigger After Trigger After Trigger SQL Elapsed Time Multiple Triggers Run in Sequence
Triggers & Stored Procedure Program SQL SQL DB2MSTR Database Invokes Trigger After Trigger • Triggers & Stored Procedures • Synchronous • SP program load time • SP execution time • Can make calls outside DB2 • Greatly extends total times DDF Stored Procedure Invoke SQL Elapsed Time Stored Procedures Add Significant Overhead to Triggers!
Stored Procedure w/Transition Tables • Using Transition Tables • Synchronous • Transition tables • Create table time • Use table time • Delete time • SP program load time • SP execution time • Calls outside DB2 time • Adds significantly to times Program SQL SQL MSTR Database Invokes Trigger After Trigger Stored Procedure DDF Invoke DM SQL Elapsed Time Transition Tables Add Significant Overhead
How Expensive Are Triggers? Statement Triggers – Cheapest Row Triggers – Cheap SP Triggers – Expensive SP Triggers w/Trans Vars - Priceless Fire Trigger – Cost of a FETCH +Trigger SQL – Cost of SQL + WHEN – Invoked every time trigger event happens + Transition Variables – Cost of transition table + Invoke Stored Procedure – DDF, Start thread +Resident Stored Procedure – work in SP +Non-Resident Stored Procedure –start SP + work in SP The Costs Add Up! We Used All of the Expensive Options
Generally Poor Reasons to Use Triggers • Just because they’re quick • Lazy man’s solution • Easier than changing programs • Temp fixes that become permanent • For data replication • To populate summary tables • To enforce simple value constraints • To enforce RI constraints • 9) To maintain dashboards (oops!) Misusing Triggers Can Impact Performance!
Modifying Triggers The Reality One Would Think Update TRIGGER Refresh TRIGGER DROP TRIGGER CREATE TRIGGER DROP & CREATE TRIGGER How Triggers Are Maintained Triggers - Easy to Create, Can Be Tough to Drop!
Trigger – Firing Sequence Firing Order Firing Order Trigger 1 2007-01-01 Trigger 1 2007-01-01 1 1 DROP & CREATE On 2007-02-15 Is Sequence Important? Trigger 2 2007-01-15 Trigger 3 2007-01-30 2 2 Trigger 3 2007-01-30 Trigger 2 2007-02-15 3 3 Order of Creation is Firing Order
Who Is Aware of Triggers? Source Awareness • Yes - DB2 • Should Be - DBAs • Maybe – Programmers • NO – DB2 Utilities Triggering Awareness • NO – Applications • NO – SQL • NO – DB2 Optimizer • NO – Explain • NO – Resource Limit Facility • NO – Constraints Triggers - A Run-Time Event Triggers Work Outside Programs & Triggering SQL
Invisible Program Dependencies Programs A B C D . . . • Invisible Causes of Breakage • Trigger • Trigger SQL • DDF • Transition tables • Stored procedure • Application tables • Calls outside DB2 • Unavailable resources • RI & check constraints • Utilities • Deadlocks & timeouts Any Break Causes All Triggering SQL to Fail! SQL MSTR SQL Trigger DDF Stored Procedure Calls Outside DB2 DBM1 AT TT AT AT AT Invisible to Programs Dependencies Make Administration Challenging!
Trigger Traps When DDF reloaded, the resident SP-X the transition variables no longer matched the trigger and SP-X. The trigger had to be dropped & recreated to correct this. SP-X had to be changed to include new column in TableA The Scenario 1) Update trigger on TableA Starts resident stored procedure(SP-X) Inserts before image into log – TableB 2) DBA adds column to TableA 3) Days Later - SQL updating TableA starts failing Why? DBA Corrective Actions 1) Drop the trigger (may require down-time) 2) Drop the stored procedure 3) Add column to parameter lists & SQL 4) Recreate the stored procedure 5) Recreate the trigger Alters Triggers DBA's Must Know Interdependencies to Avoid Trigger Traps & Outages!
RotateRoulette Chapter 3
The ROTATE Concept ROTATE DDL Command 1) Delete Oldest Partition Rows 2) Reuse Oldest Partition 1 Oldest Limitkey A 1 Newest Limitkey E 2 Old Limitkey B 3 New Limitkey C 4 Newer Limitkey D New Last Part Limitkey Delete Old Data Does This Match Your Needs?
Rotating Partitions ALTER TABLE table ROTATE PARTITION FIRST TO LAST ENDING AT (limitkeys) RESET; Simple, Beautiful & Non-Disruptive(?)
ROTATE In Action ROTATE DDL catg.DSNDB.db.ts.I0001.A001 LP 1 LP 4 First Last LP 2 LP 1 catg.DSNDB.db.ts.I0001.A002 First LP 3 LP 2 catg.DSNDB.db.ts.I0001.A003 LP 3 LP 4 catg.DSNDB.db.ts.I0001.A004 Last Logical Partition SYSTABLEPART(V8) Physical Partition Dataset Logical Partition After The ROTATE After Rotate - Logical & Physical Parts Don't Match!
P4 P1 P3 P2 A Series of ROTATEs ALTER TABLE ROTATE ... ENDING AT (‘E’) RESET; ALTER TABLE ROTATE ... ENDING AT (‘F’) RESET; ALTER TABLE ROTATE ... ENDING AT (‘G’) RESET; ALTER TABLE ROTATE ... ENDING AT (‘H’) RESET; 1st Rotate 2nd Rotate 3rd Rotate 4th Rotate Limitkey catg.DSNDB.db.ts.I0001.A001 P1 P1 P1 P1 ‘A’ ‘E’ L1 L4 L3 L2 L1 catg.DSNDB.db.ts.I0001.A002 P2 P2 P2 P2 ‘F’ ‘B’ L2 L1 L2 L4 L3 catg.DSNDB.db.ts.I0001.A003 P3 P3 P3 P3 ‘C’ ‘G’ L3 L2 L1 L4 L3 catg.DSNDB.db.ts.I0001.A004 P4 P4 P4 P4 ‘H’ ‘D’ L4 L2 L1 L3 L4 SYSTABLEPART Maps Logical Partitions
Once Upon A Time Version 8 was up and running well. The call came to convert to table based partitioning and reuse the oldest partition. The ROTATE command was chosen to do this non-disruptive deed. Suddenly the phone began to ring and thick darkness covered the database cubicles. A quick check revealed that two parts were mired in REORP status. User processing ground to a halt their workloads were in peril. The database guardians countered with concurrent REORG to fix the problem. This crashed and burned. Share level NONE REORG was called upon. When it finished, the sun came out and life was good again. What happened to disrupt the peace of this database kingdom?
Why Did ROTATE Set REORP? Last Part Limitkey (‘2007’) The 1st time only, ROTATE converts Indexed Based partitioning to Table Based partitioning. Because limitkey is not enforced in Index Based, the 1st and last parts have to be put in REORP status to eliminate this potential issue. Table_IBP Maximum Limitkey in last partition is not enforced by “index based partitioning” SQLCODE = 0 Index Based Partitioning INSERT INTO tableX (D_YEAR) VALUES (‘2008’) Last Part Limitkey (‘2007’) Table_TBP SQLCODE = -327 Table Based Partitioning Don't Convert to Table Based Partitioning Using ROTATE!
Converting to Table-Controlled Partitioning ALTER INDEX clustering_indexNOT CLUSTER; (conversion to table-controlled partitioning) COMMIT WORK; ALTER INDEX clustering_index CLUSTER; (clustering index reestablished) It’s Simple to Do This Before Rotate Limitkey of Last Partition Will Be Converted to Extreme Value!
Lowering Limitkeys in Last Part Conversion to Table Based Partitioning Causes the last part limitkey to be MAXVALUE ALTER TABLE table ALTER PARTITION # ENDING AT (limitkeys); (This & next partition put in REORP, data outage!) REORG TABLESPACE tablespace SCOPE PENDING SHRLEVEL NONE STATISTICS COPYDDN (Data keys beyond limitkey, discarded during REORG!) Data Outage You Will Have An Outage to Alter Limitkey
Which Partition Number is Used? -DIS DB(db) SPACENAM(ts) NAMETYPEPARTSTATUS SRG9700 TS 0002 RW -THRU 0004 SRG9700 TS 0001 RW You Must Know Which Part To Use!
Advisable Data Outage Rotating Logical Partitions ?PP SELECT PARTITION, LOGICAL_PART FROM SYSIBM.SYSTABLEPART WHERE DBNAME = ‘db' AND TSNAME = ‘ts' AND LOGICAL_PART = 1 1st LP Logical Part = ?PPPhysical Part 1 2 UNLOAD TABLESPACE tablespace PART ?PP LOAD TABLESPACE tablespace PART ?PP REPLACE using dummy SYSREC -START DB(db) SPACE (ts) PART(?PP) ACCESS FORCE COPY TABLESPACE tablespace DSNUM ?PPSHRLEVEL CHANGE (rotated partition now recoverable) 3 4 ALTER TABLE table ROTATE PARTITION FIRST TO LAST ENDING AT (limitkeys) RESET; (1stLP data deleted, becomes last LP) 5 Advisable 6
Avoiding Data Outages New Tables – Use “table based partitioning” Last Partition – Don’t set max/min limitkey (may cause -327 SQLCODEs) Converting from “Indexed Based Partitioning” Don’t convert to table based partitioning with ROTATE Use …ALTER INDEX index NOT CLUSTER Then …ALTER INDEX index CLUSTER Plan for Outage on 1st ROTATE Query for values beyond limitkey before reorg ALTER ASC/DESC limitkey from max/min value Downtime REORG to remove REORP status Know Logical Partitions Prior to Rotate Query SYSIBM.SYSTABLEPART Outage Mitigating Rotate Issues
Rotate Dangers • Knowing which physical part is 1st logical part • Long running DELETEs to empty 1st logical part (42 secs to ROTATE / delete 1,000,000 row partition) • ROTATE can cause an outage (REORP status) (convert to table based partitioning or ALTER limitkeys) • Which part # to use for Commands / DDL / Utilities • Mistakenly rotating the wrong table (DDL reuse or finger fault) • Adding partitions to ROTATEd tables (Confusion factor on first & last parts) • With ascending keys, trying to insert null key inserts • Recoverability after a ROTATE • Attempting to REBALANCE ROTATEd tables
Which partition will rotate next? 1st Logical partition - query SYSIBM.SYSTABLEPART Does rotate interrupt availability? Yes – if indexed based partitioning (convert to table based, last 2 partitions in REORP) Yes – If last part limitkey is altered No – table based partitioning & limitkey doesn’t need to be altered Can rotate be blocked? Set MAXVALUE in limitkey of last partition Show we use rotate to Convert to Table Based Partitioning? Not advisable (last 2 partitions in REORP) Does rotate behave differently with Index Based Partitioning? Yes - the first time Does rotate rename datasets? No – does SQL deletes instead Does rotate delete SYSCOPY entries? No – puts a rotate row in SYSCOPY If accidentally rotate can I recover? No – rotate row in SYSCOPY prevents recovery prior to ROTATE The Rotate Questions
Extents Wide Open Chapter 4
Request 1 Track Request 1 Track Request 1 Track The Extent Concept VOL001 1st Extent 2nd Extent catg.DSNDB.db.ts.I0001.A001 VOL002 Extent Consolidation DB2 requests space from z/OS which finds blocks of space and updates the VTOC and Catalog. VOL003 3rd Extent Getting Allocations Is Relatively Slow
Once Upon A Time Life in the database kingdom was good. Autonomic features had eliminated many servant duties. One day a troubled user called. A table was broken and needed to be recovered. Luckily it had only had five million rows. We proclaimed that it would be back in merely moments. Tapes were mounted, disks were spinning and the clock was ticking. Five minutes turned to ten and then to fifteen. After having many discussions with management, the recovery finally finished. Why did a small table take so long to recover?
Extent Evolution Cylinder Track Block 20th Century DASD z/OS Rules z/OS Rules DASD Reality 21st Century DASD DASD Subsystem Constrained By z/OS Rules Not DASD Reality!
Logical Extent Limits Max Extents / Dataset 255 extents z/OS 1.6 7,257 extents z/OS 1.7 Max Extents / Volume 123 extents / volume Large DASD increase this issue (Mod 27s & 54s) Extent Rule 5 pieces primary extent Whatever can get on secondary Extent Limits We Must Follow z/OS Rules!
IBM Strategies Tolerate More Extents z/OS 1.7 – 7,257 extents Make It Harder to Hit Limits SMS Extent Consolidation Automate Extent Management V8 Sliding Secondaries Solving Systemic Extent Issues Some Problems Still Exist!
305 276 256 118 115 63 60 45 25 16 How Extents Affect Utilities (V8) Table 1.2 M Rows and 1 Index Elapsed Seconds Extents Affect Writing Utilities!
DASD Fragmentation STOGROUPs by size Standardized allocations z/OS Slow Allocations Fewer Extents Automated Method Use sliding secondaries Can cause fragmentation Managed Method STOGROUPs by size Standardized allocations From z/OS We Need Faster Allocation Search Faster Cataloging z/OS Avoiding Extent Issues
Beware Be Knowledgeable Be Careful Do Excellent Work DB2 War Stories And Scary Tales By DATABASE BOB Think Tank Publications In Closing Want to Hear More? Part 2 Is Up To You! Please Fill Out Your Evaluations
Robert Goodman Session: A11 DB2 War Stories and Scary Tales (Part 1) robert.goodman@flhosp.org Thanks for Coming...