970 likes | 1.26k Views
Problem Determination Techniques for IBM-Informix Dynamic Server. John F. Miller III. Las Vegas, NV October 27 – October 31, 2003. Member of Technical Support for 14 Years Resolution Team Engineer Author of many Informix support tools Written many of the new features
E N D
Problem Determination Techniques for IBM-Informix Dynamic Server John F. Miller III Las Vegas, NV October 27 – October 31, 2003
Member of Technical Support for 14 Years Resolution Team Engineer Author of many Informix support tools Written many of the new features Heavily involved with OnLine Beta programs 4.0, 4.1, 5.0, MT5.0, 6.0, 7.0, 7.3 Worked on many Elite VAR Benchmarks John F. Miller III
Assertion Failure Files onstat New options Changes Admin Gotcha’s Optimizer & Update Statistics Understanding I/O Network Setup Optimization & Tuning Btree Cleaner Long Transaction & Dynamic Log Files OutLine
Three types CRASH FAILURE WARNING Two parts to an assertion failure Top generated by the server Bottom generated by SYSALARMPROGRAM Assertion Failures
Top of an AF File 12:23:45 Informix Dynamic Server Version 9.40.U Software Serial Number RDS#N000000 12:23:45 Assert Failed: No Exception Handler 12:23:45 Who: Session(13, informix@superman.portland.ibm.com, 13541, 182474584) Thread(37, sqlexec, add5838, 1) File: mtex.c Line: 420 12:23:45 Results: Exception Caught. Type: MT_EX_OS, Context: mem 12:23:45 Action: Please notify Informix Technical Support. 12:23:45 Stack for thread: 37 sqlexec base: 0x0b463000 len: 36864 pc: 0x008693bc tos: 0x0b46ae50 state: running vp: 1
Bottom Section of an AF File -------------------------------- Begin System Alarm Program Output --------------------------------- Assertion Failure Type: CRASH Host Name: superman.portland.ibm.com Database Server Name: jcpatriot Time of failure: Fri Sep 12 12:23:46 PDT 2003 AF file: /tmp/af.40d1d40 Shared memory file: None System Blocking: OFF ===========------------- - - - - - -
Bottom Section of an AF File /vobs/tristarp/sqldist/bin/onstat -g ses 13: Informix Dynamic Server Version 9.40.U -- On-Line -- Up 00:15:42 -- 28672 Kbytes session #RSAM total used dynamic id user tty pid hostname threads memory memory explain 13 informix 2 13541 superman 1 40960 35936 off tid name rstcb flags curstk status • sqlexec add5838 ---P--- 4440 running sqscb info scb sqscb optofc pdqpriority sqlstats optcompind directives b3b5bd0 b477018 0 0 0 2 1 Sess SQL Current Iso Lock SQL ISAM F.E. Id Stmt type Database Lvl Mode ERR ERR Vers Explain 13 INSERT jc NL Not Wait 0 0 9.03 Off Current SQL statement : insert into jctab values (12)
Allow the system to continue even when a thread has an assertion failure (crash, failure or warning) If the thread who crashes is not hold a critical resource then only this thread is impacted, the rest of the system continues Thread Suspension
What the current setting means check $INFORMIXDIR/etc/evidence.sh ONCONFIG parameters which can be set Thread Suspension /vobs/tristarp/sqldist/bin/onstat -g ras: Informix Dynamic Server Version 9.40.U -- On-Line -- Up 00:15:46 -- 28672 Kbytes Diagnostics Current Settings AFCRASH 0x00000421 AFFAIL 0x00000401 AFWARN 0x00000401 AFDEBUG 0 AFLINES 0 Adding 0x20 into AFCRASH attempt’s to hang the current thread allowing the system to continue
New options recently added Useful changes John’s favorite options Monitoring & tuning the different caches and buffer pool using onstat onstat
onstat –g env User/System evironment onstat –g stm Statement memory usage onstat –g rwm Read/Write Mutex onstat –C Btree Scanner onstat –g cac Display various caches New onstat Options
onstat –g env [sid] Server Environment Session Environment cmd> onstat -g env Server start-up environment: Variable Value [values-list] DBDELIMITER | DBPATH . DBTEMP /tmp cmd> onstat -g env 147 Environment for session 147: Variable Value [values-list] CLIENT_LOCALE en_US.8859-1 DBDELIMITER | DBPATH //jmiller_94 DBTEMP /usr/tmp
Shows all sql statements currently active Shows the amount of memory each SQL statement currently using in bytes onstat –g stm command> onstat -g stm session 16 -------------------------------------- sdblock heapsz statement ('*' = Open cursor) bca2018 11936 *select * from customer, orders session 14 --------------------------------------- sdblock heapsz statement ('*' = Open cursor) bc4d018 2048 Database 'insdel' bc4d140 8496 select count(*) from informix.systables; bc4d268 6264 *select * from t2
onstat –P • partnum • Decimal number link to systables:partnum • Total • Number of pages currently buffered for this table/fragment • Data • Number of data page currently buffered for this table/fragment Buffer Pool System Statistics • Data • Percentage of the buffer pool containing data pages • Btree • Percentage of the buffer pool containing index pages • Other • Control pages, such as, reserve pages,bitmap partnum total btree data other dirty 1048707 10 6 2 2 0 1048710 4 1 1 2 0 1048714 3 1 1 1 0 1049081 257 0 255 2 0 1049472 5462 0 5459 3 4635 Totals: 9000 52 5747 3201 4636 Percentages: Data 72.86 Btree 25.58 Other 01.57 Ensure the ratio of System Data/Btree pages looks reasonable for your workload
Influence Table Buffer Ratios • Memory resident tables • Tells the buffer manager to lock pages from a table, index or single fragment into memory • Pages for this table will not be removed from the buffer pool
onstat changes • onstat –m • Displays the message file when the server is offline • onstat –c • Displays the configuration file when the server is offline
Useful onstat Options • onstat –P Buffers by Table • onstat –g tpf Actions by Thread • onstat –g ppf Actions by Table
Buffer Cache Stores commonly accessed data/index pages from disk Data Dictionary Cache used to store commonly accessed system catalog information Data Distributions Cache to store update statistics medium/high data used by the optimizer to build query plans Stored Procedure Cache Caches stored procedure code Extensible Caches Monitoring Different Caches
System Read Cache Increasing buffers improves read cache rate 90% or better is a good read cache System Write Cache Increasing buffers may not help the write cache LRU parameters and CKPTINTVL greatly affect the write cache Buffer Cache Sizing (onstat –p) Profile dskreads pagreads bufreads %cached dskwrits pagwrits bufwrits %cached 78201 141506 1538858 94.92 14480 23668 298884 95.16
Shows the tables in the data dictionary Total number of current entries Currently 2 Total number of available cache locations Currently 31 * 10 = 310 Data Dictionary Cache onstat -g dic Dictionary Cache: Number of lists: 31, Maximum list size: 10 list# size refcnt dirty? heapptr table name -------------------------------------------------------- 11 1 1 no a2ecf88 d1@olympia:jmiller.t2 25 1 0 no a2ec330 d1@olympia:informix.systables Total number of dictionary entries: 2
Cache used to store update statistics medium or high information Data Distributions Cache onstat –g dsc Distribution Cache: Number of lists : 31 PC_POOLSIZE : 50 Number of entries : 2 Number of entries in use : 0 Distribution Cache Entries: list# id ref_cnt dropped? heap_ptr distribution name ----------------------------------------------------------------- 15 0 0 0 abe2c20 stores@oly:customer.lname 7 0 0 0 abe3820 stores@oly:customer.company Total number of distribution entries: 2.
Stored Procedure Cacheonstat –g prc Stored Procedure Cache: Number of lists : 31 PC_POOLSIZE : 50 Number of entries : 3 Number of inuse entries : 0 Stored Procedure Cache Entries: list# id ref_cnt dropped? heap_ptr procedure name -------------------------------------------------------------- 4 1 0 0 abe3c20 stores7@olympia:informix.systdist 22 2 0 0 a2f0c58 sysmaster@olympia:informix.bitval 14 3 0 0 a2f0458 sysmaster@olympia:informix.l2date Total number of procedure entries: 3.
Extensible Caches • User Defined Routines • Routine Cache • Type Cache • Cast Cache • OpClass • Access Method Cache • Secondary Access Method Cache • User Defined Aggregate Cache
Actions by Table onstat -g ppf Partition profiles partnum lkrqs lkwts dlks touts isrd iswrt isrwt isdel bfrd bfwrt seqsc 9437341 11 0 0 0 0 4 0 0 26 14 0 9437394 205 0 0 0 63 11 0 8 318 129 6 10485999 15 0 0 0 0 6 0 0 29 18 0 11534359 5 0 0 0 0 1 0 0 18 8 0 6291561 0 0 0 0 1854 0 0 0 3605 0 103 • Table • Locks • SQL like Actions • Buffers
Actions by Table onstat -g ppf Partition profiles partnum lkrqs lkwts dlks touts isrd iswrt isrwt isdel bfrd bfwrt seqsc 9437341 11 0 0 0 0 4 0 0 26 14 0 9437394 205 0 0 0 63 11 0 8 318 129 6 10485999 15 0 0 0 0 6 0 0 29 18 0 11534359 5 0 0 0 0 1 0 0 18 8 0 6291561 0 0 0 0 1854 0 0 0 3605 0 103 • partnum • Decimal number linked to systables:partnum • lkrqs LocK ReQuestS • Number of times a lock has been acquired on the table • lkwts LocK WaiTS • Number of times a user had to wait when acquiring a lock on this table • Dlks DeadLocKS
Actions by Table onstat -g ppf Partition profiles partnum lkrqs lkwts dlks touts isrd iswrt isrwt isdel bfrd bfwrt seqsc 9437341 11 0 0 0 0 4 0 0 26 14 0 9437394 205 0 0 0 63 11 0 8 318 129 6 10485999 15 0 0 0 0 6 0 0 29 18 0 11534359 5 0 0 0 0 1 0 0 18 8 0 6291561 0 0 0 0 1854 0 0 0 3605 0 103 • isrd ISam Reads • Number of times a rows has been read from this table • iswrt ISam WriTes • Number of calls to insert a row into this table • isrwt ISam ReWriTes • Number of calls to update a row in this table • Isdel ISam DELetes • Number of calls to delete a row from this table
Actions by Table onstat -g ppf Partition profiles partnum lkrqs lkwts dlks touts isrd iswrt isrwt isdel bfrd bfwrt seqsc 9437341 11 0 0 0 0 4 0 0 26 14 0 9437394 205 0 0 0 63 11 0 8 318 129 6 10485999 15 0 0 0 0 6 0 0 29 18 0 11534359 5 0 0 0 0 1 0 0 18 8 0 6291561 0 0 0 0 1854 0 0 0 3605 0 103 • bfrd BuFfer ReaDs • Number of individual page reads from this table • bfwrt BuFfer WRites • Number of times a page in this table has been modified • seqsc SeQuential SCans • Number of sequential table scans Watch the ratio of bfrd/seqsc as an indicator of a poorly indexed table
User Thread Locks Log Records Actions by User/Thread onstat -g tpf tid lkreqs lkw dl to lgrs isrd iswr isrw isdl isct isrb lx bfr bfw lsus lsmx seq 42 35 0 0 0 408 0 0 0 0 11 0 0 207 128 856 120 0 48 35110 0 0 0 32 0 0 0 0 14 0 0 638 2294 0 208 0 3873 4646 0 0 0 0 1469 89 0 0 0 0 0 6683 1080 0 0 2 3876 498 0 0 0 0 93 25 0 2 0 0 0 535 200 0 0 16 3648 1946 0 0 0 68 6266 42 4 1 5 0 0 4238 495 0 4488 10 • SQL Like Actions • Transactions • Buffers
Actions by User/Thread onstat -g tpf tid lkreqs lkw dl to lgrs isrd iswr isrw isdl isct isrb lx bfr bfw lsus lsmx seq 42 35 0 0 0 408 0 0 0 0 11 0 0 207 128 856 120 0 48 35110 0 0 0 32 0 0 0 0 14 0 0 638 2294 0 208 0 3873 4646 0 0 0 0 1469 89 0 0 0 0 0 6683 1080 0 0 2 3876 498 0 0 0 0 93 25 0 2 0 0 0 535 200 0 0 16 3648 1946 0 0 0 68 6266 42 4 1 5 0 0 4238 495 0 4488 10 • tid Thread ID • Each user can have one or more threads, the TID has a many-to-one relationship to the user's Session ID
Actions by User/Thread onstat -g tpf tid lkreqs lkw dl to lgrs isrd iswr isrw isdl isct isrb lx bfr bfw lsus lsmx seq 42 35 0 0 0 408 0 0 0 0 11 0 0 207 128 856 120 0 48 35110 0 0 0 32 0 0 0 0 14 0 0 638 2294 0 208 0 3873 4646 0 0 0 0 1469 89 0 0 0 0 0 6683 1080 0 0 2 3876 498 0 0 0 0 93 25 0 2 0 0 0 535 200 0 0 16 3648 1946 0 0 0 68 6266 42 4 1 5 0 0 4238 495 0 4488 10 • lgrs LoG Records • The number of log records created by this thread • lsus Log Space Used • The number of bytes of logical log space used by this thread • lsmx Log Space Maximum • The space used by the thread's largest transaction • A way of finding users who are on the verge of creating a long transaction
Actions by User/Thread onstat -g tpf tid lkreqs lkw dl to lgrs isrd iswr isrw isdl isct isrb lx bfr bfw lsus lsmx seq 42 35 0 0 0 408 0 0 0 0 11 0 0 207 128 856 120 0 48 35110 0 0 0 32 0 0 0 0 14 0 0 638 2294 0 208 0 3873 4646 0 0 0 0 1469 89 0 0 0 0 0 6683 1080 0 0 2 3876 498 0 0 0 0 93 25 0 2 0 0 0 535 200 0 0 16 3648 1946 0 0 0 68 6266 42 4 1 5 0 0 4238 495 0 4488 10 • isct ISam CommiT • The number of commits • isrb ISam RollBack • The number of times this thread has rolled back a transaction • lx Long Transactions • The number of long transactions perpetrated by this thread
Actions by User/Thread onstat -g tpf tid lkreqs lkw dl to lgrs isrd iswr isrw isdl isct isrb lx bfr bfw lsus lsmx seq 42 35 0 0 0 408 0 0 0 0 11 0 0 207 128 856 120 0 48 35110 0 0 0 32 0 0 0 0 14 0 0 638 2294 0 208 0 3873 4646 0 0 0 0 1469 89 0 0 0 0 0 6683 1080 0 0 2 3876 498 0 0 0 0 93 25 0 2 0 0 0 535 200 0 0 16 3648 1946 0 0 0 68 6266 42 4 1 5 0 0 4238 495 0 4488 10 • isrd ISam Reads • Number of times a row has been read from this table • Iswr ISam WRites • Number of calls to insert a row into this table • Isrw ISam ReWrites • Number of calls to update a row in this table • Isdl ISam DeLetes • Number of calls to delete a row from this table
Actions by User/Thread onstat -g tpf tid lkreqs lkw dl to lgrs isrd iswr isrw isdl isct isrb lx bfr bfw lsus lsmx seq 42 35 0 0 0 408 0 0 0 0 11 0 0 207 128 856 120 0 48 35110 0 0 0 32 0 0 0 0 14 0 0 638 2294 0 208 0 3873 4646 0 0 0 0 1469 89 0 0 0 0 0 6683 1080 0 0 2 3876 498 0 0 0 0 93 25 0 2 0 0 0 535 200 0 0 16 3648 1946 0 0 0 68 6266 42 4 1 5 0 0 4238 495 0 4488 10 • lkreqs LocK REQuestS • Number of times a lock has been acquired by this thread • lkwts LocK WaiTS • Number of times a user had to wait when acquiring a lock • dlks DeadLocKS • Number of Deadlock errors encountered by this user • to lock TimeOuts • Number of times a user waited on a lock, but failed to acquire the lock in the time allotted
Actions by User/Thread onstat -g tpf tid lkreqs lkw dl to lgrs isrd iswr isrw isdl isct isrb lx bfr bfw lsus lsmx seq 42 35 0 0 0 408 0 0 0 0 11 0 0 207 128 856 120 0 48 35110 0 0 0 32 0 0 0 0 14 0 0 638 2294 0 208 0 3873 4646 0 0 0 0 1469 89 0 0 0 0 0 6683 1080 0 0 2 3876 498 0 0 0 0 93 25 0 2 0 0 0 535 200 0 0 16 3648 1946 0 0 0 68 6266 42 4 1 5 0 0 4238 495 0 4488 10 • Monitor the ratio of all SQL like actions over buffer actions. This gives an indication of how much work is done for each row processed • For OLTP users this ratio will be low. For DW users this ratio should be higher. • A high ratio can mean indexing or optimizer issues (isrd + iswr + isrw + isdl) / (bfr + bfw)
One Location to set all Informix environment variables DBA sets defaults for the entire OnLine System $INFORMIXDIR/etc/informix.rc User’s defaults for all his/her sessions $HOME/.informix Check format of files with checkenv $INFORMIXDIR/bin/checkenv Using informix.rc to Establish Administrator Defaults
Minimum of 2 shared memory segments, but typically 3-8 required for normal operations Generally set SHMMAX to the amount of memory OnLine is expected to use or more Kernel ResourcesShared Memory Performance Tip Try to minimize the number of shared memory segments
Kernel ResourcesFile Descriptors • Number of files per process • Key considerations • Networked systems • KAIO • Max open files across entire host • Key considerations • AIO VPS • Error messages are not always reported
Restartable vs. Suspended Restore • Restartable Restore • When the database engine prematurely shuts down during a restore, it may be restarted in recovery mode • By default it is OFF. TURN IT ON. • Suspended Restore • When the archive client receives an error which is restartable and the database engine does not shut down
Optimizer & Update Statistics • Identifying suspect queries • Checking optimizer decisions • Starting point for running update statistics • Running update statistics faster
Find all tables doing sequential scans having a size greater than 100 pages Tables not Indexed Correctly select dbsname, tabname , pf_seqscans , npdata, npused from sysptntab, systabnames, sysptnhdr where pf_seqscans > 0 and sysptnhdr.npdata > 100 and sysptnhdr.npused > 100 and sysptnhdr.partnum = systabnames.partnum and systabnames.partnum = sysptntab.partnum order by pf_seqscans DESC
These consume a large quantity of resources so validate they are correct Finding Expensive SQL Statements select dbsname, tabname , pf_seqscans , npdata, npused from sysptntab, systabnames, sysptnhdr where pf_seqscans > 0 and sysptnhdr.npdata > 16 and sysptnhdr.npused > 16 and sysptnhdr.partnum = systabnames.partnum and systabnames.partnum = sysptntab.partnum order by pf_seqscans DESC
Data 20280 rows “C1” serial column What to examine Number of rows returned Access Method Cost Examining the Running Query QUERY: ------ select * from t1 where c1 > 20200 Estimated Cost: 20888 Estimated # of Rows Returned: 6760 1) miller3.t1: SEQUENTIAL SCAN Filters: miller3.t1.c1 > 20200
Update Statistics Medium VS High The better the information, the more accurate the estimate.
Examining the Running QueryNo Statistics VS Medium Statistics No Statistics QUERY: ------ select * from t1 where c1 > 20200 Estimated Cost: 20888 Estimated # of Rows Returned: 6760 1) miller3.t1: SEQUENTIAL SCAN Filters: miller3.t1.c1 > 20200 Medium Statistics QUERY: ------ select * from t1 where c1 > 20200 Estimated Cost: 21 Estimated # of Rows Returned: 19 1) miller3.t1: INDEX PATH (1) Index Keys: c1 (Serial, fragments: ALL) Lower Index Filter: t1.c1 > 20250 Overall performance improved The estimates were more accurate The query plan changed Estimated Cost Huge drop in Estimated Cost Access Method Changed from Sequential Scan to Index Path Total Row Returned Actual 30 estimated 6760 for no statistics 19 for medium
Examining the Running QueryMedium Statistics VS High Statistics Medium Statistics QUERY: ------ select * from t1 where c1 > 20200 Estimated Cost: 21 Estimated # of Rows Returned: 19 1) miller3.t1: INDEX PATH (1) Index Keys: c1 Lower Index Filter: t1.c1 > 20250 High Statistics QUERY: ------ select * from t1 where c1 > 20200 Estimated Cost: 33 Estimated # of Rows Returned: 30 1) miller3.t1: INDEX PATH (1) Index Keys: c1 Lower Index Filter: t1.c1 > 20250 Overall performance did not change The estimates were slightly more accurate The query plan did not change Total Row Returned Actual 30, estimated 19 for medium statistics, 30 for high statistics The estimated from medium is fairly close Access Method No change in Access Method Estimated Cost No significant difference
Update statistics HIGH Lead keys of index Keys of an index which are duplicated and the first unique key All columns that use “=“ as filters All join columns Update statistics MEDIUM All other columns Update statistics LOW All indexed columns for which you did not run update statistics HIGH General Rules for Running Update Statistics
Turn on PDQ when running update statistics, but only for tables Avoid PDQ when updating statistics for procedures When running high or medium increase the memory update statistics has to work with Enable parallel sorting (i.e. PSORT_NPROCS) Considerations Change the RESOLUTION to 1.5 Increasing the number of bins for the distributions Increasing the sample size for update statistics medium Tuning with the New Statistics
Table: jmiller.t9 Mode: HIGH Number of Bins: 267 Bin size 1082 Sort data 101.4 MB Sort memory granted 4.0 MB Estimated number of table scans 10 PASS #1 c9 PASS #2 c5 PASS #3 c7 PASS #4 c6 ….. PASS #10 c4 Completed pass 1 in 0 minutes 24 seconds Completed pass 2 in 0 minutes 20 seconds Completed pass 3 in 0 minutes 17 seconds Completed pass 4 in 0 minutes 17 seconds Completed pass 5 in 0 minutes 17 seconds Completed pass 6 in 0 minutes 15 seconds Completed pass 7 in 0 minutes 14 seconds Completed pass 8 in 0 minutes 15 seconds Completed pass 9 in 0 minutes 16 seconds Completed pass 10 in 0 minutes 14 seconds Example of the current update statistics Total Time 146 seconds
Table: jmiller.t9 Mode: HIGH Number of Bins: 267 Bin size 1082 Sort data 101.4 MB PDQ memory granted 106.5 MB Estimated number of table scans 1 PASS #1 c1,c2,c3,c4,c5,c6,c7,c8,c9,c10 Index scans disabled Light scans enabled Completed pass 1 in 0 minutes 29 seconds Enabling PDQ with Update Statistics PDQ Memory Features Enabled Total Time 29 seconds