530 likes | 730 Views
Database Monitoring With. Tom Bascom President, Greenfield Technologies. Agenda. Why do you need a monitor? Monitoring Alternatives What Are VSTs? A Monitoring Architecture Customizing And Extending The Code Basic Capabilities Advanced Features. Why Do You Need A Monitor?. Baselining
E N D
Database Monitoring With Tom Bascom President, Greenfield Technologies
Agenda • Why do you need a monitor? • Monitoring Alternatives • What Are VSTs? • A Monitoring Architecture • Customizing And Extending The Code • Basic Capabilities • Advanced Features
Why Do You Need A Monitor? • Baselining • Benchmarking • Interactive troubleshooting • Capacity management • Resource Optimization
Agenda • Why do you need a monitor? • Monitoring Alternatives • What Are VSTs? • A Monitoring Architecture • Customizing And Extending The Code • Basic Capabilities • Advanced Features
Monitoring Alternatives • SAR, vmstat, iostat • Glance, TOPAS, Navisphere, Measureware, PerfMon … • TOP, NMON • PROMON • Fathom • ProMonitor • ProTop!
Progress Focused • Interactive, Real-Time • Sample Oriented • Multi-platform • VST Based • 4GL Code • Open Source • Free!
Agenda • Why do you need a monitor? • Monitoring Alternatives • What Are VSTs? • A Monitoring Architecture • Customizing And Extending The Code • Basic Capabilities • Advanced Features
What Are VSTs? • Virtual System Tables • A 4GL View of Progress Data Structures (the same as those shown in PROMON.) • No Performance Impact (unless you do some really dumb things!) • Primarily Read-Only • Not Terribly “User Friendly” • Quirky at times…
Some VST Quirks • Updateable: • _startup._spin • Private buffers • APW settings • Table & Index Ranges: • -tablebase, -tablerangesize • -indexbase, -indexrangesize • Table & Index Window can be reset! • Quirky Keys: • _myconnection… • _tablestat & _indexstat
User Number/Id VST Confusion… find _myconnection no-lock. find _connect no-lock where _connect-usr = _myconn-userid. display _connect-usr _connect-id _myconn-userid. find _userio no-lock where _userio-usr = _connect-usr. display _userio-id _userio-usr. User-Id _Connect-Id MyConn-UserId _UserIO-Id Usr ======= =========== ============= =========== =========== 253 254 253 254 253
Table Stats /** This does NOT work if –tablebase <> 1!!! find _File no-lock where _File._File-num = p_tbl. find _TableStat no-lock where _TableStat-id = p_tbl. display p_tbl _file-num _TableStat-id. **/ /*** instead, use the following: ***/ find _TableStat no-lock where _TableStat-id = p_tbl. find _File no-lock where _File._File-num = _TableStat-id. display p_tbl _file-num _TableStat-id.
Index Name find _IndexStat no-lock where _IndexStat-id = p_idx. find _Index no-lock where _Index._Idx-num = _IndexStat-id. find _File where recid( _File ) = _Index._File-recid. tt_index.idxnote = _File._File-name + “.” + _Index._Index-name + ( if _file._prime-index = recid(_index) then “ P" else “ " ) + ( if _index._unique then "U" else "" )
Agenda • Why do you need a monitor? • Monitoring Alternatives • What Are VSTs? • A Monitoring Architecture • Customizing And Extending The Code • Basic Capabilities • Advanced Features
A Monitoring Architecture • VST Based • Multi-Platform • UNIX Character • HTML • Windows GUI • Using Publish & Subscribe • More than just a VST Browser! • Customizable!
Agenda • Why do you need a monitor? • Monitoring Alternatives • What Are VSTs? • A Monitoring Architecture • Customizing And Extending The Code • Basic Capabilities • Advanced Features
Customizing And Extending The Code • Events That A Module Handles • Structure Of A Module • Defining the Display • Maintaining State • Adding Help • Making A Module Available
Events That A Module Handles • Mon-Restart • Empty Temp-Table • Remove self from memory • Mon-Init • Empty Temp-Table • Define Display Data Elements • Mon-Update • Refresh Data • Calculate intervals, rates and so forth • Update UI Temp-Table with results
Structure Of A Module {lib/protop.i}def var support as character no-undo initial “Resources”.{lib/tt_xstat.i}procedure mon-restart: empty temp-table tt_xstat. delete procedure this-procedure.end.procedure mon-init:empty temp-table tt_xstat. /* define labels */end.procedure mon-update: /* the real work */end.subscribe to “mon-restart” anywhere run-procedure “mon-restart”.subscribe to “mon-init” anywhere run-procedure “mon-init”.subscribe to “mon-update” anywhere run-procedure “mon-update”.publish “register-disp-type” ( input support ).
Defining the Display ui-define-label( support, 1, 1, "xid", " Id" ). ui-define-label( support, 1, 2, "xname", "Resource " ). ui-define-label( support, 1, 5, "stat1", " Locks" ). ui-define-label( support, 1, 6, "stat2", " Waits" ). ui-define-label( support, 1, 8, "stat-ratio", " Lock%" ). ui-define-label( support, /* display type */ 1, /* variant */ 8, /* order */ "stat-ratio", /* data element name */ " Lock%“ /* label value */ ).
Maintaining State define temp-table tt_xstat no-undo field xid as integer field xvalid as logical field xname as character field misc1 as character field misc2 as character field stat1 as integer extent 5 field stat2 as integer extent 5 field stat3 as integer extent 5 field stat-ratio as decimal index xid-idx is unique primary xid.
Sample, Summary, Rate & Raw Data • BaseValue • LastValue • ThisValue • SampleTime • SummaryTime • SampleRate = (ThisValue – LastValue) / SampleTime. • SummaryRate = (ThisValue –BaseValue) /SummaryTime. • SampleRaw = (ThisValue – LastValue) / 1. • SummaryRaw = (ThisValue – BaseValue) / 1.
Updating Data for each dictdb._Resrc no-lock: run update_xstat ( input _Resrc-Id, input _Resrc-name, input "", input "", input _Resrc-lock, input _Resrc-wait, input 0 ). end. ui-det(support, 1, i, 1, "xid", string(tt_xstat.xid, ">>9")). ui-det(support, 1, i, 2, "xname", string(tt_xstat.xname, "x(20)")). ui-det(support, 1, i, 5, "stat1", string((tt_xstat.stat1[x]/z), ">>>>>>>>>9")). ui-det(support, 1, i, 6, "stat2", string((tt_xstat.stat2[x]/z), ">>>>>>>>>9")). ui-det(support, 1, i, 8, "stat-ratio", string(tt_xstat.stat-ratio, ">>9.99%")).
Adding Help • Help files are in the “hlp” directory. • File name is value(“hlp/” + support + “.hlp”) • Title the screen. • Provide an overview of the screen. Try to explain why the metrics are important and how they are related to other metrics. • Define each label and give some insight into its meaning. • Provide explanations of any codes that might appear under a label.
FileIO.hlp IO Operations to Database Extents Id:The extent id number. Extent Name:The file name of the extent. Mode: The "mode" in which the file is opened. Possible values are: BUFIOThe extent is opened for buffered IO. UNBUFIOThe extent is opened for un-buffered IO. BOTHIOThe extent is opened for both buffered and un-buffered IO. Variable extents are opened with BOTHIO (there are two file descriptors unless you're using -directio.) BlkSz: The Block size for the extent. This potentially varies between data, before-image and after-image extents. Values are expressed in bytes.
Making A Module Available • Drop it into the “mon/” directory. • “mon/mymetric.p” • If it is OS specific use the “os/” directory • “os/AIX/df.p” • “os/Linux/netstat.p” • Send me a copy so that I can include it in the base distribution!
Agenda • Why do you need a monitor? • Monitoring Alternatives • What Are VSTs? • A Monitoring Architecture • Customizing And Extending The Code • Basic Capabilities • Advanced Features
Basic Capabilities • Summary Data • Blocked Clients & Open Transactions • Table & Index Activity • User Activity • Estimating Big B • Latches & Resources • Storage Area Capacity • Balancing IO • Clients & Servers
Summary Data 11:32:52 ProTop xvi -- Progress Database Monitor 07/05/05 Sample sports [/db/sports] Rate Hit Ratio: 182:1 195:1 Commits: 149 195 Sessions: 2057 Miss% : 0.549% 0.512% Latch Waits: 13 16 Local: 953 Hit% : 99.45% 99.48% Tot/Mod Bufs: 60002 3167 Remote: 956 Log Reads: 76342 80927 Evict Bufs: 2 1 Batch: 1045 OS Reads: 419 414 Lock Table: 1516 3 Server: 97 Rec Reads: 23789 23619 LkHWM|OldTrx: 1392 00:00 Other: 51 Log/Rec: 3.2091 3.4264 Old/Curr BI: 54 54 TRX: 26 Area Full: 1 98.60% After Image: Disabled Blocked: 0
BI Clusters for each _Trans no-lock where _Trans-usrnum <> ?: if _Trans-counter <> ? and _Trans-counter > 0 then do: if oldbi = 0 or _Trans-counter < oldbi then oldbi = _Trans-counter. currbi = max( currbi, _Trans-counter ). end. end. find _BuffStatus no-lock. currbi = _BfStatus-LastCkpNum. if oldbi = 0 then oldbi = currbi. /* if no TRX is active… */
Blocked Sessions Blocked Sessions Usr Name Waiting Note --- -------- -------- ---------------------------------- 24 tom 00:00:32 REC XQH 102 [Order] julia, peter 22 tucker 00:00:02 REC XQH 201 [Cust] astro, tiger 321 julia 00:00:00 BKSH:83524928:
Locked Records for each _Lock no-lock while _Lock-usr <> ?: if _Lock-recid = _Connect-wait1 then do: find _file where _file._file-num = _Lock-table. bxtbl = _file._file-name. end. if _Lock-usr = _Connect-usr then bxwait = bxwait + “ “ + _Lock-flags. else bxque = bxque + " " + _Lock-name. end. bxnote = bxtbl + bxwait + bxque.
Open Transactions Open Transactions Usr Name TRX Num BI Clstr Start Trx Stat Duration Wait ---- ----- -------- -------- -------- -------- -------- ---------- 9 tom 2432897 1024 15:39:05 ACTIVE 00:00:01 -- 29440 20 jami 2432896 - ALLOCATE 00:00:00 -- 20115 5 emily 2432898 1024 15:39:06 ACTIVE 00:00:00 -- 21952 7 peter 2432899 1024 15:39:06 ACTIVE 00:00:00 -- 19040 23 julia 2418661 - ALLOCATE 00:00:00 -- 0 22 astro 2417938 - ALLOCATE 00:00:00 -- 0
Table Activity Table Statistics Tbl# Table Name Create Read Update Delete ---- ---------------- --------- --------- --------- --------- 4 OrderLine 1 28715 11 1 18 Order 0 2384 1 0 24 POLine 0 848 1 0 23 PurchaseOrder 0 627 40 0 21 Bin 0 216 0 0 2 Customer 18 175 20 20 1 Invoice 1 148 3 0
Index Activity Index Statistics Idx# Index Name Create Read Split Delete BlkDel ---- --------------- -- ------ ------ ------ ------ ------ 904 usage 14 31597 0 13 0 78 journal P 0 21011 0 0 0 435 keyindex 0 7376 0 0 0 388 icest PU 0 1995 0 0 0 1251 keyindex 0 1991 0 0 0 1247 warehs U 0 945 0 0 0 900 stuff PU 1 783 0 1 0
User IO Activity UIO Usr Name Flags PID DB Access OS Rd OS Wr Hit% ---- ------- ----- ------ --------- ----- ----- ------- 13 tom SB 13590 2266 200 1 91.13% 10 jami SB 13584 190 6 1 97.10% 16 julia SB 13596 185 6 1 97.03% 17 peter SB 13598 181 5 1 97.07% 15 emily SB 13594 177 5 1 97.12% 11 tiger SB* 13586 166 4 0 97.58% 14 tucker SB 13592 159 5 1 97.10% 19 granite SB 13602 146 1 0 99.25% 7 astro SB 13578 145 4 1 97.16%
Estimating Big B Big B GuessTimator Pct Big B % db Size Hit:1 Miss% Hit% OS Rd ----- --------- --------- ----- ------ ------- ----- 10% 6000 0.124% 30 3.306% 96.694% 1343 25% 15001 0.311% 48 2.091% 97.909% 849 50% 30001 0.622% 68 1.479% 98.521% 601 100% 60002 1.243% 96 1.046% 98.954% 425 <= 150% 90003 1.865% 117 0.854% 99.146% 347 200% 120004 2.486% 135 0.739% 99.261% 300 400% 240008 4.973% 191 0.523% 99.477% 213
Big B http://www.peg.com/lists/dba/history/200301/msg00509.html MissPct = 100 * ( 1 – ( LogRd – OSRd ) / LogRd )). HitPct = 100 – MissPct. OSRd = LogRd * ( MissPct / 100 ). m2 = m1 * exp(( b1 / b2 ), 0.5 ).
Resource Waits Resource Waits Id Resource Locks Waits Lock% --- -------------------- ---------- ---------- ------- 10 DB Buf S Lock 2661 0 100.00% 6 Record Get 658 0 100.00% 7 DB Buf Read 40 0 100.00% 2 Record Lock 21 0 100.00% 11 DB Buf X Lock 11 0 100.00% 19 TXE Share Lock 11 0 100.00% 8 DB Buf Write 3 0 100.00% 21 TXE Commit Lock 2 0 100.00% 1 Shared Memory 0 0 0.00% 3 Schema Lock 0 0 0.00%
Latch Waits Latch Waits Id Latch Requests Waits Lock% --- -------------------- ---------- ---------- ------- 28 MTL_BF4 5540 33 99.40% 17 MTL_BHT 4205 106 97.49% 21 MTL_LRU 4154 55 98.68% 10 MTL_LHT 1800 24 98.65% 15 MTL_LKF 1798 0 100.00% 26 MTL_BF2 1218 6 99.48% 27 MTL_BF3 1184 10 99.13% 25 MTL_BF1 1150 10 99.16% 4 MTL_OM 913 4 99.60%
Storage Area Capacity Area Statistics A# Area Name Alloc Var Hi Water Free %Used Note -- ------------ ------- ----- -------- ------ ------- ------- 68 order_idx 16 1998 1927 87 12044% i(3) 67 order 256 14670 14860 66 5805% t(1) 6 Schema Area 256 1454 1391 319 543% i(25) * 3 BI Area 32000 13070 45056 14 141% 13 customer 512000 55565 567515 50 111% t(15) 92 After Image 0 5199 5191 8 100% Busy 49 order-line 32000 2 25164 6838 79% t(1) 61 inventory 128000 2 94897 33105 74% t(1) 55 discount 1024000 0 755885 268114 74% t(1) 57 employee 2048000 0 1442919 605076 70% t(1)
Storage Area Capacity for each _AreaStatus no-lock, _Area no-lock where _Area._Area-num = _AreaStatus._AreaStatus-Areanum: bfree = _AreaStatus-Totblocks - _AreaStatus-Hiwater. if ( _AreaStatus-Freenum <> ? ) then bfree = bfree + _AreaStatus-Freenum. if bfree = ? then bfree = _AreaStatus-totblocks. used = (( _AreaStatus-totblocks - bfree) / _AreaStatus-totblocks ) * 100. end.
Storage Area Contents for each _storageobject no-lock where _storageobject._area-number = xid and _storageobject._object-num > 0 and _storageobject._object-associate > 0: if _storageobject._object-type = 1 then so_tbl = so_tbl + 1. else if _storageobject._object-type = 2 then so_idx = so_idx + 1. end. /* ianum = initial area number… */
Balancing IO Database File IO Id Ext Name Mode Blksz Size Read Wrt Ext ---- ---------- ------- ----- ------- ----- --- --- 63 s2k_29.d1 F UNBUF 8192 2048000 11828 0 0 64 s2k_29.d2 F UNBUF 8192 2048000 7790 0 0 124 s2k_55.d2 F UNBUF 8192 2048000 432 0 0 125 s2k_55.d3 F UNBUF 8192 2048000 367 8 0 123 s2k_55.d1 F UNBUF 8192 2048000 220 0 0 67 s2k_30.d1 F UNBUF 8192 2048000 106 0 0 57 s2k_26.d1 F UNBUF 8192 1024000 26 2 0 128 s2k_56.d1 F UNBUF 8192 2048000 19 1 0 135 s2k_57.d6 F UNBUF 8192 2048000 12 0 0 140 s2k_58.d2 F UNBUF 8192 1024000 11 1 0 121 s2k_54.d1 F UNBUF 8192 256000 7 0 0 139 s2k_58.d1 F UNBUF 8192 1024000 6 0 0 134 s2k_57.d5 F UNBUF 8192 2048000 5 0 0 69 s2k_31.d1 F UNBUF 8192 128000 4 0 0 73 s2k_33.d1 F UNBUF 8192 128000 3 0 0 3 s2k.b2 V UNBUF 16384 0 0 0 0
Servers and Clients Servers Srv Type Port Con Max MRecv MSent RRecv RSent QSent Slice --- ----- ----- --- --- ----- ----- ----- ----- ----- ----- 1 Login 7150 0 1 0 0 0 0 0 0 2 Auto 1026 10 55 0 0 0 0 0 0 3 Auto 1027 10 55 23 13 0 6 10 86 Server IO Srv Type Port Con Max DB Access OS Rd OS Wr Hit% --- ----- ---- --- --- --------- ----- ------ ------- 19 Auto 1043 10 55 5041 2 0 99.96% 20 Auto 1044 10 55 1348 1 0 99.96% 18 Auto 1042 10 55 157 1 0 99.51% 16 Auto 1040 10 55 42 1 0 98.70%
Agenda • Why do you need a monitor? • Monitoring Alternatives • What Are VSTs? • A Monitoring Architecture • Customizing And Extending The Code • Basic Capabilities • Advanced Features
Drill Down User Details Usr#: 23 Name: tom PID: 18570 Device: /dev/pts/3 Transaction: Jul 7 15:20:36 2005 ACTIVE 00:00:45 REC 5892 Blocked On: REC XQH 5892 [Customer] peter User 23's Other Sessions Usr Name Flags PID DB Access OS Rd OS Wr Hit% ---- ------- ----- ------ ---------- ------- ------ ------- 23 tom S * 18570 9 2 0 81.61% 0 tom O 18017 0 0 0 0.00% 22 tom S 18542 8534 134 15 98.43% 24 tom S 18576 3964 64 31 98.38%
Alerts & Alarms # $PROTOP/etc/alert.cfg # # Metric Type ? Target Message Action # ========= ==== == ====== =========== ==================== LogRd num > 100000 "&1 &2 &3" alert-log OSRd num > 500 "&1 &2 &3" alert-log BufFlsh num > 0 "&1 &2 &3" alert-log,alert-mail Trx num > 200 "&1 &2 &3" alert-log,alert-mail LatchTMO num > 200 "&1 &2 &3" alert-log,alert-mail ResrcWt num > 200 "&1 &2 &3" alert-log,alert-mail