An expert system in SAS to evaluate z/OS system performance, providing detailed analysis reports. Updated every six months. Available for a 45-day trial. Component delivery dates and product documentation included.
An Expert System designed to evaluate IBM z/OS systems
Product Overview
• Helps analyze performance of z/OS systems.
• Written in SAS (only SAS/BASE is required).
• Runs as a batch job on the mainframe (or on a PC).
• Processes data in a standard performance data base (either MXG, SAS/ITRM, or MICS).
• Produces narrative reports showing results from the analysis.
• Product is updated every six months.
• 45-day trial is available (see license agreement for details).
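The narrative-report approach is simple to picture in SAS/BASE terms: a DATA step reads a member of the performance data base, applies a threshold, and writes finding text to the print file. The sketch below is purely illustrative, not CPExpert source; PDB.RMFINT, STARTIME, IORATE, RESPMS, and the 25 ms threshold are hypothetical names.

```sas
/* Minimal sketch of a narrative rule in SAS/BASE (not CPExpert source).  */
/* PDB.RMFINT, STARTIME, IORATE, RESPMS, and the 25 ms threshold are      */
/* hypothetical names chosen for illustration.                            */
data _null_;
   set pdb.rmfint;                /* one observation per RMF interval     */
   file print;                    /* route the narrative to the print file*/
   if respms > 25 then do;        /* guidance threshold (assumed)         */
      put 'RULE XYZ001: DEVICE RESPONSE TIME EXCEEDED GUIDANCE';
      put '  Interval starting ' startime time8. ': '
          iorate 6.1 ' I/O per second, ' respms 6.1 ' ms average response.';
   end;
run;
```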
Components Delivered
• SRM Component* March 1991
• TSO Component* April 1991
• MVS Component* June 1991
• DASD Component October 1991
• CICS Component May 1992
• WLM Component April 1995
• DB2 Component October 1999
• WMQ Component June 2004
*These legacy components apply only to Compatibility Mode.
Product Documentation
• Each component has an extensive User Manual, available in hard copy, on CD, or web-enabled
• Describes the likely impact of each finding
• Discusses the performance issues associated with each finding
• Suggests ways to improve performance and describes alternative solutions
• Provides specific references to IBM or other documents relating to the findings
• More than 4,000 pages for all components
WLM Component
• Checks for problems in service definition
• Identifies reasons performance goals were missed
• Analyzes general system problems:
  • Coupling facility/XCF
  • Paging subsystem
  • System logger
  • WLM-managed initiators
  • Excessive CPU use by SYSTEM or SYSSTC
  • IFA/zAAP, zIIP, and IOP/SAP processors
  • PR/SM, LPAR, and HiperDispatch problems
  • Intelligent Resource Director (IRD) problems
WLM Component - sample report

RULE WLM103: SERVICE CLASS DID NOT ACHIEVE VELOCITY GOAL

DB2HIGH (Period 1): Service class did not achieve its velocity goal during the measurement intervals shown below. The velocity goal was 50% execution velocity, with an importance level of 2. The '% USING' and '% TOTAL DELAY' percentages are computed as a function of the average address space ACTIVE time. The 'PRIMARY,SECONDARY CAUSES OF DELAY' are computed as a function of the execution delay samples on the local system.

                       ------LOCAL SYSTEM--------
                           %  % TOTAL   EXEC  PERF  PLEX  PRIMARY,SECONDARY
MEASUREMENT INTERVAL   USING    DELAY  VELOC  INDX    PI  CAUSES OF DELAY
21:15-21:30,08SEP1998   16.6     83.4    17%  3.02  2.36  DASD DELAY(99%)

RULE WLM361: NON-PAGING DASD I/O ACTIVITY CAUSED SIGNIFICANT DELAYS

DB2HIGH (Period 1): A significant part of the delay to the service class can be attributed to non-paging DASD I/O delay. The below data shows intervals when non-paging DASD delay caused DB2HIGH to miss its performance goal:

                       AVG DASD   AVG DASD  --AVERAGE DASD I/O TIMES--
MEASUREMENT INTERVAL   I/O RATE  USING/SEC   RESP   WAIT   DISC   CONN
21:15-21:30,08SEP1998        31      1.405  0.010  0.003  0.004  0.002
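The velocity arithmetic behind WLM103 is standard WLM math: execution velocity is 100 * using / (using + delay), and for a velocity goal the performance index is goal velocity divided by actual velocity (50 / 17 is roughly consistent with the 3.02 PI reported above, allowing for sampling). A minimal SAS sketch, with hypothetical data set and variable names:

```sas
/* Sketch of the arithmetic behind WLM103 (hypothetical variable names).  */
/* Execution velocity = 100 * using / (using + delay); for a velocity     */
/* goal, performance index = goal velocity / actual velocity.             */
data misses;
   set pdb.type72go;                        /* data set name assumed       */
   using = sum(cpuusing, iousing);          /* "using" samples (assumed)   */
   delay = sum(cpudelay, iodelay, othdly);  /* "delay" samples (assumed)   */
   if using + delay > 0 then
      velocity = 100 * using / (using + delay);
   if velocity > 0 then perfindx = velgoal / velocity;
   if perfindx > 1 then output;             /* PI above 1.0: goal missed   */
run;
```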
WLM Component - sample report

RULE WLM601: TRANSPORT CLASS MAY NEED TO BE SPLIT

You should consider whether the DEFAULT transport class should be split. A large percentage of the messages were too small, while a significant percentage of messages were too large. Storage is wasted when buffers are used by messages that are too small, while unnecessary overhead is incurred when XCF must expand the buffers to fit a message. The CLASSLEN parameter establishes the size of each message buffer, and the CLASSLEN parameter was specified as 16,316 for this transport class. This finding applies to the following RMF measurement intervals:

                       SENT            SMALL MESSAGES  MESSAGES     TOTAL
MEASUREMENT INTERVAL    TO   MESSAGES     THAT FIT      TOO BIG  MESSAGES
10:00-10:30,26MAR1996  JA0      4,296            0          57     4,353
12:00-12:30,26MAR1996  Z0       2,653            6         762     3,421
12:30-13:00,26MAR1996  Z0       2,017            0         109     2,126

RULE WLM316: PEAK BLOCKED WORK WAS MORE THAN GUIDANCE

The SMF statistics showed that blocked workload waited longer than specified by the BLWLINTHD parameter in IEAOPTxx. A maximum of more than 2 address spaces and enclaves were concurrently blocked during the interval.

                      BLWLINTHD  BLWLTRPCT  --BLOCKED WORKLOAD--
MEASUREMENT INTERVAL   IN IEAOPT  IN IEAOPT   AVERAGE       PEAK
 7:14- 7:29,01OCT2010         20          5     0.002         63
 7:29- 7:44,01OCT2010         20          5     0.000         22
 7:44- 7:59,01OCT2010         20          5     0.001         49
 7:59- 8:14,01OCT2010         20          5     0.001         63
 8:14- 8:29,01OCT2010         20          5     0.002         62
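The WLM601 classification above compares message lengths with the transport class CLASSLEN to count messages that fit versus messages for which XCF had to expand the buffer. The real counts arrive already summarized in RMF XCF data; the per-message input layout below is assumed purely for illustration.

```sas
/* Sketch of the WLM601 classification (hypothetical input layout):       */
/* compare each XCF message length with the transport class CLASSLEN.     */
data _null_;
   set pdb.xcfmsgs end=eof;       /* one observation per message (assumed)*/
   classlen = 16316;              /* CLASSLEN from the sample report      */
   if msglen <= classlen then fit + 1;     /* message fit the buffer      */
   else toobig + 1;               /* XCF had to expand the buffer         */
   if eof then put 'Messages that fit: ' fit comma9.
                   '  Messages too big: ' toobig comma9.;
run;
```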
WLM Component - sample report

RULE WLM893: LOGICAL PROCESSORS IN LPAR HAD SKEWED ACCESS TO CAPACITY

LPAR SYSC: HiperDispatch was specified for one or more LPARs in this CPC, and at least one LPAR used one or more high polarity central processors. LPAR SYSC was not operating in HiperDispatch Management Mode, and experienced skewed access to physical processors because of the high polarity and medium polarity processors used by LPARs running in HiperDispatch Management Mode. The information below shows the number of logical processors that were assigned to LPAR SYSC and each logical processor's share of a physical processor. The CPU activity skew is shown for each RMF interval, showing the minimum, average, and maximum CPU busy for the logical processors assigned to LPAR SYSC.

                       LOGICAL CPUS  % PHYSICAL  CPU ACTIVITY SKEW
MEASUREMENT INTERVAL       ASSIGNED   CPU SHARE   MIN   AVG   MAX
13:59-14:14,15SEP2009             2        45.5  28.2  43.3  58.4

RULE WLM537: ZAAP-ELIGIBLE WORK HAD HIGH GOAL IMPORTANCE

Rule WLM530 or Rule WLM535 was produced for this system, indicating that a relatively large amount of zAAP-eligible work was processed on a central processor. One possible cause of this situation is that the zAAP-eligible work was assigned a relatively high Goal Importance (the Goal Importance was either Importance 1 or Importance 2). Please see the discussion in the WLM Component User Manual for an explanation of this issue.
DB2 Component
• Analyzes standard DB2 interval statistics
• Applies analysis from DB2 Administration Guide and DB2 Performance Guide (with DB2 9.1)
• Analyzes DB2 Versions 3, 4, 5, 6, 7, 8, and 9
• Evaluates overall DB2 constraints, buffer pools, EDM pool, RID list processing, Lock Manager, Log Manager, DDF, and data sharing
• All analysis can be tailored to your site!
DB2 Component
Typical DB2 local buffer constraints
• There might be insufficient buffers for work files
• There were insufficient buffers for work files in merge passes
• Buffer pool was full
• Hiperpool read requests failed (pages stolen by system)
• Hiperpool write requests failed (expanded storage not available)
• Buffer pool page fault rate was high
• Data Management Threshold (DMTH) was reached
• DWQT and VDWQT might be too large
• DWQT, VDWQT, or VPSEQT might be too small
DB2 Component
Typical DB2 I/O prefetch constraints
• Sequential prefetch was disabled, buffer shortage
• Sequential prefetch was disabled, unavailable read engine
• Sequential prefetch not scheduled, prefetch quantity = 0
• Synchronous read I/O and sequential prefetch were high
• Dynamic sequential prefetch was high (before DB2 8.1)
• Synchronous read I/O was high
DB2 Component
Typical DB2 parallel processing constraints
• Parallel groups fell back to sequential mode
• Parallel groups reduced due to buffer shortage
• Prefetch quantity reduced to one-half of normal
• Prefetch quantity reduced to one-quarter of normal
• Prefetch I/O streams were denied, shortage of buffers
• Page requested for a parallel query was unavailable
DB2 Component
Typical DB2 EDM pool constraints
• Failures were caused by full EDM pool
• Low percent of DBDs found in EDM pool
• Low percent of CT Sections found in EDM pool
• Low percent of PT Sections found in EDM pool
• Size of EDM pool could be reduced
• Excessive Class 24 (EDM LRU) latch contention
DB2 Component
Typical DB2 Lock Manager constraints
• Work was suspended because of lock conflict
• Locks were escalated to shared mode
• Locks were escalated to exclusive mode
• Lock escalation was not effective
• Work was suspended for longer than time-out value
• Deadlocks were detected
DB2 Component
Typical DB2 Log Manager constraints
• Archive log read allocations exceeded guidance
• Archive log write allocations exceeded guidance
• Waits were caused by unavailable output log buffer
• Log reads were satisfied from active log data set
• Log reads were satisfied from archive log data set
• Failed look-ahead tape mounts
DB2 Component
Typical DB2 Data Sharing constraints
• Group buffer pool is too small
• Incorrect directory entry/data entry ratio
• Directory reclaims resulting in cross-invalidations
• Castout processing occurring in “spurts”
• Excessive lock contention or false lock contention
• GBPCACHE ALL inappropriately specified
• GBPCACHE CHANGED inappropriately specified
• Conflicts between applications
DB2 Component - sample report

RULE DB2-208: VIRTUAL BUFFER POOL WAS FULL

Buffer Pool 2: A usable buffer could not be located in virtual Buffer Pool 2, because the virtual buffer pool was full. This condition should not normally occur, as there should be ample buffers. You should consider using the -ALTER BUFFERPOOL command to increase the virtual buffer pool size (VPSIZE) for the virtual buffer pool. This situation occurred during the intervals shown below:

                         BUFFERS   NUMBER OF TIMES
MEASUREMENT INTERVAL   ALLOCATED     POOL WAS FULL
10:54-11:24, 15SEP1999       100                12
11:24-11:54, 15SEP1999       100                13

RULE DB2-216: BUFFER POOLS MIGHT BE TOO LARGE

Buffer Pool 1: The page fault rates for read and write I/O indicated that the buffer pools might be too large for the available processor storage. This situation occurred for Buffer Pool 1 during the intervals shown below:

                         BUFFERS  PAGE-IN FOR  PAGE-IN FOR  PAGE
MEASUREMENT INTERVAL   ALLOCATED     READ I/O    WRITE I/O  RATE
11:15-11:45, 16SEP1999    25,000       36,904          195  41.2
11:45-12:15, 16SEP1999    25,000       30,892          563  35.0
12:45-13:15, 16SEP1999    25,000       23,890          170  26.7
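A DB2-216 style test is rate arithmetic: page-in events attributed to buffer pool read and write I/O, taken against the interval length, flag pools that may be overcommitting processor storage. A hedged SAS sketch, with all names assumed:

```sas
/* Sketch of a DB2-216 style test (hypothetical names): flag intervals    */
/* where page-in activity for buffer pool I/O suggests the pool is too    */
/* large for the available processor storage.                             */
data toolarge;
   set pdb.db2bpool;                        /* data set name assumed      */
   pagerate = sum(pageinrd, pageinwr) / intsecs;  /* page-ins per second  */
   if pagerate > 1.0 then output;           /* guidance threshold assumed */
run;
```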
DB2 Component - sample report

RULE DB2-230: SEQUENTIAL PREFETCH WAS DISABLED - BUFFER SHORTAGE

Buffer Pool BP1: Sequential prefetch is disabled when there is a buffer shortage, as controlled by the Sequential Prefetch Threshold (SPTH). Ideally, sequential prefetch should not be disabled, since performance is adversely affected. If sequential prefetch is disabled a large number of times, the buffer pool size might be too small. The sequential prefetch threshold was reached for Buffer Pool BP1 during the intervals shown below.

                         BUFFERS  TIMES SEQUENTIAL PREFETCH
MEASUREMENT INTERVAL   ALLOCATED  DISABLED (BUFFER SHORTAGE)
 5:00- 5:15, 15MAY2009   268,000         125  BP1
 5:15- 5:30, 15MAY2009   268,000       1,533  BP1

RULE DB2-234: WRITE ENGINES WERE NOT AVAILABLE FOR ASYNCHRONOUS I/O

Buffer Pool BP13: DB2 has 600 deferred write engines available for asynchronous I/O operations. When all 600 write engines are used, synchronous writes are performed. The application is suspended during synchronous writes, and performance is adversely affected. This situation occurred for Buffer Pool BP13 during the intervals shown below:

                         BUFFERS   TIMES WRITE ENGINES
MEASUREMENT INTERVAL   ALLOCATED    WERE NOT AVAILABLE
 5:45- 6:00, 15MAY2009    12,800           44  BP13
DB2 Component - sample report

RULE DB2-423: DATABASE ACCESS THREAD WAS QUEUED, ZPARM LIMIT WAS REACHED

Database access threads were queued because the ZPARM maximum for active remote threads was reached. You should consider increasing the maximum number of database access threads allowed. This situation occurred during the intervals shown below:

                       DATABASE ACCESS THREADS QUEUED
MEASUREMENT INTERVAL        ZPARM LIMIT REACHED
11:24-11:54, 01OCT2010                        9

RULE DB2-512: LOG READS WERE SATISFIED FROM ACTIVE LOG DATA SET

The DB2 Log Manager statistics revealed that more than 25% of the log reads were satisfied from the active log data set. It is preferable that the data be in the output buffer, but this is not always possible with an active DB2 environment. However, if a large percent of reads are satisfied from the active log, you should ensure that the output buffer is as large as possible. This finding occurred during the intervals shown below:

                       TOTAL LOG       LOG READS FROM
MEASUREMENT INTERVAL       READS  ACTIVE LOG DATA SET  PERCENT
14:24-14:54, 01OCT2010     6,554                4,678     71.4
14:54-15:24, 01OCT2010     7,274                3,695     50.8
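The DB2-512 test is a straight percentage check; for the first interval above, 100 * 4,678 / 6,554 yields the reported 71.4 percent. A sketch with assumed data set and variable names:

```sas
/* Sketch of the DB2-512 test: more than 25% of log reads satisfied from  */
/* the active log data set.  Data set and variable names are assumed.     */
data logfind;
   set pdb.db2log;
   if logreads > 0 then pctactv = 100 * readsactv / logreads;
   if pctactv > 25 then output;              /* 25% guidance threshold    */
run;
```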
DB2 Component - sample report

RULE DB2-601: COUPLING FACILITY READ REQUESTS COULD NOT COMPLETE

Group Buffer Pool 6: Coupling facility read requests could not be completed because of a lack of coupling facility storage resources. This situation occurred for Group Buffer Pool 6 during the intervals shown below:

                       GROUP BUFFER POOL          TIMES CF READ
MEASUREMENT INTERVAL    ALLOCATED SIZE    REQUESTS NOT COMPLETE
11:01-11:31, 14OCT1999              38M                     130

RULE DB2-610: GBPCACHE(NO) OR GBPCACHE NONE MIGHT BE APPROPRIATE

Group Buffer Pool 4: This buffer pool had a very small amount of read activity relative to write activity. Pages read were less than 1% of the pages written. Since so few pages were read from this group buffer pool, you should consider specifying GBPCACHE(NO) for the group buffer pool or specifying GBPCACHE NONE for the page sets using the group buffer pool. This situation occurred for Group Buffer Pool 4 during the intervals shown below:

                       GROUP BUFFER POOL  PAGES    PAGES     READ
MEASUREMENT INTERVAL    ALLOCATED SIZE     READ  WRITTEN  PERCENT
10:34-11:04, 14OCT1999              38M      14   18,268    0.07%
CICS Component
• Processes CICS Interval Statistics contained in MXG Performance Data Base (standard SMF 110)
• Analyzes all releases of CICS (CICS/ESA, CICS/TS for OS/390, and CICS/TS for z/OS)
• Applies most analysis techniques contained in IBM’s CICS Performance Guides
• Produces specific suggestions for improving CICS performance
CICS Component (Major areas analyzed)
• Virtual and real storage (MXT/AMXT/TCLASS)
• VSAM and File Control (NSR and LSR pools)
• Database management (DL/I, IMS, DB2)
• Journaling (System and User journals)
• Network and VTAM (RAPOOL, RAMAX)
• CICS Facilities (temp storage, transient data)
• ISC/IRC (MRO, LU6.1, LU6.2 modegroups)
• System logger
• Temporary Storage
• Coupling Facility Data Tables (CFDT)
• CICS-DB2 Interface
• Open TCB pools
• TCP/IP and SSL
CICS Component - sample report

RULE CIC101: CICS REACHED MAXIMUM TASKS TOO OFTEN

The CICS statistics revealed that the number of attached tasks was restricted by the MXT operand, but storage did not appear to be constrained. CPExpert suggests that you consider increasing the MXT value in the System Initialization Table (SIT) for this region. This finding applies to the following CICS statistics intervals:

                                                TIMES     PEAK       TIME
STATISTICS                 MXT  -PEAK TASKS-  MAXTASK  MAXTASK    WAITING
COLLECTION TIME  APPLID   VALUE  TOTAL  USER  REACHED    QUEUE    MAXTASK
0:00,01OCT2010   CICSIDG.    20     46    20       36        8  0:02:29.0

RULE CIC140: THE NUMBER OF TRANSACTION ERRORS IS HIGH

The CICS statistics revealed that more than 5 transaction errors were related to terminals. These transaction errors may indicate that there is an attempted security breach, there may be problems with the terminal, or perhaps additional operator training is indicated. This finding applies to the following CICS statistics intervals:

STATISTICS
COLLECTION TIME  APPLID    TERMINAL  NUMBER OF ERRORS
0:00,01OCT2010   CICSPROD  T2M1                   348
0:00,01OCT2010   CICSPROD  T2M2                    60
0:00,01OCT2010   CICSPROD  T2M6                   348
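A CIC101-style test combines two conditions from the interval statistics: MXT was reached, but storage was not constrained. A sketch, with hypothetical variable names standing in for the MXG fields:

```sas
/* Sketch of a CIC101-style test (hypothetical variable names): MXT was   */
/* reached while storage was not constrained (no short-on-storage events).*/
data mxthit;
   set pdb.cicintrv;                  /* CICS interval statistics (assumed)*/
   if timesmxt > 0 and soscount = 0 then output;
run;
```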
CICS Component - sample report

RULE CIC170: MORE THAN ONE STRING SPECIFIED FOR WRITE-ONLY ESDS FILE

More than one string was specified for a VSAM ESDS file that was used exclusively for write operations. Specifying more than one string can significantly affect performance because of the exclusive control conflicts that can occur. If this finding occurs for all normal CICS processing, you should consider specifying only one string in the ESDS file definition.

STATISTICS                                   NUMBER OF
COLLECTION TIME  APPLID  VSAM FILE    WRITE OPERATIONS
0:00,16MAR2010   CICSYA  LNTEMSTR              431,436
CICS Component - sample report

RULE CIC267: INSUFFICIENT SESSIONS MAY HAVE BEEN DEFINED

CPExpert believes that an insufficient number of sessions may have been defined for the CICS DAL1 connection, or the application system could have been issuing ALLOCATE requests too often. The number of ALLOCATE requests returned was greater than the value specified for the ALLOCQ guidance variable in USOURCE(CICGUIDE). CPExpert suggests you consider increasing the number of sessions defined for the connection, or increasing the ALLOCQ guidance variable so that CPExpert signals a potential problem only when you view the problem as serious. For APPC modegroups, this finding applies only to generic ALLOCATE requests. This finding applies to the following CICS statistics intervals:

STATISTICS                  ALLOCATE REQUESTS
COLLECTION TIME   APPLID    RETURNED TO USERS
10:00,26MAR2008   CICSDTL1                335
11:00,26MAR2008   CICSDTL1                 12
12:00,26MAR2008   CICSDTL1                 27
CICS Component - sample report

RULE CIC307: FREQUENT LOG STREAM DASD-SHIFTS OCCURRED

CICS75.A075CICS.DFHLOG: More than 1 log stream DASD-shift was initiated for this log stream during the intervals shown below. A DASD-shift event occurs when system logger determines that a log stream must stop writing to one log data set and start writing to a different data set. You normally should allocate sufficiently large log data sets so that a DASD-shift occurs infrequently.

                 ------NUMBER OF DASD LOG SHIFTS------
SMF INTERVAL     DURING INTERVAL      DURING PAST HOUR
14:45,16MAR2010                1                     2

RULE CIC650: CICS EVENT PROCESSING WAS DISABLED IN CICS EVENTBINDING

Event Processing was disabled in EVENTBINDING, with the result that events defined in the EVENTBINDING were not captured by CICS Event Processing. You should investigate the Event Binding to determine whether the Binding should be enabled or disabled for the region. This finding applies to the following CICS statistics intervals:

STATISTICS
COLLECTION TIME
0:00,12MAR2009
3:00,12MAR2009
6:00,12MAR2009
DASD Component
• Processes SMF Type 70 (series) records to automatically build a model of your I/O configuration.
• Identifies performance problems with the devices that have the most potential for improvement:
  • PEND delays
  • Disconnect delays
  • Connect delays
  • IOSQ delays
  • Shared DASD conflicts
• Analyzes SMF Type 42 (DS) and Type 64 records to identify VSAM performance problems.
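DASD response time decomposes as RESP = IOSQ + PEND + DISC + CONN, and the MAJOR PROBLEM flag in the sample reports that follow singles out the dominant component per interval. A SAS sketch of that selection, with assumed data set and variable names:

```sas
/* Sketch of picking the dominant delay component per interval.           */
/* RESP = IOSQ + PEND + DISC + CONN; all names are assumed.               */
data majprob;
   set pdb.dasdint;
   resp = sum(iosq, pend, disc, conn);      /* total response per second  */
   length major $4;
   major = choosec(whichn(max(iosq, pend, disc, conn),
                          iosq, pend, disc, conn),
                   'IOSQ', 'PEND', 'DISC', 'CONN');
run;
```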
DASD Component - sample report

RULE DAS100: VOLUME WITH WORST OVERALL PERFORMANCE

VOLSER DB2327 (device 2A1F) had the worst overall performance during the entire measurement period (10:00, 16FEB2001 to 11:00, 16FEB2001). This volume had an overall average of 56.8 I/O operations per second, was busy processing I/O for an average of 361% of the time, and had I/O operations queued for an average of 1% of the time. Please note that percentages greater than 100% and Average Per Second Delays greater than 1 indicate that multiple I/O operations were concurrently delayed. This can happen, for example, if multiple I/O operations were queued or if multiple I/O operations were PENDing. The following summarizes significant performance characteristics of VOLSER DB2327:

                        I/O  --- AVERAGE PER SECOND DELAYS ---  MAJOR
MEASUREMENT INTERVAL   RATE   RESP   CONN   DISC   PEND   IOSQ  PROBLEM
10:00-10:30,16FEB2001  59.1  1.308  0.316  0.004  0.988  0.000  PEND TIME
10:30-11:00,16FEB2001  57.2  3.792  0.300  0.004  3.483  0.006  PEND TIME
11:00-11:30,16FEB2001  54.2  5.769  0.279  0.004  5.464  0.023  PEND TIME
DASD Component - sample report

RULE DAS130: PEND TIME WAS MAJOR CAUSE OF I/O DELAY

A major cause of the I/O delay with VOLSER DB2327 was PEND time. The average per-second PEND delay for I/O is shown below:

                        PEND      PEND     PEND    PEND   PEND  TOTAL
MEASUREMENT INTERVAL    CHAN  DIR PORT  CONTROL  DEVICE  OTHER   PEND
10:00-10:30,16FEB2001  0.492     0.000    0.000   0.000  0.495  0.988
10:30-11:00,16FEB2001  1.927     0.000    0.000   0.000  1.556  3.483
11:00-11:30,16FEB2001  2.840     0.000    0.000   0.000  2.624  5.464

RULE DAS160: DISCONNECT TIME WAS MAJOR CAUSE OF I/O DELAY

A major cause of the I/O delay with VOLSER DB26380 was DISCONNECT time. DISC time for modern systems is a result of cache read miss operations, potentially back-end staging delay for cache write operations, peer-to-peer remote copy (PPRC) operations, and other miscellaneous reasons.

                                     --PERCENT--   DASD  CACHE
                      ----CACHE----   READ WRITE     TO     TO
MEASUREMENT INTERVAL  READS  WRITES   HITS  HITS  CACHE   DASD  PPRC  BPCR  ICLR
 8:30- 8:45,22OCT2001 14615     932   19.2 100.0  11825    903     0     0     0
 8:45- 9:00,22OCT2001 14570     921   20.7 100.0  11567    907     0     0     0
DASD Component - sample report

RULE DAS300: PERHAPS SHARED DASD CONFLICTS CAUSED PERFORMANCE PROBLEMS

Accessing conflicts caused by sharing VOLSER DB2700 between systems might have caused performance problems for the device during the measurement intervals shown below. Conflicting systems had the indicated I/O rate, average CONN time per second, average DISC time per second, average PEND time per second, and average RESERVE time to the device. Even moderate CONN, DISC, or RESERVE can cause delays to shared devices.

                        I/O  MAJOR    OTHER   -------OTHER SYSTEM DATA--------
MEASUREMENT INTERVAL   RATE  PROBLEM  SYSTEM  I/O RATE   CONN   DISC   PEND   RESV
 8:30- 8:45,22OCT2001  31.3  QUEUING  SY1         35.0  0.041  0.001  0.455  0.000
                                      SY2         88.2  0.100  0.003  0.714  0.000
                                      SY3        109.0  0.123  0.003  0.723  0.000
                                      TOTAL      232.2  0.264  0.006  1.892  0.000
 8:45- 9:00,22OCT2001  25.7  QUEUING  SY1         46.4  0.054  0.001  0.565  0.000
                                      SY2         98.2  0.112  0.003  0.836  0.000
                                      SY3        119.0  0.136  0.003  0.846  0.000
                                      TOTAL      263.5  0.303  0.007  2.247  0.000
DASD Component - sample report

RULE DAS607: VSAM DATA SET IS CLOSE TO MAXIMUM NUMBER OF EXTENTS

VOLSER: RLS003. More than 225 extents were allocated for the VSAM data sets listed below. The VSAM data sets are approaching the maximum number of extents allowed. The list below shows the number of extents and the primary and secondary space allocation:

                                                         TOTAL    EXTENTS  ---ALLOCATIONS---
SMF TIME STAMP   JOB NAME  VSAM DATA SET                EXTENTS  THIS OPEN  PRIMARY SECONDARY
10:30,11MAR2002  CICS2ABA  RLSADSW.VF01D.DATAENDB.DATA      229          4   30 CYL    1 CYL

RULE DAS625: NSR WAS USED, BUT LARGE PERCENT OF ACCESS WAS DIRECT

VOLSER: MVS902. Non-Shared Resources (NSR) was specified as the buffering technique for the VSAM data sets below, but more than 75% of the I/O activity was direct access. NSR is not designed for direct access, and many of the advantages of NSR are not available for direct access. You should consider Local Shared Resources (LSR) for the VSAM data sets below (perhaps using System Managed Buffers to facilitate the use of LSR). The I/O RATE is for the time the data set was open. The SMF TIME STAMP and JOB NAME are from the last record for the data set.

                                                            I/O      OPEN  -ACCESS TYPE (PCT)-
SMF TIME STAMP   JOB NAME  VSAM DATA SET                   RATE  DURATION  SEQUENTIAL  DIRECT
13:19,19SEP2002  NRXX807   SDPDPA.PK.MVSP.RT.NDMGIX.DATA    8.4   0:07:08         0.0   100.0
13:19,19SEP2002  NRXX807   SDPDPA.PR.MVSP.RT.NDMGIXD.DATA  11.2   0:06:42         0.0   100.0
13:33,19SEP2002  TSJHM     SDPDPA.PR.MVSP.RT.NDMRQFDA.DATA  0.3   2:21:58         0.0   100.0
13:33,19SEP2002  TSJHM     SDPDPA.PR.MVSP.RT.NDMRQF.DATA    2.8   3:37:53         0.0   100.0
13:33,19SEP2002  TSJHM     SDPDPA.PK.MVSP.RT.NDMTCF.DATA   11.1   6:24:10         0.1    99.9
DASD Component (Application Analysis)
• Requires a simple modification to MXG or MICS
• Modification collects job step data while processing SMF Type 30 (Interval) records
• Typically requires less than 10 cylinders
• Data is correlated with Type 74 information
• CPExpert associates performance problems with specific applications (jobs and job steps)
• CPExpert can perform “Loved one” analysis of DASD performance problems
WMQ Component
• Analyzes SMF Type 115 statistics, as processed by MXG or MICS and placed into the performance data base:
  • MQMLOG - Log Manager statistics
  • MQMMSGDM - Message/Data Manager statistics
  • MQMBUFER - Buffer Manager statistics
  • MQMCFMGR - Coupling Facility Manager statistics
• Type 115 records should be synchronized with the SMF recording interval.
• IBM says the overhead to collect statistics data is negligible.
WMQ Component
• Optionally analyzes SMF Type 116 accounting data, as processed by MXG or MICS and placed into the performance data base:
  • MQMACCTQ - Thread-level accounting data
  • MQMQUEUE - Queue-level accounting data
• Type 116 records should be synchronized with the SMF recording interval.
• IBM says the overhead to collect accounting data is 5-10%.
WebSphere MQ
Typical queue manager problems
• Assignment of queues to page sets
• Assignment of page sets to buffer pools
• Queue manager parameters
• Index characteristics of queues
• Characteristics of messages in queues
• Characteristics of MQ calls
CPExpert analysis uses SMF Type 116 records
WebSphere MQ
Typical buffer manager problems
• Buffer thresholds exceeded for pool
• Buffers assigned per pool (too few/too many)
• Message traffic
• Message characteristics
• Application design
CPExpert analysis uses SMF Type 115 records
WebSphere MQ
Typical log manager problems
• Log buffers assigned
• Active log use characteristics
• Archive log use characteristics
• Tasks backing out
• System paging of log buffers
• Excessive checkpoints taken
CPExpert analysis uses SMF Type 115 records
WebSphere MQ
Typical DB2-interface problems
• Thread delays
• DB2 server processing delays
• Server requests queued
• Server tasks experienced ABENDs
• Deadlocks in DB2
• Maximum request queue depth was too large
CPExpert analysis uses SMF Type 115 records
WebSphere MQ
Typical shared queue problems
• Structure was full
• Large number of application structures defined
• MINSIZE is less than SIZE for CSQ.ADMIN
• SIZE is more than double MINSIZE
• ALLOWAUTOALT(YES) not specified
• FULLTHRESHOLD value might be incorrect
CPExpert analysis uses SMF Type 115 records and Type 74 (Coupling Facility) records
WebSphere MQ - sample report

RULE WMQ100: MESSAGES WERE WRITTEN TO PAGE SET ZERO

More than 0 messages were written to Page Set Zero during the intervals shown below. Messages should not be written to Page Set Zero, since serious WebSphere MQ system problems could occur if Page Set Zero should become full. This finding relates to queue SYSTEM.COMMAND.INPUT

                         MESSAGES WRITTEN
STATISTICS INTERVAL      TO PAGE SET ZERO
13:16-14:45, 28AUG2003                624

RULE WMQ122: DEAD.LETTER QUEUE IS INAPPROPRIATE FOR PAGE SET ZERO

Buffer Pool 0. The DEAD.LETTER queue was assigned to Page Set Zero. A dead-letter queue stores messages that cannot be routed to their correct destinations. If the DEAD-LETTER queue grows large unexpectedly, Page Set Zero can become full, and WebSphere MQ can enter a serious stress condition. You should redefine the DEAD.LETTER queue to a page set other than Page Set Zero. This finding relates to queue SYSTEM.DEAD.LETTER.QUEUE
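The WMQ100 test above is a zero-tolerance check: any message written to Page Set Zero fires the finding. A sketch, assuming a hypothetical page set statistics layout rather than MXG's actual one:

```sas
/* Sketch of the WMQ100 test (hypothetical input layout): any message     */
/* written to Page Set Zero fires the finding.                            */
data pszero;
   set pdb.mqmpset;                 /* page set statistics (assumed)      */
   if pageset = 0 and mswritten > 0 then output;
run;
```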
WebSphere MQ - sample report

RULE WMQ110: EXPRYINT VALUE IS OFF OR TOO SMALL

Buffer Pool 3. There were more than 25 expired messages skipped when scanning a queue for a specific message. Processing expired messages adds both CPU time and elapsed time to the message processing. With WebSphere MQ 5.3, the EXPRYINT keyword was introduced to allow the queue manager to automatically determine whether queues contained expired messages and to eliminate expired messages at the interval specified by the EXPRYINT value. This finding applies to queue: DPS.REPLYTO.RCB.IVR04

                             GET    BROWSE  EXPIRED MESSAGES
STATISTICS INTERVAL     SPECIFIC  SPECIFIC         PROCESSED
13:41-13:41, 03JUL2003         0         0               313

RULE WMQ320: APPLICATIONS WERE SUSPENDED FOR LOG WRITE BUFFERS

Applications were suspended while in-storage log buffers were being written to the active log. This finding normally means that too few log buffers were assigned. However, the finding could mean that there is an I/O configuration problem and the log buffer writes to the active log are delayed for I/O reasons. This finding applies to the following statistics intervals.

                            NUMBER OF SUSPENSIONS
STATISTICS INTERVAL      WAITING ON OUTPUT BUFFERS
14:19-14:44, 12SEP2003                         139
WebSphere MQ - sample report

RULE WMQ201: BUFFER POOL ENCOUNTERED SYNCHRONOUS (5%) THRESHOLD

Buffer Pool 0. This buffer pool encountered the Synchronous Write threshold (less than 5% of the pages in the buffer pool were "stealable" or more than 95% of the pages were on the Deferred Write queue). While the Synchronous Page Writer is executing, updates to any page cause the page to be written immediately to the page set (the page is not placed on the Deferred Write Queue, but is written immediately to the page set as a synchronous write operation). This situation harms performance of applications, and is an indicator that the buffer pool is in danger of encountering a Short on Storage condition.

                         BUFFERS      TIMES AT  IMMEDIATE
STATISTICS INTERVAL     ASSIGNED  5% THRESHOLD     WRITES
17:08-17:09, 07OCT2003     1,050            19         19

RULE WMQ205: HIGH I/O RATE TO PAGE SETS WITH SHORT-LIVED MESSAGES

Buffer Pool 0. This buffer pool had short-lived messages assigned. The total I/O rate (read and write activity) to page sets for the short-lived messages was more than 0.5 pages per second. Writing pages to the page set and subsequently reading the pages from the page set cause I/O overhead and delay to the application. This finding applies to the following intervals:

                         BUFFERS    PAGES  PAGES   I/O RATE
STATISTICS INTERVAL     ASSIGNED  WRITTEN   READ  WITH DASD
11:32-11:32, 24JUL2006    50,000      101      0       50.5
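The WMQ201 condition is the 5% stealable-buffer threshold described above. A sketch with assumed data set and variable names:

```sas
/* Sketch of the WMQ201 condition (hypothetical names): fewer than 5% of  */
/* the pool's buffers were stealable, forcing synchronous page writes.    */
data sosrisk;
   set pdb.mqmbufer;                /* buffer manager statistics (assumed)*/
   if buffers > 0 and 100 * stealabl / buffers < 5 then output;
run;
```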
WebSphere MQ - sample report

RULE WMQ300: ARCHIVE LOGS WERE USED FOR BACKOUT

WebSphere MQ applications issued log reads to the archive log file for backout more than 0 times during the WebSphere MQ statistics intervals shown below. Most log read requests should come from the output buffer or the active log. Using archive logs for backout purposes often indicates that either the active log files were too small or long-running applications were backing out work.

                        NUMBER OF LOG READS
STATISTICS INTERVAL        FROM ARCHIVE LOG
 4:30- 5:00, 12SEP2003                  192

RULE WMQ611: LARGE NUMBER OF APPLICATION STRUCTURES WERE DEFINED

SMF Type 74 (Structure) statistics showed that more than 5 application structures were defined to a coupling facility. IBM suggests that you should have as few application structures as possible. Having multiple application structures in a coupling facility can degrade performance.

WEBSPHERE MQ         STRUCTURES
COUPLING FACILITY       DEFINED
CF1                           8
CF2                           9
CF3                           8
CPExpert Release 18.1 (Issued April 2008)
Major enhancements with this update:
• Provided support for z10 server
• Provided analysis of HiperDispatch problems
• Provided new reports to help analysis of DB2 buffer pool problems
• Expanded the CPExpert email feature to the DASD Component
• Provided additional analysis features for the WebSphere MQ Component
CPExpert Release 18.2 (Issued October 2008)
Major enhancements with this update:
• Provided support for z/OS Version 1, Release 10
• Provided additional analysis of z/OS performance problems (in WLM Component), including reduced CPU speed caused by cooling unit failure
• Provided new reporting of rules based on History information kept by CPExpert (applies to all components except DB2 Component)
• Added masking technique to select CICS regions (by region Group), DASD volumes (including SMS Storage Groups), and WebSphere MQ subsystems
CPExpert Release 19.1 (Issued April 2009)
Major enhancements with this update:
• Enhanced WLM Component with analysis of more z/OS performance problems, including Enqueue Promoted Dispatching Priority analysis
• Projected the amount of zAAP-eligible work that could be offloaded to a zAAP processor, if a zAAP processor were assigned to the LPAR
• Provided more analysis of CICS temporary storage in CICS Component
• Added Resource Enqueue analysis to DASD Component
CPExpert Release 19.2 (Issued October 2009)
Major enhancements with this update:
• Provided support for z/OS Version 1, Release 11
• Provided support for CICS/TS Release 4.1
• Added analysis of Resource Enqueue contention between different levels of Goal Importance to WLM Component
• Added analysis of CICS Event Processing to the CICS Component (applicable to CICS/TS 4.1)
• Allowed users to specify narrative descriptions of individual DB2 buffer pools in CPExpert reports
CPExpert Release 20.1 (Issued April 2010)
Major enhancements with this update:
• Enhanced WLM Component with analysis of SMF buffer specifications and other SMF performance constraints
• Supported analysis of VSAM performance problems when analyzing a MICS performance data base, but using MXG TYPE42DS and MXG TYPE64 files
• Allowed selection of up to 20 unique DB2 subsystems while analyzing performance problems with DB2 subsystems, and added logic to handle the case where an installation has multiple identical DB2 subsystem names defined in z/OS images
CPExpert Release 20.2 (Issued October 2010)
Major enhancements with this update:
• Provided support for z/OS Version 1, Release 12
• Provided support for zEnterprise System (z196)
• Enhanced WLM Component to provide analysis of dropped SMF records and analysis of SMF flood facility (available with z/OS V1R12)
• Enhanced WLM Component to provide Management Overview of CPExpert findings, with web-enabled documentation links
• Enhanced the WebSphere MQ Component to provide analysis of a non-indexed request/reply-to queue
CPExpert Release 21.1 (Issued April 2011)
Major enhancements with this update:
• Provided a new “analysis area” that includes:
  • Analysis of address spaces queued for a logical processor
  • Analysis of work units queued for a logical processor
  • Analysis of queuing due to the “power steering” option
• Provided additional analysis of HiperDispatch