Issues after Workload Moves, and why we implemented IRD (Intelligent Resource Director) to fix them
Workload Separation For Licence Savings
• Original setup
• Mixture of licences across LPARs
• Licence charges based on LPAR size, not product usage within the LPAR
[Diagram: z9 2094-713 CEC with LPARs SYSA, SYSB and SYSK. IMS runs in two of the LPARs; DB2, CICS, BATCH, DB2 SP and MQ run in all three.]
Workload Separation For Licence Savings
• New setup
• IMS, CICS and general LPARs
• IMS subsystems merged
• DB2 data sharing
• MSU licence cost on SYSB is substantially lower
• Move as much non-IMS, non-CICS work to SYSB as possible
[Diagram: SYSA = IMS LPAR, SYSB = batch (general) LPAR, SYSK = CICS LPAR. DB2 and MQ run in all three; the remaining boxes are IMS, BMP batch, DLI batch, CICS, BATCH and DB2 SP.]
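To see why separating products saves money, here is a deliberately simplified model. It assumes each product is charged on the combined MSU size of every LPAR it runs in, ignoring the rolling four-hour averages and peak timing that real sub-capacity pricing measures. The function name, the LPAR sizes and the exact "before" product placement are all hypothetical.

```python
from typing import Dict, Set

def product_msus(lpar_msu: Dict[str, int],
                 products: Dict[str, Set[str]]) -> Dict[str, int]:
    """Simplified licence model (hypothetical): a product is charged on the
    sum of the MSU sizes of all LPARs it runs in."""
    all_products = set().union(*products.values())
    return {
        product: sum(msu for lpar, msu in lpar_msu.items()
                     if product in products[lpar])
        for product in sorted(all_products)
    }

# Hypothetical LPAR sizes in MSU
sizes = {"SYSA": 400, "SYSB": 150, "SYSK": 200}

# Original setup: products mixed across LPARs (placement illustrative only)
before = product_msus(sizes, {
    "SYSA": {"IMS", "DB2", "CICS", "MQ"},
    "SYSB": {"IMS", "DB2", "CICS", "MQ"},
    "SYSK": {"DB2", "CICS", "MQ"},
})

# New setup: IMS only on SYSA, CICS only on SYSK
after = product_msus(sizes, {
    "SYSA": {"IMS", "DB2", "MQ"},
    "SYSB": {"DB2", "MQ"},
    "SYSK": {"CICS", "DB2", "MQ"},
})

for product in before:
    print(f"{product}: {before[product]} MSU -> {after[product]} MSU")
```

DB2 and MQ still run everywhere, so their charges are unchanged in this model; the saving comes from IMS and CICS no longer being charged for SYSB's capacity.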
Working day 1 (WD1)
• Lots of batch, both overnight and during the day
• Always our peak day of the month
• Peak occurs during the overnight batch
• Batch failures and overruns are common
• Online systems are also busiest on this day
• The system can often run capped
Workload Moves
• No major issues with online
• No major issues with normal batch, but:
• Major issue with WD1 batch
  • Elapsed time of many DB2 batch jobs increased 2-3 times when moved to SYSB
  • Changing priority made no difference in most cases
  • If rerun on SYSA, jobs would run much faster
  • Jobs not delayed for CPU
  • Nothing in RMFWDM or DB2 thread stats to explain it
  • DASD response times similar on SYSA and SYSB
  • LPARs on the same CEC, with shared channels
Thoughts and Guesses
• Delay in servicing I/O interrupts
  • Higher TPI% on SYSB, but not much
  • CPENABLE setting (10,30), in line with IBM recommendations
• Logical processor share very low on SYSB: 16% vs 80% on SYSA
  • Same number of CPUs online on SYSB as on SYSA
  • Weightings not changed to reflect workload moves
  • Significant when capped
  • Weightings need to be more dynamic overnight vs online
• LPAR dispatch time 12.5 ms
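Logical processor share can be estimated from the weights. A minimal sketch, assuming the usual definition: an LPAR's entitlement is its weight divided by the total weight, times the number of shared physical CPs, and that entitlement is spread over the LPAR's online logical CPs. The 800/200 weights, 13 shared CPs and 11 logical CPs per LPAR are hypothetical, and a real calculation would also include SYSK's weight in the total.

```python
from typing import Dict

def logical_processor_share(weights: Dict[str, int], physical_cps: int,
                            logical_cps: Dict[str, int]) -> Dict[str, float]:
    """Fraction of one physical CP that each logical CP of an LPAR is entitled to."""
    total_weight = sum(weights.values())
    shares = {}
    for lpar, weight in weights.items():
        # Engines' worth of capacity the LPAR is guaranteed by its weight
        entitlement = weight / total_weight * physical_cps
        # That entitlement is spread across every logical CP the LPAR has online
        shares[lpar] = entitlement / logical_cps[lpar]
    return shares

# Hypothetical configuration: 80/20 weights, 13 shared CPs, 11 logical CPs each
shares = logical_processor_share({"SYSA": 800, "SYSB": 200}, 13,
                                 {"SYSA": 11, "SYSB": 11})
for lpar, share in shares.items():
    print(f"{lpar}: {share:.1%}")
```

Adding logical CPs without changing the weight only thins each CP's share, which is why a low SYSB figure matters once the LPAR is capped.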
Thoughts and Guesses
• Found the ‘Short CP’ paper by Kathy Walsh
  • Describes how high-priority work may be delayed
  • In relation to online CICS
  • I/O not mentioned
  • But posting I/O completion IS high-priority work
• Recommends reducing online CPs to match the workload
• Recommends making sure weights are appropriate
• Recommends using IRD to monitor and automate changes
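One way to read "reduce online CPs to match the workload" is to keep roughly as many logical CPs online as the weight actually entitles the LPAR to. The round-up rule below is a hypothetical rule of thumb, not taken from the paper, and the SYSB weight of 200 out of 1000 on 13 shared CPs is illustrative. IRD's vary CPU management is intended to make this kind of adjustment automatically.

```python
import math

def entitled_engines(weight: int, total_weight: int, physical_cps: int) -> float:
    """Engines' worth of capacity guaranteed by the LPAR's weight."""
    return weight / total_weight * physical_cps

def suggested_online_cps(weight: int, total_weight: int, physical_cps: int) -> int:
    """Hypothetical rule of thumb: round the entitlement up to whole logical CPs."""
    return max(1, math.ceil(entitled_engines(weight, total_weight, physical_cps)))

# Hypothetical SYSB figures: weight 200 of 1000, 13 shared physical CPs
entitlement = entitled_engines(200, 1000, 13)
cps = suggested_online_cps(200, 1000, 13)

# Compare the per-CP share with 11 CPs online against the suggested count
for online in (11, cps):
    print(f"{online} logical CPs online -> {entitlement / online:.0%} share each")
```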
What is a short CP?
• SYSB weight set to expect 120 MSU (approx. 2 engines)
• When capped, this is enforced
• SYSB z/OS thinks it has 11 engines, but each is dispatched only about 2/11 of the time
• High-priority work cannot run while its logical CP is not dispatched by PR/SM
• All dispatched CPs might be disabled for interrupts due to CPENABLE
• Short CP ratio = MVSBUSY / LPARBUSY
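Both figures on this slide are simple ratios. As we understand the metrics, MVSBUSY is the time z/OS believes its logical CPs are busy and LPARBUSY is the time PR/SM actually has them dispatched on physical CPs; a logical CP waiting for PR/SM still counts as busy to z/OS, so the ratio climbs above 1. The RMF percentages below are hypothetical, and the "well above 1.0" reading is an assumed interpretation rather than a documented threshold.

```python
def dispatch_fraction(entitled_engines: float, online_logical_cps: int) -> float:
    """Approximate share of time each logical CP is dispatched when capped."""
    return entitled_engines / online_logical_cps

def short_cp_ratio(mvs_busy_pct: float, lpar_busy_pct: float) -> float:
    """MVS busy divided by LPAR busy; well above 1.0 suggests short CPs."""
    return mvs_busy_pct / lpar_busy_pct

# Slide figures: about 2 engines of entitlement spread over 11 logical CPs
fraction = dispatch_fraction(2.0, 11)
print(f"Each logical CP is dispatched about {fraction:.0%} of the time")

# Hypothetical RMF readings: z/OS sees its CPs 95% busy, PR/SM dispatched them 25%
ratio = short_cp_ratio(95.0, 25.0)
print(f"Short CP ratio: {ratio:.1f}")
```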
2/1/2012
Notes on Metrics
• DASD response times appear similar between systems
• When capped, the I/O rate appears more depressed on SYSB than on SYSA
• 90% of I/Os are serviced by 4 of the 11 CPs
  • But each CP has only a 16% logical processor share
• Weights are about 80/20 between SYSA and SYSB
  • When capped, MSU consumption moves toward these ratios
  • SYSB needs more than SYSA in the batch window
  • SYSA needs more than SYSB in the online window
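The I/O figures can be turned into "engines' worth" of capacity, and the capped split shows why fixed 80/20 weights hurt SYSB in the batch window. This sketch assumes both LPARs are capped together, that each receives capacity in proportion to its weight, and that SYSK is ignored; the 500 MSU cap and the `capped_split` helper are hypothetical.

```python
from typing import Dict

def capped_split(cap_msu: float, weights: Dict[str, int]) -> Dict[str, float]:
    """MSU each LPAR is entitled to when the group is held at the cap."""
    total = sum(weights.values())
    return {lpar: cap_msu * w / total for lpar, w in weights.items()}

# Slide figures: 4 of SYSB's 11 logical CPs take ~90% of I/O interrupts,
# and each logical CP has a ~16% logical processor share
io_engines = 4 * 0.16
print(f"Capacity behind ~90% of SYSB I/O interrupts: {io_engines:.2f} engines")

# Hypothetical 500 MSU cap split by the 80/20 weights
split = capped_split(500, {"SYSA": 80, "SYSB": 20})
print(split)
```

Under these assumptions SYSB is held to a fifth of the capped capacity regardless of whether it is running the batch peak, which is the case for weights that change between the overnight and online windows.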
Recreation of problem
• Lower the cap on the development system
• Run several low-priority looping jobs
• Run an I/O-intensive job at high priority (no CPU delays)
• Problem recreated only when:
  • System was capped
  • Weight was too low for the system's MSU
  • Additional CPs were varied online
Comparison
• SYSB logical processor share 60-100%, compared to 16% previously
  • Helps processor cache
• SYSA logical processor share 60%, compared to 80%
• Job FNMDMP05
  • 01/02: 15h 17min elapsed, 55min CPU, 642K EXCPs
  • 01/03: 8h 35min elapsed, 48min CPU, 648K EXCPs
• DB2 DBM1 address space (does database I/O)
  • 01/02: 499,271,506 I/Os take 02:50:58 CPU hours
  • 01/03: 549,277,707 I/Os take 02:28:45 CPU hours
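The before/after numbers are easier to compare as ratios. This sketch only does arithmetic on the slide's figures; the helper name is ours, and attributing all DBM1 CPU to I/O is a simplification that overstates the true per-I/O cost.

```python
def to_seconds(hours: int, minutes: int, seconds: int = 0) -> int:
    """Convert an h:m:s duration to seconds."""
    return hours * 3600 + minutes * 60 + seconds

# Job FNMDMP05 elapsed times from the slide
elapsed_before = to_seconds(15, 17)
elapsed_after = to_seconds(8, 35)
print(f"FNMDMP05 elapsed time: {elapsed_after / elapsed_before:.0%} of the 01/02 run")

# DB2 DBM1: CPU seconds divided by I/O count, in microseconds per I/O
us_before = to_seconds(2, 50, 58) * 1e6 / 499_271_506
us_after = to_seconds(2, 28, 45) * 1e6 / 549_277_707
print(f"DBM1 CPU per I/O: {us_before:.1f} us -> {us_after:.1f} us")
```

On these figures the job ran in a little over half its previous elapsed time, and DBM1 used roughly a fifth less CPU per I/O while performing about 50 million more I/Os.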