Understanding I/O Performance with PATROL-Perform and PATROL-Predict
Debbie Sheetz, Sr. Staff Consultant, BMC Software
I/O Performance Analysis Overview
• I/O metric definitions
• Baseline I/O performance analysis
• What-if I/O performance analysis
C4P075
How Important is I/O to Performance?
• Predict/Visualizer presents a unified view of the system, so the relative contributions of CPU and disk I/O can be assessed
• Don't solve a problem that you don't have
CPU is the dominant factor here.
Source of I/O Metrics
• The key to understanding I/O is to know your metrics
• Disks are collected and reported as they are defined to (known by) UNIX or NT
  • This may or may not correspond 1-to-1 to physical units
• Disk configuration is collected from the standard interface for the particular OS
• Disk statistics are collected from the standard interface for the particular OS (the same metrics used by iostat, etc.)
• Analyze/Predict interprets and reports based on these metrics
I/O Configuration Collection Issues
• Sometimes the disk configuration is reported as "Unknown"
• Three possible causes:
  • The disk configuration is not available from the OS
  • The standard OS interface fails to return the disk configuration
  • The collected disk configuration does not match an entry in the hardware (.hrw) and .odm files
• RAID is not collected directly
• An "Unknown" configuration DOES NOT AFFECT the baseline metrics or baseline model calibration
• For certain what-if disk modeling scenarios, the disk must be identified
Key I/O Metrics
• A few metrics tell most of the story about disk I/O:
  • Disk throughput
    • Data transferred (e.g. bytes, words, etc.)
    • Disk reads/writes
    • Disk accesses
  • Disk utilization (active time)
I/O Metrics: Throughput
• Data transferred (e.g. bytes, words, etc.)
• PATROL-Perform and Predict report I/Os in 4 KB units, for:
  • Consistency of reporting (Analyze, Visualizer, Predict)
  • Ease of modeling I/O cross-node and cross-platform
• The units measured vary by platform:
  • HP, OSF: words (Disk Statistics: Words Xfered)
  • Solaris, AIX: blocks (Disk Statistics: Blocks Read/Written)
  • NT: bytes (NT Physical Disk: Disk Read/Write)
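To illustrate the normalization to 4 KB units, here is a minimal sketch in Python. It assumes the caller knows how many bytes one counter unit represents; the 512-byte block size in the example is an assumption for illustration, not a value taken from the collector documentation, and `to_4kb_ios` is a hypothetical helper, not part of PATROL-Perform or Predict.

```python
# Sketch: normalize a platform data-transfer counter to the 4 KB I/O unit
# used in Analyze/Predict reports. unit_bytes is supplied by the caller;
# the 512-byte block size below is an ASSUMPTION for illustration.
FOUR_KB = 4096  # reporting unit, in bytes


def to_4kb_ios(amount: float, unit_bytes: int) -> float:
    """Convert `amount` counter units of `unit_bytes` bytes each into 4 KB I/Os."""
    if unit_bytes <= 0:
        raise ValueError("unit_bytes must be positive")
    return amount * unit_bytes / FOUR_KB


# NT counters are in bytes, so one unit is 1 byte.
nt_ios = to_4kb_ios(819_200, 1)
# A UNIX "blocks" counter, assuming 512-byte blocks.
unix_ios = to_4kb_ios(1_600, 512)
print(nt_ios, unix_ios)  # 200.0 200.0
```

For HP and OSF "words", the word size would have to be confirmed for the platform before using this conversion.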
I/O Metrics: Throughput
• Disk accesses (i.e. transfers)
  • Number of times an I/O request was made to the disk
  • The size of the data transfer can vary
  • It doesn't matter where the I/O is actually serviced:
    • Physical disk (seek, latency, and data transfer)
    • Cache on the disk
    • Cache on the disk controller
  • It doesn't matter whether the disk is RAID or non-RAID
• Similar metrics are collected for UNIX and NT:
  • UNIX: Disk Statistics: Transfers
  • NT: NT Physical Disk: Disk Transfers/Sec
I/O Metrics: Throughput
• Disk reads/writes
  • Number of times a read vs. a write I/O request was made to the disk
  • The size of the data transfer can vary
• Different metrics are collected for UNIX and NT:
  • Solaris, AIX: Disk Statistics: Blocks Read/Written
  • HP, OSF: not available
  • NT: NT Physical Disk: Disk Read/Write Bytes/Sec
• Reported in Analyze/Predict as 4 KB I/O rates
I/O Metrics: Utilization (Active Time)
• Disk utilization (active time)
  • Amount of time the disk was observed to be actively servicing an I/O request
  • It doesn't matter where the I/O is actually serviced:
    • Physical disk (seek, latency, and data transfer)
    • Cache on the disk
    • Cache on the disk controller
  • It doesn't matter whether the disk is RAID or non-RAID
• When compared with disk throughput measures, utilization should reflect the relative efficiency of I/O processing
  • Use disk service time for this comparison (service time = utilization / I/O rate)
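The service-time relationship can be sketched as a short Python function. This is an illustration of the arithmetic only, assuming utilization is given as a fraction and the I/O rate in I/Os per second over the same interval; the function name and the example disk are hypothetical.

```python
def service_time_ms(utilization: float, io_rate_per_sec: float) -> float:
    """Average service time per I/O, in milliseconds.

    utilization: fraction of the interval the disk was busy (0.40 means 40%).
    io_rate_per_sec: I/Os per second over the same interval.
    """
    if io_rate_per_sec <= 0:
        raise ValueError("io_rate_per_sec must be positive")
    # busy seconds per second / I/Os per second = busy seconds per I/O
    return utilization / io_rate_per_sec * 1000.0


# Hypothetical disk: 40% busy while doing 80 I/Os per second.
print(round(service_time_ms(0.40, 80.0), 3))  # 5.0
```

If the I/O rate is expressed in 4 KB units, the result is service time per 4 KB I/O, which is the form used in the case-study charts.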
I/O Metrics: Utilization (Active Time)
• Disk active time
• Different metrics are collected for UNIX and NT:
  • UNIX: Disk Statistics: Active Time
  • NT: NT Physical Disk: % Disk Time
  • Windows 2000: NT Physical Disk: % Idle Time
• Windows/NT metrics are reinterpreted by Analyze:
  • Perfmon caps calculated utilization at 100%
  • Observations of collected Windows/NT disk data show utilizations well over 100%
  • Analyze scales all collected NT times down
  • As a result, Perfmon and Analyze/Predict will not match
I/O Metrics Collection Issues
• If "iostat" can't see it, the collector can't collect it
  • The OS supplies the metrics
  • If the metrics are missing or incorrect, "iostat" and PATROL-Perform/Predict, etc. will report the same missing or incorrect values
  • The problem needs to be addressed by the OS vendor
• Refer any questions about valid I/O metrics to BMC Technical Support
  • Always state the exact platform (e.g. HP 11.00, 64-bit)
  • Run iostat and the collector in parallel
  • Use the current collector for the platform
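When iostat and the collector are run in parallel, a quick comparison of per-disk rates helps show whether they agree. The sketch below is hypothetical: it assumes both sources have already been parsed into dictionaries of disk name to rate for the same interval, and the 5% relative tolerance is an arbitrary choice, not a BMC recommendation.

```python
def compare_disk_rates(iostat: dict[str, float],
                       collector: dict[str, float],
                       tolerance: float = 0.05) -> list[str]:
    """Return disks whose rates differ by more than `tolerance` (relative),
    or that appear in only one of the two sources, in sorted order."""
    mismatches = []
    for disk in sorted(set(iostat) | set(collector)):
        if disk not in iostat or disk not in collector:
            mismatches.append(disk)  # seen by only one source
            continue
        a, b = iostat[disk], collector[disk]
        scale = max(abs(a), abs(b))
        if scale > 0 and abs(a - b) / scale > tolerance:
            mismatches.append(disk)
    return mismatches


# Hypothetical samples for the same interval.
iostat_rates = {"sd0": 42.0, "sd1": 10.0, "sd2": 3.0}
collector_rates = {"sd0": 41.5, "sd1": 14.0}
print(compare_disk_rates(iostat_rates, collector_rates))  # ['sd1', 'sd2']
```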
Baseline I/O Performance Analysis Overview
• Observe key disk I/O metrics from baseline measurements
• Identify I/O patterns:
  • For the system
  • For a disk or group of disks
    • Distribution amongst disks
  • For a workload/transaction
• Determine how important I/O is to overall performance
Baseline I/O Performance Analysis Overview
• Observe key disk I/O metrics from baseline measurements
• Identify I/O performance characteristics:
  • Relative speed of I/O processing
  • Read/write ratios
  • Blocksize used
  • Disk utilization objectives
  • Distribution amongst disks
Baseline Case Study
The CPU pattern doesn't precisely match the I/O pattern.
Baseline Case Study
• I/O is dominated by one Oracle instance, but there are other contributors
• Study patterns within days and across days, weeks, etc.
Baseline Case Study
• I/O is the major component of response time during prime time
Baseline Case Study
The distribution of I/O amongst disks is fairly even.
I/O Analysis Technique: CUTDISK
• How can I/O data be filtered so that only the important disks are studied?
• Use the "CUT DISK" feature:
  • In Analyze
  • In Manager
    • If it is already specified in the .an file input to Manager, a separate Manager specification is not needed
• Result: shorter Analyze/Predict reports, smaller Visualizer files and database, and Visualizer graphics that are easier to present
I/O Analysis Technique: CUTDISK
• The concept is to aggregate I/O from less-utilized disks and preserve the important disks individually
  • I/Os are NOT removed from the model
• Choose an appropriate threshold:
  • I/O rate or disk utilization may be used
  • The threshold value can be set for a specific purpose
    • A setting of 0 removes only disks that are not used at all
    • A setting of 5% utilization removes most disks
  • Paging disks are never removed
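The CUTDISK concept (fold low-activity disks into an aggregate without losing their I/O, always keep paging disks) can be sketched as follows. This is a hypothetical illustration of the idea, not the Analyze or Manager implementation; the `Disk` type, the `cut_disks` name, and the choice of "utilization strictly above the threshold is kept" are assumptions, picked so that a threshold of 0 folds only idle disks.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    io_rate: float        # 4 KB I/Os per second
    utilization: float    # fraction busy, 0.0 to 1.0
    is_paging: bool = False


def cut_disks(disks: list[Disk], util_threshold: float) -> tuple[list[Disk], float]:
    """Split disks into those reported individually and an aggregated remainder.

    Disks at or below `util_threshold` are folded into one aggregate I/O rate,
    so their I/O stays in the model; paging disks are always kept.
    """
    kept: list[Disk] = []
    folded_io = 0.0
    for d in disks:
        if d.is_paging or d.utilization > util_threshold:
            kept.append(d)
        else:
            folded_io += d.io_rate
    return kept, folded_io


# Hypothetical configuration.
disks = [
    Disk("sd0", 50.0, 0.30),
    Disk("sd1", 3.0, 0.02),
    Disk("sd2", 0.0, 0.0),
    Disk("swap0", 2.0, 0.01, is_paging=True),
]
kept, other = cut_disks(disks, 0.05)
print([d.name for d in kept], other)  # ['sd0', 'swap0'] 3.0
```

The total I/O (kept disks plus the aggregate) is unchanged by the cut, which is the property the slide emphasizes.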
I/O Analysis Technique: CUTDISK
• Specify under Options, Cut Disk Options in Analyze
I/O Analysis Technique: CUTDISK
• Specify under Options, Advanced Features in Manager
Baseline Case Study
• Observe disk utilization patterns
Utilizations are mostly even, and most are under 40%.
Baseline Case Study
• Observe disk processing efficiency
Looks good! Most service times are under 5 ms per 4 KB I/O. A few outliers could use a closer look…
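Picking out the outliers worth a closer look can be done with a simple filter on service time per 4 KB I/O. In this hypothetical sketch the ssd3 and ssd4 values are the ones discussed in the case study, while ssd1 and ssd2 are made-up values; the 5 ms limit mirrors the "most service times under 5 ms" observation and is not a fixed rule.

```python
def flag_slow_disks(service_ms: dict[str, float], limit_ms: float = 5.0) -> list[str]:
    """Return disks whose service time per 4 KB I/O exceeds limit_ms, slowest first."""
    slow = [d for d, ms in service_ms.items() if ms > limit_ms]
    return sorted(slow, key=lambda d: service_ms[d], reverse=True)


# ssd3 and ssd4 values come from the case study; ssd1 and ssd2 are hypothetical.
observed = {"ssd1": 2.4, "ssd2": 3.1, "ssd3": 53.84, "ssd4": 12.11}
print(flag_slow_disks(observed))  # ['ssd3', 'ssd4']
```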
Baseline Case Study
• Look at ssd4
The high service time isn't so high after all: 12.69 transfers divided by 9.85 I/Os is about 1.29. That means the 12.11 ms service time (per 4 KB I/O) covers about 1.29 actual data transfers, or about 9.4 ms per physical transfer.
Baseline Case Study
• Look at ssd3
The high service time isn't really high here either: 10.66 transfers divided by 1.37 I/Os is about 7.8. That means the 53.84 ms service time covers 7.8 actual data transfers, or about 6.9 ms per transfer. Another way to think about this is that the average blocksize is 4 KB / 7.8, or about 0.5 KB.
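The ssd4 and ssd3 arithmetic can be checked with a small helper that relates service time per 4 KB I/O to native transfers. This is a sketch using the figures quoted for the two disks; the function name is hypothetical. Computed without intermediate rounding, ssd4 comes out at about 9.4 ms per physical transfer (dividing by a ratio rounded to 1.3 gives 9.3 ms), and ssd3 at about 6.9 ms with an average blocksize of about 0.51 KB.

```python
def per_physical_transfer(transfers: float, ios_4kb: float, service_ms_per_4kb: float):
    """Relate service time per 4 KB I/O to native (physical) transfers.

    Returns (transfers per 4 KB I/O, ms per physical transfer, average KB per transfer).
    """
    ratio = transfers / ios_4kb              # native transfers per 4 KB I/O
    ms_per_transfer = service_ms_per_4kb / ratio
    avg_kb = 4.0 / ratio                     # average transfer size in KB
    return ratio, ms_per_transfer, avg_kb


case_study = {"ssd4": (12.69, 9.85, 12.11), "ssd3": (10.66, 1.37, 53.84)}
for disk, args in case_study.items():
    ratio, ms, kb = per_physical_transfer(*args)
    print(f"{disk}: {ratio:.2f} transfers per I/O, {ms:.1f} ms per transfer, {kb:.2f} KB")
```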
Baseline Case Study
• In fact, good (larger) blocksizes explain the good disk performance
These graphics show roughly a 2:1 ratio between I/Os and transfers, i.e. an 8 KB blocksize.
Baseline Case Study Conclusion
• Even though I/O is a major contributor to response time, there are no obvious tuning opportunities
• Continue to study the key I/O metrics over time
• Identify trends in I/O performance
What-if I/O Performance Analysis Overview
• Via the Predict model, you can change:
  • I/O patterns
    • For the system
      • Change in workload volume
      • Change in the types of workloads
    • For a disk or group of disks
      • Distribution amongst disks
    • Change in the amount of transaction I/O required
What-if I/O Performance Analysis Overview
• Via the Predict model, you can change:
  • I/O performance characteristics
    • Relative speed of I/O processing
      • Disk configuration change
    • Blocksize used
What-if I/O Performance Analysis Overview
• Predict shows how this affects performance:
  • Performance objectives
    • Workload/transaction response objectives
    • Disk utilization objectives
  • Reports I/O patterns
    • System
    • Distribution amongst disks
  • Reports individual disk performance
• Results can be viewed in Predict and/or Visualizer
What-if Case Study
• Management wants to know how performance will change if a new RAID disk technology is implemented
• Study strategy:
  • Perform a Visualizer analysis of baseline I/O performance characteristics and build the baseline model
  • Perform a Visualizer analysis of an I/O benchmark using the new disk technology (IBM "Shark")
  • Use Predict to do the what-if
What-if Case Study: Benchmark Data Analysis
• The benchmark demonstrates a substantial I/O rate
• Since the current system has high I/O rates, a subset of the benchmark will be studied
What-if Case Study: Benchmark Data Analysis
• Selected subset of the benchmark
What-if Case Study: Benchmark Data Analysis
• Key I/O characteristic: I/Os vs. transfers
The ratio of I/Os to transfers is about 5.7, or about 23 KB per native I/O access (5.7 × 4 KB).
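The benchmark characteristics reduce to two small ratios: average native transfer size (4 KB times the ratio of 4 KB I/Os to transfers) and read share. The sketch below uses the ratios quoted in the case study (5.7 I/Os per transfer, 1.5:1 reads to writes, and the 2:1 baseline ratio); the helper names are hypothetical.

```python
def avg_kb_per_transfer(ios_4kb: float, transfers: float) -> float:
    """Average native transfer size, from 4 KB I/O and transfer counts over the same interval."""
    return 4.0 * ios_4kb / transfers


def read_fraction(reads: float, writes: float) -> float:
    """Share of accesses that are reads."""
    return reads / (reads + writes)


print(round(avg_kb_per_transfer(5.7, 1.0), 1))  # 22.8, i.e. "about 23 KB"
print(round(avg_kb_per_transfer(2.0, 1.0), 1))  # 8.0, the 2:1 baseline case
print(read_fraction(1.5, 1.0))                  # 0.6
```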
What-if Case Study: Benchmark Data Analysis
• Key I/O characteristic: reads vs. writes
The ratio of reads to writes is about 1.5:1.
What-if Case Study: Benchmark Data Analysis
• Key I/O characteristic: service time for a 4 KB I/O
The predominant service time is about .5 ms.
What-if Case Study: Benchmark Data Analysis
• Key I/O characteristic: service time for a 4 KB I/O, viewed by controller, for disks over 5% utilization
Note the lower efficiency at lower I/O load.
What-if Case Study: Change Model
• Only one change is needed in the Predict model
  • Set the disk service time per I/O according to the benchmark
• DO NOT use the hardware table method, because more specific information is available
  • The hardware table method applies the ratio of the new disk type to the current disk type
  • Both disk types must be in the hardware table
  • The baseline disk type must be specified
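For contrast with the benchmark-based approach, the hardware table method can be pictured as scaling the calibrated service time by the ratio of two table entries. This is a hypothetical sketch of that idea, not Predict's internal calculation: the table values are invented, and treating the entries as per-I/O service times is an assumption.

```python
# Hypothetical hardware-table entries (ms per I/O); not real .hrw values.
HARDWARE_TABLE = {"old_disk": 8.0, "new_disk": 6.0}


def scale_by_hardware_table(baseline_service_ms: float, baseline_type: str,
                            new_type: str, table: dict[str, float]) -> float:
    """Scale a calibrated service time by the ratio of two disk types' table entries."""
    # Mirrors the slide's requirements: both types must be present,
    # and the baseline type must be known.
    if baseline_type not in table or new_type not in table:
        raise KeyError("both disk types must be in the hardware table")
    return baseline_service_ms * table[new_type] / table[baseline_type]


print(round(scale_by_hardware_table(0.65, "old_disk", "new_disk", HARDWARE_TABLE), 4))
```

When a benchmark of the actual new hardware is available, as in this case study, its measured service time is more specific than such a ratio.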
What-if Case Study: Change Model
• The model must be baselined
• Two methods for changing the service time:
  • Edit the disk service time/IO in the GUI
  • Use a command file if there are many disks
Command file format:
    MODIFY DISK hdisk10 EDISKTIME .5
    MODIFY DISK hdisk11 EDISKTIME .5
    etc.
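When many disks need the same change, the command file can be generated rather than typed. This sketch only produces text in the MODIFY DISK &lt;disk&gt; EDISKTIME &lt;ms&gt; format given on the slide; the disk names hdisk10 through hdisk13 are hypothetical, and the service time is passed as a string so it is written exactly as given (".5").

```python
def modify_disk_commands(disks: list[str], edisktime: str) -> str:
    """Build Predict command-file lines of the form MODIFY DISK <disk> EDISKTIME <ms>."""
    return "\n".join(f"MODIFY DISK {d} EDISKTIME {edisktime}" for d in disks) + "\n"


# Hypothetical disk names hdisk10..hdisk13, all set to the benchmark value.
text = modify_disk_commands([f"hdisk{n}" for n in range(10, 14)], ".5")
print(text, end="")
```

The resulting text can then be saved to a file (for example with `pathlib.Path.write_text`) and used as the command file.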
What-if Case Study: Modeling Results
• The model is evaluated and the net change is observed
<< Baseline | What-if >>
What-if Case Study: Modeling Results
• The relative reduction in response time is reported as relative response time
Reduction of 26% for the workload of interest
What-if Case Study: Modeling Results
• Why not a larger reduction?
<< Baseline | What-if >>
The new service time (and therefore utilization) is about 77% of the baseline (.5 ms / .65 ms) for the disks doing the most I/O.
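One way to build intuition for the size of the improvement is a single-disk M/M/1 approximation, where response time is service time divided by (1 minus utilization). This is a simplification chosen for illustration, not the queuing model Predict uses, and the 600 I/Os per second load is hypothetical. It shows that disk response time falls somewhat more than service time (because queueing shrinks too), while workload response time also includes CPU and other time that the disk change does not touch.

```python
def mm1_disk_response_ms(io_rate_per_sec: float, service_ms: float) -> float:
    """M/M/1 approximation of disk response time (service plus queueing), in ms."""
    utilization = io_rate_per_sec * service_ms / 1000.0
    if utilization >= 1.0:
        raise ValueError("disk is saturated (utilization >= 100%)")
    return service_ms / (1.0 - utilization)


rate = 600.0  # hypothetical I/Os per second on a busy disk
base = mm1_disk_response_ms(rate, 0.65)
whatif = mm1_disk_response_ms(rate, 0.5)
print(f"service ratio {0.5 / 0.65:.2f}, disk response ratio {whatif / base:.2f}")
```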
What-if Case Study: Modeling Results
• What else will improve performance more?
The more even I/O distribution seen in the benchmark
What-if Case Study: Modeling Results
• What else will improve performance more?
Possibly using a more optimistic service time, e.g. the .45 ms observed with CUTDISK set at 100 IO/sec. This should be confirmed with more benchmark data and/or the vendor.
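Why a CUTDISK threshold can change the apparent service time is easiest to see with an I/O-weighted average over only the busier disks. The sketch below is one plausible way to compute such a figure, not necessarily how the .45 ms value was derived; the disk data are entirely hypothetical, and "rate at or above the threshold" is an assumed cut rule.

```python
def weighted_service_ms(disks: dict[str, tuple[float, float]], min_io_rate: float = 0.0) -> float:
    """I/O-weighted mean service time over disks whose I/O rate is >= min_io_rate.

    disks maps name -> (I/Os per second, service ms per I/O).
    """
    selected = [(rate, ms) for rate, ms in disks.values() if rate >= min_io_rate]
    total_io = sum(rate for rate, _ in selected)
    if total_io == 0:
        raise ValueError("no I/O on the selected disks")
    return sum(rate * ms for rate, ms in selected) / total_io


# Hypothetical benchmark disks: (I/Os per second, ms per I/O).
bench = {"d1": (300.0, 0.40), "d2": (100.0, 0.80), "d3": (10.0, 3.0)}
print(round(weighted_service_ms(bench), 3))         # 0.561, all disks
print(round(weighted_service_ms(bench, 100.0), 3))  # 0.5, only disks >= 100 IO/sec
```

Lightly loaded disks tend to show less efficient service (as noted for the benchmark), so excluding them lowers the average.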
What-if Case Study Conclusion
• Changing to the new technology will:
  • Reduce I/O service time
  • Reduce I/O wait time
    • From reduced utilization (due to the service time decrease)
    • From better I/O distribution (due to more even utilizations)
• The reduction is not as large as expected because current I/O performance is already good (.65 ms vs. .5 ms)
• The new technology allows for additional workload growth compared with the current technology