
Understanding I/O Performance with PATROL-Perform and PATROL-Predict



Presentation Transcript


  1. Understanding I/O Performance with PATROL-Perform and PATROL-Predict • Debbie Sheetz, Sr. Staff Consultant, BMC Software

  2. I/O Performance Analysis Overview • I/O metric definitions • Baseline I/O performance analysis • What–if I/O performance analysis C4P075

  3. How Important is I/O to Performance? • Predict/Visualizer presents a unified view of the system so that the relative contributions of CPU and disk I/O can be assessed • Don’t solve a problem that you don’t have: in the example shown, CPU is the dominant factor

  4. Source of I/O Metrics • Key to understanding I/O is to know your metrics • Disks are reported/collected as they are defined/known to UNIX or NT • This may or may not correspond 1-to-1 to physical units • Disk configuration is collected from standard interface for the particular OS • Disk statistics are collected from standard interface for the particular OS (same metrics used by iostat, etc.) • Analyze/Predict interprets and reports based on these metrics

  5. I/O Configuration Collection Issues • Sometimes the disk configuration is reported as “Unknown” • Three possible causes • Disk configuration is not available from the OS • Standard interface to OS fails to return the disk configuration • Collected disk configuration is not matched by an entry in the hardware (.hrw) and .odm • RAID is not collected directly • This DOES NOT AFFECT the baseline metrics or baseline model calibration • For certain ‘what-if’ disk modeling scenarios, the disk must be identified

  6. Key I/O Metrics • A few metrics tell most of the story about disk I/O • Disk throughput • Data transferred (e.g. bytes, words, etc.) • Disk reads/writes • Disk accesses • Disk utilization (active time)

  7. I/O Metrics: Throughput • Data transferred (e.g. bytes, words, etc.) • PATROL-Perform and Predict report I/Os in 4 KB units • Consistency for reporting (Analyze, Visualizer, Predict) • Ease of modeling I/O cross-node and cross-platform • Units measured vary by platform • HP, OSF: words (Disk Statistics, Words Xfered) • Solaris, AIX: blocks (Disk Statistics, Blocks Read/Written) • NT: bytes (NT Physical Disk, Disk Read/Write)
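
The 4 KB normalization described on this slide can be sketched in a few lines. This is only an illustration of the arithmetic, not the collector's actual code; the function name `to_4kb_ios` is hypothetical, and the 512-byte block size used in the example is an assumption (the slides do not give the block or word size for any platform, so the caller has to supply it).

```python
IO_UNIT_BYTES = 4 * 1024  # the 4 KB reporting unit named on the slide


def to_4kb_ios(amount, unit_bytes):
    """Convert a raw transfer count (bytes, blocks, or words) to 4 KB I/O units.

    unit_bytes is the size of one native unit in bytes. It is platform
    specific and must come from the OS documentation, not from this sketch.
    """
    if unit_bytes <= 0:
        raise ValueError("unit_bytes must be positive")
    return amount * unit_bytes / IO_UNIT_BYTES


# NT reports bytes, so one native unit is one byte.
nt_ios = to_4kb_ios(8192, unit_bytes=1)

# ASSUMPTION for illustration only: a platform reporting 512-byte blocks.
block_ios = to_4kb_ios(16, unit_bytes=512)

print(nt_ios, block_ios)
```

Both sample calls describe the same 8 KB of data, so both yield two 4 KB I/O units, which is the cross-platform consistency the slide is after.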

  8. I/O Metrics: Throughput • Disk accesses (i.e. transfers) • Number of times an I/O request was made of the disk • Size of data transfer can vary • Doesn’t matter where the I/O is actually serviced: • Physical disk (seek, latency, and data transfer) • Cache on the disk • Cache on the disk controller • Doesn’t matter whether RAID or non-RAID • Similar metrics collected for UNIX/NT • UNIX: Disk Statistics, Transfers • NT: NT Physical Disk, Disk Transfers/Sec

  9. I/O Metrics: Throughput • Disk reads/writes • Number of times a read vs. write I/O request was made of the disk • Size of data transfer can vary • Different metrics collected for UNIX/NT • Solaris, AIX: Disk Statistics, Blocks Read/Written • HP, OSF: Not Available • NT: NT Physical Disk, Disk Read/Write Bytes/Sec • Reported in Analyze/Predict as I/O rates in 4 KB units

  10. I/O Metrics: Utilization (Active Time) • Disk utilization (active time) • Amount of time disk was observed to be actively servicing an I/O request • Doesn’t matter where the I/O is actually serviced: • Physical disk (seek, latency, and data transfer) • Cache on the disk • Cache on the disk controller • Doesn’t matter whether RAID or non-RAID • Should reflect the relative efficiency of I/O processing when compared with disk throughput measures • Use disk service time for this (service time = utilization / I/O rate)
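
The service-time relation on this slide is simple enough to check by hand. A minimal sketch, assuming utilization is given as a percentage and the I/O rate as 4 KB I/Os per second; the function name and the sample figures are hypothetical, not taken from the presentation.

```python
def service_time_ms(utilization_pct, ios_per_sec):
    """Average disk service time per I/O, in milliseconds.

    service time = utilization / I/O rate, where utilization is the
    fraction of time the disk was busy and the rate is I/Os per second.
    """
    if ios_per_sec <= 0:
        raise ValueError("ios_per_sec must be positive")
    busy_fraction = utilization_pct / 100.0
    return busy_fraction / ios_per_sec * 1000.0


# Hypothetical disk: 20% busy while handling 100 I/Os per second,
# which works out to roughly 2 ms of busy time per I/O.
print(round(service_time_ms(20.0, 100.0), 3))
```

Because utilization and throughput are both measured at the OS level, the result covers whatever actually serviced the request (platter, disk cache, or controller cache), as the slide notes.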

  11. I/O Metrics: Utilization (Active Time) • Disk active time • Different metrics collected for UNIX/NT • UNIX: Disk Statistics, Active Time • NT: NT Physical Disk, % Disk Time • Windows 2000: NT Physical Disk, % Idle Time • Windows/NT metrics are reinterpreted by Analyze • Perfmon caps calculated utilization at 100% • Observations of collected Windows/NT disk data show utilizations well over 100% • Analyze scales all collected NT times down • As a result, Perfmon and Analyze/Predict values will not match

  12. I/O Metrics Collection Issues • If “iostat” can’t see it, the collector can’t collect it • The OS is supplying the metrics • If the metrics are missing or incorrect, both “iostat” and PATROL-Perform/Predict, etc. will report the same • Problem needs to be addressed by the OS vendor • Refer any questions about valid I/O metrics to BMC Technical Support • Always need to know the exact platform (e.g. HP 11.00, 64-bit) • Run iostat and the collector in parallel • Use current collector for the platform

  13. Baseline I/O Performance Analysis Overview • Observe key disk I/O metrics from baseline measurements • Identify I/O patterns • For the system • For a disk or group of disks • Distribution amongst disks • For a workload/transaction • Determine how important I/O is to overall performance

  14. Baseline I/O Performance Analysis Overview • Observe key disk I/O metrics from baseline measurements • Identify I/O performance characteristics • Relative speed of I/O processing • Read/write ratios • Blocksize used • Disk utilization objectives • Distribution amongst disks

  15. Baseline Case Study • CPU pattern doesn’t precisely match I/O pattern

  16. Baseline Case Study • I/O is dominated by one Oracle instance, but there are other contributors • Study patterns within days and across days, weeks, etc.

  17. Baseline Case Study • I/O is the major component of response time during prime time

  18. Baseline Case Study • Distribution of I/O amongst disks is fairly even

  19. I/O Analysis Technique: CUTDISK • How to filter I/O data so only the important disks are studied? • Use “CUT DISK” feature • In Analyze • In Manager • If it is already specified in the .an file input to Manager, a separate Manager specification is not needed • Benefits: Analyze/Predict reports shorter, Visualizer files smaller, Visualizer database smaller, Visualizer graphics easier to present

  20. I/O Analysis Technique: CUTDISK • Concept is to aggregate I/O from less utilized disks, preserve important disks individually • I/Os are NOT removed from the model • Choose appropriate threshold • I/O rate or Disk utilization may be used • Threshold value can be set for a specific purpose • Setting of 0 removes only disks which are not used at all • Setting of 5% utilization removes most disks • Paging disks are never removed
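
The CUTDISK idea (keep busy disks individually, fold the rest into one aggregate, never lose I/O) can be illustrated with a small sketch. This is not the Analyze or Manager implementation; the function `cut_disks`, the data layout, and the sample disks are all hypothetical, and a utilization threshold is used here although the slide says an I/O rate threshold is also possible.

```python
def cut_disks(disks, threshold_pct, paging_disks=()):
    """Split disks into those kept individually and an aggregated remainder.

    disks maps a disk name to (utilization_pct, io_rate).
    A disk is kept when its utilization exceeds the threshold or when it is
    a paging disk. I/O from every other disk is summed, not discarded.
    """
    kept = {}
    aggregated_io_rate = 0.0
    for name, (utilization_pct, io_rate) in disks.items():
        if utilization_pct > threshold_pct or name in paging_disks:
            kept[name] = (utilization_pct, io_rate)
        else:
            aggregated_io_rate += io_rate
    return kept, aggregated_io_rate


# Hypothetical measurements: (utilization %, I/Os per second).
sample = {
    "hdisk0": (35.0, 120.0),
    "hdisk1": (2.0, 6.0),
    "hdisk2": (0.0, 0.0),
    "hdisk3": (1.0, 3.0),  # lightly used, but treated as a paging disk
}
kept, other_rate = cut_disks(sample, threshold_pct=5.0, paging_disks={"hdisk3"})
print(sorted(kept), other_rate)
```

With a threshold of 0 the same function folds away only disks that did no work at all, matching the "setting of 0" behavior described on the slide.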

  21. I/O Analysis Technique: CUTDISK • Specify under Options, Cut Disk Options in Analyze

  22. I/O Analysis Technique: CUTDISK • Specify under Options, Advanced Features in Manager

  23. Baseline Case Study • Observe disk utilization patterns: utilizations are mostly even, and most are under 40%

  24. Baseline Case Study • Observe disk processing efficiency: looks good! Most service times are under 5 ms per 4 KB transfer. A few outliers could use a closer look …

  25. Baseline Case Study • Look at ssd4 • The high service time isn’t so high after all: 12.69 transfers divided by 9.85 I/Os is 1.3. That means the 12.11 ms service time covers 1.3 actual data transfers, or 9.3 ms per physical transfer.

  26. Baseline Case Study • Look at ssd3 • The high service time isn’t really high here either: 10.66 transfers divided by 1.37 I/Os is 7.8. That means the 53.84 ms service time covers 7.8 actual data transfers, or 6.9 ms per physical transfer. Another way to think about this is that the average blocksize is 4 KB / 7.8, or about 0.5 KB.
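
The ssd4 and ssd3 arithmetic from the two slides above can be reproduced directly. A minimal sketch using only the figures quoted in the slides; the function name is hypothetical. Note that the slides round the transfers-per-I/O ratio before dividing, so the unrounded ssd4 result comes out near 9.4 ms rather than the quoted 9.3 ms.

```python
def per_transfer_ms(transfers_per_sec, ios_per_sec, service_ms_per_io):
    """Return (transfers per 4 KB I/O, service time per physical transfer in ms)."""
    transfers_per_io = transfers_per_sec / ios_per_sec
    return transfers_per_io, service_ms_per_io / transfers_per_io


# Figures quoted on the slides for each disk:
# (transfers, 4 KB I/Os, service time in ms per 4 KB I/O)
for disk, figures in {"ssd4": (12.69, 9.85, 12.11), "ssd3": (10.66, 1.37, 53.84)}.items():
    ratio, ms = per_transfer_ms(*figures)
    avg_block_kb = 4.0 / ratio  # each I/O is 4 KB spread over `ratio` transfers
    print(f"{disk}: {ratio:.2f} transfers per I/O, "
          f"{ms:.1f} ms per transfer, {avg_block_kb:.2f} KB average block")
```

The point of the calculation is that a service time quoted per 4 KB unit looks inflated whenever the native transfers are smaller than 4 KB.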

  27. Baseline Case Study • In fact, good (larger) blocksizes explain the good disk performance: these graphics show roughly a 2:1 ratio between I/Os and transfers, or an 8 KB blocksize

  28. Baseline Case Study Conclusion • Even though I/O is a major contributor to response time, there are no obvious tuning opportunities • Continue to study the key I/O metrics over time • Identify trends in I/O performance

  29. What-if I/O Performance Analysis Overview • Via the Predict model, you can change: • I/O patterns • For the system • Change in workload volume • Change in the types of workloads • For a disk or group of disks • Distribution amongst disks • Change in amount of transaction I/O required

  30. What-if I/O Performance Analysis Overview • Via the Predict model, you can change: • I/O performance characteristics • Relative speed of I/O processing • Disk configuration change • Blocksize used

  31. What-if I/O Performance Analysis Overview • Predict shows how this affects performance • Performance objectives • Workload/transaction response objectives • Disk utilization objectives • Reports I/O patterns • System • Distribution amongst disks • Reports individual disk performance • Can view results in Predict and/or Visualizer

  32. What-if Case Study • Management wants to know how performance will change if a new RAID disk technology is implemented • Study strategy • Perform Visualizer analysis of baseline I/O performance characteristics, build baseline model • Perform Visualizer analysis of benchmark of I/O using new disk technology (IBM “Shark”) • Use Predict to do ‘what-if’

  33. What-if Case Study: Benchmark Data Analysis • Benchmark demonstrates substantial I/O rate • Since current system has high I/O rates, a subset of the benchmark will be studied

  34. What-if Case Study: Benchmark Data Analysis • Selected subset of the benchmark

  35. What-if Case Study: Benchmark Data Analysis • Key I/O characteristic: I/Os vs. transfers • Ratio of I/Os to transfers is about 5.7, or about 23 KB per native I/O access
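
Since every reported I/O is a 4 KB unit, the ratio of I/Os to native transfers gives the average native transfer size directly. A short sketch of that conversion, with a hypothetical function name, checked against the two ratios quoted in the slides (about 2 in the baseline and about 5.7 in the benchmark):

```python
IO_UNIT_KB = 4  # size of one reported I/O, per the slides


def avg_kb_per_transfer(ios_per_transfer):
    """Average native transfer size in KB, given 4 KB I/Os per native transfer."""
    return IO_UNIT_KB * ios_per_transfer


print(avg_kb_per_transfer(2))           # baseline ratio of about 2:1
print(round(avg_kb_per_transfer(5.7)))  # benchmark ratio of about 5.7
```

The benchmark therefore moves considerably more data per native access than the baseline system, which is worth keeping in mind when comparing per-4 KB service times between the two.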

  36. What-if Case Study: Benchmark Data Analysis • Key I/O characteristic: reads vs. writes • Ratio of reads to writes is about 1.5:1

  37. What-if Case Study: Benchmark Data Analysis • Key I/O characteristic: service time for 4 KB I/O • Predominant service time is about .5 ms

  38. What-if Case Study: Benchmark Data Analysis • Key I/O characteristic: service time for 4 KB I/O • View by controller, disks over 5% utilization • Note less efficiency at lower I/O load

  39. What-if Case Study: Change Model • Only one change is needed in the Predict model • Set the disk service time/IO according to the benchmark • DO NOT use the hardware table method because more specific info is available • Hardware table method applies ratio of new disk type to current disk type • Both disk types must be in the hardware table • Baseline disk type must be specified

  40. What-if Case Study: Change Model • Model must be baselined • Two methods for changing service time • Edit the disk service time/IO in the GUI • Use a command file if there are many disks • Command file format:
      MODIFY DISK hdisk10 EDISKTIME .5
      MODIFY DISK hdisk11 EDISKTIME .5
      etc.
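
When many disks need the same change, the command file can be generated rather than typed. The sketch below only reproduces the two example lines shown on the slide; the function name and the output file name are hypothetical, and the full Predict command syntax should be checked against the product documentation. The service time is passed as a string so that the value is written exactly as given (for example `.5`).

```python
def modify_disk_commands(disk_names, service_time):
    """Build one MODIFY DISK line per disk, in the format shown on the slide.

    service_time is a string so it is emitted verbatim.
    """
    return [f"MODIFY DISK {name} EDISKTIME {service_time}" for name in disk_names]


# Disk names taken from the slide's example.
lines = modify_disk_commands(["hdisk10", "hdisk11"], ".5")
print("\n".join(lines))

# Optionally write the lines to a file; the file name is an assumption.
# with open("whatif_disks.cmd", "w") as handle:
#     handle.write("\n".join(lines) + "\n")
```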

  41. What-if Case Study: Modeling Results • Model is evaluated and net change is observed (baseline vs. what-if charts)

  42. What-if Case Study: Modeling Results • Relative reduction in response time is reported as relative response time • Reduction of 26% for the workload of interest

  43. What-if Case Study: Modeling Results • Why not a larger reduction? (baseline vs. what-if charts) • New service time/utilization is roughly three quarters of baseline (.5 ms / .65 ms ≈ 0.77) for the disks doing the most I/O

  44. What-if Case Study: Modeling Results • What else will improve performance more? • More even I/O distribution in benchmark

  45. What-if Case Study: Modeling Results • What else will improve performance more? • Possible use of a more optimistic service time, e.g. .45 ms observed with CUTDISK set at 100 IO/sec • Should confirm with more benchmark data and/or vendor

  46. What-if Case Study Conclusion • Change to new technology will • Reduce I/O service time • Reduce I/O wait time • From reduced utilization (due to service time decrease) • From better I/O distribution (due to more even utilizations) • Reduction not as large as expected because current I/O performance is already good (.65 ms vs. .5 ms) • Allows for additional workload growth compared with current technology
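
The link between service time, utilization, and wait time in this conclusion can be illustrated with a textbook single-server queue approximation, R = S / (1 - U) with U = rate × S. This is an assumption made for illustration; the slides do not say which formulas Predict uses, and the 600 I/Os per second load below is hypothetical. Only the .65 ms and .5 ms service times come from the presentation.

```python
def response_ms(service_ms, ios_per_sec):
    """Approximate disk response time (service + wait) for one disk.

    ASSUMPTION: open single-server queue, R = S / (1 - U), U = rate * S.
    This is a generic approximation, not Predict's documented algorithm.
    """
    utilization = ios_per_sec * service_ms / 1000.0
    if utilization >= 1.0:
        raise ValueError("disk would be saturated at this load")
    return service_ms / (1.0 - utilization)


baseline = response_ms(0.65, 600.0)  # baseline service time from the slides
what_if = response_ms(0.50, 600.0)   # benchmark service time from the slides

print(f"service time ratio:  {0.50 / 0.65:.2f}")
print(f"response time ratio: {what_if / baseline:.2f}")
```

Under this approximation the response time falls somewhat more than the service time alone, because lower utilization also shortens the queue, but the gain stays modest when the baseline service time is already as low as .65 ms.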
