Understanding I/O Performance with PATROL-Perform and PATROL-Predict

Debbie Sheetz

Sr. Staff Consultant

BMC Software


I/O Performance Analysis Overview

  • I/O metric definitions

  • Baseline I/O performance analysis

  • What-if I/O performance analysis

C4P075


How Important is I/O to Performance?

  • Predict/Visualizer presents a unified view of the system so that the relative contributions of CPU and disk I/O can be assessed

    • Don’t solve a problem that you don’t have

CPU is the dominant factor here


Source of I/O Metrics

  • Key to understanding I/O is to know your metrics

    • Disks are reported/collected as they are defined/known to UNIX or NT

      • This may or may not correspond 1-to-1 to physical units

      • Disk configuration is collected from the standard interface for the particular OS

      • Disk statistics are collected from the standard interface for the particular OS (the same metrics used by iostat, etc.)

    • Analyze/Predict interprets and reports based on these metrics


I/O Configuration Collection Issues

  • Sometimes the disk configuration is reported as “Unknown”

    • Three possible causes

      • Disk configuration is not available from the OS

      • Standard interface to OS fails to return the disk configuration

      • Collected disk configuration is not matched by an entry in the hardware table (.hrw) and .odm

  • RAID is not collected directly

  • This DOES NOT AFFECT the baseline metrics or baseline model calibration

    • For certain ‘what-if’ disk modeling scenarios, the disk must be identified


Key I/O Metrics

  • A few metrics tell most of the story about disk I/O

    • Disk throughput

      • Data transferred (e.g. bytes, words, etc.)

        • Disk reads/writes

      • Disk accesses

    • Disk utilization (active time)


I/O Metrics: Throughput

  • Data transferred (e.g. bytes, words, etc.)

    • PATROL-Perform and Predict report I/Os in 4 KB units

      • Consistency for reporting (Analyze, Visualizer, Predict)

      • Ease of modeling I/O cross-node and cross-platform

    • Units measured vary by platform

      • HP, OSF: words (Disk Statistics, Words Xfered)

      • Solaris, AIX: blocks (Disk Statistics, Blocks Read/Written)

      • NT: bytes (NT Physical Disk, Disk Read/Write)
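
The normalization to 4 KB units described above can be sketched as follows. This is only an illustration: the function name `to_4kb_ios` is hypothetical, and the unit sizes passed in (for example 512-byte blocks) are assumptions, since the slides name the native units but not their sizes or the collector's actual conversion.

```python
IO_UNIT_BYTES = 4 * 1024  # one reported I/O = 4 KB, per the slides


def to_4kb_ios(amount: float, unit_size_bytes: int) -> float:
    """Convert a platform-native transfer amount into 4 KB I/O units.

    unit_size_bytes is supplied by the caller; the values used in the
    examples below are assumptions, not documented collector constants.
    """
    if unit_size_bytes <= 0:
        raise ValueError("unit_size_bytes must be positive")
    return amount * unit_size_bytes / IO_UNIT_BYTES


# NT reports bytes directly, so the unit size is 1 byte.
nt_ios = to_4kb_ios(8192, 1)

# Hypothetical 512-byte blocks for a block-reporting platform.
block_ios = to_4kb_ios(16, 512)
```

Passing the unit size explicitly keeps the platform-specific assumption visible at the call site instead of hiding it in a lookup table.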


I/O Metrics: Throughput

  • Disk accesses (i.e. transfers)

    • Number of times an I/O request was made of the disk

      • Size of data transfer can vary

      • Doesn’t matter where the I/O is actually serviced:

        • Physical disk (seek, latency, and data transfer)

        • Cache on the disk

        • Cache on the disk controller

      • Doesn’t matter whether RAID or non-RAID

    • Similar metrics collected for UNIX/NT

      • UNIX: Disk Statistics, Transfers

      • NT: NT Physical Disk, Disk Transfers/Sec


I/O Metrics: Throughput

  • Disk reads/writes

    • Number of times a read vs. write I/O request was made of the disk

      • Size of data transfer can vary

    • Different metrics collected for UNIX/NT

      • Solaris, AIX: Disk Statistics, Blocks Read/Written

      • HP, OSF: not available

      • NT: NT Physical Disk, Disk Read/Write Bytes/Sec

  • Reported in Analyze/Predict as I/O rates, where one I/O = 4 KB


I/O Metrics: Utilization (Active Time)

  • Disk utilization (active time)

    • Amount of time disk was observed to be actively servicing an I/O request

      • Doesn’t matter where the I/O is actually serviced:

        • Physical disk (seek, latency, and data transfer)

        • Cache on the disk

        • Cache on the disk controller

      • Doesn’t matter whether RAID or non-RAID

    • Should reflect the relative efficiency of I/O processing when compared with disk throughput measures

      • Use disk service time for this (service time = utilization / IOs)
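
As a minimal sketch of the formula above (service time = utilization / IOs), assuming utilization is expressed as a busy fraction between 0 and 1 and the I/O rate is in 4 KB I/Os per second. The function name and the example numbers are hypothetical, not taken from the product.

```python
def service_time_ms(utilization: float, ios_per_sec: float) -> float:
    """Average disk service time per I/O, in milliseconds.

    utilization: fraction of time the disk was busy (assumed 0.0 to 1.0).
    ios_per_sec: I/O rate in 4 KB I/Os per second.
    """
    if ios_per_sec <= 0:
        raise ValueError("ios_per_sec must be positive")
    # Busy seconds per second divided by I/Os per second = seconds per I/O.
    return utilization / ios_per_sec * 1000.0


# Hypothetical disk: 30% busy while handling 100 I/Os per second.
example_ms = service_time_ms(0.30, 100.0)
```

For this hypothetical disk the result is about 3 ms per I/O. Two disks with the same utilization but different I/O rates will show different service times, which is why the ratio is a better efficiency measure than utilization alone.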


I/O Metrics: Utilization (Active Time)

  • Disk active time

    • Different metrics collected for UNIX/NT

      • UNIX: Disk Statistics, Active Time

      • NT: NT Physical Disk, % Disk Time

      • Windows 2000: NT Physical Disk, % Idle Time

    • Windows/NT metrics are reinterpreted by Analyze

      • Perfmon caps calculated utilization at 100%

      • Observations of collected Windows/NT disk data show utilizations well over 100%

      • Analyze scales all collected NT times down

      • Perfmon and Analyze/Predict will not match


I/O Metrics Collection Issues

  • If “iostat” can’t see it, the collector can’t collect it

    • The OS is supplying the metrics

    • If the metrics are missing or incorrect, both “iostat” and PATROL-Perform/Predict will show the same missing or incorrect values

    • Problem needs to be addressed by the OS vendor

  • Refer any questions about valid I/O metrics to BMC Technical Support

    • Always need to know the exact platform (e.g. HP 11.00, 64-bit)

    • Run iostat and the collector in parallel

    • Use current collector for the platform


Baseline I/O Performance Analysis Overview

  • Observe key disk I/O metrics from baseline measurements

    • Identify I/O patterns

      • For the system

      • For a disk or group of disks

        • Distribution amongst disks

      • For a workload/transaction

    • Determine how important I/O is to overall performance


Baseline I/O Performance Analysis Overview

  • Observe key disk I/O metrics from baseline measurements

    • Identify I/O performance characteristics

      • Relative speed of I/O processing

      • Read/write ratios

      • Blocksize used

      • Disk utilization objectives

        • Distribution amongst disks


Baseline Case Study

CPU pattern doesn’t precisely match I/O pattern


Baseline Case Study

  • I/O is dominated by one Oracle instance, but there are other contributors

  • Study patterns within days and across days, weeks, etc.


Baseline Case Study

  • I/O is the major component of response time during prime time


Baseline Case Study

Distribution of I/O amongst disks is fairly even


I/O Analysis Technique: CUTDISK

  • How to filter I/O data so only the important disks are studied?

  • Use “CUT DISK” feature

    • In Analyze

    • In Manager

    • If already specified in the .an file input to Manager, a separate Manager specification is not needed

  • Result: shorter Analyze/Predict reports, smaller Visualizer files, a smaller Visualizer database, and Visualizer graphics that are easier to present


I/O Analysis Technique: CUTDISK

  • The concept is to aggregate I/O from less-utilized disks while preserving important disks individually

  • I/Os are NOT removed from the model

  • Choose appropriate threshold

    • I/O rate or Disk utilization may be used

    • Threshold value can be set for a specific purpose

      • Setting of 0 removes only disks which are not used at all

      • Setting of 5% utilization removes most disks

      • Paging disks are never removed
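
The aggregation idea above can be illustrated with a short sketch. It is not the Analyze or Manager implementation: the `Disk` type, the `cut_disks` function, and the rule that a disk exactly at the threshold is aggregated are all assumptions, and only the I/O rate is summed because the slides do not say how the aggregate's other metrics are derived.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    name: str
    utilization_pct: float
    io_rate: float          # 4 KB I/Os per second
    is_paging: bool = False


def cut_disks(disks, threshold_pct):
    """Split disks into those kept individually and one aggregate bucket."""
    kept = []
    aggregate = {"count": 0, "io_rate": 0.0}
    for disk in disks:
        # Paging disks are never cut; other disks must exceed the threshold.
        if disk.is_paging or disk.utilization_pct > threshold_pct:
            kept.append(disk)
        else:
            # The I/O stays in the model, folded into the aggregate.
            aggregate["count"] += 1
            aggregate["io_rate"] += disk.io_rate
    return kept, aggregate


# Hypothetical disks, for illustration only.
disks = [
    Disk("d1", 35.0, 120.0),
    Disk("d2", 2.0, 5.0),
    Disk("d3", 0.0, 0.0),
    Disk("swap", 1.0, 3.0, is_paging=True),
]
kept, rest = cut_disks(disks, threshold_pct=5.0)
```

With a threshold of 0, only the completely unused disk (`d3` here) would be folded into the aggregate, which matches the behavior described for a setting of 0.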


I/O Analysis Technique: CUTDISK

  • Specify under Options, Cut Disk Options in Analyze


I/O Analysis Technique: CUTDISK

  • Specify under Options, Advanced Features in Manager


Baseline Case Study

  • Observe Disk Utilization patterns

Utilizations mostly even, most under 40%


Baseline Case Study

  • Observe Disk processing efficiency

Looks good! Most service times under 5 ms per 4 KB transfer. A few outliers could use a closer look …


Baseline Case Study

  • Look at ssd4

High service time isn’t so high after all: 12.69 transfers divided by 9.85 I/Os is 1.3. That means the 12.11 ms service time covers 1.3 actual data transfers, or 9.3 ms per physical transfer.
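
The ssd4 arithmetic can be reproduced with a small sketch. The function name is hypothetical, and the inputs are assumed to be the transfer count, the 4 KB I/O count, and the service time per 4 KB I/O as read from the report.

```python
def ms_per_physical_transfer(service_ms_per_io: float,
                             transfers: float,
                             ios: float) -> float:
    """Rescale service time per 4 KB I/O to service time per native transfer."""
    if transfers <= 0 or ios <= 0:
        raise ValueError("transfers and ios must be positive")
    transfers_per_io = transfers / ios
    return service_ms_per_io / transfers_per_io


# ssd4 figures from the slide.
ssd4_ms = ms_per_physical_transfer(12.11, transfers=12.69, ios=9.85)
```

Using the unrounded ratio (about 1.29) gives roughly 9.4 ms. The slide's 9.3 ms comes from rounding the ratio to 1.3 first, and either way the conclusion is the same.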


Baseline Case Study

  • Look at ssd3

High service time isn’t really high here either: 10.66 transfers divided by 1.37 I/Os is 7.8. That means the 53.84 ms service time covers 7.8 actual data transfers, or 6.9 ms per physical transfer. Another way to think about this is that the average blocksize is 4 KB / 7.8, or about .5 KB.
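
The blocksize estimate can be written as a one-line calculation. This is a sketch under the assumption that I/Os are counted in 4 KB units and transfers are native disk accesses. The function name is hypothetical.

```python
IO_UNIT_KB = 4.0  # one reported I/O = 4 KB


def average_blocksize_kb(ios: float, transfers: float) -> float:
    """Average KB moved per native transfer, given 4 KB I/O units."""
    if transfers <= 0:
        raise ValueError("transfers must be positive")
    return IO_UNIT_KB * ios / transfers


# ssd3 figures from the slide: 1.37 I/Os and 10.66 transfers.
ssd3_kb = average_blocksize_kb(ios=1.37, transfers=10.66)
```

For ssd3 this gives about 0.5 KB per transfer. The same function covers the opposite case, where I/Os outnumber transfers and the average blocksize is larger than 4 KB.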


Baseline Case Study

  • In fact, good (larger) blocksizes explain the good disk performance

These graphics show roughly a 2:1 ratio between I/Os and transfers, or an 8 KB blocksize


Baseline Case Study Conclusion

  • Even though I/O is a major contributor to response time, there are no obvious tuning opportunities

  • Continue to study the key I/O metrics over time

    • Identify trends in I/O performance


What-if I/O Performance Analysis Overview

  • Via the Predict model, you can change:

    • I/O patterns

      • For the system

        • Change in workload volume

        • Change in the types of workloads

      • For a disk or group of disks

        • Distribution amongst disks

      • Change in amount of transaction I/O required


What-if I/O Performance Analysis Overview

  • Via the Predict model, you can change:

    • I/O performance characteristics

      • Relative speed of I/O processing

        • Disk configuration change

        • Blocksize used


What-if I/O Performance Analysis Overview

  • Predict shows how this affects performance

    • Performance objectives

      • Workload/transaction response objectives

      • Disk utilization objectives

    • Reports I/O patterns

      • System

      • Distribution amongst disks

    • Reports individual disk performance

    • Can view results in Predict and/or Visualizer


What-if Case Study

  • Management wants to know how performance will change if a new RAID disk technology is implemented

  • Study strategy

    • Perform Visualizer analysis of baseline I/O performance characteristics, build baseline model

    • Perform Visualizer analysis of benchmark of I/O using new disk technology (IBM “Shark”)

    • Use Predict to do ‘what-if’


What-if Case Study: Benchmark Data Analysis

  • Benchmark demonstrates substantial I/O rate

  • Since current system has high I/O rates, a subset of the benchmark will be studied


What-if Case Study: Benchmark Data Analysis

  • Selected subset of the benchmark


What-if Case Study: Benchmark Data Analysis

  • Key I/O characteristic: I/Os vs. transfers

Ratio of I/Os to transfers is about 5.7:1, or roughly 23 KB (5.7 x 4 KB) per native I/O access


What-if Case Study: Benchmark Data Analysis

  • Key I/O characteristic: reads vs. writes

Ratio of reads to writes is about 1.5:1


What-if Case Study: Benchmark Data Analysis

  • Key I/O characteristic: service time for 4 KB I/O

Predominant service time is about .5 ms


What-if Case Study: Benchmark Data Analysis

  • Key I/O characteristic: service time for 4 KB I/O

View by controller, disks over 5% utilization

Note the lower efficiency at lower I/O load


What-if Case Study: Change Model

  • Only one change is needed in the Predict model

    • Set the disk service time/IO according to the benchmark

    • DO NOT use the hardware table method because more specific info is available

      • Hardware table method applies ratio of new disk type to current disk type

      • Both disk types must be in the hardware table

      • Baseline disk type must be specified


What-if Case Study: Change Model

  • Model must be baselined

  • Two methods for changing service time

    • Edit the disk service time/IO in the GUI

    • Use a command file if there are many disks

Command file format

MODIFY DISK hdisk10

EDISKTIME .5

MODIFY DISK hdisk11

EDISKTIME .5

Etc.
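
When there are many disks, a command file in the format shown above could be generated rather than typed by hand. The sketch below assumes only the two-line `MODIFY DISK` / `EDISKTIME` pattern from the slide. The helper name is hypothetical, and the service time is passed as text so the value is written exactly as given (for example `.5`), since the slide does not say which number formats the tool accepts.

```python
def build_command_file(disk_names, service_time_text):
    """Emit one MODIFY DISK / EDISKTIME pair per disk, as in the slide."""
    lines = []
    for name in disk_names:
        lines.append(f"MODIFY DISK {name}")
        lines.append(f"EDISKTIME {service_time_text}")
    return "\n".join(lines) + "\n"


# Disk names taken from the slide's example.
command_text = build_command_file(["hdisk10", "hdisk11"], ".5")
```

The resulting text can then be saved to a file and supplied to Predict in whatever way the installed version expects.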


What-if Case Study: Modeling Results

  • Model is evaluated and net change is observed

<< Baseline

What-if >>


What-if Case Study: Modeling Results

  • The relative reduction in response time is reported as relative response time

Reduction of 26% for the workload of interest


What-if Case Study: Modeling Results

  • Why not a larger reduction?

<< Baseline

What-if >>

New service time/utilization is about 77% of baseline (.5 ms / .65 ms) for the disks doing the most I/O


What-if Case Study: Modeling Results

  • What else will improve performance more?

More even I/O distribution in benchmark


What-if Case Study: Modeling Results

  • What else will improve performance more?

Possible use of more optimistic service time, e.g. .45 ms observed with CUTDISK set at 100 IO/sec

Should confirm with more benchmark data and/or vendor


What-if Case Study Conclusion

  • Changing to the new technology will

    • Reduce I/O service time

    • Reduce I/O wait time

      • From reduced utilization (due to service time decrease)

      • From better I/O distribution (due to more even utilizations)

  • Reduction not as large as expected because current I/O performance is already good (.65 ms vs. .5 ms)

  • Allows for additional workload growth compared with current technology
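
To see why a lower service time also reduces wait time, here is a rough sketch using a textbook single-server queue (M/M/1), where response time is S / (1 - U). This is an assumption for illustration only: the slides do not describe Predict's internal model, and the 600 I/Os per second rate is hypothetical. Only the .65 ms and .5 ms service times come from the case study.

```python
def disk_response_ms(service_ms: float, ios_per_sec: float) -> float:
    """M/M/1 response time per I/O: service time divided by (1 - utilization)."""
    utilization = ios_per_sec * service_ms / 1000.0
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return service_ms / (1.0 - utilization)


RATE = 600.0  # hypothetical I/O rate, not taken from the slides

baseline_ms = disk_response_ms(0.65, RATE)
what_if_ms = disk_response_ms(0.50, RATE)

# Wait (queueing) time is response time minus service time.
baseline_wait_ms = baseline_ms - 0.65
what_if_wait_ms = what_if_ms - 0.50
```

In this simplified model the wait time falls faster than the service time, because the shorter service time also lowers utilization. The same effect leaves headroom for workload growth before waits return to the baseline level.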
