Performance Troubleshooting Valentin Bondzio, Research Engineer
Agenda Approaching Performance Issues Esxtop introduction (Troubleshooting Examples) References
Approaching Performance Issues Perception “XY Problem” Comparison Benchmark / Dependencies Tools
Perception • Subjective and generic, not quantified: • “A user reported that his application is slow.” • “Our VM used to be much faster.” • Somewhat quantified, only symptoms: • “CPU usage during a network file transfer is 90%.” • “The VM seems to hang whenever I start a program.” • Symptoms might be miles away from the root cause, e.g.: • A VM has a noticeable time-skew and lags behind • Root cause: antivirus scans running at the same time put stress on the storage
“XY Problem” • Wrong conclusions about the issue lead to the wrong questions • An example: <problem X> Alice’s car won’t start <problem Y> She asks Bob to help her replace the battery … <problem Y> The car still does not start <problem X> The real issue is no gas in the tank • Keeping an open mind will reduce the time wasted • Approach the issue from all sides and don’t rush to conclusions • Take note of all the symptoms and the state of the environment
Comparisons • Common comparisons are: • The old system vs. the new one • A physical vs. a virtual system • This usually means different settings or underlying hardware • Example: • The CPUs in the old box might be 2 generations behind, but it has twice as many • The underlying RAID layout in the new system is different • Do not compare apples to oranges • Make sure the workload / benchmark is consistent and repeatable • Keep the configuration as similar as possible, example: • A 4 core physical system should be compared to a 4 vCPU VM that is not contended
Benchmarks / Dependencies • Is the benchmark reproducible? • Do not use the live system where e.g. the amount of users might vary • Be aware that most benchmarks stress multiple components, e.g.: • IO tests from within the VM will also stress the CPU • A file copy over the network could also be affected by the storage speed (r/w) • The goal is to find the bottleneck • identify the workload pattern of the production system • benchmark the components (CPU, Memory, Network, Disk) on their own
Tools • Performance Charts in vCenter Server • use it to check for patterns across multiple VMs / Hosts / Datastores • compare current loads to ones in the past • esxtop • our “goto” tool, enough granularity for 99% of issues • vscsiStats • Identify IO pattern • In Guest Tools • Iometer / Iozone • Process Explorer / atop
Esxtop introduction Navigation CPU Memory
Navigation (“V”) • ‘V’ shows VMs only
Navigation (Views and Fields) • Esxtop Views • c:cpu, i:interrupt, m:memory, n:network, d:disk adapter, u:disk device, v:disk VM, p:power mgmt • ‘f’ Fields • ‘h’ Help
CPU (USED / UTIL) • PCPU USED (%) • “effective work”, non-halted cycles in reference to the nominal frequency • PCPU UTIL (%) • non-halted cycles in reference to the elapsed time at the current frequency • CORE UTIL (%) • only displayed when Hyper-Threading is enabled • [Diagram: a PCPU downscaled from a nominal 2.6 GHz to 1.3 GHz shows 100% UTIL as 50% USED, and 50% UTIL as 25% USED]
CPU (USED / UTIL) • Why is PCPU USED (%) different from PCPU UTIL (%)? • Frequency scaling • Downscaling (due to power management, e.g. Intel SpeedStep) • ‘p’ Power Management View • Upscaling (due to dynamic overclocking, e.g. Intel Turbo Boost) • Hyper-Threading: ESXi 5.0 charges 62.5% per logical CPU during concurrent use. '+' means busy, '-' means idle.
(1) PCPU 0: +++++----- (UTIL: 50% / USED: 50%)
    PCPU 1: -----+++++ (UTIL: 50% / USED: 50%)
(2) PCPU 0: +++++----- (UTIL: 50% / USED: 31.25%)
    PCPU 1: +++++----- (UTIL: 50% / USED: 31.25%)
(3) PCPU 0: +++++----- (UTIL: 50% / USED: 42.5%, i.e. 30% + 20%/1.6)
    PCPU 1: ---+++++-- (UTIL: 50% / USED: 42.5%, i.e. 20%/1.6 + 30%)
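A minimal PowerShell sketch of the charging model behind these numbers, assuming the 62.5% HT charging rate quoted above (concurrent busy time divided by 1.6) and linear frequency scaling; the function name and parameters are illustrative, not an official formula:

function Get-UsedPct {
    param(
        [double]$SoloBusyPct,        # % of time this LCPU ran alone on its core
        [double]$ConcurrentBusyPct,  # % of time both LCPUs of the core were busy
        [double]$CurrentGHz = 2.6,   # current effective frequency
        [double]$NominalGHz = 2.6    # nominal (rated) frequency
    )
    # concurrent time is charged at 62.5%, i.e. divided by 1.6;
    # frequency scaling references USED to the nominal frequency
    ($SoloBusyPct + $ConcurrentBusyPct / 1.6) * ($CurrentGHz / $NominalGHz)
}

Get-UsedPct -SoloBusyPct 50 -ConcurrentBusyPct 0    # case (1): 50
Get-UsedPct -SoloBusyPct 0  -ConcurrentBusyPct 50   # case (2): 31.25
Get-UsedPct -SoloBusyPct 30 -ConcurrentBusyPct 20   # case (3): 42.5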
CPU (general per VM counters) • %USED • amount of CPU usage that is accounted to this world / VM • %RUN • percentage of total scheduled runtime • %RDY (Ready Time) • percentage of time the VM was ready to run but not scheduled • %MLMTD (Max Limited) • percentage of time not scheduled due to a CPU Limit (part of %RDY) • %SWPWT (Swap Wait) • percentage of time the VM was not scheduled because it was waiting for memory to be swapped in from disk
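These counters can also be collected non-interactively with esxtop batch mode (esxtop -b -d 5 -n 12 > cpu.csv) and post-processed. A hedged sketch that flags worlds with high %RDY; the column header format assumed here (\\host\Group Cpu(id:name)\% Ready) can vary between builds, so verify it against your own output:

# assumes cpu.csv was produced by esxtop batch mode and cells are numeric
$threshold = 10   # rule of thumb: sustained %RDY above ~10% per vCPU warrants a look
$data = Import-Csv cpu.csv
$rdyCols = $data[0].PSObject.Properties.Name |
    Where-Object { $_ -like '*Group Cpu(*\% Ready' }
foreach ($row in $data) {
    foreach ($col in $rdyCols) {
        if ([double]$row.$col -gt $threshold) { "{0}: %RDY {1}" -f $col, $row.$col }
    }
}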
CPU related (limits) • Two VMs on the same vSwitch, VM1 responds slowly to requests • In this example, represented by Ping* • VM1 is busy • high %RDY time indicates that the VM is contended for CPU resources • %RDY = %MLMTD means all of the ready time is caused by a CPU limit *Ping is not a performance benchmark! In this case it is just an easy replacement and visualisation for server requests.
CPU related (limits) • Check that there is a CPU limit with the ‘f’ field (CPU ALLOC) • AMAX indicates a 2000 MHz limit (1000 MHz per vCPU) • removing the limit will normalize the responsiveness of VM1 (see the PowerCLI sketch below)
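A hedged PowerCLI sketch of the same check and fix; passing $null to Set-VMResourceConfiguration resets the limit to “Unlimited”. Treat this as a sketch and verify on a test VM first:

# list all VMs with a CPU limit; -1 means "Unlimited"
Get-VM | Get-VMResourceConfiguration |
    Where-Object { $_.CpuLimitMhz -ne -1 } |
    ForEach-Object {
        "{0}: CPU limit {1} MHz" -f $_.VM.Name, $_.CpuLimitMhz
        # uncomment to actually remove the limit:
        # $_ | Set-VMResourceConfiguration -CpuLimitMhz $null
    }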
CPU related (fairness) • Performance not as good as expected on Xeon 5500 and later • Intel Hyper-Threading is enabled • Performance degradation especially noticeable if: • CPU utilization of the host is higher than 50% • the workload has a particularly bursty CPU usage pattern • The fairness scheduling algorithm works differently with HT enabled • VMs that lag behind in “vtime” are given a full core for each vCPU to catch up • Equal to setting the Hyperthreaded Core Sharing Mode for that VM to “None” • [Diagram: with HT enabled, a catching-up vCPU gets a whole core instead of sharing it with a second HT thread]
CPU related (fairness) • Fairness is important to honor shares, reservations and limits • The defaults perform well in most scenarios • Some workloads benefit from a higher “fairness threshold” • “HaltingIdleMsecPenalty” and “HaltingIdleMsecPenaltyMax” • Controls how far behind a VM can fall before it is given a full core • “HIMP” is per vCPU / “HIMPmax” is per VM • Not much performance benefit with more than HIMP = 2000 / HIMPmax = 16000 • Always remember to also increase HIMPmax, since its default is 800 • The setting is deprecated in ESXi 5.0 • The scheduler there is enhanced to maximize throughput and fairness with HT • Upgrade to 5.0 not yet an option? Benchmark your systems with a higher “HIMP” (see the sketch below) KB: HaltingIdleMsecPenalty Parameter: Guidance for Modifying vSphere's Fairness/Throughput Balance
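A hedged PowerCLI sketch for pre-5.0 hosts, assuming the advanced option names from the KB above (Cpu.HaltingIdleMsecPenalty / Cpu.HaltingIdleMsecPenaltyMax) and an illustrative host name; benchmark before and after the change:

$esx = Get-VMHost "esx01.example.com"   # hypothetical host name
# raise the fairness threshold per the KB's suggested maximum useful values
Set-VMHostAdvancedConfiguration -VMHost $esx -Name "Cpu.HaltingIdleMsecPenalty" -Value 2000
Set-VMHostAdvancedConfiguration -VMHost $esx -Name "Cpu.HaltingIdleMsecPenaltyMax" -Value 16000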
CPU related (power capping) • The Guest seems to use a lot of CPU • Frequency Scaling • In most cases controlled by the BIOS power options
CPU related (power capping) • Consult your vendor! • Fujitsu • HP • IBM
CPU related (power capping) • Check ESX host power policy • Contact your hardware support • BIOS or hardware issues could lead to frequency downscaling
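One way to check the active policy across hosts, sketched with PowerCLI via the vSphere API's HostPowerSystemInfo; the property path is an assumption to verify against your API version:

Get-VMHost | ForEach-Object {
    # HostSystem.Config.PowerSystemInfo carries the currently active power policy
    $policy = ($_ | Get-View).Config.PowerSystemInfo.CurrentPolicy
    "{0}: {1}" -f $_.Name, $policy.ShortName
}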
Memory (Memory reclamation counters) • MCTLSZ (MemCtl Size) / MCTLTGT (MemCtl Target) • currently reclaimed memory via ballooning / balloon reclamation goal • a target > 0 means active memory pressure • SWCUR (Swapped Currently) / SWTGT (Swap Target) • amount of Guest memory that is swapped to disk / target swap size • SWCUR is not actively reduced, the Guest must touch the pages • SWR/s (Swap Read) / SWW/s (Swap Write) • Guest memory in MB/s that is currently paged in / out by the hypervisor • SWR/s > 0 will affect the Guest (check %SWPWT in the CPU view)
Memory related (limit) • A limit will deny physical resources even if they are available • While VM memory is being swapped in, the VM will not be scheduled • Check %SWPWT in the CPU view
Memory related (limit) • It is still the most common reason for performance issues • You can check if a VM has a limit via the GUI, a PowerCLI query or esxtop: • memory view with the ‘f’ MEM ALLOC field • -1 is the default of “unlimited” • PowerCLI in a single pipeline: Get-VM | Get-VMResourceConfiguration | Where-Object {$_.MemLimitMB -ne -1} • Check your Templates for long forgotten limit settings (a removal sketch follows below)
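A hedged follow-up sketch that clears the limit on every affected VM in one pipeline ($null resets MemLimitMB to “Unlimited”); dry-run with the Where-Object filter alone before applying:

# remove forgotten memory limits from all VMs that have one
Get-VM | Get-VMResourceConfiguration |
    Where-Object { $_.MemLimitMB -ne -1 } |
    Set-VMResourceConfiguration -MemLimitMB $null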
Memory (general per VM counters) • MEMSZ (Memory Size) • Amount of assigned VM memory • TCHD (Touched) • recently used memory based on statistic sampling from the VMkernel • not comparable to Guest OS internal consumption counters • SHRDSVD (Shared Saved) • memory that is saved for this VM because of TPS (Transparent Page Sharing) • GRANT • memory that has been touched at least once by the VM • GRANT - SHRDSVD = VM memory that is backed by machine memory • COWH (Copy-On-Write Hinted) • memory that is already hashed and could be shared
Memory (counter mapping esxtop -> vSphere Client) • esxtop (memory view) • VM Resource Allocation tab • Consumed = GRANT - SHRDSVD + OVHD • Active = TCHD • Host VM Summary tab • Host Mem - MB = Consumed • Guest Mem - % = Active
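A worked example of the mapping, using hypothetical esxtop values in MB:

$memsz = 4096; $grant = 4096; $shrdsvd = 1024; $ovhd = 90; $tchd = 410
$consumedMB = $grant - $shrdsvd + $ovhd   # 3162 MB -> "Host Mem - MB"
$activePct  = $tchd / $memsz * 100        # ~10%    -> "Guest Mem - %"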
Memory related (memory consumption) • Host memory usage alarm • High “Host Mem”, very low “Guest Mem” • esxtop memory view • very low amount of shared pages • high amount of shareable pages
Memory related (memory consumption) • Compared to another VM with the same amount of memory • relatively low “Host Mem” • very high amount of shared pages, mostly zero pages • relatively low amount of shareable pages • Transparent Page Sharing can only share small (4KB) pages • VMs running with the hwMMU mode are backed with large (2MB) pages • This can result in a ~20% performance improvement KBs: Use of large pages can cause memory to be fully allocated and Transparent Page Sharing (TPS) in hardware MMU systems
Memory related (memory consumption) • Large pages will be broken down into small pages once the host becomes overcommitted • Identify the Monitor Mode of a VM • via the CLI:
# grep "MONITOR MODE" vmware.log | cut -d ":" -f 4-
vmx| MONITOR MODE: allowed modes          : BT HV HWMMU
vmx| MONITOR MODE: user requested modes   : HWMMU
vmx| MONITOR MODE: guestOS preferred modes: BT HWMMU HV
vmx| MONITOR MODE: filtered list          : HWMMU
Take Home Message • Check for unintended memory limits • Make sure power management is set according to your needs • Disable C1E in the BIOS for latency sensitive applications • Document performance issues thoroughly • Thank you for your time and enjoy VMworld Europe
References • CPU Scheduling / Memory Management • VMware vSphere: The CPU Scheduler in VMware ESX 4.1 • Understanding Memory Resource Management in VMware vSphere 5.0 • Memory Resource Management in VMware ESX Server • Understanding Host and Guest Memory Usage • Large Page Performance • Whitepaper for RVI (AMD) Performance Improvements • Whitepaper for EPT (Intel) Performance Improvements • Best Practices / Troubleshooting Guide • Performance Best Practices for VMware vSphere 5.0 • VMware vCenter Server Performance and Best Practices for vSphere 4.1 • Troubleshooting Performance Related Problems in vSphere 4.1 Environments
References • Virtual Machine Monitor • Software and Hardware Techniques for x86 Virtualization • Virtual Machine Monitor Execution Modes in VMware vSphere 4.0 • A Comparison of Software and Hardware Techniques for x86 Virtualization • Performance aspects of x86 virtualization • The Evolution of an x86 Virtual Machine Monitor (non-free) • vCenter Stats / General • vSphere Resource Management Guide 5.0 • vCenter Performance Counters • Understanding VirtualCenter Performance Statistics • Virtualization performance: perspectives and challenges ahead (non-free)
References • Esxtop • ESXtop for Advanced Users (2008) • ESXtop for Advanced Users (2009) • Troubleshooting using ESXTOP for Advanced Users (2010) • Interpreting esxtop 4.1 Statistics • esxtop (yellow bricks) (external)
Backup Slide: Disk (latency counters) • ‘u’ Disk Device View • GAVG/cmd (Guest) • Latency observed by the Guest OS • KAVG/cmd (VMkernel) • Latency introduced by the VMkernel • DAVG/cmd (Device) • Latency at the device, from the driver down to the storage array • QAVG/cmd (Queue) • Time a command spends queued in the VMkernel (part of KAVG) • [Diagram: I/O stack with VM -> GAVG; VMkernel layers -> KAVG (includes QAVG); SCSI layer and device -> DAVG]
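Since GAVG = KAVG + DAVG, the VMkernel share of the latency can be derived from the other two counters; a minimal sketch with hypothetical values and an illustrative threshold:

$gavg = 25.0   # ms, hypothetical latency seen by the Guest
$davg = 22.0   # ms, hypothetical device latency
$kavg = $gavg - $davg
# KAVG should normally be close to 0; a sustained value of several ms
# suggests queuing inside the VMkernel (check QAVG and queue depths)
if ($kavg -gt 2) { "KAVG of $kavg ms: investigate VMkernel queuing (QAVG)" }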
Backup Slide: Disk (new in ESX 4.1) • New in ESX 4.1 • ‘v’ VM Disk view now using vscsiStats • Now only displays VM worlds • ‘e’ will now show the backing disk instead of the VM’s sub worlds • vm-support -S will no longer collect per VM storage stats • Now includes NFS stats
Acknowledgements • Many thanks to Emiliano Turra for answering so many questions over the last years!