Performance Troubleshooting Valentin Bondzio, Research Engineer
Agenda Approaching Performance Issues Esxtop introduction (Troubleshooting Examples) References
Approaching Performance Issues Perception “XY Problem” Comparison Benchmark / Dependencies Tools
Perception • Subjective and generic, not quantified: • “A user reported that his application is slow.” • “Our VM used to be much faster.” • Somewhat quantified, only symptoms: • “CPU usage during a network file transfer is 90%.” • “The VM seems to hang whenever I start a program.” • Symptoms might be miles away from the root cause, e.g.: • A VM has a noticeable time-skew and lags behind • Root cause: antivirus scans running at the same time put stress on the storage
“XY Problem” • Wrong conclusions about the issue lead to the wrong questions • An example: <problem X> Alice’s car won’t start <problem Y> She asks Bob to help her replace the battery … <problem Y> The car still does not start <problem X> The real issue is no gas in the tank • Keeping an open mind will reduce the time wasted • Approach the issue from all sides and don’t rush to conclusions • Take note of all the symptoms and the state of the environment
Comparisons • Common comparisons are: • The old system vs. the new one • A physical vs. a virtual system • This usually means different settings or underlying hardware • Example: • The CPUs in the old box might be 2 generations behind, but it has twice as many • The underlying RAID layout in the new system is different • Do not compare apples to oranges • Make sure the workload / benchmark is consistent and repeatable • Keep the configuration as similar as possible, example: • A 4 core physical system should be compared to a 4 vCPU VM that is not contended
Benchmarks / Dependencies • Is the benchmark reproducible? • Do not use the live system where e.g. the amount of users might vary • Be aware that most benchmarks stress multiple components, e.g.: • IO tests from within the VM will also stress the CPU • A file copy over the network could also be affected by the storage speed (r/w) • The goal is to find the bottleneck • identify the workload pattern of the production system • benchmark the components (CPU, Memory, Network, Disk) on their own
Tools • Performance Charts in vCenter Server • use it to check for patterns across multiple VMs / Hosts / Datastores • compare current loads to ones in the past • esxtop • our “goto” tool, enough granularity for 99% of issues • vscsiStats • Identify IO pattern • In Guest Tools • Iometer / Iozone • Process Explorer / atop
Esxtop introduction Navigation CPU Memory
Navigation (“V”) • ‘V’ shows VMs only
Navigation (Views and Fields) • Esxtop Views • c:cpu, i:interrupt, m:memory, n:network, d:disk adapter, u:disk device, v:disk VM, p:power mgmt • ‘f’ Fields • ‘h’ Help
CPU (USED / UTIL) • PCPU USED (%) • “effective work”, non-halted cycles in reference to the nominal frequency • PCPU UTIL (%) • non-halted cycles in reference to the elapsed time at the current frequency • CORE UTIL (%) • only displayed when Hyper-Threading is enabled • [Diagram: a PCPU downscaled from a nominal 2.6 GHz to 1.3 GHz shows 100% UTIL as 50% USED, and 50% UTIL as 25% USED]
CPU (USED / UTIL) • Why is PCPU USED (%) different from PCPU UTIL (%)? • Frequency scaling • Downscaling (due to power management, e.g. Intel SpeedStep) • ‘p’ Power Management View • Upscaling (due to dynamic overclocking, e.g. Intel Turbo Boost) • Hyper-Threading: ESXi 5.0 charges 62.5% per logical CPU during concurrent use. '+' means busy, '-' means idle.
(1) PCPU 0: +++++----- (UTIL: 50% / USED: 50%)
    PCPU 1: -----+++++ (UTIL: 50% / USED: 50%)
(2) PCPU 0: +++++----- (UTIL: 50% / USED: 31.25%)
    PCPU 1: +++++----- (UTIL: 50% / USED: 31.25%)
(3) PCPU 0: +++++----- (UTIL: 50% / USED: 42.5%, i.e. 30% + 20%/1.6)
    PCPU 1: ---+++++-- (UTIL: 50% / USED: 42.5%, i.e. 20%/1.6 + 30%)
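A minimal PowerShell sketch of the charging model behind these numbers, assuming the 62.5% HT charging rate quoted above (concurrent busy time divided by 1.6) and linear frequency scaling; the function name and parameters are illustrative, not an official formula:

function Get-UsedPct {
    param(
        [double]$SoloBusyPct,        # % of time this LCPU ran alone on its core
        [double]$ConcurrentBusyPct,  # % of time both LCPUs of the core were busy
        [double]$CurrentGHz = 2.6,   # current effective frequency
        [double]$NominalGHz = 2.6    # nominal (rated) frequency
    )
    # concurrent time is charged at 62.5%, i.e. divided by 1.6;
    # frequency scaling references USED to the nominal frequency
    ($SoloBusyPct + $ConcurrentBusyPct / 1.6) * ($CurrentGHz / $NominalGHz)
}

Get-UsedPct -SoloBusyPct 50 -ConcurrentBusyPct 0    # case (1): 50
Get-UsedPct -SoloBusyPct 0  -ConcurrentBusyPct 50   # case (2): 31.25
Get-UsedPct -SoloBusyPct 30 -ConcurrentBusyPct 20   # case (3): 42.5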
CPU (general per VM counters) • %USED • amount of CPU usage that is accounted to this world / VM • %RUN • percentage of total scheduled runtime • %RDY (Ready Time) • percentage of time the VM was ready to run but not scheduled • %MLMTD (Max Limited) • percentage of time not scheduled due to a CPU Limit (part of %RDY) • %SWPWT (Swap Wait) • percentage of time the VM was not scheduled because it was waiting for memory to be swapped in from disk
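These counters can also be collected non-interactively with esxtop batch mode (esxtop -b -d 5 -n 12 > cpu.csv) and post-processed. A hedged sketch that flags worlds with high %RDY; the column header format assumed here (\\host\Group Cpu(id:name)\% Ready) can vary between builds, so verify it against your own output:

# assumes cpu.csv was produced by esxtop batch mode and cells are numeric
$threshold = 10   # rule of thumb: sustained %RDY above ~10% per vCPU warrants a look
$data = Import-Csv cpu.csv
$rdyCols = $data[0].PSObject.Properties.Name |
    Where-Object { $_ -like '*Group Cpu(*\% Ready' }
foreach ($row in $data) {
    foreach ($col in $rdyCols) {
        if ([double]$row.$col -gt $threshold) { "{0}: %RDY {1}" -f $col, $row.$col }
    }
}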
CPU related (limits) • Two VMs on the same vSwitch, VM1 responds slowly to requests • In this example, represented by Ping* • VM1 is busy • high %RDY time indicates that the VM is contended for CPU resources • %RDY = %MLMTD means all of the ready time is caused by a CPU limit *Ping is not a performance benchmark! In this case it is just an easy replacement and visualisation for server requests.
CPU related (limits) • Check that there is a CPU limit with the ‘f’ field (CPU ALLOC) • AMAX indicates a 2000 MHz limit (1000 MHz per vCPU) • removing the limit will normalize the responsiveness of VM1 (see the PowerCLI sketch below)
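A hedged PowerCLI sketch of the same check and fix; passing $null to Set-VMResourceConfiguration resets the limit to “Unlimited”. Treat this as a sketch and verify on a test VM first:

# list all VMs with a CPU limit; -1 means "Unlimited"
Get-VM | Get-VMResourceConfiguration |
    Where-Object { $_.CpuLimitMhz -ne -1 } |
    ForEach-Object {
        "{0}: CPU limit {1} MHz" -f $_.VM.Name, $_.CpuLimitMhz
        # uncomment to actually remove the limit:
        # $_ | Set-VMResourceConfiguration -CpuLimitMhz $null
    }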
CPU related (fairness) • Performance not as good as expected on Xeon 5500 and later • Intel Hyper-Threading is enabled • Performance degradation especially noticeable if: • CPU utilization of the host is higher than 50% • the workload has a particularly bursty CPU usage pattern • The fairness scheduling algorithm works differently with HT enabled • VMs that lag behind in “vtime” are given a full core for each vCPU to catch up • Equal to setting the Hyperthreaded Core Sharing Mode for that VM to “None” • [Diagram: with HT enabled, a catching-up vCPU gets a whole core instead of sharing it with a second HT thread]
CPU related (fairness) • Fairness is important to honor shares, reservations and limits • The defaults perform well in most scenarios • Some workloads benefit from a higher “fairness threshold” • “HaltingIdleMsecPenalty” and “HaltingIdleMsecPenaltyMax” • Controls how far behind a VM can fall before it is given a full core • “HIMP” is per vCPU / “HIMPmax” is per VM • Not much performance benefit with more than HIMP = 2000 / HIMPmax = 16000 • Always remember to also increase HIMPmax, since its default is 800 • The setting is deprecated in ESXi 5.0 • The scheduler there is enhanced to maximize throughput and fairness with HT • Upgrade to 5.0 not yet an option? Benchmark your systems with a higher “HIMP” (see the sketch below) KB: HaltingIdleMsecPenalty Parameter: Guidance for Modifying vSphere's Fairness/Throughput Balance
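A hedged PowerCLI sketch for pre-5.0 hosts, assuming the advanced option names from the KB above (Cpu.HaltingIdleMsecPenalty / Cpu.HaltingIdleMsecPenaltyMax) and an illustrative host name; benchmark before and after the change:

$esx = Get-VMHost "esx01.example.com"   # hypothetical host name
# raise the fairness threshold per the KB's suggested maximum useful values
Set-VMHostAdvancedConfiguration -VMHost $esx -Name "Cpu.HaltingIdleMsecPenalty" -Value 2000
Set-VMHostAdvancedConfiguration -VMHost $esx -Name "Cpu.HaltingIdleMsecPenaltyMax" -Value 16000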
CPU related (power capping) • The Guest seems to use a lot of CPU • Frequency Scaling • In most cases controlled by the BIOS power options
CPU related (power capping) • Consult your vendor! • Fujitsu • HP • IBM
CPU related (power capping) • Check ESX host power policy • Contact your hardware support • BIOS or hardware issues could lead to frequency downscaling
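One way to check the active policy across hosts, sketched with PowerCLI via the vSphere API's HostPowerSystemInfo; the property path is an assumption to verify against your API version:

Get-VMHost | ForEach-Object {
    # HostSystem.Config.PowerSystemInfo carries the currently active power policy
    $policy = ($_ | Get-View).Config.PowerSystemInfo.CurrentPolicy
    "{0}: {1}" -f $_.Name, $policy.ShortName
}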
Memory (Memory reclamation counters) • MCTLSZ (MemCtl Size) / MCTLTGT (MemCtl Target) • currently reclaimed memory via ballooning / balloon reclamation goal • a target > 0 means active memory pressure • SWCUR (Swapped Currently) / SWTGT (Swap Target) • amount of Guest memory that is swapped to disk / target swap size • SWCUR is not actively reduced, the Guest must touch the pages • SWR/s (Swap Read) / SWW/s (Swap Write) • Guest memory in MB/s that is currently paged in / out by the hypervisor • SWR/s > 0 will affect the Guest (check %SWPWT in the CPU view)
Memory related (limit) • A limit will deny physical resources even if they are available • While VM memory is being swapped in, the VM will not be scheduled • Check %SWPWT in the CPU view
Memory related (limit) • It is still the most common reason for performance issues • You can check if a VM has a limit via the GUI, a PowerCLI query or esxtop: • memory view with the ‘f’ MEM ALLOC field • -1 is the default of “unlimited” • PowerCLI in a single pipeline: Get-VM | Get-VMResourceConfiguration | Where-Object {$_.MemLimitMB -ne -1} • Check your Templates for long forgotten limit settings (a removal sketch follows below)
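A hedged follow-up sketch that clears the limit on every affected VM in one pipeline ($null resets MemLimitMB to “Unlimited”); dry-run with the Where-Object filter alone before applying:

# remove forgotten memory limits from all VMs that have one
Get-VM | Get-VMResourceConfiguration |
    Where-Object { $_.MemLimitMB -ne -1 } |
    Set-VMResourceConfiguration -MemLimitMB $null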
Memory (general per VM counters) • MEMSZ (Memory Size) • Amount of assigned VM memory • TCHD (Touched) • recently used memory based on statistic sampling from the VMkernel • not comparable to Guest OS internal consumption counters • SHRDSVD (Shared Saved) • memory that is saved for this VM because of TPS (Transparent Page Sharing) • GRANT • memory that has been touched at least once by the VM • GRANT - SHRDSVD = VM memory that is backed by machine memory • COWH (Copy-On-Write Hinted) • memory that is already hashed and could be shared
Memory (counter mapping esxtop -> vSphere Client) • esxtop (memory view) • VM Resource Allocation tab • Consumed = GRANT - SHRDSVD + OVHD • Active = TCHD • Host VM Summary tab • Host Mem - MB = Consumed • Guest Mem - % = Active
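A worked example of the mapping, using hypothetical esxtop values in MB:

$memsz = 4096; $grant = 4096; $shrdsvd = 1024; $ovhd = 90; $tchd = 410
$consumedMB = $grant - $shrdsvd + $ovhd   # 3162 MB -> "Host Mem - MB"
$activePct  = $tchd / $memsz * 100        # ~10%    -> "Guest Mem - %"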
Memory related (memory consumption) • Host memory usage alarm • High “Host Mem”, very low “Guest Mem” • esxtop memory view • very low amount of shared pages • high amount of shareable pages
Memory related (memory consumption) • Compared to another VM with the same amount of memory • relatively low “Host Mem” • very high amount of shared pages, mostly zero pages • relatively low amount of shareable pages • Transparent Page Sharing can only share small (4KB) pages • VMs running with the hwMMU mode are backed with large (2MB) pages • This can result in a ~20% performance improvement KBs: Use of large pages can cause memory to be fully allocated and Transparent Page Sharing (TPS) in hardware MMU systems
Memory related (memory consumption) • Large pages will be broken down into small pages once the host becomes overcommitted • Identify the Monitor Mode of a VM • via the CLI:
# grep "MONITOR MODE" vmware.log | cut -d ":" -f 4-
vmx| MONITOR MODE: allowed modes          : BT HV HWMMU
vmx| MONITOR MODE: user requested modes   : HWMMU
vmx| MONITOR MODE: guestOS preferred modes: BT HWMMU HV
vmx| MONITOR MODE: filtered list          : HWMMU
Take Home Message • Check for unintended memory limits • Make sure power management is set according to your needs • Disable C1E in the BIOS for latency sensitive applications • Document performance issues thoroughly • Thank you for your time and enjoy VMworld Europe
References • CPU Scheduling / Memory Management • VMware vSphere: The CPU Scheduler in VMware ESX 4.1 • Understanding Memory Resource Management in VMware vSphere 5.0 • Memory Resource Management in VMware ESX Server • Understanding Host and Guest Memory Usage • Large Page Performance • Whitepaper for RVI (AMD) Performance Improvements • Whitepaper for EPT (Intel) Performance Improvements • Best Practices / Troubleshooting Guide • Performance Best Practices for VMware vSphere 5.0 • VMware vCenter Server Performance and Best Practices for vSphere 4.1 • Troubleshooting Performance Related Problems in vSphere 4.1 Environments
References • Virtual Machine Monitor • Software and Hardware Techniques for x86 Virtualization • Virtual Machine Monitor Execution Modes in VMware vSphere 4.0 • A Comparison of Software and Hardware Techniques for x86 Virtualization • Performance aspects of x86 virtualization • The Evolution of an x86 Virtual Machine Monitor (non-free) • vCenter Stats / General • vSphere Resource Management Guide 5.0 • vCenter Performance Counters • Understanding VirtualCenter Performance Statistics • Virtualization performance: perspectives and challenges ahead (non-free)
References • Esxtop • ESXtop for Advanced Users (2008) • ESXtop for Advanced Users (2009) • Troubleshooting using ESXTOP for Advanced Users (2010) • Interpreting esxtop 4.1 Statistics • esxtop (yellow bricks) (external)
Backup Slide: Disk (latency counters) • ‘u’ Disk Device View • GAVG/cmd (Guest) • Latency observed by the Guest OS • KAVG/cmd (VMkernel) • Latency introduced by the VMkernel • DAVG/cmd (Device) • Latency at the device, from the driver down to the storage array • QAVG/cmd (Queue) • Time a command spends queued in the VMkernel (part of KAVG) • [Diagram: I/O stack with VM -> GAVG; VMkernel layers -> KAVG (includes QAVG); SCSI layer and device -> DAVG]
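Since GAVG = KAVG + DAVG, the VMkernel share of the latency can be derived from the other two counters; a minimal sketch with hypothetical values and an illustrative threshold:

$gavg = 25.0   # ms, hypothetical latency seen by the Guest
$davg = 22.0   # ms, hypothetical device latency
$kavg = $gavg - $davg
# KAVG should normally be close to 0; a sustained value of several ms
# suggests queuing inside the VMkernel (check QAVG and queue depths)
if ($kavg -gt 2) { "KAVG of $kavg ms: investigate VMkernel queuing (QAVG)" }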
Backup Slide: Disk (new in ESX 4.1) • New in ESX 4.1 • ‘v’ VM Disk view now using vscsiStats • Now only displays VM worlds • ‘e’ will now show the backing disk instead of the VM’s sub worlds • vm-support -S will no longer collect per VM storage stats • Now includes NFS stats
Acknowledgements • Many thanks to Emiliano Turra for answering so many questions over the last years!