Understanding Performance Monitoring in VMware VI3 (based on VMworld 2007 – TA64)

MikeCarlo

Presentation Transcript


  1. Understanding Performance Monitoring in VMware VI3 (based on VMworld 2007 – TA64)

  2. Sources of Performance Problems • Bottom Line: virtual machines share physical resources • Over-commitment of resources can lead to poor performance • Imbalances in the system can also lead to slower-than-expected performance • Configuration issues can also contribute to poor performance

  3. Tools for Performance Monitoring/Analysis • VirtualCenter client (VI client): per-host stats and per-cluster statistics • Esxtop: per-host statistics • SDK: allows users to collect only the statistics they want • All tools use same mechanism to retrieve data (special vmkernel calls)

  4. VI Client screenshot • Chart type • Real-time vs. historical • Object • Counter type • Stats type • Rollup

  5. Real-time vs. Historical stats • VirtualCenter stores statistics at different granularities • Samples are “rolled up” (averaged) for the next time interval
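The rollup described on this slide can be sketched in a few lines of Python. This is only an illustration of averaging fine-grained samples into a coarser interval, not VirtualCenter's actual implementation. The 20-second real-time interval comes from later slides; the 5-minute historical interval and the sample values are assumptions.

```python
from statistics import mean

def roll_up(samples, factor):
    """Average each consecutive group of `factor` samples into one coarser sample."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [mean(samples[i:i + factor]) for i in range(0, len(samples), factor)]

# Assumption: fifteen 20-second samples make up one 5-minute interval (15 * 20 s = 300 s).
# The values are made-up CPU usage percentages.
realtime_cpu_pct = [10, 20, 30, 40, 50, 10, 20, 30, 40, 50, 10, 20, 30, 40, 50]
five_minute = roll_up(realtime_cpu_pct, 15)

print(five_minute)            # one averaged sample for the whole interval
print(max(realtime_cpu_pct))  # the peak that the average hides
```

Because rollup averages, short spikes that are visible in real-time charts can disappear in historical charts.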

  6. Stacked vs. Line charts • Line • Each instance shown separately • Stacked • Graphs are stacked on top of each other • Only applies to certain kinds of charts, e.g.: • Breakdown of Host CPU MHz by Virtual Machine • Breakdown of Virtual Machine CPU by VCPU

  7. Esxtop • Host information • Per-VM/world information • Per-host statistics • Shows CPU/Memory/Disk/Network on separate screens • Sampling frequency (refresh every 5s by default) • Batch mode (can look at data offline with perfmon)
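Batch-mode output is CSV, so it can also be post-processed with a script instead of perfmon. The sketch below averages every column whose name contains a given fragment. The sample data is invented for illustration: the host name, VM name, values, and exact column names are assumptions, and real column names depend on the ESX version. A capture would typically come from something like `esxtop -b -d 5 -n 3 > out.csv`.

```python
import csv
import io

# Illustrative sample only: host name, VM name, column names, and values are made up.
SAMPLE = r'''"(PDH-CSV 4.0) (UTC)(0)","\\esx01\Group Cpu(1:vm-a)\% Used","\\esx01\Group Cpu(1:vm-a)\% Ready"
"01/01/2008 00:00:00","42.0","3.5"
"01/01/2008 00:00:05","58.0","12.5"
'''

def column_average(csv_text, name_fragment):
    """Return {column name: average} for columns whose name contains name_fragment."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {}
    header, data = rows[0], rows[1:]
    result = {}
    for idx, name in enumerate(header):
        if name_fragment in name and data:
            values = [float(row[idx]) for row in data]
            result[name] = sum(values) / len(values)
    return result

print(column_average(SAMPLE, "% Ready"))
print(column_average(SAMPLE, "% Used"))
```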

  8. SDK • Use the VIM API to access statistics relevant to a particular user • Can only access statistics that are exported by the VIM API (and thus are accessible via esxtop/VI client)

  9. Shares example • Change shares for VM • Dynamic reallocation • Add VM, overcommit • Graceful degradation • Remove VM • Exploit extra resources
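The behaviour in the shares example can be illustrated with a simplified proportional-share calculation. This sketch assumes every VM is busy and ignores reservations, limits, and idle VMs, so it is not the ESX scheduler's actual algorithm. The VM names, share values, and 3000 MHz capacity are hypothetical.

```python
def allocate(capacity_mhz, shares):
    """Split capacity among VMs in proportion to their shares (simplified model)."""
    total = sum(shares.values())
    return {vm: capacity_mhz * s / total for vm, s in shares.items()}

vms = {"vm-a": 1000, "vm-b": 1000}
equal = allocate(3000, vms)        # equal shares: each VM gets half

vms["vm-a"] = 2000                 # change shares: dynamic reallocation
reweighted = allocate(3000, vms)

vms["vm-c"] = 1000                 # add a VM: everyone degrades gracefully
overcommitted = allocate(3000, vms)

del vms["vm-c"]                    # remove the VM: the rest exploit the extra resources
recovered = allocate(3000, vms)

for label, result in [("equal", equal), ("reweighted", reweighted),
                      ("overcommitted", overcommitted), ("recovered", recovered)]:
    print(label, result)
```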

  10. ESX CPU Scheduling • World states (simplified view): • ready = ready-to-run but no physical CPU free • run = currently active and running • wait = blocked on I/O • Multi-CPU Virtual Machines => gang scheduling • Co-run (latency to get vCPUs running) • Co-stop (time in “stopped” state)

  11. ESX, VirtualCenter, and Resource pools • Resource Pool extends proportional-share scheduling to groups of hosts (a cluster) • VirtualCenter can VMotion VMs to provide resource balance (DRS)

  12. Tools for monitoring CPU performance: VI Client • Basic stuff • CPU usage (percent) • CPU ready time (but ready time by itself can be misleading) • Advanced stuff • CPU wait time: time spent blocked on IO • CPU extra time: time given to virtual machine over reservation • CPU guaranteed: min CPU for virtual machine • Cluster-level statistics • Percent of entitled resources delivered • Utilization percent • Effective CPU resources: MHz for cluster

  13. VI Client CPU screenshot • Note: CPU milliseconds and percent are on the same chart but use different axes
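Since millisecond and percent counters share a chart, it can help to convert one into the other. The sketch below assumes the millisecond counter is a sum over the sampling interval (20 s for real-time charts) and refers to a single vCPU. The function name and the example value are illustrative.

```python
def ms_to_percent(counter_ms, interval_s=20):
    """Convert a per-interval millisecond counter (e.g. CPU ready) to a percentage."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return 100.0 * counter_ms / (interval_s * 1000)

# Hypothetical reading: 2000 ms of ready time in one 20-second real-time sample.
print(ms_to_percent(2000))
```

The same millisecond value means a much smaller percentage in a rolled-up historical chart, because the interval is longer.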

  14. Cluster-level information in the VI Client • Utilization % describes available capacity on hosts (here: CPU usage low, memory usage medium) • % Entitled resources delivered: best if all 90-100+

  15. CPU performance analysis: esxtop • PCPU(%): CPU utilization • Per-group stats breakdown • %USED: Utilization • %RDY: Ready Time • %TWAIT: Wait and idling time • Co-Scheduling stats (multi-CPU Virtual Machines) • %CRUN: Co-run state • %CSTOP: Co-stop state • Nmem: each member can consume 100% (expand to see breakdown) • Affinity • HTSharing

  16. esxtop CPU screenshot • 2-CPU box, but 3 active VMs (high %USED) • High %RDY + high %USED can imply CPU overcommitment

  17. VI Client and Ready Time • Chart callouts: used time; ready time < used time; ready time ~ used time • Used time ~ ready time: may signal contention • However, might not be overcommitted due to workload variability • In this example, we have periods of activity and idle periods: CPU isn’t overcommitted all the time
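The rule of thumb on this slide (ready time close to used time may signal contention, but not necessarily all the time) can be turned into a rough check. The thresholds `min_used` and `ratio` are arbitrary assumptions chosen for illustration, not VMware guidance, and the sample series is made up.

```python
def contended_fraction(used_pct, ready_pct, min_used=50.0, ratio=0.8):
    """Fraction of samples where the VM is busy and ready time is close to used time."""
    pairs = list(zip(used_pct, ready_pct))
    if not pairs:
        return 0.0
    flagged = sum(1 for used, ready in pairs
                  if used >= min_used and ready >= ratio * used)
    return flagged / len(pairs)

# Made-up series with alternating busy and idle periods.
used = [80, 75, 5, 4, 85, 3]
ready = [70, 72, 1, 0, 80, 1]
print(contended_fraction(used, ready))
```

A fraction well below 1.0 matches the slide's point: contention during bursts does not mean the CPU is overcommitted all the time.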

  18. Memory • ESX must balance memory usage for all worlds • Virtual machines, Service Console, and vmkernel consume memory • Page sharing to reduce memory footprint of Virtual Machines • Ballooning to relieve memory pressure in a graceful way • Host swapping to relieve memory pressure when ballooning insufficient • ESX allows overcommitment of memory • Sum of configured memory sizes of virtual machines can be greater than physical memory if working sets fit • VC adds the ability to create resource pools based on memory usage
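The overcommitment condition above can be written as two simple checks: the sum of configured sizes may exceed physical memory, as long as the working sets still fit. This sketch ignores Service Console, vmkernel, and per-VM overhead memory, as well as savings from page sharing. All sizes are hypothetical.

```python
def overcommit_ratio(configured_mb, physical_mb):
    """Sum of configured VM memory divided by physical memory (>1.0 means overcommitted)."""
    return sum(configured_mb) / physical_mb

def working_sets_fit(active_mb, physical_mb):
    """True if the sum of the VMs' active (working set) memory fits in physical memory."""
    return sum(active_mb) <= physical_mb

# Hypothetical host with 16 GB of RAM and four VMs.
configured = [4096, 4096, 2048, 8192]   # configured sizes in MB
active = [1024, 2048, 512, 3072]        # estimated working sets in MB

print(overcommit_ratio(configured, 16384))
print(working_sets_fit(active, 16384))
```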

  19. VI Client • Main page shows “consumed” memory (formerly “active” memory) • Performance charts show important statistics for virtual machines • Consumed memory • Granted memory • Ballooned memory • Shared memory • Swapped memory • Swap in • Swap out

  20. Virtual Machine Memory Metrics

  21. VI Client: VM list summary • Host CPU: avg. CPU utilization for Virtual Machine • Host Memory: consumed memory for Virtual Machine • Guest Memory: active memory for guest

  22. Esxtop for memory: Host information • PMEM: Total physical memory breakdown • VMKMEM: Memory managed by vmkernel • COSMEM: Service Console memory breakdown • SWAP: Swap breakdown • MEMCTL: Balloon information

  23. VI Client: Memory example for Virtual Machine • Chart series: balloon & target, consumed & granted, active memory, swap in, swap out, swap usage • Chart callouts: no swap activity; increase in swap activity

  24. Disk • Disk performance is dependent on many factors: • Filesystem performance • Disk subsystem configuration (SAN, NAS, iSCSI, local disk) • Disk caching • Disk formats (thick, sparse, thin) • ESX is tuned for Virtual Machine I/O • VMFS clustered filesystem => keeping consistency imposes some overheads

  25. ESX Storage Stack • Different latencies for local disk vs. SAN (caching, switches, etc.) • Queuing within kernel and in hardware

  26. VI Client disk statistics • Mostly coarse-grained statistics • Disk bandwidth • Disk read rate, disk write rate (KB/s) • Disk usage: sum of read BW and write BW (KB/s) • Disk operations during sampling interval (20s for real-time) • Disk read requests, disk write requests, commands issued • Bus resets, command aborts • Per-LUN and aggregated statistics

  27. Esxtop Disk Statistics • Aggregated statistics like VI Client • READS/s, WRITES/s, MBREAD/s, MBWRITE/s • Latency statistics (K: Kernel, D: Device, G: Guest) • KAVG/cmd, DAVG/cmd, GAVG/cmd • Queuing information • Adapter (AQLEN), LUN (LQLEN), vmkernel (QUED)
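The three latency counters are related: the latency seen by the guest (GAVG) is commonly described as approximately the device latency (DAVG) plus the time spent in the vmkernel (KAVG). The sketch below uses that approximation to show where the time goes. The function name and the example values are assumptions for illustration.

```python
def latency_breakdown(davg_ms, kavg_ms):
    """Approximate guest latency as DAVG + KAVG and report each component's share."""
    gavg_ms = davg_ms + kavg_ms
    if gavg_ms == 0:
        return {"gavg_ms": 0.0, "device_share": 0.0, "kernel_share": 0.0}
    return {
        "gavg_ms": float(gavg_ms),
        "device_share": davg_ms / gavg_ms,
        "kernel_share": kavg_ms / gavg_ms,
    }

# Hypothetical reading: 18 ms at the device, 2 ms in the kernel.
print(latency_breakdown(18, 2))
```

A large device share points at the storage array or fabric, while a large kernel share points at queuing inside the host.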

  28. Disk performance example: VI Client • Throughput with cache (good) • Throughput w/o cache (bad)

  29. Disk performance example: esxtop • Latency seems high • After enabling cache, latency much better

  30. Network • Diagram layers: Guest OS, TCP/IP, virtual driver, Virtual Device, Virtual I/O stack, physical driver • Guest to vmkernel • Address space switch • Virtual interrupts • Virtual I/O stack • Packet copy • Packet routing

  31. VI Client Networking Statistics • Mostly high-level statistics • Bandwidth • KBps transmitted, received • Network usage (KBps): sum of TX, RX over all NICs • Operations/s • Network packets received during sampling interval (real-time: 20s) • Network packets transmitted during sampling interval • Per-adapter and aggregated statistics • Per VM Stacked Graphs
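Because the packet counters are totals over the sampling interval while bandwidth is already a rate, a small conversion makes them comparable. The sketch below assumes the 20-second real-time interval from the slide. The NIC names and values are hypothetical.

```python
def packets_per_second(packets_in_interval, interval_s=20):
    """Convert a per-interval packet count into an average packets-per-second rate."""
    return packets_in_interval / interval_s

def network_usage_kbps(nics):
    """Network usage as the sum of TX and RX rates (KBps) over all NICs."""
    return sum(tx + rx for tx, rx in nics.values())

# Hypothetical per-NIC (transmit KBps, receive KBps) readings.
nics = {"vmnic0": (1200.0, 800.0), "vmnic1": (300.0, 200.0)}

print(packets_per_second(30000))
print(network_usage_kbps(nics))
```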

  32. Esxtop Networking Statistics • Bandwidth • Receive (MbRX/s), Transmit (MbTX/s) • Operations/s • Receive (PKTRX/s), Transmit (PKTTX/s) • Configuration info • Duplex (FDUPLX), speed (SPEED) • Errors • Packets dropped during transmit (%DRPTX), receive (%DRPRX)

  33. Esxtop network output • Setup A (10/100) • Setup B (GigE) • Screenshot labels: physical configuration, performance

  34. Approaching Performance Issues • Make sure it is an apples-to-apples comparison • Check guest tools & guest processes • Check host configurations & host processes • Check VirtualCenter client for resource issues • Check esxtop for obvious resource issues • Examine log files for errors • If no suspects, run microbenchmarks (e.g., Iometer, netperf) to narrow scope • Once you have suspects, check relevant configurations • If all else fails…contact VMware
