VSP1999 esxtop for Advanced Users • Name, Title, Company
Disclaimer
• This session may contain product features that are currently under development.
• This session/overview of the new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
vSphere Performance Management Tools (1 of 2)
• vCenter Alarms
  - Relies on static thresholds
  - An alarm trigger may not always indicate an actual performance problem
• vCenter Operations
  - Aggregates metrics into workload, capacity, and health scores
  - Relies on dynamic thresholds
• vCenter Charts
  - Historical trends
  - Post-mortem analysis, comparing metrics
vSphere Performance Management Tools (2 of 2)
• esxtop/resxtop
  - For live troubleshooting and root-cause analysis
  - esxplot, perfmon, and other tools can be used for offline analysis
Performance Snapshot • For complicated problems • Technical support may ask you for a performance snapshot for offline analysis
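For offline analysis, esxtop's documented batch mode (-b, with -d for the sampling delay and -n for the number of samples) produces a perfmon-style CSV that esxplot, perfmon, or a small script can consume. Below is a minimal sketch of loading such a capture with pandas; the file name is a placeholder and the column-name check at the end is only illustrative.

```python
# Minimal sketch for offline analysis of an esxtop batch-mode capture.
# A capture is typically produced on the host with something like:
#     esxtop -b -d 5 -n 60 > esxtop_capture.csv
# (-b: batch mode, -d: sampling delay in seconds, -n: number of samples).
# Assumes pandas is available on the analysis workstation.
import pandas as pd

def load_capture(path: str = "esxtop_capture.csv") -> pd.DataFrame:
    """Load a perfmon-style esxtop CSV; the first column is the timestamp."""
    df = pd.read_csv(path)
    return df.rename(columns={df.columns[0]: "timestamp"})

if __name__ == "__main__":
    df = load_capture()
    # Counter columns look roughly like "\\host\Group Cpu(...)\% Used".
    print(f"{len(df)} samples, {len(df.columns) - 1} counters")
    print([c for c in df.columns if "% Used" in c][:5])
```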
About This Talk
This talk focuses on the esxtop counters, using illustrative examples.
esxtop manual:
• http://www.vmware.com/pdf/vsphere4/r41/vsp_41_resource_mgmt.pdf
Interpreting esxtop statistics:
• http://communities.vmware.com/docs/DOC-11812
Previous VMworld talks:
• VMworld 2008 - http://vmworld.com/docs/DOC-2356
• VMworld 2009 - http://vmworld.com/docs/DOC-3838
• VMworld 2010 - http://www.vmworld.com/docs/DOC-5101
esxtop Screens
• c: CPU (default)
• m: memory
• n: network
• d: disk adapter
• u: disk device (added in ESX 3.5)
• v: disk VM (added in ESX 3.5)
• i: interrupts (added in ESX 4.0)
• p: power management (added in ESX 4.1)
[Diagram: VMs running on the VMkernel; the c, i, and p screens cover the CPU scheduler, m the memory scheduler, n the virtual switch, and d, u, and v the vSCSI layer.]
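For quick reference, the same screen-to-component mapping expressed as a Python dict; this is purely illustrative and simply restates the slide.

```python
# Quick-reference mapping of esxtop screen keys to the VMkernel component
# each screen reports on, mirroring the slide's diagram.
ESXTOP_SCREENS = {
    "c": "CPU scheduler (default screen)",
    "i": "CPU scheduler - interrupts (ESX 4.0+)",
    "p": "CPU scheduler - power management (ESX 4.1+)",
    "m": "Memory scheduler",
    "n": "Virtual switch (network)",
    "d": "vSCSI - disk adapter",
    "u": "vSCSI - disk device (ESX 3.5+)",
    "v": "vSCSI - disk VM (ESX 3.5+)",
}
```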
vCPU and VM Count
• The screen header shows the world, VM, and vCPU counts
VMWAIT
• %VMWAIT = %WAIT - %IDLE
• More about this later…
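A small worked example of the relation on this slide, with made-up numbers:

```python
# Worked example: %VMWAIT = %WAIT - %IDLE.
# %WAIT includes idle time, so subtracting %IDLE leaves the time the world
# was blocked on something other than being idle (e.g. I/O).
wait_pct = 85.0   # hypothetical %WAIT reported by esxtop
idle_pct = 60.0   # hypothetical %IDLE reported by esxtop

vmwait_pct = wait_pct - idle_pct
print(f"%VMWAIT = {wait_pct} - {idle_pct} = {vmwait_pct}")   # 25.0
```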
CPU Clock Frequency in Different P-states
• P-states are visible to ESX only when the power management setting in the BIOS is set to "OS Controlled"
• More about this later…
Failed Disk IOs
• Failed IOs are now accounted for separately from successful IOs
VAAI: Block Deletion Operations
• New set of VAAI stats for tracking block deletion
• VAAI: vStorage APIs for Array Integration
Low-Latency Swap (Host Cache)
• Low-latency (SSD) swap statistics
CPU State Times
[Diagram: a vCPU's elapsed time is divided among the RUN, RDY, CSTP, and WAIT states; MLMTD is accounted within RDY, and WAIT includes IDLE, SWPWT, and blocked time such as waiting on guest I/O (VMWAIT).]
CPU Usage Accounting
• USED = RUN - OVRLP + SYS
• OVRLP: time taken away from this world while it was scheduled; SYS: system service time charged to this world
• USED could be < RUN if the CPU is not running at its rated clock frequency
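A worked example of this accounting identity, with made-up numbers and ignoring frequency scaling (which the next slides cover):

```python
# Worked example of the accounting identity on this slide:
#     %USED = %RUN - %OVRLP + %SYS   (at the rated base clock frequency)
run_pct   = 90.0   # time the world was scheduled
ovrlp_pct = 3.0    # time taken away from the world while it was scheduled
sys_pct   = 5.0    # system service time charged to this world

used_pct = run_pct - ovrlp_pct + sys_pct
print(f"%USED = {run_pct} - {ovrlp_pct} + {sys_pct} = {used_pct}")   # 92.0
```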
Impact of P-States
• %USED: CPU usage with reference to the rated base clock frequency
• %UTIL: CPU utilization with reference to the current clock frequency
• %RUN: CPU occupancy time (not scaled by clock frequency)
Factors That Affect VM CPU Usage Accounting
• Chargeback
• %SYS time
• CPU frequency scaling
  - Turbo Boost: USED > (RUN - SYS)
  - Power management: USED < (RUN - SYS)
• Hyperthreading
CPU Usage: With CPU Clock Frequency Scaling
• The VM is running all the time but uses only 75% of the rated clock frequency
• Power saving is enabled in the BIOS
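A simplified model of this example, ignoring %SYS and %OVRLP; the 75% figure comes from the slide, everything else is illustrative.

```python
# Simplified model: when the CPU runs below its rated base frequency,
# %USED is charged at the effective frequency while %RUN reflects pure
# occupancy time, so a fully scheduled VM can show %RUN near 100 with a
# lower %USED.
run_pct    = 100.0   # the VM is scheduled for the whole interval
freq_ratio = 0.75    # current clock frequency / rated base frequency

used_pct = run_pct * freq_ratio
print(f"%RUN = {run_pct}, %USED ~= {used_pct}")   # %RUN = 100.0, %USED ~= 75.0
```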
Hyperthreading
[Diagram: with HT on, each physical core presents two PCPUs; with HT off, one PCPU per core.]
• The ESX scheduler tries to avoid sharing the same core
CPU Usage: Without Core Sharing
• Two VMs running on different cores
• %USED is > 100 due to Turbo Boost
CPU Usage: With Core Sharing
• Two VMs sharing the same core
• The %LAT_C counter shows the CPU time unavailable due to core sharing
Performance Impact of Swapping
• Some swapping activity
• Time spent in the blocked state (%SWPWT) due to swapping
NFS Connectivity Issue (1 of 2)
• I/O activity to the NFS datastore
• System time is charged to the VM for NFS activity
NFS Connectivity Issue (2 of 2)
• No I/O activity on the NFS datastore
• The VM is not using CPU
• The VM is blocked; connectivity to the NFS datastore has been lost
Snapshot Revert
• Reads (in MB) from the VM checkpoint file
• Not accounted in VM disk I/O traffic, but visible in the adapter view
Wide-NUMA Support in ESX 5.0 (1 of 2)
• 2 x 16 GB NUMA nodes
• 1 vCPU VM; 24 GB vRAM exceeds one NUMA node
• 1 home NUMA node assigned
Wide-NUMA Support in ESX 5.0 (2 of 2)
• 2 x 16 GB NUMA nodes
• 8 vCPUs exceed one NUMA node; 24 GB vRAM exceeds one NUMA node
• 2 home NUMA nodes assigned
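A simplified sketch of why these two VMs end up with one and two home nodes respectively; this is not the actual ESX scheduler policy, and the 4-cores-per-node figure is an assumption used only for illustration.

```python
# Simplified sketch (not the actual ESX NUMA scheduler policy): a wide VM is
# split into NUMA clients primarily by vCPU count; memory that exceeds a node
# spills to other nodes but does not by itself add home nodes in this model.
import math

def home_nodes_needed(vcpus: int, cores_per_node: int) -> int:
    return math.ceil(vcpus / cores_per_node)

# Slide 1 of 2: 1 vCPU, 24 GB vRAM on 2 x 16 GB nodes -> 1 home node.
print(home_nodes_needed(vcpus=1, cores_per_node=4))   # 1
# Slide 2 of 2: 8 vCPUs exceed one node's cores -> 2 home nodes.
print(home_nodes_needed(vcpus=8, cores_per_node=4))   # 2
```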
Network Packet Drops
• Packet drops at the vSwitch
• Excessive ready time; the VM is constrained by its CPU max limit
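One way to hunt for this pattern offline is to scan a batch-mode capture for network ports reporting dropped packets. The column-name patterns below are assumptions based on esxtop's perfmon-style naming and may need adjusting to the exact counter names in your capture.

```python
# Sketch: scan an esxtop batch-mode capture for network ports that report
# dropped packets. Column-name patterns are assumptions; adjust as needed.
import pandas as pd

df = pd.read_csv("esxtop_capture.csv")   # placeholder capture file

drop_cols = [c for c in df.columns
             if "Network Port" in c and "Dropped" in c]
for col in drop_cols:
    if df[col].astype(float).max() > 0:
        print(f"drops seen in: {col}")
```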
Disk I/O Latencies
[Diagram: the I/O path from the application and guest OS (measured there by iostat/perfmon), through the VMM, vSCSI, and the ESX storage stack (KAVG, which includes QAVG), then the driver, HBA, fabric, and array SP (DAVG); GAVG spans the whole path.]
• KAVG = GAVG - DAVG
• Time spent in the ESX storage stack is minimal, so for all practical purposes KAVG ~= QAVG
• In a well-configured system, QAVG should be zero
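A worked example of this breakdown, with made-up latencies and a hypothetical alert threshold:

```python
# Worked example of the latency breakdown on this slide:
#     GAVG (guest-observed) = DAVG (device/array) + KAVG (ESX kernel),
# so KAVG = GAVG - DAVG, and in a healthy system KAVG ~= QAVG ~= 0.
gavg_ms = 12.5   # average latency seen by the guest (ms)
davg_ms = 12.1   # average latency reported by the device/array (ms)

kavg_ms = gavg_ms - davg_ms
print(f"KAVG = {gavg_ms} - {davg_ms} = {kavg_ms:.1f} ms")
if kavg_ms > 2.0:   # hypothetical threshold for "significant" kernel latency
    print("significant kernel-side latency: check queuing (QAVG) and limits")
```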
Disk I/O Queuing
[Diagram: queues along the I/O path, from the guest down to the array: GQLEN (guest queue), WQLEN (world queue), D(/L)QLEN (LUN queue), AQLEN (adapter queue), and SQLEN (array SP queue). WQLEN, D(/L)QLEN, and AQLEN are reported in esxtop.]
• D(/L)QLEN can change dynamically when SIOC is enabled
Max IOPS = Max Outstanding IOs / Latency
• For example, with 64 outstanding IOs and 4 ms average latency: Max IOPS = 64 / 4 ms = 16,000
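The same arithmetic as a tiny helper; this is simply Little's Law applied to a fixed number of outstanding I/Os.

```python
# With a fixed number of outstanding I/Os and a given average latency,
# throughput is bounded by outstanding / latency (Little's Law).
def max_iops(outstanding_ios: int, latency_sec: float) -> float:
    return outstanding_ios / latency_sec

print(max_iops(64, 0.004))   # 16000.0, matching the slide's example
```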
Disk I/O Queuing – Device Queue
• I/O commands in flight
• I/O commands waiting in the queue
• Device queue length, modifiable via a driver parameter
Disk I/O Queuing – World Queue
• World ID
• World queue length, modifiable via Disk.SchedNumReqOutstanding
Device Queue Full
• Queuing issue: KAVG is non-zero
• 32 IOs in flight and 32 queued; the LUN queue depth is 32
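The same "queue full" pattern can be spotted offline by flagging devices whose active command count has reached the configured LUN queue depth. The column-name patterns and the queue depth below are assumptions for illustration.

```python
# Sketch: flag devices in an esxtop batch capture whose active command count
# has reached the LUN queue depth (the "device queue full" pattern above).
# Column-name patterns are assumptions; adjust to your capture's counter names.
import pandas as pd

df = pd.read_csv("esxtop_capture.csv")   # placeholder capture file
QUEUE_DEPTH = 32                         # LUN queue depth from the slide

active_cols = [c for c in df.columns
               if "Physical Disk" in c and "Active" in c]
for col in active_cols:
    if df[col].astype(float).max() >= QUEUE_DEPTH:
        print(f"device queue saturated: {col}")
```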
Disk I/O Queuing – Adapter Queue
• Different adapters have different queue sizes
• The adapter queue can come into play if the total number of outstanding IOs exceeds the adapter queue depth
Takeaways
• esxtop is great for troubleshooting a diverse set of problems
• You can do root-cause analysis by correlating statistics from different screens
• A good understanding of the counters is essential for accurate troubleshooting
• esxtop is not designed for performance management
• There are various other tools for vSphere performance management