Understanding esxtop statistics

Understanding esxtop statistics Matthew Mancini, Phoenix VMUG Leader 02/25/2010 Phoenix VMUG

Disclaimer... • This presentation is not intended for any specific purpose. • These are simply my findings with esxtop. • Use at your own risk...

Speaker bio • Matt Mancini • Leader of the Phoenix VMUG • Server Room Administrator at John C Lincoln • 20+ Years in IT • matt.mancini@vmugphoenix.com • Blog -- vmexplorer.blogspot.com

Agenda • References • What is esxtop • How to use it • - Basic commands • - Monitor CPU & Memory

References • VMware Performance & VMmark Communities • http://communities.vmware.com/community/vmtn/general/performance?view=overview • ** Interpreting esxtop Statistics ** • http://communities.vmware.com/docs/DOC-9279 • Resource Management Guide • http://www.vmware.com/pdf/vi3_35/esx_3/r35u2/vi3_35_25_u2_resource_mgmt.pdf • Understanding Performance • http://communities.vmware.com/docs/DOC-5250 • Ready time • http://www.vmware.com/pdf/esx3_ready_time.pdf • Co-scheduling vCPUs • http://communities.vmware.com/docs/DOC-4960

ESXTOP Translations... • Does anyone know what this means? Τα ελληνικά του σε με >> It's Greek to me • How about these two, can you translate them?

What is esxtop • esxtop monitors and collects data for all system resources: CPU, memory, disk, and network (aka the 4 basic food groups). • When used interactively, the data for these "food groups" can be viewed on separate screens: CPU, memory, network, and disk adapter statistics. • Used correctly, it helps you interpret the performance of your ESX server and the VMs running on it.

What is esxtop • resxtop vs. esxtop • The resxtop and esxtop command-line utilities provide a detailed look at how ESX Server uses resources in real time. You can start either utility in one of three modes: interactive (default), batch, or replay. • The fundamental difference between resxtop and esxtop is that you can use resxtop remotely (or locally), whereas esxtop can be started only through the service console of a local ESX Server host. • The resxtop utility is a Remote Command Line Interface (Remote CLI) command. Before you can use any Remote CLI commands, you must download, install, and configure the Remote CLI virtual appliance.

How to Use it – Basic commands • To launch: type esxtop and press Enter • Can only be executed at the Service Console and by the root user • Press q to exit • Press c for CPU • Press m for Memory • Press d for Disk • Press n for Network • Press f for fields • Press Space to refresh (default refresh is 5 seconds) • Help: • info esxtop << Detailed information • esxtop -h << Command line syntax

How to Use it – Basic commands • At the root prompt (#), type esxtop and press Enter • The default display is CPU metrics (same as pressing c) • Press m for Memory

How to Use it – Basic commands • Press d for Disk • Press n for network

How to Use it – Basic commands • Press f for fields • Fields lets you choose which fields are displayed • In this case I pressed b to remove the GID field, then pressed Space to exit • Now the GID field is not displayed • Note: exiting esxtop resets the fields

How to Use it – Basic commands • Press o for order • Use a-h to change the order • An uppercase letter moves the field left • A lowercase letter moves the field right • In this case I pressed ‘b’ to move the GID field to the first position • Now the GID and ID fields are swapped • Note: exiting esxtop resets the fields

How to Use it – Basic commands • Set the screen delay with ‘-d’ • This controls how quickly the screen refreshes • esxtop -d XX, where XX is the delay in seconds • Specifies the delay between screen updates. Default is 5 seconds. Minimum is 2 seconds. • You can also change the delay with the interactive command 's'. If a delay of less than 2 seconds is specified, the delay is set to 2 seconds.
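The delay rule above can be sketched in a few lines of Python. This is only an illustration of the behavior described on the slide (default 5 seconds, minimum 2 seconds), not esxtop's own code; the helper names `effective_delay` and `esxtop_command` are made up for this example.

```python
MIN_DELAY_SECONDS = 2      # minimum stated on the slide
DEFAULT_DELAY_SECONDS = 5  # default stated on the slide


def effective_delay(requested=None):
    """Return the refresh delay that would apply for a requested -d value.

    Hypothetical helper: mirrors the rule that anything below 2 seconds
    is raised to 2 seconds, and that 5 seconds is used when no delay is given.
    """
    if requested is None:
        return DEFAULT_DELAY_SECONDS
    return max(requested, MIN_DELAY_SECONDS)


def esxtop_command(requested=None):
    """Build the argument list for launching esxtop with an optional delay."""
    if requested is None:
        return ["esxtop"]
    return ["esxtop", "-d", str(requested)]


if __name__ == "__main__":
    for value in (None, 1, 2, 10):
        print(value, "->", effective_delay(value), esxtop_command(value))
```

For example, requesting a 1-second delay still yields a 2-second refresh, which is why very short intervals make no visible difference.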

How to Use it – Monitor CPU • Uptime - The elapsed time since the server was powered on • CPU load average - CPU load averages over 1 minute, 5 minutes, and 15 minutes, based on 6-second samples • PCPU(%) - The percentage CPU utilization per physical CPU • Used total - The average of the PCPUs • CCPU(%) - Percentages of total CPU time as reported by the ESX Service Console • "us" is percentage user time, "sy" is percentage system time, "id" is percentage idle time, and "wa" is percentage wait time. "cs/sec" is the number of context switches per second recorded by the ESX Service Console.
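As a small worked example of "Used total" being the average of the PCPUs, here is a sketch in Python. The function name and the sample percentages are invented for illustration; they are not taken from a real host.

```python
def used_total(pcpu_percentages):
    """Average the per-physical-CPU utilization values (in percent)."""
    if not pcpu_percentages:
        raise ValueError("need at least one PCPU value")
    return sum(pcpu_percentages) / len(pcpu_percentages)


if __name__ == "__main__":
    # Hypothetical readings for a 4-core host.
    sample = [10.0, 20.0, 30.0, 40.0]
    print("used total: %.1f%%" % used_total(sample))
```

A host can therefore show a moderate "used total" while one PCPU is saturated, so it is worth reading the individual PCPU values too.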

How to Use it – Monitor CPU • esxtop uses worlds and groups as the entities to show CPU usage. A world is an ESX Server VMkernel schedulable entity, similar to a process or thread in other operating systems. A group contains multiple worlds. • Note: the sum of the NWLD column equals the total number of worlds shown at the top. • IDs 50-71 are VMs; their names have been blanked out.
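The NWLD note can be checked with a quick sum. The sketch below uses a made-up list of groups (the IDs, names, and counts are illustrative only) and adds up their NWLD values to get the world total that the header line would report.

```python
def total_worlds(groups):
    """Sum the NWLD (number of worlds) column across all groups."""
    return sum(group["nwld"] for group in groups)


if __name__ == "__main__":
    # Hypothetical rows from the CPU screen: one dict per group.
    sample_groups = [
        {"id": 1, "name": "idle", "nwld": 1},
        {"id": 7, "name": "helper", "nwld": 12},
        {"id": 50, "name": "vm-a", "nwld": 1},
        {"id": 51, "name": "vm-b", "nwld": 1},
    ]
    print("worlds:", total_worlds(sample_groups))
```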

How to Use it – Monitor CPU • Note that ID 7 has an NWLD of 12 • (NWLD = number of worlds in this group) • Press e to expand, type 7, and press Enter • ID 7 is now expanded and more detailed information is revealed

How to Use it – Monitor CPU • %RDY – A world in a run queue is waiting for the CPU scheduler to let it run on a PCPU. %RDY is the percentage of time the world spent waiting this way, so for a single world it is always smaller than 100%. It is the clearest indicator of CPU contention; more than 5% is a concern. • %CSTP – A high value indicates co-scheduling overhead • %USED – The amount of CPU load incurred by the world
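The 5% rule of thumb for %RDY can be applied to a set of readings as follows. This is a minimal sketch: the world names and values are invented, and the threshold is simply the figure quoted on the slide.

```python
RDY_CONCERN_PERCENT = 5.0  # rule-of-thumb threshold from the slide


def worlds_with_high_ready(ready_by_world, threshold=RDY_CONCERN_PERCENT):
    """Return (name, %RDY) pairs above the threshold, highest first."""
    flagged = [(name, rdy) for name, rdy in ready_by_world.items() if rdy > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical %RDY readings per world.
    sample = {"vm-a": 1.2, "vm-b": 7.5, "vm-c": 12.0, "vm-d": 5.0}
    for name, rdy in worlds_with_high_ready(sample):
        print("%s: %.1f%% ready" % (name, rdy))
```

Sorting highest first gives the same ordering idea as sorting the CPU screen by its %RDY column.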

How to Use it – Drill down CPU • Other CPU commands • R: Sort resource pools, virtual machines, and worlds by the resource pool's or virtual machine's %RDY column. • N: Sort resource pools, virtual machines, and worlds by the GID column. This is the default sort order. • ** V: Display virtual machine instances only.

How to Use it – Monitor Memory • It is important to note that some statistics refer to guest physical memory while others refer to machine memory. "Guest physical memory" is the virtual-hardware physical memory presented to the VM. "Machine memory" is actual physical RAM in the ESX host. Let's use the following figure to explain. In the figure, two VMs are running on an ESX host, where each block represents 4 KB of memory and each color represents a different set of data on a block. • Inside each VM, the guest OS maps the virtual memory to its physical memory. ESX Kernel maps the guest physical memory to machine memory. Due to ESX Page Sharing technology, guest physical pages with the same content can be mapped to the same machine memory page.

How to Use it – Monitor Memory • Memory overcommit: averages over 1 minute, 5 minutes, and 15 minutes; 1.00 = 100% • The header line also shows server uptime and the number of worlds

How to Use it – Monitor Memory • Memory overcommit, the basics. • In short: you assign more RAM to your VMs than is available in your host. • Good memory overcommit: you assign more RAM to your VMs than is available in your host, BUT the amount of RAM actually USED by your VMs never exceeds what is available in your host. • Bad memory overcommit: you assign more RAM to your VMs than is available in your host, AND the amount of RAM actually USED by your VMs exceeds what is available in your host. • Reference - • http://www.gabesvirtualworld.com/?p=930
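The good/bad distinction above comes down to two comparisons, which the following Python sketch spells out. The function name and the sample figures are hypothetical; the logic is just the definition from the slide.

```python
def classify_overcommit(assigned_mb, used_mb, host_mb):
    """Classify memory overcommit using the definitions on the slide.

    assigned_mb: total RAM assigned to the VMs
    used_mb:     RAM actually used by the VMs
    host_mb:     RAM available in the host
    """
    if assigned_mb <= host_mb:
        return "no overcommit"
    if used_mb <= host_mb:
        return "good overcommit"
    return "bad overcommit"


if __name__ == "__main__":
    # Hypothetical 32 GB host with 48 GB assigned to VMs.
    print(classify_overcommit(assigned_mb=49152, used_mb=20480, host_mb=32768))
    print(classify_overcommit(assigned_mb=49152, used_mb=36864, host_mb=32768))
```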

How to Use it – Monitor Memory • "PMEM" (MB) The machine memory statistics for the server. • "total": the total amount of machine memory in the server. • "cos": the amount of machine memory allocated to the ESX Service Console. • "vmk": the amount of machine memory being used by the ESX VMKernel. • "other": the amount of machine memory being used by everything other than the ESX Service Console and ESX VMKernel. "other" contains not only the memory used by VMs but also the user worlds that run directly on VMKernel. • "free": the amount of machine memory that is free.

How to Use it – Monitor Memory • "VMKMEM" (MB) The machine memory statistics for VMKernel. • "managed": the total amount of machine memory managed by VMKernel. • "minfree": the minimum amount of machine memory that VMKernel would like to keep free. VMKernel needs to keep some free memory for critical uses. • "rsvd": the amount of machine memory that is currently reserved. "rsvd" is the sum of three parts: (a) the reservation setting of the groups; (b) the overhead reservation of the groups; (c) "minfree". • "ursvd": the amount of machine memory that is currently unreserved. It is the memory available for reservation. • "state": the free memory state. • Q: How do I know my ESX box is under memory pressure? A: It is usually safe to say the ESX box is under memory pressure if "state" is "hard" or "low". But you also need to check "SWAP" and "MEMCTL" to find out whether you are experiencing memory problems. Basically, if there is not enough free memory and ESX is experiencing swapping or ballooning, the ESX box is under memory pressure.
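The memory-pressure Q&A above can be written as a small decision function. This is a sketch of that rule of thumb only: the function name and return strings are invented, and using the current SWAP and MEMCTL sizes as the sign of swapping or ballooning is an assumption made for this example.

```python
PRESSURE_STATES = {"hard", "low"}  # free memory states named on the slide


def memory_pressure(state, swap_curr_mb, memctl_curr_mb):
    """Apply the slide's rule of thumb for host memory pressure.

    state:          the VMKMEM free memory state, e.g. "high" or "low"
    swap_curr_mb:   current swap usage from the SWAP line (assumed indicator)
    memctl_curr_mb: current ballooned memory from the MEMCTL line (assumed indicator)
    """
    low_free_memory = state.lower() in PRESSURE_STATES
    reclaiming = swap_curr_mb > 0 or memctl_curr_mb > 0
    if low_free_memory and reclaiming:
        return "under memory pressure"
    if low_free_memory:
        return "possible pressure: check SWAP and MEMCTL"
    return "no pressure indicated"


if __name__ == "__main__":
    print(memory_pressure("high", 0, 0))
    print(memory_pressure("hard", 0, 0))
    print(memory_pressure("low", 512, 1024))
```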

How to Use it – Monitor Memory • "COSMEM" (MB) The memory statistics reported by the ESX Service Console. • "free": the amount of idle machine memory. • "swap_t": the total swap configured. • "swap_f": the amount of swap free. • "r/s": the rate at which memory is swapped in from disk. • "w/s": the rate at which memory is swapped out to disk. • Q: What does it mean if I see a high r/s or w/s? A: Your console OS is swapping. It is highly likely that your COS free memory is low. You may either configure more memory for the COS and restart your ESX box, or stop some programs running inside your COS. • Q: Why can't I see this COSMEM line? A: You are using ESXi, not classic ESX.

How to Use it – Monitor Memory • "PSHARE" (MB) The ESX page-sharing statistics. • "shared": the amount of guest physical memory that is being shared. • "common": the amount of machine memory that is common across worlds. • "saving": the amount of machine memory that is saved due to page-sharing.
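One way to read the three PSHARE values together: the shared guest physical memory is backed by a smaller amount of common machine memory, and the difference is what page sharing saves. The sketch below assumes the relationship saving = shared - common, which is not stated on the slide; the sample numbers are invented.

```python
def page_sharing_saving(shared_mb, common_mb):
    """Estimate machine memory saved by page sharing.

    Assumption (not from the slide): saving = shared - common, i.e. the
    shared guest physical memory minus the machine memory that backs it.
    """
    if common_mb > shared_mb:
        raise ValueError("common should not exceed shared")
    return shared_mb - common_mb


if __name__ == "__main__":
    # Hypothetical PSHARE readings in MB.
    print("saving:", page_sharing_saving(shared_mb=1234, common_mb=234), "MB")
```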

How to Use it – Monitor Memory • "SWAP" (MB) The ESX swap usage statistics. • "curr": the current swap usage. This is the total swapped machine memory of all the groups. • "target": the expected swap usage. This is the total swap target of all the groups. • "r/s": the rate at which machine memory is swapped in from disk. • "w/s": the rate at which machine memory is swapped out to disk.

How to Use it – Monitor Memory • "MEMCTL" (MB) The memory balloon statistics. • "curr": the total amount of physical memory reclaimed by the balloon driver. This is the total memory ballooned by the VMs. • "target": the expected total amount of ballooned memory. This is the total of the VMs' balloon targets. • "max": the maximum amount of physical memory reclaimable.
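Both the SWAP and MEMCTL lines report a "curr" and a "target" value, so comparing the two gives a rough sense of direction. The sketch below is an interpretation rather than something stated on the slide: it assumes that a target above the current value means more memory is expected to be reclaimed, and a target below it means less. The function name is made up.

```python
def reclamation_trend(curr_mb, target_mb):
    """Compare a current swap or balloon size with its target.

    Assumption for illustration: target > curr suggests reclamation is
    expected to grow, target < curr suggests it is expected to shrink.
    """
    if target_mb > curr_mb:
        return "increasing"
    if target_mb < curr_mb:
        return "decreasing"
    return "steady"


if __name__ == "__main__":
    # Hypothetical MEMCTL and SWAP readings in MB.
    print("balloon:", reclamation_trend(curr_mb=1024, target_mb=2048))
    print("swap:", reclamation_trend(curr_mb=256, target_mb=0))
```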

How to Use it – Drill down Memory • "MEMSZ" (MB) For a VM, the amount of configured guest physical memory • "SZTGT" (MB) The target memory size for the VM; this counter is used to determine future swapping and ballooning activities • "TCHD" (MB) The amount of guest physical RAM recently used by the VM

How to Use it – Monitor Memory • "%ACTV" Percentage of active guest physical memory, current value. • "%ACTVS" Percentage of active guest physical memory, slow moving average. • "%ACTVF" Percentage of active guest physical memory, fast moving average. • "%ACTVN" Percentage of active guest physical memory in the near future. This is an estimated value. • "OVHDUW" (MB) Amount of overhead memory reserved for the vmx user world of a VM group. This is an internal counter. • "OVHD" (MB) Amount of overhead memory currently consumed by a VM. • "OVHDMAX" (MB) Amount of reserved overhead memory for the entire VM.

Next VMUG Meeting – • esxtop Stats Part II – • Disk • Network • Batch Mode
