CPU Ready Time in VMware ESX Server

Presentation Transcript


  1. CPU Ready Time in VMware ESX Server Bill Shelden bill.shelden@PERFMAN.com

  2. Abstract A performance metric produced by VMware ESX Server, called CPU ready time, measures the time a virtual CPU in a virtual machine running under ESX Server is ready to be dispatched but is not dispatched. CPU ready times are examined from a real ESX Server system and from a number of published benchmarks, and are found to be too high to be explained solely by the contention among virtual CPUs for the physical CPUs in the server running ESX Server. Some reasons for virtual CPUs accumulating CPU ready time while physical CPUs are available are examined. One such reason, which has been extensively discussed, is called co-scheduling and applies to SMP virtual machines. In multiprocessor servers an additional factor affects CPU ready time: virtual CPUs that have been scheduled on a particular physical CPU are given a preference to run on the same physical CPU again. In this case the ESX Server scheduler may choose to let a few cycles on a physical CPU stay idle rather than move a ready virtual CPU to another physical CPU. A model of these phenomena is discussed, and the model's predicted CPU ready times are compared to the real data and to the benchmark data.

  3. Topics • Investigate % CPU Ready Time for LPPerfTest, a virtual machine on PERFMAN's internal server devclusterhost2 • Is the measured CPU Ready Time reasonable? • Compare it to a simple model • Discuss % CPU Ready Time and its causes • Discuss a more robust model of an ESX Server system • Apply it to uniprocessor benchmarks • Apply it to mixed UNI and SMP benchmarks • Apply it to devclusterhost2 • Conclusions

  4. A PERFMAN Internal ESX Host: devclusterhost2 • 8 physical CPUs • Running ESX Server 3.5 • 12 virtual machines (vCPUs each): LPPerfTest 4, MITest 2, DevPortalSQL 2, Win2008Ent 2, DevPortalTest2 2, DevSrvSMPT 1, DevNas1 1, DevPortalTest 1, DevWiki 1, Win64Test 1, VirtualCenter2 1, ISMqa2 1 • 19 total virtual CPUs

  5. Spike in % CPU Ready Time at 6 AM on devclusterhost2: about 1800 seconds of CPU Ready Time

  6. LPPerfTest at 6 AM is an anomaly: the PERFMAN rule of thumb (ROT) is % CPU Ready Time < 5% for a VM

  7. How busy is the server devclusterhost2? Utilization of the 8 cores is about 30% at 6 AM

  8. % CPU Ready Time in VMware ESX Server • References: • VMware ESX Server 3 Ready Time Observations • Co-scheduling SMP VMs in VMware ESX Server • VMware vSphere 4: The CPU Scheduler in VMware ESX 4 • CPU ready time is the time a virtual machine must wait in a ready-to-run state before it can be scheduled on a CPU • It is expressed as a percentage of the measurement interval • E.g., a VM with a % CPU Ready Time of 5% over a 3600-second interval waits in a ready-to-run state for 0.05 x 3600 = 180 seconds • It makes sense to view ready time in the context of the VM's CPU service time: CPU Ready Time / CPU Busy Time • Call this CPU Ready Time per CPU Busy Time
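
Both views of the metric are simple conversions. A minimal sketch in Python (the function names are illustrative, not from any VMware tool):

```python
def ready_seconds(pct_ready, interval_secs=3600):
    """Seconds of CPU ready time implied by a % CPU Ready Time value."""
    return pct_ready / 100.0 * interval_secs

def ready_per_busy(ready_secs, busy_secs):
    """CPU Ready Time per CPU Busy Time, the normalized view above."""
    return ready_secs / busy_secs

# Slide 8's example: 5% ready over a 3600-second interval
print(ready_seconds(5.0))                    # 180.0 seconds
# LPPerfTest at 6 AM (figures from slide 13): 1069 s ready, 4800 s busy
print(round(ready_per_busy(1069, 4800), 2))  # 0.22, above the 0.2 ROT
```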

  9. CPU Ready Time per CPU Busy Time for devclusterhost2's Virtual Machines

  10. Causes of CPU Ready Time in VMware ESX Server • Physical CPUs are unavailable • Co-scheduling of SMP virtual machines • CPU preference in multiprocessor servers • Other reasons • Overall server utilization • Load correlation • Number of virtual machines • Number of virtual CPUs in the VMs

  11. Co-scheduling • Proportional-share based scheduling algorithm • VM priority is based on CPU used as a fraction of CPU entitled • Smaller means higher priority • Strict co-scheduling in ESX Server 2.x (2003) • A cumulative skew value is kept for each vCPU • A vCPU makes progress when it is running or idling • Skew increases while a vCPU is not making progress • An idle vCPU does not accumulate skew • Once skew exceeds a threshold, all sibling vCPUs must be co-started • Relaxed co-scheduling in ESX Server 3.x (2006) • Only the sibling vCPUs that are skewed must be co-started • Further relaxed co-scheduling in ESX Server 4 • Consequence: physical CPUs may be idle while vCPUs are in a ready-to-run state.
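
A toy sketch of the skew bookkeeping described above (illustrative only, not the actual ESX scheduler; the threshold value and the reset rule are assumptions):

```python
SKEW_THRESHOLD = 3  # assumed value; ESX uses an internal threshold

class VCpu:
    def __init__(self, name):
        self.name = name
        self.skew = 0

def tick(siblings, dispatched):
    """One scheduling quantum for one SMP VM.

    siblings  : all vCPUs of the VM
    dispatched: the subset running on a pCPU this quantum
    Returns the vCPUs that must be co-started (relaxed co-scheduling).
    """
    for v in siblings:
        if v in dispatched:
            v.skew = 0                  # it ran, so it caught up (simplified)
        elif any(s in dispatched for s in siblings):
            v.skew += 1                 # siblings progressed, this one didn't
        # a truly idle vCPU would be skipped here: idling counts as progress
    return [v for v in siblings if v.skew > SKEW_THRESHOLD]

# Example: a 2-vCPU VM where only vcpu0 gets dispatched for 5 quanta
vm = [VCpu("vcpu0"), VCpu("vcpu1")]
for _ in range(5):
    skewed = tick(vm, dispatched={vm[0]})
print([v.name for v in skewed])         # ['vcpu1'] must be co-started
```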

  12. CPU Preference • On multiprocessor systems, a vCPU that has been scheduled on a particular physical CPU is given preference to run on the same CPU again • This preserves the performance advantage of finding warm data in the CPU cache.
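
A toy dispatch rule showing the trade-off just described (illustrative only; the real ESX scheduler's policy is not public in this detail, so the waiting rule and MIGRATION_DELAY are assumptions):

```python
MIGRATION_DELAY = 2  # quanta to wait before giving up affinity (assumed)

def dispatch(vcpu, pcpus):
    """Return the index of the pCPU to run on, or None to stay ready-to-run.

    vcpu : dict holding the vCPU's scheduling state
    pcpus: list of current owners per pCPU (None means the pCPU is idle)
    """
    last = vcpu.get("last_pcpu")
    if last is not None and pcpus[last] is None:
        return last                       # cache-warm pCPU is free: take it
    free = [i for i, owner in enumerate(pcpus) if owner is None]
    if not free:
        return None                       # no pCPU available at all
    vcpu["waited"] = vcpu.get("waited", 0) + 1
    if vcpu["waited"] <= MIGRATION_DELAY:
        return None                       # let a free pCPU idle for now,
                                          # accruing ready time instead
    vcpu["waited"] = 0
    return free[0]                        # affinity given up: migrate
```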

  13. Summary observations about devclusterhost2 • In a 1-hour interval: • The VMs on a server running ESX Server 3.5 are experiencing • 1800 seconds of CPU ready time (50% of 3600 secs) • 8640 seconds of CPU service time (30% of 8 pCPUs) • LPPerfTest, one of the virtual machines, is experiencing • 1069 seconds of CPU ready time • 4800 seconds of CPU service time • % CPU Ready Time for LPPerfTest = 1069 / 3600 = 29.7% > 5% ROT • LPPerfTest looks like the cause/victim of the problem because of the spike in its utilization at 6 AM • The server has 8 physical CPUs and is running at about 30% busy • Is this reasonable? I did not think so. Let's investigate by modeling.

  14. First Model • Build a simulation model of a system with • 19 customers (the 19 vCPUs) • Contending for N physical CPUs, where N = 8, 7, 6, ... • Providing about 8640 seconds of CPU service in a 3600-second interval • Examine the CPU queue times predicted by the model and compare them to devclusterhost2 at 6 AM • The model is the Machine Repair Model, which has a well-known analytic solution; a simulation model was used here.

  15. Machine Repair Model (diagram) • The repair center: number of servers = number of repairmen = 2; WS = mean service time = mean time to repair • The shop floor: a delay center with WS = mean service time = mean time to failure; population = number of machines = 12
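
A minimal sketch of the model's well-known birth-death (analytic) solution, assuming exponential repair and failure times. Mapped to devclusterhost2, the machines are vCPUs and the repairmen are pCPUs; the mean think time of 6.9 is an assumption calibrated so the 8-pCPU case serves roughly 8640 CPU-seconds per hour (30% of 8 pCPUs):

```python
def machine_repair(n_machines, n_servers, mean_service, mean_think):
    """Analytic (birth-death) solution of the Machine Repair Model.

    n_machines: population (here the 19 vCPUs)
    n_servers : repairmen (here the physical CPUs)
    Returns (per-server utilization, mean queue time / mean service time).
    """
    lam, mu = 1.0 / mean_think, 1.0 / mean_service
    p = [1.0]                               # unnormalized Prob{n at the CPUs}
    for n in range(n_machines):
        p.append(p[-1] * (n_machines - n) * lam / (min(n + 1, n_servers) * mu))
    total = sum(p)
    p = [x / total for x in p]
    jobs = sum(n * pn for n, pn in enumerate(p))                       # L
    tput = sum(min(n, n_servers) * mu * pn for n, pn in enumerate(p))  # X
    wait = jobs / tput                                                 # W = L/X
    util = tput * mean_service / n_servers
    return util, (wait - mean_service) / mean_service

# 19 vCPUs contending for N = 8, 7, 6, ... physical CPUs
for n_cpus in range(8, 2, -1):
    util, q = machine_repair(19, n_cpus, mean_service=1.0, mean_think=6.9)
    print(f"{n_cpus} pCPUs: {util:.0%} busy, queue/service = {q:.2f}")
```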

  16. devclusterhost2 is behaving more like a 3- or 4-pCPU server running at 60-80% busy

  17. Use a more realistic Model for a VMware ESX Host • Model characteristics • Number of server physical CPUs • Number of Virtual Machines • Virtual CPUs in each VM • Server Utilization of each VM • Population in each VM (number of processes) • VM dispatching • Co-scheduling for SMP VMs • CPU preference on servers with multiple physical CPUs • Apply to two benchmarks described in the paper • Uniprocessor benchmarks • Benchmarks on a 4-CPU server with mix of uniprocessor and SMP virtual machines • Apply to devclusterhost2

  18. VMware ESX Host Model (diagram: jobs in each of VM1 and VM2 cycle through the VM's delay center, an Allocate vCPU node, the CPU of a server with 4 pCPUs, and a Release vCPU node) • 2 job classes • One for each VM • Pop = WinMPL • Target Util/Tput • Allocate/Release node for each VM • Number of tokens = number of vCPUs
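
A minimal sketch of this token-based host model, written here with the simpy library (an assumption; the slides do not say what tool was used). A job must hold one of its VM's vCPU tokens before it can compete for a pCPU, and ready time accrues while a token is held but no pCPU has been granted. The service and think times are assumptions calibrated to LPPerfTest's roughly 4800 CPU-seconds per hour; co-scheduling and CPU preference would need extra logic on top of this:

```python
import random
import simpy

def job(env, vcpu_tokens, pcpus, mean_service, mean_think, stats):
    """One process in a VM: think, grab a vCPU token, then wait for a pCPU."""
    while True:
        yield env.timeout(random.expovariate(1.0 / mean_think))  # delay center
        with vcpu_tokens.request() as v:       # Allocate vCPU node
            yield v
            ready_at = env.now                 # vCPU is now ready-to-run
            with pcpus.request() as p:         # contend for a physical CPU
                yield p
                stats["ready"] += env.now - ready_at
                yield env.timeout(random.expovariate(1.0 / mean_service))
        # leaving the with-blocks is the Release vCPU node

env = simpy.Environment()
pcpus = simpy.Resource(env, capacity=8)        # devclusterhost2's 8 pCPUs
stats = {"ready": 0.0}
# LPPerfTest: 4 vCPU tokens, population of 4 (the population is an assumption)
lp_tokens = simpy.Resource(env, capacity=4)
for _ in range(4):
    env.process(job(env, lp_tokens, pcpus, mean_service=1.0,
                    mean_think=2.0, stats=stats))
# The other 11 VMs would be added the same way, each with its own token pool.
env.run(until=3600)
print(f"LPPerfTest CPU ready time: {stats['ready']:.0f} s")
```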

  19. Summary of Benchmarks

  20. Uniprocessor Benchmarks (ESX Server 3.0) • Run on a server with a single physical CPU • No co-scheduling • No CPU preference • CPU burner program set to consume 15% of a single physical CPU • 6 virtual machines, each with one virtual CPU • The test started with a single CPU burner in one virtual machine • The other five VMs were idle • Every 10 minutes, another CPU burner was started in another virtual machine • In the last 10 minutes, all 6 VMs were each running one copy of the CPU burner program
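
The CPU burner itself is easy to approximate. A minimal sketch, assuming a duty-cycle implementation (the actual burner program used in the benchmarks is not shown):

```python
import time

def burn(duty=0.15, period=0.1):
    """Consume roughly `duty` of one CPU: spin for duty*period, then sleep."""
    while True:
        start = time.perf_counter()
        while time.perf_counter() - start < duty * period:
            pass                           # busy loop: consumes CPU
        time.sleep((1.0 - duty) * period)  # idle for the rest of the period

if __name__ == "__main__":
    burn(duty=0.15)                        # ~15% of a single CPU
```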

  21. Uniprocessor Benchmark and Model Results

  22. Benchmarks on 4-CPU Server (ESX Server 3.0) • Server has 4 physical CPUs • 10 minute (600 second) runs • Run 6 instances of the CPU burner program with each instance set to consume 50% of a single CPU • 6 x 50% of 1 CPU = 300% of 1 CPU • 300% of 1 CPU = 3 x 600 secs = 1800 secs • Utilization of 4-CPUs = 75% • Eight runs with combinations of VMs under ESX Server 3.0 • 6 UP • 5 UP 1 SMP • 4 UP 2 SMP • 3 UP 3 SMP • 2 UP 4 SMP • 1 UP 4 SMP 1 2-Burner • 4 SMP 2 2-Burners • 3 SMP 3 2-Burners

  23. 4-CPU Server benchmark results from the paper

  24. 4-CPU Server Benchmarks: Model only contention for Physical CPUs

  25. Model Contention for pCPUs plus Co-scheduling of SMP VMs

  26. Model pCPUs + Co-Scheduling + CPU Preference

  27. Summary of 4-CPU Server Models

  28. devclusterhost2 Model (ESX Server 3.5)

  29. Comparing devclusterhost2: June 10, 2010 and September 2, 2010

  30. Conclusions • % CPU Ready Time can be problematic in SMP VMs • It can be caused by co-scheduling and CPU preference • To limit CPU ready time, consider: • Reducing the number of VMs • Reducing the load on the server • Reducing the number of virtual CPUs in VMs • Consider reporting CPU ready time as a fraction of CPU Busy Time • ROT: CPU Ready / CPU Busy < 0.2 for each VM
