Friendly Virtual Machines for Efficient and Fair Resource Allocation

Friendly Virtual Machines • Zhang, Bestavros, et al., Boston Univ. • ACM/USENIX VEE 2005 • CSE 598c, April 17, 2006 • Bhuvan Urgaonkar

Problem Setting • Growing trend of hosting applications on third-party platforms • Two challenges • Isolation and security for co-located applications • Efficient and fair resource allocation • Virtualization seen as a promising approach for isolation • What about resource allocation?

Challenge - Resource Allocation in Hosting Environments • Traditional solutions • Over-provisioning => wasteful • Fair schedulers in the OS, dynamic provisioning, admission control • Complex • Deprive the application of the opportunity to meaningfully adapt its behavior to match available resources • Against the famous end-to-end argument developed in the networking community

End-to-end Argument • Clark et al. • A functionality should be pushed to the higher layer whenever possible • IP network implements packet forwarding, leaving congestion control to end systems • When applied to hosting platforms • Let the applications decide how many resources they need

How do VMs make the end-to-end idea realizable? • In a traditional hosting system, applications would have to be modified • Always undesirable, often impossible • In a virtualized hosting system • VMM is like OS, guest OS is like application • Guest OS modification not so unacceptable • E.g., Xen, Denali • Main idea: It is possible to achieve good efficiency and fairness using “friendly” virtual machines

Outline • Motivation • Approach • Implementation • Evaluation • Conclusions

Friendly Virtual Machine • Not malicious • Dynamically adapts its resource needs to system conditions • Inspiration: AIMD congestion control in TCP • Gradually increase resource requirements, back-off when resource contention increases • How a TCP researcher would approach the resource management problem in data centers

System Goals • Efficiency • Resources should not be overloaded • E.g., Heavy paging during overload => low throughput • Resources should not be unnecessarily underutilized • Fairness • Each VM is allocated a proportional share of the bottleneck resource for that VM

Overload Detection • Unlike TCP, there are multiple resources to consider • CPU, virtual memory, network bandwidth • Resource utilization metrics not reliable • E.g., CPU util may be high but the bottleneck may be the memory sub-system • Use application-centric metrics like response time or throughput

Overload Detection • Virtual Clock Time (VCT) • Real time interval between consecutive virtual clock cycles • Bottleneck resource • The resource that is the first to trigger a significant increase in VCT • Bottleneck-equivalence classes • Detection: Measure the ratio of current VCT to minimum VCT observed • Compare with a threshold (2)
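
The detection rule on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's code: the class name `OverloadDetector`, the method `observe`, and the millisecond sample values are hypothetical, and the default threshold of 2.0 assumes that the "(2)" on the slide is the threshold value.

```python
class OverloadDetector:
    """Flags overload when current VCT / minimum observed VCT exceeds a threshold."""

    def __init__(self, threshold: float = 2.0):
        # threshold=2.0 is an assumed reading of the "(2)" on the slide.
        self.threshold = threshold
        self.min_vct = None  # smallest VCT sample seen so far

    def observe(self, vct: float) -> bool:
        """Record one VCT sample and return True if it signals overload."""
        if vct <= 0:
            raise ValueError("VCT must be positive")
        # Track the minimum VCT as the estimate of the uncontended baseline.
        if self.min_vct is None or vct < self.min_vct:
            self.min_vct = vct
        return vct / self.min_vct > self.threshold


detector = OverloadDetector()
samples_ms = [10, 15, 25, 20]  # made-up VCT samples in milliseconds
flags = [detector.observe(s) for s in samples_ms]
```

Only the third sample is more than twice the baseline of 10 ms, so it is the only one flagged; a sample exactly at the threshold is not treated as overload in this sketch.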

Adaptation Mechanisms • Control number of processes/threads • In practice, suspending running processes may not be a good idea • Alternatives • Suspend less important (e.g., younger) processes • Don’t allow new processes instead of suspending existing ones • Rate control by forcing VM to sleep • Follow an AIMD style adaptation that converges to fair/efficient allocation • Paper presents a control-theoretic model to prove convergence/stability properties
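
To make the AIMD-style adaptation concrete, here is a minimal simulation sketch. It is not the paper's controller: the function names, the increase step `alpha`, the decrease factor `beta`, and the shared-capacity overload test are all assumptions chosen for illustration. Each VM raises its thread limit by `alpha` while no overload is seen and multiplies it by `beta` when overload is seen.

```python
def aimd_step(limit: int, overloaded: bool, alpha: int = 1,
              beta: float = 0.5, floor: int = 1) -> int:
    """One AIMD update of a VM's thread limit (hypothetical parameters)."""
    if overloaded:
        # Multiplicative decrease, never dropping below the floor.
        return max(floor, int(limit * beta))
    # Additive increase.
    return limit + alpha


def simulate(limits: list[int], capacity: int, steps: int) -> list[int]:
    """Run AIMD for all VMs; overload is assumed when total threads exceed capacity."""
    limits = list(limits)
    for _ in range(steps):
        overloaded = sum(limits) > capacity
        limits = [aimd_step(x, overloaded) for x in limits]
    return limits


# Two VMs starting from very unequal thread limits.
final = simulate([2, 40], capacity=50, steps=200)
```

Additive increase keeps the gap between the two limits constant, while each multiplicative decrease halves it, so the limits drift toward an equal share. In this run the gap shrinks from 38 to 1.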

Salient Features • Underlying system requirements • Schedulers should be unbiased like round-robin, unlike multi-level feedback • VMM should implement resource policing to enforce AIMD behavior • Various adaptation strategies can co-exist • Think TCP-reno, TCP-tahoe, … • Suggestion: VMM could provide incentives for friendly behavior

Discussion • Is this system practical? • Rate of adaptation • Would it be fast enough for hosted applications? • Applications need resources soon after overload starts • How would the system behave with biased schedulers? • Can the adaptation mechanism be extended to handle different levels of importance? • This system might punish an application precisely when it is crucial for it to service its workload • E.g., An e-commerce app during Thanksgiving • Global knowledge can be crucial for efficiency • E.g., LRU page replacement • Security, isolation • To me it seems this would be as secure as a system with a more heavy-weight VMM

Implementation • User-mode Linux • Implement adaptation of number of processes and rate control • 500 lines of code

Outline • Motivation • Approach • Implementation • Evaluation • Conclusions

Memory intensive benchmark - Performance metrics vs # VMs • Linux suspends processes arbitrarily when excessive thrashing occurs, their system spreads the punishment evenly

Benchmark - Performance metrics vs # threads/VM (2 VMs) • Graceful degradation

How not to do Evaluation • No confidence intervals! • Observations for light loads are meaningless • Pick someone your own size • Of course, their system is better than vanilla UML, so what? • Should have compared with a system that implements fair schedulers

Apache - 4 VMs • Graceful degradation

Evolution of VCT w/ UML • Unfairness at high load

Evolution of VCT w/ FVM • Fair CPU allocation

Tput per VM w/ UML • Unfairness at high load • Unfair CPU allocations due to different paging treatment and process suspension

Tput per VM w/ FVM • Fair behavior

Conclusions • Distributed, application-driven resource allocation • (+) Cool idea • (-) Needs more research to be convincing • Experimental evaluation not satisfactory

More Discussion
