Explore the concept of "friendly" virtual machines that dynamically adapt their resource needs to system conditions, aiming for efficiency and fairness in resource allocation. This paper presents an implementation and evaluation of this approach.
Friendly Virtual Machines • Zhang, Bestavros et al., Boston Univ. • ACM/USENIX VEE 2005 • CSE 598c, April 17, 2006 • Bhuvan Urgaonkar
Problem Setting • Growing trend of hosting applications at third-party platforms • Two challenges • Isolation and security for co-located applications • Efficient and fair resource allocation • Virtualization seen as a promising approach for isolation • What about resource allocation?
Challenge - Resource Allocation in Hosting Environments • Traditional solutions • Over-provisioning => wasteful • Fair schedulers in the OS, dynamic provisioning, admission control • Complex • Deprive the application of meaningfully adapting its behavior to match available resources • Against the famous end-to-end argument developed in the networking community
End-to-end Argument • Clark et al. • A functionality should be pushed to the higher layer whenever possible • IP network implements packet forwarding, leaving congestion control to end systems • When applied to hosting platforms • Let the applications decide how many resources they need
How do VMs make the end-to-end idea realizable? • In a traditional hosting system, applications would have to be modified • Always undesirable, often impossible • In a virtualized hosting system • VMM is like OS, guest OS is like application • Modifying the guest OS is more acceptable • E.g., Xen, Denali • Main idea: It is possible to achieve good efficiency and fairness using “friendly” virtual machines
Outline • Motivation • Approach • Implementation • Evaluation • Conclusions
Friendly Virtual Machine • Not malicious • Dynamically adapts its resource needs to system conditions • Inspiration: AIMD congestion control in TCP • Gradually increase resource requirements, back-off when resource contention increases • How a TCP researcher would approach the resource management problem in data centers
System Goals • Efficiency • Resources should not be overloaded • E.g., Heavy paging during overload => low throughput • Resources should not be unnecessarily underutilized • Fairness • Each VM is allocated a proportional share of the bottleneck resource for that VM
Overload Detection • Unlike TCP, there are multiple resources to consider • CPU, virtual memory, network bandwidth • Resource utilization metrics not reliable • E.g., CPU util may be high but the bottleneck may be the memory sub-system • Use application-centric metrics like response time or throughput
Overload Detection • Virtual Clock Time (VCT) • Real time interval between consecutive virtual clock ticks • Bottleneck resource • The resource that is the first to trigger a significant increase in VCT • Bottleneck-equivalence classes • Detection: Measure the ratio of current VCT to minimum VCT observed • Compare the ratio with a threshold (2)
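The detection rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the paper's code: the class name `OverloadDetector`, the sample values, and the choice of a strict "greater than" comparison are assumptions; only the ratio of current to minimum VCT and the threshold of 2 come from the slide.

```python
class OverloadDetector:
    """Flags overload when current VCT / minimum observed VCT exceeds a threshold."""

    def __init__(self, threshold=2.0):
        self.threshold = threshold  # the slide uses 2
        self.min_vct = None         # smallest VCT seen so far (unloaded baseline)

    def update(self, vct):
        """Record one VCT sample (real seconds per virtual clock tick).

        Returns True if the sample indicates overload.
        """
        if vct <= 0:
            raise ValueError("VCT must be positive")
        if self.min_vct is None or vct < self.min_vct:
            self.min_vct = vct
        # Assumption: overload means the ratio strictly exceeds the threshold.
        return vct / self.min_vct > self.threshold


# Hypothetical VCT samples: the last one is 2.5x the minimum.
detector = OverloadDetector()
samples = [0.010, 0.011, 0.012, 0.019, 0.025]
flags = [detector.update(s) for s in samples]
```

Because the signal is a ratio against the VM's own baseline, the VM does not need to know which resource (CPU, memory, network) is the bottleneck.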
Adaptation Mechanisms • Control number of processes/threads • In practice, suspending running processes may not be a good idea • Alternatives • Suspend less important (e.g., younger) processes • Don’t allow new processes instead of suspending existing ones • Rate control by forcing VM to sleep • Follow an AIMD style adaptation that converges to fair/efficient allocation • Paper presents a control-theoretic model to prove convergence/stability properties
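A toy simulation can illustrate why AIMD-style adaptation drifts toward a fair split. This is a sketch under simplifying assumptions, not the paper's control-theoretic model: the function names, the parameters `alpha`, `beta`, and `floor`, the shared capacity, and the synchronized overload signal are all hypothetical. The quantity being adapted stands in for either a per-VM process limit or a rate.

```python
def aimd_step(limit, overloaded, alpha=1.0, beta=0.5, floor=1.0):
    """One AIMD update of a VM's resource limit.

    Additive increase by alpha when there is no overload,
    multiplicative decrease by beta (never below floor) on overload.
    """
    if overloaded:
        return max(floor, limit * beta)
    return limit + alpha


def simulate(limits, capacity, steps):
    """Run AIMD for several VMs sharing one bottleneck.

    Assumption: every VM sees the same overload signal, raised whenever
    total demand exceeds the bottleneck capacity.
    """
    limits = list(limits)
    for _ in range(steps):
        overloaded = sum(limits) > capacity
        limits = [aimd_step(x, overloaded) for x in limits]
    return limits


# Two VMs that start with very unequal limits.
start = [2.0, 30.0]
end = simulate(start, capacity=40.0, steps=200)
gap_before = abs(start[0] - start[1])
gap_after = abs(end[0] - end[1])
```

Additive increases leave the gap between the two VMs unchanged, while each multiplicative decrease halves it, so `gap_after` ends up far smaller than `gap_before`. This is the same intuition used to argue fairness for TCP's AIMD.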
Salient Features • Underlying system requirements • Schedulers should be unbiased like round-robin, unlike multi-level feedback • VMM should implement resource policing to enforce AIMD behavior • Various adaptation strategies can co-exist • Think TCP-reno, TCP-tahoe, … • Suggestion: VMM could provide incentives for friendly behavior
Discussion • Is this system practical? • Rate of adaptation • Would it be fast enough for hosted applications? • Applications need resources soon after overload starts • How would the system behave with biased schedulers? • Can the adaptation mechanism be extended to handle different levels of importance? • This system might punish an application precisely when it is crucial for it to service its workload • E.g., An e-commerce app during Thanksgiving • Global knowledge can be crucial for efficiency • E.g., LRU page replacement • Security, isolation • To me it seems this would be as secure as a system with a more heavy-weight VMM
Implementation • User-mode Linux • Implement adaptation of number of processes and rate control • 500 lines of code
Memory-intensive benchmark - Performance metrics vs # VMs • Linux suspends processes arbitrarily when excessive thrashing occurs; their system spreads the punishment evenly
Benchmark - Performance metrics vs # threads/VM (2 VMs) • Graceful degradation
How not to do Evaluation • No confidence intervals! • Observations for light loads are meaningless • Pick someone your own size • Of course, their system is better than vanilla UML, so what? • Should have compared with a system that implements fair schedulers
Apache - 4 VMs • Graceful degradation
Evolution of VCT w/ UML • Unfairness at high load
Evolution of VCT w/ FVM • Fair CPU allocation
Per-VM Tput w/ UML • Unfairness at high load • Unfair CPU allocations due to different paging treatment and process suspension
Per-VM Tput w/ FVM • Fair behavior
Conclusions • Distributed, application-driven resource allocation • (+) Cool idea • (-) Needs more research to be convincing • Experimental evaluation not satisfactory