Xen and Co.: Communication-aware CPU Scheduling for Consolidated Xen-based Hosting Platforms Sriram Govindan, Arjun R. Nath, Amitayu Das, Bhuvan Urgaonkar, Anand Sivasubramaniam The Pennsylvania State University, University Park, PA, 16802. In Proceedings of the Conference on Virtual Execution Environments (VEE), June 2007.
Introduction • Web applications: • Communication- and disk-I/O-intensive. • Highly modular software architectures with multiple communicating tiers. • Require resource guarantees from the hosting platform to provide satisfactory performance to the clients who access them over the Internet. • Virtualization: • Cost reduction. • Server consolidation. • More agile dynamic resource provisioning. • Easier management. • Can web applications and virtualization be combined effectively?
Motivation • The need for communication-aware CPU scheduling: • Performance degradation of a communication-intensive application placed on a consolidated server, despite allocating sufficient resources. • [Figure: (a) two dedicated servers, one running the JBoss application-logic tier of TPC-W and the other the DB data tier; (c) a single server, with the Xen VMM and Dom-0, consolidated with both TPC-W tiers and five CPU-intensive applications.]
Motivation (cont'd) • Setting (for the VM hosting the two TPC-W tiers): • Same CPU allocation in both configurations, enforced by setting a cap. • Same memory allocation. • Result: • Conclusion: • Providing enough CPU alone is not enough; an equally important consideration is providing CPU at the right time. Reason: the TPC-W tiers spend large amounts of time waiting for a chance to communicate, resulting in degraded response times. Note: we call such delays scheduling-induced delays.
Problem Definition • Can a server in a virtual host platform schedule hosted VMs in a communication-aware manner to enable satisfactory application performance even under conditions of high consolidation, while still adhering to the high-level resource provisioning goals in a fair manner?
Background In this section, we provide an overview of server virtualization.
Network I/O virtualization in Xen • Netfront: each guest domain implements a driver for its virtual NIC, called its netfront driver. • Netback: Domain0 implements a netback driver, an intermediary between netfront drivers and the device driver for the physical NIC.
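The split-driver model above can be pictured with a small C sketch. Everything here is simplified and hypothetical: struct and function names such as io_ring and netfront_send are illustrative only, not the real Xen netif ABI, which uses grant tables, ring macros, and event channels.

```c
/* Illustrative sketch of Xen's split network driver model.
 * These structures are simplified stand-ins, NOT the real Xen netif ABI. */
#include <stdint.h>

struct pkt_slot {            /* one entry in a shared I/O ring */
    uint64_t buf_ref;        /* reference to a page shared between domains */
    uint32_t len;            /* packet length in bytes */
};

struct io_ring {             /* shared between a guest's netfront and Dom0's netback */
    struct pkt_slot slots[256];
    uint32_t prod;           /* producer index, advanced by the sender side */
    uint32_t cons;           /* consumer index, advanced by the receiver side */
};

/* Guest side (netfront): enqueue a packet and notify Dom0 via an event channel. */
static void netfront_send(struct io_ring *tx_ring, uint64_t buf_ref, uint32_t len,
                          void (*notify_dom0)(void))
{
    struct pkt_slot *s = &tx_ring->slots[tx_ring->prod % 256];
    s->buf_ref = buf_ref;
    s->len = len;
    tx_ring->prod++;
    notify_dom0();           /* Dom0's netback drains the ring when it is next scheduled */
}
```

The key consequence for this paper: nothing actually moves until Dom0 (and then the recipient domain) gets CPU time, which is exactly where scheduling-induced delays arise.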
CPU Scheduling • Borrowed Virtual Time (BVT) [14]: • Has a set of parameters that can be configured to provide low-latency dispatch for I/O-intensive domains. • Supports only work-conserving mode. • Simple Earliest-Deadline-First (SEDF): • Supports both work-conserving and non-work-conserving modes. • Provides two methods: • Weighted fair sharing. • Reservation (slice, period): a domain asks for slice units of the CPU every period time units. (A domain is not admitted if its reservation cannot be satisfied.) • Credit-based [12]: • Supports multiple physical processors. • Supports both work-conserving and non-work-conserving (by setting a cap) modes. [12] Credit Based Scheduler. http://wiki.xensource.com/xenwiki/CreditScheduler. [14] K. J. Duda and D. R. Cheriton. Borrowed-virtual-time (BVT) Scheduling: Supporting Latency-sensitive Threads in a General-purpose Scheduler. In Proceedings of the Seventeenth ACM Symposium on Operating Systems Principles, pages 261–276, New York, NY, USA, 1999. ACM Press.
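To make the SEDF reservation model concrete, here is a minimal sketch of a (slice, period) admission test, assuming a single CPU and ignoring SEDF's weight and extra-time parameters; the struct and function names are mine, not Xen's.

```c
/* Hedged sketch: admission test for SEDF-style (slice, period) reservations
 * on one CPU. A domain asks for `slice` CPU units every `period` units; it is
 * admitted only if the total requested utilisation stays at or below 1. */
#include <stdbool.h>
#include <stddef.h>

struct reservation {
    double slice;   /* CPU time required per period, e.g. in ms */
    double period;  /* length of the period, same unit as slice */
};

static bool can_admit(const struct reservation *existing, size_t n,
                      struct reservation candidate)
{
    double util = candidate.slice / candidate.period;
    for (size_t i = 0; i < n; i++)
        util += existing[i].slice / existing[i].period;
    return util <= 1.0;   /* reject if the CPU would be over-committed */
}
```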
Classifying scheduling-induced delays The goal of our CPU scheduler is to reduce the aggregate scheduling-induced delay for the hosted domains while still providing guarantees on CPU allocations.
Delay associated with the scheduling of Domain0 • Two types: • The duration between a packet reception at the physical NIC and when Domain0 is next scheduled to set up an event-channel notification for the recipient guest domain. • The duration between when a transmitting domain copies a packet into the transmission I/O ring of Domain0 and when Domain0 is next scheduled to actually send it over the physical NIC.
Delay at the recipient • This is the duration between when Domain0 sets up an event-channel notification for the recipient domain (on packet arrival) and when the recipient domain is next scheduled to receive the packet.
Delay at the sender • This is the extra delay, before a domain sends a network packet (on its virtual NIC), induced by the hypervisor scheduling other domains in between.
Another illustration • [Figure: a packet sent from the Java tier to the DB tier, annotated with delays d1–d6 along its path: the Java tier wants to send a packet to the DB, Dom-0 finally sends the packet, the physical NIC receives it, Dom-0 sets up the notification for the recipient Dom-U, and the recipient Dom-U receives the packet; two of the later delays are of the same kind as d2 and d3, respectively.] • Here, delays d1, d2, d4, and d5 are of type 1, while delays d3 and d6 are of type 2. • There are no type 3 delays in this example.
Solutions • Delay associated with the scheduling of Domain0: • Schedule Domain0 soon after a packet is received by the physical NIC. • Schedule Domain0 soon after a domain does a send operation over its virtual network interface. • Delay at the recipient: • Schedule the recipient domain soon after the reception of a packet for it in Domain0. • Delay at the sender: • Anticipate when a domain will be ready to send a packet and schedule it close to that time.
Implementation • Environment: • We use the Xen VMM, enhanced with our algorithm, to build a prototype Virtual Host Platform from a collection of physical servers (Xen version 3.0.2, released in 2006). • Note: the Credit scheduler was added in Xen 3.1, released in May 2007. • Scheduler: • We build our algorithm on top of SEDF, retaining SEDF's basic feature of guaranteeing the specified slice to a domain over every period.
Issue-1 and design • Which domain should be chosen out of multiple recipients? • Heuristic: • Our scheduler picks the domain that is likely to experience the most overall reduction in scheduling-induced delay, i.e., the domain that has received the most packets. • Implementation detail: • In Xen, each domain, including Dom-0, is given a page that it shares with the hypervisor. • We use these pages to maintain various I/O-related statistics and call them book-keeping pages. • Network-reception-intensity (stored in the book-keeping page of Dom-0): tracks the number of packets received and waiting within Domain0, one counter per guest domain.
Issue-1 and design (cont'd) • Network-reception-intensity is updated as follows (see the sketch below): 1. Dom-0 receives a packet for Dom-i and increments the corresponding counter. 2. When Dom-i is scheduled, it maintains a count of the packets it processes until it is de-scheduled. 3. The hypervisor reads this count and decreases network_reception_intensity for Dom-i by that amount.
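A minimal sketch of how this book-keeping might look is given below. The paper describes the mechanism only at the level above, so the structs, field names (e.g. packets_processed), and functions here are hypothetical.

```c
/* Sketch of the shared book-keeping pages used for network-reception-intensity.
 * Field and function names are illustrative, not taken from the actual patch. */
#include <stdint.h>

#define MAX_DOMS 32

/* Page shared between Dom0 and the hypervisor: per-guest counts of packets
 * received by Dom0 and still waiting to be delivered. */
struct dom0_bookkeeping {
    uint32_t network_reception_intensity[MAX_DOMS];
};

/* Page shared between guest Dom-i and the hypervisor: packets the guest
 * actually processed while it was last scheduled. */
struct guest_bookkeeping {
    uint32_t packets_processed;
};

/* Step 1: Dom0 receives a packet destined for Dom-i. */
static void on_dom0_receive(struct dom0_bookkeeping *d0, int dom_i)
{
    d0->network_reception_intensity[dom_i]++;
}

/* Steps 2-3: when Dom-i is de-scheduled, the hypervisor reads the guest's
 * processed-packet count and lowers the pending intensity accordingly. */
static void on_guest_descheduled(struct dom0_bookkeeping *d0,
                                 struct guest_bookkeeping *g, int dom_i)
{
    uint32_t done = g->packets_processed;
    if (done > d0->network_reception_intensity[dom_i])
        done = d0->network_reception_intensity[dom_i];
    d0->network_reception_intensity[dom_i] -= done;
    g->packets_processed = 0;
}
```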
Issue-2 and design • Reducing the delay at a sender domain requires us to anticipate when that domain will have data to send next. • Intuition: • Choose to schedule the domain that is expected to transmit the most packets. • Implementation detail: • Anticipated-network-transmit-intensity (stored in the book-keeping page of Dom-0): represents a domain's expected transmit pressure in the near future, one value per guest domain.
Issue-2 and design (cont'd) • Anticipated-network-transmit-intensity is updated as follows (sketched below): 1. Dom-i maintains a counter in its book-keeping page recording the packets it transmitted during its last scheduled period. 2. The hypervisor reads this value and replaces anticipated_network_transmit_intensity for Dom-i with it.
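Continuing the same hypothetical sketch, the transmit-side update could look as follows (names again illustrative); the point worth noting is that the prediction is replaced with the most recent observation, not accumulated.

```c
/* Sketch: anticipated-network-transmit-intensity update.
 * The guest records how many packets it transmitted while scheduled; the
 * hypervisor uses that as a prediction of its near-future transmit pressure. */
#include <stdint.h>

struct guest_tx_bookkeeping {
    uint32_t packets_transmitted_last_run;   /* written by the guest */
};

struct dom0_tx_bookkeeping {
    uint32_t anticipated_network_transmit_intensity[32];
};

static void on_guest_descheduled_tx(struct dom0_tx_bookkeeping *d0,
                                    struct guest_tx_bookkeeping *g, int dom_i)
{
    /* Replace (not accumulate) the prediction with the latest observation. */
    d0->anticipated_network_transmit_intensity[dom_i] =
        g->packets_transmitted_last_run;
    g->packets_transmitted_last_run = 0;
}
```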
Issue-3 and design • Domain0 has a crucial role in ensuring timely delivery of received packets to domains and timely transmission of the packets sent by them. • We would like to preferentially schedule Dom-0 at times when it is likely to be on the critical path. • Implementation detail: • Dom0-network-reception-intensity: keeps track of the system-wide reception pressure (packets received by the NIC but not yet delivered to their recipient domains). • Actual-network-transmission-intensity: keeps track of the system-wide transmission pressure (packets handed to Dom0 but not yet sent over the physical NIC).
Issue-3 and design (cont'd) • Dom0-network-reception-intensity and actual-network-transmission-intensity are updated as follows (sketched below): 1. The NIC receives a packet and sends an interrupt to the hypervisor, which increments dom0_network_reception_intensity. 2. This value decreases when Dom0 sets up the notification for the recipient domain. 3. Dom-i maintains a counter in its book-keeping page recording the packets it transmitted during its last scheduled period; the hypervisor reads this value and adds it to actual_network_transmit_intensity. 4. Dom-0 transmits the packets over the physical NIC and decreases actual_network_transmit_intensity.
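For completeness, a hedged sketch of the Dom0-side counters and their update points, following the four steps above; all names and function boundaries are assumptions.

```c
/* Sketch: Dom0's own network intensity counters, kept by the hypervisor. */
#include <stdint.h>

struct dom0_intensity {
    uint32_t dom0_network_reception_intensity;   /* packets the NIC has received but
                                                    Dom0 has not yet set up a
                                                    notification for */
    uint32_t actual_network_transmit_intensity;  /* packets guests have handed to Dom0
                                                    but Dom0 has not yet put on the wire */
};

static void on_nic_interrupt(struct dom0_intensity *s)              /* step 1 */
{
    s->dom0_network_reception_intensity++;
}

static void on_dom0_notifies_guest(struct dom0_intensity *s)        /* step 2 */
{
    if (s->dom0_network_reception_intensity > 0)
        s->dom0_network_reception_intensity--;
}

static void on_guest_handed_packets(struct dom0_intensity *s, uint32_t n) /* step 3 */
{
    s->actual_network_transmit_intensity += n;
}

static void on_dom0_transmits(struct dom0_intensity *s, uint32_t n)  /* step 4 */
{
    if (n > s->actual_network_transmit_intensity)
        n = s->actual_network_transmit_intensity;
    s->actual_network_transmit_intensity -= n;
}
```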
Scheduler • Our scheduler picks the domain with the highest network intensity, which is the sum of: • For guest domains: network-reception-intensity and anticipated-network-transmit-intensity. • For Dom0: dom0-network-reception-intensity and actual-network-transmission-intensity. • Note: • Respects reservations: our scheduler does not violate the CPU reservations of any of the domains.
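Putting the counters together, the selection step can be sketched as below. This is a simplification under stated assumptions: real SEDF deadline book-keeping is elided, and reservation enforcement is only hinted at by the slice_left check, whereas the actual algorithm interleaves this preference with SEDF's per-period guarantees.

```c
/* Sketch: pick the runnable domain with the highest network intensity among
 * domains that still have reserved slice left in the current period, falling
 * back to the default SEDF choice otherwise. Illustrative only. */
#include <stddef.h>
#include <stdbool.h>
#include <stdint.h>

struct dom {
    int      id;
    bool     runnable;
    uint32_t reception_intensity;   /* or dom0-network-reception-intensity for Dom0 */
    uint32_t transmit_intensity;    /* anticipated (guests) or actual (Dom0) */
    double   slice_left;            /* remaining reserved CPU in this period */
};

static struct dom *pick_next(struct dom *doms, int n)
{
    struct dom *best = NULL;
    uint32_t best_intensity = 0;

    for (int i = 0; i < n; i++) {
        struct dom *d = &doms[i];
        if (!d->runnable || d->slice_left <= 0.0)
            continue;   /* crude stand-in for not violating reservations */
        uint32_t intensity = d->reception_intensity + d->transmit_intensity;
        if (best == NULL || intensity > best_intensity) {
            best = d;
            best_intensity = intensity;
        }
    }
    return best;   /* NULL => fall back to plain SEDF ordering */
}
```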
Experiment setup • Two Xen (3.0.2) hosted servers: • CPU: Xeon 3.4 GHz with 2 MB of L2 cache, 800 MHz front-side bus. • RAM: 2 GB. • NIC: Gigabit Ethernet. • Each hosts 6-8 VMs. • Guest domains: • We pinned all domains to a single CPU. • Each VM was assigned between 120 MB and 300 MB of RAM, depending on its requirements.
Application and workload • Additionally, in some of our experiments we use other domains (not listed here) running CPU-intensive applications for illustrative purposes.
Performance improvements for the web application • Objective: • Compare client response times with those on default Xen under high consolidation. • Settings: • Use five CPU-intensive applications in five domains to create high consolidation.
Performance improvements for the streaming media server • Objective: • Investigate the benefits of communication-aware scheduling for our streaming media server. • Setting: • The domain hosting the media server competes with 7 CPU-intensive domains. • Data is streamed to 45 clients at a constant rate of 3.0 Mbps each over a period of 20 minutes. • By default, each client is assumed to use a buffer of size 8 MB.
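As a rough sanity check on these numbers (my own arithmetic, not a figure from the paper), an 8 MB client buffer at 3.0 Mbps covers roughly 22 seconds of playout, which bounds how much accumulated scheduling-induced delay a client can absorb before its first discontinuity.

```c
/* Back-of-the-envelope: how long an 8 MB buffer lasts at 3.0 Mbps.
 * Interpretation of the experimental setup, not a result from the paper. */
#include <stdio.h>

int main(void)
{
    double buffer_bits = 8.0 * 1024 * 1024 * 8;   /* 8 MB expressed in bits */
    double rate_bps    = 3.0e6;                   /* 3.0 Mbps stream rate */
    printf("buffer covers about %.1f seconds of playout\n",
           buffer_bits / rate_bps);               /* prints roughly 22.4 */
    return 0;
}
```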
Contribution experiment • Objective: • Observe the relative contributions of the various components of our overall scheduling algorithm. • Settings: • Observe the streaming media server serving 45 clients. • Measure the time until clients experience their first discontinuity. • Result:
Fairness guarantees • Answering a question: • Do the performance improvements for the streaming media server under our scheduler come at the cost of reduced CPU allocations for the competing CPU-intensive domains? • Note: our algorithm still ensures that they continue to receive more CPU than their reservations.
Conclusions • We identified one major shortcoming in the Xen VMM: • The VM scheduler in Xen is agnostic of the communication behavior of modern, multi-tier applications. • The scheduling of the privileged domain is in the critical path of every network operation. • We developed a new communication-aware CPU scheduling algorithm for the Xen VMM. • We demonstrated the performance/cost benefits and the wide applicability of our algorithms: • The TPC-W benchmark exhibited improvements in average response times of up to 35%. • A streaming media server hosted on our prototype was able to satisfactorily service up to 3.5 times as many clients as on default Xen.
Comments • In our system, we currently use average response time as our key performance indicator. • According to this paper, response times are more likely to increase, even at the same request rate, under higher server consolidation and CPU utilization. • Note: this was tested on Xen 3.0.2 (SEDF scheduler). • Should we reproduce the same experiments to check whether this problem still exists in Xen 4.0 (Credit scheduler)? • Moreover, this paper uses CPU-intensive applications to drive the whole system to high CPU utilization. • What happens if we consolidate our servers with only web applications? • In our system, we suggest that users separate their applications based on their different characteristics: • If the answer is that average response time does not increase, then we could consolidate servers with applications of the same type. • Otherwise, we can only consolidate our servers more conservatively.