Xen and Co.: Communication-aware CPU Scheduling for Consolidated Xen-based Hosting Platforms Sriram Govindan, Arjun R Nath, Amitayu Das, Bhuvan Urgaonkar, Anand Sivasubramaniam, Computer Systems Laboratory, The Pennsylvania State University.
Data centers • Rent server resources • Provide resource and performance guarantees • Problem: server sprawl • Solution: consolidation • Reduced resource wastage • Reduced floor space • Better power management • How?
Server virtualization • Ability to create multiple virtual servers from a single physical server • Allows consolidation by hosting heterogeneous OS instances over the same hardware • Why now? • Emergence of highly efficient virtual machine monitors: Xen, VMware, etc. • Hardware support: Intel, AMD, IBM, etc. • Real-world example: Amazon EC2 (Diagram: a single-tier streaming server and a 2-tiered e-commerce application, each running its applications on its own guest OS, Linux or Windows, over a VMM and shared hardware)
Consolidation: Example • Consider a representative e-commerce benchmark: TPC-W, an online bookstore application • Run the TPC-W tiers on dedicated servers; measure each tier's resource needs and record performance (Diagram: clients send requests to a Jboss tier, which issues queries to a Mysql tier; each tier runs over its own VMM and hardware while resource usage and response times are recorded)
Consolidation: Example (Graph: CDF of response time in seconds for the TPC-W tiers running on dedicated servers)
Consolidation: Example • Consolidate the TPC-W tiers onto a single server • Use the hypervisor to ensure resource guarantees; reserve for the peak requirement • Pack more applications to utilize the remaining server capacity (Diagram: the Jboss and Mysql VMs, reserved at 10% and 20%, leave the server underutilized; filling the remainder with CPU-intensive VMs brings it to almost 100% utilization while other resource requirements are also met)
Consolidation: Example (Graph: CDF of response time in seconds, with and without consolidation; the distribution shifts toward higher response times with consolidation) • Why did this happen?
Scheduler induced delays (Timelines: with the TPC-W tiers on dedicated servers, Jboss's query1/query2 and the DB's reply1/reply2 are separated only by network latency; with the tiers consolidated, the same exchanges additionally incur scheduler-induced delays)
Does this look familiar? • Parallel systems: gang scheduling / co-scheduling • Feitelson et al., Ousterhout et al., Andrea et al. • Schedulers with low-latency dispatch • e.g., BVT, Duda et al. • Our contribution: • Fairness guarantees - applications pay for resources • Self-tuning - reduced administrator intervention • Adapts to an application's varying I/O behavior • Handles virtualized network I/O, which further increases the delays
Xen Virtual Machine Monitor: I/O virtualization (Diagram: virtual machines - Domain0, the driver domain, plus modified guest OSes running applications - sit on virtual hardware (vCPU, vDisk, vNIC, vMemory, etc.) exported by the Xen hypervisor, whose VM scheduler multiplexes the physical hardware (CPU, disk, NIC, memory, etc.))
Network Virtualization in Xen - Reception (Diagram: a packet arrives at the NIC, which interrupts the hypervisor; the hypervisor notifies Domain0, whose hardware drivers and netback driver deliver the packet over a virtual interrupt to the netfront driver in the guest VM, which hands it to the application)
Network Virtualization in Xen - Transmission (Diagram: the application in the guest VM issues a packet send; the netfront driver sends it over the virtual NIC to the netback driver in Domain0, whose hardware drivers send it out over the physical NIC)
Scheduler induced Delays • Delay associated with scheduling of Domain0 • When a guest domain transmits a packet • When a packet is received at the physical NIC (Timeline: Jboss issues a query to the DB; dom0 must be scheduled on both the transmit and receive paths)
Scheduler induced Delays • Delay associated with scheduling of Domain0 • Delay at the recipient • When Domain0 sends a packet to a guest domain (Timeline: Jboss issues a query to the DB; the recipient domain must also be scheduled before it can process the packet)
Scheduler induced Delays • Delay associated with scheduling of Domain0 • Delay at the recipient • Delay at the sender • Before a domain sends a network packet (on its virtual NIC) • Unlike reception, a send can only be anticipated, not observed in advance
Scheduler induced Delays • Delay associated with scheduling of Domain0 • Delay at the recipient • Delay at the sender (Timeline: with the TPC-W tiers consolidated in a virtualized environment, every query and reply between Jboss and the DB passes through dom0, adding scheduler-induced delays and virtualization overhead on top of network latency)
Scheduler design • Recall: reservations must be provided • Build on top of a reservation-based scheduler - SEDF • (slice, period) pair - the domain needs 'slice' ms of CPU every 'period' ms • Communication-aware SEDF scheduler: • Enhance the CPU scheduler to reduce scheduler-induced delays • Change the scheduling order to preferentially schedule communicating domains • Introduce short-term unfairness • Still preserve reservation guarantees over a coarser time scale - the PERIOD
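The (slice, period) reservation above can be sketched in a few lines of C. This is a minimal illustration of earliest-deadline-first picking over per-domain reservations, not Xen's actual SEDF code; all struct and function names are illustrative assumptions.

```c
#include <stddef.h>

/* Each domain reserves `slice` ms of CPU out of every `period` ms. */
struct vcpu {
    int slice;      /* ms of CPU guaranteed per period */
    int period;     /* length of the reservation period, ms */
    int cputime;    /* ms consumed in the current period */
    int deadline;   /* absolute time (ms) when the current period ends */
    int runnable;
};

/* Earliest-deadline-first pick among domains with slice remaining. */
struct vcpu *sedf_pick(struct vcpu *doms, size_t n, int now)
{
    struct vcpu *best = NULL;
    for (size_t i = 0; i < n; i++) {
        struct vcpu *d = &doms[i];
        if (d->deadline <= now) {        /* period expired: refresh slice */
            d->cputime = 0;
            d->deadline += d->period;
        }
        if (!d->runnable || d->cputime >= d->slice)
            continue;                    /* reservation exhausted this period */
        if (best == NULL || d->deadline < best->deadline)
            best = d;
    }
    return best;
}
```

The communication-aware variant keeps this reservation check intact and only reorders the eligible domains, which is how fairness is preserved over each PERIOD.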
Scheduler Implementation • Key idea: • Associate impending network activity with each virtual machine • Incorporate communication activity into the scheduling decision • Greedy heuristic: • Prefer the VM that is likely to benefit the most - the VM with the most pending packets
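The greedy heuristic above can be sketched as follows. This is a hedged illustration, not the paper's source: the field names (`pending`, `budget`) and the fallback behavior are assumptions layered on the idea that a domain's remaining reservation still bounds the short-term unfairness.

```c
#include <stddef.h>

struct dom {
    int pending;    /* impending network activity recorded by the hypervisor */
    int runnable;
    int budget;     /* ms of reserved slice left in the current period */
};

/* Pick the runnable domain with the most pending packets. Domains whose
 * reservation is exhausted are skipped, so guarantees still hold over
 * the period; ties would fall back to ordinary SEDF order (not shown). */
struct dom *comm_aware_pick(struct dom *doms, size_t n)
{
    struct dom *best = NULL;
    for (size_t i = 0; i < n; i++) {
        struct dom *d = &doms[i];
        if (!d->runnable || d->budget <= 0)
            continue;   /* never exceed the reservation */
        if (best == NULL || d->pending > best->pending)
            best = d;
    }
    return best;
}
```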
Communication aware scheduler - Reception (Diagram: a packet arrives at the NIC and the hypervisor increments Domain0.pending; Domain0 is scheduled, delivers the packet, and Domain0.pending is decremented while Domain1.pending is incremented; Domain1 is then scheduled and its pending count decremented)
Evaluation Environment • Applications: • TPC-W benchmark: jboss and mysql tiers • Multi-threaded UDP streaming server: simultaneously streams data at 3 Mbps to a specified number of clients • Every client is provided with an 8 MB buffer • Clients start consuming data only when the buffer is full • CPU-intensive workloads, used for illustrative purposes
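The client buffering behavior described above can be modeled in a few lines. This is a hypothetical sketch of the metric used later (buffer under-runs): the exact rebuffering policy after an under-run is an assumption, as are all names.

```c
struct client {
    long buffered;    /* bytes currently in the buffer */
    long capacity;    /* 8 MB in the experiments */
    int  playing;     /* playback has started */
    int  underruns;   /* SLO metric counted in the evaluation */
};

/* Advance one time step: `rx` bytes arrive, `play_rate` bytes play out. */
void client_tick(struct client *c, long rx, long play_rate)
{
    c->buffered += rx;
    if (!c->playing && c->buffered >= c->capacity)
        c->playing = 1;                /* consume only once the buffer is full */
    if (c->playing) {
        c->buffered -= play_rate;
        if (c->buffered <= 0) {        /* buffer ran dry: one under-run */
            c->buffered = 0;
            c->playing = 0;            /* assumed: rebuffer before resuming */
            c->underruns++;
        }
    }
}
```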
Streaming media experiments - performance improvement • Streaming to 45 clients at 3 Mbps for 20 minutes • The default scheduler suffered a playback discontinuity every 1.5 minutes • The communication-aware scheduler suffered a discontinuity only after the 18th minute
Streaming media experiments - improved consolidation • A single buffer under-run at the client is fixed as the Service Level Objective (SLO) • The communication-aware scheduler is able to sustain 30 more clients than the default scheduler (Graph: number of buffer under-runs at the client, lower is better, versus number of clients supported at the server, with the SLO marked at one under-run)
TPC-W performance • The TPC-W benchmark ran for 20 minutes • Around a 35 percent improvement in response time compared to the default scheduler
Scheduler Fairness Evaluation • The CPU-intensive VM lost less than 1% of CPU compared to the default scheduler but still stayed above its 10% reservation • Just changing the order of scheduling resulted in a huge response-time improvement for the streaming server (Graph: CPU utilization of the CPU-intensive VM over time in minutes, under default SEDF and modified SEDF, against the reservation line)
Conclusion • A communication-aware CPU scheduler developed for consolidated environments • Low-overhead run-time monitoring of network events by the hypervisor scheduler • Addressed the additional delays introduced by network I/O virtualization in Xen • Source code (~300 lines) and a Xen 3.0.2 patch are available via the software link at http://csl.cse.psu.edu/
Streaming media experiments - performance improvement • Streaming to 45 clients at 3 Mbps for 20 minutes • The default scheduler suffered glitches every 1.5 minutes • The communication-aware scheduler suffered a glitch only after the 18th minute • With only the domain0 optimization ON, a glitch occurred at the 15th minute
Communication aware scheduler (Diagram: the hypervisor maintains per-domain book-keeping pages - one for Domain0 and one for each guest domain - to track network activity)
Communication aware scheduler - Reception (Diagram: a packet arrives at the NIC and the hypervisor increments Domain0's network_reception_intensity in its book-keeping page; Domain0 is scheduled, delivers the packet, and is descheduled back into the hypervisor, which increments Domain1's network_reception_intensity; Domain1 is then scheduled to receive the packet and update its pending activity, and is later descheduled)
Communication aware scheduler - Transmission (Diagram: when Domain1 is descheduled back into the hypervisor with a send outstanding, the hypervisor increments Domain1's anticipated_network_transmission_intensity in its book-keeping page; actual sends increment network_transmission_intensity for Domain1 and for Domain0)
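The book-keeping counters in the reception and transmission slides can be sketched as a small struct updated on each network event. The counter names follow the slides; the struct layout, updater functions, and the idea of summing them into one score are illustrative assumptions.

```c
/* Per-domain book-keeping counters, incremented by the hypervisor on
 * each network event and read by the scheduler at decision time. */
struct bookkeeping {
    int network_reception_intensity;                /* packets delivered to this domain */
    int network_transmission_intensity;             /* packets this domain sent */
    int anticipated_network_transmission_intensity; /* sends expected but not yet issued */
};

void on_reception(struct bookkeeping *b)      { b->network_reception_intensity++; }
void on_transmission(struct bookkeeping *b)   { b->network_transmission_intensity++; }
void on_anticipated_tx(struct bookkeeping *b) { b->anticipated_network_transmission_intensity++; }

/* One way the scheduler could fold these into a single "pending
 * activity" score when choosing which domain to run next. */
int pending_activity(const struct bookkeeping *b)
{
    return b->network_reception_intensity
         + b->network_transmission_intensity
         + b->anticipated_network_transmission_intensity;
}
```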