vTurbo: Accelerating Virtual Machine I/O Processing Using Designated Turbo-Sliced Core
Cong Xu, Sahan Gamage, Hui Lu, Ramana Kompella, Dongyan Xu
2013 USENIX Annual Technical Conference
Embedded Lab. Kim Sewoog
Motivation
• Pay-as-you-go clouds encourage server consolidation
  • Saves the cost of running applications and reduces operational expenditure
• Multiple VMs share the same core, which increases each VM's CPU access latency
< Diagram: VM1-VM4 sharing a core on top of the hypervisor (VMM), resulting in low I/O throughput >
I/O Processing
• Two basic stages:
  • Device interrupts are processed synchronously in the kernel
  • The application asynchronously copies the data from the kernel buffer
< Diagrams: the I/O processing workflow (IRQ processing fills the kernel buffer; the application drains it) and the effect of CPU sharing on I/O processing (VM1-VM3 time-share the CPU, adding IRQ processing delay) >
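To make the two stages concrete, here is a minimal C sketch of the workflow above (our illustration, not the paper's code): irq_handler models stage 1, the synchronous enqueue into a kernel ring buffer that must run promptly, and app_read models stage 2, the application's later asynchronous copy. All names, the ring-buffer layout, and the sizes are assumptions.

    /* Minimal sketch (not the paper's code) of the two-stage I/O path:
     * stage 1 runs in interrupt context and only enqueues into a kernel
     * buffer; stage 2 runs later, when the application's vCPU is
     * scheduled, and copies the data out. Names are illustrative. */
    #include <stdio.h>
    #include <string.h>

    #define BUF_SLOTS 8
    #define SLOT_SIZE 256

    static char kernel_buf[BUF_SLOTS][SLOT_SIZE];
    static int head, tail;                 /* simple ring-buffer indices */

    /* Stage 1: synchronous IRQ processing -- must run promptly, or the
     * device/sender stalls (this is what CPU sharing delays). */
    void irq_handler(const char *pkt, size_t len)
    {
        if ((head + 1) % BUF_SLOTS == tail) return;   /* buffer full: drop */
        memcpy(kernel_buf[head], pkt, len < SLOT_SIZE ? len : SLOT_SIZE);
        head = (head + 1) % BUF_SLOTS;
    }

    /* Stage 2: asynchronous copy into the application's buffer; it can
     * tolerate delay as long as stage 1 keeps the kernel buffer draining. */
    int app_read(char *dst, size_t len)
    {
        if (tail == head) return 0;                   /* nothing buffered */
        memcpy(dst, kernel_buf[tail], len < SLOT_SIZE ? len : SLOT_SIZE);
        tail = (tail + 1) % BUF_SLOTS;
        return 1;
    }

    int main(void)
    {
        char out[SLOT_SIZE];
        irq_handler("hello", 6);
        if (app_read(out, sizeof out)) printf("app got: %s\n", out);
        return 0;
    }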
Effect of CPU Sharing on TCP Receive
< Diagram: a TCP client sends DATA into the hypervisor's shared buffer; because VM1-VM3 time-share the CPU, the receiving VM's IRQ processing is delayed until its turn, so ACKs go out late and the sender is throttled >
Effect of CPU Sharing on UDP Receive
< Diagram: a UDP client sends DATA into the hypervisor's shared buffer; while the receiving VM waits for its CPU turn, the shared buffer fills up and subsequent packets are dropped before reaching the application buffer >
Effect of CPU Sharing on Disk Write
< Diagram: the application writes DATA into kernel memory, but flushing it to the disk drive requires IRQ processing, which is delayed while VM1-VM3 time-share the CPU >
Intuitive Solution
• Reduce the time slice of each VM
• But this causes significant context-switch overhead (see the arithmetic below)
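As a back-of-envelope illustration of why (assuming Xen's credit scheduler and its default 30 ms time slice; the exact numbers are our assumptions, not stated on this slide), the worst-case IRQ processing delay for a VM sharing a core with N-1 others is roughly

    d_irq_max ≈ (N - 1) × T_slice

With N = 3 and T_slice = 30 ms, that is about 60 ms of delay. Shrinking T_slice to 0.1 ms would cut the delay to about 0.2 ms, but would raise the context-switch rate on that core from roughly 33 to 10,000 switches per second, which is the overhead this slide refers to.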
Our Solution: vTurbo
• IRQ processing is offloaded to a dedicated turbo core
  • Turbo core: any physical core scheduled with micro-slicing (e.g., 0.1 ms)
• Expose the turbo core to the VM as a special vCPU
  • The turbo vCPU runs on a turbo core
  • Regular vCPUs run on regular cores
• Pin the IRQ context of the guest OS to the turbo vCPU (see the affinity sketch below)
• Benefits
  • Improved I/O throughput (TCP/UDP, disk)
  • Self-adaptive system
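As one concrete (and much simplified) way to picture the pinning step in a Linux guest, the sketch below uses the standard /proc/irq/<n>/smp_affinity interface to steer a device IRQ to a single vCPU. The IRQ number and the turbo vCPU id are hypothetical, and the paper's actual guest-kernel changes are more involved than this.

    /* Hedged sketch: steer a device IRQ to one vCPU using Linux's
     * standard /proc/irq/<n>/smp_affinity interface. vTurbo pins the
     * guest kernel's IRQ context to the turbo vCPU; this only
     * illustrates the affinity mechanism. The IRQ number and the
     * turbo vCPU id below are assumptions. Run as root in the guest. */
    #include <stdio.h>

    int pin_irq_to_cpu(int irq, int cpu)
    {
        char path[64];
        FILE *f;

        snprintf(path, sizeof path, "/proc/irq/%d/smp_affinity", irq);
        f = fopen(path, "w");
        if (!f) { perror(path); return -1; }
        fprintf(f, "%x\n", 1u << cpu);   /* CPU bitmask: one bit per vCPU */
        fclose(f);
        return 0;
    }

    int main(void)
    {
        /* e.g. NIC IRQ 24 -> vCPU 3, where vCPU 3 is the turbo vCPU */
        return pin_irq_to_cpu(24, 3) ? 1 : 0;
    }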
vTurbo Design
< Diagram: regular cores time-share VM1-VM3 for application execution, while the turbo core micro-slices across the same VMs to run their IRQ processing, so incoming data reaches each VM's kernel buffer without waiting for that VM's regular-core turn >
vTurbo's Impact on Disk Write
< Diagram: the application, running on a regular core, writes DATA into kernel memory; the frequently scheduled turbo core promptly flushes it to the disk drive instead of waiting for the VM's regular-core slice >
vTurbo's Impact on UDP Receive
< Diagram: incoming DATA lands in the hypervisor's shared buffer and is promptly moved into the VM's kernel buffer by the turbo core, so the shared buffer no longer overflows and the data reaches the application buffer >
vTurbo's Impact on TCP Receive
< Diagram: the turbo core promptly drains the hypervisor's shared buffer into the VM's kernel buffer and sends ACKs without waiting for a regular-core slice; while the receive queue is locked by the application, incoming data is staged in the backlog queue >
VM Scheduling Policy for Fairness
• Turbo cores are not free: maintain CPU fair share among VMs across both regular and turbo cores
• Calculate credits over both the regular and turbo cores
• Guarantee the CPU allocation on turbo cores
• Deduct I/O-intensive VMs' turbo-core usage from their regular-core credits
• Allocate the deducted credits to non-I/O-intensive VMs (see the sketch below)
< Equations on slide: total capacity across the regular and turbo cores; each VM's turbo-core fair share; total capacity; actual usage of the turbo core; each VM's fair share of CPU >
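The sketch below (in C, with all values and the threshold for "non-I/O-intensive" chosen only for illustration, not taken from the paper) shows the accounting the bullets describe: each VM's turbo-core usage is charged against its regular-core credits, and the reclaimed credits go to the VMs that barely use the turbo core.

    /* Hedged sketch of the fairness accounting described on this slide.
     * Capacities and usage are in units of cores; all numbers and the
     * exact formulas are illustrative, not the paper's. */
    #include <stdio.h>

    #define NVMS 3

    int main(void)
    {
        double total_capacity = 4.0;               /* e.g. 3 regular + 1 turbo core */
        double turbo_used[NVMS] = {0.6, 0.1, 0.0}; /* measured turbo-core usage */
        double credits[NVMS];
        double fair = total_capacity / NVMS;       /* each VM's overall fair share */
        double reclaimed = 0.0;
        int n_cpu_bound = 0;

        /* Deduct each VM's actual turbo-core usage from its regular-core
         * credits, so turbo time is not a free bonus on top of the share. */
        for (int i = 0; i < NVMS; i++) {
            credits[i] = fair - turbo_used[i];
            reclaimed += turbo_used[i];
        }

        /* Hand the deducted amount to the non-I/O-intensive VMs (here:
         * those with negligible turbo usage), as the slide describes. */
        for (int i = 0; i < NVMS; i++)
            if (turbo_used[i] < 0.05) n_cpu_bound++;
        for (int i = 0; i < NVMS; i++)
            if (turbo_used[i] < 0.05 && n_cpu_bound > 0)
                credits[i] += reclaimed / n_cpu_bound;

        for (int i = 0; i < NVMS; i++)
            printf("VM%d regular-core credits: %.2f cores\n", i + 1, credits[i]);
        return 0;
    }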
Evaluation
• VM hosts
  • 3.2 GHz quad-core Intel Xeon CPU, 16 GB RAM
  • One core assigned exclusively to the driver domain (dom0)
  • One core chosen as the turbo core
• Xen 4.1.2, Linux 3.2
• Gigabit Ethernet switch (10 Gbps for two experiments)
File Read/Write Throughput: Micro-Benchmark
< Graph: file read/write throughput with I/O processing on a regular core vs. on the turbo core >
Apache Olio: Application Benchmark
• Three components:
  • A web server to process user requests
  • A MySQL database server to store user profiles and event information
  • An NFS server to store images and documents specific to events
Conclusions
• Problem: CPU sharing among VMs degrades I/O throughput
• Solution: vTurbo offloads IRQ processing to a dedicated turbo-sliced core
• Results:
  • UDP throughput improved by up to 4x
  • TCP throughput improved by up to 3x
  • Disk write throughput improved by up to 2x
  • NFS throughput improved by up to 3x
  • Olio throughput improved by up to 38.7%