vTurbo: Accelerating Virtual Machine I/O Processing Using Designated Turbo-Sliced Core
Cong Xu, Sahan Gamage, Hui Lu, Ramana Kompella, Dongyan Xu
2013 USENIX Annual Technical Conference
Embedded Lab. Kim Sewoog
Motivation
• Pay-as-you-go clouds encourage server consolidation
  • Saves the cost of running applications and reduces operational expenditure
• Multiple VMs share the same core, which increases each VM's CPU access latency
< Diagram: VM1-VM4 sharing a core on top of the hypervisor (VMM), resulting in low I/O throughput >
I/O Processing
• Two basic stages:
  • Device interrupts are processed synchronously in the kernel
  • The application asynchronously copies the data from the kernel buffer
< Diagrams: the I/O processing workflow (IRQ processing fills the kernel buffer; the application drains it) and the effect of CPU sharing on I/O processing (VM1-VM3 time-share the CPU, adding IRQ processing delay) >
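To make the two stages concrete, here is a minimal C sketch of the workflow above (our illustration, not the paper's code): irq_handler models stage 1, the synchronous enqueue into a kernel ring buffer that must run promptly, and app_read models stage 2, the application's later asynchronous copy. All names, the ring-buffer layout, and the sizes are assumptions.

    /* Minimal sketch (not the paper's code) of the two-stage I/O path:
     * stage 1 runs in interrupt context and only enqueues into a kernel
     * buffer; stage 2 runs later, when the application's vCPU is
     * scheduled, and copies the data out. Names are illustrative. */
    #include <stdio.h>
    #include <string.h>

    #define BUF_SLOTS 8
    #define SLOT_SIZE 256

    static char kernel_buf[BUF_SLOTS][SLOT_SIZE];
    static int head, tail;                 /* simple ring-buffer indices */

    /* Stage 1: synchronous IRQ processing -- must run promptly, or the
     * device/sender stalls (this is what CPU sharing delays). */
    void irq_handler(const char *pkt, size_t len)
    {
        if ((head + 1) % BUF_SLOTS == tail) return;   /* buffer full: drop */
        memcpy(kernel_buf[head], pkt, len < SLOT_SIZE ? len : SLOT_SIZE);
        head = (head + 1) % BUF_SLOTS;
    }

    /* Stage 2: asynchronous copy into the application's buffer; it can
     * tolerate delay as long as stage 1 keeps the kernel buffer draining. */
    int app_read(char *dst, size_t len)
    {
        if (tail == head) return 0;                   /* nothing buffered */
        memcpy(dst, kernel_buf[tail], len < SLOT_SIZE ? len : SLOT_SIZE);
        tail = (tail + 1) % BUF_SLOTS;
        return 1;
    }

    int main(void)
    {
        char out[SLOT_SIZE];
        irq_handler("hello", 6);
        if (app_read(out, sizeof out)) printf("app got: %s\n", out);
        return 0;
    }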
Effect of CPU Sharing on TCP Receive
< Diagram: a TCP client sends DATA into the hypervisor's shared buffer; because VM1-VM3 time-share the CPU, the receiving VM's IRQ processing is delayed until its turn, so ACKs go out late and the sender is throttled >
Effect of CPU Sharing on UDP Receive
< Diagram: a UDP client sends DATA into the hypervisor's shared buffer; while the receiving VM waits for its CPU turn, the shared buffer fills up and subsequent packets are dropped before reaching the application buffer >
Effect of CPU Sharing on Disk Write
< Diagram: the application writes DATA into kernel memory, but flushing it to the disk drive requires IRQ processing, which is delayed while VM1-VM3 time-share the CPU >
Intuitive Solution
• Reduce the time slice of each VM
• But this causes significant context-switch overhead (see the arithmetic below)
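As a back-of-envelope illustration of why (assuming Xen's credit scheduler and its default 30 ms time slice; the exact numbers are our assumptions, not stated on this slide), the worst-case IRQ processing delay for a VM sharing a core with N-1 others is roughly

    d_irq_max ≈ (N - 1) × T_slice

With N = 3 and T_slice = 30 ms, that is about 60 ms of delay. Shrinking T_slice to 0.1 ms would cut the delay to about 0.2 ms, but would raise the context-switch rate on that core from roughly 33 to 10,000 switches per second, which is the overhead this slide refers to.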
Our Solution: vTurbo
• IRQ processing is offloaded to a dedicated turbo core
  • Turbo core: any physical core scheduled with micro-slicing (e.g., 0.1 ms)
• Expose the turbo core to the VM as a special vCPU
  • The turbo vCPU runs on a turbo core
  • Regular vCPUs run on regular cores
• Pin the IRQ context of the guest OS to the turbo vCPU (see the affinity sketch below)
• Benefits
  • Improved I/O throughput (TCP/UDP, disk)
  • Self-adaptive system
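As one concrete (and much simplified) way to picture the pinning step in a Linux guest, the sketch below uses the standard /proc/irq/<n>/smp_affinity interface to steer a device IRQ to a single vCPU. The IRQ number and the turbo vCPU id are hypothetical, and the paper's actual guest-kernel changes are more involved than this.

    /* Hedged sketch: steer a device IRQ to one vCPU using Linux's
     * standard /proc/irq/<n>/smp_affinity interface. vTurbo pins the
     * guest kernel's IRQ context to the turbo vCPU; this only
     * illustrates the affinity mechanism. The IRQ number and the
     * turbo vCPU id below are assumptions. Run as root in the guest. */
    #include <stdio.h>

    int pin_irq_to_cpu(int irq, int cpu)
    {
        char path[64];
        FILE *f;

        snprintf(path, sizeof path, "/proc/irq/%d/smp_affinity", irq);
        f = fopen(path, "w");
        if (!f) { perror(path); return -1; }
        fprintf(f, "%x\n", 1u << cpu);   /* CPU bitmask: one bit per vCPU */
        fclose(f);
        return 0;
    }

    int main(void)
    {
        /* e.g. NIC IRQ 24 -> vCPU 3, where vCPU 3 is the turbo vCPU */
        return pin_irq_to_cpu(24, 3) ? 1 : 0;
    }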
vTurbo Design
< Diagram: regular cores time-share VM1-VM3 for application execution, while the turbo core micro-slices across the same VMs to run their IRQ processing, so incoming data reaches each VM's kernel buffer without waiting for that VM's regular-core turn >
vTurbo's Impact on Disk Write
< Diagram: the application, running on a regular core, writes DATA into kernel memory; the frequently scheduled turbo core promptly flushes it to the disk drive instead of waiting for the VM's regular-core slice >
vTurbo's Impact on UDP Receive
< Diagram: incoming DATA lands in the hypervisor's shared buffer and is promptly moved into the VM's kernel buffer by the turbo core, so the shared buffer no longer overflows and the data reaches the application buffer >
vTurbo's Impact on TCP Receive
< Diagram: the turbo core promptly drains the hypervisor's shared buffer into the VM's kernel buffer and sends ACKs without waiting for a regular-core slice; while the receive queue is locked by the application, incoming data is staged in the backlog queue >
VM Scheduling Policy for Fairness
• Turbo cores are not free: maintain CPU fair share among VMs across both regular and turbo cores
• Calculate credits over both the regular and turbo cores
• Guarantee the CPU allocation on turbo cores
• Deduct I/O-intensive VMs' turbo-core usage from their regular-core credits
• Allocate the deducted credits to non-I/O-intensive VMs (see the sketch below)
< Equations on slide: total capacity across the regular and turbo cores; each VM's turbo-core fair share; total capacity; actual usage of the turbo core; each VM's fair share of CPU >
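The sketch below (in C, with all values and the threshold for "non-I/O-intensive" chosen only for illustration, not taken from the paper) shows the accounting the bullets describe: each VM's turbo-core usage is charged against its regular-core credits, and the reclaimed credits go to the VMs that barely use the turbo core.

    /* Hedged sketch of the fairness accounting described on this slide.
     * Capacities and usage are in units of cores; all numbers and the
     * exact formulas are illustrative, not the paper's. */
    #include <stdio.h>

    #define NVMS 3

    int main(void)
    {
        double total_capacity = 4.0;               /* e.g. 3 regular + 1 turbo core */
        double turbo_used[NVMS] = {0.6, 0.1, 0.0}; /* measured turbo-core usage */
        double credits[NVMS];
        double fair = total_capacity / NVMS;       /* each VM's overall fair share */
        double reclaimed = 0.0;
        int n_cpu_bound = 0;

        /* Deduct each VM's actual turbo-core usage from its regular-core
         * credits, so turbo time is not a free bonus on top of the share. */
        for (int i = 0; i < NVMS; i++) {
            credits[i] = fair - turbo_used[i];
            reclaimed += turbo_used[i];
        }

        /* Hand the deducted amount to the non-I/O-intensive VMs (here:
         * those with negligible turbo usage), as the slide describes. */
        for (int i = 0; i < NVMS; i++)
            if (turbo_used[i] < 0.05) n_cpu_bound++;
        for (int i = 0; i < NVMS; i++)
            if (turbo_used[i] < 0.05 && n_cpu_bound > 0)
                credits[i] += reclaimed / n_cpu_bound;

        for (int i = 0; i < NVMS; i++)
            printf("VM%d regular-core credits: %.2f cores\n", i + 1, credits[i]);
        return 0;
    }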
Evaluation
• VM hosts
  • 3.2 GHz quad-core Intel Xeon CPU, 16 GB RAM
  • One core assigned exclusively to the driver domain (dom0)
  • One core chosen as the turbo core
• Xen 4.1.2, Linux 3.2
• Gigabit Ethernet switch (10 Gbps for two experiments)
File Read/Write Throughput: Micro-Benchmark
< Graph: file read/write throughput with I/O processing on a regular core vs. on the turbo core >
Apache Olio: Application Benchmark
• Three components:
  • A web server to process user requests
  • A MySQL database server to store user profiles and event information
  • An NFS server to store images and documents specific to events
Conclusions
• Problem: CPU sharing among VMs degrades I/O throughput
• Solution: vTurbo offloads IRQ processing to a dedicated turbo-sliced core
• Results:
  • UDP throughput improved by up to 4x
  • TCP throughput improved by up to 3x
  • Disk write throughput improved by up to 2x
  • NFS throughput improved by up to 3x
  • Olio throughput improved by up to 38.7%