Differentiated I/O services in virtualized environments Tyler Harter, Salini SK & Anand Krishnamurthy
Overview • Provide differentiated I/O services to applications running in guest operating systems inside virtual machines • Applications in virtual machines tag their I/O requests • The hypervisor's I/O scheduler uses these tags to provide differentiated quality of I/O service
Motivation • Diverse applications with different I/O requirements are hosted in clouds • I/O scheduling that is agnostic of the semantics of the request is not optimal
Motivation (diagram: VM 1, VM 2, and VM 3 issuing I/O through a shared hypervisor)
Motivation • We want high- and low-priority processes to correctly receive differentiated service, both within a VM and between VMs • Can my webserver's/DHT's log pusher's I/O be served differently from the webserver's/DHT's own I/O?
Existing work & Problems • VMware's ESX server offers Storage I/O Control (SIOC) • Provides I/O prioritization of virtual machines that access a shared storage pool • But it supports prioritization only at host granularity!
Existing work & Problems • Xen's credit scheduler also works at domain level • Linux's CFQ I/O scheduler supports I/O prioritization • Possible to use priorities at both the guest's and the hypervisor's I/O schedulers
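For reference, this is how CFQ prioritization is driven from user space on stock Linux: a process sets its own I/O priority with the ioprio_set system call (the same mechanism the ionice(1) tool wraps). A minimal sketch, with the ABI constants copied in because older libcs do not export them; note that inside a guest this influences only the guest's own CFQ instance, which is precisely the gap identified above:

    /* Lower the calling process to best-effort level 7 (lowest). */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    #define IOPRIO_CLASS_SHIFT 13
    #define IOPRIO_CLASS_BE    2     /* best-effort scheduling class */
    #define IOPRIO_WHO_PROCESS 1     /* target is a single process   */
    #define IOPRIO_PRIO_VALUE(cls, data) (((cls) << IOPRIO_CLASS_SHIFT) | (data))

    int main(void) {
        int prio = IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 7);
        if (syscall(SYS_ioprio_set, IOPRIO_WHO_PROCESS, 0, prio) < 0) {
            perror("ioprio_set");
            return 1;
        }
        puts("I/O priority lowered for this process");
        return 0;
    }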
Original Architecture (diagram: low- and high-priority syscalls enter each guest VM's I/O scheduler (e.g., CFQ); QEMU exposes a virtual SCSI disk, and the resulting host syscalls pass through the host I/O scheduler (e.g., CFQ))
Existing work & Problems • Current state of the art doesn't provide differentiated services at guest-application-level granularity
Solution Tag I/O and prioritize in the hypervisor
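A hypothetical sketch of what tagging buys: if every request carries a per-application priority tag from the guest down to the host, the hypervisor's scheduler can classify requests by tag instead of lumping each VM into one class. The struct layout, tag width, and two-queue policy below are invented for illustration and are not the system's actual format:

    #include <stdint.h>
    #include <stdio.h>

    struct tagged_io_request {
        uint64_t sector;
        uint32_t len;
        uint8_t  prio_tag;   /* set by the guest app, read by the hypervisor */
    };

    /* Hypervisor-side classification by tag rather than by VM. */
    static int hypervisor_queue_for(const struct tagged_io_request *req) {
        return req->prio_tag < 4 ? 0   /* high-priority queue */
                                 : 1;  /* low-priority queue  */
    }

    int main(void) {
        struct tagged_io_request hi_req = { .sector = 0,  .len = 4096, .prio_tag = 0 };
        struct tagged_io_request lo_req = { .sector = 64, .len = 4096, .prio_tag = 7 };
        printf("high -> queue %d, low -> queue %d\n",
               hypervisor_queue_for(&hi_req), hypervisor_queue_for(&lo_req));
        return 0;
    }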
Outline • KVM/Qemu, a brief intro… • KVM/Qemu I/O stack • Multi-level I/O tagging • I/O scheduling algorithms • Evaluation • Summary
KVM/Qemu, a brief intro.. • KVM module is part of the Linux kernel since version 2.6 • Linux has all the mechanisms a VMM needs to operate several VMs • Relies on a virtualization-capable CPU with either Intel VT or AMD SVM extensions • Has 3 modes: kernel, user, guest • kernel-mode: switch into guest-mode and handle exits due to I/O operations • user-mode: I/O when guest needs to access devices • guest-mode: execute guest code, which is the guest OS except I/O (diagram: Linux standard kernel with KVM as the hypervisor, running on the hardware)
KVM/Qemu, a brief intro.. • Each virtual machine is a user-space process (diagram: VM processes sit alongside other user-space processes and libvirt, on top of the Linux standard kernel with KVM)
KVM/Qemu I/O stack • An application in the guest OS issues an I/O-related system call (e.g., read(), write(), stat()) within a user-space context of the virtual machine • The system call leads to an I/O request being submitted from within the kernel-space of the VM • The I/O request reaches a device driver - either an ATA-compliant (IDE) or a SCSI driver (guest stack: system-calls layer → VFS → filesystem / buffer cache → block layer → SCSI / ATA)
KVM/Qemu I/O stack • The device driver issues privileged instructions to read/write the memory regions exported over PCI by the corresponding device
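To ground the guest-side half of the walkthrough: everything starts with ordinary system calls. A minimal sketch; the O_DIRECT flag and the /tmp/testfile path are illustrative choices, used so the read bypasses the guest page cache and actually travels down to the emulated disk:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void) {
        char *buf;
        if (posix_memalign((void **)&buf, 512, 4096))  /* O_DIRECT needs an aligned buffer */
            return 1;
        int fd = open("/tmp/testfile", O_RDONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }
        ssize_t n = read(fd, buf, 4096);  /* enters VFS -> block layer -> SCSI/ATA driver */
        printf("read %zd bytes through the guest I/O stack\n", n);
        close(fd);
        free(buf);
        return 0;
    }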
KVM/Qemu I/O stack (Qemu emulator) • A VM-exit takes place for each of the privileged instructions resulting from the original I/O request in the VM • These VM-exits are handled by the core KVM module within the host's kernel-space context • The privileged I/O-related instructions are then passed by the hypervisor to the QEMU machine emulator
KVM/Qemu I/O stack (Qemu emulator) • These instructions are emulated by device-controller emulation modules within QEMU (either as ATA or as SCSI commands) • QEMU generates block-access I/O requests in a special block-device emulation module • Thus the original I/O request generates I/O requests to the kernel-space of the host • Upon completion of the system calls, QEMU "injects" an interrupt into the VM that originally issued the I/O request
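The hypervisor-side half of the walkthrough, condensed into a toy model. All names below are invented for illustration; KVM's real exit handling and QEMU's block layer are far more involved:

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical request decoded from a trapped privileged I/O instruction. */
    struct io_request {
        int      is_write;
        uint64_t sector;
        size_t   len;
        uint8_t  data[512];
    };

    static FILE *disk_image;   /* backing file standing in for the disk image */

    /* Stub for interrupt injection back into the guest (final step above). */
    static void inject_guest_interrupt(int vm_id) {
        printf("vm %d: completion interrupt injected\n", vm_id);
    }

    /* Emulated controller: turns the decoded instruction into a block
     * access against the image file, then signals completion. */
    static void emulate_block_access(int vm_id, struct io_request *req) {
        fseek(disk_image, (long)(req->sector * 512), SEEK_SET);
        if (req->is_write)
            fwrite(req->data, 1, req->len, disk_image);  /* host-side write */
        else
            fread(req->data, 1, req->len, disk_image);   /* host-side read  */
        inject_guest_interrupt(vm_id);
    }

    /* Entry point reached on a VM-exit caused by a privileged I/O instruction. */
    static void handle_io_vm_exit(int vm_id, struct io_request *req) {
        emulate_block_access(vm_id, req);
    }

    int main(void) {
        disk_image = tmpfile();            /* throwaway "disk image" */
        struct io_request req = { .is_write = 1, .sector = 0, .len = 512 };
        memset(req.data, 0xab, sizeof req.data);
        handle_io_vm_exit(1, &req);        /* write path */
        req.is_write = 0;
        handle_io_vm_exit(1, &req);        /* read path  */
        fclose(disk_image);
        return 0;
    }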
Scheduler algorithm - Stride • i = ID of application, S_i = shares assigned to i, V_i = virtual I/O counter for i, stride_i = Global_shares / S_i

Dispatch_request() {
  Select the ID i which has the lowest virtual I/O counter V_i
  Increase V_i by stride_i
  if (V_i reaches threshold)
    Reinitialize all V_j to 0
  Dispatch request in i's queue
}
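A runnable sketch of this dispatch loop, assuming three applications with a 4:2:1 share split. Counters are kept as doubles so the stride formula divides exactly, and the threshold value is an arbitrary placeholder:

    #include <stdio.h>

    #define NAPPS     3
    #define THRESHOLD 1000000.0

    static int    shares[NAPPS] = { 400, 200, 100 };  /* S_i */
    static double vcount[NAPPS] = { 0.0, 0.0, 0.0 };  /* V_i */
    static const double global_shares = 700.0;        /* sum of all shares */

    /* Pick the application whose request is dispatched next. */
    static int dispatch_next(void) {
        int next = 0;
        for (int i = 1; i < NAPPS; i++)        /* lowest virtual counter wins */
            if (vcount[i] < vcount[next])
                next = i;
        vcount[next] += global_shares / shares[next];   /* stride_i */
        if (vcount[next] >= THRESHOLD)
            for (int i = 0; i < NAPPS; i++)    /* periodic reset */
                vcount[i] = 0.0;
        return next;
    }

    int main(void) {
        int counts[NAPPS] = { 0 };
        for (int n = 0; n < 7000; n++)
            counts[dispatch_next()]++;
        /* Dispatch counts should track the 4:2:1 share ratio. */
        printf("app0=%d app1=%d app2=%d\n", counts[0], counts[1], counts[2]);
        return 0;
    }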
Scheduler algorithm contd. • Problem: a sleeping process can monopolize the resource once it wakes up after a long time, because its virtual counter is far behind everyone else's • Solution: if a sleeping process i wakes up, set V_i = max( min(all V_j which are non-zero), V_i )
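A standalone sketch of the wake-up rule, reusing the counter layout from the previous sketch; here app 0 has been asleep with a zero counter and gets bumped up to the smallest non-zero counter:

    #include <stdio.h>

    #define NAPPS 3
    static double vcount[NAPPS] = { 0.0, 42.0, 35.0 };  /* app 0 was asleep */

    static void on_wakeup(int id) {
        double min_nonzero = 0.0;
        for (int i = 0; i < NAPPS; i++)      /* min over all non-zero V_j */
            if (vcount[i] > 0 && (min_nonzero == 0 || vcount[i] < min_nonzero))
                min_nonzero = vcount[i];
        if (vcount[id] < min_nonzero)        /* V_id = max(min(...), V_id) */
            vcount[id] = min_nonzero;
    }

    int main(void) {
        on_wakeup(0);
        printf("app 0 counter after wake-up: %.1f\n", vcount[0]);  /* 35.0 */
        return 0;
    }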
Evaluation • Tested on HDD and SSD • Configuration:
Results • Metrics: • Throughput • Latency • Benchmarks: • Filebench • Sysbench • Voldemort (distributed key-value store)
Shares vs Latency for different workloads : HDD • Priorities are better respected if most of the read requests hit the disk
Effective Throughput for various dispatch numbers : HDD • Priorities are respected only when the disk's dispatch number is lower than the number of read requests the system generates at a time • Downside: the disk's dispatch number is directly proportional to the effective throughput
Shares vs Latency for different workloads : SSD • Priorities in SSDs are respected only under heavy load, since SSDs are faster
Comparison b/w different schedulers • Only Noop+LKMS respects priority! (Has to be, since we did it)
Summary • It works!!! • Preferential service is possible only when the disk's dispatch number is lower than the number of read requests the system generates at a time • But a lower dispatch number reduces the effective throughput of the storage • On SSDs, preferential service is only possible under heavy load • Scheduling at the lowermost layer yields better differentiated services
Future work • Get it working for writes • Evaluate against VMware ESX SIOC and compare with our results