CSE 598c Virtual Machines: “Diagnosing Performance Overheads in the Xen Virtual Machine Environment” by Aravind Menon, Jose Renato Santos, Yoshio Turner, G. Janakiraman, Willy Zwaenepoel. Lisa Johansen, March 13, 2006
Motivation • The performance of an application in a VM environment is affected by: • Operating System • Other Processes • Underlying VMM • Other VMs • We want a way to measure the elements that affect performance in a virtual machine environment (Xen)
Outline • Overview Statistical Analysis in VMs • Xenoprof • Performance Debugging • Performance Overhead Analysis in Xen
Issues in VM Statistical Analysis • Distributed computing • Distributed profiling • VMs don’t have access to hardware events • The VMM does [Figure: an OS stack (processes P1–P4, kernel, hardware) next to a virtual machine stack (processes in three VMs, each with its own kernel, above a VMM and hardware)]
Xenoprof • To handle distributed profiling, each VM runs its own OProfile instance for individual profiling • To monitor hardware, Xenoprof accepts hypercalls from OProfile and returns samples through interrupts [Figure: processes and kernels in Dom0 and the guest domains, each running OProfile, above Xenoprof in the VMM, above hardware]
How it works • Each profiling domain queries Xenoprof to find out whether it should be the initiator • If there are multiple domains, Dom0 must be the initiator • The initiator collects profiling requirements from the participants and forwards this information to Xenoprof • Xenoprof collects program counter samples in accordance with those requirements • These samples are then given to the OProfile instances, where they are mapped to the correct process • Individual or system-wide performance can then be determined
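The flow above can be sketched as a toy simulation. This is a minimal sketch and not the real Xenoprof or OProfile interface: the names `Sample`, `Domain`, `choose_initiator`, and `attribute`, along with the symbol-table layout, are hypothetical and exist only for illustration.

```python
from bisect import bisect_right
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sample:
    domain_id: int  # domain that was running when the sample was taken
    pc: int         # program counter captured for the sample


@dataclass
class Domain:
    domain_id: int
    symbols: list  # hypothetical symbol table: sorted (start_address, name) pairs
    samples: list = field(default_factory=list)

    def resolve(self, pc):
        """Map a program counter to the last symbol starting at or before it."""
        starts = [start for start, _ in self.symbols]
        i = bisect_right(starts, pc) - 1
        return self.symbols[i][1] if i >= 0 else "<unknown>"


def choose_initiator(domains):
    """A single domain initiates itself; with several, Dom0 (id 0) must."""
    if len(domains) == 1:
        return domains[0]
    for d in domains:
        if d.domain_id == 0:
            return d
    raise ValueError("multi-domain profiling requires Dom0")


def attribute(samples, domains):
    """Hand each sample to its domain, then count samples per symbol."""
    by_id = {d.domain_id: d for d in domains}
    for s in samples:
        by_id[s.domain_id].samples.append(s)
    return {
        d.domain_id: Counter(d.resolve(s.pc) for s in d.samples)
        for d in domains
    }
```

Summing the per-domain counters gives a system-wide view, while each counter on its own gives the individual view for one domain.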
Performance Debugging - Networking • The motivating example compares receiver throughput on Linux and XenoLinux • Varying the size of the user-level buffer greatly affects XenoLinux. Why? • Using Xenoprof they found: • The XenoLinux kernel was the source of the increase in execution time • skb_copy_bits, skbuff_ctor, and tcp_collapse were the culprit functions • This is all due to time spent defragmenting memory taken up by largely empty socket buffers
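One way to surface culprit functions like these is to compare each function's share of samples in two flat profiles. This is a minimal sketch of that general idea, not the procedure used in the paper; `biggest_increases` is a hypothetical helper, and any counts passed to it would come from your own profiles.

```python
def biggest_increases(baseline, candidate, top=3):
    """Rank functions by growth in their share of total samples.

    Both arguments map function name -> sample count from a flat profile.
    Returns up to `top` (name, share_delta) pairs, largest increase first.
    """
    base_total = sum(baseline.values()) or 1  # avoid dividing by zero
    cand_total = sum(candidate.values()) or 1
    deltas = {}
    for name in set(baseline) | set(candidate):
        base_share = baseline.get(name, 0) / base_total
        cand_share = candidate.get(name, 0) / cand_total
        deltas[name] = cand_share - base_share
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top]
```

Comparing shares rather than raw counts keeps the ranking meaningful when the two runs collected different numbers of samples.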
Performance Overhead Evaluation • Given this cool new tool, let’s apply it to determine performance overheads • Namely in network communication, because it is an important element of VMs • Evaluate: • Receiver workload • Sender workload • Web server workload • In three configurations: • Xen-domain0 • Xen-guest0 (guest and driver domain on the same CPU) • Xen-guest1 (guest and driver domain on different CPUs)
Receiver workload • Domain0 • Degraded performance when compared to Linux • Found that instruction TLB misses and data TLB misses are much greater than in Linux (primary cause) • May be caused by TLB flushing or an increase in working set size • Instruction cost is greater in XenoLinux due to overheads that exist within Xen • Guest0 & Guest1 • Degraded performance when compared to Dom0 • Significant increase in instructions • Page remapping and transfer from Dom0 to DomUs • Increased L2 cache misses caused by increased working set size
Sender workload • Domain0 • No throughput differences when compared to Linux • Guest0 • Huge throughput degradation caused by the high instruction cost (max 706 Mb/s compared to 3764 Mb/s) • The TCP stack processes a larger number of packets than in Dom0 to transfer the same amount of data • Due to the lack of TCP segmentation offload (TSO) support • Also computes large checksums • The driver domain model prevents this work from being offloaded to the physical interface • If similar abilities are taken away from Dom0, we see similar results
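The packet-count effect can be illustrated with simple arithmetic. This is a minimal sketch under stated assumptions: a 1460-byte MSS (a standard 1500-byte Ethernet MTU minus 40 bytes of IPv4 and TCP headers without options) and a TSO segment of roughly 64 KB. Neither value comes from the slides, and `packets_processed` is a hypothetical helper.

```python
import math

MSS_BYTES = 1460               # assumption: 1500-byte MTU minus IPv4 + TCP headers
TSO_SEGMENT_BYTES = 64 * 1024  # assumption: approximate upper bound on a TSO segment


def packets_processed(payload_bytes, unit_bytes):
    """Number of units the TCP stack must handle to send payload_bytes."""
    return math.ceil(payload_bytes / unit_bytes)


payload = 1_000_000  # arbitrary transfer size chosen for illustration

# Without TSO the stack itself cuts the payload into MSS-sized packets.
without_tso = packets_processed(payload, MSS_BYTES)

# With TSO the stack hands large segments to the NIC, which splits them.
with_tso = packets_processed(payload, TSO_SEGMENT_BYTES)

# Ratio of the two maximum throughputs quoted on the slide (Mb/s).
slowdown = 3764 / 706
```

Under these assumptions the stack handles 685 packets without TSO versus 16 segments with it for the same payload, and the quoted throughputs differ by a factor of about 5.3.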
Webserver workload • Overall, very similar to the receiver and sender workloads • Domain0 • Higher TLB miss rate than that of Linux • Guest0 • Higher instruction costs • Highest L2 cache miss rates • Highest computational overhead • TSO doesn’t matter because of the small payloads • Guest1 • Higher instruction costs • Higher L2 cache miss rates • TSO doesn’t matter because of the small payloads
Conclusion • Xenoprof is a tool to examine performance within Xen • Xenoprof has been used to examine the different performance elements of network communication in Xen • It can be used to evaluate other aspects of performance within Xen