Diagnosing Performance Overheads in Xen VM Environment

This presentation discusses how Xenoprof is used to measure and analyze performance overheads in a Xen virtual machine environment. The authors evaluate these overheads under receiver, sender, and web server workloads.

Presentation Transcript


  1. CSE 598c Virtual Machines: "Diagnosing Performance Overheads in the Xen Virtual Machine Environment"
     Aravind Menon, Jose Renato Santos, Yoshio Turner, G. Janakiraman, Willy Zwaenepoel
     Lisa Johansen, March 13, 2006

  2. Motivation
     • The performance of an application in a VM environment is affected by:
       • the operating system
       • other processes
       • the underlying VMM
       • other VMs
     • We want a way to measure the elements that affect performance in a virtual machine environment (Xen)

  3. Outline
     • Overview of statistical analysis in VMs
     • Xenoprof
     • Performance debugging
     • Performance overhead analysis in Xen

  4. Issues in VM Statistical Analysis
     [Figure: "OS": processes P1-P4 on a single kernel and hardware; "Virtual Machine": several VMs, each with its own kernel and processes, running on a VMM and hardware]
     • Like distributed computing, this calls for distributed profiling: each VM has its own kernel and processes
     • VMs don't have access to hardware events; only the VMM does

  5. Xenoprof
     • To handle distributed profiling, each VM runs its own OProfile instance to profile itself
     • To monitor hardware events, Xenoprof accepts hypercalls from OProfile and returns samples through (virtual) interrupts
     [Figure: Dom0 and the guest domains, each running OProfile in its kernel, on top of Xenoprof in the VMM, on top of the hardware]
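
The delivery path on this slide can be simulated in a few lines. This is a minimal sketch assuming a per-domain sample buffer that the VMM fills on each counter overflow and that the domain's OProfile drains when it is notified; the class and method names (`Vmm`, `Domain`, `on_counter_overflow`, `handle_interrupt`) and the PC values are hypothetical and do not correspond to Xen's actual interface.

```python
from collections import defaultdict, deque


class Domain:
    """A domain running its own (simulated) OProfile instance."""

    def __init__(self, dom_id):
        self.dom_id = dom_id
        self.pending = False  # set when the VMM raises a (virtual) interrupt
        self.samples = []     # PC samples this domain's OProfile has received

    def handle_interrupt(self, buffer):
        # OProfile's handler: drain every sample the VMM queued for this domain.
        while buffer:
            self.samples.append(buffer.popleft())
        self.pending = False


class Vmm:
    """Stands in for Xenoprof: only the VMM sees hardware counter overflows."""

    def __init__(self, domains):
        self.domains = {d.dom_id: d for d in domains}
        self.buffers = defaultdict(deque)  # one sample buffer per domain

    def on_counter_overflow(self, running_dom, pc):
        # Charge the sample to whichever domain was running, then flag it.
        self.buffers[running_dom].append(pc)
        self.domains[running_dom].pending = True

    def deliver(self):
        # Give each flagged domain a chance to run its interrupt handler.
        for dom_id, dom in self.domains.items():
            if dom.pending:
                dom.handle_interrupt(self.buffers[dom_id])


dom0, guest = Domain(0), Domain(1)
vmm = Vmm([dom0, guest])
# Made-up PC values: one sample while Dom0 ran, two while the guest ran.
for running, pc in [(0, 0xC0101000), (1, 0xC0202000), (1, 0xC0203000)]:
    vmm.on_counter_overflow(running, pc)
vmm.deliver()
print(len(dom0.samples), len(guest.samples))  # 1 2
```

The point of the design is that only the VMM touches the counters; each domain sees nothing but its own samples.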

  6. How it works
     • Each profiling domain queries Xenoprof to find out whether it should be the initiator
     • If multiple domains participate, Dom0 must be the initiator
     • The initiator collects the profiling requirements of the participants and forwards them to Xenoprof
     • Xenoprof collects program counter (PC) samples according to these requirements
     • The samples are then passed to each domain's OProfile, which maps them to the corresponding process
     • Individual or system-wide performance can then be determined
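
The two halves of this workflow (the initiator merging requirements, and OProfile turning raw PC samples into per-function, per-domain, and system-wide counts) can be sketched as follows. This is an illustrative sketch only: the event names `cycles` and `itlb_miss`, the symbol addresses, and all function names are hypothetical, and it assumes each function extends up to the next symbol's start address.

```python
import bisect
from collections import Counter


def merge_requirements(requests):
    """Initiator step: union of the events each participant asked for."""
    events = set()
    for wanted in requests.values():
        events |= set(wanted)
    return sorted(events)


class SymbolTable:
    """Maps a PC to the function whose address range contains it."""

    def __init__(self, symbols):
        # symbols: (start_address, name) pairs; each function is assumed
        # to run up to the next symbol's start address.
        self.symbols = sorted(symbols)
        self.starts = [start for start, _ in self.symbols]

    def lookup(self, pc):
        i = bisect.bisect_right(self.starts, pc) - 1
        return self.symbols[i][1] if i >= 0 else "<unknown>"


def attribute(samples, tables):
    """OProfile step: turn (domain, pc) samples into (domain, function) counts."""
    counts = Counter()
    for dom, pc in samples:
        counts[(dom, tables[dom].lookup(pc))] += 1
    return counts


requests = {0: ["cycles"], 1: ["cycles", "itlb_miss"]}
print(merge_requirements(requests))  # ['cycles', 'itlb_miss']

tables = {  # hypothetical addresses
    0: SymbolTable([(0x1000, "skb_copy_bits"), (0x2000, "skbuff_ctor")]),
    1: SymbolTable([(0x1000, "tcp_collapse"), (0x3000, "skbuff_ctor")]),
}
samples = [(0, 0x1800), (0, 0x2010), (1, 0x1004), (1, 0x1008), (1, 0x3100)]
per_function = attribute(samples, tables)
print(per_function[(1, "tcp_collapse")])  # 2

per_domain = Counter()  # individual (per-domain) view
for (dom, _), n in per_function.items():
    per_domain[dom] += n
print(dict(per_domain))  # {0: 2, 1: 3}
system_wide = sum(per_function.values())  # system-wide view: 5 samples
```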

  7. Performance Debugging - Networking
     • The motivating example compares receiver throughput in Linux and XenoLinux
     • Varying the size of the user-level buffer greatly affects XenoLinux throughput. Why?
     • Using Xenoprof, the authors found that:
       • the XenoLinux kernel was the source of the increased execution time
       • skb_copy_bits, skbuff_ctor, and tcp_collapse were the culprit functions
       • the time was spent compacting socket buffers whose allocated memory was mostly empty
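
To see why mostly-empty socket buffers lead to extra copying, here is a back-of-the-envelope sketch. It assumes each received packet is held in its own 4096-byte page and carries a 1448-byte TCP payload; both numbers are illustrative assumptions, not measurements from the paper. Packing the queued data densely frees pages, but every queued byte has to be copied, which is the kind of work attributed above to tcp_collapse and skb_copy_bits.

```python
import math

PAGE_SIZE = 4096  # assumption: one page allocated per received packet
MSS = 1448        # assumption: TCP payload of a 1500-byte frame with timestamps


def fill_ratio(payload, allocated=PAGE_SIZE):
    """Fraction of a socket buffer's allocated memory that actually holds data."""
    return payload / allocated


def collapse_cost(num_packets, payload=MSS, page=PAGE_SIZE):
    """(pages freed, bytes copied) if queued packets are packed densely."""
    data = num_packets * payload
    pages_after = math.ceil(data / page)
    return num_packets - pages_after, data


print(f"{fill_ratio(MSS):.0%}")  # 35%
print(collapse_cost(100))        # (64, 144800)
```

Under these assumptions roughly two thirds of each buffer is empty, so the memory saved by compacting comes at the price of copying all of the received data a second time.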

  8. Performance Overhead Evaluation
     • Given this cool new tool, let's apply it to determine performance overheads
     • Focus on network communication, because it is an important element of VMs
     • Evaluate:
       • receiver workload
       • sender workload
       • web server workload
     • In three configurations:
       • Xen-domain0
       • Xen-guest0 (guest and Dom0 on the same CPU)
       • Xen-guest1 (guest and Dom0 on different CPUs)

  9. Receiver workload
     • Domain0
       • Degraded performance compared to Linux
       • Instruction and data TLB misses are much higher than in Linux (the primary cause)
       • Possibly due to TLB flushing or an increased working set size
       • Instruction cost is higher in XenoLinux due to overheads within Xen
     • Guest0 & Guest1
       • Degraded performance compared to Dom0
       • Significant increase in instructions, from page remapping and page transfer from Dom0 to the DomUs
       • More L2 cache misses, caused by an increased working set size
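
Comparisons such as "higher instruction cost" or "more TLB misses" are easiest to read when each profiled event count is normalized by the number of packets processed and attributed to a component. The sketch below assumes that approach; the component names and all numbers are synthetic, chosen only to show the arithmetic, and are not results from the paper.

```python
def per_packet(counts, packets):
    """Normalize raw event counts for each component to events per packet."""
    return {component: n / packets for component, n in counts.items()}


def overhead_vs_baseline(counts, packets, baseline_per_packet):
    """Total per-packet cost relative to a baseline such as native Linux."""
    return sum(per_packet(counts, packets).values()) / baseline_per_packet


# Synthetic numbers, chosen only to show the arithmetic (not measurements).
counts = {"xen": 2_000_000, "dom0_kernel": 3_000_000, "guest_kernel": 5_000_000}
packets = 1_000
print(per_packet(counts, packets))
# {'xen': 2000.0, 'dom0_kernel': 3000.0, 'guest_kernel': 5000.0}
print(overhead_vs_baseline(counts, packets, baseline_per_packet=4_000))  # 2.5
```

Normalizing per packet lets configurations with very different throughput be compared directly, and the per-component breakdown shows where (Xen, Dom0, or the guest) the extra cost is incurred.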

  10. Sender workload
      • Domain0
        • No throughput difference compared to Linux
      • Guest0
        • Huge throughput degradation due to high instruction cost (max 706 Mb/s vs. 3764 Mb/s)
        • The TCP stack processes more packets than in Dom0 to transfer the same amount of data, due to the lack of TCP segmentation offload (TSO) support
        • The guest also computes checksums in software
        • The driver domain model prevents these operations from being offloaded to the physical interface
        • When the same offload capabilities are disabled in Dom0, it shows similar results
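
The effect of missing TSO can be estimated with simple arithmetic: with TSO the stack hands the NIC large segments, while without it the stack must build every MSS-sized packet itself. This sketch assumes a 1448-byte MSS and a 64 KB maximum TSO segment (illustrative values, not figures from the paper); only the 706 and 3764 Mb/s throughputs come from the slide.

```python
import math

MSS = 1448           # assumption: payload per packet on the wire
TSO_SEGMENT = 65536  # assumption: largest segment handed to a TSO-capable NIC


def stack_packets(transfer_bytes, tso):
    """Number of segments the TCP stack itself must build for a transfer."""
    unit = TSO_SEGMENT if tso else MSS
    return math.ceil(transfer_bytes / unit)


transfer = 1 << 20  # 1 MiB of application data
print(stack_packets(transfer, tso=True))   # 16
print(stack_packets(transfer, tso=False))  # 725

# A small (web-server-sized) payload fits in one packet either way.
print(stack_packets(1000, tso=True), stack_packets(1000, tso=False))  # 1 1

# Throughput figures from the slide, in Mb/s.
guest0, baseline = 706, 3764
print(f"{guest0 / baseline:.1%}")  # 18.8%
```

Under these assumptions the guest's stack does roughly 45 times as much per-segment work for a bulk transfer, while for small payloads TSO makes no difference.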

  11. Web server workload
      • Overall, results are very similar to the receive and send workloads
      • Domain0
        • Higher TLB miss rate than Linux
      • Guest0
        • Higher instruction costs
        • Highest L2 cache miss rates
        • Highest computational overhead
        • TSO doesn't matter because the payloads are small
      • Guest1
        • Higher instruction costs
        • Higher L2 cache miss rates
        • TSO doesn't matter because the payloads are small

  12. Conclusion
      • Xenoprof is a tool for examining performance within Xen
      • Xenoprof has been used to examine the different performance elements of network communication in Xen
      • It can also be used to evaluate other aspects of performance within Xen
