“Optimizing Network Virtualization in Xen” Aravind Menon, Alan L. Cox, Willy Zwaenepoel. Presented by: Arjun R. Nath, Penn State CSE
Introduction • Paper to appear in USENIX 2006 • Alan L. Cox, Rice University • Aravind Menon and Willy Zwaenepoel, EPFL, Lausanne • Outline: the paper introduces three optimizations to the existing Xen (2.0) design to improve networking efficiency: • Improving the virtual network interface • A faster I/O channel for exchanging packets between the driver domain and guest domains • Virtual memory modifications to reduce TLB misses in guest domains
1. Improving the Virtual Network Interface • The front end is the virtualized network driver for Xen guests • It exposes a simple, low-level interface: this allows support for a large number of physical NICs • However, it prevents the virtual driver from using advanced NIC capabilities, such as: • Checksum offload • Scatter/gather DMA support • TCP segmentation offload • Optimization: expose a higher-level interface so guests can make use of these NIC capabilities where possible
1. Improving the Virtual Network Interface • TCP Segmentation Offload (or TCP Large Send): the OS hands the NIC a buffer much larger than the maximum transmission unit (MTU) of the medium, and the NIC does the work of dividing it into MTU-sized packets (less CPU processing) • Scatter/Gather I/O: enables the OS to construct packets directly from the file-system buffers without copying them to a contiguous memory location • Checksum Offload: compute the TCP data checksum in NIC hardware rather than in software (a sketch of how a driver advertises these capabilities follows)
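As an illustration only (not from the paper), the sketch below shows how a conventional Linux network driver advertises these offload capabilities to the kernel; the NETIF_F_* flags are real kernel constants, while the function and driver are hypothetical.

```c
#include <linux/netdevice.h>

/* Hypothetical driver setup hook: advertise the offload capabilities
 * discussed above so the network stack can exploit them. */
static void example_nic_setup_offloads(struct net_device *dev)
{
        dev->features |= NETIF_F_IP_CSUM;   /* checksum computed by the NIC */
        dev->features |= NETIF_F_SG;        /* scatter/gather DMA support   */
        dev->features |= NETIF_F_TSO;       /* TCP segmentation offload     */
}
```

In the modified architecture (next slide), an offload driver in the driver domain can supply such a feature in software when the physical NIC lacks it.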
Modified Network I/O Architecture • [Figure: offload driver in the driver domain]
Results of the Network Modifications • [Figure: offload benefits]
2. I/O Channel Improvements: Pre-Modification Operations • Packet transfer between the guest and the driver domain is done via a zero-copy page-remapping mechanism • The page containing the packet is remapped into the address space of the target domain • This requires each packet to be allocated on a separate page • Each packet requires three address remaps on the receive path (two on transmit) and two memory allocation/deallocation operations
2. I/O Channel Improvements: Optimizations • Transmit: • Let the driver domain examine only the MAC header of the packet to check whether the destination is the driver domain itself or a broadcast address • Otherwise, construct the network packet from the mapped header and the (possibly unmapped) packet fragments and send it over the bridge (needs gather DMA support in the NIC) • Receive: • Reduce small-packet overhead by doing a data copy instead of a page remap • Implemented using pages shared between Dom0 and the guest (a sketch follows)
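A minimal sketch of the receive-path idea, assuming a buffer kept permanently shared between the driver domain and the guest; all names and the layout are hypothetical, and the real implementation lives in Xen's netback/netfront drivers.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

/* A receive buffer in memory that is pre-shared with the guest
 * (hypothetical layout). */
struct shared_rx_buf {
    size_t len;
    unsigned char data[PAGE_SIZE];
};

/* Stand-in for the event-channel notification to the guest. */
static void notify_guest(void) { }

/* Copy the packet into the shared buffer instead of remapping its page
 * into the guest: no per-packet address remaps or page alloc/dealloc,
 * at the cost of one memcpy. */
void deliver_packet_to_guest(struct shared_rx_buf *buf,
                             const void *pkt, size_t len)
{
    memcpy(buf->data, pkt, len);
    buf->len = len;
    notify_guest();
}
```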
Results of the I/O Channel Changes • [Figure: I/O channel optimization benefit (transmit)]
3. Virtual Memory Improvements: Observations • Guest domains in Xen show a high number of TLB misses for network operations compared to native Linux • Possibly due to an increase in working-set size • Also due to the absence of support for virtual memory primitives such as superpage mappings and global page-table mappings
3. Virtual Memory Improvements: Optimizations • Modified the guest OS to use a superpage mapping for a virtual address range only if the associated physical pages are contiguous • The modified memory allocator in the guest OS tries to group memory pages together into a contiguous range (within a superpage) • Modified the VMM to allow guests to use global page mappings in their address space (limited benefit) • Avoid the TLB flush when switching from a busy domain to the idle domain and back (a sketch of the superpage idea follows)
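As an illustration (not Xen code), the sketch below builds an x86 page-directory entry for a 4 MB superpage; a guest can only do this when the underlying frames are physically contiguous and aligned, which is what the modified allocator tries to provide. Apart from the architectural bit positions, all names are hypothetical.

```c
#include <stdint.h>

#define PDE_PRESENT    (1u << 0)
#define PDE_WRITABLE   (1u << 1)
#define PDE_PSE        (1u << 7)      /* page-size extension: 4 MB mapping */
#define SUPERPAGE_SIZE (4u << 20)

/* Return a page-directory entry mapping the 4 MB region starting at 'phys',
 * or 0 if the region is not superpage-aligned (caller falls back to 4 KB pages). */
uint32_t make_superpage_pde(uint32_t phys)
{
    if (phys & (SUPERPAGE_SIZE - 1))
        return 0;
    return phys | PDE_PRESENT | PDE_WRITABLE | PDE_PSE;
}
```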
Results of the Virtual Memory Changes • [Figure: TLB measurements using Xenoprof]
Results of the Virtual Memory Changes • [Figure: data TLB measurements]
Virtual Memory Issues • Transparent page sharing between VMs breaks the contiguity of physical page frames, which is bad for superpages (page sharing is currently not implemented in Xen) • Use of the ballooning driver also breaks the contiguity of physical pages (possible solution: perform coarse-grained ballooning in units of the superpage size)
Results: Transmit (Overall) • [Figure: transmit throughput measurements]
Results: Receive (Overall) • [Figure: receive throughput measurements]
Results Overall • Transmit throughput of guests improved by more than 300% • Receive throughput improved by 30% • Receive performance is still a bottleneck
Questions to the Authors • Why did they use Xen 2.0 rather than 3.x? • They had started this work when Xen was at 2.0.6 and continued it even after Xen 3.0 was announced. • Will these optimizations be included in the Xen 3.x codebase? • Yes, they are looking to do that (it is not done yet). • Any more?