Presentation Transcript


  1. “Optimizing Network Virtualization in Xen”, Aravind Menon, Alan L. Cox, Willy Zwaenepoel. Presented by: Arjun R. Nath, Penn State CSE

  2. Introduction
  • Paper to appear in USENIX 2006
  • Alan L. Cox, Rice University; Aravind Menon and Willy Zwaenepoel, EPFL, Lausanne
  • Outline: the paper introduces three optimizations to the existing Xen (2.0) design to improve networking efficiency:
  • Improving the virtual network interface
  • A faster I/O channel for exchanging packets between the driver domain and guest domains
  • Virtual memory modifications to reduce TLB misses in guest domains

  3. Xen Network I/O Architecture

  4. 1. Improving the Virtual Network Interface
  • The front end is the virtualized network driver used by Xen guests.
  • Its simple, low-level interface allows support for a large number of physical NICs.
  • However, this prevents the virtual driver from using advanced NIC capabilities, such as:
  • Checksum offload
  • Scatter/gather DMA support
  • TCP segmentation offload
  • The optimization: let the virtual interface make use of these NIC capabilities where possible.

  5. 1. Improving the Virtual Network Interface
  • TCP Segmentation Offload (TSO, or TCP "large send"): the OS hands the NIC buffers much larger than the medium's maximum transmission unit (MTU), and the NIC divides them into MTU-sized packets, reducing CPU processing.
  • Scatter/Gather I/O: enables the OS to construct packets directly from file system buffers without copying them to a contiguous memory location.
  • Checksum Offload: compute the TCP data checksum in NIC hardware rather than in software (a driver-side sketch follows below).
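
The following is a minimal C sketch, not the paper's actual code, of how a Linux-style network driver such as a modified Xen virtual interface might advertise these three offload capabilities to the network stack. The NETIF_F_* feature flags are standard Linux kernel identifiers; the vnic_setup_offloads() helper and where it would be called from are hypothetical.

#include <linux/netdevice.h>

/* Hypothetical setup helper: advertise offload capabilities so the
 * network stack can take advantage of them. */
static void vnic_setup_offloads(struct net_device *dev)
{
        /* Checksum offload: the NIC computes the IP/TCP checksum. */
        dev->features |= NETIF_F_IP_CSUM;

        /* Scatter/gather DMA: packets may be assembled from
         * non-contiguous buffers without an extra copy. */
        dev->features |= NETIF_F_SG;

        /* TCP segmentation offload: the stack hands down "large send"
         * buffers bigger than the MTU; the NIC splits them into
         * MTU-sized segments. */
        dev->features |= NETIF_F_TSO;
}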

  6. Modified Network I/O Architecture: offload driver in the driver domain (figure)

  7. Results of the network interface modifications: offload benefits (figure)

  8. 2. I/O Channel Improvements: pre-modification operation
  • Packet transfer between a guest domain and the driver domain is done via a zero-copy page-remapping mechanism.
  • The page containing the packet is remapped into the address space of the target domain.
  • This requires each packet to be allocated on a separate page.
  • Each packet receive/transmit requires 3/2 address remaps, respectively, plus 2 memory allocation/deallocation operations (sketched below).
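
To make the per-packet cost above concrete, here is a hedged C sketch of the pre-optimization transmit path. Every identifier (alloc_dedicated_page, map_into_driver_domain, and so on) is a hypothetical stand-in for Xen's grant-table and page-remapping machinery, not the real API; the sketch only shows where the remaps and allocation/deallocation operations come from. The receive path additionally needs a third remap to map the page back into the guest.

struct packet;
struct page;

/* Hypothetical helpers standing in for Xen's grant-table /
 * page-remapping machinery. */
struct page *alloc_dedicated_page(void);
void place_packet_on_page(struct packet *pkt, struct page *pg);
void map_into_driver_domain(struct page *pg);
void bridge_to_physical_nic(struct page *pg);
void unmap_from_driver_domain(struct page *pg);
void free_dedicated_page(struct page *pg);

void transmit_via_io_channel(struct packet *pkt)
{
        /* Zero-copy requires each packet to occupy a separate page. */
        struct page *pg = alloc_dedicated_page();        /* allocation */
        place_packet_on_page(pkt, pg);

        map_into_driver_domain(pg);   /* remap 1: guest -> driver domain */
        bridge_to_physical_nic(pg);   /* driver domain forwards packet   */
        unmap_from_driver_domain(pg); /* remap 2: tear the mapping down  */

        free_dedicated_page(pg);                       /* deallocation */
}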

  9. 2. I/O Channel Improvements: optimizations
  • Transmit:
  • Let the driver domain examine only the MAC header of the packet to check whether the destination is the driver domain or a broadcast address.
  • Otherwise, construct the network packet from the mapped packet header plus the (possibly unmapped) packet fragments and send it over the bridge; requires gather DMA support in the NIC (see the sketch below).
  • Receive:
  • Reduce small-packet overhead by doing a data copy instead of a page remap.
  • Implemented using shared pages between Dom0 and the guest.
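
Below is a hedged C sketch of the transmit-side decision described above: the driver domain maps only the packet header, and the full packet is mapped only for traffic destined for the driver domain itself or for broadcast. All types and helper functions are hypothetical placeholders, not the actual Xen netback code. The receive-side change is the one named in the bullets: small packets are copied into pages pre-shared between Dom0 and the guest instead of being remapped.

#include <stdbool.h>

struct pkt_desc;
struct ethhdr;

/* Hypothetical helpers standing in for the modified I/O channel. */
struct ethhdr *map_packet_header(struct pkt_desc *d);
void unmap_packet_header(struct pkt_desc *d);
bool is_broadcast(const struct ethhdr *mac);
bool dest_is_driver_domain(const struct ethhdr *mac);
void map_full_packet(struct pkt_desc *d);
void deliver_locally(struct pkt_desc *d);
void send_header_plus_unmapped_fragments(struct pkt_desc *d);

void optimized_transmit(struct pkt_desc *d)
{
        /* Map only the header to inspect the destination MAC address. */
        struct ethhdr *mac = map_packet_header(d);

        if (is_broadcast(mac) || dest_is_driver_domain(mac)) {
                /* Locally delivered traffic still needs the full payload. */
                map_full_packet(d);
                deliver_locally(d);
        } else {
                /* Outbound traffic: header plus unmapped fragments; the
                 * physical NIC assembles them via gather DMA. */
                send_header_plus_unmapped_fragments(d);
        }

        unmap_packet_header(d);
}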

  10. Results of the I/O channel changes: I/O channel optimization benefit, transmit (figure)

  11. 3. Virtual Memory Improvements: observations
  • Xen guest domains show a high number of TLB misses for network operations compared to native Linux.
  • Possibly due to an increase in working set size.
  • Also due to the absence of support for virtual memory primitives such as superpage mappings and global page table mappings.

  12. 3. Virtual Memory Improvements: optimizations
  • Modify the guest OS to use a superpage mapping for a virtual address range only if the associated physical pages are contiguous (see the sketch below).
  • Modify the memory allocator in the guest OS to try to group memory pages into a contiguous range within a superpage.
  • Modify the VMM to allow guests to use global page mappings in their address space (limited benefit).
  • Avoid a TLB flush when switching from a busy domain to the idle domain and back.
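
A minimal C sketch of the superpage condition in the first bullet, assuming 32-bit x86 with 4 MB superpages: the guest installs a superpage mapping only when the machine frames backing the range are contiguous. pfn_to_mfn() mirrors the guest's physical-to-machine translation; the mapping helpers are hypothetical, not the paper's code.

#define SUPERPAGE_FRAMES 1024UL        /* 4 MB superpage / 4 KB frames */

/* Guest physical-to-machine frame translation and hypothetical
 * page-table helpers. */
unsigned long pfn_to_mfn(unsigned long pfn);
void install_superpage_mapping(unsigned long vaddr, unsigned long mfn);
void install_4k_mappings(unsigned long vaddr, unsigned long pfn,
                         unsigned long count);

static int range_is_machine_contiguous(unsigned long first_pfn)
{
        unsigned long base = pfn_to_mfn(first_pfn);

        for (unsigned long i = 1; i < SUPERPAGE_FRAMES; i++)
                if (pfn_to_mfn(first_pfn + i) != base + i)
                        return 0;
        return 1;
}

void map_superpage_region(unsigned long vaddr, unsigned long first_pfn)
{
        if (range_is_machine_contiguous(first_pfn))
                install_superpage_mapping(vaddr, pfn_to_mfn(first_pfn));
        else
                /* Fall back to ordinary 4 KB mappings. */
                install_4k_mappings(vaddr, first_pfn, SUPERPAGE_FRAMES);
}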

  13. Results for the virtual memory changes: TLB measurements using Xenoprof (figure)

  14. Results for the virtual memory changes: data TLB measurements (figure)

  15. Virtual Memory Issues
  • Transparent page sharing between VMs breaks the contiguity of physical page frames, which is bad for superpages (page sharing is currently not implemented in Xen).
  • Use of the ballooning driver also breaks the contiguity of physical pages; a possible solution is coarse-grained ballooning in units of the superpage size (sketched below).
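
A short C sketch of the suggested workaround for ballooning, assuming 4 MB superpages: round every balloon request to whole superpages so that inflating or deflating the balloon never fragments a superpage-aligned region. The helper names are hypothetical.

#define SUPERPAGE_FRAMES 1024UL        /* 4 MB superpage / 4 KB frames */

/* Hypothetical helper: hand one whole, superpage-aligned block of
 * frames back to the hypervisor. */
void release_superpage_to_hypervisor(void);

static unsigned long round_up_to_superpage(unsigned long nr_frames)
{
        return (nr_frames + SUPERPAGE_FRAMES - 1) & ~(SUPERPAGE_FRAMES - 1);
}

void balloon_inflate(unsigned long nr_frames)
{
        unsigned long n = round_up_to_superpage(nr_frames);

        /* Release memory one superpage at a time so the remaining guest
         * memory stays superpage-aligned and contiguous. */
        for (unsigned long i = 0; i < n; i += SUPERPAGE_FRAMES)
                release_superpage_to_hypervisor();
}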

  16. Overall transmit results: transmit throughput measurements (figure)

  17. Overall receive results: receive throughput measurements (figure)

  18. Results overall
  • Transmit throughput of guests improved by more than 300%.
  • Receive throughput improved by 30%.
  • Receive performance is still a bottleneck.

  19. Questions to the Authors
  • Why did they use Xen 2.0 rather than 3.x?
  • They had started this work when Xen was at 2.0.6 and continued it even after Xen 3.0 was announced.
  • Will these optimizations be included in the Xen 3.x codebase?
  • Yes, they are looking to do that (it's not done yet).
  • Any more?

  20. Thanks!
