
Virtualizing Modern High-Speed Interconnection Networks with Performance and Scalability


Presentation Transcript


  1. Virtualizing Modern High-Speed Interconnection Networks with Performance and Scalability Bo Li, Zhigang Huo, Panyong Zhang, Dan Meng {leo, zghuo, zhangpanyong, md}@ncic.ac.cn Presenter: Xiang Zhang zhangxiang@ncic.ac.cn Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China

  2. Introduction • Virtualization is now one of the enabling technologies of Cloud Computing • Many HPC providers now use their systems as platforms for cloud/utility computing; these HPC-on-Demand offerings include: • Penguin's POD • IBM's Computing on Demand service • R Systems' dedicated hosting service • Amazon's EC2

  3. Introduction: Virtualizing HPC Clouds? • Pros: • good manageability • proactive fault tolerance • performance isolation • online system maintenance • Cons: • performance gap • lack of low-latency interconnect support, which is important to tightly coupled MPI applications • VMM-bypass I/O has been proposed to address this concern

  4. Introduction: VMM-bypass I/O Virtualization • The Xen split device driver model is used only to set up the necessary user access points • Data communication on the critical path bypasses both the guest OS and the VMM (figure: VMM-Bypass I/O, courtesy [7])
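The control/data-path split is visible directly at the verbs API level. Below is a minimal, hedged C sketch (not code from this work), assuming a node with an InfiniBand HCA and libibverbs installed: the resource-creation calls are control-path operations that the split driver forwards to the privileged driver domain, while completion polling is a data-path operation that reads memory-mapped queues entirely in user space, touching neither the guest kernel nor the VMM.

```c
/* Minimal sketch of control path vs. data path with libibverbs.
 * Assumes an InfiniBand HCA is present; error handling is abbreviated. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0]) { fprintf(stderr, "no IB device\n"); return 1; }

    /* Control path: privileged setup. Under Xen's split device driver model,
     * these calls are forwarded to the isolated driver domain (IDD). */
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) { fprintf(stderr, "cannot open device\n"); return 1; }
    struct ibv_pd *pd = ibv_alloc_pd(ctx);
    struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0);

    /* Data path: polling the CQ reads a memory-mapped completion queue in
     * user space -- no guest-OS or VMM involvement (VMM-bypass). The same
     * holds for ibv_post_send()/ibv_post_recv() on an established QP. */
    struct ibv_wc wc;
    int n = ibv_poll_cq(cq, 1, &wc);   /* returns 0 here: nothing was posted */
    printf("polled %d completions\n", n);

    ibv_destroy_cq(cq);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```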

  5. Introduction: InfiniBand Overview • InfiniBand is a popular high-speed interconnect • OS-bypass/RDMA • Latency: ~1 us • Bandwidth: ~3300 MB/s • ~41.4% of Top500 systems now use InfiniBand as the primary interconnect (chart: Top500 systems by interconnect family, June 2010; source: http://www.top500.org)
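As a reminder of what RDMA's one-sided semantics look like at the API level, here is a hedged C sketch (structures only, not from this talk); the local key, remote address, and remote key are placeholders that a real application would obtain from ibv_reg_mr() and an out-of-band exchange with the peer.

```c
/* Sketch of a one-sided RDMA-write work request (structures only). */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

int main(void)
{
    char buf[4096];                      /* local payload (would be registered with ibv_reg_mr) */

    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = sizeof(buf),
        .lkey   = 0x1234,                /* placeholder: local key from ibv_reg_mr() */
    };

    struct ibv_send_wr wr;
    memset(&wr, 0, sizeof(wr));
    wr.wr_id      = 1;
    wr.sg_list    = &sge;
    wr.num_sge    = 1;
    wr.opcode     = IBV_WR_RDMA_WRITE;   /* one-sided: the remote CPU is not involved */
    wr.send_flags = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = 0xdeadbeef; /* placeholder: peer's registered buffer address */
    wr.wr.rdma.rkey        = 0x5678;     /* placeholder: peer's remote key */

    /* On a connected QP this request would be handed to the HCA directly from
     * user space: ibv_post_send(qp, &wr, &bad_wr) -- the OS-bypass data path. */
    printf("prepared RDMA write of %u bytes\n", sge.length);
    return 0;
}
```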

  6. Introduction: InfiniBand Scalability Problem • Reliable Connection (RC): each connected process pair needs a Queue Pair (QP), and each QP consists of a Send Queue (SQ) and a Receive Queue (RQ) • QPs consume memory • Shared Receive Queue (SRQ) lets many connections share one receive queue • eXtensible Reliable Connection (XRC): XRC domain plus SRQ-based addressing • With N nodes and C cores per node, RC needs (N-1)×C connections per process, while XRC needs only (N-1) (figure: per-node SRQs replace per-process RQs)
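To make the two formulas concrete, the small sketch below plugs in illustrative numbers (4096 nodes with 16 cores each, i.e. the 64K-process scenario used later in the memory-usage discussion); the values are assumptions for illustration, not measurements from this talk.

```c
/* Connection-count arithmetic for RC vs. XRC (illustrative values only). */
#include <stdio.h>

int main(void)
{
    int N = 4096;  /* nodes (64K processes / 16 cores per node) */
    int C = 16;    /* cores per node */

    long rc_conns_per_proc  = (long)(N - 1) * C; /* RC:  one QP per remote process */
    long xrc_conns_per_proc = N - 1;             /* XRC: one connection per remote node */

    printf("RC : %ld connections per process\n", rc_conns_per_proc);  /* 65520 */
    printf("XRC: %ld connections per process\n", xrc_conns_per_proc); /* 4095  */
    return 0;
}
```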

  7. Problem Statement • Does a scalability gap exist between native and virtualized environments? • CV: number of cores per VM • With unmodified XRC, each VM is treated as a separate node, so each process needs (N-1)×(C/CV) connections instead of (N-1) • A scalability gap exists!

  8. Presentation Outline • Introduction • Problem Statement • Proposed Design • Evaluation • Conclusions and Future Work

  9. Proposed Design: VM-proof XRC Design • The design goal is to eliminate the scalability gap • Conns/Process: (N-1)×(C/CV) → (N-1); e.g., with C=16 cores per node and CV=1 core per VM, this is a 16x reduction

  10. Proposed Design: Design Challenges • VM-proof sharing of the XRC domain • A single XRC domain must be shared among different VMs within a physical node • VM-proof connection management • With a single XRC connection, P1 must be able to send data to all processes on another physical node (P5~P8), no matter which VMs those processes reside in

  11. Proposed Design: Implementation • VM-proof sharing of the XRCD • The XRCD is shared by opening the same XRCD file • Guest domains and the IDD have dedicated, non-shared filesystems, so a pseudo XRCD file in the guest is mapped to the real XRCD file in the IDD • VM-proof CM • Traditionally an IP address/hostname identifies a node; the LID of the HCA is used instead
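A hedged sketch of the two mechanisms follows, using today's libibverbs API rather than the legacy OFED 1.4.2 XRC verbs the authors actually modified; the XRCD file path is hypothetical. Opening the same file yields a shared XRC domain, and the HCA port LID (rather than an IP address or hostname, which differ per VM) identifies the physical node.

```c
/* Hedged sketch (not the authors' code): XRC-domain sharing via a common
 * file, and LID-based node identification. Uses the modern libibverbs XRC
 * API; may fail if the HCA or driver lacks XRC support. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <infiniband/verbs.h>

int main(void)
{
    struct ibv_device **devs = ibv_get_device_list(NULL);
    if (!devs || !devs[0]) { fprintf(stderr, "no IB device\n"); return 1; }
    struct ibv_context *ctx = ibv_open_device(devs[0]);
    if (!ctx) { fprintf(stderr, "cannot open device\n"); return 1; }

    /* 1) XRC-domain sharing: every process that opens the same file gets a
     *    handle to the same XRC domain. In the VM-proof design, the pseudo
     *    XRCD file seen inside each guest maps to one real file in the IDD,
     *    so processes in different VMs on one physical node share a single
     *    domain. (The path below is illustrative only.) */
    int fd = open("/tmp/mpi_job42.xrcd", O_CREAT | O_RDWR, 0600);
    struct ibv_xrcd_init_attr xattr = {
        .comp_mask = IBV_XRCD_INIT_ATTR_FD | IBV_XRCD_INIT_ATTR_OFLAGS,
        .fd        = fd,
        .oflags    = O_CREAT,
    };
    struct ibv_xrcd *xrcd = ibv_alloc_xrcd(ctx, &xattr);
    if (!xrcd) perror("ibv_alloc_xrcd");

    /* 2) VM-proof connection management: identify the physical node by the
     *    HCA port LID, so processes in different VMs on the same node all
     *    resolve to the same identifier. */
    struct ibv_port_attr pattr;
    if (ibv_query_port(ctx, 1, &pattr) == 0)
        printf("node id = LID 0x%x\n", pattr.lid);

    if (xrcd) ibv_dealloc_xrcd(xrcd);
    if (fd >= 0) close(fd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}
```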

  12. Proposed Design: Discussions • Safe XRCD sharing • Unauthorized applications from other VMs might otherwise share the XRCD; isolation of XRCD sharing can be guaranteed by the IDD • Isolation between VMs running different MPI jobs • By using different XRCD files, different jobs (or VMs) share different XRCDs and run without interfering with each other • XRC migration • Main challenge: an XRC connection is a process-to-node communication channel • Left as future work

  13. Presentation Outline • Introduction • Problem Statement • Proposed Design • Evaluation • Conclusions and Future Work

  14. Evaluation: Platform • Cluster configuration: • 128-core InfiniBand cluster • Quad-socket, quad-core Barcelona 1.9 GHz nodes • Mellanox DDR ConnectX HCAs, 24-port MT47396 InfiniScale-III switch • Implementation: • Xen 3.4 with Linux 2.6.18.8 • OpenFabrics Enterprise Distribution (OFED) 1.4.2 • MVAPICH-1.1.0

  15. Evaluation: Microbenchmarks • The bandwidth results are nearly the same • Virtualized IB performs ~0.1 us worse when using the blueframe mechanism, because the memory copy of the send data to the HCA's blueframe page involves interactions between the guest domain and the IDD in the virtualized case (figures: IB verbs latency using doorbell, IB verbs latency using blueframe, MPI latency using blueframe)

  16. Evaluation: VM-proof XRC Evaluation • Configurations • Native-XRC: Native environment running XRC-based MVAPICH. • VM-XRC (CV=n): VM-based environment running unmodified XRC-based MVAPICH. The parameter CV denotes the number of cores per VM. • VM-proof XRC: VM-based environment running MVAPICH with our VM-proof XRC design.

  17. Evaluation: Memory Usage • Fully connected cluster with 16 cores/node; the X-axis denotes the process count • ~12 KB of memory per QP • At 64K processes, the VM-XRC (CV=1) configuration consumes ~13 GB/node for QPs • The VM-proof XRC design reduces this to only ~800 MB/node, about 16x less memory (chart annotation: lower is better)
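The two endpoints of the chart can be reproduced with back-of-the-envelope arithmetic; the sketch below uses only the numbers quoted on this slide (the ~12 KB/QP figure is taken as given), so it is illustrative rather than a measurement.

```c
/* Reproducing the per-node QP memory figures on this slide (illustrative). */
#include <stdio.h>

int main(void)
{
    double qp_bytes = 12.0 * 1024;  /* ~12 KB per QP */
    int procs = 64 * 1024;          /* 64K processes */
    int C = 16, CV = 1;             /* cores per node, cores per VM */
    int N = procs / C;              /* 4096 physical nodes */

    /* VM-XRC (CV=1): each VM is treated as a node, so each process keeps
     * (N-1)*(C/CV) QPs, and there are C processes per physical node. */
    double vm_xrc   = (double)(N - 1) * (C / CV) * C * qp_bytes;
    /* VM-proof XRC: (N-1) QPs per process. */
    double vm_proof = (double)(N - 1) * C * qp_bytes;

    printf("VM-XRC (CV=1): ~%.1f GB/node\n", vm_xrc / 1e9);   /* ~12.9 GB -> the "13 GB" point */
    printf("VM-proof XRC : ~%.0f MB/node\n", vm_proof / 1e6); /* ~805 MB  -> the "800 MB" point */
    return 0;
}
```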

  18. Evaluation: MPI Alltoall • A total of 32 processes • VM-proof XRC shows a 10%~25% improvement for messages < 256 B (chart annotation: lower is better)

  19. Evaluation: Application Benchmarks • VM-proof XRC performs nearly the same as Native-XRC, except for BT and EP • Both are better than VM-XRC • Little variation across different CV values; CV=8 is an exception because memory allocation is not guaranteed to be NUMA-aware (chart annotations: lower is better)

  20. Evaluation: Application Benchmarks (Cont'd) • ~15.9x and ~14.7x fewer connections with the VM-proof XRC design

  21. Conclusion and Future Work • The VM-proof XRC design converges two technologies: • VMM-bypass I/O virtualization • eXtensible Reliable Connection (XRC) in modern high-speed interconnects (InfiniBand) • With the VM-proof XRC design, virtualized clusters achieve the same raw performance and scalability as a native, non-virtualized environment • A ~16x scalability improvement is seen on 16-core/node clusters • Future work: • evaluations on different platforms at larger scale • add VM migration support to the VM-proof XRC design • extend the work to the new SR-IOV-enabled ConnectX-2 HCAs

  22. Questions? {leo, zghuo, zhangpanyong, md}@ncic.ac.cn

  23. Backup Slides

  24. OS-bypass of InfiniBand (figure: OpenIB Gen2 stack)
