SoftIB is a software-based InfiniBand (IB) device for Xen virtualized environments. It provides efficient inter-VM communication and lets OFED-based applications be tested without RDMA hardware. It is built on the Virtual Machine RDMA Interface (VMRI).
SoftIB: A Virtual IB Device for Xen
Jianxin Xiong
Overview
• SoftIB is:
  • A software-based IB device for virtualized environments (Xen)
  • Built on top of the Virtual Machine RDMA Interface (VMRI)
  • Implemented as a driver in the OpenFabrics framework
• SoftIB tries to provide:
  • An alternative inter-VM communication mechanism with very good performance
  • An environment for testing OFED-based applications without RDMA hardware
OpenFabrics Software Stack
[Figure: layered stack. User level: uDAPL, libibverbs, user-level HCA driver. Kernel: user-level access, IPoIB, SDP, SRP, IB core (verbs, etc.), HCA driver. Hardware: HCA. A kernel-bypass path links the user level to the HCA.]
SoftIB Architecture
[Figure: the same layered stack with SoftIB components. User level: uDAPL, libibverbs, libvmri. Kernel: user-level access, IPoIB, SDP, SRP, IB core (verbs, etc.), ib_vmri. Below the kernel: VMRI.]
Virtualization with Xen
[Figure: VM0, VM1, and VM2, each with its own OS, running processes proc1 to proc4. The VMs sit on the hypervisor, which sits on the devices, memory, CPU, etc.]
Virtual Network Device in Xen
• Front end:
  • Standard interface
  • Packetizing
• Device channel:
  • Shared ring buffer: requests
  • Event channel: interrupt
• Back end:
  • Process request
  • Copying / buffer flipping
• High overhead
[Figure: VM0, VM1, and VM2 over the hypervisor. One network back end (NetBE) and two network front ends (NetFE) are connected by device channels; data travels from src to dst.]
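To make the device-channel idea concrete, here is a minimal sketch of a shared request ring with a notification flag. It is not Xen's actual ring implementation: `struct ring`, `ring_put`, `ring_get`, `RING_SIZE`, and the `event_pending` flag (a stand-in for an event-channel interrupt) are all hypothetical names chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8u  /* hypothetical depth; must be a power of two */

struct request { uint32_t id; uint32_t len; };

struct ring {
    struct request slots[RING_SIZE];
    uint32_t prod;        /* count of requests the front end has queued */
    uint32_t cons;        /* count of requests the back end has consumed */
    bool event_pending;   /* stand-in for an event-channel notification */
};

void ring_init(struct ring *r) {
    r->prod = 0;
    r->cons = 0;
    r->event_pending = false;
}

/* Front end: queue one request and raise the notification.
 * Returns false when the ring is full. */
bool ring_put(struct ring *r, struct request req) {
    if (r->prod - r->cons == RING_SIZE)
        return false;
    r->slots[r->prod % RING_SIZE] = req;
    r->prod++;
    r->event_pending = true;
    return true;
}

/* Back end: take one request. Returns false (and clears the
 * notification) when the ring is empty. */
bool ring_get(struct ring *r, struct request *out) {
    if (r->prod == r->cons) {
        r->event_pending = false;
        return false;
    }
    *out = r->slots[r->cons % RING_SIZE];
    r->cons++;
    return true;
}
```

Every packet on this path still has to be copied or page-flipped by the back end after it is dequeued, which is the overhead the slide points at.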
Virtual Machine RDMA Interface
• An RDMA-style inter-VM communication mechanism
• Implemented as a service in the hypervisor
• Hypervisor has access to all the memory pages
• Memory registration required
• Accessible to all the VMs
• VMRI is fast because:
  • Zero-buffering data copy
  • Low-overhead, queue-based RDMA protocol
[Figure: A Simplified View of VMRI. buf1 in VM1 and buf2 in VM2, with an SQ and an RQ, connected through VMRI in the hypervisor.]
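The combination of memory registration and a copy with no intermediate buffer can be sketched as follows. This is an illustrative user-space model, not VMRI code: `mr_table`, `mr_reg`, `mr_resolve`, and `direct_copy` are hypothetical names, and a real hypervisor would translate guest pages rather than use plain pointers.

```c
#include <stddef.h>
#include <string.h>

#define MAX_MR 16  /* hypothetical table size */

/* One registered memory region: base address and length. */
struct mr { char *base; size_t len; int in_use; };

struct mr mr_table[MAX_MR];  /* zero-initialized: all slots free */

/* Register a buffer. Returns a key (the table index) or -1 if full. */
int mr_reg(char *base, size_t len) {
    for (int k = 0; k < MAX_MR; k++) {
        if (!mr_table[k].in_use) {
            mr_table[k] = (struct mr){ base, len, 1 };
            return k;
        }
    }
    return -1;
}

/* Map (key, offset, length) to an address, or NULL if the range
 * is not fully inside a registered region. */
char *mr_resolve(int key, size_t off, size_t len) {
    if (key < 0 || key >= MAX_MR || !mr_table[key].in_use)
        return NULL;
    if (off > mr_table[key].len || len > mr_table[key].len - off)
        return NULL;
    return mr_table[key].base + off;
}

/* Copy straight from the source region to the destination region,
 * with no staging buffer in between. Returns 0 on success. */
int direct_copy(int src_key, size_t src_off,
                int dst_key, size_t dst_off, size_t len) {
    char *src = mr_resolve(src_key, src_off, len);
    char *dst = mr_resolve(dst_key, dst_off, len);
    if (!src || !dst)
        return -1;
    memmove(dst, src, len);
    return 0;
}
```

Registration is what lets the copy routine reject any range that a VM has not explicitly exposed, so the bounds check happens once per request rather than per packet.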
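VMRI Components
[Figure: In the VM guest: code, data buffer, SQ buffer, RQ buffer, CQ buffer, EQ buffer. In the hypervisor: request processing and a device with QPC, MRC, PDC, EQC, CQC, SQ, RQ, CQ, and a multicast group. A hypercall and a VIRQ connect the guest and the hypervisor.]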
Data Moving: Send/Recv
[Figure: VM1 (dev1, qp1, mr1) and VM2 (dev2, qp2, mr2) over the hypervisor. Steps numbered 1 to 6 cover WQEs on the SQ and RQ, a direct copy from sendbuf to recvbuf, CQEs and EQEs on both sides, and optional IRQs to both VMs.]
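The figure does not spell out what each numbered step does, so the following is only one plausible reading based on common send/receive semantics: the receiver posts a receive WQE, the sender posts a send, the data is copied directly into the posted buffer, and a CQE is written on each side. All names (`struct qp`, `post_recv`, `post_send`, `push_cqe`, `QDEPTH`) are hypothetical, and EQEs and interrupts are left out.

```c
#include <stddef.h>
#include <string.h>

#define QDEPTH 4  /* hypothetical queue depth */

struct wqe { char *buf; size_t len; };   /* work queue entry */
struct cqe { int ok; size_t bytes; };    /* completion queue entry */

struct qp {
    struct wqe rq[QDEPTH];  /* posted receives, used as a FIFO */
    int rq_head, rq_count;
    struct cqe cq[QDEPTH];  /* completions, appended in order */
    int cq_count;
};

/* Receiver side: post a buffer that a later send may fill. */
int post_recv(struct qp *q, char *buf, size_t len) {
    if (q->rq_count == QDEPTH)
        return -1;
    q->rq[(q->rq_head + q->rq_count) % QDEPTH] = (struct wqe){ buf, len };
    q->rq_count++;
    return 0;
}

/* Append a completion; silently dropped if the CQ is full. */
void push_cqe(struct qp *q, int ok, size_t bytes) {
    if (q->cq_count < QDEPTH)
        q->cq[q->cq_count++] = (struct cqe){ ok, bytes };
}

/* Sender side: consume the peer's oldest receive WQE, copy the
 * payload directly into it, and complete both queue pairs.
 * Simplification: fails at once if no receive has been posted. */
int post_send(struct qp *src, struct qp *dst, const char *buf, size_t len) {
    if (dst->rq_count == 0)
        return -1;
    struct wqe r = dst->rq[dst->rq_head];
    dst->rq_head = (dst->rq_head + 1) % QDEPTH;
    dst->rq_count--;
    int ok = len <= r.len;
    if (ok)
        memcpy(r.buf, buf, len);
    push_cqe(src, ok, ok ? len : 0);
    push_cqe(dst, ok, ok ? len : 0);
    return ok ? 0 : -1;
}
```

Because the copy lands in the buffer named by the receive WQE, no bounce buffer sits between the two VMs.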
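Data Moving: RDMA Write
[Figure: VM1 (dev1, qp1, mr1) and VM2 (dev2, qp2, mr2) over the hypervisor. Steps numbered 1 to 6 cover a WQE on the SQ, a direct copy from localbuf to the remote buffer, a CQE and an EQE, and an optional IRQ. Only one side shows SQ, CQ, and EQ activity.]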
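Data Moving: RDMA Read
[Figure: VM1 (dev1, qp1, mr1) and VM2 (dev2, qp2, mr2) over the hypervisor. Steps numbered 1 to 6 cover a WQE on the SQ, a direct copy from the remote buffer to localbuf, a CQE and an EQE, and an optional IRQ. Only one side shows SQ, CQ, and EQ activity.]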
Data Moving: UD Send/Recv
[Figure: VM1 (dev1, qp1, mr1) and VM2 (dev2, qp2, mr2) over the hypervisor. Steps numbered 1 to 6 cover WQEs on the SQ and RQ, a direct copy from sendbuf to recvbuf, CQEs and EQEs on both sides, and optional IRQs to both VMs.]
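Data Moving: UD Multicast
[Figure: VM1 (dev1, qp1, mr1) and VM2 (dev2, qp2, mr2) over the hypervisor, with a multicast group between sender and receiver. Steps numbered 1 to 6 cover WQEs on the SQ and RQ, a direct copy from sendbuf to recvbuf, CQEs and EQEs on both sides, and optional IRQs to both VMs.]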
VMRI API
• Through the Xen “hypercall” interface:
  HYPERVISOR_vmri_op( int cmd, int dev_id, void * args );
• Groups of commands:
  • DEV (create/destroy/query)
  • PD (create/destroy)
  • CQ (create/destroy/arm/disarm)
  • EQ (create/destroy/bind)
  • QP (create/destroy/modify/query)
  • MR (reg/dereg)
  • MCG (create/destroy/attach/detach)
  • Misc (schedule/debug/status)
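A single entry point that takes a command code, a device id, and an opaque argument pointer is typically implemented as a dispatch on the command. The sketch below mocks that pattern in user space. Only the three-argument shape comes from the slide; the command encoding (`VMRI_CMD`, group in the high byte), the enum values, `struct cq_create_args`, and `mock_vmri_op` are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical encoding: command group in the high byte,
 * operation in the low byte. The real numbering is not given. */
enum vmri_group { VMRI_DEV = 1, VMRI_PD, VMRI_CQ, VMRI_EQ,
                  VMRI_QP, VMRI_MR, VMRI_MCG, VMRI_MISC };
enum vmri_op { VMRI_CREATE = 1, VMRI_DESTROY, VMRI_QUERY };

#define VMRI_CMD(group, op)  (((group) << 8) | (op))
#define VMRI_CMD_GROUP(cmd)  ((cmd) >> 8)
#define VMRI_CMD_OP(cmd)     ((cmd) & 0xff)

/* Hypothetical argument block: `entries` is input, `cq_id` is output. */
struct cq_create_args { int entries; int cq_id; };

int next_cq_id = 1;  /* mock allocator state */

/* Mock of a hypercall-style entry point with the same three-argument
 * shape as the slide's prototype. Returns 0 on success, -1 on error. */
int mock_vmri_op(int cmd, int dev_id, void *args) {
    if (dev_id < 0 || args == NULL)
        return -1;
    switch (VMRI_CMD_GROUP(cmd)) {
    case VMRI_CQ:
        if (VMRI_CMD_OP(cmd) == VMRI_CREATE) {
            struct cq_create_args *a = args;
            if (a->entries <= 0)
                return -1;
            a->cq_id = next_cq_id++;
            return 0;
        }
        return -1;   /* other CQ operations are not mocked */
    default:
        return -1;   /* other command groups are not mocked */
    }
}
```

Passing results back through the argument block keeps the return value free for an error code, which suits a single multiplexed entry point.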
SoftIB Kernel Driver
[Figure: In the guest kernel, ib_core reaches ib_vmri through the function pointers in ib_dev. ib_vmri holds QP, CQ, MR, EQ, and PD tables plus internal functions, and sits on top of VMRI in the VMM.]
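The function-pointer arrangement in the figure can be illustrated with a small model: a generic layer that knows only a table of operations, and a driver that fills the table in and keeps its own object table. This is not the kernel's `ib_device` definition; `struct demo_dev`, `struct demo_dev_ops`, the `softdev_*` functions, and `core_create_cq` are hypothetical.

```c
#include <stddef.h>

#define CQ_SLOTS 4  /* hypothetical table size */

struct demo_dev;

/* Operations a driver must supply to the generic layer. */
struct demo_dev_ops {
    int (*create_cq)(struct demo_dev *dev, int entries);
    int (*destroy_cq)(struct demo_dev *dev, int cq_id);
};

struct demo_dev {
    const struct demo_dev_ops *ops;
    int cq_table[CQ_SLOTS];  /* 0 = free slot, otherwise the CQ depth */
};

/* Driver side: allocate the first free slot in the CQ table. */
int softdev_create_cq(struct demo_dev *dev, int entries) {
    if (entries <= 0)
        return -1;
    for (int i = 0; i < CQ_SLOTS; i++) {
        if (dev->cq_table[i] == 0) {
            dev->cq_table[i] = entries;
            return i;
        }
    }
    return -1;
}

/* Driver side: release a slot; fails if the id is not in use. */
int softdev_destroy_cq(struct demo_dev *dev, int cq_id) {
    if (cq_id < 0 || cq_id >= CQ_SLOTS || dev->cq_table[cq_id] == 0)
        return -1;
    dev->cq_table[cq_id] = 0;
    return 0;
}

const struct demo_dev_ops softdev_ops = {
    softdev_create_cq,
    softdev_destroy_cq,
};

/* Generic layer: calls through the table without knowing the driver. */
int core_create_cq(struct demo_dev *dev, int entries) {
    if (dev->ops == NULL || dev->ops->create_cq == NULL)
        return -1;
    return dev->ops->create_cq(dev, entries);
}
```

With this split, upper-layer code written against the generic layer runs unchanged whether the table is filled in by a hardware driver or by a software device.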
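SoftIB User Mode Driver
[Figure: In user space, libibverbs (ibv_xxx, ibv_cmd_xxx) reaches libvmri through function pointers; libvmri also has internal functions. Calls cross into the kernel via read, write, and ioctl to IB user-level access and the IB core. A kernel-bypass path is also shown.]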
SoftIB Status
• Based on Xen-unstable and OFED-1.2-rc2
• Supported:
  • RC and UD transport
  • Unicast and multicast
  • Basic SA functionality
  • IPoIB, SDP, uDAPL
• Not supported:
  • Shared receive queue
  • Advanced SA functionality
  • Subnet management
• Ongoing work:
  • Testing applications such as NFS/RDMA
  • Testing more ULPs such as SRP, iSER
Performance Evaluation
• Hardware:
  • Xeon 5160 DP
  • 4 GB memory (only 2 GB is used)
  • Motherboard: S5000PAL
• Software:
  • RedHat EL4U4, CentOS 4.4
  • Xen-unstable, cs14774
  • NetPIPE, Intel MPI 3.0
• Tested devices:
  • SoftIB (rdma, ipoib, sdp), Ethernet (vnif, ioemu), loopback, shm
Performance Summary
• Substantial performance improvement has been observed over the standard virtual Ethernet device: