RDMA vs TCP Experiment
Goal • Environment • Test tool - iperf • Test Suites • Conclusion
Goal • Test maximum and average bandwidth usage in a 40 Gbps (InfiniBand) and a 10 Gbps (iWARP) network environment • Compare CPU usage between the TCP and RDMA data transfer modes • Compare CPU usage between the RDMA READ and RDMA WRITE modes
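The CPU comparison assumes some way of sampling system load around each run. A minimal sketch of one such measurement, assuming the load is derived from /proc/stat deltas; the helper names here are hypothetical and not part of the original test setup:

/* Hypothetical helper: sample aggregate CPU time from /proc/stat before and
 * after a transfer so TCP, RDMA READ and RDMA WRITE runs can be compared. */
#include <stdio.h>

struct cpu_sample { unsigned long long idle, total; };

static int read_cpu_sample(struct cpu_sample *s)
{
    unsigned long long user, nice, sys, idle, iowait, irq, softirq;
    FILE *fp = fopen("/proc/stat", "r");
    if (!fp)
        return -1;
    if (fscanf(fp, "cpu %llu %llu %llu %llu %llu %llu %llu",
               &user, &nice, &sys, &idle, &iowait, &irq, &softirq) != 7) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    s->idle  = idle + iowait;
    s->total = user + nice + sys + idle + iowait + irq + softirq;
    return 0;
}

/* CPU utilisation (0..1) between two samples taken around a test run. */
static double cpu_usage(const struct cpu_sample *a, const struct cpu_sample *b)
{
    unsigned long long dt = b->total - a->total;
    return dt ? 1.0 - (double)(b->idle - a->idle) / (double)dt : 0.0;
}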
Environment • 40 Gbps InfiniBand and 10 Gbps iWARP links • Netqos03 (client) • Netqos04 (server)
Tool - iperf • Migrated iperf 2.0.5 to the RDMA environment with OFED (librdmacm and libibverbs) • 2000+ source lines of code added (from 8382 to 10562) • iperf usage extended: • -H: RDMA transfer mode instead of TCP/UDP • -G: pr (passive read) or pw (passive write) - data is read from the server, or the server writes into the client • -O: output data file, for both the TCP server and the RDMA server • Only one stream is transferred
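A minimal sketch of how a client-side connection can be set up through librdmacm before any verbs traffic flows, under the assumption that the migrated iperf uses the rdma_cm connection model; the server address, port and queue depths below are placeholders, not values from the actual patch:

/* Client-side rdma_cm connection sketch (error handling trimmed). */
#include <rdma/rdma_cma.h>
#include <stdio.h>
#include <string.h>

#define SERVER_IP "192.168.1.4"   /* placeholder for Netqos04 */
#define TEST_PORT "5001"          /* placeholder iperf data port */

int main(void)
{
    struct rdma_addrinfo hints, *res;
    struct ibv_qp_init_attr attr;
    struct rdma_cm_id *id;

    memset(&hints, 0, sizeof(hints));
    hints.ai_port_space = RDMA_PS_TCP;        /* maps to a reliable connection */
    if (rdma_getaddrinfo(SERVER_IP, TEST_PORT, &hints, &res)) {
        perror("rdma_getaddrinfo");
        return 1;
    }

    memset(&attr, 0, sizeof(attr));
    attr.cap.max_send_wr = attr.cap.max_recv_wr = 4;
    attr.cap.max_send_sge = attr.cap.max_recv_sge = 1;

    /* Creates the cm_id, resolves address and route, and builds the QP. */
    if (rdma_create_ep(&id, res, NULL, &attr)) {
        perror("rdma_create_ep");
        return 1;
    }
    if (rdma_connect(id, NULL)) {
        perror("rdma_connect");
        return 1;
    }
    printf("connected to %s:%s over RDMA CM\n", SERVER_IP, TEST_PORT);

    rdma_disconnect(id);
    rdma_destroy_ep(id);
    rdma_freeaddrinfo(res);
    return 0;
}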
Test Suites • test suite 1: memory -> memory • test suite 2: file -> memory -> memory • test case 2.1: file (regular file) -> memory -> memory • test case 2.2: file (/dev/zero) -> memory -> memory • test case 2.3: file (Lustre) -> memory -> memory • test suite 3: memory -> memory -> file • test case 3.1: memory -> memory -> file (regular file) • test case 3.2: memory -> memory -> file (/dev/null) • test case 3.3: memory -> memory -> file (Lustre) • test suite 4: file -> memory -> memory -> file • test case 4.1: file (regular file) -> memory -> memory -> file (regular file) • test case 4.2: file (/dev/zero) -> memory -> memory -> file (/dev/null) • test case 4.3: file (Lustre) -> memory -> memory -> file (Lustre)
File choice • File operations use the standard I/O library (fread, fwrite), so they are cached by the OS • Input from /dev/zero tests the maximum application data transfer rate including the file read operation, with the disk removed as the bottleneck • Output to /dev/null tests the maximum application data transfer rate including the file write operation, with the disk removed as the bottleneck
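A sketch of the client-side staging loop implied by the file -> memory -> memory cases, assuming standard I/O fills a fixed-size transfer buffer that is then handed to the transport; send_block() is a hypothetical stub standing in for the TCP or RDMA send path:

#include <stdio.h>
#include <stdlib.h>

#define BLOCK_SIZE (10 * 1024 * 1024)   /* matches the 10 MB RDMA block below */

static unsigned long long total_sent;

/* Stub standing in for the transport-specific send (TCP write or RDMA post). */
static int send_block(const void *buf, size_t len)
{
    (void)buf;
    total_sent += len;
    return 0;
}

/* Read up to max_blocks blocks through stdio and hand each to the transport.
 * fread() goes through the OS page cache, which is why the caches are
 * dropped between runs (see "Memory Cache Cleanup" at the end). */
static int stream_file(const char *path, int max_blocks)
{
    FILE *fp = fopen(path, "rb");       /* regular file, /dev/zero or a Lustre file */
    char *buf = malloc(BLOCK_SIZE);
    size_t n;
    int i;

    if (!fp || !buf) {
        free(buf);
        if (fp) fclose(fp);
        return -1;
    }
    for (i = 0; i < max_blocks; i++) {
        n = fread(buf, 1, BLOCK_SIZE, fp);
        if (n == 0 || send_block(buf, n) < 0)
            break;
    }
    free(buf);
    fclose(fp);
    return 0;
}

int main(int argc, char **argv)
{
    if (stream_file(argc > 1 ? argv[1] : "/dev/zero", 100) < 0)
        return 1;
    printf("staged %llu bytes\n", total_sent);
    return 0;
}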
Buffer choice • The RDMA operation block size is 10 MB, transferred with one RDMA READ/WRITE at a time • A previous experiment showed that, in this environment, block sizes above 5 MB have little effect on transfer speed • The TCP read/write buffer size is the default • TCP window size: 85.3 KByte (default)
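A sketch of how one 10 MB block could be pinned and pushed with a single RDMA WRITE using libibverbs, matching the block-size choice above; the queue pair, protection domain and the exchange of the remote address/rkey are assumed to exist already (e.g. from an rdma_cm setup like the one sketched earlier):

#include <infiniband/verbs.h>
#include <stdint.h>
#include <string.h>

#define RDMA_BLOCK_SIZE (10 * 1024 * 1024)   /* 10 MB block, as chosen above */

static int write_block(struct ibv_qp *qp, struct ibv_pd *pd, void *buf,
                       uint64_t remote_addr, uint32_t rkey)
{
    struct ibv_mr *mr;
    struct ibv_sge sge;
    struct ibv_send_wr wr, *bad_wr = NULL;

    /* Pin the 10 MB block so the HCA can DMA directly from it. In a real
     * tool the registration would be done once at startup and reused. */
    mr = ibv_reg_mr(pd, buf, RDMA_BLOCK_SIZE,
                    IBV_ACCESS_LOCAL_WRITE | IBV_ACCESS_REMOTE_READ |
                    IBV_ACCESS_REMOTE_WRITE);
    if (!mr)
        return -1;

    memset(&sge, 0, sizeof(sge));
    sge.addr   = (uintptr_t)buf;
    sge.length = RDMA_BLOCK_SIZE;
    sge.lkey   = mr->lkey;

    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;   /* one WRITE per 10 MB block */
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    if (ibv_post_send(qp, &wr, &bad_wr)) {
        ibv_dereg_mr(mr);
        return -1;
    }
    /* A completion would normally be reaped from the CQ before the buffer
     * is reused or the MR deregistered; that polling loop is omitted here. */
    return 0;
}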
Test case 2.1 (fread): file (regular file) -> memory -> memory, CPU usage (chart)
Test case 2.1 (fread): file (regular file) -> memory -> memory, bandwidth (chart)
Test case 2.2 (five minutes): file (/dev/zero) -> memory -> memory, CPU usage (chart)
Test case 2.2 (five minutes): file (/dev/zero) -> memory -> memory, bandwidth (chart)
Test case 3.1 (a 200 GB file is generated): memory -> memory -> file (regular file), CPU usage (chart)
Test case 3.1 (a 200 GB file is generated): memory -> memory -> file (regular file), bandwidth (chart)
Test case 3.2: memory -> memory -> file (/dev/null), bandwidth (chart)
Test case 4.1: file (regular file) -> memory -> memory -> file (regular file), bandwidth (chart)
Test case 4.2: file (/dev/zero) -> memory -> memory -> file (/dev/null), CPU usage (chart)
Test case 4.2: file (/dev/zero) -> memory -> memory -> file (/dev/null), bandwidth (chart)
Conclusion • For a single data transfer stream, RDMA transport is about twice as fast as TCP, while incurring only about 10% of the CPU load seen under TCP, when no disk operations are involved • FTP includes two components: networking and file operations. Compared with the RDMA operation, file operations (limited by disk performance) take most of the CPU usage, so a well-designed file buffer mode is critical
Future work • Set up the Lustre environment and configure Lustre with RDMA support • Start the FTP migration • Source control • Bug database • Documentation • etc. (refer to The Joel Test)
Memory Cache Cleanup (run between tests) • # sync • # echo 3 > /proc/sys/vm/drop_caches
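The same cleanup can also be done from inside a test harness; a sketch assuming root privileges, with a hypothetical drop_caches() helper that mirrors the two commands above:

#include <stdio.h>
#include <unistd.h>

/* Flush dirty pages, then ask the kernel to drop the page cache plus
 * dentries and inodes (the same effect as the commands above). */
static int drop_caches(void)
{
    FILE *fp;

    sync();                                   /* same as the "sync" command */
    fp = fopen("/proc/sys/vm/drop_caches", "w");
    if (!fp)
        return -1;                            /* typically needs root */
    fputs("3\n", fp);                         /* 3 = page cache + dentries/inodes */
    fclose(fp);
    return 0;
}

int main(void)
{
    return drop_caches() ? 1 : 0;
}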