Mapping of Scalable RDMA Protocols to ASIC/FPGA Platforms
Yosef Gavriel Tirat-Gefen, PhD, Senior Member IEEE
Chief Scientist, Castel Systems Inc. & Dept. of Physics and Astronomy, George Mason University, Fairfax, VA
yosefgavriel@computer.org
Presentation Overview
• Motivation
• TCP off-loading
• Zero-copying
• RDMA protocol
• RDMA protocol stack
• Structure of an RDMA card
• Results
• Conclusion
Motivation
[Diagram: supercomputers / server farms, terabyte storage, and workstations interconnected over a WAN; enabling high-bandwidth WAN applications]
Applications
• Distributed command and control
• Signal processing (e.g. RADAR)
• Real-time sharing of intelligence data
• Distributed large-scale computation/simulation of aerospace problems
• Extension of storage area networks over a wide area network (WAN)
• Enabling technology for modern supercomputing installations
Traditional TCP/IP Networking
[Diagram: two hosts, each with a stack of Application/O.S., TCP, Layer 3 (IP), Layer 2 (MAC), and Layer 1 (PHY), communicating through a router that operates at Layers 1-3]
Standard Data Flow on TCP/IP
[Diagram: data moves from Application A's memory space into the TCP buffer/stack memory space, down through L3/L2/L1, across the WAN/LAN, and back up through the receiver's TCP buffer/stack into Application B's memory space]
Standard Data Flow on TCP/IP
• Traditional TCP/IP copies data from the application into a TCP memory buffer (and from the TCP buffer back to the application on receive; see the sketch below)
• The host CPU loses cycles to this buffer copying
• The CPU becomes overwhelmed at rates above 2.5 Gbps
• TCP/IP off-loading helps, but it does not solve the problem on the receiver side
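To make the buffer copy concrete, here is a minimal sketch in C (an illustration, not material from the original slides) of a conventional sockets receive loop: the NIC has already DMAed the packets into kernel memory, and each recv() call then copies the payload a second time into the application buffer, which is the per-byte CPU cost described above.

```c
/* Minimal sketch of a conventional sockets receive path (illustrative
 * only).  The payload already sits in the kernel's TCP buffers; recv()
 * copies it again into the application buffer in user space. */
#include <sys/types.h>
#include <sys/socket.h>

ssize_t drain_socket(int sock, char *app_buf, size_t app_len)
{
    size_t total = 0;
    while (total < app_len) {
        /* Kernel-to-user copy: this is the work the host CPU pays for
         * on every received byte. */
        ssize_t n = recv(sock, app_buf + total, app_len - total, 0);
        if (n <= 0)
            return n;           /* error (-1) or connection closed (0) */
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```

A TOE moves the TCP protocol processing off the host, but as the next slides note, this final copy into the application buffer remains.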
TCP/IP Off-load Processing
[Diagram: the TCP layer is removed from the host Application/O.S. stack and mapped to hardware, a TCP/IP off-load engine (TOE) sitting above Layer 3 (IP), Layer 2 (MAC), and Layer 1 (PHY)]
Zero-Copying and TCP Off-load Processing
[Diagram: the host CPU, its cache, and the receive buffer in host main memory, connected to a TOE/NIC card whose TCP off-load processor and network buffer face the WAN/LAN]
Zero-Copying and TCP Off-load Processing
• Zero-copying is still not achieved: the receive buffer must still be copied back into the application's memory space
• TCP/IP off-loading is not scalable
• RDMA protocols provide a solution (see the sketch below)
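As an illustration of the RDMA model (a sketch under assumptions, not the authors' implementation), the fragment below uses the libibverbs API: the application buffer is registered, i.e. pinned, so that the NIC can DMA directly from it, and an RDMA WRITE places the data straight into the remote application's registered memory without involving the remote CPU. Queue-pair setup and the exchange of the peer's remote_addr/rkey are assumed to have been done out of band, e.g. over a TCP control connection.

```c
/* Sketch of a zero-copy transfer with libibverbs (illustrative only;
 * connection setup and rkey exchange are assumed to have happened
 * elsewhere). */
#include <stdint.h>
#include <infiniband/verbs.h>

int rdma_write_example(struct ibv_pd *pd, struct ibv_qp *qp,
                       void *local_buf, size_t len,
                       uint64_t remote_addr, uint32_t rkey)
{
    /* Register (pin) the application buffer so the NIC can DMA from it
     * directly; this is what removes the intermediate copy. */
    struct ibv_mr *mr = ibv_reg_mr(pd, local_buf, len,
                                   IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)local_buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };

    /* RDMA WRITE: the local NIC sends the data and the remote NIC
     * places it in the peer's registered buffer; the remote CPU and
     * its TCP buffers are bypassed entirely. */
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_RDMA_WRITE,
        .send_flags = IBV_SEND_SIGNALED,
    };
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    struct ibv_send_wr *bad_wr = NULL;
    return ibv_post_send(qp, &wr, &bad_wr);
    /* The MR must stay registered until the work request completes. */
}
```

In practice the registration is done once at startup, since pinning pages is expensive; it is shown inline here only to make the zero-copy requirement visible.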
RDMA Data Flow for WAN Applications
[Diagram: host CPU A and host CPU B, each with an application memory space in host memory, exchange data directly between application buffers through RDMA NIC cards across the WAN]
Scalable WAN-RDMA for Bandwidths Above 10 Gbps
[Diagram: a WAN RDMA NIC card with an RDMA engine, Tx and Rx buffers, a DMA channel to the host, and MAC/PHY stages driving 10 Gbps links into a WAN carrying more than 10 Gbps]
The RDMA Protocol Layers and Our Prototype
• Running on the host CPU: ULP (e.g. iSCSI, NFS)
• FPGA implementation: RDMA, DDP, MPA (see the framing sketch below), SCTP/TCP, Layer 3 (e.g. IP)
• FPGA and off-the-shelf MAC/PHY chips: Layer 2 (MAC), Layer 1 (PHY)
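To give a feel for the MPA layer that sits between DDP and TCP (a rough sketch based on the MPA framing rules in RFC 5044, not the authors' firmware; stream markers are omitted and crc32c() is a hypothetical helper), the fragment below frames one DDP segment into an FPDU: a 2-byte ULPDU length, the payload, padding to a 32-bit boundary, and a CRC32c trailer.

```c
/* Rough sketch of MPA framing (illustrative; markers omitted and the
 * crc32c() helper is assumed to exist elsewhere). */
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>                 /* htons, htonl */

uint32_t crc32c(const uint8_t *buf, size_t len);   /* hypothetical helper */

/* Frames one DDP segment (ULPDU) into an MPA FPDU in 'out';
 * returns the total FPDU length. */
size_t mpa_frame_fpdu(const uint8_t *ulpdu, uint16_t ulpdu_len, uint8_t *out)
{
    size_t off = 0;

    uint16_t len_be = htons(ulpdu_len);            /* ULPDU_Length field */
    memcpy(out + off, &len_be, sizeof(len_be));
    off += sizeof(len_be);

    memcpy(out + off, ulpdu, ulpdu_len);           /* the DDP segment */
    off += ulpdu_len;

    while (off % 4 != 0)                           /* pad to a 32-bit boundary */
        out[off++] = 0;

    uint32_t crc_be = htonl(crc32c(out, off));     /* CRC32c over the FPDU */
    memcpy(out + off, &crc_be, sizeof(crc_be));
    off += sizeof(crc_be);

    return off;
}
```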
Overall Hardware/Firmware Organization of the WAN RDMA Card
[Block diagram of the IP/firmware module: a PCI Express/HyperTransport host interface; an RDMA protocol engine with Rx and Tx memory controllers and memory banks; an SCTP protocol engine; a Layer 3 (IP) processor; a data-stream split/join unit; and four SAR units, each feeding a 10GE/OC-192 framer and PHY]
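The slides do not describe the split/join unit in detail, so the following is purely a software analogy (an assumption for illustration, not the card's firmware) of what a data-stream split unit does: stripe an outgoing payload across the four framer/PHY lanes in fixed-size segments so that the aggregate rate can exceed any single 10 Gbps link.

```c
/* Software analogy of a data-stream split unit (purely illustrative;
 * the striping granularity and queue sizes are made-up numbers, and
 * bounds checks are omitted for brevity). */
#include <stdint.h>
#include <string.h>

#define NUM_LANES 4                 /* four 10GE/OC-192 framer+PHY lanes */
#define SEG_BYTES 256               /* hypothetical striping granularity */

struct lane_queue {
    uint8_t buf[64 * 1024];
    size_t  used;
};

/* Stripes 'len' bytes of 'data' round-robin across the lane queues;
 * the join unit on the receive side reassembles in the same order. */
void split_stream(struct lane_queue q[NUM_LANES],
                  const uint8_t *data, size_t len)
{
    size_t off = 0;
    unsigned lane = 0;
    while (off < len) {
        size_t chunk = (len - off < SEG_BYTES) ? len - off : SEG_BYTES;
        memcpy(q[lane].buf + q[lane].used, data + off, chunk);
        q[lane].used += chunk;
        off  += chunk;
        lane  = (lane + 1) % NUM_LANES;
    }
}
```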
Present Results
• Currently using Virtex-II / Virtex-II Pro (Xilinx) as target devices for our cores
• Data indicate that most of the key cores will fit in a single FPGA device (Virtex-II)
• The aggregate of all cores spans several FPGAs
• Inter-device communication is an issue; the PCB design requires care
• We are currently trying to fit most of the cores into one FPGA
• Most of the cores will be made available free of charge to researchers in non-profit or government organizations
Conclusion
• The advent of HyperTransport/PCI Express and the VITA (embedded computing) standards will enable local I/O bandwidths above 10 Gbps
• Extending the RDMA protocol enables large bandwidths over wide area networks
• The proposed cores will meet the natural growth of bandwidth requirements in commercial, defense, and aerospace applications