Remote Direct Memory Access (RDMA) over IP
PFLDNet 2003, Geneva
Stephen Bailey, Sandburst Corp., steph@sandburst.com
Allyn Romanow, Cisco Systems, allyn@cisco.com
RDDP Is Coming Soon
“ST [RDMA] Is The Wave Of The Future” – S Bailey & C Good, CERN 1999
• Need:
• standard protocols
• host software
• accelerated NICs (RNICs)
• faster host buses (for > 1 Gb/s)
• Vendors are finally serious: Broadcom, Intel, Agilent, Adaptec, Emulex, Microsoft, IBM, HP (Compaq, Tandem, DEC), Sun, EMC, NetApp, Oracle, Cisco & many, many others
Overview
• Motivation
• Architecture
• Open Issues
CFP: SigComm Workshop
• NICELI – SigComm 2003 Workshop on Network-I/O Convergence: Experience, Lessons, Implications
• http://www.acm.org/sigcomm/sigcomm2003/workshop/niceli/index.html
High Speed Data Transfer
• Bottlenecks:
• Protocol performance
• Router performance
• End-station performance – host processing, CPU utilization
• The I/O bottleneck:
• Interrupts
• TCP checksum
• Copies
What is RDMA?
• Avoids copying by allowing the network adapter, under application control, to steer data directly into application buffers
• Bulk data transfer, or kernel bypass for small messages
• Used in grid, cluster, and supercomputing environments, and in data centers
• Historically built on special-purpose fabrics – Fibre Channel, VIA, InfiniBand, Quadrics, ServerNet
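The "steering" idea above can be sketched in a few lines. This is a conceptual model only, not a real RDMA API: the receiver advertises a registered buffer under a steering tag, and each arriving segment carries a (tag, offset) pair, so payload lands directly in the application buffer with no intermediate packet-buffer copy. All names here (`TaggedBufferTable`, `ddp_place`) are invented for illustration.

```python
class TaggedBufferTable:
    """Receiver-side model: maps steering tags to registered app buffers."""
    def __init__(self):
        self._buffers = {}

    def register(self, stag, length):
        # The application registers a buffer; the adapter may now write into it.
        self._buffers[stag] = bytearray(length)
        return self._buffers[stag]

    def ddp_place(self, stag, offset, payload):
        # Direct placement: write straight into the application buffer
        # at the offset named by the segment header.
        buf = self._buffers[stag]
        buf[offset:offset + len(payload)] = payload

table = TaggedBufferTable()
app_buf = table.register(stag=0x10, length=16)

# Segments may arrive out of order; each still lands in the right place.
table.ddp_place(0x10, 8, b"r IP ok!")
table.ddp_place(0x10, 0, b"RDMA ove")
print(bytes(app_buf))  # the message is reassembled in the app buffer itself
```

Because placement information travels with each segment, no receive-side reassembly buffer is needed, which is the property that makes hardware offload attractive.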
Traditional Data Center
[diagram: each server machine runs an application and a database; the servers connect to The World over Ethernet/IP, to a Storage Network over Fibre Channel, and to each other over an Intermachine Network (VIA, IB, proprietary)]
Why RDMA over IP? The Business Case
• TCP/IP is not used for high-bandwidth interconnection – host processing costs are too high
• High-bandwidth transfer will become more prevalent – 10 GE, data centers
• Special-purpose interfaces are expensive
• IP NICs are cheap and ship in volume
The Technical Problem – The I/O Bottleneck
• With TCP/IP, host processing can’t keep up with link bandwidth on receive
• Per-byte costs dominate – Clark (1989)
• Well researched by the distributed systems community in the mid-1990s, and confirmed by industry experience
• Memory bandwidth doesn’t scale; the processor–memory performance gap – Hennessy (1997), D. Patterson & T. Anderson (1997)
• STREAM benchmark
Copying
• Using IP transports (TCP & SCTP) requires data copying
• Receive path: NIC → (1) packet buffer → (2) user buffer – two data copies
Why Is Copying Important?
• Heavy resource consumption at high speed (1 Gb/s and up)
• Uses a large percentage of available CPU
• Uses a large fraction of available bus bandwidth – minimum 3 trips across the bus
• Test configuration: 64 KB window, 64 KB I/Os, 2P 600 MHz PIII, 9000 B MTU
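The "3 trips" figure can be checked with back-of-envelope arithmetic: on a conventional receive path the NIC DMAs the packet into a kernel packet buffer (trip 1), the CPU reads it to copy and checksum it (trip 2), and writes it into the user buffer (trip 3). The numbers below are illustrative, not measurements.

```python
link_gbps = 1.0      # line rate being received
bus_crossings = 3    # DMA write + copy read + copy write

bus_traffic_gbps = link_gbps * bus_crossings
print(f"Receiving at {link_gbps} Gb/s consumes ~{bus_traffic_gbps} Gb/s "
      f"of memory-bus bandwidth")

# With direct placement the NIC writes once into the user buffer:
direct_gbps = link_gbps * 1
print(f"With RDMA: ~{direct_gbps} Gb/s of bus bandwidth")
```

At 10 GE the multiplier is what breaks the host: three crossings of a bus that barely sustains line rate once.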
What’s In RDMA For Us?
• Network I/O becomes ‘free’ (though latency remains)
• 1750 machines using 0% CPU for I/O do the work of 2500 machines using 30% CPU for I/O
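The machine counts on the slide are consistent with each other, as a quick calculation shows: a cluster whose nodes each lose 30% of CPU to network I/O delivers only 70% of its nominal application compute.

```python
machines_with_copying = 2500
io_cpu_percent = 30  # CPU share burned on network I/O per machine

# Useful application compute, expressed in "machines' worth" of CPU:
effective_machines = machines_with_copying * (100 - io_cpu_percent) // 100
print(effective_machines)  # matches the 1750-machine RDMA cluster on the slide
```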
Approaches to Copy Reduction
• On-host: special-purpose software and/or hardware, e.g., zero-copy TCP, page flipping
• Unreliable, idiosyncratic, expensive
• Memory-to-memory copies, using network protocols to carry placement information
• Satisfactory experience – Fibre Channel, VIA, ServerNet
• Designed FOR HARDWARE, not software
RDMA over IP Standardization
• IETF RDDP (Remote Direct Data Placement) WG
• http://ietf.org/html.charters/rddp-charter.html
• RDMAC (RDMA Consortium)
• http://www.rdmaconsortium.org/home
RDMA over IP Architecture
[layer stack, top to bottom: ULP / RDMA control / DDP / Transport / IP]
Two layers:
• DDP – Direct Data Placement
• RDMA – control
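The two-layer split can be sketched as follows: DDP carries the placement information (tag, offset) in every segment, while the RDMA control layer above it expresses operations such as RDMA Write in terms of those segments. The header layout below is invented for illustration and does not match the wire format defined by the RDDP specifications.

```python
import struct

# Hypothetical DDP segment header: steering tag (32 bits),
# target offset (64 bits), payload length (16 bits), network byte order.
DDP_HDR = struct.Struct("!IQH")

def ddp_encode(stag, offset, payload):
    """Build one DDP segment carrying its own placement information."""
    return DDP_HDR.pack(stag, offset, len(payload)) + payload

def ddp_decode(segment):
    """Recover (stag, offset, payload) so the adapter can place the data."""
    stag, offset, length = DDP_HDR.unpack_from(segment)
    payload = segment[DDP_HDR.size:DDP_HDR.size + length]
    return stag, offset, payload

# An RDMA Write from the control layer becomes one or more such segments:
seg = ddp_encode(stag=0x22, offset=4096, payload=b"payload bytes")
print(ddp_decode(seg))
```

Because each segment is self-describing, the receiver can place data as segments arrive, independently of transport delivery order.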
Upper and Lower Layers
• ULPs: SDP (Sockets Direct Protocol), iSCSI, MPI
• DAFS is standardized NFSv4 on RDMA
• SDP provides the SOCK_STREAM API
• Runs over a reliable transport – TCP, SCTP
Open Issues
• Security
• TCP in-order processing, framing
• Atomic operations
• Ordering constraints – performance vs. predictability
• Other transports: SCTP, TCP, unreliable
• Impact on network & protocol behaviors
• What is the next performance bottleneck?
• What new applications?
• Does it eliminate the need for large MTUs (jumbos)?