This talk examines the challenges of network-I/O convergence in high-speed ("too fast") networks: the threats and attacks such networks face, and a proposed protocol architecture that preserves zero-copy, zero-corruption, and zero-compromise network performance.
Network-I/O Convergence in "Too Fast" Networks: Threats and Countermeasures
David R. Cheriton, Stanford University
Network-I/O Convergence: An Old Story
• File transfer vs. tape in the 1950s/60s
• File I/O from way back
• 1975: Thoth – message-based, but an RDMA-like "move" operation was necessary for reads/writes
• 1985: V – distributed message passing – again an RDMA-like move for file I/O
• File servers – the only external I/O is the network
• Blade servers – the only I/O is the network, except on boot
Network-I/O convergence is not new.
Recent Problem: Attacks and Attack Resistance
• Reordering – what does it do to delivery? Reordering must not kill performance
• Inserted/forged packets – even when traffic is encrypted, they must not corrupt memory or consume extra bandwidth
• Replay – including partial packets
• Being attacked by your own peripherals: with iSCSI, one of your SAN disks could be compromised
Ideal: host performance is not degraded by attacks.
New Problem: "Too Fast" Networks
• E.g. 10 Gbps Ethernet
• Network speeds exceed memory and processor speeds
• Network processing cost increases: MAC verification and decryption
• Too fast to handle goodput in software, i.e. receive, demux, decrypt, deliver into memory
• Too fast to protect the host in software: too many hardware resources are spent just to reject a packet
• The need for encryption only makes it worse
• Not just zero-copy
"Too fast" means it is very expensive not to protect the host, and protection is not feasible in software.
Objective: Zero-Copy, Zero-Corruption, Zero-Compromise
• Receiver authorization: a fixed limit on the cost in receiver host processor/memory resources (see the sketch below)
• E.g. 10 percent of memory/processor cycles per network interface
• System performance depends on this limit
• No combination of packet traffic (attack) can exceed it
• A resource allocation/protection problem
How to do it, how to do it efficiently, how to do it safely.
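One way to picture the fixed-limit idea is a per-interface budget that the receive path debits before any host work is done. The following C sketch is purely illustrative: the struct, helper names, and the 10-percent refill are assumptions, not the talk's design.

```c
/* Hypothetical sketch of per-interface receiver authorization:
 * each network interface gets a fixed share of host cycles and
 * memory bandwidth per accounting period; the receive path debits
 * the budget before doing any work, so no traffic mix can exceed
 * the configured share. Names and numbers are illustrative. */
#include <stdbool.h>
#include <stdint.h>

struct if_budget {
    uint64_t cycles_left;   /* processor cycles remaining this period */
    uint64_t mem_bw_left;   /* memory-bus bytes remaining this period */
};

/* Refill to 10% of the host's per-period capacity (the example
 * figure from the slide). Called once per accounting period. */
static void budget_refill(struct if_budget *b,
                          uint64_t host_cycles, uint64_t host_bytes)
{
    b->cycles_left = host_cycles / 10;
    b->mem_bw_left = host_bytes / 10;
}

/* Debit the estimated cost of one packet; if the interface is out
 * of budget, the packet is dropped before touching host memory. */
static bool budget_admit(struct if_budget *b,
                         uint64_t est_cycles, uint64_t est_bytes)
{
    if (b->cycles_left < est_cycles || b->mem_bw_left < est_bytes)
        return false;               /* reject: budget exhausted */
    b->cycles_left -= est_cycles;
    b->mem_bw_left -= est_bytes;
    return true;
}
```

The point of the check is that it fails closed: once an interface exhausts its share, further packets cost the host nothing beyond the check itself.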
What About Moore's Law?
• "Too fast" is intrinsic
• The fiber operates at the limit of the memory system (switch write/read)
• The processor operates at the limit of the memory system (system write/read on reception)
• If memory goes faster, the network will go faster
Too-fast networks are at the limit of memory speeds.
Collision Between I/O and Processor for Hardware Resources
• Contention for pins to memory: I/O vs. processor
• Memory access is random-access and cached – latency-sensitive, with temporal/spatial locality, a single subsystem
• I/O is streaming, from multiple subsystems
• Contention for on-chip state: mapping state for the network vs. VM, cache, etc. – like VM page tables
• Contention for on-chip logic: software-centric protocol designs add overhead
• Worse with multiprocessors: multiple NICs integrated into the processor chip mean even more demands
If the NIC is not in the processor, it is way across the I/O "network", e.g. an Infiniband link.
Threat: Infiniband
• Specialized networks for I/O, just like Fibre Channel
• 1999: RDMA over TCP, versus NextGenIO and FutureIO, and now Infiniband
• 2003: RDMA-based transport
• A unified, purpose-designed I/O protocol architecture
• Safer because of its limited range – a data-center network
• Potential for deconvergence, at least relative to general-purpose networking, Ethernet, and IP
• Note: remote disaster recovery still needs a general-purpose network
• GE Ethernet adaptors sit at the edge of the data-center network
Fix IP for storage, or else lose to IB.
The Multi-Layer Solution?
• Ethernet/IP/ESP/TCP/MPA/DDP/RDMA
• Many layers, redundancy, complexity, and semantics – e.g. TCP sequencing semantics
• Plus, control-plane communication is still needed, so hardware demux, delivery, and decryption are needed there too
• Can an attacker compromise the host by:
• forcing high CPU load and extra memory bandwidth?
• flooding garbage traffic, including on the control plane?
• tripping up the "fast path" with exceptions?
A very complex "solution", or "meta-solution".
Meta-Protocols
• Standardized, yet they don't provide interoperability – e.g. RDMA over …
• Several different choices at the next level down, e.g. SCTP, but TCP also allowed
• Such standards are too flexible to design hardware to
• Good standards require hard choices to get interoperability and market size, not meta-protocols
The High-Performance RPC Problem
• High-performance RPC, including framing support (marshal/demarshal) for large parameters, e.g. file/block reads and writes
• Networks are as fast as memory, so copying is painful
• Semantic gap between the transport and OO RPC: the transport is a byte stream but RPC deals in frames, which makes copies hard to avoid (see the sketch below)
• A safe, secure transport with DoS prevention, e.g. against SYN attacks
• RPC is the control plane for RDMA
Not just RDMA – RPC needs to be handled too.
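To make the semantic gap concrete, here is a hypothetical C sketch of frame reassembly over a byte stream: a length-prefixed RPC frame can arrive split across reads, so it must be staged in a buffer and copied, which is exactly the copy that frame-aligned delivery (the region scheme later in the talk) avoids. All names and the wire format are illustrative assumptions.

```c
/* Hypothetical illustration of the byte-stream vs. frame gap:
 * frames carried over a TCP-like stream need a reassembly buffer,
 * i.e. an extra copy, before they can be demarshaled. */
#include <stdint.h>
#include <string.h>

struct reasm {
    uint8_t  buf[64 * 1024];  /* staging buffer: the copy we want to avoid */
    uint32_t have;            /* bytes accumulated so far */
};

/* Feed stream bytes; returns 1 when a whole frame (4-byte length
 * prefix + body) has been assembled into r->buf, 0 if more bytes
 * are needed, -1 on an oversized frame. */
static int reasm_feed(struct reasm *r, const uint8_t *data, uint32_t len)
{
    if (r->have + len > sizeof r->buf)
        return -1;                       /* would overflow staging */
    memcpy(r->buf + r->have, data, len); /* the extra copy */
    r->have += len;
    if (r->have < 4)
        return 0;                        /* length prefix incomplete */
    uint32_t body;
    memcpy(&body, r->buf, 4);
    if (body > sizeof r->buf - 4)
        return -1;                       /* oversized frame */
    return r->have >= 4 + body;          /* 1 when frame complete */
}
```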
Proposed Solution: Refactoring the Transport-Layer Protocol
• Theory: refactor the protocol design problem between what belongs in hardware and what does not
• Hardware must protect resources: memory bandwidth, memory, processor
• Hardware owns the data path: e.g. receive, decrypt, MAC, and deliver to memory in the right place and the right way, else drop
• Software handles control, with the control plane built on the hardware path so it is "immune" to attacks
Solution: An RDMA-Based Protocol
Region level: handles packet-based data delivery
• Receive, decrypt, MAC, and copy to a memory region
Control level: RPC using RDMA regions
• Shamelessly harvest TCP techniques: fast retransmit, slow start, etc.
Connection management: RPC using a special RDMA region
• Integrates key exchange and session setup
Region
What: a collection of packet frames to/from which the sequence of packets of a particular flow is mapped
• The flow label plus an offset/seqNo field maps each packet to a frame in the region; otherwise, drop
• Static MTU per frame, from MTU discovery
Transmission:
• Gather, encryption, and authentication
• A region is like virtual memory, but with frames, not pages – a similar mapping to page tables
Region Structure
• Addressed by (flow label, sequence number), e.g. a UDP header plus a 32-bit offset
• Similar to page mapping, except:
• MTU-size packet frames
• Packet reception state kept in the region descriptor and the frame descriptors
[Figure: frames k, k+1, …, k+w of a region mapping into a file-system disk buffer]
A sketch of this mapping follows.
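A minimal C sketch of the region-as-page-table idea, assuming an array of MTU-size frame descriptors indexed by sequence number; field names and layout are guesses for illustration, not the actual design.

```c
/* Hypothetical region mapping: a region is an array of MTU-size
 * frame descriptors indexed by the packet's sequence number/offset,
 * much like a page table maps virtual addresses to pages. */
#include <stdint.h>

#define FRAME_EMPTY  0   /* initial per-frame reception state */
#define FRAME_FILLED 1

struct frame_desc {
    void    *addr;      /* host memory this frame maps to */
    uint8_t  state;     /* per-frame reception state */
};

struct region {
    uint32_t           flow_label;  /* identifies the flow */
    uint32_t           base_seq;    /* seqNo of the first frame (k) */
    uint32_t           nframes;     /* window of frames (w + 1) */
    uint32_t           mtu;         /* static frame size, from MTU discovery */
    struct frame_desc *frames;      /* the "page table" of frames */
};

/* Map (flow label, seqNo) to a frame, as a page table maps an
 * address to a page; NULL means the packet is dropped. Unsigned
 * wrap-around makes below-base sequence numbers fail the bound. */
static struct frame_desc *region_lookup(struct region *r,
                                        uint32_t flow_label,
                                        uint32_t seq)
{
    if (flow_label != r->flow_label)
        return NULL;                    /* wrong flow: drop */
    uint32_t idx = seq - r->base_seq;
    if (idx >= r->nframes)
        return NULL;                    /* outside the region: drop */
    return &r->frames[idx];
}
```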
Region, cont'd
Delivery conditions (see the receive-path sketch below):
• Only deliver if the packet decrypts and its MAC verifies
• Only deliver if it maps to a region and a buffer
• Only deliver into the exact location it maps to
• Best-effort delivery, with retransmission at a higher level
Pros:
• Simple state and logic for transmit, receive, and acking
• Competitive with Infiniband
• Protection: no memory cost if a packet is not accepted
• Multiple protocols feasible: UDP, ESP
Fine for data – what about control?
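Continuing the region sketch above, here is a hypothetical receive path that enforces the delivery conditions in order; decrypt_and_verify() and the packet layout are assumed primitives, not part of the actual design.

```c
/* Hypothetical hardware receive path: decrypt, verify the MAC, map
 * to a region frame, and only then touch host memory; any failure
 * drops the packet at zero memory cost. Reuses struct region,
 * region_lookup(), and frame_desc from the previous sketch. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct packet {
    uint32_t flow_label;
    uint32_t seq;
    uint32_t len;
    uint8_t *payload;
};

/* Assumed hardware primitive: in-place decrypt + MAC check. */
bool decrypt_and_verify(struct packet *p);

static bool rx_deliver(struct region *r, struct packet *p)
{
    /* 1. Only deliver if the packet decrypts and the MAC verifies. */
    if (!decrypt_and_verify(p))
        return false;

    /* 2. Only deliver if it maps to a region frame. */
    struct frame_desc *f = region_lookup(r, p->flow_label, p->seq);
    if (f == NULL || p->len > r->mtu)
        return false;

    /* 3. Only deliver into the exact location the mapping names. */
    memcpy(f->addr, p->payload, p->len);
    f->state = FRAME_FILLED;
    return true;   /* best-effort: retransmission is handled above */
}
```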
ROP Control Level
• Builds on the hardware region mechanism for delivery and transmission
• Exploits OO RPC technology
• Call, return, and update regions, as well as application-level RDMA regions – together referred to as a "channel" (sketched below)
• Acks are RPC calls into the update region
• Acks can be processed in software or hardware
• The same hardware decryption, authentication, and delivery mechanisms apply
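A rough sketch of what a channel's state might look like under this description, reusing the region type from the earlier sketch; the layout and the ack format are assumptions for illustration.

```c
/* Hypothetical ROP "channel": call, return, and update regions plus
 * application-level RDMA regions, all using the same region delivery
 * machinery. An ack is just an RPC frame landing in the update
 * region, so it can be handled in software or in hardware. */
#include <stddef.h>
#include <stdint.h>

struct channel {
    struct region  call;      /* inbound RPC call frames */
    struct region  ret;       /* inbound RPC return frames */
    struct region  update;    /* acks and other state updates */
    struct region *rdma;      /* application-level RDMA regions */
    size_t         nrdma;
};

/* An ack, delivered as an RPC call into the update region. */
struct ack_rpc {
    uint32_t region_id;   /* which region is being acked */
    uint32_t seq_hi;      /* highest frame received in order */
};
```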
File Write
• The client has a return region and an update region
• The server has a call region and an update region, plus a region per RDMA buffer
• The client knows the buffer's identifier from the file open
• File write:
• The client RDMAs the data to the remote buffer using the region transmission mechanism
• The client sends an RPC write call to the server's call region, identifying the buffer
• The server is notified of the reception of the write call
• The server checks whether the data was completely received; if not, it requests retransmission of the missing frames
• The server performs the write processing and maps the buffer to new memory
• The server sends an RPC return frame to the client's return region
• The client processes the return frame and returns
• Regions are used as (hardware) windows for flow control
A client-side sketch follows.
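A hypothetical client-side rendering of the write sequence, continuing the channel sketch above; rdma_send_region(), rpc_call(), and the wire formats are illustrative assumptions, not the talk's interfaces.

```c
/* Hypothetical client side of the file-write RPC: RDMA the data,
 * then name the buffer in an RPC call and wait for the return frame.
 * The server side checks completeness and requests retransmission
 * of missing frames, so the client does not track losses itself. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct write_call {
    uint32_t buffer_id;   /* RDMA buffer named at file open */
    uint64_t offset;      /* file offset */
    uint32_t len;         /* bytes already RDMAed to the buffer */
};

/* Assumed primitives built on the region mechanism. */
bool rdma_send_region(struct channel *ch, uint32_t buffer_id,
                      const void *data, uint32_t len);
bool rpc_call(struct channel *ch, const void *call, size_t call_len,
              void *ret, size_t ret_len);

static bool file_write(struct channel *ch, uint32_t buffer_id,
                       uint64_t offset, const void *data, uint32_t len)
{
    /* 1. RDMA the data into the server's buffer region. */
    if (!rdma_send_region(ch, buffer_id, data, len))
        return false;

    /* 2. Send the RPC write call into the server's call region,
     *    identifying the buffer, and wait for the return frame. */
    struct write_call wc = { buffer_id, offset, len };
    uint32_t status = 0;
    if (!rpc_call(ch, &wc, sizeof wc, &status, sizeof status))
        return false;
    return status == 0;
}
```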
Connection Setup
• Channel manager: a mechanism to create and set up new channels
• Provides a default call region for channel setup
• Channel setup is done as RPCs:
• Present credentials and get a cookie
• Present a cookie and get a new channel
• Builds on experience with SYN attacks
• Exposure: flooding of these channel-setup RPCs (hence the stateless cookie sketch below)
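The two-step cookie exchange echoes SYN-cookie practice: the first RPC leaves no per-client state on the server, so a setup flood costs CPU but not memory. A hypothetical sketch, with cookie_mac(), credentials_ok(), and channel_alloc() as assumed helpers and struct channel from the earlier sketch.

```c
/* Hypothetical cookie-based channel setup: RPC 1 issues a
 * self-validating cookie without allocating server state; RPC 2
 * allocates a channel only for a valid, unexpired cookie. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

uint64_t cookie_mac(uint64_t secret, uint32_t client_id, uint64_t expiry);
bool     credentials_ok(uint32_t client_id, const uint8_t *creds, size_t len);
struct channel *channel_alloc(uint32_t client_id);

/* RPC 1: present credentials, get (cookie, expiry); the server
 * stores nothing, so a flood of these costs only CPU, not memory. */
bool setup_get_cookie(uint64_t secret, uint32_t client_id,
                      const uint8_t *creds, size_t len, uint64_t now,
                      uint64_t *cookie, uint64_t *expiry)
{
    if (!credentials_ok(client_id, creds, len))
        return false;
    *expiry = now + 60;   /* short-lived, in seconds: bounds replay */
    *cookie = cookie_mac(secret, client_id, *expiry);
    return true;
}

/* RPC 2: present the cookie back; only a valid, unexpired cookie
 * causes per-channel state to be allocated. */
struct channel *setup_open_channel(uint64_t secret, uint32_t client_id,
                                   uint64_t cookie, uint64_t expiry,
                                   uint64_t now)
{
    if (now > expiry)
        return NULL;                                  /* expired */
    if (cookie != cookie_mac(secret, client_id, expiry))
        return NULL;                                  /* forged */
    return channel_alloc(client_id);
}
```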
Conclusions
• Network-I/O convergence is an old story
• The threat: "too fast" networks – too fast for software protocol design as usual
• Memory is the most limited resource, and I/O is memory-intensive
• Attacks target host resources
• Threat: specialized I/O networks vs. complex general-purpose ones
• Refactoring protocol design between hardware and software: protecting resources, efficient delivery
• Direct data placement is only part of the story
• A new transport protocol is better than moving to a competing network protocol architecture, e.g. Infiniband
• An RPC-simplified protocol – demux/delivery for the control plane too
There is a countermeasure – can the IP/Ethernet community respond?