Chapter 13 TCP Implementation
Objectives • Understand the structure of a typical TCP implementation • Outline the implementation of extended standards for TCP over high-performance networks • Understand the sources of end-system overhead in typical TCP implementations, and techniques to minimize them • Quantify the effect of end-system overhead and buffering on TCP performance • Understand the role of Remote Direct Memory Access (RDMA) extensions for high-performance IP networking
Contents • Overview of TCP implementation • High-performance TCP • End-system overhead • Copy avoidance • TCP offload
Implementation Overview
Overall Structure (RFC 793) • Internal structure specified in RFC 793 • Fig. 13.1
Data Structure of TCP Endpoint • Transmission control block: Stores the connection state and related variables • Transmit queue: Buffers containing outstanding data • Receive queue: Buffers for received data (not yet forwarded to the higher layer)
Buffering and Data Movement • Buffer queues reside in the protocol-independent socket layer within the operating system kernel • TCP sender upcalls to the transmit queue to obtain data • TCP receiver notifies the receive queue of correct arrival of incoming data • BSD-derived kernels implement buffers in mbufs • Moves data by reference • Reduces the need to copy • Most implementations commit buffer space to the queue lazily • Queues consume memory only when the bandwidth of the network does not match the rate at which the TCP user produces/consumes data
User Memory Access • Provides for movement of data to and from the memory of the TCP user • Copy semantics • SEND and RECEIVE are defined with copy semantics • The user can modify a send buffer as soon as the SEND is issued • Direct access • Allows TCP to access the user buffers directly • Bypasses copying of data
TCP Data Exchange • TCP endpoints cooperate by exchanging segments • Each segment contains: • Sequence number seg.seq, segment data length seg.len, status bits, ack seq number seg.ack, advertised receive window size seg.wnd • Fig. 13.3
Data Retransmissions • TCP sender uses a retransmission timer to drive retransmission of unacknowledged data • Retransmits a segment if the timer fires • Retransmission timeout (RTO) • RTO < RTT: Aggressive; too many (spurious) retransmissions • RTO much greater than RTT: Conservative; low utilisation because the connection sits idle waiting for the timer • In practice, an adaptive retransmission timer with back-off is used (specified in RFC 2988)
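The adaptive timer with back-off can be sketched roughly as follows. This is a minimal illustration in the style of RFC 2988, not any particular stack's code: the class and method names are hypothetical, clock granularity is ignored, and the 60-second upper bound is an assumed cap.

```python
class RtoEstimator:
    """Sketch of an adaptive retransmission timer in the style of RFC 2988."""

    ALPHA = 1 / 8    # gain applied to the smoothed RTT
    BETA = 1 / 4     # gain applied to the RTT variation
    K = 4            # variation multiplier in the RTO formula
    MIN_RTO = 1.0    # seconds, lower bound on the RTO
    MAX_RTO = 60.0   # seconds, assumed upper bound for this sketch

    def __init__(self, initial_rto=3.0):
        self.srtt = None      # smoothed RTT, unset until the first sample
        self.rttvar = None    # RTT variation estimate
        self.rto = initial_rto

    def on_rtt_sample(self, r):
        """Fold a new RTT measurement r (seconds) into the estimate."""
        if self.srtt is None:
            # First measurement initialises both estimators.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # Update the variation first, using the old smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + self.K * self.rttvar
        self.rto = min(self.MAX_RTO, max(self.MIN_RTO, rto))
        return self.rto

    def on_timeout(self):
        """Exponential back-off: double the RTO each time the timer fires."""
        self.rto = min(self.MAX_RTO, self.rto * 2)
        return self.rto
```

A sender would call `on_rtt_sample` for each valid RTT measurement and `on_timeout` whenever the retransmission timer fires.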
Congestion Control • A retransmission event indicates (to the TCP sender) that the network is congested • Congestion management is a function of the end-systems • RFC 2581 requires TCP end-systems to respond to congestion by reducing their sending rate • AIMD: Additive Increase Multiplicative Decrease • TCP sender probes for available bandwidth on the network path by additively increasing cwnd • Upon detection of congestion, TCP sender multiplicatively reduces cwnd • Achieves fairness among TCP connections
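The AIMD rule can be illustrated with a deliberately simplified sketch. The function names are hypothetical, the update is applied once per round trip, and slow start, fast recovery and the exact RFC 2581 thresholds are left out; the floor of one MSS is an assumption of this example.

```python
def aimd_update(cwnd, mss, congested):
    """Return the next congestion window (bytes) after one round trip."""
    if congested:
        # Multiplicative decrease: halve the window, but keep at least one MSS.
        return max(mss, cwnd // 2)
    # Additive increase: probe for bandwidth by one MSS per round trip.
    return cwnd + mss


def aimd_trace(cwnd, mss, events):
    """Apply aimd_update over a sequence of per-RTT congestion flags."""
    trace = [cwnd]
    for congested in events:
        cwnd = aimd_update(cwnd, mss, congested)
        trace.append(cwnd)
    return trace
```

Feeding `aimd_trace` a long run of `False` values with an occasional `True` produces the familiar sawtooth in cwnd.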
High Performance TCP
TCP Implementation with High Bandwidth-Delay Product • High bandwidth-delay product: • High-speed networks (e.g. optical networks) • High-latency networks (e.g. satellite networks) • Collectively called Long Fat Networks (LFNs) • LFNs require a large window size (more than the 16 bits originally defined for TCP) • Window scale option allows a TCP endpoint to advertise a large window size (up to about 1 Gbyte) • Specified at connection setup • Expresses window sizes in units of up to 16 Kbytes (scale factor of up to 2^14)
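As a rough worked example of why LFNs outgrow the 16-bit window field, the sketch below computes a bandwidth-delay product and the smallest window scale shift that can express it. The function names are hypothetical, and the 1 Gbit/s, 100 ms path is an assumed figure, not one taken from the slides.

```python
MAX_UNSCALED_WINDOW = 0xFFFF   # largest value of the 16-bit window field
MAX_SHIFT = 14                 # largest shift count the window scale option allows


def bandwidth_delay_product(bits_per_second, rtt_ms):
    """Bytes that must be in flight to keep the path full."""
    return bits_per_second * rtt_ms // (8 * 1000)


def window_scale_shift(window_bytes):
    """Smallest shift such that window_bytes fits in the 16-bit field."""
    shift = 0
    while (window_bytes >> shift) > MAX_UNSCALED_WINDOW and shift < MAX_SHIFT:
        shift += 1
    return shift


# Assumed example path: 1 Gbit/s with a 100 ms round-trip time.
bdp = bandwidth_delay_product(1_000_000_000, 100)
shift = window_scale_shift(bdp)
largest_window = MAX_UNSCALED_WINDOW << MAX_SHIFT   # just under 1 Gbyte
```

With the largest shift, the window is counted in units of 2^14 bytes, which is where the 16 Kbyte granularity comes from.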
Round Trip Time Estimation • Accuracy of RTT estimation depends on frequent sample measurements of RTT • Percentage of segments sampled decreases with larger windows • May be insufficient for LFNs • Timestamp option • Enables the sender to compute an RTT sample from each acknowledgment that echoes a timestamp • Provides a safeguard against accepting old segments whose sequence numbers have wrapped around
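The two uses of the timestamp option can be sketched as below. This is an illustrative simplification: the function names are hypothetical, timestamps are treated as 32-bit tick counters, and the rules for when the receiver updates its most recent timestamp are omitted.

```python
MASK32 = 0xFFFFFFFF


def ts_before(a, b):
    """True if 32-bit timestamp a is older than b, allowing for wraparound."""
    return ((a - b) & MASK32) >= 0x80000000


def rtt_sample(now_ticks, echoed_ticks):
    """RTT in clock ticks, from the timestamp echoed back by the peer."""
    return (now_ticks - echoed_ticks) & MASK32


def accept_segment(seg_timestamp, most_recent_timestamp):
    """Reject a segment whose timestamp is older than the last one accepted."""
    return not ts_before(seg_timestamp, most_recent_timestamp)
```

Because the comparison is modular, a timestamp just before the 32-bit wrap is still treated as older than a small timestamp just after it.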
Path MTU Discovery • TCP is most efficient when it uses the largest MSS that the path can carry without fragmentation • Enables TCP sender to automatically discover the largest acceptable MSS • TCP implementation must correctly handle dynamic changes to MSS • Never leaves more than 2*MSS bytes of data unacknowledged • TCP sender may need to re-segment data for retransmission
End-System Overhead
Reduce End-System Overhead • TCP imposes processing overhead in operating system • Adds directly to latency • Consumes a significant share of CPU cycles and memory • Reducing overhead can improve application throughput
Sources of Overhead for TCP/IP • Per-transfer overhead • Per-packet overhead • Per-byte overhead • Fig. 13.5
Per-Packet Overhead • Increasing packet size can mitigate the impact of per-packet and per-segment overhead • Fig. 13.6 • Increasing segment size S increases achievable bandwidth • As packet size grows, the effect of per-packet overhead becomes less significant • Interrupts • A significant source of per-packet overhead
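A simple cost model makes the effect of segment size concrete. The sketch below assumes, purely for illustration, that the end system spends a fixed time per packet plus a fixed time per byte; the function name and the cost figures are hypothetical and are not taken from Fig. 13.6.

```python
def achievable_bandwidth(segment_bytes, per_packet_s, per_byte_s):
    """Bytes per second the host can process under a linear cost model."""
    time_per_segment = per_packet_s + per_byte_s * segment_bytes
    return segment_bytes / time_per_segment


# Assumed costs: 10 microseconds per packet, 1 nanosecond per byte.
PER_PACKET_S = 10e-6
PER_BYTE_S = 1e-9

small = achievable_bandwidth(1460, PER_PACKET_S, PER_BYTE_S)
large = achievable_bandwidth(9000, PER_PACKET_S, PER_BYTE_S)
ceiling = 1 / PER_BYTE_S   # limit as segment size grows without bound
```

Under this model the larger segment amortises the per-packet cost over more bytes, and the achievable bandwidth approaches the per-byte ceiling as S grows.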
Checksum Overhead • A source of per-byte overhead • Ways of reducing checksum overhead: • Complete multiple steps in a single traversal of the data to reduce per-byte overhead • Integrate checksumming with the data copy • Compute the checksum in hardware
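To illustrate integrating the checksum with the data copy, the sketch below computes the 16-bit one's-complement Internet checksum either in a separate pass or while copying. The function names are hypothetical, and Python is used only to show the single-traversal structure; a real implementation would do this in optimised kernel code or hardware.

```python
def fold16(total):
    """Fold carries back into 16 bits (one's-complement addition)."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data):
    """Separate pass: checksum over data, padding odd lengths with a zero byte."""
    if len(data) % 2:
        data = data + b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    return ~fold16(total) & 0xFFFF


def copy_and_checksum(src, dst):
    """Single traversal: copy src into dst and checksum it at the same time."""
    total = 0
    n = len(src)
    for i in range(0, n - 1, 2):
        hi, lo = src[i], src[i + 1]
        dst[i], dst[i + 1] = hi, lo      # the copy
        total += (hi << 8) | lo          # the checksum, same pass
    if n % 2:
        # Trailing odd byte is treated as the high half of a padded word.
        dst[n - 1] = src[n - 1]
        total += src[n - 1] << 8
    return ~fold16(total) & 0xFFFF
```

Both functions return the same value for the same input; the second simply touches each byte once instead of twice.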
Copy Avoidance for High-Performance TCP • Page remapping • Uses virtual memory to reduce copying across the TCP/user interface • Typically resides at the socket layer in the OS kernel • Scatter/gather I/O • Does not require copy semantics • Entails a comprehensive restructuring of OS and I/O interfaces • Remote Direct Memory Access (RDMA) • Steers incoming data directly into user-specified buffers • IETF standards under way
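The shape of a scatter/gather interface can be seen in the standard socket calls `sendmsg` and `recvmsg_into`, sketched below for a Unix-like system. The wrapper function names are hypothetical, and the example shows only the interface: the kernel may still copy the data internally, so this is not by itself a copy-avoidance implementation.

```python
import socket


def send_gathered(sock, header, payload):
    """Gather: send two separate buffers without first joining them."""
    return sock.sendmsg([header, payload])


def receive_scattered(sock, header_buf, payload_buf):
    """Scatter: let incoming bytes land directly in caller-supplied buffers."""
    nbytes, _ancdata, _flags, _addr = sock.recvmsg_into([header_buf, payload_buf])
    return nbytes


# Usage over a local socket pair (assumes a platform that provides socketpair).
left, right = socket.socketpair()
sent = send_gathered(left, b"HDR!", b"payload")
header = bytearray(4)
body = bytearray(7)
received = receive_scattered(right, header, body)
left.close()
right.close()
```

The caller never builds a contiguous message on the send side, and on the receive side the header and payload arrive already separated into the buffers the application chose.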
TCP Offload • Supports TCP/IP protocol functions directly on the network adapter (NIC) • Processing • TCP checksum offloading • Significantly reduces per-packet overheads for TCP/IP protocol processing • Helps to avoid expensive copy operations