Reliable Transport II: TCP and Congestion Control • Brad Karp • UCL Computer Science • CS 6007/GC15/GA07 • 27th - 28th February, 2008
Outline • Packet header format • Connection establishment • Data transmission • Retransmit timeouts • RTT estimator • AIMD Congestion control • Throughput, loss, and RTT equation • Connection teardown • Protocol state machine
TCP Packet Header • TCP packet: IP header + TCP header + data • TCP header: 20 bytes long (without options) • Checksum covers TCP header + data + “pseudo header”, which consists of: • IP header source and destination addresses, protocol • Length of TCP segment (TCP header + data)
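A minimal sketch of the checksum described above, assuming IPv4 and the standard 16-bit one's-complement Internet checksum. The function names `internet_checksum` and `tcp_checksum` are illustrative and do not come from the slides. The pseudo header is assumed to be: source address, destination address, a zero byte, protocol number 6 (TCP), and the TCP segment length.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's-complement checksum of `data`."""
    if len(data) % 2:
        data += b"\x00"              # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word                # sum big-endian 16-bit words
    while total >> 16:               # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def tcp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo header plus the TCP segment.

    `src_ip` and `dst_ip` are 4-byte addresses; `segment` is the TCP
    header plus data. To compute a checksum for sending, the checksum
    field inside `segment` should be zero.
    """
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(segment))
    return internet_checksum(pseudo + segment)
```

A receiver can run the same computation over a segment whose checksum field is filled in; a result of 0 indicates the segment passed the check.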
TCP Header Details • Connections inherently bidirectional; all TCP headers carry both data and ACK sequence numbers • 32-bit sequence numbers are in units of bytes • Source and destination ports • multiplexing of TCP by applications • UNIX: local ports below 1024 reserved (only root may use them) • Window: advertisement of number of bytes advertiser willing to accept
TCP Connection Establishment: Motivation • Goals: • Start TCP connection between two hosts • Avoid mixing data from old connection in new connection • Avoid confusing previous connection attempts with current one • Prevent (most) third parties from impersonating (spoofing) one endpoint • SYN packets (SYN flag in TCP header set) used to establish connections • Use retransmission timer to recover from lost SYNs • What protocol meets above goals?
TCP Connection Establishment: Non-Solution (I) • Use two-way handshake • A sends SYN to B • B accepts by returning SYN to A • A retransmits SYN if not received • A and B can ignore duplicate SYNs after connection established • What about delayed data packets from old connection? • Lesson: connections shouldn’t start with a constant sequence number; doing so risks mixing data between old and new connections • [Diagram: A-B timeline. Messages in order: SYN; SYN; data, seqno = 1; data, seqno = 512; closed; SYN; data, seqno = 1024; SYN; data, seqno = 1; data, seqno = 512; data, seqno = 1024]
TCP Connection Establishment: Non-Solution (II) • Two-way handshake, as before • But include random initial sequence numbers on SYNs • What about delayed SYNs from old connection? • A wrongly believes connection successfully established • B will drop all of A’s data! • Lesson: connection attempts should explicitly acknowledge which SYN they are accepting! • [Diagram: A-B timeline. Messages in order: SYN, seqno = i; closed; SYN, seqno = k; SYN, seqno = j; data, seqno = k+1; data ignored!]
TCP Connection Establishment: 3-Way Handshake • Set SYN on connection request • Each side chooses random initial sequence number • Each side explicitly ACKs the sequence number of the SYN it’s responding to • [Diagram: A-B timeline. Messages in order: A sends SYN, seqno = i; B sends SYN, seqno = j, ACK = i+1; A sends seqno = i+1, ACK = j+1]
Robustness of 3-Way Handshake: Delayed SYN • Suppose A’s SYN i delayed, arrives at B after connection closed • B responds with SYN/ACK for i+1 • A doesn’t recognize i+1; responds with reset (RST flag set in TCP header) • A rejects connection • [Diagram: A-B timeline. Messages in order: SYN, seqno = i (arrives after connection closed); SYN, seqno = j, ACK = i+1; RST, ACK = j]
Robustness of 3-Way Handshake: Delayed SYN/ACK • A attempts connection to B • Suppose B’s SYN k/ACK p delayed, arrives at A during new connection attempt • A rejects SYN k; sends RST to B • Connection from A to B succeeds unimpeded • [Diagram: A-B timeline after previous connection closed. Messages in order: SYN, seqno = i; SYN, seqno = k, ACK = p; RST, ACK = k; SYN, seqno = j, ACK = i+1; seqno = i+1, ACK = j+1]
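The acceptance rule that makes the 3-way handshake robust to delayed packets can be sketched as follows. This is an illustrative model, not real TCP code: the dictionary-based packets and the name `respond_to_syn_ack` are assumptions made for the example, and the RST reply is simplified to carry no sequence fields.

```python
MOD = 2**32  # sequence numbers wrap modulo 2^32

def respond_to_syn_ack(my_isn: int, syn_ack: dict) -> dict:
    """A's reaction to a SYN/ACK, given the ISN A used in its current SYN.

    A accepts only a SYN/ACK that acknowledges its own ISN + 1; anything
    else (for example a delayed SYN/ACK from an old attempt) draws a reset.
    """
    if syn_ack["ack"] != (my_isn + 1) % MOD:
        return {"flags": "RST"}
    return {
        "flags": "ACK",
        "seqno": (my_isn + 1) % MOD,          # A's next sequence number
        "ack": (syn_ack["seqno"] + 1) % MOD,  # acknowledges B's SYN
    }

# Current attempt uses ISN i = 100; B answers with its own ISN j = 7000.
print(respond_to_syn_ack(100, {"seqno": 7000, "ack": 101}))
# A delayed SYN/ACK acknowledging some old ISN p is rejected.
print(respond_to_syn_ack(100, {"seqno": 42, "ack": 56}))
```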
Robustness of 3-Way Handshake: Source Spoofing • Suppose host B trusts host A, based on A’s IP address • e.g., allows any account creation request from host A • Adversary M may not control host A, but may seek to impersonate, or spoof, host A • Adversary may not need to receive data from B; only send data (e.g., “create an account l33thax0r”) • Can M establish a connection to B as A? • Unless he is on the path between A and B, adversary cannot spoof A to B or vice-versa! Why: random ISNs on SYNs • [Diagram: M sends B a packet with IP = A, SYN, seqno = i; B replies to A with SYN, seqno = j, ACK = i+1; M must then send IP = A, seqno = i+1, ACK = ??]
TCP: Data Transmission (I) • Each byte numbered sequentially, mod $2^{32}$ • Sender buffers data in case retransmission required • Receiver buffers data for in-order reassembly • Sequence number (seqno) field in TCP header indicates first user payload byte in packet • Receiver indicates receive window size explicitly to sender in window field in TCP header • corresponds to available buffer space at receiver
TCP: Data Transmission (II) • Sender’s transmit window size: amount of buffer space at sender • Sender uses window that is minimum of send and receive window sizes • Receiver sends cumulative ACKs • ACK number in TCP header names highest contiguous byte number received thus far, +1 • one ACK per received packet, OR • Delayed ACK also possible: receiver batches ACKs, sends one for every pair of data packets (200 ms max delay) • Current window at sender: • low byte advances as packets sent • high byte advances as receive window updates arrive
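A small sketch of how a receiver could derive the cumulative ACK number from the segments it has buffered. It assumes, for simplicity, that segments never overlap and that sequence numbers do not wrap; the name `cumulative_ack` is illustrative.

```python
def cumulative_ack(next_expected: int, received_segments) -> int:
    """Return the ACK number after the given segments have arrived.

    `next_expected` is the first byte not yet received in order.
    `received_segments` is an iterable of (seqno, length) pairs.
    The ACK names the highest contiguous byte received, plus one.
    """
    buffered = dict(received_segments)     # seqno -> length
    while next_expected in buffered:       # consume contiguous segments
        next_expected += buffered.pop(next_expected)
    return next_expected

# Segments of 512 bytes starting at 1 and 1025 arrive; 513 is missing,
# so the receiver keeps acknowledging 513.
print(cumulative_ack(1, [(1, 512), (1025, 512)]))
```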
Outline • Packet header format • Connection establishment • Data transmission • Retransmit timeouts • RTT estimator • AIMD Congestion control • Throughput, loss, and RTT equation • Connection teardown • Protocol state machine
TCP: Retransmit Timeouts • Sender sets timer for each sent packet • when ACK returns, timer canceled • if timer expires before ACK returns, packet resent • Expected time for ACK to return: RTT • TCP estimates round-trip time using EWMA • measurements $m_i$ from timed packet/ACK pairs • $RTT_i = (1-\alpha) \times RTT_{i-1} + \alpha \times m_i$ • Retransmit timeout: $RTO_i = \beta \times RTT_i$ • original TCP: $\beta = 2$ • Is this accurate enough? • Recall dangers of too-short and too-long RTT estimates from previous lecture
Mean and Variance: Jacobson’s RTT Estimator • Above link load of 30% at router, $\beta \times RTT_i$ will retransmit too early! • Response to increasing load: waste bandwidth on duplicate packets • Result: congestion collapse! • [Jacobson 88]: estimate $v_i$, mean deviation (EWMA of $|m_i - RTT_i|$), stand-in for variance • $v_i = (1-\gamma) \times v_{i-1} + \gamma \times |m_i - RTT_i|$ • Use $RTO_i = RTT_i + 4v_i$ • Mean and variance RTT estimator used by all modern TCPs
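The two EWMAs and the resulting timeout can be sketched as below. The update order follows the formulas on the slide. The gains α = 1/8 and γ = 1/4 and the initial deviation of half the first sample are assumptions for the example (they are common choices, but the slides do not state them), and the class name `RttEstimator` is illustrative.

```python
class RttEstimator:
    """Mean + mean-deviation RTT estimator with RTO = RTT + 4v."""

    def __init__(self, first_sample: float, alpha: float = 0.125, gamma: float = 0.25):
        self.alpha = alpha              # gain for the mean (assumed value)
        self.gamma = gamma              # gain for the deviation (assumed value)
        self.rtt = first_sample         # smoothed RTT estimate
        self.dev = first_sample / 2     # initial mean deviation (assumed)

    def update(self, m: float) -> float:
        """Fold in one RTT measurement `m`; return the new RTO."""
        self.rtt = (1 - self.alpha) * self.rtt + self.alpha * m
        self.dev = (1 - self.gamma) * self.dev + self.gamma * abs(m - self.rtt)
        return self.rto()

    def rto(self) -> float:
        return self.rtt + 4 * self.dev

est = RttEstimator(first_sample=100.0)      # milliseconds
for sample in [100.0, 100.0, 300.0, 100.0]:
    print(round(est.update(sample), 1))
```

With steady samples the deviation term decays and the RTO tightens toward the RTT; a single outlier widens the RTO rather than causing an early retransmit.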
Retransmit Behavior • Original TCP, before [Jacobson 88]: • at start of connection, send full window of packets • retransmit each packet immediately after its timer expires • Result: window-sized bursts of packets sent into network
Pre-Jacobson TCP (Obsolete!) • Time-sequence plot taken at sender • Bursts of packets: vertical lines • Spurious retransmits: repeats at same y value • Dashed line: available 20 Kbps capacity
Self-Clocking: Conservation of Packets • Goal: self-clocking transmission • each time an ACK returns, one data packet is sent • spacing of returning ACKs matches spacing of packets in time at slowest link on path
Reaching Equilibrium: Slow Start • At connection start, sender sets congestion window size, cwnd, to pktSize (one packet’s worth of bytes), not whole window • Sender sends up to minimum of receiver’s advertised window and cwnd • Upon return of each ACK until receiver’s advertised window size reached, increase cwnd by pktSize bytes • “Slow” means exponential window increase! • Takes $\log_2 W$ RTTs to reach receiver’s advertised window size W (W measured in packets)
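A short simulation of why per-ACK increments give exponential growth. It counts windows in packets rather than bytes and assumes no loss and one ACK per packet; `slow_start_rtts` is an illustrative name.

```python
def slow_start_rtts(rwnd_pkts: int) -> int:
    """Count RTTs for cwnd to grow from 1 packet to `rwnd_pkts` packets."""
    cwnd = 1
    rtts = 0
    while cwnd < rwnd_pkts:
        acks = cwnd                        # one ACK returns per packet sent this RTT
        for _ in range(acks):
            cwnd = min(cwnd + 1, rwnd_pkts)  # each ACK adds one packet to cwnd
        rtts += 1
    return rtts

print(slow_start_rtts(64))   # 64 = 2**6, so this takes 6 RTTs
```

Because every packet in a window produces an ACK, and every ACK adds one packet, the window doubles each RTT.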
Post-Jacobson TCP: Slow Start and Mean+Variance RTT Estimator • Time-sequence plot at sender • “Slower” start • No spurious retransmits
Outline • Packet header format • Connection establishment • Data transmission • Retransmit timeouts • RTT estimator • AIMD Congestion control • Throughput, loss, and RTT equation • Connection teardown • Protocol state machine
Goals in Congestion Control • Achieve high utilization on links; don’t waste capacity! • Divide bottleneck link capacity fairly among users • Be stable: converge to a steady allocation among users • Avoid congestion collapse
Congestion Collapse • Cliff behavior observed in [Jacobson 88] • [Plot: Throughput (bps) vs. Offered load (bps); labels: Knee, Congestion collapse!]
Congestion Requires Slowing Senders • Recall: bigger buffers cannot prevent congestion • Senders must slow to alleviate congestion • Absence of ACKs implicitly indicates congestion • TCP sender’s window size determines sending rate • Recall: correct window size is bottleneck bandwidth-delay product • How can sender learn this value? • Search for it, by adapting window size • Feedback from network: ACKs return (window OK) or do not return (window too big)
Avoiding Congestion: Multiplicative Decrease • Recall that sender uses sending window of size min(cwnd, rwnd), where rwnd is receiver’s advertised window • Upon timeout for sent packet, sender presumes packet lost to congestion, and: • sets ssthresh = cwnd / 2 • sets cwnd = pktSize • uses slow start to grow cwnd up to ssthresh • End result: cwnd reaches half its pre-timeout value, via slow start • Sender sends one window per RTT; halving cwnd halves transmit rate
Avoiding Congestion: Additive Increase • Drops indicate TCP sending more than its fair share of bottleneck • No feedback to indicate TCP using less than its fair share of bottleneck • Solution: speculatively increase window size as ACKs return • Additive increase: for each returning ACK, cwnd = cwnd + (pktSize × pktSize)/cwnd • Increases cwnd by ~pktSize bytes per RTT • Combined algorithm: Additive Increase, Multiplicative Decrease (AIMD)
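The timeout and per-ACK rules from the two "Avoiding Congestion" slides can be combined into one small sender model. This is a sketch under simplifying assumptions: windows are in bytes, rwnd is ignored, slow start is capped at ssthresh, and the class name `AimdSender` and the default parameter values are made up for illustration.

```python
class AimdSender:
    """Tracks cwnd using slow start, additive increase, and timeout decrease."""

    def __init__(self, pkt_size: int = 1000, ssthresh: float = 64000.0):
        self.pkt_size = pkt_size
        self.ssthresh = ssthresh
        self.cwnd = float(pkt_size)        # start with one packet's worth of bytes

    def on_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            # Slow start: one packet per ACK, up to ssthresh.
            self.cwnd = min(self.cwnd + self.pkt_size, self.ssthresh)
        else:
            # Additive increase: about one packet per RTT in total.
            self.cwnd += self.pkt_size * self.pkt_size / self.cwnd

    def on_timeout(self) -> None:
        # Multiplicative decrease, then restart from one packet.
        self.ssthresh = self.cwnd / 2
        self.cwnd = float(self.pkt_size)

s = AimdSender()
s.cwnd = 8000.0
s.on_timeout()
print(s.ssthresh, s.cwnd)   # ssthresh is half the old window; cwnd is one packet
```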
Refinement: Fast Retransmit (I) • Sender must wait well over RTT for timer to expire before loss detected • TCP’s minimum retransmit timeout: 1 second • Another loss indication: duplicate ACKs • Suppose sender sends 1, 2, 3, 4, but 2 lost • Receiver receives 1, 3, 4 • Receiver sends cumulative ACKs 2, 2, 2 • Loss causes duplicate ACKs!
Fast Retransmit (II) • Upon arrival of 3 duplicate ACKs, sender: • sets cwnd = cwnd/2 • retransmits “missing” packet • no slow start • Not only loss causes dup ACKs • Reordering, too • [Diagram: A-B timeline. A sends data with seqno = 1, 513, 1025, 1537; B returns ACK = 513 three times; A retransmits data, seqno = 513]
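A sketch of duplicate-ACK counting at the sender. It assumes the usual convention that a "duplicate" is a repeat of an ACK number already seen, so the third duplicate is the fourth identical ACK; the function name `fast_retransmit_points` and the `dup_threshold` parameter are illustrative.

```python
def fast_retransmit_points(acks, dup_threshold: int = 3):
    """Return the ACK numbers at which fast retransmit would fire.

    `acks` is the sequence of ACK numbers in arrival order. A retransmit
    fires once, when the count of duplicates reaches `dup_threshold`.
    """
    last_ack = None
    dups = 0
    fired = []
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == dup_threshold:
                fired.append(ack)          # resend the segment starting at `ack`
        else:
            last_ack = ack                 # new data acknowledged; reset the count
            dups = 0
    return fired

# One original ACK for 513 plus three duplicates triggers a retransmit of 513.
print(fast_retransmit_points([513, 513, 513, 513]))
```

Requiring several duplicates rather than one gives mild reordering a chance to resolve itself before the sender halves its window.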
AIMD in Action • Sender searches for correct window size
Why AIMD? • Other control rules possible • E.g., MIMD, AIAD, … • Recall goals: • Links fully utilized (efficient) • Users share resources fairly • TCP adapts all flows’ window sizes independently • Must choose a control that will always converge to an efficient and fair allocation of windows
Chiu-Jain Phase Plots • Consider two users sharing a bottleneck link • Plot bandwidths allocated to each • Efficiency: sum of two users’ rates fixed • Fairness: two users’ rates equal • Equi-Fairness: ratio of two users’ rates fixed • [Plot: axes User 1 (bps) and User 2 (bps); labels: Equi-Fairness Line (MI), Fairness Line (AI), Efficiency Line, Overload, Underload, Optimum]
Chiu-Jain: AIMD • AIMD converges to optimum efficiency and fairness • [Plot: phase plot with Fairness Line and Efficiency Line]
Chiu-Jain: AIAD • AIAD doesn’t converge to optimum point! • Similar oscillations for MIMD • [Plot: phase plot with Fairness Line and Efficiency Line]
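The phase-plot argument can be checked numerically with a toy two-user model. It assumes synchronous binary feedback: at each step both users see the same signal (total rate above or below capacity) and apply the same rule. The function name `share_bottleneck`, the starting rates, and the step sizes are all made up for illustration.

```python
def share_bottleneck(x1, x2, capacity, steps, increase, decrease):
    """Evolve two users' rates under a shared increase/decrease rule."""
    for _ in range(steps):
        if x1 + x2 > capacity:             # overload: both users back off
            x1, x2 = decrease(x1), decrease(x2)
        else:                              # underload: both users probe for more
            x1, x2 = increase(x1), increase(x2)
    return x1, x2

# AIMD: add 1 when underloaded, halve when overloaded.
aimd = share_bottleneck(10.0, 80.0, 100.0, 500, lambda x: x + 1.0, lambda x: x / 2.0)
# AIAD: add 1 when underloaded, subtract 1 when overloaded.
aiad = share_bottleneck(10.0, 80.0, 100.0, 500, lambda x: x + 1.0, lambda x: x - 1.0)

print("AIMD gap:", abs(aimd[0] - aimd[1]))
print("AIAD gap:", abs(aiad[0] - aiad[1]))
```

Additive steps leave the gap between the two rates unchanged, while each multiplicative decrease halves it; that is why the AIMD gap shrinks toward zero and the AIAD gap stays at its initial value.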
Outline • Packet header format • Connection establishment • Data transmission • Retransmit timeouts • RTT estimator • AIMD Congestion control • Throughput, loss, and RTT equation • Connection teardown • Protocol state machine
Modeling Throughput, Loss, and RTT • How do packet loss rate and RTT affect throughput TCP achieves? • Assume: • only fast retransmits • no timeouts (so no slow starts in steady-state)
Evolution of Window Over Time • Average window size: 3W/4 • One window sent per RTT • Bandwidth: • 3W/4 packets per RTT • (3W/4 × packet size) / RTT bytes per second • W depends on loss rate… • [Plot: window size vs. time, with window levels W/2 and W marked]
Loss and Window Size • Assume no delayed ACKs, fixed RTT • cwnd grows by one packet per RTT • So it takes W/2 RTTs to go from window size W/2 to window size W; this period is one cycle • How many packets sent in total? • $\frac{3W/4}{RTT} \times \left(\frac{W}{2} \times RTT\right) = \frac{3W^2}{8}$ • One loss per cycle (as window reaches W) • loss rate: $p = \frac{8}{3W^2}$ • $W = \sqrt{8/(3p)}$
Throughput, Loss, and RTT Model • $W = \sqrt{8/(3p)} = \frac{4}{3} \times \sqrt{3/(2p)}$ • Recall: • Bandwidth: $B = (3W/4 \times \text{packet size}) / RTT$ • $B = \text{packet size} / (RTT \times \sqrt{2p/3})$ • Consequences: • Increased loss quickly reduces throughput • At same bottleneck, flow with longer RTT achieves less throughput than flow with shorter RTT!
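The model is easy to evaluate numerically. The sketch below computes B directly from the closed form and shows the two consequences listed above; the function name `tcp_throughput` and the sample values (1500-byte packets, 100 ms RTT, 1% loss) are illustrative assumptions.

```python
import math

def tcp_throughput(packet_size_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Steady-state throughput in bytes/second: B = size / (RTT * sqrt(2p/3))."""
    return packet_size_bytes / (rtt_s * math.sqrt(2 * loss_rate / 3))

base = tcp_throughput(1500, 0.100, 0.01)
print(f"baseline:        {base:,.0f} B/s")
print(f"4x loss rate:    {tcp_throughput(1500, 0.100, 0.04):,.0f} B/s")
print(f"2x RTT:          {tcp_throughput(1500, 0.200, 0.01):,.0f} B/s")
```

Throughput scales as 1/sqrt(p) and as 1/RTT, so quadrupling the loss rate or doubling the RTT each halves the throughput in this model.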
Outline • Packet header format • Connection establishment • Data transmission • Retransmit timeouts • RTT estimator • AIMD Congestion control • Throughput, loss, and RTT equation • Connection teardown • Protocol state machine
TCP: Connection Teardown • Data may flow bidirectionally • Each side independently decides when to close connection • In each direction, FIN answered by ACK • Must reliably terminate connection for both sides • During TIME_WAIT state at first side to send FIN, ACK valid FINs that arrive • Must avoid mixing data from old connection with new one • During TIME_WAIT state, disallow all new connections for 2 × max segment lifetime • [Diagram: A-B timeline. Messages in order: FIN, seqno = i; ACK = i+1; FIN, seqno = j; ACK = j+1; enter TIME_WAIT state]
Summary: TCP and Congestion Control • Connection establishment and teardown • Robustness against delayed packets crucial • Round-trip time estimation • EWMAs estimate both RTT mean and deviation • Congestion detection at sender • Timeout: retransmit timer expires, halve window, slow start from one packet • Fast Retransmit: three duplicate ACKs, halve window, no slow start • Search for optimal sending window size • Additive increase, multiplicative decrease (AIMD) • AIMD converges to high utilization, fair sharing