TCP Congestion Control
TCP Segment Structure • Header rows are 32 bits wide • source port #, dest port # • sequence number, acknowledgement number: counting by bytes of data (not segments!) • head len, not used, flags U A P R S F • URG: urgent data (generally not used) • ACK: ACK # valid • PSH: push data now (generally not used) • RST, SYN, FIN: connection management (reset, setup, teardown commands) • rcvr window size: # bytes rcvr willing to accept • checksum • ptr urgent data • Options (variable length) • application data (variable length) • Also in UDP: source port #, dest port #, checksum
TCP Flow Control • flow control: sender won’t overrun receiver’s buffers by transmitting too much, too fast • receiver: explicitly informs sender of (dynamically changing) amount of free buffer space • RcvWindow field in TCP segment • sender: keeps the amount of transmitted, unACKed data less than the most recently received RcvWindow • RcvBuffer = size of TCP Receive Buffer • RcvWindow = amount of spare room in Buffer • Questions: 1. What is the maximum size of RcvBuffer? 2. Can the sender estimate the size of RcvBuffer? 3. Can the receiver change its RcvBuffer size in the middle of a session? 4. Can the sender know about the change? • [Figure: receiver buffering]
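The sender and receiver rules on the flow control slide can be sketched in a few lines. This is only an illustration under assumptions, not how a real TCP stack is written: the byte counters `last_byte_sent`, `last_byte_acked`, `last_byte_rcvd`, and `last_byte_read` are hypothetical names that do not appear on the slide.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer, as advertised in the RcvWindow field.

    The counter names are hypothetical; the slide only defines
    RcvBuffer and RcvWindow.
    """
    buffered = last_byte_rcvd - last_byte_read  # data received but not yet read
    return rcv_buffer - buffered


def sendable_bytes(advertised_window: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """Bytes the sender may still transmit without overrunning the receiver."""
    in_flight = last_byte_sent - last_byte_acked  # transmitted, unACKed data
    return max(0, advertised_window - in_flight)


if __name__ == "__main__":
    window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
    print(window, sendable_bytes(window, last_byte_sent=5000, last_byte_acked=4000))
```

The sender stops when `sendable_bytes` reaches zero and resumes once a later segment advertises a larger RcvWindow.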
Outline • Principle of congestion control • TCP/Reno congestion control
Principles of Congestion Control • Big picture: How to determine a flow’s sending rate? • Congestion, informally: “too many sources sending too much data too fast for the network to handle” • different from flow control! • manifestations: lost packets (buffer overflow at routers), wasted bandwidth, long delays (queueing in router buffers) • a top-10 problem!
History • TCP congestion control in mid-1980s • fixed window size w • timeout value = 2 RTT • Congestion collapse in the mid-1980s • throughput between LBL and UCB dropped by 1000X!
Some General Questions • How can congestion happen? • What is congestion control? • Why is congestion control difficult? • Will congestion disappear in the future due to technology advances (e.g. faster links, routers)? • How does TCP provide congestion control?
Cause/Cost of Congestion: Scenario 1 • Flow 2 has a fixed sending rate of 5 Mbps • We vary the sending rate of flow 1 from 0 to 20 Mbps • Assume • No retransmission • The link from router 1 to router 2 has infinite buffer • Throughput: packets go through • maximum achievable throughput • large delays when congested • [Figure: flow 1 and flow 2 (5 Mbps) share the link from router 1 to router 2; link labels 5, 20, 10, 20, and 10 Mbps. Plots: total throughput of flow 1 & 2 (Mbps) vs. sending rate by flow 1 (Mbps); delay at link 1 vs. sending rate by flow 1 (Mbps), annotated "delay due to randomness"]
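The throughput curve in Scenario 1 can be approximated with a small calculation. This sketch assumes the shared link from router 1 to router 2 has a capacity of 10 Mbps, which is a reading of the figure labels rather than something the slide states outright; the function names are hypothetical.

```python
def total_throughput_mbps(flow1_rate: float,
                          flow2_rate: float = 5.0,
                          link_capacity: float = 10.0) -> float:
    """Total throughput of both flows over the shared link.

    link_capacity = 10 Mbps is an assumption read off the figure.
    With no retransmission, throughput is the offered load capped
    at the link capacity.
    """
    return min(flow1_rate + flow2_rate, link_capacity)


def backlog_growth_mbps(flow1_rate: float,
                        flow2_rate: float = 5.0,
                        link_capacity: float = 10.0) -> float:
    """Rate at which the (infinite) buffer fills once the link is overloaded."""
    return max(0.0, flow1_rate + flow2_rate - link_capacity)


if __name__ == "__main__":
    for rate in (0, 2, 5, 10, 20):
        print(rate, total_throughput_mbps(rate), backlog_growth_mbps(rate))
```

Under these assumptions the total throughput flattens once flow 1 sends at 5 Mbps, while the backlog, and with it the queueing delay, keeps growing for any higher rate.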
Cause/Cost of Congestion: Scenario 2 • Assume • No retransmission • The link from router 1 to router 2 has finite buffer • Throughput: packets go through • when a packet is dropped at the link from router 2 to router 6, the upstream transmission from router 1 to router 2 used for that packet was wasted! • What if retransmission? • [Figure: flow 1 and flow 2 (5 Mbps) cross a topology of routers 1 to 6; link labels 5, 20, 10, 5, and 5 Mbps. Plot: total throughput of flow 1 & 2 (Mbps) vs. sending rate by flow 1 (Mbps)]
Summary: The Cost of Congestion • Cost • High delay • Packet loss • Wasted upstream bandwidth when a pkt is discarded at downstream • Wasted bandwidth due to retransmission (a pkt goes through a link multiple times) • [Figure: throughput vs. load, marking the knee, the cliff, packet loss, and congestion collapse; delay vs. load]
Approaches towards congestion control • Two broad approaches towards congestion control: • End-end congestion control: • no explicit feedback from network • congestion inferred from end-system observed loss, delay • approach taken by TCP • Network-assisted congestion control: • routers provide feedback to end systems • single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM) • explicit rate sender should send at
Open-loop vs. Closed-loop • Open-loop: • A flow does not adjust its sending rate dynamically according to the status of the network • Need reservation to avoid congestion collapse • Closed-loop: • A flow adjusts its rate dynamically according to the status of the network
End-to-end vs. Hop-by-hop • End-to-end congestion control: • A flow determines its rate • Hop-by-hop: • Routers on the path implement flow control between each other, e.g. ATM credit-based • Scheduling for flows at a link
Implicit vs. Explicit • Implicit: congestion inferred by end systems through observed loss, delay • Explicit: routers provide feedback to end systems • explicit rate sender should send at • single bit indicating congestion (SNA, DECbit, TCP ECN, ATM)
Rate-based vs. Window-based Rate-based: • Congestion control by explicitly controlling the sending rate of a flow, e.g. set sending rate to 128Kbps • Example: ATM Window-based: • Congestion control by controlling the window size of a transport scheme, e.g. set window size to 64KBytes • Example: TCP
Outline • TCP Overview • Principle of congestion control • TCP/Reno congestion control
TCP Congestion Control • Closed-loop, end-to-end, implicit, window-based congestion control • Transmission rate limited by congestion window size, cwnd, over segments • w segments, each with MSS bytes, sent in one RTT: $\text{throughput} = \frac{w \cdot \text{MSS}}{\text{RTT}}$ Bytes/sec
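The throughput relation on this slide is easy to check numerically. The sketch below is a direct transcription of it; the function name is hypothetical, and the example values reuse MSS = 500 bytes and RTT = 200 msec from the slow start slide with an assumed window of 10 segments.

```python
def window_throughput(w_segments: float, mss_bytes: int, rtt_sec: float) -> float:
    """Throughput in bytes/sec when w segments of MSS bytes are sent per RTT."""
    return w_segments * mss_bytes / rtt_sec


if __name__ == "__main__":
    # Assumed example: w = 10 segments, MSS = 500 bytes, RTT = 0.2 s.
    rate = window_throughput(10, 500, 0.2)
    print(f"{rate:.0f} bytes/sec, {rate * 8 / 1000:.0f} kbps")
```

With these numbers the window carries about 25,000 bytes/sec, so doubling w or halving the RTT doubles the rate.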
TCP Congestion Control: Basic Question • Ideally, we want to set the window size (approximately) to the product of available bandwidth (for this flow) and round-trip delay • However, • We don’t know these parameters at the beginning of a flow • Further, the available bandwidth and round-trip delay are changing, because of • competing flows
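The "ideal" window mentioned on this slide is the bandwidth-delay product expressed in segments. A minimal sketch, assuming the available bandwidth and round-trip delay were somehow known (which, as the slide stresses, they are not in practice); the function name and example values are hypothetical.

```python
def ideal_window_segments(bandwidth_bps: float, rtt_sec: float, mss_bytes: int) -> int:
    """Approximate window (in segments) that keeps the path full.

    bandwidth_bps is the bandwidth available to this flow in bits/sec.
    """
    bdp_bytes = bandwidth_bps * rtt_sec / 8  # bandwidth-delay product in bytes
    return max(1, round(bdp_bytes / mss_bytes))


if __name__ == "__main__":
    # Assumed example: 8 Mbps available, RTT = 250 msec, MSS = 500 bytes.
    print(ideal_window_segments(8_000_000, 0.25, 500))
```

For the assumed example the product is 250,000 bytes, or 500 segments of 500 bytes. A smaller window leaves the path idle part of the time; a larger one builds queues.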
TCP Congestion Control: Basic Structure • Two “phases” • SlowStart • congestion avoidance (AIMD) • Important variables: • cwnd: congestion window size • ssthresh: threshold between the slow-start phase and the congestion avoidance phase • Many versions of TCP • TCP/Tahoe: this is a less optimized version • TCP/Reno: this is what we are talking about today; most OSs today implement TCP/Reno • TCP/Vegas: currently not used
TCP Congestion Control Implementation
Initially:
    cwnd = 1;
    ssthresh = infinite (64K);
For each newly ACKed segment:
    if (cwnd < ssthresh)    /* slow start */
        cwnd = cwnd + 1;
    else                    /* congestion avoidance; cwnd increases by 1 per RTT */
        cwnd += 1/cwnd;
Triple-duplicate ACKs:      /* multiplicative decrease */
    cwnd = ssthresh = cwnd/2;
Timeout:
    ssthresh = cwnd/2;
    cwnd = 1;
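The pseudocode on this slide maps onto a small event-driven class. The following is a sketch for experimenting with the rules, not a TCP implementation: the class name `RenoWindow` and its method names are hypothetical, cwnd is counted in segments, and the initial ssthresh of 64 segments is an assumed stand-in for the slide's "infinite (64K)".

```python
from dataclasses import dataclass


@dataclass
class RenoWindow:
    """Window state from the slide's pseudocode, in units of segments."""
    cwnd: float = 1.0
    ssthresh: float = 64.0  # assumption standing in for "infinite (64K)"

    def on_new_ack(self) -> None:
        """Called once for each newly ACKed segment."""
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow start: +1 per ACK
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance: about +1 per RTT

    def on_triple_dup_ack(self) -> None:
        """Multiplicative decrease: halve the window and continue from there."""
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.ssthresh

    def on_timeout(self) -> None:
        """Remember half the window as the threshold, then restart from 1."""
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0


if __name__ == "__main__":
    w = RenoWindow()
    for _ in range(3):
        w.on_new_ack()
    w.on_timeout()
    print(w)
```

Feeding the object a sequence of ACK, triple-duplicate, and timeout events reproduces the window behaviour described on the later slides.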
TCP AIMD • AIMD [Jacobson 1988]: • Additive Increase: in every RTT, W = W + 1*MSS • Multiplicative Decrease: upon a congestion event, W = W/2 • [Figure: sender and receiver exchange data packets and TCP acknowledgment packets across the network, one RTT per round; plot of congestion window size vs. time showing the AI ramps and MD drops]
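The sawtooth in the AIMD plot can be reproduced with a per-RTT loop. This is a toy model under a loud assumption: a congestion event is declared whenever the window reaches a fixed `capacity` (in segments), which is a hypothetical simplification and not part of the slide. The function name is also hypothetical.

```python
from typing import List


def aimd_trace(capacity: float, rounds: int, start: float = 1.0) -> List[float]:
    """Window size (in segments) at the start of each RTT under AIMD.

    Assumption: a congestion event happens in any RTT where the window
    has reached `capacity`; real loss patterns are far less regular.
    """
    w = start
    trace = []
    for _ in range(rounds):
        trace.append(w)
        if w >= capacity:
            w = w / 2   # multiplicative decrease on a congestion event
        else:
            w = w + 1   # additive increase: one segment per RTT
    return trace


if __name__ == "__main__":
    print(aimd_trace(capacity=8, rounds=10, start=4.0))
```

With a capacity of 8 segments the window climbs from 4 to 8, drops back to 4, and repeats, which is the sawtooth shape of the slide's plot.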
TCP Slow Start • When connection begins, CongWin = 1 MSS • Example: MSS = 500 bytes & RTT = 200 msec • initial rate = 20 kbps • available bandwidth may be >> MSS/RTT • desirable to quickly ramp up to respectable rate • When connection begins, increase rate exponentially fast until first loss event • double CongWin every RTT • done by incrementing CongWin for every ACK received • Why call it slow start: initial rate is slow but ramps up exponentially fast
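The slide's example (MSS = 500 bytes, RTT = 200 msec, initial rate 20 kbps) can be extended over several RTTs to show how quickly the rate ramps up. A minimal sketch, assuming no loss occurs and ignoring the threshold; the function name and the `rounds` parameter are hypothetical.

```python
from typing import List


def slow_start_rates_kbps(mss_bytes: int, rtt_sec: float, rounds: int) -> List[float]:
    """Sending rate in kbps for each RTT while CongWin doubles every RTT."""
    rates = []
    cong_win = 1  # in units of MSS
    for _ in range(rounds):
        rates.append(cong_win * mss_bytes * 8 / rtt_sec / 1000)
        cong_win *= 2  # net effect of +1 MSS for every ACK received
    return rates


if __name__ == "__main__":
    for rtt_index, rate in enumerate(slow_start_rates_kbps(500, 0.2, 8)):
        print(f"RTT {rtt_index}: {rate:.0f} kbps")
```

The first value matches the slide's 20 kbps, and the rate doubles each RTT after that, so after ten RTTs (about two seconds here) it would be roughly a thousand times the initial rate.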
TCP Slow-start
Initially:
    cwnd = 1;
    ssthresh = infinite (64K);
For each newly ACKed segment:
    if (cwnd < ssthresh)    /* slow start */
        cwnd = cwnd + 1;
Timeout or Triple Duplicate ACKs:    /* slowstart stops */
[Figure: sender/receiver timeline: segment 1, ACK for segment 1, segments 2 and 3, ACK for segments 2 + 3, segments 4, 5, 6, 7; window labels cwnd = 1, 2, 4, 6, 8]
Fast Retransmit • After 3 dup ACKs: • CongWin is cut in half • window then grows linearly • But after timeout event: • CongWin instead set to 1 MSS • window then grows exponentially to a threshold, then grows linearly • Philosophy: • 3 dup ACKs indicate the network is capable of delivering some segments • timeout before 3 dup ACKs is “more alarming”
Fast Recovery • Q: When should the exponential increase switch to linear? • A: When CongWin gets to 1/2 of its value before timeout. • Implementation: • Variable Threshold • At loss event, Threshold is set to 1/2 of CongWin just before loss event
TCP/Reno: Big Picture • TD: Triple duplicate acknowledgements • TO: Timeout • [Figure: cwnd vs. time across alternating slow start and congestion avoidance phases, with the ssthresh levels and the loss events TD, TD, TO marked]
Summary: TCP Congestion Control • When CongWin is below Threshold, sender in slow-start phase, window grows exponentially. • When CongWin is above Threshold, sender is in congestion-avoidance phase, window grows linearly. • When a triple duplicate ACK occurs, Threshold set to CongWin/2 and CongWin set to Threshold. • When timeout occurs, Threshold set to CongWin/2 and CongWin is set to 1 MSS.