Reliable Data Transfer in Transmission Control Protocol (TCP)
TCP Data Transfer
• TCP creates a reliable, ordered data transfer service on top of IP's unreliable service
• Pipelined segments
• Cumulative ACKs
• Uses a single retransmission timer
• Retransmissions are triggered by:
  • timeout events
  • duplicate ACKs
• Window size is controlled by the receiver and inferred from the network:
  • Flow control
  • Congestion control
Sender/Receiver Overview
[Figure: sender window bounded by "Last ACK received" and "Last Frame Sent", with regions Sent & Acked, Sent Not Acked, OK to Send, Not Usable; receiver window bounded by "Next frame expected" and "Last frame acceptable", with regions Received & Acked, Acceptable Packet, Not Usable.]
TCP Sender/Receiver Invariants
[Figure: sending application → TCP (LastByteWritten, LastByteAcked, LastByteSent) → TCP (NextByteExpected, LastByteRcvd, LastByteRead) → receiving application]
• Sender: LastByteAcked ≤ LastByteSent ≤ LastByteWritten
• Receiver: LastByteRead < NextByteExpected ≤ LastByteRcvd + 1
• Across the connection: LastByteAcked < NextByteExpected ≤ LastByteSent + 1
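The three invariants can be checked mechanically. The sketch below is a minimal Python illustration; the dataclass and function names are hypothetical, chosen only to mirror the variable names on the slide, and byte numbers are plain integers.

```python
from dataclasses import dataclass


@dataclass
class SenderState:
    last_byte_acked: int
    last_byte_sent: int
    last_byte_written: int


@dataclass
class ReceiverState:
    last_byte_read: int
    next_byte_expected: int
    last_byte_rcvd: int


def sender_ok(s: SenderState) -> bool:
    # Snd: LastByteAcked <= LastByteSent <= LastByteWritten
    return s.last_byte_acked <= s.last_byte_sent <= s.last_byte_written


def receiver_ok(r: ReceiverState) -> bool:
    # Rcv: LastByteRead < NextByteExpected <= LastByteRcvd + 1
    return r.last_byte_read < r.next_byte_expected <= r.last_byte_rcvd + 1


def cross_ok(s: SenderState, r: ReceiverState) -> bool:
    # The receiver already holds every byte the sender has seen ACKed,
    # and it cannot expect a byte beyond what the sender has sent.
    return s.last_byte_acked < r.next_byte_expected <= s.last_byte_sent + 1


snd = SenderState(last_byte_acked=99, last_byte_sent=119, last_byte_written=200)
rcv = ReceiverState(last_byte_read=90, next_byte_expected=100, last_byte_rcvd=99)
print(sender_ok(snd), receiver_ok(rcv), cross_ok(snd, rcv))  # True True True
```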
TCP Sender (simplified)
NextSeqNum = InitialSeqNum + 1
SendBase = InitialSeqNum + 1    /* == LastByteAcked + 1 */
loop (forever) {
  switch (event)
  event: data received from application above
    1. create TCP segment with sequence number NextSeqNum
    2. if (timer currently not running)
         2.1. start timer (timeout after TimeOutInterval)
    3. pass segment to IP
    4. NextSeqNum = NextSeqNum + length(data)
  event: timer timeout
    1. retransmit the not-yet-acknowledged segment with the smallest sequence number
    2. start timer (timeout after TimeOutInterval)
  event: ACK received, with ACK field value y
    1. if (y > SendBase) {
         SendBase = y
         if (there are currently not-yet-acknowledged segments)
           restart timer
       }
} /* end of loop forever */
Comment: SendBase − 1 is the last cumulatively ACKed byte, i.e., the receiver has received bytes up to SendBase − 1 and is expecting the byte starting at SendBase.
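The event loop above can be modeled as a small class with one method per event. This is a minimal sketch, not a real TCP: the class and method names are hypothetical, "passing a segment to IP" just appends to a log, and the timer is a boolean flag rather than a clock.

```python
class SimplifiedTCPSender:
    """Event-driven model of the simplified TCP sender (no real network I/O)."""

    def __init__(self, initial_seq_num: int) -> None:
        self.next_seq_num = initial_seq_num + 1
        self.send_base = initial_seq_num + 1  # == LastByteAcked + 1
        self.unacked: dict[int, bytes] = {}   # seq -> payload not yet ACKed
        self.timer_running = False
        self.sent_log: list[tuple[int, bytes]] = []  # segments "passed to IP"

    def on_app_data(self, data: bytes) -> None:
        seq = self.next_seq_num               # 1. segment gets NextSeqNum
        self.unacked[seq] = data
        if not self.timer_running:            # 2. start timer if idle
            self.timer_running = True
        self.sent_log.append((seq, data))     # 3. pass segment to IP
        self.next_seq_num += len(data)        # 4. advance NextSeqNum

    def on_timeout(self) -> None:
        if self.unacked:                      # resend smallest unACKed seq
            seq = min(self.unacked)
            self.sent_log.append((seq, self.unacked[seq]))
        self.timer_running = True             # start timer again

    def on_ack(self, y: int) -> None:
        if y > self.send_base:                # cumulative ACK moves forward
            self.send_base = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]         # fully covered by the ACK
            # keep the timer running only while data is still outstanding
            self.timer_running = bool(self.unacked)


# Premature-timeout scenario: 8 bytes at Seq=92, then 20 bytes at Seq=100.
sender = SimplifiedTCPSender(initial_seq_num=91)
sender.on_app_data(b"A" * 8)
sender.on_app_data(b"B" * 20)
sender.on_timeout()          # resends Seq=92
sender.on_ack(100)
sender.on_ack(120)
print([seq for seq, _ in sender.sent_log], sender.send_base, sender.timer_running)
# [92, 100, 92] 120 False
```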
TCP: Retransmission Scenarios
[Figure, lost ACK scenario: Host A sends Seq=92 (8 bytes data); Host B's ACK=100 is lost; A's timer for Seq=92 expires and A resends Seq=92 (8 bytes data); B replies ACK=100.]
[Figure, premature timeout: Host A sends Seq=92 (8 bytes data) and Seq=100 (20 bytes data); B replies ACK=100 and ACK=120, but A's timer for Seq=92 expires first and A resends Seq=92; SendBase advances to 100 and then to 120 as the ACKs arrive; B answers the resent segment with ACK=120.]
TCP Retransmission Scenarios (more)
[Figure, cumulative ACK scenario: Host A sends Seq=92 (8 bytes data) and Seq=100 (20 bytes data); ACK=100 is lost, but ACK=120 arrives before the timeout, so SendBase = 120 and nothing is retransmitted.]
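Why the cumulative ACK scenario needs no retransmission can be seen from the receiver's side: the ACK number is the first byte not yet received in order, so a later ACK covers an earlier lost one. A minimal sketch; the function name and the (seq, length) representation are hypothetical.

```python
def cumulative_ack(next_expected: int, received: list[tuple[int, int]]) -> int:
    """Return the cumulative ACK number for a set of (seq, length) segments."""
    for seq, length in sorted(received):
        if seq > next_expected:
            break                                  # gap: stop at the missing byte
        next_expected = max(next_expected, seq + length)
    return next_expected


print(cumulative_ack(92, [(92, 8), (100, 20)]))   # 120: ACKs both segments
print(cumulative_ack(92, [(92, 8), (120, 10)]))   # 100: bytes 100..119 missing
```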
TCP Receiver ACK Generation [RFC 1122, RFC 2581]
• Event: arrival of an in-order segment with the expected seq #; all data up to the expected seq # already ACKed.
  Action: delayed ACK. Wait up to 500 ms for the next segment; if no next segment arrives, send an ACK (ACKs may be piggybacked).
• Event: arrival of an in-order segment with the expected seq #; one other segment has an ACK pending.
  Action: immediately send a single cumulative ACK, ACKing both in-order segments.
• Event: arrival of an out-of-order segment with a higher-than-expected seq #; gap detected.
  Action: immediately send a duplicate ACK, indicating the seq # of the next expected byte.
• Event: arrival of a segment that partially or completely fills the gap.
  Action: immediately send an ACK, provided that the segment starts at the lower end of the gap.
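The four event/action rules above reduce to a small decision function. A minimal sketch: the enum and parameter names are hypothetical, and it assumes the arriving segment starts at or above the next expected byte (old duplicate segments are not modeled).

```python
from enum import Enum


class AckAction(Enum):
    DELAYED_ACK = "wait up to 500 ms for the next segment, then ACK"
    CUMULATIVE_ACK_NOW = "send one cumulative ACK covering both segments"
    DUPLICATE_ACK_NOW = "send a duplicate ACK for the next expected byte"
    ACK_NOW = "send an ACK immediately"


def ack_action(seg_seq: int, next_expected: int,
               ack_pending: bool, has_gap: bool) -> AckAction:
    """Pick the receiver's response to an arriving segment.

    has_gap: out-of-order data is already buffered above next_expected.
    ack_pending: an earlier in-order segment still awaits its delayed ACK.
    """
    if seg_seq > next_expected:
        return AckAction.DUPLICATE_ACK_NOW   # out of order: gap detected
    if has_gap:
        return AckAction.ACK_NOW             # segment fills the gap from its lower end
    if ack_pending:
        return AckAction.CUMULATIVE_ACK_NOW  # ACK both in-order segments at once
    return AckAction.DELAYED_ACK             # nothing pending: delay the ACK
```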
Fast Retransmit
• The timeout period is often relatively long, causing a long delay before a lost packet is resent.
• Detect lost segments via duplicate ACKs:
  • The sender often sends many segments back-to-back.
  • If a segment is lost, there will likely be many duplicate ACKs.
• If the sender receives three duplicate ACKs for the same data, it assumes that the segment after the ACKed data was lost.
• Fast retransmit: resend that segment before its timer expires.
Fast Retransmit Algorithm
event: ACK received, with ACK field value y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {  /* a duplicate ACK for an already ACKed segment (in this simplified model, y == SendBase) */
    increment count of duplicate ACKs received for y
    if (count of duplicate ACKs received for y == 3) {
      /* fast retransmit */
      resend segment with sequence number y
    }
  }
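The duplicate-ACK counter can be sketched on its own, leaving out the timer and data buffers. The class name is hypothetical; unlike the simplified pseudocode, this sketch explicitly ignores ACKs below SendBase, which can occur when ACKs are reordered in the network.

```python
class FastRetransmitSender:
    """ACK handling with fast retransmit (timer and data buffers omitted)."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, send_base: int) -> None:
        self.send_base = send_base
        self.dup_acks = 0
        self.fast_retransmits: list[int] = []  # seq numbers resent early

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y      # new data ACKed (timer restart omitted)
            self.dup_acks = 0
        elif y == self.send_base:
            self.dup_acks += 1      # duplicate ACK for already ACKed data
            if self.dup_acks == self.DUP_ACK_THRESHOLD:
                self.fast_retransmits.append(y)  # resend segment starting at y
        # y < send_base: stale, reordered ACK; ignored in this sketch


sender = FastRetransmitSender(send_base=92)
for ack in [100, 100, 100, 100, 100]:   # one new ACK, then four duplicates
    sender.on_ack(ack)
print(sender.fast_retransmits)          # [100]
```

Only the third duplicate triggers the resend; a fourth duplicate does not resend the segment again.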
TCP Flow Control
• The receive side of a TCP connection has a receive buffer.
• The app process may be slow at reading from the buffer.
• Flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast.
• Speed-matching service: matching the send rate to the receiving app's drain rate.
TCP Flow Control: How It Works
[Figure: receive buffer of size MaxRcvBuffer, marked with LastByteRead, NextByteExpected and LastByteRcvd; the unused part is the spare room.]
• Spare room in the buffer: RcvWindow = MaxRcvBuffer − [(NextByteExpected − 1) − LastByteRead]
• The receiver advertises the spare room by including the value of RcvWindow in segments.
• The sender limits unACKed data to RcvWindow, which guarantees that the receive buffer doesn't overflow: LastByteSent − LastByteAcked ≤ RcvWindow
• SndWindow = RcvWindow − [LastByteSent − LastByteAcked]
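The two window formulas translate directly into arithmetic. A minimal sketch with hypothetical function names; it counts only in-order buffered bytes (NextByteExpected − 1 − LastByteRead) and clamps the sender's window at zero, which is an assumption of this sketch.

```python
def advertised_window(max_rcv_buffer: int, next_byte_expected: int,
                      last_byte_read: int) -> int:
    # In-order bytes received but not yet read by the application.
    buffered = (next_byte_expected - 1) - last_byte_read
    return max_rcv_buffer - buffered


def effective_window(rcv_window: int, last_byte_sent: int,
                     last_byte_acked: int) -> int:
    # How much more the sender may transmit without overflowing the receiver.
    in_flight = last_byte_sent - last_byte_acked
    return max(0, rcv_window - in_flight)


rwnd = advertised_window(max_rcv_buffer=4096, next_byte_expected=3001, last_byte_read=1000)
print(rwnd)                                                               # 2096
print(effective_window(rwnd, last_byte_sent=5000, last_byte_acked=4000))  # 1096
```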
TCP Flow Control Issues
• What happens if the advertised window is 0?
  • The receiver updates the window when the application reads data.
  • What if this update is lost? Deadlock: the sender waits for a window update that never arrives.
• Solution: the TCP persist timer
  • The sender periodically sends window probe packets.
  • The receiver responds with an ACK and an up-to-date window advertisement.
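How the persist timer breaks the deadlock can be simulated in a few lines. This is a sketch under stated assumptions: the slide only says probes are sent "periodically", so the doubling back-off, the 1 s first interval and the 60 s cap are illustrative choices, and the function name is hypothetical.

```python
def persist_probe_times(reported_windows: list[int],
                        first_interval: float = 1.0,
                        max_interval: float = 60.0) -> tuple[list[float], int]:
    """Simulate the persist timer after a window update was lost.

    reported_windows[k] is the window the receiver puts in its ACK to the
    (k+1)-th probe. Returns the probe send times and the reopened window
    (0 if the window never reopened within the simulated probes).
    """
    t, interval = 0.0, first_interval
    probe_times: list[float] = []
    for window in reported_windows:
        t += interval                     # persist timer fires
        probe_times.append(t)             # send a window probe
        if window > 0:
            return probe_times, window    # ACK carries an open window: resume
        interval = min(2 * interval, max_interval)  # back off, then wait again
    return probe_times, 0


print(persist_probe_times([0, 0, 4096]))   # ([1.0, 3.0, 7.0], 4096)
```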
TCP Flow Control Enhancements
• Problem (Clark, 1982): if the receiver advertises small increases in the receive window, the sender may waste time sending lots of small packets.
• This problem is known as "silly window syndrome":
  • The receiver advertises a one-byte window.
  • The sender sends a one-byte packet (1 byte of data, 40 bytes of header = 4000% overhead).
Solving Silly Window Syndrome
• Receiver avoidance [Clark (1982)]
  • Prevent the receiver from advertising small windows.
  • Advertise a larger window only when it can grow by at least min(MSS, RecvBuffer/2).
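Receiver avoidance comes down to one comparison. A minimal sketch with a hypothetical function name: the receiver keeps advertising its old (possibly zero) window until the free space exceeds it by at least min(MSS, RecvBuffer/2).

```python
def clark_window(free_space: int, current_window: int,
                 mss: int, rcv_buffer: int) -> int:
    """Window to advertise under receiver-side SWS avoidance (sketch).

    free_space: buffer space available right now.
    current_window: window most recently advertised (may be 0).
    """
    threshold = min(mss, rcv_buffer // 2)
    if free_space - current_window >= threshold:
        return free_space        # increase is worth announcing
    return current_window        # too small: keep the old window


print(clark_window(free_space=1, current_window=0, mss=1460, rcv_buffer=4096))     # 0
print(clark_window(free_space=1460, current_window=0, mss=1460, rcv_buffer=4096))  # 1460
```

With a small buffer the RecvBuffer/2 term dominates: for a 1000-byte buffer the window reopens once 500 bytes are free.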
Solving Silly Window Syndrome
• Sender avoidance [Nagle's algorithm (1984)]
  • Prevent the sender from unnecessarily sending small packets.
  • How long does the sender delay sending data?
    • Too long: hurts interactive applications.
    • Too short: poor network utilization.
    • Strategies: timer-based vs. self-clocking.
  • When the application generates additional data:
    • if it fills a maximum segment (and the window is open): send it
    • else:
      • if there is unACKed data in transit: buffer it until an ACK arrives
      • else: send it
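Nagle's decision rule can be sketched as a tiny sender. The class and method names are hypothetical, and two simplifications are assumed: the flow/congestion window is always open, and one ACK acknowledges everything in flight.

```python
class NagleSender:
    """Sender-side SWS avoidance following Nagle's rule (sketch)."""

    def __init__(self, mss: int) -> None:
        self.mss = mss
        self.pending = b""            # data written by the app, not yet sent
        self.in_flight = False        # is there unACKed data in transit?
        self.sent: list[bytes] = []   # segments handed to the network

    def write(self, data: bytes) -> None:
        self.pending += data
        self._try_send()

    def on_ack(self) -> None:
        self.in_flight = False        # the ACK self-clocks the next send
        self._try_send()

    def _transmit(self, segment: bytes) -> None:
        self.sent.append(segment)
        self.in_flight = True

    def _try_send(self) -> None:
        while len(self.pending) >= self.mss:     # full segment: send it
            self._transmit(self.pending[:self.mss])
            self.pending = self.pending[self.mss:]
        if self.pending and not self.in_flight:  # nothing in transit: send it
            self._transmit(self.pending)
            self.pending = b""
        # otherwise keep the small remainder buffered until an ACK arrives


s = NagleSender(mss=4)
s.write(b"a")      # nothing in flight: sent at once
s.write(b"b")      # "a" unACKed: buffered
s.write(b"c")      # still buffered, now b"bc"
s.on_ack()         # ACK arrives: b"bc" goes out as one segment
s.write(b"wxyz")   # a full MSS is sent even though b"bc" is unACKed
print(s.sent)      # [b'a', b'bc', b'wxyz']
```

This is the self-clocking strategy: the arrival of an ACK, not a timer, decides when buffered small data goes out.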
Keeping the Pipe Full
• The 16-bit AdvertisedWindow caps the window at 64 KB.
• Delay × bandwidth product, assuming a 100 ms RTT:
  • T1 (1.5 Mbps): 18 KB
  • Ethernet (10 Mbps): 122 KB
  • T3 (45 Mbps): 549 KB
  • FDDI (100 Mbps): 1.2 MB
  • STS-3 (155 Mbps): 1.8 MB
  • STS-12 (622 Mbps): 7.4 MB
  • STS-24 (1.2 Gbps): 14.8 MB
• TCP extension to allow window scaling: put in the options field.
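The table entries can be reproduced, and the window-scale shift needed for each link estimated, with a short calculation. A sketch assuming a 100 ms RTT and KB = 1024 bytes (which appears to match the table's figures). The function names are hypothetical. The shift search is unbounded here, although RFC 7323 limits the window-scale shift to 14.

```python
MAX_UNSCALED_WINDOW = 2**16 - 1   # largest value of the 16-bit AdvertisedWindow


def bdp_bytes(bandwidth_bps: float, rtt_s: float = 0.1) -> float:
    """Delay x bandwidth product: bytes that fit 'in the pipe' per RTT."""
    return bandwidth_bps * rtt_s / 8


def min_window_scale(bandwidth_bps: float, rtt_s: float = 0.1) -> int:
    """Smallest shift so that MAX_UNSCALED_WINDOW << shift covers the BDP."""
    needed = bdp_bytes(bandwidth_bps, rtt_s)
    shift = 0
    while (MAX_UNSCALED_WINDOW << shift) < needed:
        shift += 1
    return shift


for name, bps in [("T1", 1.5e6), ("Ethernet", 10e6), ("T3", 45e6), ("STS-12", 622e6)]:
    print(f"{name:8} {bdp_bytes(bps) / 1024:8.1f} KB  shift={min_window_scale(bps)}")
```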
TCP Round Trip Time and Timeout
Q: How should the TCP timeout value be set?
• Longer than the RTT, but the RTT varies.
• Too short: premature timeout, unnecessary retransmissions.
• Too long: slow reaction to segment loss.
Q: How can the RTT be estimated?
• SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions.
• SampleRTT will vary, so we want a "smoother" estimated RTT: average several recent measurements, not just the current SampleRTT.
TCP Round Trip Time and Timeout
$\text{EstimatedRTT} = (1-\alpha)\cdot\text{EstimatedRTT} + \alpha\cdot\text{SampleRTT}$
• Exponential weighted moving average
• Influence of a past sample decreases exponentially fast
• Typical value: $\alpha = 0.125$
• Example: let $A_i$ be the estimated RTT at time $i$ and $M_i$ the sampled RTT at time $i$:
  $A_0 = M_0$
  $A_1 = (1-\alpha)M_0 + \alpha M_1$
  $A_2 = (1-\alpha)^2 M_0 + \alpha(1-\alpha)M_1 + \alpha M_2$
  …
TCP Round Trip Time and Timeout
Setting the timeout:
• EstimatedRTT plus a "safety margin"; large variation in SampleRTT → larger safety margin.
• First estimate how much SampleRTT deviates from EstimatedRTT:
  $\text{DevRTT} = (1-\beta)\cdot\text{DevRTT} + \beta\cdot|\text{SampleRTT} - \text{EstimatedRTT}|$ (typically $\beta = 0.25$)
• Then set the timeout interval:
  $\text{TimeoutInterval} = \text{EstimatedRTT} + 4\cdot\text{DevRTT}$
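The EstimatedRTT, DevRTT and TimeoutInterval formulas combine into a small estimator. A sketch with α = 0.125 and β = 0.25. The class name is hypothetical. The first-sample initialisation (DevRTT = SampleRTT/2) and updating DevRTT before EstimatedRTT follow RFC 6298 and are choices the slides do not specify. RFC 6298's 1 s minimum timeout is omitted, and callers must not feed samples from retransmitted segments.

```python
from typing import Optional


class RttEstimator:
    """EWMA estimates of RTT and its deviation, plus the timeout interval."""

    def __init__(self, alpha: float = 0.125, beta: float = 0.25) -> None:
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt: Optional[float] = None
        self.dev_rtt = 0.0

    def update(self, sample_rtt: float) -> float:
        """Feed one SampleRTT (never from a retransmitted segment)."""
        if self.estimated_rtt is None:
            # First measurement: initialisation as in RFC 6298.
            self.estimated_rtt = sample_rtt
            self.dev_rtt = sample_rtt / 2
        else:
            # DevRTT uses the previous EstimatedRTT, then EstimatedRTT moves.
            error = abs(sample_rtt - self.estimated_rtt)
            self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * error
            self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                                  + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator()
for sample_ms in [100, 100, 200]:
    print(est.update(sample_ms))   # 300.0, then 250.0, then 325.0
```

The jump in the third sample raises the timeout more through DevRTT (×4) than through EstimatedRTT, which is the "larger safety margin" effect.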
Congestion Control Overview
• Congestion, informally: "too many sources sending too much data too fast for the network to handle"
• Signal and detect congestion.
• Policy for a source to adjust its transmission rate to match the network's bandwidth capacity:
  • Decrease the rate upon a congestion signal.
  • Increase the rate for utilization.
  • Initialization to reach a steady state.
Network Utilization
• Queuing delay could (theoretically) approach infinity with increased load.
• Network power = throughput / delay.
[Figure: queuing delay vs. load and throughput/delay vs. load; the optimal load is marked at the knee.]
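The shape of the power curve can be illustrated with a textbook queueing model. This sketch assumes an M/M/1 queue, which the slide does not state. Delay is 1/(capacity − load), so power = load·(capacity − load), which peaks at half the capacity while delay grows without bound near capacity. The function name is hypothetical.

```python
def mm1_power(load: float, capacity: float = 1.0) -> float:
    """Network power (throughput / delay) for an M/M/1 queue."""
    if not 0.0 <= load < capacity:
        raise ValueError("load must be in [0, capacity)")
    delay = 1.0 / (capacity - load)   # mean time in system; -> infinity as load -> capacity
    return load / delay               # throughput equals offered load below capacity


loads = [i / 100 for i in range(100)]
best = max(loads, key=mm1_power)
print(best)                           # 0.5
```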
Approaches Towards Congestion Control
Two broad approaches towards congestion control:
• End-end congestion control:
  • no explicit feedback from the network
  • congestion inferred from the loss and delay observed by end systems
  • approach taken by TCP
• Network-assisted congestion control:
  • routers provide feedback to end systems
  • single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
  • explicit rate at which the sender should send
TCP Congestion Control
• End-end control (no network assistance)
• The sender limits transmission: the unACKed data, LastByteSent − LastByteAcked, must be ≤ min(cwnd, RcvWindow).
• cwnd is a dynamic function of perceived network congestion.
• How does the sender perceive congestion? Loss event = timeout or 3 duplicate ACKs.
• The TCP sender reduces its rate (cwnd) after a loss event.
TCP Congestion Control Outline
• Basic idea: probe (test) the currently available bandwidth in the network and adjust your sending rate accordingly.
• Start with a very small sending rate: 1 packet.
• Increase your sending rate as long as no congestion is detected.
  • How do you detect "no congestion"? No packet loss.
  • How do you increase your sending rate?
• Decrease your sending rate when you detect congestion.
  • How do you detect congestion? Packet loss (timeout, 3 duplicate ACKs).
  • How do you decrease your sending rate?
TCP Slow Start
• When the connection begins, cwnd = 1 MSS.
  • Example: MSS = 500 bytes and RTT = 200 ms give an initial rate of 20 kbps (500 × 8 bits / 0.2 s).
  • The available bandwidth may be >> MSS/RTT, so it is desirable to ramp up quickly to a respectable rate.
• When the connection begins, increase the rate exponentially fast until the first loss event.
• When a loss occurs (congestion signal), set cwnd to 1 MSS and restart with slow start.
TCP Slow Start (more)
• When the connection begins, increase the rate exponentially until the first loss event:
  • double cwnd every RTT
  • done by incrementing cwnd by 1 MSS for every ACK received
• Summary: the initial rate is slow but ramps up exponentially fast.
• When the first packet loss occurs (either a timeout or 3 duplicate ACKs), set cwnd to 1 MSS and start over.
[Figure: Host A sends one segment, then two segments, then four segments to Host B, one RTT apart.]
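The per-ACK rule and the per-RTT doubling are the same thing, which a short simulation makes concrete. A sketch with hypothetical function names, assuming no loss, one ACK per segment (no delayed ACKs), and every segment of a round ACKed before the next round starts. It also reproduces the 20 kbps initial-rate example.

```python
def slow_start_cwnd(rounds: int) -> list[int]:
    """cwnd (in MSS) at the start of each RTT, with +1 MSS per ACK."""
    cwnd, history = 1, []
    for _ in range(rounds):
        history.append(cwnd)
        acks_this_round = cwnd     # one ACK per segment sent this round
        cwnd += acks_this_round    # +1 MSS per ACK doubles cwnd per RTT
    return history


def initial_rate_bps(mss_bytes: int, rtt_ms: int) -> int:
    """Rate of the very first round: one MSS per RTT."""
    return mss_bytes * 8 * 1000 // rtt_ms


print(slow_start_cwnd(5))          # [1, 2, 4, 8, 16]
print(initial_rate_bps(500, 200))  # 20000 bits/s = 20 kbps
```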
TCP After the First Packet Loss
Question: Should we continue doing slow start throughout the lifetime of the TCP connection?
• Why did we increase cwnd exponentially at the beginning? Because we had no idea how much of our traffic the network could carry, so we needed to probe fast to find out.
• What do we know at the first packet loss event? That the network cannot carry our traffic at the rate we had when the packet loss occurred.
• What was our rate when the packet loss occurred? cwnd/RTT.
Refinement idea: use the knowledge obtained at the time of the packet loss to refine the slow start algorithm. How?
• Keep a threshold value, ssthresh, and set it to 1/2 of cwnd just before the loss event.
• cwnd increases exponentially until it reaches ssthresh, and linearly afterwards; this is called congestion avoidance.
Congestion Avoidance: TCP Tahoe
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before the timeout.
Implementation:
• Threshold variable ssthresh
• Set to some large value, e.g., 65K, when the connection is established
• At a loss event, ssthresh is set to 1/2 of cwnd just before the loss event
[Figure: TCP Tahoe congestion window size (segments, 0 to 14) vs. transmission round (0 to 15), with the threshold marked. The figure assumes that the first packet loss occurred when cwnd was 16, so ssthresh = 8 and cwnd = 1.]
Per ACK (cwnd in segments):
  if (cwnd < ssthresh) cwnd += 1;
  else cwnd += 1/cwnd;
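The Tahoe curve in the figure can be regenerated from the rules above. This sketch works per transmission round rather than per ACK. Slow start doubles cwnd, capped at ssthresh (a simplification of this sketch), and congestion avoidance adds 1 per round, which is what "+1/cwnd per ACK" adds up to over a window. The function name and the choice of round 8 for the second loss are taken as assumptions from the Reno example in this deck.

```python
def tahoe_trace(rounds: int, ssthresh: int, loss_rounds: set[int]) -> list[int]:
    """cwnd (segments) at each transmission round under TCP Tahoe.

    Any loss (timeout or 3 dup ACKs) sets ssthresh to half of cwnd and
    resets cwnd to 1.
    """
    cwnd, trace = 1, []
    for rnd in range(1, rounds + 1):
        trace.append(cwnd)
        if rnd in loss_rounds:
            ssthresh = cwnd // 2            # half of cwnd just before the loss
            cwnd = 1
        elif cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)  # slow start
        else:
            cwnd += 1                       # congestion avoidance
    return trace


# Figure scenario: an earlier loss at cwnd = 16 left ssthresh = 8, cwnd = 1;
# another loss is detected in round 8.
print(tahoe_trace(15, ssthresh=8, loss_rounds={8}))
# [1, 2, 4, 8, 9, 10, 11, 12, 1, 2, 4, 6, 7, 8, 9]
```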
Linear Window Increase Example During Congestion Avoidance
• Linear congestion window increase during the congestion avoidance phase: the window is increased by 1 packet after a full window of packets is ACKed.
[Figure: Host A sends one segment, then two, three and four segments to Host B, one RTT apart.]
Towards a Better Congestion Control Algorithm: TCP Reno
Question: Should we set cwnd to 1 both
• after a timeout, and
• when 3 duplicate ACKs are received?
Answer: a timeout before 3 duplicate ACKs is "alarming", so set cwnd to 1. BUT 3 duplicate ACKs before a timeout indicate that the network is capable of delivering some segments. Why not set cwnd to a bigger value than 1?
TCP Reno: set cwnd to half of its value, which equals ssthresh, and enter the linear increase phase.
TCP Reno Refinement (more)
[Figure: congestion window size (segments) vs. transmission round (0 to 15) for TCP Tahoe and TCP Reno, with the threshold marked.]
• After 3 duplicate ACKs:
  • cwnd is cut in half
  • the window then grows linearly
  • this is called "fast recovery"
• But after a timeout event:
  • cwnd is instead set to 1 MSS
  • the window then grows exponentially up to a threshold, then grows linearly
• At round 8, when cwnd = 12, a packet loss is detected by 3 duplicate ACKs.
  • Tahoe sets cwnd to 1 unconditionally.
  • Reno sets cwnd to half of its current value, which is 6.
  • Notice that ssthresh is set to 6 in both cases.
  • Also notice that if this were a timeout, Reno would also set cwnd to 1.
Summary: TCP Congestion Control
• When cwnd is below ssthresh, the sender is in the slow-start phase and the window grows exponentially.
• When cwnd is above ssthresh, the sender is in the congestion-avoidance phase and the window grows linearly.
• When a triple duplicate ACK occurs, ssthresh is set to cwnd/2 and cwnd is set to ssthresh.
• When a timeout occurs, ssthresh is set to cwnd/2 and cwnd is set to 1 MSS.
• The congestion window always oscillates!
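The four summary rules fit in one small state machine, in units of MSS. A sketch of the simplified Reno described here, not of a production stack: the class name and the default ssthresh of 64 segments are hypothetical. Full Reno fast recovery (RFC 5681) also inflates cwnd by 3 segments and by one per further duplicate ACK, which is omitted. Passing reno=False gives Tahoe's reaction for comparison.

```python
class CongestionWindow:
    """cwnd/ssthresh rules from the summary, in units of MSS."""

    def __init__(self, ssthresh: float = 64.0, reno: bool = True) -> None:
        self.cwnd = 1.0
        self.ssthresh = ssthresh
        self.reno = reno

    def on_new_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0                # slow start: doubles per RTT
        else:
            self.cwnd += 1.0 / self.cwnd    # congestion avoidance: about +1 per RTT

    def on_triple_dup_ack(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.ssthresh if self.reno else 1.0

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0


reno, tahoe = CongestionWindow(reno=True), CongestionWindow(reno=False)
for cc in (reno, tahoe):
    cc.cwnd = 12.0               # the round-8 situation from the Reno example
    cc.on_triple_dup_ack()
print(reno.cwnd, reno.ssthresh, tahoe.cwnd, tahoe.ssthresh)  # 6.0 6.0 1.0 6.0
```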
Steady State TCP Modeling
[Figure: congestion window of a long-lived TCP connection in steady state, showing the Additive Increase/Multiplicative Decrease (AIMD) sawtooth.]
Further Improving the TCP Congestion Control Algorithm: TCP Vegas
• Detect congestion before packet loss occurs by observing RTTs.
• The longer the RTT, the greater the congestion in the routers.
• Lower the transmission rate linearly when packet loss is imminent.
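The slide names only the idea behind Vegas. The per-RTT rule commonly attributed to Brakmo and Peterson compares the throughput expected with empty queues (cwnd/BaseRTT, where BaseRTT is the smallest RTT seen) against the measured throughput (cwnd/RTT). The sketch below follows that rule; the thresholds α = 1 and β = 3 segments and the function name are assumptions of this sketch.

```python
def vegas_adjust(cwnd: float, base_rtt: float, rtt: float,
                 alpha: float = 1.0, beta: float = 3.0) -> float:
    """One per-RTT TCP Vegas window update (cwnd in segments, RTTs in s)."""
    expected = cwnd / base_rtt               # throughput if queues were empty
    actual = cwnd / rtt                      # throughput actually achieved
    queued = (expected - actual) * base_rtt  # ~segments sitting in router queues
    if queued < alpha:
        return cwnd + 1                      # little queuing: grow linearly
    if queued > beta:
        return cwnd - 1                      # queues building: back off linearly
    return cwnd                              # in the target band: hold steady


print(vegas_adjust(10, base_rtt=0.100, rtt=0.100))  # 11: no extra delay
print(vegas_adjust(10, base_rtt=0.100, rtt=0.200))  # 9: RTT doubled, queues growing
```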