TCP 10
TCP – purpose • TCP provides reliable data transmission over an unreliable network. • TCP provides congestion control. • TCP provides flow control. • TCP passes messages. • Inputs: destination address, destination port, source port (socket), message. • Outputs: message, error reporting. • If TCP reports that the message has been delivered, then we can rest assured that the receiving application has received the data. What the application does with it is another story. • At least 85% of all traffic uses TCP, but I heard that 50% of traffic in S. Korea uses UDP (gaming). • UDP: no flow control and little or no error reporting. • Protocol stack (figure): BGP, FTP, HTTP, SMTP, and telnet run over TCP; ICMP, UDP, OSPF, and TCP run over IP.
TCP header • The IP header is 20 bytes (source IP, destination IP, protocol, TTL, …). • The TCP header is 20 bytes, plus options. • Fields, in order: source port, destination port, sequence #, ACK #, header length (4 bits), reserved (6 bits), flags (URG, ACK, PSH, RST, SYN, FIN), receive window (16 bits), checksum (16 bits), urgent pointer (16 bits), options and padding.
Ports – used so a single host can have many connections at the same time. When a packet arrives, it is distinguished by the source IP, source port, and destination port. More or less, the IPs and ports define an application. • Sequence number – indicates the first byte of the data. • ACK# is the next expected sequence number. • Header length is in 32-bit words. The field is 4 bits, so the maximum header size is 15 × 4 = 60 bytes. 20 bytes are used by the fixed header, so up to 40 more bytes can be used by options. • Flags • URG – urgent data is present and the urgent pointer is valid (e.g., Ctrl-C). • ACK – the ACK number is valid. • PSH – the receiver should pass this data to the application as soon as possible, as opposed to waiting for a full buffer. It should be set when this packet empties the sender’s outgoing buffer. • RST – reset the connection (something went wrong; good for detecting attacks). • SYN – synchronize sequence numbers. • FIN – the sender is finished sending data.
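The field layout and the header-length arithmetic above can be checked with a short decoder. This is a minimal sketch, assuming the 6-bit flag layout shown on the header slide; the name `parse_tcp_header` and the port numbers in the example are made up for illustration, and options are not decoded.

```python
import struct

# Flag names from the lowest bit (FIN) to bit 5 (URG), as on the header slide.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are ignored)."""
    (src_port, dst_port, seq, ack,
     offset_and_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_and_flags >> 12) * 4   # 4-bit field counts 32-bit words
    flag_bits = offset_and_flags & 0x3F         # low 6 bits hold the flags
    flags = {name for bit, name in enumerate(FLAG_NAMES) if flag_bits & (1 << bit)}
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags,
        "window": window, "checksum": checksum, "urgent": urgent,
    }

# Hypothetical SYN: ports are invented, seq#=2197 as in the handshake example.
syn = struct.pack("!HHIIHHHH", 40000, 80, 2197, 0, (5 << 12) | 0x02, 65535, 0, 0)
header = parse_tcp_header(syn)
print(header["header_len"], header["seq"], sorted(header["flags"]))
```

A header-length field of 5 gives the 20-byte minimum; the largest 4-bit value, 15, gives the 60-byte maximum.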
Connection establishment • Node A initiates a connection with node B: node A performs an active open, node B a passive open (listen). • A → B, send SYN: SYN=1, seq#=2197, ACK=0 • B → A, send SYN/ACK: SYN=1, seq#=197, ACK=1, ack#=2198 • A → B, send ACK (for the SYN): ACK=1, seq#=2198, ack#=198 • The initial sequence number depends on the implementation…
Connection establishment • If the first SYN is dropped, it is resent 3 seconds later. If that one is dropped, it is resent after 6 seconds, and so on. The maximum waiting time is 64 seconds in some implementations and can be as high as 180 seconds in others. • If the listener doesn’t get an ACK for its SYN/ACK, it retransmits after 3 seconds and backs off in the same way. • But if the listener gets a data packet, the ACK flag will be set, and this completes the connection establishment. • Connection setup data is often included in the options during connection establishment, e.g., the segment size. • More options are discussed later.
Connection termination • The FIN flag implies that no more data will be sent from that host. • A FIN from each side closes the connection. • A FIN from only one side puts the connection in the half-closed state. • Example: node A closes first • A sends a packet with FIN=1 and seq#=U (A enters FIN_WAIT_1) • B responds with ACK and ack#=U+1 (B enters CLOSE_WAIT) • A receives the ACK (A enters FIN_WAIT_2) • Now B closes • B sends a packet with FIN set and seq#=V (B enters LAST_ACK) • A responds with ACK and ack#=V+1 (A enters TIME_WAIT, stays there for 120 seconds, and then enters CLOSED) • B receives the ACK and enters CLOSED. • Use netstat to determine the state of the TCP connections.
Sending data • Either side can send data. The sequence number indicates where the first byte is placed in the receiver buffer. • The receiver responds with an ACK; the ack# indicates the next empty byte location in the buffer. • Example (the SYN had seq#=14; buffer positions 15–19 already hold ‘Steve’): • Sender: Seq#=20, Ack#=1001, Data=‘Hi’, size=2 bytes • Buffer: positions 20–21 now hold ‘Hi’ • Receiver: Seq#=1001, Ack#=22, data size=0
Sending data – a segment arrives out of order (the SYN had seq#=14; buffer positions 15–19 hold ‘Steve’) • Sender: Seq#=20, Ack#=1001, Data=‘Hi’, size=2 bytes. This segment does not arrive. • Sender: Seq#=22, Ack#=1001, Data=‘Bye’, size=3 bytes. It is stored at positions 22–24, leaving a gap at 20–21. • Receiver: Seq#=1001, Ack#=20, data size=0 (the next expected byte is still 20). • Sender: Seq#=20, Ack#=1001, Data=‘Hi’, size=2 bytes. It arrives and fills the gap. • Receiver: Seq#=1001, Ack#=25, data size=0. • Note: here the receiver is not sending data, so its sequence number never changes and the ack# in the sender’s packets never changes. But the definitions of the ACK and sequence numbers remain valid. • Note that SYN and FIN packets are special cases: they carry no data, but the ACK numbers still increment.
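The cumulative ACK rule in the example above can be sketched in a few lines. This is an illustrative model only: the class name `Receiver` is made up, sequence numbers count bytes as on the slides, and overlapping or partially duplicated segments are not handled.

```python
class Receiver:
    """Toy receiver that returns the cumulative ack# for each arriving segment."""

    def __init__(self, next_expected: int):
        self.next_expected = next_expected   # the ack# the receiver would send now
        self.pending = {}                    # seq# -> data held for out-of-order segments

    def receive(self, seq: int, data: bytes) -> int:
        if seq >= self.next_expected:        # old duplicates are simply ignored
            self.pending[seq] = data
        # Advance over every segment that is now contiguous with the in-order data.
        while self.next_expected in self.pending:
            self.next_expected += len(self.pending.pop(self.next_expected))
        return self.next_expected

receiver = Receiver(next_expected=20)        # 'Steve' already fills positions 15-19
ack_after_bye = receiver.receive(22, b"Bye") # 'Hi' is missing, so the ack# stays at 20
ack_after_hi = receiver.receive(20, b"Hi")   # the gap is filled, so the ack# jumps to 25
print(ack_after_bye, ack_after_hi)
```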
Retransmission time-out • How to decide when a packet should be retransmitted? • There are two methods. Here we talk about the first: when the ACK has not been received in a long time, TCP assumes that the packet was dropped. • How long is a long time? There is no good solution. • Van Jacobson’s algorithm sets the retransmission time-out (RTO) from measured round-trip times, but it does not work all that well. Really, it is MinRTO that controls when time-outs occur. More analysis is required.
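The slide does not spell out Jacobson's estimator, so the following is a sketch of the form standardized in RFC 6298: a smoothed RTT plus four times a smoothed deviation, with gains 1/8 and 1/4. The class name, the 3 s initial value (taken from the SYN retransmission slide), and the 1 s and 64 s clamps are assumptions for illustration; real stacks choose their own values.

```python
class RtoEstimator:
    """RTO = SRTT + 4 * RTTVAR, clamped to [min_rto, max_rto]."""

    def __init__(self, min_rto: float = 1.0, max_rto: float = 64.0, initial_rto: float = 3.0):
        self.srtt = None            # smoothed round-trip time
        self.rttvar = None          # smoothed mean deviation
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.initial_rto = initial_rto

    def update(self, sample: float) -> float:
        """Feed one RTT measurement (seconds) and return the new RTO."""
        if self.srtt is None:
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # The deviation is updated first, using the old SRTT.
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - sample)
            self.srtt = 0.875 * self.srtt + 0.125 * sample
        return self.rto()

    def rto(self) -> float:
        if self.srtt is None:
            return self.initial_rto
        return min(self.max_rto, max(self.min_rto, self.srtt + 4 * self.rttvar))

clamped = RtoEstimator()
unclamped = RtoEstimator(min_rto=0.0)
print(clamped.update(0.1), unclamped.update(0.1))
```

With a 100 ms path the raw estimate is well under one second, so the result is set by the minimum, which is the slide's point about MinRTO.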
RTO analysis • Using the July 25, 2001 snapshot of round-trip times from the NLANR data set, we computed the empirical probability of spurious timeouts. • The total data set consists of nearly 13,000 connections between 122 sites and 17.5 million round-trip time measurements. • The data consists of one time series of round-trip times per connection, each containing 1440 round-trip times (one sample per minute over the entire day).
Detecting drops with triple DUP ACKs (buffer positions 15–19 hold ‘Steve’) • Sender: Seq#=20, Ack#=1001, Data=‘Hi’, size=2 bytes. Receiver: Seq#=1001, Ack#=22, data size=0. • Sender: Seq#=22, Ack#=1001, Data=‘Bye’, size=3 bytes. This packet is dropped. • Sender: Seq#=25, Ack#=1001, Data=‘Wazup’, size=5 bytes, stored at 25–29. Receiver: Seq#=1001, Ack#=22, Rwin=2 (first DUP ACK). • Sender: Seq#=30, Ack#=1001, Data=‘Give’, size=4 bytes, stored at 30–33. Receiver: Seq#=1001, Ack#=22, Rwin=2 (second DUP ACK). • Sender: Seq#=34, Ack#=1001, Data=‘Me’, size=2 bytes, stored at 34–35. Receiver: Seq#=1001, Ack#=22, Rwin=2 (third DUP ACK). • Sender retransmits Seq#=22, Ack#=1001, Data=‘Bye’, size=3 bytes, which fills the gap at 22–24. Receiver: Seq#=1001, Ack#=36, Rwin=2.
Why triple DUP ACK? • Why not one DUP ACK? • Bennett and Partridge, “Packet reordering is not pathological network behavior,” 1999. This paper showed that packet reordering can and does occur. Further research into this could be a project. • The reason for the packet reordering is that the routers have parallel paths through them. So, depending on the order of arrival and the packet sizes, the outgoing order can differ from the incoming order. • Supposedly this was only a problem with older-model Juniper routers. There are many of these routers out there. Cisco field day! • Reordering only happens when the packets arrive at nearly the same time. This might not happen that much in TCP (see ACK clocking later). • However, this is an active research area. • Load balancing can cause packets to take different paths. This can cause reordering. Load balancing is a good project topic. • Route flap can also cause reordering. • Why not a larger DUPThres (larger than 3)? • This causes other problems. • Limited transmit can help. See my papers on TCP-PR for details. • Using triple DUP ACKs instead of the RTO is called fast retransmit because the drop is detected faster.
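The effect of the duplicate-ACK threshold can be seen by replaying an ACK stream. This is a small sketch with a made-up function name; it only counts duplicates and does not model retransmission itself.

```python
def fast_retransmit_points(acks, dup_thresh=3):
    """Return (index, ack#) pairs where the dup_thresh-th duplicate ACK arrives."""
    triggers = []
    last_ack = None
    duplicates = 0
    for index, ack in enumerate(acks):
        if ack == last_ack:
            duplicates += 1
            if duplicates == dup_thresh:
                triggers.append((index, ack))
        else:
            last_ack = ack      # a new ACK resets the duplicate count
            duplicates = 0
    return triggers

# ACK stream from the 'Hi' / 'Bye' / 'Wazup' / 'Give' / 'Me' example: 'Bye' was dropped.
drop_stream = [22, 22, 22, 22, 36]
# Mild reordering: one duplicate ACK, then the late packet arrives and the ACK advances.
reorder_stream = [22, 22, 25]

print(fast_retransmit_points(drop_stream))
print(fast_retransmit_points(reorder_stream))
print(fast_retransmit_points(reorder_stream, dup_thresh=1))
```

With a threshold of 1 the reordered stream triggers a needless retransmission; with 3 it does not, while the real drop is still detected.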
Flow control – so the receiver doesn’t get overwhelmed. • The number of unacknowledged packets must be less than the receiver window. • As the receiver’s buffer fills, the receiver decreases the receiver window. • Example (the SYN had seq#=14; buffer positions 15–19 hold ‘Steve’): • Sender: Seq#=20, Ack#=1001, Data=‘Hi’, size=2 bytes. Receiver: Seq#=1001, Ack#=22, data size=0, Rwin=2. • Sender: Seq#=22, Ack#=1001, Data=‘By’, size=2 bytes. The buffer is now full. Receiver: Seq#=1001, Ack#=24, data size=0, Rwin=0. • The application reads the buffer, which now starts at position 24. Receiver: Seq#=1001, Ack#=24, data size=0, Rwin=9. • Sender: Seq#=24, Ack#=1001, Data=‘e’, size=1 byte.
Flow control – so the receiver doesn’t get overwhelmed (window probe). • The number of unacknowledged packets must be less than the receiver window. • As the receiver’s buffer fills, the receiver decreases the receiver window. • Example (the SYN had seq#=14; buffer positions 15–19 hold ‘Steve’): • Sender: Seq#=20, Ack#=1001, Data=‘Hi’, size=2 bytes. Receiver: Seq#=1001, Ack#=22, data size=0, Rwin=2. • Sender: Seq#=22, Ack#=1001, Data=‘By’, size=2 bytes. Receiver: Seq#=1001, Ack#=24, data size=0, Rwin=0. • The application reads the buffer. Receiver: Seq#=1001, Ack#=24, data size=0, Rwin=9. • After 3 s the sender sends a window probe: Seq#=24, Ack#=1001, size=0 bytes. • Receiver: Seq#=1001, Ack#=24, data size=0, Rwin=9. • Sender: Seq#=24, Ack#=1001, Data=‘e’, size=1 byte.
Flow control – so the receiver doesn’t get overwhelmed (repeated window probes). • The number of unacknowledged packets must be less than the receiver window. • As the receiver’s buffer fills, the receiver decreases the receiver window. • Example (the SYN had seq#=14; buffer positions 15–19 hold ‘Steve’): • Sender: Seq#=20, Ack#=1001, Data=‘Hi’, size=2 bytes. Receiver: Seq#=1001, Ack#=22, data size=0, Rwin=2. • Sender: Seq#=22, Ack#=1001, Data=‘By’, size=2 bytes. Receiver: Seq#=1001, Ack#=24, data size=0, Rwin=0. • After 3 s the sender sends a window probe: Seq#=24, Ack#=1001, size=0 bytes. Receiver: Seq#=1001, Ack#=24, data size=0, Rwin=0. • After another 6 s the sender probes again: Seq#=24, Ack#=1001, size=0 bytes. • The maximum time between probes is 60 or 64 seconds.
Receiver window • The receiver window field is 16 bits. • Default receiver window • By default, the receiver window is in units of bytes. • Hence 64 KB is the maximum receiver window for any (default) implementation. • Ethernet frames carry 1500 bytes (TCP data = 1460 bytes). • So that would give 44 packets. • If the bit-rate is 10 Mbps, what RTT makes this window size equal to the bandwidth-delay product? • Receiver window scale • During the SYN exchange, one option is the receiver window scale. • This option gives the number of bits by which to shift the receiver window. • E.g., if rec win scale = 4 and rec win = 10, then the real receiver window is 10<<4 = 160 bytes.
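The arithmetic on this slide, including the question about the RTT, can be worked through directly. This is a sketch with made-up function names; it takes the largest unscaled window as 65535 bytes (the maximum 16-bit value).

```python
def effective_window(rec_win: int, scale: int = 0) -> int:
    """Receiver window in bytes after applying the window-scale shift."""
    return rec_win << scale

def rtt_for_full_pipe(window_bytes: int, bit_rate: float) -> float:
    """RTT (seconds) at which window_bytes equals the bandwidth-delay product."""
    return window_bytes * 8 / bit_rate

max_window = effective_window(0xFFFF)            # largest unscaled window, 65535 bytes
packets = max_window // 1460                     # full-sized Ethernet segments that fit
rtt = rtt_for_full_pipe(max_window, 10_000_000)  # 10 Mbps link

print(effective_window(10, 4), packets, round(rtt * 1000, 1), "ms")
```

At 10 Mbps the unscaled window fills the pipe only if the RTT is about 52 ms or less; on longer paths the window, not the link, limits the rate.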
Congestion Control • Make sure not to overwhelm the network. • How much data to put into the network? • The sender maintains a congestion window (cwnd), which is the maximum number of unacknowledged packets. • InFlight is the number of unacked packets. • If InFlight < cwnd, then a packet can be sent. • When an ACK arrives, InFlight decreases, so another packet can be sent.
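The InFlight < cwnd rule can be sketched as a tiny sender. The class and method names are illustrative, the window is fixed, and packets are counted rather than bytes.

```python
class WindowedSender:
    """Sends whenever InFlight < cwnd; cwnd is held constant in this sketch."""

    def __init__(self, cwnd: int):
        self.cwnd = cwnd
        self.in_flight = 0      # packets sent but not yet acknowledged
        self.next_packet = 0    # index of the next new packet

    def send_allowed(self) -> list:
        """Send as many packets as the window allows and return their indices."""
        sent = []
        while self.in_flight < self.cwnd:
            sent.append(self.next_packet)
            self.next_packet += 1
            self.in_flight += 1
        return sent

    def on_ack(self, newly_acked: int = 1) -> list:
        """An ACK lowers InFlight, which may release more packets."""
        self.in_flight -= newly_acked
        return self.send_allowed()

sender = WindowedSender(cwnd=4)
first_burst = sender.send_allowed()   # four packets go out back to back
blocked = sender.send_allowed()       # window is full, nothing more can be sent
after_ack = sender.on_ack()           # one ACK releases exactly one packet
print(first_burst, blocked, after_ack)
```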
Suppose that cwnd = 4*MSS • MSS is the maximum segment size = min of the segment sizes of sender and receiver. It is negotiated during the SYN exchange. • Suppose MSS=1000. • Send Seq#=20, Ack#=1001, size=1 MSS → InFlight=1 MSS • Send Seq#=1020, Ack#=1001, size=1 MSS → InFlight=2 MSS • Send Seq#=2020, Ack#=1001, size=1 MSS → InFlight=3 MSS • Send Seq#=3020, Ack#=1001, size=1 MSS → InFlight=4 MSS (the window is full) • Receive Seq#=1001, Ack#=1020, data size=0 → InFlight=3 MSS • Send Seq#=4020, Ack#=1001, size=1 MSS → InFlight=4 MSS • Receive Seq#=1001, Ack#=2020, data size=0 → InFlight=3 MSS • Send Seq#=5020, Ack#=1001, size=1 MSS → InFlight=4 MSS
Suppose that cwnd = 4*MSS (ACK clocking) • MSS is the maximum segment size = min of the segment sizes of sender and receiver. It is negotiated during the SYN exchange. • Suppose MSS=1000. • Send Seq#=20, Ack#=1001, size=1 MSS → InFlight=1 MSS • Send Seq#=1020, Ack#=1001, size=1 MSS → InFlight=2 MSS • Send Seq#=2020, Ack#=1001, size=1 MSS → InFlight=3 MSS • Send Seq#=3020, Ack#=1001, size=1 MSS → InFlight=4 MSS (the window is full) • Receive Seq#=1001, Ack#=1020, data size=0 → InFlight=3 MSS • Send Seq#=4020, Ack#=1001, size=1 MSS → InFlight=4 MSS • Receive Seq#=1001, Ack#=2020, data size=0 → InFlight=3 MSS • Send Seq#=5020, Ack#=1001, size=1 MSS → InFlight=4 MSS • ACK clocking: what is the maximum rate at which ACKs can arrive at the sender?
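ACK clocking • Path (figure): links of 10 Mbps, 100 Mbps, and 100 Mbps; the 10 Mbps link is the bottleneck. • Packets can leave the sender at 100 Mbps.
ACK clocking • Path (figure): links of 10 Mbps, 100 Mbps, and 100 Mbps; the 10 Mbps link is the bottleneck. • Packets can leave the sender at 100 Mbps. • Packets leave the bottleneck at a rate of 10 Mbps. • At what rate do packets leave the link after the bottleneck?
ACK clocking • Path (figure): links of 10 Mbps, 100 Mbps, and 100 Mbps; the 10 Mbps link is the bottleneck. • Packets can leave the sender at 100 Mbps. • Packets leave the bottleneck at a rate of 10 Mbps. • At what rate do packets leave the link after the bottleneck? Ans: 10 Mbps, because they arrive at 10 Mbps. • What about the ACKs? At what rate do ACKs leave the receiver?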
ACK clocking • Path (figure): links of 10 Mbps, 100 Mbps, and 100 Mbps; the 10 Mbps link is the bottleneck. • Packets can leave the sender at 100 Mbps. • Packets leave the bottleneck at a rate of 10 Mbps. • At what rate do packets leave the link after the bottleneck? Ans: 10 Mbps, because they arrive at 10 Mbps. • What about the ACKs? At what rate do ACKs leave the receiver? Ans: 40/1040 × 10 Mbps (a 40-byte ACK for each 1040-byte packet). Put another way, at a rate such that if one packet is sent for each ACK, the packets are sent at 10 Mbps. • What about the packets sent in response to these ACKs?
ACK clocking • Path (figure): links of 10 Mbps, 100 Mbps, and 100 Mbps; the 10 Mbps link is the bottleneck. • Packets can leave the sender at 100 Mbps. • Packets leave the bottleneck at a rate of 10 Mbps. • At what rate do packets leave the link after the bottleneck? Ans: 10 Mbps, because they arrive at 10 Mbps. • What about the ACKs? At what rate do ACKs leave the receiver? Ans: 40/1040 × 10 Mbps (a 40-byte ACK for each 1040-byte packet). Put another way, at a rate such that if one packet is sent for each ACK, the packets are sent at 10 Mbps. • What about the packets sent in response to these ACKs? 10 Mbps. Perfect!!!
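The 40/1040 figure can be reproduced numerically. This is a sketch that assumes, as the ratio on the slide suggests, 1040-byte data packets (1000 bytes of data plus 40 bytes of headers), 40-byte ACKs, and one ACK per data packet; the function names are made up.

```python
def packets_per_second(bottleneck_bps: float, packet_bytes: int = 1040) -> float:
    """Rate at which full-sized packets drain from the bottleneck link."""
    return bottleneck_bps / (packet_bytes * 8)

def ack_bit_rate(bottleneck_bps: float, packet_bytes: int = 1040, ack_bytes: int = 40) -> float:
    """Bit rate of the ACK stream when every data packet is acknowledged."""
    return packets_per_second(bottleneck_bps, packet_bytes) * ack_bytes * 8

def clocked_send_rate(bottleneck_bps: float, packet_bytes: int = 1040) -> float:
    """Bit rate of new data when exactly one packet is sent per arriving ACK."""
    return packets_per_second(bottleneck_bps, packet_bytes) * packet_bytes * 8

print(round(ack_bit_rate(10e6)), round(clocked_send_rate(10e6)))
```

The ACK stream is only about 0.38 Mbps, yet sending one packet per ACK reproduces the 10 Mbps bottleneck rate, which is the self-clocking property.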
Congestion control • ACK clocking makes the sender not send any faster than the bottleneck link speed. • But how to “fill the pipe?” We only send cwnd packets in a burst. How big should cwnd be? • Figure: the sender sends at a “burst” rate of 10 Mbps, then sends no packets (wasted bandwidth), then sends at the “burst” rate of 10 Mbps again.
Congestion control • ACK clocking makes the sender not send any faster than the bottleneck link speed. • But how to “fill the pipe?” We only send cwnd packets in a burst. How big should cwnd be? • The number of packets sent in one RTT is the cwnd. In order to not waste bandwidth, how many packets should be sent per RTT?
Congestion control • ACK clocking makes the sender not send any faster than the bottleneck link speed. • But how to “fill the pipe?” We only send cwnd packets in a burst. How big should cwnd be? • The number of packets sent in one RTT is the cwnd. In order to not waste bandwidth, how many packets should be sent per RTT? • cwnd (bytes) = bottleneck link byte-rate (bytes/s) × RTT (s) • Bandwidth-delay product = link byte-rate (bytes/s) × RTT (s)
Congestion control • Ideally cwnd = bandwidth-delay product. • This ignores fairness. If there are N flows that use the same link, then ideally cwnd = bandwidth-delay product/N. • But how to find this value???
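A quick calculation shows the sizes involved. The 100 ms RTT and the five competing flows are hypothetical values chosen for illustration, and the function names are made up; MSS=1000 follows the earlier example.

```python
def bandwidth_delay_product(link_bps: float, rtt_s: float) -> float:
    """Bytes that fit in the pipe: link byte-rate times RTT."""
    return link_bps / 8 * rtt_s

def ideal_cwnd_packets(link_bps: float, rtt_s: float, mss: int = 1000, flows: int = 1) -> float:
    """Ideal per-flow window in packets when `flows` flows share the link equally."""
    return bandwidth_delay_product(link_bps, rtt_s) / mss / flows

bdp = bandwidth_delay_product(10e6, 0.100)       # 10 Mbps bottleneck, 100 ms RTT
alone = ideal_cwnd_packets(10e6, 0.100)          # a single flow
shared = ideal_cwnd_packets(10e6, 0.100, flows=5)
print(round(bdp), round(alone), round(shared))
```

The sender knows neither the bottleneck rate nor N, which is why TCP has to probe for the right value.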
TCP congestion control • Theme: probe the system. • Slowly increase cwnd until there is a packet drop. That must imply that the cwnd size (or the sum of window sizes) is larger than the BWDP. • Once a packet is dropped, decrease the cwnd, and then continue to slowly increase. • Two phases: • Slow start, to get to the ballpark of the correct cwnd. • Congestion avoidance, to oscillate around the correct cwnd size. • State diagram (figure): connection establishment leads to slow start; slow start moves to congestion avoidance when cwnd > ssthresh; a triple DUP ACK leads to congestion avoidance; a timeout leads back to slow start; both phases can end in connection termination.
Slow start • Used when the connection first starts (and after a timeout, for today’s TCPs). • Cwnd starts at 1 or 2 MSS. • For each non-DUP ACK received, the window size increases by one. • This increase continues until the window reaches the value of ssthresh. • The initial value of ssthresh is often large (taken as infinite), so Rwin limits the growth of the window.
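The per-ACK increase can be simulated round by round. This is a sketch under simplifying assumptions: no drops, one ACK per packet (no delayed ACKs), windows counted in packets, and an ssthresh of 8 chosen only for illustration.

```python
def slow_start(cwnd: int = 1, ssthresh: int = 8, rtts: int = 5) -> list:
    """Return cwnd at the start and after each RTT, adding one per ACK up to ssthresh."""
    history = [cwnd]
    for _ in range(rtts):
        acks_this_rtt = cwnd            # every packet sent in this RTT is ACKed
        for _ in range(acks_this_rtt):
            if cwnd >= ssthresh:
                break                   # slow start ends at ssthresh
            cwnd += 1
        history.append(cwnd)
    return history

print(slow_start())
```

Adding one per ACK means adding cwnd per round trip, so the window doubles every RTT until it reaches ssthresh.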
Slow start (figure: packet exchange with the cwnd value at each step) • Handshake: SYN Seq#=20, Ack#=X; SYN/ACK Seq#=1000, Ack#=21; ACK Seq#=21, Ack#=1001. • cwnd=1: send Seq#=21, Ack#=1001, size=1000; the ACK Seq#=1001, Ack#=1021, size=0 returns. • cwnd=2: send Seq#=1021 and Seq#=2021, size=1000 each. • Each returning ACK raises cwnd by one (3, 4, 5, 6, 7, 8) and releases more packets. • At cwnd=8 the pipe is full!
Slow start (figure: the same exchange with RTTs marked) • cwnd is 1 in the first RTT, 2 in the second, 3 and then 4 in the third, and 5 through 8 in the fourth. • Cwnd doubles every RTT!! • At cwnd=8 the pipe is full! • What is happening after that? RTT??
Slow start (figure: the same exchange with RTTs marked) • cwnd is 1 in the first RTT, 2 in the second, 3 and then 4 in the third, and 5 through 8 in the fourth. • Cwnd doubles every RTT!! • What is happening once the pipe is full? Now the queue is filling. Either it will fill and drop a packet, or RecWin will stop cwnd from increasing. • RTT??
• If RecWin!=inf and RecWin < bandwidth-delay product + queue size, and there are no other packets, then there will never be a drop. That is a lot of conditions, but a large number of flows do not experience drops. • If RecWin and ssthresh are infinite and the outgoing link of the sender is not the bottleneck, then eventually there will be a drop. If the drop is detected with a triple DUP ACK, then cwnd = cwnd/2 and congestion avoidance is entered. • If the drop(s) are detected with a timeout, then ssthresh = cwnd/2, cwnd = 1, and slow start is continued. • If ssthresh < bandwidth-delay product + queue size and RecWin > ssthresh, then congestion avoidance is entered.
Congestion Avoidance • Basics: additive increase, multiplicative decrease (AIMD)!! • Rough view: for every cwnd’s worth of packets, cwnd is incremented by one. When there is a drop, cwnd=cwnd/2. • Figure: packet timeline with columns Seq# (MSS) and cwnd. cwnd grows from 4 to 6, repeated ACKs for packet 15 signal a drop, cwnd is halved to 3, and then it grows again.
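The rough view of AIMD can be traced with a few lines. This is a sketch: one step stands for one cwnd's worth of packets, the round in which the drop occurs is supplied by hand, and the function name is made up.

```python
def aimd(cwnd: int, rounds: int, drop_rounds: set) -> list:
    """Return cwnd after each round: +1 per round, halved in rounds with a drop."""
    trace = []
    for round_number in range(rounds):
        if round_number in drop_rounds:
            cwnd = max(1, cwnd // 2)    # multiplicative decrease
        else:
            cwnd += 1                   # additive increase
        trace.append(cwnd)
    return trace

print(aimd(cwnd=4, rounds=6, drop_rounds={3}))
```

The trace climbs linearly, is cut in half at the drop, and climbs again, which gives the sawtooth shape.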
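Rough view of TCP congestion control (figure: cwnd versus time) • Slow start raises cwnd until drops occur or cwnd=ssthresh. • Congestion avoidance follows; cwnd falls at each drop and then climbs again. • When drops lead to a timeout, the sender returns to slow start and then to congestion avoidance.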
TCP - more detailed view • Delayed ACKs • The worry was that the network was going to be all jammed up with ACKs. • So instead of sending an ACK for every packet, delay the ACK and maybe ACK two packets at once. • Generate an ACK for at least every other packet. • Don’t delay an ACK by more than 500 ms (the exact number depends on the implementation). • If packets are out of order, generate an ACK for every packet. • Also, immediately send an ACK when a “gap” in the buffer is filled. • Delayed ACKs can greatly slow down a connection. • E.g., the ACK for the first packet is delayed by 500 ms. • Depending on the implementation, cwnd will grow more slowly.
Details - Fast recovery • cwnd after a drop • Recall, TCP only sends packets when InFlight < cwnd. • InFlight only decreases when a new ACK is received, i.e., a DUP ACK does not cause InFlight to change. • If a DUP ACK arrives, it means that a packet arrived at the receiver and an ACK was sent. So the number of packets in the network has decreased, and InFlight should decrease. • But maybe the network has duplicated the ACK. To be conservative, leave InFlight as is (I guess).
Fast recovery • Upon the first two DUP ACKs, do nothing. Don’t send any packets (InFlight is the same). • Upon the third DUP ACK, • set ssthresh=cwnd/2. • cwnd=cwnd/2+3 • Retransmit the requested packet. • Upon every further DUP ACK, cwnd=cwnd+1. • If InFlight<cwnd, send a packet and increment InFlight. • When a new ACK arrives, set cwnd=ssthresh (RENO). • When an ACK arrives that ACKs all packets that were outstanding when the first drop was detected, cwnd=ssthresh (NEWRENO).
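The Reno rules listed above can be sketched as a small state machine. The class name and attributes are illustrative, windows are counted in packets with integer halving, and window growth outside fast recovery is deliberately left out.

```python
class RenoFastRecovery:
    """Tracks cwnd and ssthresh through DUP ACKs and the ACK that ends recovery."""

    def __init__(self, cwnd: int):
        self.cwnd = cwnd
        self.ssthresh = None
        self.dup_acks = 0
        self.in_recovery = False
        self.retransmissions = 0

    def on_dup_ack(self) -> None:
        self.dup_acks += 1
        if self.in_recovery:
            self.cwnd += 1                      # every further DUP ACK inflates cwnd
        elif self.dup_acks == 3:
            self.ssthresh = self.cwnd // 2
            self.cwnd = self.cwnd // 2 + 3
            self.retransmissions += 1           # resend the requested packet
            self.in_recovery = True
        # the first two DUP ACKs change nothing

    def on_new_ack(self) -> None:
        if self.in_recovery:
            self.cwnd = self.ssthresh           # Reno: deflate on the first new ACK
            self.in_recovery = False
        self.dup_acks = 0

reno = RenoFastRecovery(cwnd=6)
trace = []
for _ in range(5):                              # five DUP ACKs in a row
    reno.on_dup_ack()
    trace.append(reno.cwnd)
reno.on_new_ack()
trace.append(reno.cwnd)
print(trace)
```

Starting from cwnd=6, the third DUP ACK gives 6/2+3 = 6, later DUP ACKs give 7 and 8, and the new ACK sets cwnd to ssthresh = 3.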
Fast recovery (figure) • Packet timeline with columns Seq# (MSS), cwnd, and InFlight. • cwnd grows from 4 to 6, then DUP ACKs for packet 15 arrive. • On the third DUP ACK, cwnd = 6/2+3 = 6; further DUP ACKs raise it to 7 and 8. • When the new ACK arrives, cwnd = ssthresh = 3 and InFlight = 3.
Fast recovery – multiple drops – RENO (figure) • Packet timeline with columns Seq# (MSS), cwnd, and InFlight; two packets from the same window (12 and 15) are dropped. • First drop: DUP ACKs for 12 arrive, cwnd = 6/2+3 = 6, further DUP ACKs raise it to 7 and 8, and recovery ends with cwnd = 3. • Second drop: DUP ACKs for 15 arrive, cwnd = 2+3 = 5, and recovery ends with cwnd = 2. • Why is this bad? The first drop told us that we were sending too fast. The second drop tells us the same thing (already). So why react to the same news twice? This is what NewReno fixes.
Fast Recovery – multiple drops - NewReno • The problem was that one of the packets that was outstanding when the drop was detected was also dropped. • Solution (NewReno) • When a drop is detected, • ssthresh=cwnd/2 • cwnd=cwnd/2+3 • Recover = seq# of the largest byte sent. • Retransmit the dropped packet. • Upon a DUP ACK, increment cwnd and send if InFlight<cwnd. • If the ACK is larger than the previous ACK but smaller than recover (a partial ACK): • Suppose that the previous ack#=X and now ack#=Y<recover. • Retransmit the dropped packet. • cwnd = cwnd – (Y-X)+1 • Of course, InFlight = InFlight-(Y-X). • So transmit another packet (that makes two transmissions). • If ACK>recover, • cwnd=ssthresh • Exit fast recovery.
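The partial-ACK bookkeeping above can be sketched as follows. The class name and fields are made up, sequence numbers and windows are counted in packets, and actual packet transmission is not modelled, so InFlight only reflects the decrease caused by ACKs.

```python
class NewRenoRecovery:
    """State kept from the moment a drop is detected until fast recovery exits."""

    def __init__(self, cwnd: int, in_flight: int, last_ack: int, highest_sent: int):
        self.ssthresh = cwnd // 2
        self.cwnd = cwnd // 2 + 3
        self.recover = highest_sent     # largest seq# sent when the drop was detected
        self.in_flight = in_flight
        self.last_ack = last_ack
        self.retransmits = 1            # the dropped packet is resent on entry
        self.active = True

    def on_dup_ack(self) -> None:
        self.cwnd += 1

    def on_ack(self, ack: int) -> None:
        advanced = ack - self.last_ack  # Y - X
        if ack > self.recover:
            self.cwnd = self.ssthresh   # full ACK: deflate and exit fast recovery
            self.active = False
        else:
            self.retransmits += 1       # partial ACK: resend the next missing packet
            self.cwnd = self.cwnd - advanced + 1
        self.in_flight -= advanced
        self.last_ack = ack

recovery = NewRenoRecovery(cwnd=14, in_flight=14, last_ack=17, highest_sent=29)
entry_cwnd = recovery.cwnd              # 14/2 + 3 = 10
for _ in range(9):                      # nine more DUP ACKs inflate cwnd to 19
    recovery.on_dup_ack()
recovery.in_flight = 19                 # assumed: the sender kept the window full
recovery.on_ack(21)                     # partial ACK, since 21 < recover = 29
partial = (recovery.cwnd, recovery.in_flight)
recovery.on_ack(35)                     # full ACK, since 35 > recover
print(entry_cwnd, partial, recovery.cwnd, recovery.active)
```

The partial ACK reproduces the values 16 = 19-(21-17)+1 and 15 = 19-4 used in the multiple-drop figure, and the full ACK leaves cwnd at ssthresh = 7.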
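Fast Recovery – single drop - NewReno (figure) • Columns: InFlight and cwnd. cwnd=14 when the DUP ACKs for 17 arrive. • Recover=29; cwnd = 14/2+3 = 10 and grows by one with each further DUP ACK (11, 12, …, 16). • The ACK for 31 is larger than recover, so cwnd = ssthresh = 7 and fast recovery ends. • Note how the actual number outstanding is always 7.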
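Fast Recovery – multiple drops - NewReno (figure) • Columns: InFlight and cwnd. cwnd=14 when the DUP ACKs for 17 arrive; Recover=29; cwnd = 10 and grows by one with each further DUP ACK. • A partial ACK arrives with ack#=21 (< recover): InFlight = 19-4 = 15 and cwnd = 19-(21-17)+1 = 16. • NewReno sends two packets for every ACK indicating a multiple drop. • The ACK for 35 is larger than recover: cwnd = 7, exit fast recovery. • 2 drops take 2 RTTs to recover. N drops take N RTTs to recover. • If N*RTT > RTO: with the slow-but-steady variant there is no timeout; with the impatient variant there is a timeout.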
Other things • Idle restart • If no packet has been sent in RTO seconds: • ssthresh=cwnd • cwnd=1 • Slow start • This avoids big bursts after idle times. • E.g., getting data from disk • HTTP 1.1 • Timeout – exponential back-off • If no ACK arrives before the RTO timer expires, then time out. • ssthresh=cwnd/2; cwnd=2; slow start • RTO=min(2*RTO, 64 s) • If the next packet is dropped, then the wait is longer. • TCP gives up after 9-12 tries, but this is implementation dependent (ns never stops). • If a retransmitted packet is dropped, TCP times out.
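The timeout rule and the resulting back-off schedule can be written out directly. This is a sketch with made-up function names; the 3 s starting RTO is taken from the connection-establishment slide, and cwnd is counted in packets.

```python
def on_timeout(cwnd: int, rto: float, max_rto: float = 64.0):
    """Apply the timeout rule: returns (ssthresh, cwnd, rto) for the next attempt."""
    ssthresh = cwnd // 2
    return ssthresh, 2, min(2 * rto, max_rto)

def backoff_schedule(initial_rto: float, tries: int, max_rto: float = 64.0) -> list:
    """Waiting time before each successive retransmission of the same packet."""
    waits = []
    rto = initial_rto
    for _ in range(tries):
        waits.append(rto)
        rto = min(2 * rto, max_rto)
    return waits

print(on_timeout(cwnd=10, rto=3.0))
print(backoff_schedule(3.0, 7))
```

The wait doubles from 3 s until it reaches the 64 s cap, so a packet that keeps getting dropped is retried less and less often.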
Dup ACKs after timeout (figure) • Columns: InFlight and cwnd. Several packets are dropped, NewReno recovery (Recover=29) handles partial ACKs (15 = 19-4, 16 = 19-(21-17)+1), and eventually there is a timeout. • After the timeout, DUP ACKs arrive. • Set send_high to the maximum seq# sent. If DUP ACKs are received for segments less than send_high, assume they do not indicate a drop. In case there was a drop, there will be a timeout.
Selective Acknowledgment – SACK • The latest widespread congestion control. • Problem: when multiple packets are dropped, the cumulative ACK does not give information as to which packets were dropped. As a result, fast recovery is not so fast; it takes one RTT per lost packet. • Solution: embed into the ACK some information about which packets have successfully arrived. • TCP-SACK allows ACKs to contain information about received packets. • If the packets are received in order, then the ACK looks the same as in TCP-RENO or TCP-NEWRENO. But if the packets arrive out of order, then the ACK contains SACK blocks. • A SACK block indicates a sequence of segments that have been received. • Figure: a sequence-number line from 15 to 35 with segments marked ACKed (A), then two separate SACKed (S) blocks, then Not Sent (N).
TCP-SACK • Figure: the sequence-number line from 15 to 35 with the highest ACK marked, followed by the left and right edges of the 1st and 2nd SACKed blocks, then the segments not yet sent. • SACK blocks are 8 bytes long (4 bytes for each edge). • The SACK option also includes 1 byte to specify that it is a SACK option (kind=5) and 1 byte for the option length in bytes (2 + 8 per block). • 1 SACK block = 10 bytes + 2 bytes padding → 52 bytes of IP and TCP headers • 2 SACK blocks = 18 bytes + 2 bytes padding → 60 bytes of headers • 3 SACK blocks = 26 bytes + 2 bytes padding → 68 bytes of headers • 4 SACK blocks = 34 bytes + 2 bytes padding → 76 bytes of headers • The maximum ACK is 80 bytes. • If the timestamp option is used, then the maximum number of SACK blocks is 3. • Example SACK option: kind=5, length=18; left edge of 1st block = 20, right edge of 1st block = 23; left edge of 2nd block = 26, right edge of 2nd block = 30.
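The header sizes listed above follow from simple arithmetic, sketched below. The function names are made up; the figures assume 20-byte IP and TCP base headers, 40 bytes of TCP option space, 2 bytes of padding as on the slide, and a timestamp option that occupies 12 bytes including its padding.

```python
def sack_option_bytes(blocks: int) -> int:
    """Kind byte + length byte + 8 bytes (two 4-byte edges) per SACK block."""
    return 2 + 8 * blocks

def ack_packet_bytes(blocks: int, padding: int = 2,
                     ip_header: int = 20, tcp_header: int = 20) -> int:
    """Total header bytes of a pure ACK that carries `blocks` SACK blocks."""
    return ip_header + tcp_header + sack_option_bytes(blocks) + padding

def max_sack_blocks(other_option_bytes: int = 0, option_space: int = 40) -> int:
    """Largest number of SACK blocks that fits next to the other options."""
    return (option_space - other_option_bytes - 2) // 8

sizes = [ack_packet_bytes(n) for n in range(1, 5)]
print(sizes, max_sack_blocks(), max_sack_blocks(other_option_bytes=12))
```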
Generation of SACKs • 1. No SACK blocks if there are no out-of-order packets. • 2. No delayed ACKs if there are out-of-order packets (send an ACK for every received packet). • 3. When an out-of-order packet arrives, the first SACK block contains the segment that just arrived. • 4. The ACK should contain as many SACK blocks as fit and are required (no skimping to save bit-rate). • 5. The SACK blocks included should be those that have most recently been reported (see 3). So if there are at most 3 SACK blocks, then each continuous block of segments will be reported at least 3 times. • 6. If the packet that arrived has already been received (a duplicate reception), then the first SACK block should identify this packet. (This is the DSACK extension to SACK.) In this case, the next SACK block should indicate the continuous sequence of segments that contains the segment received in duplicate. • Figure: the sequence-number line from 15 to 35 with ACKed segments, two SACKed blocks (20 to 23 and 26 to 30), and segments not yet sent. • Now suppose that segment 21 arrives for a second time. The SACK option is: kind=5, length=26; left edge of DUP packet = 21, right edge of DUP packet = 22; left edge of 1st block = 20, right edge of 1st block = 23; left edge of 2nd block = 26, right edge of 2nd block = 30.