TCP: Transmission Control Protocol • Overview • Connection set-up and termination • Interactive • Bulk transfer • Timers • Improvements
TCP: Overview
• Connection-oriented, byte-stream service
• Full-duplex service, with an optional half close
• Reliability (ARQ)
• Sliding window with a variable-sized window
• Stream is sent in segments, each carried in an IP datagram
• Sequence numbers (SN) count bytes
• Receiver buffer reorders bytes
• Checksum on header and data
• Discards duplicate data
• Flow control
TCP: Overview
[Figure: full-duplex exchange between hosts A and B. Data from A and acks from A travel to B; data from B and acks from B travel to A.]
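TCP: Overview
• TCP segment layout: IP header | TCP header | TCP data
• Maximum TCP data per segment: 65535 - 20 - 20 = 65495 bytes
• TCP header fields: source port #, destination port #, sequence #, acknowledgement #, HL (header length), reserved, flags, window size, checksum, urgent pointer, options (if any)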
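The header layout above can be sketched in code. This is a minimal illustration, not part of the original slides: the names `TcpHeader`, `FLAG_BITS` and `parse_tcp_header` are hypothetical, and the sketch assumes the standard fixed 20-byte header with the six flag bits listed on the next slide.

```python
import struct
from typing import NamedTuple


class TcpHeader(NamedTuple):
    src_port: int
    dst_port: int
    seq: int
    ack: int
    header_len: int   # in bytes (HL field counts 32-bit words)
    flags: frozenset
    window: int
    checksum: int
    urgent_ptr: int


# Flag bits in the low 6 bits of the 16-bit HL/reserved/flags word.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_tcp_header(segment: bytes) -> TcpHeader:
    """Unpack the fixed 20-byte part of a TCP header (network byte order)."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    src, dst, seq, ack, hl_flags, window, checksum, urg = struct.unpack(
        "!HHIIHHHH", segment[:20])
    header_len = (hl_flags >> 12) * 4
    flags = frozenset(n for n, bit in FLAG_BITS.items() if hl_flags & bit)
    return TcpHeader(src, dst, seq, ack, header_len, flags,
                     window, checksum, urg)
```

Options, if any, follow the fixed part and are not decoded here; `header_len` tells the caller where the TCP data begins.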
TCP: Flags
• URG: The urgent pointer is used
• ACK: The acknowledgement number is valid
• PSH: The receiver should pass this data to the application as soon as possible
• RST: Reset the connection
• SYN: Synchronize sequence numbers to initiate a connection
• FIN: The sender is finished sending data
TCP: Flags
• SYN: set when starting a TCP connection; Sequence # = initial sequence number (ISN)
• URG: the urgent pointer is the byte offset at which urgent data is to be found; Urgent Ptr + SN = last byte of urgent data
• Options: for example, maximum segment size (MSS), the largest segment each end wants to receive
• Default = 536-byte payload + 20-byte header; Sun: 1500
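TCP: Set-Up
1. A --> B: SYN (SYN = 1, ACK = 0), MSS, SN = ISN of A
2. B --> A: SYN (SYN = 1, ACK = 1), MSS, SN = ISN of B, together with B's ACK of A's SYN
3. A --> B: ACK of B's SYN
• Full duplex: each direction has its own SYN and its own ACK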
TCP: Termination
1. A --> B: FIN
2. B --> A: ack of FIN
3. B --> A: FIN
4. A --> B: ack of FIN
• Both sides close
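TCP: Termination, Half Close
• The application on one end calls shutdown: FIN is sent
• The other end delivers EOF to its application and returns the ack of FIN
• The other end's application can still write: data is sent and the ack of data returns
• The other end's application calls shutdown: FIN is sent
• The first end delivers EOF to its application and returns the ack of FIN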
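TCP
• Reset segments: sent whenever a segment is received that doesn't appear correct for the referenced connection, for example to indicate a wrong port or an abortive release
• 16-bit window size: at most 65535 bytes can be outstanding
• On a T3 (44.736 Mbps), a full window is sent in about 12 msec
• If RTT = 50 ms, the sender is idle for the rest of the round trip
• Window scaling: left shift the window by up to 14 bits, by agreement between both ends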
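The T3 arithmetic above can be checked with a short sketch. The function names are hypothetical; the only inputs are the figures on the slide (16-bit window, 44.736 Mbps, 50 ms RTT, shift of at most 14 bits).

```python
MAX_WINDOW_16BIT = 65535   # bytes, largest value of the 16-bit window field
MAX_SHIFT = 14             # largest left shift allowed


def drain_time_ms(window_bytes: int, bandwidth_bps: float) -> float:
    """Time to transmit one full window at the given link rate."""
    return window_bytes * 8 / bandwidth_bps * 1000


def shift_needed(bandwidth_bps: float, rtt_s: float) -> int:
    """Smallest left shift so that 65535 << shift covers bandwidth x RTT."""
    pipe_bytes = bandwidth_bps * rtt_s / 8
    shift = 0
    while (MAX_WINDOW_16BIT << shift) < pipe_bytes and shift < MAX_SHIFT:
        shift += 1
    return shift


if __name__ == "__main__":
    print(round(drain_time_ms(MAX_WINDOW_16BIT, 44.736e6), 2), "ms per window")
    print("shift needed for 50 ms RTT:", shift_needed(44.736e6, 0.050))
```

With these numbers the pipe holds roughly 280 KB, so the unscaled 64 KB window covers only about a quarter of the round trip.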
TCP: Interactive Data Flow
1. Key stroke: the data byte is sent
2. The ack of the data byte returns
3. Echo: the echo of the data byte is sent back
4. The ack of the echoed byte is sent
• PSH = 1
TCP: Interactive Data Flow
• Telnet and rlogin carry small chunks of data, typically 10 bytes or less
• IP header = 20 bytes, TCP header = 20 bytes, data = 1 byte: inefficient!
• Nagle algorithm: only one outstanding (unacknowledged) segment at a time
• In the meantime, bytes are collected and sent together once the ack arrives
• Behaves like stop and wait
TCP: Interactive Data Flow
• Delayed ACKs: acks are delayed by approximately 200 ms
• This allows acks to accumulate and be piggybacked on a data segment going the other way
• Nagle algorithm: only one outstanding segment; in the meantime bytes are collected (stop and wait)
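TCP: Interactive Data Flow
[Figure: while a data segment is outstanding, the sender collects incoming bytes; the next data segment goes out when the ack of the previous data returns]
• Self-clocking: the data rate depends directly on the rate at which acks return (the faster the acks come back, the faster data is sent)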
Nagle Algorithm
• On by default for telnet and rlogin
• What about X Windows?
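An application that cannot tolerate the Nagle delay can switch the algorithm off per socket with the standard `TCP_NODELAY` option. The sketch below only creates and configures an unconnected socket; the helper name `make_interactive_socket` is hypothetical.

```python
import socket


def make_interactive_socket() -> socket.socket:
    """Create an unconnected TCP socket with the Nagle algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # A non-zero TCP_NODELAY asks the stack to send small segments at once
    # instead of collecting bytes while an earlier segment is unacknowledged.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

Leaving the option unset keeps the default behaviour described on the slides.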
TCP: Bulk Data Flow
• TCP is a sliding window protocol
• How big should the window be?
• The bigger the window, the higher the throughput
• Not too big, since it will swamp resources and cause packet loss
• Bandwidth delay product: capacity (bits) = bw (bits/s) x round trip time (sec)
• Start from a small window
• If under bw, increase window size (probing)
• If over bw (lose packets), decrease window size (backoff)
Bulk Transfer
• Dynamic sliding window
• Offered window: advertised by the receiver in segments; the amount of buffer space at the receiver
• Congestion window (cwnd): set by the sender; local to the sender; dynamically adjusted to optimize performance
TCP windows
[Figure: bytes 1 to 11, split into sent and acked, sent but not acked, and can send asap (the usable window); the offered window covers the last two groups]
TCP windows
• The window is actually min{offered window from receiver, cwnd}
[Figure: bytes 1 to 11, split into sent and acked, sent but not acked, and can send asap (the usable window)]
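The relation between the offered window, cwnd and the usable window can be written down directly. This is a sketch with hypothetical parameter names; all quantities are in the same unit (bytes or segments).

```python
def usable_window(last_acked: int, last_sent: int,
                  offered_window: int, cwnd: int) -> int:
    """How much more the sender may transmit right now."""
    window = min(offered_window, cwnd)   # the effective send window
    in_flight = last_sent - last_acked   # sent but not acked
    return max(0, window - in_flight)
```

For example, with 3 units in flight, an offered window of 6 and a cwnd of 4, only 1 more unit can be sent.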
Bulk Transfer: cwnd
• Congestion window (cwnd) is dynamically adjusted to optimize performance
• Slow start:
• F = # bytes in a frame (segment), set by the receiver
• Initially, cwnd = F
• Each time an ack is received, cwnd = cwnd + F
• Self-clocking: acks are generated at the same rate as segments are received, with the same kind of spacing in time
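Bulk Transfer: cwnd
[Figure: cwnd = 1: segment 1 is sent and ack1 returns; cwnd = 2: segments 2 and 3 are sent and ack2, ack3 return; cwnd = 4: segments 4 to 7 are sent]
• cwnd doubles every round trip!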
Bandwidth Delay Product
• Recall for sliding window: throughput R = min{C, W/T}
• W = window size (this used to be 1)
• C = bandwidth
• T = round trip time
• We want R = C, thus W/T = C
• W = C x T = bandwidth x delay
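The throughput formula can be tried with numbers. The function names below are hypothetical, and the example link (10 Mbps, 100 ms round trip) is made up for illustration.

```python
def throughput_bps(capacity_bps: float, window_bits: float,
                   rtt_s: float) -> float:
    """R = min{C, W/T}: one window per round trip, capped by the link."""
    return min(capacity_bps, window_bits / rtt_s)


def bandwidth_delay_product_bits(capacity_bps: float, rtt_s: float) -> float:
    """W = C x T: the window that just fills the pipe."""
    return capacity_bps * rtt_s


if __name__ == "__main__":
    c, t = 10e6, 0.100
    small = 65535 * 8                     # a 16-bit window, in bits
    print(throughput_bps(c, small, t))    # window-limited, below C
    print(throughput_bps(c, bandwidth_delay_product_bits(c, t), t))
```

Any window larger than C x T brings no further throughput; it only adds backlog in the network.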
TCP Timeout and Retransmission
• Each data segment has a retransmission timer
• It is initialized by the retransmission time out (RTO) value
• When the timer expires, a time out occurs and the data is retransmitted
• If a retransmission fails then the time-out doubles, i.e., exponential backoff
• It's important to find a good RTO value
TCP RTO
• RTO = R x b
• R = round trip time (RTT) estimate
• b recommended to be 2
• Original round trip time measurement:
• Update: R = a x R + (1 - a) x M
• a = smoothing fraction, recommended value 0.9
• M = measured RTT
• Not a good estimator when the measurements have high variance
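The original estimator is a two-line computation. A minimal sketch, with hypothetical function names and the recommended constants from the slide as defaults:

```python
def update_rtt(r: float, m: float, a: float = 0.9) -> float:
    """Smoothed estimate: R = a x R + (1 - a) x M."""
    return a * r + (1 - a) * m


def rto(r: float, b: float = 2.0) -> float:
    """RTO = R x b."""
    return r * b
```

Because 90% of the weight stays on the old estimate, a sudden jump in the measured RTT moves R only slowly, which is one way to see why the estimator copes badly with high variance.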
Jacobson RTO estimate
• M = measured RTT
• A = averaged estimate of RTT
• Err = M - A
• D = averaged |Err| value
• A = A + g x Err, with g = 1/8
• D = D + h x (|Err| - D), with h = 1/4
• RTO = A + 4D
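The update rules above translate almost line for line into code. In this sketch the class name `JacobsonRto` is hypothetical, and the starting values of A and D are left to the caller because the slide does not give them.

```python
class JacobsonRto:
    """Tracks A (mean RTT) and D (mean deviation) as on the slide."""

    def __init__(self, a: float, d: float, g: float = 1 / 8, h: float = 1 / 4):
        self.a = a   # averaged estimate of RTT
        self.d = d   # averaged |Err|
        self.g = g
        self.h = h

    def update(self, m: float) -> float:
        """Feed one measured RTT M and return the new RTO."""
        err = m - self.a
        self.a += self.g * err
        self.d += self.h * (abs(err) - self.d)
        return self.rto()

    def rto(self) -> float:
        return self.a + 4 * self.d
```

Starting from A = 1.0 and D = 0.0, a single measurement of 2.0 gives A = 1.125, D = 0.25 and RTO = 2.125: the deviation term reacts to the surprise much faster than the mean does.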
Congestion Avoidance
• Variables:
• cwnd = current congestion window
• ssthresh = estimate of the "best" window, i.e., the largest window that won't cause loss
• Packet loss is indicated by a time out or the receipt of 3 duplicate acks
1. Initialization: cwnd = bytes for one segment, ssthresh = 65535
Congestion Avoidance
2. When congestion is detected (packet loss detected, i.e., time out or duplicate acks):
• ssthresh = max{current window/2, 2}
• Additionally, if timeout, cwnd = 1 (begin slow start)
3. When new data is acknowledged:
• if cwnd <= ssthresh then slow start
• if cwnd > ssthresh then congestion avoidance
Congestion Avoidance
• Slow start: upon receiving an ack, cwnd++ (exponentially increasing)
• Congestion avoidance: upon receiving an ack, cwnd += 1/cwnd (linearly increasing)
• Example: with cwnd = 100, 100 acks raise cwnd to 101, and 101 more acks raise it to 102
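The two growth rules can be compared with a small per-ack simulation. This is a sketch with hypothetical function names; cwnd and ssthresh are counted in segments, and one ack per segment in flight is assumed (no delayed acks, no loss).

```python
def on_ack(cwnd: float, ssthresh: float) -> float:
    """New cwnd (in segments) after one ack of new data."""
    if cwnd <= ssthresh:
        return cwnd + 1          # slow start
    return cwnd + 1 / cwnd       # congestion avoidance


def after_round_trip(cwnd: float, ssthresh: float) -> float:
    """Apply one ack for every segment in the current window."""
    for _ in range(int(cwnd)):
        cwnd = on_ack(cwnd, ssthresh)
    return cwnd
```

In slow start a window of 4 becomes 8 after one round trip. In congestion avoidance a window of 100 grows by slightly less than one segment per round trip, because 1/cwnd shrinks a little with every ack; the slide's "100 acks for 101" is the rounded version of this.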
Fast Retransmit and Fast Recovery
• If three or more duplicate acks arrive, a packet has likely been lost
• Should we wait for the RTO?
• If only one or two duplicate acks arrive, packets were probably just reordered
• Recovery continues with congestion avoidance rather than slow start
Fast Retransmit and Fast Recovery
• Fast retransmit: avoid timeouts and slow start
1. When a third duplicate ack is received:
• set ssthresh = current window/2
• retransmit the missing segment
• cwnd = ssthresh + 3 x segment size
2. Each time another duplicate ack arrives:
• increment cwnd by the segment size
• transmit a packet when the window reaches new packets
Fast Retransmit and Fast Recovery
2. Each time another duplicate ack arrives:
• cwnd = cwnd + segment size
• transmit a packet if the window covers new packets
[Figure: packets 1 to 11, marked sent but not acked, can send asap, and stuck]
3. When the ack arrives that acknowledges new data:
• cwnd = ssthresh
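Steps 1 to 3 can be sketched as a small state machine. The class `RenoSender` and its attributes are hypothetical names; the sketch counts in segments, takes the current window to be cwnd (that is, it assumes the offered window is not the limit), and only tracks the window variables without sending anything.

```python
class RenoSender:
    """Window bookkeeping for fast retransmit and fast recovery."""

    def __init__(self, cwnd: float, ssthresh: float):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.dup_acks = 0
        self.in_recovery = False
        self.retransmissions = 0

    def on_duplicate_ack(self) -> None:
        self.dup_acks += 1
        if self.dup_acks == 3:
            # Step 1: halve the threshold, resend, inflate the window by 3.
            self.ssthresh = self.cwnd / 2
            self.retransmissions += 1
            self.cwnd = self.ssthresh + 3
            self.in_recovery = True
        elif self.dup_acks > 3:
            # Step 2: each further duplicate ack means a packet left the pipe.
            self.cwnd += 1

    def on_new_ack(self) -> None:
        if self.in_recovery:
            # Step 3: deflate the window and continue in congestion avoidance.
            self.cwnd = self.ssthresh
            self.in_recovery = False
        self.dup_acks = 0
```

Starting from cwnd = 16, three duplicate acks give ssthresh = 8 and cwnd = 11; two more raise cwnd to 13; the ack for new data then sets cwnd back to 8.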
TCP
• Slow start: cwnd = 1, cwnd exponentially increasing
• Congestion avoidance: once cwnd reaches ssthresh, cwnd linearly increasing
TCP
• 3 dup acks: fast retransmit of the packet
• ssthresh = cwnd/2, cwnd = ssthresh + 3
• cwnd increments by 1 per duplicate ack. Note: no transmissions while cwnd <= old cwnd
• Thus, old cwnd/2 packets are in the pipe
• When an ack for new data arrives, cwnd = ssthresh and --> congestion avoidance
[Figure: old cwnd marked on the packet sequence]
Additive Increase -- Multiplicative Decrease
• Helps fairness
[Figure: plane with axes R1 and R2 and the capacity line R1 + R2 = C; additive increase and multiplicative decrease steps alternate]
• Tends to converge to R1 = R2
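The convergence claim can be illustrated with two flows sharing one link. This is a toy model under stated assumptions, not taken from the slides: both flows see congestion at the same moment, and the step sizes (add 1, multiply by 0.5) and the function name `aimd` are made up for the example.

```python
def aimd(r1: float, r2: float, capacity: float, rounds: int,
         increase: float = 1.0, decrease: float = 0.5):
    """Run synchronized AIMD for two flows and return their final rates."""
    for _ in range(rounds):
        if r1 + r2 > capacity:
            # Multiplicative decrease: the gap between the flows shrinks.
            r1 *= decrease
            r2 *= decrease
        else:
            # Additive increase: the gap between the flows stays the same.
            r1 += increase
            r2 += increase
    return r1, r2


if __name__ == "__main__":
    print(aimd(80.0, 10.0, capacity=100.0, rounds=200))
```

Each decrease halves the difference R1 - R2 while each increase leaves it unchanged, so the rates drift toward the line R1 = R2.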
TCP: Tahoe and Reno Tahoe: slow start + congestion avoidance Reno: fast retransmit + fast recovery
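Improvements: TCP Vegas
[Figure: a source with window size W sends through a router (single bottleneck) that holds backlog x]
• D = min round trip delay (x = 0); measured as D = min{T}
• T = measured round trip delay
• W = window size
• R = transmission rate; C = transmission rate of the router
• x = backlog at the router
• W = packets in flight + x = RD + x, so x = W - RD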
Improvements: TCP Vegas
• TCP Vegas: keep x at 2 for all flows (x = W - RD)
• if W - RD <= 1 then increase W
• if W - RD >= 3 then decrease W
• Leads to fair bandwidth allocation at the router
[Figure: window W, delays D and T, and a router (single bottleneck) with backlog x]
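The Vegas rule can be written as a single adjustment function. In this sketch the name `vegas_adjust` and the step size of one packet are assumptions; the slide only says "increase" and "decrease".

```python
def vegas_adjust(w: float, rate: float, d_min: float, step: float = 1.0) -> float:
    """Return the new window given W, the rate R and D = min{T}."""
    backlog = w - rate * d_min       # x = W - RD
    if backlog <= 1:
        return w + step              # too little queued: probe for more
    if backlog >= 3:
        return w - step              # too much queued: back off
    return w                         # x is near 2: hold the window
```

Because every flow aims for the same small backlog rather than pushing until packets are lost, the window settles instead of oscillating.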
[Plot: cwnd and E(T) versus background traffic]
Vegas
[Plot: cwnd and E(T) versus background traffic]
Improvements -- TCP Reno
• Improvements by Janey Hoe (New Reno):
• ssthresh is initially too big: 65K
• ssthresh should be an estimate of bw x delay
• TCP Reno does not work well with multiple losses
• Bandwidth estimate: send three closely spaced packets and measure the times between their acks; 1/time is an approximate measure of the bw of the bottleneck link
• Delay estimate: round trip time estimates
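The bandwidth and delay estimates combine into an initial ssthresh as follows. This is a sketch under assumptions: the function name `estimate_ssthresh` is hypothetical, the ack gaps are simply averaged, and the result is in bytes.

```python
def estimate_ssthresh(ack_times, segment_bytes: int, rtt_s: float) -> float:
    """Initial ssthresh ~ bandwidth x delay, from closely spaced acks."""
    if len(ack_times) < 2:
        raise ValueError("need at least two ack arrival times")
    gaps = [later - earlier
            for earlier, later in zip(ack_times, ack_times[1:])]
    avg_gap = sum(gaps) / len(gaps)
    bandwidth = segment_bytes / avg_gap   # bytes/s through the bottleneck
    return bandwidth * rtt_s
```

With 1000-byte segments whose acks arrive 10 ms apart and a 100 ms round trip, the estimate is about 10000 bytes, far below the 65K default.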
Improvements -- TCP Reno
• Multiple losses:
• TCP Reno retransmits one packet per RTT under fast retransmit
• (Note: a packet is retransmitted only under fast retransmit (3 dup acks) or time out/slow start)
• Improvement: fast retransmit
• Sometimes it can get stuck: the window doesn't cover new packets, and no more acks arrive
Improved Fast Retransmit
[Figure: packet sequence showing the window, the point of 3 dup acks, and max packet sent = sndMax]
• (i) On 3 dup acks: ssthresh = cwnd/2, cwnd = 1 segment, save_cwnd = ssthresh + 1
• (ii) Retransmit everything in the window using slow start
• Upon receiving 2 dup acks, send a new packet
• This phase is over when an ack is received for sndMax
TCP Timers
• Keep alive timer: periodically transmit a message (no data) to see if the other end is alive. Application: telnet (client just turns off PC)
• Persist timer: if the window goes to zero, TCP is stuck. Window probes query the receiver to see if the window has increased (1 byte of data beyond the window)
• TCP has a 500 ms timer (crude)
• Time outs have exponential backoff
• Silly window syndrome: avoid small windows