TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
• point-to-point: one sender, one receiver
• connection-oriented: exchange control msgs first to initialize sender & receiver state (the TCP control parameters)
• full duplex data delivery: bi-directional data flow over the same connection
• reliable, in-order byte stream delivery
  • no “message boundaries”
  • sender & receiver must buffer data
• flow controlled: prevent sender from flooding receiver
• congestion controlled: reduce potential jam in the network
[Figure: application writes data → socket interface → TCP send buffer; TCP receive buffer → socket interface → application reads data]
CS118
What defines a TCP connection
• TCP uses 4 values to define a connection (a communication association): local-host-addr, local-port#, remote-host-addr, remote-port#
• each of the two ends keeps state for the on-going communication
  • sequence # for data sent, received, and ACKed; retransmission timer; flow & congestion window
[Figure: protocol stack with TCP and UDP over IP over Ethernet]
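A minimal sketch of how one end might index its per-connection state by the 4 values above. The names `ConnKey`, `ConnState`, and `demultiplex`, the example addresses, and the default field values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, NamedTuple, Optional


class ConnKey(NamedTuple):
    """The four values that identify one TCP connection."""
    local_addr: str
    local_port: int
    remote_addr: str
    remote_port: int


class ConnState:
    """Hypothetical per-connection state kept by one end."""

    def __init__(self, initial_seq: int) -> None:
        self.next_seq_to_send = initial_seq  # sequence # for data sent
        self.last_byte_acked = initial_seq   # highest cumulative ACK received
        self.next_seq_expected = 0           # sequence # for data received
        self.rto = 3.0                       # retransmission timer (assumed seconds)
        self.rcv_window = 65535              # flow control window (assumed bytes)


connections: Dict[ConnKey, ConnState] = {}


def demultiplex(key: ConnKey) -> Optional[ConnState]:
    """Return the state for an arriving segment's connection, if any."""
    return connections.get(key)


# Two connections to the same local port differ only in the remote port.
a = ConnKey("10.0.0.1", 80, "10.0.0.2", 5000)
b = ConnKey("10.0.0.1", 80, "10.0.0.2", 5001)
connections[a] = ConnState(initial_seq=1000)
connections[b] = ConnState(initial_seq=7000)
```

Because the key includes both ends' addresses and ports, the two connections above keep separate sequence numbers even though they share a local port.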
Issues To Consider
• packets may be lost, duplicated, re-ordered
• packets can be delayed arbitrarily long inside the network
• the delay between two communicating ends is unknown beforehand and may vary over time
• port numbers can be reused later
  • a later connection must not mistake packets from an earlier connection as its own
TCP segment format
The segment follows the IP header. Each row of the header is 32 bits wide:
• source port # | dest port #
• sequence number (counting by bytes of data)
• acknowledgement number (counting by bytes of data)
• head len | not used | flags U A P R S F | rcvr window size (# bytes rcvr willing to accept)
• checksum (as in UDP) | ptr to urgent data
• Options (variable length)
• application data (variable length)
Flags:
• URG: urgent data (generally not used)
• ACK: ACK # field valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab. (setup, teardown commands)
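The fixed part of the header can be sketched with Python's `struct` module. This assumes the standard 20-byte layout without options and the standard flag bit positions; the helper names `pack_header` and `unpack_header` are hypothetical, and the checksum is simply left at 0 rather than computed.

```python
import struct

HEADER_FORMAT = "!HHIIBBHHH"  # network byte order, 20 bytes without options
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def pack_header(src_port, dst_port, seq, ack, flags, window,
                checksum=0, urgent_ptr=0):
    """Build a 20-byte header; the checksum is not computed in this sketch."""
    offset_byte = 5 << 4  # head len = 5 32-bit words, low 4 bits unused
    flag_byte = 0
    for name in flags:
        flag_byte |= FLAG_BITS[name]
    return struct.pack(HEADER_FORMAT, src_port, dst_port, seq, ack,
                       offset_byte, flag_byte, window, checksum, urgent_ptr)


def unpack_header(raw):
    """Decode the first 20 bytes of a segment into a dict of fields."""
    (src_port, dst_port, seq, ack, offset_byte, flag_byte,
     window, checksum, urgent_ptr) = struct.unpack(HEADER_FORMAT, raw[:20])
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len_bytes": (offset_byte >> 4) * 4,
        "flags": {n for n, bit in FLAG_BITS.items() if flag_byte & bit},
        "window": window, "checksum": checksum, "urgent_ptr": urgent_ptr,
    }


header = pack_header(5000, 80, seq=42, ack=79, flags={"ACK"}, window=65535)
fields = unpack_header(header)
```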
TCP Connection Establishment
• initialize TCP control variables:
  • initial seq. # used in each direction
  • buffer size (rcvWindow)
Three-way handshake (server has called listen( ), client calls connect( )):
1: client host sends TCP SYN segment to server
  • specifies initial seq #
  • does not carry data
2: server receives SYN, replies with one control segment that ACKs the client's SYN and carries the server's own SYN with its initial seq #
3: client receives it and replies with an ACK of the server's SYN
  • may carry data
Each end considers the connection established once its own SYN has been ACKed.
[Figure: client → server SYN (#); server → client SYN-ACK/SYN (#); client → server SYN-ACK]
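A small sketch of the numbers carried by the three handshake segments. It assumes, as in standard TCP, that a SYN consumes one sequence number; the function name and the tuple layout are hypothetical.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the three control segments as (sender, flags, seq, ack) tuples."""
    # 1: client SYN announces the client's initial sequence number.
    syn = ("client", "SYN", client_isn, None)
    # 2: server ACKs the client's SYN and sends its own SYN in one segment.
    syn_ack = ("server", "SYN+ACK", server_isn, client_isn + 1)
    # 3: client ACKs the server's SYN.
    ack = ("client", "ACK", client_isn + 1, server_isn + 1)
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=100, server_isn=300)
```

With these initial values the server's reply carries ACK=101 and the client's final segment carries Seq=101, ACK=301.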
TCP Connection Close
Either end can initiate the close of its end of the connection at any time.
1: one end (A) sends a TCP FIN control segment to the other
2: the other end (B) receives the FIN and replies with FIN-ACK; when it is ready to close too, it sends its own FIN
3: A receives the FIN and replies with FIN-ACK
4: B receives the FIN-ACK and closes the connection
What problem does A have?
[Figure: client A and server B each call close( ); A → B FIN, B → A FIN-ACK, B → A FIN, A → B FIN-ACK; B reaches "connection closed", A is marked "?"]
The well-known “two-army problem”
Q: how can the 2 red armies agree on an attack time?
• Fact: the last one who sends a message does not know whether the msg is delivered
• Basic rule: one cannot send an ACK to acknowledge an ACK
[Figure: one blue army and two red armies]
TCP Connection Close
1: one end (A) sends a TCP FIN control segment to the other
2: the other end (B) receives the FIN and replies with FIN-ACK; when it is ready to close too, it sends its own FIN
3: A receives the FIN and replies with ACK
4: B receives the FIN-ACK and closes the connection
A enters “timed wait” and waits for 2 min before deleting the connection state.
Abort a connection: send “reset” to the other end and enter the closed state immediately
• all data assumed lost
[Figure: client A and server B each call close( ); A → B FIN, B → A FIN-ACK, B → A FIN, A → B FIN-ACK; B reaches "connection closed", A goes through "timed wait" and then "connection closed"]
TCP Connection Management (cont)
[Figures: TCP server lifecycle and TCP client lifecycle state diagrams, with a "wait 2 min" step before closing]
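TCP state-transition diagram
Transitions, labeled event/action:
• CLOSED → LISTEN: Passive open
• CLOSED → SYN_SENT: Active open / SYN
• LISTEN → CLOSED: Close
• LISTEN → SYN_RCVD: SYN / SYN + ACK
• LISTEN → SYN_SENT: Send / SYN
• SYN_SENT → CLOSED: Close
• SYN_SENT → SYN_RCVD: SYN / SYN + ACK
• SYN_SENT → ESTABLISHED: SYN + ACK / ACK
• SYN_RCVD → ESTABLISHED: ACK
• SYN_RCVD → FIN_WAIT_1: Close / FIN
• ESTABLISHED → FIN_WAIT_1: Close / FIN
• ESTABLISHED → CLOSE_WAIT: FIN / ACK
• FIN_WAIT_1 → FIN_WAIT_2: ACK
• FIN_WAIT_1 → CLOSING: FIN / ACK
• FIN_WAIT_1 → TIME_WAIT: ACK + FIN / ACK
• FIN_WAIT_2 → TIME_WAIT: FIN / ACK
• CLOSING → TIME_WAIT: ACK
• CLOSE_WAIT → LAST_ACK: Close / FIN
• LAST_ACK → CLOSED: ACK
• TIME_WAIT → CLOSED: Timeout after two segment lifetimes
Close exchange between A and B:
• A → B: I-finished(M)
• B → A: ACK(M+1)
• B → A: I-finished(N)
• A → B: ack(N+1); A waits for 2MSL before deleting the conn state
• B: done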
How to Set TCP Retransmission Timer
• TCP sets the rxt timer based on measured RTT
  • SRTT: EstimatedRTT
  • $\text{SRTT} = (1-\alpha) \times \text{SRTT} + \alpha \times \text{SampleRTT}$
• Setting the retransmission timer: SRTT plus a “safety margin”
  • $\text{Timer} = \text{SRTT} + 4 \times \text{rttvar}$
[Figure: timelines of data, ACK, timeout, and retransmitted data, with SampleRTT marked between a data segment and its ACK]
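After obtaining a new RTT sample:
• $\text{difference} = \text{SampleRTT} - \text{SRTT}$
• $\text{SRTT} = (1-\alpha) \times \text{SRTT} + \alpha \times \text{SampleRTT} = \text{SRTT} + \alpha \times \text{difference}$
• $\text{rttvar} = (1-\beta) \times \text{rttvar} + \beta \times |\text{difference}| = \text{rttvar} + \beta \times (|\text{difference}| - \text{rttvar})$
• Retransmission Timer: $\text{RTO} = \text{SRTT} + 4 \times \text{rttvar}$
Typically: $\alpha = 1/8$, $\beta = 1/4$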
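The update rules above translate almost line for line into code. This is a sketch, not any particular stack's implementation; the function name `update_rtt` is hypothetical, and all times are assumed to be in one consistent unit (milliseconds here).

```python
ALPHA = 1 / 8  # gain for SRTT
BETA = 1 / 4   # gain for rttvar


def update_rtt(srtt, rttvar, sample_rtt, alpha=ALPHA, beta=BETA):
    """Apply one RTT sample and return the new (srtt, rttvar, rto)."""
    difference = sample_rtt - srtt  # uses the old SRTT
    srtt = srtt + alpha * difference
    rttvar = rttvar + beta * (abs(difference) - rttvar)
    rto = srtt + 4 * rttvar
    return srtt, rttvar, rto


srtt, rttvar, rto = update_rtt(srtt=500, rttvar=120, sample_rtt=600)
```

Starting from SRTT = 500 ms and rttvar = 120 ms, a 600 ms sample gives SRTT = 512.5 ms, rttvar = 115 ms, and RTO = 972.5 ms.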
An Example
Assuming SRTT = 500 ms, rttvar = 120 ms
RTT(3) = 600 ms:
• difference = |RTT - SRTT| = 100 ms
• SRTT = 500 + 0.125 * 100 = 512.5 ms
• rttvar = 120 + 0.25 * (100 - 120) = 115 ms
• RTO = SRTT + 4 * rttvar = 512.5 + 460 = 972.5 ms
RTT(4) = 650 ms:
• difference = |RTT - SRTT| = 137.5 ms
• SRTT = 512.5 + 0.125 * 137.5 ≈ 529.7 ms
• rttvar = 115 + 0.25 * (137.5 - 115) ≈ 120.6 ms
[Figure: sender/receiver timeline for packets 3 and 4, with RTT samples of 600 and 650 ms]
Example RTT estimation: [figure]
How to measure RTT in cases of retransmissions?
Options:
• take the delay between first transmission and final ACK?
• take the delay between last retransmission of segment(n) and ACK(n)?
• don’t measure?
[Figure: timeline between S and D with a timeout and a retransmission; which interval is the RTT?]
Karn’s algorithm
In case of retransmission:
• do not take the RTT sample (do not update SRTT or rttvar)
• double the retransmission timer value (RTO) after each timeout
• take an RTT measurement again upon the next transmission that is not retransmitted
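The three rules above can be sketched as a small timer object. The class name `KarnTimer`, its method names, and the `was_retransmitted` flag are hypothetical; the gains 1/8 and 1/4 are assumed defaults, and times are in any one consistent unit.

```python
class KarnTimer:
    """RTO bookkeeping that skips RTT samples from retransmitted segments."""

    def __init__(self, srtt, rttvar, alpha=1 / 8, beta=1 / 4):
        self.srtt = srtt
        self.rttvar = rttvar
        self.alpha = alpha
        self.beta = beta
        self.rto = srtt + 4 * rttvar

    def on_timeout(self):
        """Timer expired: double the RTO, leave SRTT and rttvar untouched."""
        self.rto *= 2

    def on_ack(self, sample_rtt, was_retransmitted):
        """ACK arrived; take the sample only if the segment was sent once."""
        if was_retransmitted:
            return  # ambiguous sample: keep the backed-off RTO
        difference = sample_rtt - self.srtt
        self.srtt += self.alpha * difference
        self.rttvar += self.beta * (abs(difference) - self.rttvar)
        self.rto = self.srtt + 4 * self.rttvar


timer = KarnTimer(srtt=500, rttvar=120)  # RTO starts at 980
timer.on_timeout()                       # RTO doubles to 1960
timer.on_ack(sample_rtt=2100, was_retransmitted=True)  # sample ignored
```

The RTO stays at the doubled value until an ACK arrives for a segment that was transmitted only once, at which point the normal SRTT/rttvar update resumes.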
One more question
What initial SRTT, rttvar values to start with?
• currently by some engineered guessing
• what if the guessed value is too small?
  • unnecessary retransmissions
• what if the guessed value is too large?
  • if the first or first few packets are lost, the sender waits longer than necessary before retransmission
• current practice:
  • initial SRTT value: 3 sec, rttvar: 3 sec
  • upon getting the first RTT sample: SRTT ← RTT, rttvar = SRTT/2
TCP’s seq. #s and ACK #s
• Seq. #: the number of the first byte in the segment’s data
• ACK #: seq # of the next byte expected from the other side; cumulative ACK
A simple example:
• Host A sends 10 bytes of data: Seq=42, ACK=79, data
• Host B ACKs receipt of the 10 bytes from A and sends 5 bytes of data: Seq=79, ACK=52, data
• Host A ACKs receipt of the 5 bytes: Seq=52, ACK=84
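The arithmetic in the simple example can be checked with a few lines of code. The function names and the dict layout are hypothetical; the modulo reflects the 32-bit sequence number field.

```python
SEQ_MODULUS = 2 ** 32  # sequence numbers live in a 32-bit field


def next_expected(seq, data_len):
    """ACK # for a segment: the sequence number of the next byte expected."""
    return (seq + data_len) % SEQ_MODULUS


def exchange(a_seq, b_seq, a_len, b_len):
    """A sends a_len bytes, B ACKs and sends b_len bytes, A ACKs."""
    first = {"from": "A", "seq": a_seq, "ack": b_seq, "len": a_len}
    second = {"from": "B", "seq": b_seq,
              "ack": next_expected(a_seq, a_len), "len": b_len}
    third = {"from": "A", "seq": second["ack"],
             "ack": next_expected(b_seq, b_len), "len": 0}
    return [first, second, third]


segments = exchange(a_seq=42, b_seq=79, a_len=10, b_len=5)
```

With the example's values, B's segment carries ACK=52 and A's final segment carries Seq=52, ACK=84.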
How to guarantee seq. # uniqueness
• sequence #s will eventually wrap around
• TCP assumes a Maximum Segment Lifetime (MSL) of 120 sec
• make sure that, for the same [src-addr, src-port, dest-addr, dest-port] tuple, the same sequence number does not get reused within 2 x MSL
• this assures that no two different data segments can bear the same sequence number, as long as the data’s lifetime < 120 sec
TCP: reliable data transfer
• simplified sender, assuming
  • one-way data transfer
  • no flow/congestion control

    SendBase = Initial_SeqNumber
    NextSeqNum = Initial_SeqNumber

    loop (forever) {
      switch (event) {
        event: data received from application above
          create TCP segment with seq. number NextSeqNum
          start timer for segment NextSeqNum
          pass segment to IP
          NextSeqNum = NextSeqNum + length(data)
        event: timer timeout for segment with seq. number y
          retransmit segment with sequence number y
          compute new timeout interval for segment y
          restart timer
        event: ACK received, with ACK field value of y
          if (y > SendBase) {  /* cumulative ACK of all data up to y */
            SendBase = y
            if (any outstanding not-yet-ACKed segments)
              start timer
          } else {  /* a duplicate ACK for already ACKed segment */
            increment count of duplicate ACKs received for y
            if (count of dup. ACKs received for y == 3) {
              resend segment with sequence number y  /* fast retransmit */
              reset dup. count
            }
          }
      }
    }  /* end of loop forever */

[Figure: sender state machine. From "wait for event": data received from application → create, send segment; timeout for segment with seq # y → retransmit segment; ACK received, with ACK # y → ACK processing]
Fast Retransmit
• the time-out period is often relatively long: long delay before resending a lost packet
• detect lost segments via duplicate ACKs
  • the sender often sends many segments back-to-back
  • if a segment is lost, there will likely be many duplicate ACKs
• if the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost
  • fast retransmit: resend the segment before the timer expires
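A runnable sketch of a simplified one-way sender with fast retransmit. Several things here are assumptions made for illustration: the class and attribute names are hypothetical, the timer is reduced to a boolean flag rather than a real clock, sequence number wrap-around is ignored, and ACKs below SendBase are silently dropped.

```python
class SimpleSender:
    """One-way sender without flow or congestion control (teaching sketch)."""

    DUP_ACK_THRESHOLD = 3

    def __init__(self, initial_seq):
        self.send_base = initial_seq  # oldest unACKed byte
        self.next_seq = initial_seq   # next byte to send
        self.unacked = {}             # seq -> payload still outstanding
        self.dup_acks = 0
        self.timer_running = False
        self.wire = []                # (seq, payload) handed to IP, in order

    def _transmit(self, seq):
        self.wire.append((seq, self.unacked[seq]))

    def on_app_data(self, data):
        seq = self.next_seq
        self.unacked[seq] = data
        self._transmit(seq)
        self.timer_running = True
        self.next_seq += len(data)

    def on_timeout(self):
        if self.unacked:
            self._transmit(min(self.unacked))  # oldest outstanding segment
            self.timer_running = True

    def on_ack(self, y):
        if y > self.send_base:  # cumulative ACK of all data up to y
            self.send_base = y
            self.dup_acks = 0
            done = [s for s in self.unacked if s + len(self.unacked[s]) <= y]
            for seq in done:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)
        elif y == self.send_base:  # duplicate ACK
            self.dup_acks += 1
            if self.dup_acks == self.DUP_ACK_THRESHOLD and y in self.unacked:
                self._transmit(y)  # fast retransmit, before the timer expires
                self.dup_acks = 0


sender = SimpleSender(initial_seq=92)
for _ in range(5):
    sender.on_app_data(b"x" * 500)  # Seq = 92, 592, 1092, 1592, 2092
sender.on_ack(592)                  # first segment ACKed
for _ in range(3):
    sender.on_ack(592)              # three duplicate ACKs
```

After the third duplicate ACK the sender resends the segment starting at 592 without waiting for a timeout, so the last entry in `sender.wire` has sequence number 592.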
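TCP: retransmission scenarios
Lost ACK scenario:
• Host A sends Seq=92, 8 bytes data
• Host B replies ACK=100, which is lost (X)
• the Seq=92 timer times out; Host A resends Seq=92, 8 bytes data
• Host B replies ACK=100 again
Premature timeout scenario:
• Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• Host B replies ACK=100, then ACK=120
• the Seq=92 timer times out before the ACKs arrive; Host A resends Seq=92, 8 bytes data
• the ACKs arrive and advance SendBase to 100, then 120
• Host B answers the resent segment with the cumulative ACK=120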
TCP retransmission scenarios (more)
Cumulative ACK scenario:
• Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• Host B's ACK=100 is lost (X); ACK=120 reaches Host A before the timeout
• SendBase = 120; nothing is retransmitted
Fast RXT scenario:
• Host A sends five segments back-to-back: Seq=92 (500 bytes data), then Seq=592, Seq=1092, Seq=1592, Seq=2092 (500B data each)
• the Seq=592 segment is lost (X)
• Host B answers each arriving segment with ACK592
• on the duplicate ACK592s, Host A resends Seq=592, 500B data before its timer expires
TCP Receiver: when to send ACK?
Event → TCP Receiver action
• in-order segment arrival, no gaps, everything earlier already ACKed → delayed ACK: wait up to 500 ms; if no further segment arrives, send ACK
• in-order segment arrival, no gaps, one delayed ACK pending → immediately send one cumulative ACK
• out-of-order arrival: higher-than-expected seq. #, gap detected → send duplicate ACK, indicating seq. # of next expected byte
• arrival of segment that partially or completely fills a gap → immediate ACK if segment starts at the lower end of the gap
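The four rows above can be read as a decision function. This is only a sketch: the function and parameter names are hypothetical, the actions are returned as plain strings, and the handling of already-received data (a sequence number below the expected one) is an assumption that the rows do not cover.

```python
def receiver_ack_action(seg_seq, rcv_next, delayed_ack_pending, gap_exists):
    """Pick the ACK action for an arriving segment.

    seg_seq: sequence number of the segment's first byte
    rcv_next: next in-order byte the receiver expects
    delayed_ack_pending: an earlier in-order segment has not been ACKed yet
    gap_exists: out-of-order data is already buffered above rcv_next
    """
    if seg_seq > rcv_next:
        # Higher-than-expected seq. #: a gap is detected.
        return "send duplicate ACK for rcv_next"
    if seg_seq == rcv_next and gap_exists:
        # Segment starts at the lower end of the gap.
        return "send immediate ACK"
    if seg_seq == rcv_next and delayed_ack_pending:
        return "send one cumulative ACK now"
    if seg_seq == rcv_next:
        return "delay ACK up to 500 ms"
    # Assumption: old data is re-ACKed so the sender can resynchronize.
    return "send duplicate ACK for rcv_next"


action = receiver_ack_action(seg_seq=1092, rcv_next=592,
                             delayed_ack_pending=False, gap_exists=False)
```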
TCP Flow Control
Prevent the sender from overrunning the receiver’s buffer by transmitting too much too fast
• receiver: informs sender of the (dynamically changing) amount of free buffer space
  • RcvWindow field in TCP header
• sender: keeps the amount of transmitted, unACKed data no more than the most recently received RcvWindow
• $\text{throughput} = \dfrac{\text{flow control window size}}{\text{RTT}}$ bytes/sec
• Special case: when RcvWindow = 0
  • sender can send a 1-byte segment
  • receiver can respond with the current size
  • receiver buffer is eventually freed and the window size increased
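A sketch of the sender-side rule and the throughput relation. The function names and the `last_byte_sent` / `last_byte_acked` bookkeeping are assumptions introduced for illustration; the 1-byte probe models the RcvWindow = 0 special case.

```python
def sendable_bytes(rcv_window, last_byte_sent, last_byte_acked, pending):
    """How many of the pending bytes the sender may transmit right now."""
    if rcv_window == 0:
        return min(pending, 1)  # 1-byte segment to learn the current size
    in_flight = last_byte_sent - last_byte_acked  # transmitted, unACKed data
    return min(pending, max(0, rcv_window - in_flight))


def throughput_bytes_per_sec(window_bytes, rtt_seconds):
    """One window of data per round trip."""
    return window_bytes / rtt_seconds


allowed = sendable_bytes(rcv_window=4096, last_byte_sent=3000,
                         last_byte_acked=1000, pending=10000)
```

With a 4096-byte window and 2000 bytes in flight, the sender may transmit 2096 more bytes.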
Design Choice: Counting bytes or counting packets?
pro’s of counting bytes: flexibility
• need a byte counter somewhere anyway
• can repackage data for retransmission
  • e.g. segment-1 is first sent with 200 bytes
  • 300 more bytes are passed down from the application
  • segment-1 times out; send a new segment with 500 bytes of data
[Figure: a 200-byte block and a 300-byte block of data]
Counting Bytes: con's
• sequence number runs out faster
  • needs a larger sequence # field
• easy to fall into the trap of transmitting small packets
  • network overhead goes up with the number of packets transmitted
  • silly window syndrome: receiver ACKs a single byte, causing sender to send single-byte segments forever
Design Choices: Understand the consequence of the design
• TCP sequence number: 32 bits → 4 Gbytes
  • wrap-around time:
    • 50 Kbps: ~190 hours
    • Ethernet (10 Mbps): about an hour
    • FDDI (100 Mbps): 6 minutes
    • at 1 Gbps: about 30 seconds
• TCP window size: 16 bits → 64 Kbytes max; assume RTT = 100 msec
  • can keep a channel of 5 Mbps fully utilized
  • OC3 (155 Mbps) x 100 msec = 1.9 MB, need a window size of at least 21 bits
  • 1 Gbps x 100 msec =
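The figures above follow from two small calculations, sketched here with hypothetical function names. Integer arithmetic is used for the window size so that the bit count is exact.

```python
SEQ_SPACE_BYTES = 2 ** 32  # 32-bit sequence number, counted in bytes


def wrap_time_seconds(rate_bps):
    """Time to send 2**32 bytes at the given link rate."""
    return SEQ_SPACE_BYTES * 8 / rate_bps


def max_rate_bps(window_bytes, rtt_ms):
    """Highest rate one window per RTT can sustain."""
    return window_bytes * 8 * 1000 / rtt_ms


def window_bits_needed(rate_bps, rtt_ms):
    """Bits needed for a window field that covers rate x RTT, in bytes."""
    bdp_bytes = rate_bps * rtt_ms // 8000
    return bdp_bytes.bit_length()


ethernet_minutes = wrap_time_seconds(10_000_000) / 60
oc3_bits = window_bits_needed(155_000_000, 100)
```

At 10 Mbps the sequence space wraps in roughly 57 minutes, a 64 Kbyte window with a 100 msec RTT sustains about 5.2 Mbps, and the OC3 case needs 21 bits.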
Always Keep the Big Picture in Mind
[Figure: layered stack (application, transport, network, link, physical) with a message M gaining transport, network, and link headers on the way down]
[Figure: a Web browser and a Web server run HTTP over the socket interface; the application process writes bytes into the TCP send buffer, TCP carries segments over an unreliable network data packet delivery service, and the peer application process reads bytes from the TCP receive buffer]