CSE 524: Lecture 12 Transport Layer (Part 3)
Transport Layer • Last class • CIDR exam question • Specific transport layers • UDP • This class • TCP
TL: TCP and Transport Layer Functions • Demux to upper layer • Quality of service • Security • Delivery semantics • Flow control • Congestion control • Reliable data transfer
TL: TCP Overview • RFCs: 793, 1122, 1323, 2018, 2581 • point-to-point: one sender, one receiver • reliable, in-order byte stream: no “message boundaries” • pipelined: TCP congestion and flow control set window size • send & receive buffers • full duplex data: bi-directional data flow in same connection • MSS: maximum segment size • connection-oriented: handshaking (exchange of control msgs) initializes sender, receiver state before data exchange • protocol implemented at ends (“fate-sharing”) • flow and congestion controlled: sender will not overwhelm receiver or network
TL: TCP header (rows are 32 bits wide) • source port #, dest port # • sequence number, acknowledgement number: counting by bytes of data (not segments!) • head len, not used, flag bits U A P R S F, rcvr window size: # bytes rcvr willing to accept • checksum: Internet checksum (as in UDP) • ptr urgent data • Options (variable length) • application data (variable length) • URG: urgent data (generally not used) • ACK: ACK # valid • PSH: push data now (generally not used) • RST, SYN, FIN: connection estab (setup, teardown commands)
TL: TCP connections • TCP sender, receiver establish “connection” before exchanging data segments • initialize TCP variables: initial sequence #s; buffers, flow control info (e.g. RcvWindow); window scaling • client: connection initiator • server: contacted by client • Java API: Socket clientSocket = new Socket("hostname", portNumber); Socket connectionSocket = welcomeSocket.accept();
TL: TCP connections • Three-way handshake: • Step 1: client end system sends TCP SYN control segment to server • specifies initial seq # • should be random to prevent spoofing ( http://www.rfc-editor.org/rfc/rfc1948.txt ) • Step 2: server end system receives SYN, replies with SYNACK control segment • ACKs received SYN • allocates buffers • specifies server's initial seq. # • Step 3: client receives SYNACK control segment, replies with ACK and potentially data • ACKs received SYNACK • goes to established state
TL: TCP Connection Establishment • A and B must agree on initial sequence number selection • 3-way handshake • A → B: SYN + Seq A • B → A: SYN + ACK-A + Seq B • A → B: ACK-B
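The exchange on this slide can be traced with a small simulation. This is only a sketch: the `Segment` class and `three_way_handshake` function are hypothetical names, and it relies on the fact that a SYN consumes one sequence number, so each side ACKs the other's ISN plus one.

```python
import random
from dataclasses import dataclass


@dataclass
class Segment:
    syn: bool = False
    ack: bool = False
    seq: int = 0
    ack_num: int = 0


def three_way_handshake(isn_a, isn_b):
    """Return the three segments exchanged by A (client) and B (server)."""
    # A -> B: SYN + Seq A
    syn = Segment(syn=True, seq=isn_a)
    # B -> A: SYN + ACK-A + Seq B (the SYN consumed one sequence number)
    synack = Segment(syn=True, ack=True, seq=isn_b,
                     ack_num=(syn.seq + 1) % 2**32)
    # A -> B: ACK-B
    ack = Segment(ack=True, seq=synack.ack_num,
                  ack_num=(synack.seq + 1) % 2**32)
    return [syn, synack, ack]


if __name__ == "__main__":
    for seg in three_way_handshake(random.getrandbits(32), random.getrandbits(32)):
        print(seg)
```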
TL: TCP Sequence Number Selection • Why not simply choose 0? • Must avoid overlap with earlier incarnation • Client machine seq #0, initiates connection to server with seq #0. • Client sends one byte and machine crashes • Client reboots and initiates connection again • Server thinks new incarnation is the same as old connection
TL: TCP Sequence Number Selection • Why is selecting a random ISN important? • Suppose machine X selects ISN based on predictable sequence • Fred has .rhosts to allow login to X from Y • Evil Ed attacks • Disables host Y: denial of service attack • Makes a bunch of connections to host X • Determines ISN pattern and guesses next ISN • Fake pkt1: [<src Y><dst X>, guessed ISN] • Fake pkt2: desired command • Attack popularized by K. Mitnick
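One way to make the ISN hard to guess is the RFC 1948 approach: ISN = M + F(connection id, secret), where M is a 4-microsecond clock and F is a keyed hash. The sketch below is illustrative only. The name `choose_isn`, the use of SHA-256 (the RFC suggests MD5), and the way the connection id is encoded are all assumptions.

```python
import hashlib
import os
import time

# Per-host secret, chosen at boot in this sketch (assumption).
SECRET = os.urandom(16)


def choose_isn(local_ip, local_port, remote_ip, remote_port,
               secret=SECRET, now=None):
    """Return a 32-bit ISN = M + F(connection id, secret)."""
    if now is None:
        now = time.time()
    # M: clock that ticks every 4 microseconds (250,000 ticks per second).
    m = int(now * 250_000)
    # F: keyed hash of the connection 4-tuple; SHA-256 is an assumption here.
    conn_id = f"{local_ip}:{local_port}-{remote_ip}:{remote_port}".encode()
    digest = hashlib.sha256(conn_id + secret).digest()
    f = int.from_bytes(digest[:4], "big")
    return (m + f) % 2**32


if __name__ == "__main__":
    print(choose_isn("10.0.0.1", 40000, "10.0.0.2", 513))
```

Because F differs per connection and depends on a secret, observing ISNs handed to Ed's own connections does not reveal the ISN that X would pick for a connection from Y.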
TL: TCP ISN selection and spoofing attacks (hosts: Ed, X, Y; X holds .rhosts) • 1. Ed floods Y continuously • 2. Ed spoofs TCP SYN from Y, with spoofed Y ISN • 3. X sends TCP SYNACK to Y (ACKs spoofed Y ISN, sends X ISN): PACKET DROPPED! • 4. Ed sends ACK with guess of X’s ISN, as if he had received the TCP SYNACK • 5. Ed sends pre-canned rlogin/rsh messages (rsh echo “Ed” >> .rhosts) and spoofs acknowledgements • 6. Real acks dropped, so Y does not reset connection • 7. Door now open: rlogin to X from Ed directly
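TL: TCP connection setup (state transitions, event / action) • CLOSED → SYN SENT: active OPEN / create TCB, snd SYN • CLOSED → LISTEN: passive OPEN / create TCB • LISTEN → CLOSED: CLOSE / delete TCB • LISTEN → SYN RCVD: rcv SYN / snd SYN ACK • LISTEN → SYN SENT: APP SEND / snd SYN • SYN SENT → CLOSED: CLOSE / delete TCB • SYN SENT → SYN RCVD: rcv SYN / snd ACK • SYN SENT → ESTAB: rcv SYN, ACK / snd ACK • SYN RCVD → ESTAB: rcv ACK of SYN • SYN RCVD → teardown: CLOSE / send FIN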
TL: TCP connections • Data transfer for established connections using sequence numbers and sliding windows with cumulative ACKs • Seq. #’s: byte stream “number” of first byte in segment’s data • ACKs: seq # of next byte expected from other side • cumulative ACK • duplicate acks sent when out-of-order packet received • See web trace • Java API: connectionSocket.getInputStream().read(...); clientSocket.getOutputStream().write(...); • simple telnet scenario (Host A, Host B, time runs downward): • User types ‘C’: A → B: Seq=42, ACK=79, data = ‘C’ • host ACKs receipt of ‘C’, echoes back ‘C’: B → A: Seq=79, ACK=43, data = ‘C’ • host ACKs receipt of echoed ‘C’: A → B: Seq=43, ACK=80
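The numbers in the telnet scenario follow from two rules: the sequence number of a reply is whatever the peer said it expects next, and the ACK number is the peer's sequence number plus the bytes received. A minimal sketch, with `reply_numbers` as a hypothetical helper name:

```python
def reply_numbers(rcv_seq, rcv_ack, rcv_len):
    """Seq and ACK numbers for the segment sent in response to a received one."""
    seq = rcv_ack              # first byte we send is the one the peer expects next
    ack = rcv_seq + rcv_len    # next byte we expect from the peer
    return seq, ack


if __name__ == "__main__":
    # A -> B: Seq=42, ACK=79, one byte of data ('C')
    print(reply_numbers(42, 79, 1))   # B's echo
    # B -> A: Seq=79, ACK=43, one byte of data ('C')
    print(reply_numbers(79, 43, 1))   # A's final ACK
```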
TL: TCP connections • Closing a connection: client-initiated close (reverse process for server-initiated close) • Java API: clientSocket.close(); • Step 1: client end system sends TCP FIN control segment to server • Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN. • Timeline: client close → FIN; server → ACK; server close → FIN; client → ACK, timed wait, closed
TL: TCP connections • Step 3: client receives FIN, replies with ACK. Enters “timed wait”: will respond with ACK to received FINs • Step 4: server receives ACK. Connection closed. • Note: with small modification, can handle simultaneous FINs. • Timeline: client closing → FIN; server → ACK; server closing → FIN; client → ACK, timed wait; server closed; client closed
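TL: TCP Connection Tear-down • Sender → Receiver: FIN • Receiver → Sender: FIN-ACK • Receiver → Sender: Data write • Sender → Receiver: Data ack • Receiver → Sender: FIN • Sender → Receiver: FIN-ACK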
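TL: TCP Connection Tear-down (state transitions, event / action) • ESTAB → FIN WAIT-1: CLOSE / send FIN • ESTAB → CLOSE WAIT: rcv FIN / send ACK • FIN WAIT-1 → FIN WAIT-2: rcv ACK of FIN • FIN WAIT-1 → CLOSING: rcv FIN / snd ACK • FIN WAIT-1 → TIME WAIT: rcv FIN+ACK / snd ACK • FIN WAIT-2 → TIME WAIT: rcv FIN / snd ACK • CLOSING → TIME WAIT: rcv ACK of FIN • CLOSE WAIT → LAST-ACK: CLOSE / snd FIN • LAST-ACK → CLOSED: rcv ACK of FIN • TIME WAIT → CLOSED: Timeout=2msl / delete TCB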
TL: Time Wait Issues • Cannot close connection immediately after receiving FIN • What if a new connection restarts and uses same sequence number? • Web servers, not clients, close connection first • Established → Fin-Waits → Time-Wait → Closed • Why would this be a problem? • Time-Wait state lasts for 2 * MSL • MSL should be 120 seconds (is often 60s) • Servers often have an order of magnitude more connections in Time-Wait
TL: TCP connections • [Figure: TCP server lifecycle] • [Figure: TCP client lifecycle]
TL: TCP Demux to upper layer • Multiplexing: gathering data from multiple app processes, enveloping data with header (later used for demultiplexing) • multiplexing/demultiplexing: based on sender, receiver port numbers, IP addresses • source, dest port #s in each segment • recall: well-known port numbers for specific applications • Servers wait on well-known ports (/etc/services) • TCP/UDP segment format (32 bits wide): source port #, dest port # | other header fields | application data (message)
TL: TCP Demux to upper layer • port use: simple telnet app • host A → server B: source port: x, dest. port: 23 • server B → host A: source port: 23, dest. port: x • port use: Web server • Web client host A → Web server B: Source IP: A, Dest IP: B, source port: x, dest. port: 80 • Web client host C → Web server B: Source IP: C, Dest IP: B, source port: x, dest. port: 80 • Web client host C → Web server B: Source IP: C, Dest IP: B, source port: y, dest. port: 80
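The Web server example works because an established TCP connection is identified by the full 4-tuple, not by the destination port alone. The sketch below models that lookup with a dictionary. The `Demux` class and its method names are hypothetical and do not correspond to any real socket API.

```python
class Demux:
    """Toy demultiplexer: maps arriving segments to sockets on one host."""

    def __init__(self):
        self.listening = {}   # local port -> listening socket name
        self.connected = {}   # (src IP, src port, dst IP, dst port) -> socket name

    def listen(self, port, socket_name):
        self.listening[port] = socket_name

    def accept(self, src_ip, src_port, dst_ip, dst_port, socket_name):
        self.connected[(src_ip, src_port, dst_ip, dst_port)] = socket_name

    def deliver(self, src_ip, src_port, dst_ip, dst_port):
        """Return the socket that should receive this segment, or None."""
        key = (src_ip, src_port, dst_ip, dst_port)
        if key in self.connected:
            return self.connected[key]
        # No established connection: fall back to a listener on the port.
        return self.listening.get(dst_port)


if __name__ == "__main__":
    b = Demux()
    b.listen(80, "welcome")
    b.accept("A", "x", "B", 80, "conn-A-x")
    b.accept("C", "x", "B", 80, "conn-C-x")
    b.accept("C", "y", "B", 80, "conn-C-y")
    print(b.deliver("C", "x", "B", 80))
```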
TL: TCP Flow control • TCP is a sliding window protocol • For window size n, can send up to n bytes without receiving an acknowledgement • When the data is acknowledged then the window slides forward • Each packet advertises a window size • Indicates number of bytes the receiver has space for • Original TCP always sent entire window • Congestion control now limits this
TL: TCP Flow control • flow control: sender won’t overrun receiver’s buffers by transmitting too much, too fast • receiver: explicitly informs sender of (dynamically changing) amount of free buffer space • RcvWindow field in TCP segment • sender: keeps the amount of transmitted, unACKed data less than most recently received RcvWindow • receiver buffering: RcvBuffer = size of TCP Receive Buffer • RcvWindow = amount of spare room in Buffer
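The two sides of flow control reduce to simple arithmetic. In the sketch below the function names and the byte-counter parameters (`bytes_buffered`, `last_byte_sent`, `last_byte_acked`) are assumptions made for illustration.

```python
def rcv_window(rcv_buffer, bytes_buffered):
    """Receiver side: spare room in the receive buffer, in bytes."""
    return rcv_buffer - bytes_buffered


def usable_window(last_byte_sent, last_byte_acked, advertised):
    """Sender side: bytes that may still be sent without overrunning the receiver."""
    in_flight = last_byte_sent - last_byte_acked   # transmitted but unACKed data
    return max(0, advertised - in_flight)


if __name__ == "__main__":
    adv = rcv_window(rcv_buffer=8192, bytes_buffered=2048)
    print(adv, usable_window(last_byte_sent=5000, last_byte_acked=1000, advertised=adv))
```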
TL: TCP Flow control • What happens if window is 0? • Receiver updates window when application reads data • What if this update is lost? • Deadlock • TCP Persist timer • Sender periodically sends window probe packets • Receiver responds with ACK and up-to-date window advertisement
TL: TCP flow control enhancements • Problem: (Clark, 1982) • If receiver advertises small increases in the receive window then the sender may waste time sending lots of small packets • What happens if window is small? • Small packet problem known as “Silly window syndrome” • Receiver advertises one byte window • Sender sends one byte packet (1 byte data, 40 byte header = 4000% overhead)
TL: TCP flow control enhancements • Solutions to silly window syndrome • Clark (1982) • receiver avoidance • prevent receiver from advertising small windows • advertise a larger receiver window only once it can grow by at least min(MSS, RecvBuffer/2) • Nagle’s algorithm (1984) • sender avoidance • prevent sender from unnecessarily sending small packets • http://www.rfc-editor.org/rfc/rfc896.txt • “Inhibit the sending of new TCP segments when new outgoing data arrives from the user if any previously transmitted data on the connection remains unacknowledged” • Allow only one outstanding small (not full sized) segment that has not yet been acknowledged • Works for idle connections (no deadlock) • Works for telnet (send one-byte packets immediately) • Works for bulk data transfer (delay sending)
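The sender-side rule can be written as a single decision. This is a simplified sketch that ignores the receiver's advertised window and any timers. `nagle_should_send` and its parameters are hypothetical names.

```python
def nagle_should_send(pending_bytes, mss, unacked_bytes):
    """Decide whether the sender may transmit a segment right now."""
    if pending_bytes == 0:
        return False                 # nothing to send
    if pending_bytes >= mss:
        return True                  # full-sized segment: always allowed
    # Small segment: allowed only when no earlier data is still unacknowledged.
    return unacked_bytes == 0


if __name__ == "__main__":
    print(nagle_should_send(1, 1460, 0))     # idle telnet keystroke
    print(nagle_should_send(1, 1460, 1))     # keystroke while one is outstanding
    print(nagle_should_send(4000, 1460, 1))  # bulk transfer
```

The three calls correspond to the idle, telnet, and bulk-transfer cases on the slide: a lone keystroke goes out at once, further keystrokes wait for the ACK, and full-sized segments are never held back.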
TL: TCP reliable data transfer • Segment integrity • Acknowledgement generation • Retransmission
TL: TCP RDT segment integrity • Checksum included in header • Is it sufficient to just checksum the packet contents? • No, need to ensure correct source/destination • Pseudoheader: the portions of the IP hdr that are critical • Checksum covers pseudoheader, transport hdr, and packet body • Layer violation, redundant with parts of IP checksum
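A sketch of the computation for IPv4: the 16-bit ones' complement Internet checksum is taken over a pseudoheader (source address, destination address, a zero byte, protocol number 6, TCP length) followed by the TCP header and data. The function names are hypothetical, and the segment is assumed to carry a zeroed checksum field when the checksum is first computed.

```python
import socket
import struct


def internet_checksum(data):
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                      # pad to a whole number of words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip, dst_ip, segment):
    """Checksum over pseudoheader + TCP header + data (IPv4)."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, len(segment)))   # zero, protocol 6, length
    return internet_checksum(pseudo + segment)


if __name__ == "__main__":
    # Header fields: ports, seq, ack, data offset, flags, window, checksum=0, urgent ptr.
    header = struct.pack("!HHIIBBHHH", 1234, 80, 42, 79, 5 << 4, 0x18, 1000, 0, 0)
    print(hex(tcp_checksum("10.0.0.1", "10.0.0.2", header + b"C")))
```

A receiver that recomputes the checksum over a segment with the checksum field filled in gets 0 when nothing was corrupted and the addresses match.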
TL: TCP RDT acks and timeouts • TCP’s reliable data transfer approach • Cumulative acknowledgements • Receiver sends back the byte number it expects to receive next • Out of order packets generate duplicate acknowledgements • Receive 1, Ack 2 • Receive 4, Ack 2 • Receive 3, Ack 2 • Receive 2, Ack 5 • Retransmissions • Sender sends segment and sets a timer • Waits for an acknowledgement indicating segment was received • Send 1 • Wait for Ack 2 • No Ack 2 and timer expires • Send 1 again
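The Receive/Ack example on this slide can be reproduced with a few lines. The sketch numbers whole segments, as the slide does, rather than bytes. `cumulative_acks` is a hypothetical helper name.

```python
def cumulative_acks(arrivals, first_expected=1):
    """Return the ACK sent after each arrival: the next segment expected in order."""
    received = set()
    expected = first_expected
    acks = []
    for seg in arrivals:
        received.add(seg)
        # Advance past every segment that is now available in order.
        while expected in received:
            expected += 1
        acks.append(expected)
    return acks


if __name__ == "__main__":
    print(cumulative_acks([1, 4, 3, 2]))
```

Segments 4 and 3 arrive while 2 is still missing, so each produces a duplicate Ack 2. Once 2 arrives, the ACK jumps to 5.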
TL: TCP RDT acks and timeouts • simplified sender, assuming: one way data transfer; no flow, congestion control • sender loop: wait for event, then handle one of: • event: data received from application above → create, send segment • event: timer timeout for segment with seq # y → retransmit segment • event: ACK received, with ACK # y → ACK processing
TL: TCP RDT acks and timeouts • Simplified TCP sender
sendbase = initial_sequence number
nextseqnum = initial_sequence number

loop (forever) {
  switch(event)
  event: data received from application above
    create TCP segment with sequence number nextseqnum
    start timer for segment nextseqnum
    pass segment to IP
    nextseqnum = nextseqnum + length(data)
  event: timer timeout for segment with sequence number y
    retransmit segment with sequence number y
    compute new timeout interval for segment y
    restart timer for sequence number y
  event: ACK received, with ACK field value of y
    if (y > sendbase) { /* cumulative ACK of all data up to y */
      cancel all timers for segments with sequence numbers < y
      sendbase = y
    }
    else { /* a duplicate ACK for already ACKed segment */
      increment number of duplicate ACKs received for y
      if (number of duplicate ACKS received for y == 3) {
        /* TCP fast retransmit */
        resend segment with sequence number y
        restart timer for segment y
      }
    }
} /* end of loop forever */
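The ACK-processing branch of the simplified sender can be isolated into a small class. This sketch ignores timers and sequence-number wraparound. `AckTracker` is a hypothetical name, and resetting the duplicate count when a new cumulative ACK arrives is an assumption, since the pseudocode keeps a separate count per value of y.

```python
class AckTracker:
    """Tracks sendbase and duplicate ACKs for a simplified TCP sender."""

    def __init__(self, initial_seq):
        self.sendbase = initial_seq
        self.dup_acks = 0

    def on_ack(self, y):
        """Process an ACK; return a sequence number to fast-retransmit, or None."""
        if y > self.sendbase:
            # Cumulative ACK of all data up to y.
            self.sendbase = y
            self.dup_acks = 0
            return None
        # Duplicate ACK for already ACKed data.
        self.dup_acks += 1
        if self.dup_acks == 3:
            return y                 # TCP fast retransmit
        return None


if __name__ == "__main__":
    sender = AckTracker(92)
    for ack in (100, 100, 100, 100):
        print(ack, sender.on_ack(ack))
```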
TL: TCP delayed acknowledgements • Problem: • In request/response programs, you send separate ACK and Data packets for each transaction • Delay ACK in order to send ACK back along with data • Solution: • Don’t ACK data immediately • Wait 200ms (must be less than 500ms – why?) • Must ACK every other packet • Must not delay duplicate ACKs • Without delayed ACK: 40 byte ack + data packet • With delayed ACK: data packet includes ACK • See web trace example • Extensions for asymmetric links • See later part of lecture
TL: TCP ACK generation [RFC 1122, RFC 2581] (Event → TCP Receiver action) • in-order segment arrival, no gaps, everything else already ACKed → delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK • in-order segment arrival, no gaps, one delayed ACK pending → immediately send single cumulative ACK • out-of-order segment arrival, higher-than-expected seq. #, gap detected → send duplicate ACK, indicating seq. # of next expected byte • arrival of segment that partially or completely fills gap → immediate ACK if segment starts at lower end of gap
TL: TCP retransmission • Wait at least one RTT before retransmitting packet • Importance of accurate RTT estimators: • Estimator too low → unneeded retransmissions • Estimator too high → poor throughput, slow reaction to segment loss • RTT estimator must adapt to change in RTT • But not too fast, or too slow! • Backing off the retransmission timeout • Exponential backoff • Double retransmission timer interval after every loss until successful retransmission
TL: TCP retransmission scenarios • lost ACK scenario: • A → B: Seq=92, 8 bytes data • B → A: ACK=100 (X, loss) • timeout at A • A → B: Seq=92, 8 bytes data • B → A: ACK=100 • premature timeout, cumulative ACKs: • A → B: Seq=92, 8 bytes data • A → B: Seq=100, 20 bytes data • B → A: ACK=100, then ACK=120 • Seq=92 timeout expires at A before ACK=100 arrives • A → B: Seq=92, 8 bytes data • B → A: ACK=120
TL: Initial Round-trip Estimator • Round trip times exponentially averaged: EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT • Recommended value for x: 0.1-0.2 • 0.125 for most TCP’s • Influence of given sample decreases exponentially fast • Retransmit timer set to b * RTT, where b = 2 • Every time timer expires, RTO exponentially backed-off • Like Ethernet • Not good at preventing spurious timeouts
TL: Jacobson’s Retransmission Timeout • Key observation: • At high loads round trip variance is high • Need larger safety margin with larger variations in RTT • Solution: • Base RTO value on RTT and on the standard deviation of the RTT
TL: Jacobson’s Retransmission Timeout • Setting the timeout • EstimatedRTT plus “safety margin” • large variation in EstimatedRTT -> larger safety margin • EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT • Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT| • Timeout = EstimatedRTT + 4*Deviation
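The three formulas can be combined into a running estimator. This is a sketch under stated assumptions: the class name `RttEstimator` is hypothetical, the starting deviation of half the first sample is borrowed from RFC 6298 rather than from this slide, and EstimatedRTT is updated before Deviation, which is one reading of the formulas as listed.

```python
class RttEstimator:
    """Running EstimatedRTT / Deviation, with Timeout = EstimatedRTT + 4*Deviation."""

    def __init__(self, first_sample, x=0.125):
        self.x = x
        self.estimated_rtt = first_sample
        self.deviation = first_sample / 2    # assumption: initial value from RFC 6298

    def update(self, sample_rtt):
        """Fold in one SampleRTT and return the new timeout."""
        x = self.x
        self.estimated_rtt = (1 - x) * self.estimated_rtt + x * sample_rtt
        self.deviation = ((1 - x) * self.deviation
                          + x * abs(sample_rtt - self.estimated_rtt))
        return self.timeout()

    def timeout(self):
        return self.estimated_rtt + 4 * self.deviation


if __name__ == "__main__":
    est = RttEstimator(100.0)                # milliseconds
    for sample in (100.0, 200.0, 100.0):
        print(sample, est.update(sample))
```

A single 200 ms sample after steady 100 ms samples raises the timeout mostly through the deviation term, which is the safety margin reacting to the variation.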
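TL: Retransmission Ambiguity • Case 1: A → B original transmission; RTO expires; A → B retransmission; B → A ACK: is Sample RTT measured from the original transmission or from the retransmission? • Case 2: A → B original transmission lost (X); RTO expires; A → B retransmission; B → A ACK: same question for Sample RTT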
TL: Karn’s algorithm • Accounts for retransmission ambiguity • If a segment has been retransmitted: • Don’t count RTT sample on ACKs for this segment • Keep backed off time-out for next packet • Reuse RTT estimate only after one successful transmission
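A sketch of how Karn's rule interacts with exponential backoff. The class `KarnRto` is hypothetical, and `fresh_rto` stands for whatever timeout the RTT estimator would compute from a valid sample.

```python
class KarnRto:
    """Retransmission timeout with exponential backoff and Karn's rule."""

    def __init__(self, rto):
        self.rto = rto

    def on_timeout(self):
        """Timer expired: back off exponentially."""
        self.rto *= 2
        return self.rto

    def on_ack(self, was_retransmitted, fresh_rto):
        """ACK arrived for a segment."""
        if not was_retransmitted:
            # Unambiguous sample: go back to the estimator's timeout.
            self.rto = fresh_rto
        # Otherwise discard the sample and keep the backed-off timeout.
        return self.rto


if __name__ == "__main__":
    timer = KarnRto(1.0)
    timer.on_timeout()
    timer.on_timeout()
    print(timer.on_ack(was_retransmitted=True, fresh_rto=1.2))
    print(timer.on_ack(was_retransmitted=False, fresh_rto=1.2))
```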
TL: Timer Granularity • Many TCP implementations set RTO in multiples of 200, 500, or 1000 ms • Why? • Avoid spurious timeouts: RTTs can vary quickly due to cross traffic • Make timer interrupts efficient
TL: TCP Congestion Control • Motivated by ARPANET congestion collapse • Flow control, but no congestion control • Sender sends as much as the receiver resources will allow • Go-back-N on loss, burst out advertised window • Congestion control • Extending control to network resources • Underlying design principle: packet conservation • At equilibrium, inject packet into network only when one is removed • Basis for stability of physical systems (fluid model) • Why was this not working before? • No equilibrium • Solved by self-clocking and congestion window • Spurious retransmissions • Solved by accurate RTO estimation (see earlier discussion) • Resource limitations prevent equilibrium • Solved by congestion window and congestion avoidance algorithms
TL: TCP congestion control basics • Keep a congestion window, cwnd • Book calls this “Congwin” • Denotes how much network is able to absorb • Sender’s maximum window: • Min (receiver’s advertised window, cwnd) • Sender’s actual window: • Max window - unacknowledged segments
TL: TCP Congestion Control • end-end control (no network assistance) • transmission rate limited by congestion window size, cwnd, over segments • w segments, each with MSS bytes, sent in one RTT: throughput = (w * MSS) / RTT Bytes/sec
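The window and throughput relations can be checked with a short calculation. The function names are hypothetical, and all windows are expressed in bytes here for consistency.

```python
def sender_window(advertised, cwnd, unacked):
    """Sender's actual window: min(advertised window, cwnd) minus unACKed data."""
    max_window = min(advertised, cwnd)
    return max(0, max_window - unacked)


def throughput(w, mss, rtt):
    """Bytes/sec when w segments of MSS bytes each are sent per RTT (seconds)."""
    return w * mss / rtt


if __name__ == "__main__":
    print(sender_window(advertised=65535, cwnd=8000, unacked=3000))
    print(throughput(w=10, mss=1460, rtt=0.5))
```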
TL: TCP congestion control: two “phases” • slow start • congestion avoidance • important variables: • cwnd • ssthresh: defines threshold between the two phases, slow start phase and congestion control phase (Book calls this threshold) • useful reference: http://www.aciri.org/floyd/papers/sacks.ps.Z • “probing” for usable bandwidth: • ideally: transmit as fast as possible (cwnd as large as possible) without loss • increase cwnd until loss (congestion) • loss: decrease cwnd, then begin probing (increasing) again
TL: TCP slow start • Start the self-clocking behavior of TCP • Use acks to clock sending new data • Do not send entire advertised window in one shot • [Figure: self-clocking between Sender and Receiver; labels Pb, Pr, Ar, Ab, As]
TL: TCP slow start • exponential increase (per RTT) in window size • Window actually increases to W in RTT * log2(W) • Can overshoot window and cause packet loss • Slowstart algorithm: initialize: cwnd = 1; for (each segment ACKed) cwnd++; until (loss event OR cwnd > ssthresh) • Timeline (Host A → Host B, one RTT per step): one segment, two segments, four segments
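Incrementing cwnd once per ACKed segment doubles it every RTT, because a window of cwnd segments yields cwnd ACKs. The sketch below follows the slide's stopping rule and assumes no loss, with every segment ACKed within one RTT. `slow_start` is a hypothetical helper name.

```python
def slow_start(ssthresh, rtts):
    """Return cwnd (in segments) at the start of each RTT during slow start."""
    cwnd = 1
    history = [cwnd]
    for _ in range(rtts):
        if cwnd > ssthresh:
            break                    # leave slow start
        acked = cwnd                 # every segment in the window is ACKed
        cwnd += acked                # cwnd++ for each segment ACKed
        history.append(cwnd)
    return history


if __name__ == "__main__":
    print(slow_start(ssthresh=16, rtts=10))
```

With ssthresh = 16 the window reaches 32 before the `cwnd > ssthresh` test stops the growth, which illustrates how slow start can overshoot.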
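TL: TCP slow start example • [Figure: packets sent per RTT; horizontal scale marks one RTT and one pkt time] • 0R: send 1 • 1R: ack 1 arrives, send 2, 3 • 2R: acks 2, 3 arrive, send 4, 5, 6, 7 • 3R: acks 4, 5, 6, 7 arrive, send 8 through 15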
TL: TCP slow start sequence plot • [Plot: Sequence No vs. Time]