Transport Layer: Outline
3.1 Transport-layer services
3.2 Multiplexing and demultiplexing
3.3 Connectionless transport: UDP
3.4 Principles of reliable data transfer
3.5 Connection-oriented transport: TCP
• segment structure
• reliable data transfer
• flow control
• connection management
3.6 Principles of congestion control
3.7 TCP congestion control
Recap: rdt3.0 sender (Stop-and-wait)
(FSM with four states; Λ means "no action")
State: Wait for call 0 from above
• rdt_send(data): sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer; go to "Wait for ACK0"
• rdt_rcv(rcvpkt): Λ
State: Wait for ACK0
• rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) ): Λ
• timeout: udt_send(sndpkt); start_timer
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0): stop_timer; go to "Wait for call 1 from above"
State: Wait for call 1 from above
• rdt_send(data): sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer; go to "Wait for ACK1"
• rdt_rcv(rcvpkt): Λ
State: Wait for ACK1
• rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) ): Λ
• timeout: udt_send(sndpkt); start_timer
• rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1): stop_timer; go to "Wait for call 0 from above"
Recap: rdt3.0: stop-and-wait operation
(Timing diagram: sender and receiver)
• t = 0: first packet bit transmitted
• t = L / R: last packet bit transmitted
• first packet bit arrives at receiver
• last packet bit arrives; receiver sends ACK
• t = RTT + L / R: ACK arrives; sender sends next packet
Recap: Pipelining: increased utilization
(Timing diagram: sender and receiver, three packets in flight)
• t = 0: first packet bit transmitted
• t = L / R: last bit of 1st packet transmitted
• first packet bit arrives at receiver
• last bit of 1st packet arrives; receiver sends ACK
• last bit of 2nd packet arrives; receiver sends ACK
• last bit of 3rd packet arrives; receiver sends ACK
• t = RTT + L / R: first ACK arrives; sender sends next packet
Sending three packets per round trip increases utilization by a factor of 3!
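The factor of 3 follows directly from the timing above: the sender is busy for N * L / R out of every RTT + L / R seconds. A minimal sketch of that arithmetic, assuming illustrative values that are not from the slides (an 8000-bit packet, a 1 Gbps link, a 30 ms RTT):

```python
def utilization(packet_bits, rate_bps, rtt_s, window=1):
    """Fraction of time the sender spends transmitting.

    window is the number of packets sent back-to-back per round trip
    (1 for stop-and-wait).
    """
    transmit_time = packet_bits / rate_bps      # L / R
    cycle = rtt_s + transmit_time               # RTT + L / R
    return min(1.0, window * transmit_time / cycle)


# Assumed example values, chosen only for illustration.
stop_and_wait = utilization(8000, 1e9, 0.030)
pipelined = utilization(8000, 1e9, 0.030, window=3)
print(f"stop-and-wait: {stop_and_wait:.6f}, pipelined (N=3): {pipelined:.6f}")
```

With a window of 3 the ratio of the two results is exactly 3, as long as the window is small enough that the link is not yet saturated.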
Recap: GBN for Pipelined Error Recovery
Sender:
• k-bit sequence # in the packet header
• "window" of up to N consecutive packets that have been sent but not acknowledged, or that may still be sent
• the window slides forward by 1 packet when its first sent pkt is acknowledged (standard behavior)
• the window cannot contain acknowledged pkts
The sender must respond to three types of events:
• 1- Invocation from above: the application layer tries to send a packet; if the window is full the packet is returned, otherwise it is accepted and sent.
• 2- Receipt of an ACK: ACK(n) indicates that all pkts up to and including seq # n have been received ("cumulative ACK"). The sender may receive duplicate ACKs (when the receiver gets out-of-order packets).
• 3- A timeout event (the only cause of retransmission): a timer runs for the oldest in-flight pkt; if it expires, the sender retransmits all packets that have been sent but not yet acknowledged.
Recap: Selective repeat for error recovery
• Window may contain acknowledged pkts (unlike GBN)
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
• point-to-point: one sender, one receiver; no one-to-many multicast
• connection-oriented: processes must handshake before sending data; the three-way handshake (an exchange of control msgs) initializes sender and receiver state before data exchange
• full-duplex data: bi-directional data flow in the same connection at the same time
• pipelined: TCP congestion and flow control set the window size
• send & receive buffers: set aside during the three-way handshake
• flow controlled: sender will not overwhelm receiver
TCP: Overview - cont
Maximum Segment Size (MSS):
• Defined as the maximum amount of application-layer data in a TCP segment.
• TCP grabs data in chunks from the send buffer; the maximum chunk size is the MSS.
• A TCP segment consists of the TCP header plus up to MSS bytes of data.
• The MSS is derived from the largest link-layer frame (Maximum Transmission Unit, or MTU) that the local host can send: it is chosen so that a full-sized segment, placed in an IP datagram, fits into a single link-layer frame.
• Common MSS values are 1460 bytes, 536 bytes and 512 bytes (for example, a 1500-byte MTU minus 40 bytes of TCP/IP headers gives 1460).
TCP sequence #s:
• Both sides randomly choose initial seq #s (rather than starting at 0) so that segments of older connections that used the same ports are not mistaken for current ones.
• TCP views data as an unstructured but ordered stream of bytes, so seq #s count bytes in the stream.
• Example: with a file of 500,000 bytes and MSS = 1,000 bytes, the segment seq #s are 0, 1000, 2000, etc.
TCP acknowledgement #s:
• TCP uses cumulative ACKs: it only acknowledges bytes up to the first missing byte in the stream.
• The TCP RFCs do not specify how to handle out-of-order segments.
• The ACK # field holds the offset of the next byte that the sender of the segment expects from the other side.
TCP segment structure
(Header layout, 32 bits per row)
• source port # | dest port #
• sequence number: counts bytes of data (not segments!); the 32-bit sequence space covers 2^32 bytes (4 GB); total # of segments = filesize / MSS
• acknowledgement number
• header length (4 bits, in 32-bit words) | not used | flags U A P R S F | Receive window (16 bits): # bytes the receiver is willing to accept (RcvWindow size)
• checksum: Internet checksum (as in UDP) | Urgent data pointer
• Options (variable length): used to negotiate MSS
• application data (variable length)
Flags:
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now to the upper layer
• RST = 1: used in the response when a client tries to connect to a server port that is not open
• SYN / FIN: connection setup and close
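To make the field layout concrete, here is a minimal sketch that decodes the fixed 20-byte part of a TCP header with Python's struct module. The function name parse_tcp_header and the hand-built example segment are hypothetical and exist only for illustration; options are not parsed.

```python
import struct

# Flag bits in the low 6 bits of the 16-bit offset/flags word, lowest bit first.
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]


def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (options are ignored)."""
    src, dst, seq, ack, off_flags, window, checksum, urgent = struct.unpack(
        "!HHIIHHHH", raw[:20]
    )
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        # The 4-bit header-length field counts 32-bit words.
        "header_len_bytes": (off_flags >> 12) * 4,
        "flags": {n for bit, n in enumerate(FLAG_NAMES) if off_flags & (1 << bit)},
        "rcv_window": window,
        "checksum": checksum,
        "urgent_ptr": urgent,
    }


# Hypothetical SYN segment built by hand: header length 5 words, SYN bit set.
example = struct.pack(
    "!HHIIHHHH", 3123, 80, 4019802004, 0, (5 << 12) | 0x02, 65535, 0, 0
)
header = parse_tcp_header(example)
print(header["flags"], header["header_len_bytes"])
```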
Seq Numbers and Ack Numbers
• Suppose a data stream of size 500,000 bytes and an MSS of 1,000 bytes; the first byte of the data stream is numbered zero.
• Seq numbers of the segments: 1st seg: 0; 2nd seg: 1000; 3rd seg: 2000, ...
• Ack number:
• Assume host A is sending a segment to host B. Because TCP is full-duplex, A may be receiving data from B simultaneously.
• The ack number that host B puts in its segment is the seq number of the next byte B is expecting from A.
• Example: B has received all bytes numbered 0 through 535 from A. If B is about to send a segment to host A, the ack number in that segment should be 536.
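The numbering rule can be checked with a few lines of code. This is a small sketch using the slide's own numbers; the helper names segment_seq_numbers and next_ack are made up for the example.

```python
def segment_seq_numbers(stream_bytes, mss, first_byte=0):
    """Seq # of each segment = stream offset of the first byte it carries."""
    return [first_byte + offset for offset in range(0, stream_bytes, mss)]


def next_ack(last_in_order_byte):
    """Ack # = number of the next byte expected from the other side."""
    return last_in_order_byte + 1


seqs = segment_seq_numbers(500_000, 1_000)
print(seqs[:3], len(seqs))   # first three seq #s and the segment count
print(next_ack(535))         # B has bytes 0..535, so it asks for the next one
```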
TCP seq. #'s and ACKs - Telnet example
• Seq. #'s: byte-stream "number" of the first byte in the segment's data
• ACKs: seq # of the next byte expected from the other side; cumulative ACK
• Telnet uses "echo back" so that the characters the user sees have already been received and processed at the server.
• Assume the starting seq #s are 42 for the client and 79 for the server. After the connection is established, the client is waiting for byte 79 and the server for byte 42.
Simple telnet scenario (Host A = client, Host B = server):
• User types 'C'. A sends: Seq=42, ACK=79, data = 'C'
• B ACKs receipt of 'C' and echoes back 'C'. B sends: Seq=79, ACK=43, data = 'C'
• A ACKs receipt of the echoed 'C'. A sends: Seq=43, ACK=80
TCP Round Trip Time and Timeout
Q: how to set the TCP timeout value? (timer management)
• based on RTT: longer than RTT, but RTT varies
• too short: premature timeout, unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission (handing the segment to IP) until ACK receipt
• ignore retransmissions (why?)
• SampleRTT will vary from segment to segment; we want a "smoother" estimated RTT
• average several recent measurements, not just the current SampleRTT
• TCP maintains an average called EstimatedRTT and uses it to calculate the timeout value
TCP Round Trip Time (RTT) and Timeout
EstimatedRTT = (1 - α) * priorEstimatedRTT + α * currentSampleRTT
• Exponential Weighted Moving Average (EWMA)
• Puts more weight on recent samples than on old ones
• influence of a past sample decreases exponentially fast
• typical value: α = 0.125
• Formula becomes: EstimatedRTT = 0.875 * priorEstimatedRTT + 0.125 * currentSampleRTT
Why TCP ignores retransmissions when calculating SampleRTT:
Suppose the source sends packet P1, the timer for P1 expires, and the source then sends P2, a new copy of the same packet. Further suppose the source measures SampleRTT for P2 (the retransmitted packet) and that shortly after transmitting P2 an acknowledgment for P1 arrives. The source will mistakenly take this acknowledgment as an acknowledgment for P2 and calculate an incorrect value of SampleRTT.
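A short sketch of the EWMA update with the typical weight of 0.125. The sample values in milliseconds are invented for illustration; note how a single outlier sample moves the estimate only slightly.

```python
ALPHA = 0.125  # typical weight given to the newest sample


def update_estimated_rtt(prior_estimate, sample_rtt, alpha=ALPHA):
    """One EWMA step: blend the prior estimate with the new sample."""
    return (1 - alpha) * prior_estimate + alpha * sample_rtt


# Made-up SampleRTT values in milliseconds, with one spike.
estimate = 100.0
for sample in [100.0, 180.0, 100.0, 100.0]:
    estimate = update_estimated_rtt(estimate, sample)
    print(f"sample={sample:6.1f}  EstimatedRTT={estimate:7.3f}")
```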
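RTT Sample Ambiguity
(Two timing diagrams between hosts A and B. In the first, the ACK for the original transmission arrives after the retransmission, so a sample measured from the retransmission is too short. In the second, the original transmission is lost (X) and the ACK is for the retransmission, so a sample measured from the original transmission is too long.)
• Karn's RTT Estimator
• If a segment has been retransmitted:
• Don't count RTT samples on ACKs for this segment
• Keep the backed-off timeout for the next packet
• Reuse the RTT estimate only after one successful transmission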
Example RTT estimation: (figure not reproduced)
TCP Round Trip Time and Timeout
Setting the timeout
• EstimatedRTT plus a "safety margin"
• large variation in EstimatedRTT -> larger safety margin
• first estimate how much SampleRTT deviates from EstimatedRTT:
DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|
(typically, β = 0.25)
• then set the timeout interval:
TimeoutInterval = EstimatedRTT + 4 * DevRTT
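The two estimators and the timeout formula can be combined in a small class. This is a sketch under stated assumptions, not a description of any particular TCP implementation: the class name RttEstimator is made up, the deviation is initialized to half the first sample (a convention borrowed from RFC 6298), and the deviation is updated against the prior EstimatedRTT before the estimate itself is updated.

```python
class RttEstimator:
    """Tracks EstimatedRTT and DevRTT and derives TimeoutInterval."""

    def __init__(self, first_sample, alpha=0.125, beta=0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2      # assumed initial deviation

    def update(self, sample_rtt):
        # Deviation first, measured against the prior estimate (assumption).
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)

    def timeout_interval(self):
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)   # made-up value, in milliseconds
est.update(180.0)
print(est.estimated_rtt, est.dev_rtt, est.timeout_interval())
```

Samples from retransmitted segments should not be fed to update, for the ambiguity reason discussed in this deck.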
TCP: conn-oriented transport
• segment structure
• RTT Estimation and Timeout
• reliable data transfer
• flow control
• connection management
TCP reliable data transfer
• TCP creates an rdt service on top of IP's unreliable service
• Pipelined segments
• Cumulative ACKs
• TCP uses a single retransmission timer, because multiple timers require considerable overhead
• Retransmissions are triggered by: timeout events, duplicate ACKs
• Initially consider a simplified TCP sender: ignore duplicate ACKs; ignore flow control and congestion control
TCP sender events:
data rcvd from app:
• create a segment with a seq #
• seq # is the byte-stream number of the first data byte in the segment
• start the timer if it is not already running for some other segment (think of the timer as belonging to the oldest unacknowledged segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit the segment that caused the timeout
• restart the timer
ACK rcvd:
• if a valid ACK field (cumulative ACK) acknowledges previously unacknowledged segments: update the expected ACK #
• restart the timer if there are currently unacknowledged segments
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch(event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
} /* end of loop forever */

• Comment: SendBase-1 is the last cumulatively ACKed byte.
• Example: SendBase-1 = 71; y = 73, so the receiver wants byte 73 onward; y > SendBase, so new data is ACKed.
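The three sender events can be exercised in a small simulation. This is a sketch, not a real TCP stack: the class SimplifiedTcpSender and its sent log are hypothetical, the timer is modelled as a boolean, and sequence-number wraparound is ignored.

```python
class SimplifiedTcpSender:
    """Event handlers for the simplified sender: app data, timeout, ACK."""

    def __init__(self, initial_seq):
        self.send_base = initial_seq
        self.next_seq = initial_seq
        self.unacked = {}            # seq # -> data still awaiting an ACK
        self.timer_running = False
        self.sent = []               # log of (seq #, length) handed to "IP"

    def on_app_data(self, data):
        seq = self.next_seq
        self.unacked[seq] = data
        if not self.timer_running:
            self.timer_running = True
        self.sent.append((seq, len(data)))
        self.next_seq += len(data)

    def on_timeout(self):
        if not self.unacked:
            self.timer_running = False
            return
        seq = min(self.unacked)      # oldest not-yet-acknowledged segment
        self.sent.append((seq, len(self.unacked[seq])))
        self.timer_running = True    # restart the timer

    def on_ack(self, y):
        if y > self.send_base:       # cumulative ACK covers new data
            self.send_base = y
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)


# Cumulative-ACK case: ACK=100 is lost, but ACK=120 covers both segments.
sender = SimplifiedTcpSender(initial_seq=92)
sender.on_app_data(b"x" * 8)     # Seq=92, 8 bytes
sender.on_app_data(b"y" * 20)    # Seq=100, 20 bytes
sender.on_ack(120)
print(sender.send_base, sender.sent)
```

Because ACKs are cumulative, the single ACK=120 advances SendBase past both segments and nothing is retransmitted.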
TCP: retransmission scenarios
(Two timing diagrams between Host A and Host B)
Lost ACK scenario:
• A sends Seq=92, 8 bytes data
• B sends ACK=100, which is lost (X)
• the timer for Seq=92 expires; A resends Seq=92, 8 bytes data
• B sends ACK=100 again; SendBase = 100
Premature timeout scenario:
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B sends ACK=100 and ACK=120
• the timer for Seq=92 expires before the ACKs arrive; A retransmits the not-yet-ACKed segment with the smallest seq # (Seq=92, 8 bytes data)
• the ACKs arrive: SendBase = 100, then SendBase = 120
• B answers the duplicate segment with ACK=120
TCP retransmission scenarios (more)
Cumulative ACK scenario:
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B sends ACK=100, which is lost (X), and then ACK=120
• ACK=120 arrives before the timeout; SendBase = 120 and nothing is retransmitted
• TCP implementations double the timeout value on every retransmission, since the timeout could have occurred because the network is congested. The intervals therefore grow exponentially with each retransmission, and the timeout is reset after either of the two other events (data received from the application, or an ACK received).
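The doubling rule gives an exponentially growing series of timer intervals. A minimal sketch follows; the function name backoff_intervals and the 0.75-second base value are made up for illustration.

```python
def backoff_intervals(base_timeout, retransmissions):
    """Timer intervals for the first transmission and each retransmission."""
    return [base_timeout * 2 ** attempt for attempt in range(retransmissions + 1)]


# Assumed base TimeoutInterval of 0.75 s and three consecutive timeouts.
print(backoff_intervals(0.75, 3))
```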
TCP ACK generation policy [RFC 1122, RFC 2581]
Event at Receiver -> TCP Receiver action
1. Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> Delayed ACK. Wait up to 500 ms for the next segment. If no next segment arrives, send ACK.
2. Arrival of in-order segment with expected seq #; one other segment has ACK pending -> Immediately send a single cumulative ACK, ACKing both in-order segments.
3. Arrival of out-of-order segment with higher-than-expected seq #; gap detected -> Immediately send a duplicate ACK, indicating the seq # of the next expected byte.
4. Arrival of segment that partially or completely fills the gap -> Immediately send ACK, provided that the segment starts at the lower end of the gap.
Note: the policy leaves the buffering of out-of-order segments open.
Fast Retransmit
• The timeout period is often relatively long: long delay before resending a lost packet
• Detect lost segments via duplicate ACKs. A duplicate ACK is an ACK that re-acknowledges a segment that has already been acknowledged.
• The sender often sends many segments back-to-back
• If a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 duplicate ACKs for the same data, it assumes that the segment after the last ACKed segment was lost
• Fast retransmit: resend that segment before its timer expires
• The algorithm comes as a result of 15 years of TCP experience!
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {    /* a duplicate ACK for an already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y = 3) {
            resend segment with sequence number y    /* fast retransmit */
        }
    }
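The duplicate-ACK counter can be sketched as a small class. The name DupAckDetector is hypothetical, ACK values below SendBase are simply ignored here (an assumption the pseudocode does not spell out), and the actual resend is left to the caller.

```python
class DupAckDetector:
    """Counts duplicate ACKs and signals when to fast-retransmit."""

    def __init__(self, send_base):
        self.send_base = send_base
        self.dup_count = 0

    def on_ack(self, y):
        """Return the seq # to fast-retransmit, or None if nothing to do."""
        if y > self.send_base:        # new data acknowledged
            self.send_base = y
            self.dup_count = 0
            return None
        if y == self.send_base:       # duplicate ACK for already ACKed data
            self.dup_count += 1
            if self.dup_count == 3:
                return y              # resend the segment starting at y
        return None


detector = DupAckDetector(send_base=92)
# One new ACK for byte 100, then four duplicates of it.
results = [detector.on_ack(y) for y in [100, 100, 100, 100, 100]]
print(results)
```

Only the third duplicate triggers a retransmission; further duplicates for the same y do not resend again.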
Is TCP a GBN or SR protocol?
• TCP can buffer out-of-order segments (like SR).
• TCP has a proposed RFC called selective acknowledgement, which selectively acknowledges out-of-order segments and saves on retransmissions (like SR).
• The TCP sender need only maintain the smallest seq # of a transmitted but unacknowledged byte and the seq # of the next byte to be sent (like GBN).
• TCP is a hybrid between GBN and SR.
TCP Flow Control
• the receive side of a TCP connection has a receive buffer
• the app process may be slow at reading from the buffer
• flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app's drain rate
TCP Flow control: how it works
(Suppose the TCP receiver discards out-of-order segments)
• the sender maintains a variable called the receive window
• spare room in the receiver's buffer: RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• TCP is not allowed to overflow the allocated buffer: LastByteRcvd - LastByteRead <= RcvBuffer
• the receiver advertises the spare room by including the value of RcvWindow in its segments
• RcvWindow = RcvBuffer at the start of transmission
• the sender limits unACKed data to RcvWindow: it keeps track of UnAcked data size = LastByteSent - LastByteAcked and ensures UnAcked data size <= RcvWindow
• when the receiver advertises RcvWindow = 0, the sender does not block completely; it keeps sending 1-byte segments, which the receiver ACKs, until RcvWindow becomes larger
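The two inequalities can be turned into a pair of helper functions. This is a sketch with invented byte counts; the function names rcv_window and can_send are hypothetical.

```python
def rcv_window(rcv_buffer, last_byte_rcvd, last_byte_read):
    """Spare room the receiver advertises to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def can_send(n_bytes, last_byte_sent, last_byte_acked, window):
    """True if n_bytes more can be sent without exceeding RcvWindow."""
    unacked = last_byte_sent - last_byte_acked
    return unacked + n_bytes <= window


# Made-up state: 4096-byte buffer holding 2000 bytes the app has not read yet.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)
print(can_send(1000, last_byte_sent=5000, last_byte_acked=4000, window=window))
print(can_send(1000, last_byte_sent=5100, last_byte_acked=4000, window=window))
```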
Recap: TCP socket interaction
(Server running on hostid, and Client)
1. Server: create socket, port=x, for incoming request: welcomeSocket = ServerSocket()
2. Server: wait for incoming connection request: connectionSocket = welcomeSocket.accept()
3. Client: create socket, connect to hostid, port=x: clientSocket = Socket() (TCP connection setup)
4. Client: send request using clientSocket
5. Server: read request from connectionSocket
6. Server: write reply to connectionSocket
7. Client: read reply from clientSocket
8. Server: close connectionSocket; Client: close clientSocket
TCP Connection Management
Recall: TCP sender and receiver establish a "connection" before exchanging data segments
• initialize TCP variables: seq. #s; buffers and flow control info (e.g. RcvWindow)
• client: connection initiator
Socket clientSocket = new Socket("hostname", portNumber);
• server: contacted by client
Socket connectionSocket = welcomeSocket.accept();
(Figure: TCP segment structure, with the Options field used to negotiate MSS)
TCP Connection Management - connecting
Three-way handshake:
• Step 1: the client host sends a TCP SYN segment (SYN bit = 1) to the server; it specifies the initial seq # (client_isn) and carries no data
• Step 2: the server host receives the SYN and replies with a SYNACK segment; the server allocates buffers and specifies its initial seq # (server_isn), with ACK # = client_isn+1
• Step 3: the client receives the SYNACK and replies with ACK # = server_isn+1; this segment may contain data
Segments exchanged (client, server):
• conn request: SYN=1, seq=client_isn
• conn granted: SYN=1, seq=server_isn, ack=client_isn+1
• ACK: SYN=0, seq=client_isn+1, ack=server_isn+1
TCP Connection Setup Example
09:23:33.042318 IP 128.2.222.198.3123 > 192.216.219.96.80: S 4019802004:4019802004(0) win 65535 <mss 1260,nop,nop,sackOK>
09:23:33.118329 IP 192.216.219.96.80 > 128.2.222.198.3123: S 3428951569:3428951569(0) ack 4019802005 win 5840 <mss 1460,nop,nop,sackOK>
09:23:33.118405 IP 128.2.222.198.3123 > 192.216.219.96.80: . ack 3428951570 win 65535
• Client SYN
• SeqC: Seq. #4019802004, window 65535, max. seg. 1260
• Server SYN-ACK
• Receive: #4019802005 (= SeqC+1)
• SeqS: Seq. #3428951569, window 5840, max. seg. 1460
• Client ACK
• Receive: #3428951570 (= SeqS+1)
• sackOK: selective acknowledgement permitted
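The seq and ack fields of the three handshake segments follow mechanically from the two initial sequence numbers. A small sketch, using the ISNs from the trace above; the function three_way_handshake and its dictionary layout are hypothetical.

```python
def three_way_handshake(client_isn, server_isn):
    """Return the SYN, SYNACK and ACK segments as simple dictionaries."""
    syn = {"SYN": 1, "seq": client_isn}
    synack = {"SYN": 1, "seq": server_isn, "ack": client_isn + 1}
    ack = {"SYN": 0, "seq": client_isn + 1, "ack": server_isn + 1}
    return [syn, synack, ack]


# Initial sequence numbers taken from the connection setup trace.
syn, synack, ack = three_way_handshake(4019802004, 3428951569)
print(synack["ack"], ack["ack"])
```

The SYN consumes one sequence number even though it carries no data, which is why each side acknowledges ISN+1.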
TCP Connection Management - disconnecting
Closing a connection: the client closes its socket: clientSocket.close();
• Step 1: the client end system sends a TCP FIN control segment (FIN bit = 1) to the server
• Step 2: the server receives the FIN and replies with an ACK; it then closes the connection and sends its own FIN (FIN bit = 1)
(Timing diagram: client sends FIN, server sends ACK, server sends FIN, client sends ACK, client enters timed wait, then closed)
TCP Connection Management (cont.)
• Step 3: the client receives the FIN and replies with an ACK. It enters "timed wait", during which it responds with an ACK to any FIN it receives; a typical wait is 30 seconds. After the wait, all resources and ports are released.
• Step 4: the server receives the ACK. Connection closed.
(Timing diagram: client and server both closing; FIN, ACK, FIN, ACK; client in timed wait; both sides closed)
TCP Conn. Teardown Example
09:54:17.585396 IP 128.2.222.198.4474 > 128.2.210.194.6616: F 1489294581:1489294581(0) ack 1909787689 win 65434
09:54:17.585732 IP 128.2.210.194.6616 > 128.2.222.198.4474: F 1909787689:1909787689(0) ack 1489294582 win 5840
09:54:17.585764 IP 128.2.222.198.4474 > 128.2.210.194.6616: . ack 1909787690 win 65434
• Session
• Echo client on 128.2.222.198, server on 128.2.210.194
• Client FIN
• SeqC: 1489294581
• Server ACK + FIN
• Ack: 1489294582 (= SeqC+1)
• SeqS: 1909787689
• Client ACK
• Ack: 1909787690 (= SeqS+1)
TCP Connection Management (cont)
(Figures: TCP server lifecycle; TCP client lifecycle)
Queue Management
• Two queues for each listening socket
Concurrent Server

    pid_t pid;
    int listenfd, connfd;

    listenfd = Socket( ... );
    /* fill in sockaddr_in{} with server's well-known port */
    Bind(listenfd, ... );
    Listen(listenfd, LISTENQ);

    for ( ; ; ) {
        connfd = Accept(listenfd, ... );   /* probably blocks */
        if ( (pid = Fork()) == 0) {
            Close(listenfd);               /* child closes listening socket */
            doit(connfd);                  /* process the request */
            Close(connfd);                 /* done with this client */
            exit(0);                       /* child terminates */
        }
        Close(connfd);                     /* parent closes connected socket */
    }
Concurrent Server (Cont')
(Four figures)
(a) Status before the call to accept returns
(b) Status after return from accept
(c) Status after return of spawning a process
(d) Status after parent/child close appropriate sockets
TCP Summary
• TCP properties: point-to-point, connection-oriented, full-duplex, reliable
• TCP segment structure
• How TCP sequence and acknowledgement #s are assigned
• How TCP derives the timeout value needed for retransmissions from EstimatedRTT and DevRTT
• TCP retransmission scenarios, ACK generation and fast retransmit
• How TCP flow control works
• TCP connection management: 3 segments exchanged to connect and 4 segments exchanged to disconnect