Chapter 3: Transport Layer
Part 3: TCP
Computer Networking: A Top Down Approach, 5th edition. Jim Kurose, Keith Ross. Addison-Wesley, April 2009.
Chapter 3: Transport Layer
Our goals:
• understand principles behind transport layer services:
  • multiplexing/demultiplexing
  • reliable data transfer
  • flow control
  • congestion control
• learn about transport layer protocols in the Internet:
  • UDP: connectionless transport
  • TCP: connection-oriented transport
  • TCP congestion control
Chapter 3 outline
3.1 Transport-layer services
3.2 Multiplexing and demultiplexing
3.3 Connectionless transport: UDP
3.4 Principles of reliable data transfer
3.5 Connection-oriented transport: TCP
  • TCP connection
  • segment structure
  • reliable data transfer
  • flow control
  • connection management
3.6 Principles of congestion control
3.7 TCP congestion control
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
• reliable, in-order byte stream: no “message boundaries”
• pipelined: TCP congestion and flow control set window size
• send & receive buffers
• point-to-point: one sender, one receiver
  • no multicast (at this layer)
  • intermediate nodes (routers, etc.) do not know anything about the protocol
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
• connection-oriented: handshaking (exchange of control msgs) initializes sender and receiver state before data exchange
• flow controlled: sender will not overwhelm receiver
• full duplex data: bi-directional data flow in the same connection
• MSS: maximum segment size
TCP connection
• TCP in the client connects to TCP in the server via a 3-way handshake:
  • the client TCP sends a special TCP segment
  • the server responds with a second special TCP segment
  • the client responds with a third special TCP segment
• The first two carry no payload (i.e., no application data)
• The third may carry a payload
Opening a connection
• One side (the client) does an active open to the other side (the server):
    Socket connectToServer = new Socket("computer1.ithaca.edu", 8000);
• The server must previously have done a passive open:
    ServerSocket myServer = new ServerSocket(port);
    Socket clientConnection = myServer.accept();
• The active open then starts the 3-way handshake
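A minimal sketch of the two opens, assuming a Java 8+ runtime. It uses the same java.net calls as the slide (ServerSocket, accept(), Socket), but binds to the loopback address with port 0 so the operating system picks a free port, instead of connecting to computer1.ithaca.edu:8000. The class OpenDemo and its method echoOnce are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class OpenDemo {
    /** Passive open on a loopback port, then an active open to it; echoes one line back. */
    static String echoOnce(String line) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Passive open: port 0 lets the OS choose a free port (the slide uses 8000).
        try (ServerSocket myServer = new ServerSocket(0, 1, loopback)) {
            Thread server = new Thread(() -> {
                // accept() hands back a connection whose 3-way handshake has completed.
                try (Socket clientConnection = myServer.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(clientConnection.getInputStream()));
                     PrintWriter out = new PrintWriter(clientConnection.getOutputStream(), true)) {
                    out.println(in.readLine()); // echo the received line
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            server.start();
            // Active open: the constructor returns once the handshake with the server is done.
            try (Socket connectToServer = new Socket(loopback, myServer.getLocalPort());
                 PrintWriter out = new PrintWriter(connectToServer.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(connectToServer.getInputStream()))) {
                out.println(line);
                String reply = in.readLine();
                server.join();
                return reply;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(echoOnce("hello")); // prints hello
    }
}
```

The server thread is started before the client connects only for readability; because the listening socket already exists, the kernel can complete the handshake even before accept() is called.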
3-way handshake
• Goal: agree on a set of parameters:
  • starting sequence numbers
  • MSS
  • etc.
• The client sends a segment to the server with its initial sequence number (Flags=SYN, SequenceNum=x)
3-way handshake
• The server responds with a single segment that both acknowledges the client’s sequence number (Flags=ACK, Ack = x + 1)
• and states its own initial sequence number (Flags=SYN, SequenceNum=y)
• i.e., both the SYN and ACK bits are set in the Flags field
3-way handshake
• The client then sends a third segment to acknowledge the server’s sequence number (Flags=ACK, Ack = y + 1)
• Reason for ACKing the sequence number + 1: the ACKed number is the next sequence number expected, so it implicitly acknowledges all earlier sequence numbers
• A timer is set while each side waits for the ACK of its SYN; if it expires, the SYN is resent
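3-way handshake
• Why are sequence numbers exchanged?
• To protect against two incarnations of the same connection reusing the same sequence numbers too soon
• So each side selects its own initial sequence number rather than a fixed value such as 0, and that number should be hard to predict (modern stacks generate it pseudorandomly; see RFC 6528)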
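The sequence and acknowledgement arithmetic of the three segments can be sketched as follows. The class and field names are hypothetical, sequence numbers are held in a long and reduced modulo 2^32, and SecureRandom stands in for a real stack's ISN generator. The third segment carries SequenceNum = x + 1 because the SYN itself consumes one sequence number.

```java
import java.security.SecureRandom;

final class Handshake {
    static final long MOD = 1L << 32; // sequence numbers are 32 bits and wrap around

    /** Only the handshake-relevant parts of a segment. */
    static final class Segment {
        final boolean syn, ackFlag;
        final long seq, ackNum;
        Segment(boolean syn, boolean ackFlag, long seq, long ackNum) {
            this.syn = syn; this.ackFlag = ackFlag; this.seq = seq; this.ackNum = ackNum;
        }
        @Override public String toString() {
            return (syn ? "SYN " : "") + (ackFlag ? "ACK " : "") + "SequenceNum=" + seq
                    + (ackFlag ? " Ack=" + ackNum : "");
        }
    }

    /** The three segments for client ISN x and server ISN y. */
    static Segment[] exchange(long x, long y) {
        Segment syn = new Segment(true, false, x, 0);                          // Flags=SYN, SequenceNum=x
        Segment synAck = new Segment(true, true, y, (x + 1) % MOD);            // SYN+ACK, Ack=x+1
        Segment ack = new Segment(false, true, (x + 1) % MOD, (y + 1) % MOD);  // Flags=ACK, Ack=y+1
        return new Segment[] {syn, synAck, ack};
    }

    /** Unpredictable 32-bit ISN; a stand-in for a real stack's generator. */
    static long randomIsn(SecureRandom rng) {
        return rng.nextInt() & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        SecureRandom rng = new SecureRandom();
        for (Segment s : exchange(randomIsn(rng), randomIsn(rng))) System.out.println(s);
    }
}
```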
FSM for opening/closing a connection
[Figure: TCP state-transition diagram for opening and closing a connection]
• The server side starts at the passive-open path; the client side starts where an active open sends a SYN
• When the server receives the SYN, it sends the SYN+ACK
• The diagram can be confusing because the server can also actively try to connect to the client (send a SYN)
3-way handshake
• If the client’s ACK to the server (the 3rd leg) is lost, the connection still functions correctly:
  • the client is already in the ESTABLISHED state
  • the application can send data
  • each segment sent will have the ACK flag set and the correct value in the Acknowledgement field
  • the server moves to the ESTABLISHED state when it receives the first such segment
3-way handshake
• The server can send a SYN to a client, i.e., the server can actively make a connection!
  • no application process actually uses this
• The time-out arcs are not shown in the diagram:
  • if the expected response does not arrive, resend
  • after several tries, give up and return to the CLOSED state
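A sketch of the client-side opening path, including the time-out arcs the diagram leaves out. The retry limit maxTries and the state and event names are assumptions for illustration; real stacks also back off between SYN retransmissions, which is not modeled here.

```java
final class ActiveOpen {
    enum State { CLOSED, SYN_SENT, ESTABLISHED }
    enum Event { SYN_ACK_RECEIVED, TIMEOUT }

    private final int maxTries;          // hypothetical limit on SYN transmissions
    private State state = State.CLOSED;
    private int synsSent = 0;

    ActiveOpen(int maxTries) { this.maxTries = maxTries; }

    State state() { return state; }
    int synsSent() { return synsSent; }

    /** Active open: send a SYN and wait for the SYN+ACK. */
    void connect() {
        if (state != State.CLOSED) throw new IllegalStateException("already " + state);
        synsSent++;                      // send SYN
        state = State.SYN_SENT;
    }

    void handle(Event e) {
        if (state != State.SYN_SENT) return; // only the opening path is modeled
        switch (e) {
            case SYN_ACK_RECEIVED:
                state = State.ESTABLISHED;   // the third-leg ACK would be sent here
                break;
            case TIMEOUT:
                if (synsSent < maxTries) {
                    synsSent++;              // expected response did not arrive: resend SYN
                } else {
                    state = State.CLOSED;    // after several tries, give up
                }
                break;
        }
    }

    public static void main(String[] args) {
        ActiveOpen c = new ActiveOpen(3);
        c.connect();
        c.handle(Event.TIMEOUT);
        c.handle(Event.SYN_ACK_RECEIVED);
        System.out.println(c.state() + " after " + c.synsSent() + " SYNs"); // ESTABLISHED after 2 SYNs
    }
}
```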
Closing the connection
Closing a connection
• Each side must independently close its half of the connection
• If only one side closes, it can no longer send but can still receive
• So there are three possible combinations:
  • this side closes first
  • the other side closes first
  • both sides close at the same time
• There is actually a fourth possibility: FIN_WAIT_1 straight to TIME_WAIT, when the other side’s ACK and FIN arrive in the same segment
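The closing combinations can be written as a transition function. This is a sketch using the standard TCP state names; the event names (APP_CLOSE, ACK_OF_FIN, FIN, FIN_AND_ACK, TIMEOUT_2MSL) are hypothetical, and sending the FINs and ACKs themselves is left implicit.

```java
final class CloseFsm {
    enum State { ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSING, TIME_WAIT, CLOSE_WAIT, LAST_ACK, CLOSED }
    enum Event { APP_CLOSE, ACK_OF_FIN, FIN, FIN_AND_ACK, TIMEOUT_2MSL }

    /** Next state after an event; events not modeled here leave the state unchanged. */
    static State next(State s, Event e) {
        switch (s) {
            case ESTABLISHED:
                if (e == Event.APP_CLOSE) return State.FIN_WAIT_1;  // this side closes first: send FIN
                if (e == Event.FIN) return State.CLOSE_WAIT;        // other side closes first: ACK its FIN
                break;
            case FIN_WAIT_1:
                if (e == Event.ACK_OF_FIN) return State.FIN_WAIT_2;
                if (e == Event.FIN) return State.CLOSING;           // both sides close at the same time
                if (e == Event.FIN_AND_ACK) return State.TIME_WAIT; // the fourth possibility
                break;
            case FIN_WAIT_2:
                if (e == Event.FIN) return State.TIME_WAIT;
                break;
            case CLOSING:
                if (e == Event.ACK_OF_FIN) return State.TIME_WAIT;
                break;
            case CLOSE_WAIT:
                if (e == Event.APP_CLOSE) return State.LAST_ACK;    // now close our half: send FIN
                break;
            case LAST_ACK:
                if (e == Event.ACK_OF_FIN) return State.CLOSED;
                break;
            case TIME_WAIT:
                if (e == Event.TIMEOUT_2MSL) return State.CLOSED;
                break;
            default:
                break;
        }
        return s;
    }

    static State run(State s, Event... events) {
        for (Event e : events) s = next(s, e);
        return s;
    }

    public static void main(String[] args) {
        System.out.println(run(State.ESTABLISHED, Event.APP_CLOSE, Event.ACK_OF_FIN, Event.FIN)); // TIME_WAIT
    }
}
```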
Closing a connection
• A connection in the TIME_WAIT state cannot move to the CLOSED state until it has waited for twice the maximum amount of time an IP datagram might live in the Internet (2 × MSL, the maximum segment lifetime; 120 seconds if MSL is taken as 60 seconds, although RFC 793 suggests 2 minutes and implementations vary)
• Reason: the local side may have sent an ACK in response to the other side’s FIN, but does not know whether it was received
• The other side might therefore retransmit its FIN segment
• If we closed too soon, a new incarnation of the same connection might be opened and then receive that stale FIN segment!
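A sketch of the TIME_WAIT rule, assuming the caller supplies both the clock reading and the MSL value; the class TimeWait is a hypothetical name. Restarting the 2 × MSL wait when a retransmitted FIN arrives follows RFC 793.

```java
import java.time.Duration;
import java.time.Instant;

final class TimeWait {
    private final Duration msl;   // maximum segment lifetime (an input, not a fixed constant)
    private Instant deadline;     // earliest moment TIME_WAIT may move to CLOSED
    private int acksResent = 0;

    TimeWait(Duration msl, Instant now) {
        this.msl = msl;
        this.deadline = now.plus(msl.multipliedBy(2)); // wait 2 * MSL
    }

    /** A retransmitted FIN means our ACK was lost: ACK again and restart the 2 * MSL wait. */
    void onRetransmittedFin(Instant now) {
        acksResent++;
        deadline = now.plus(msl.multipliedBy(2));
    }

    boolean mayClose(Instant now) { return !now.isBefore(deadline); }
    int acksResent() { return acksResent; }

    public static void main(String[] args) {
        Instant t0 = Instant.EPOCH;
        TimeWait tw = new TimeWait(Duration.ofSeconds(60), t0);
        System.out.println(tw.mayClose(t0.plusSeconds(119))); // false
        System.out.println(tw.mayClose(t0.plusSeconds(120))); // true
    }
}
```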
TCP connection: data processing
• The client sends a stream of data through the socket to TCP
• TCP puts the data into a send buffer
• From time to time TCP grabs some data from the buffer and passes it to the network layer
• The network layer encloses it in an IP datagram and sends it over the network
• The network layer at the server extracts the TCP segment from the IP datagram and passes the segment to the server’s TCP layer
• The TCP layer extracts the data and places it in a receive buffer
• The server process reads data from the receive buffer
TCP connection: data processing
• The maximum amount of data TCP can grab for one segment is limited by the maximum segment size (MSS)
• MSS is set by first determining the length of the largest link-layer frame (the maximum transmission unit, MTU)
• Then MSS is chosen so that a TCP segment, once encapsulated in an IP datagram with 40 bytes of TCP/IP headers, fits into the MTU
• The Ethernet and PPP link-layer protocols both have an MTU of 1,500 bytes, which gives an MSS of 1,460 bytes
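The MSS arithmetic as a small sketch, assuming IPv4 and TCP headers of 20 bytes each with no options (the 40 bytes above). MssCalc is a hypothetical name.

```java
final class MssCalc {
    static final int IP_HEADER = 20;  // IPv4 header without options
    static final int TCP_HEADER = 20; // TCP header without options

    /** Largest payload that still fits in one link-layer frame after 40 bytes of TCP/IP headers. */
    static int mssForMtu(int mtu) {
        int mss = mtu - IP_HEADER - TCP_HEADER;
        if (mss <= 0) throw new IllegalArgumentException("MTU too small: " + mtu);
        return mss;
    }

    public static void main(String[] args) {
        System.out.println(mssForMtu(1500)); // 1460 for a 1,500-byte MTU
    }
}
```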
TCP data processing
• TCP is byte-oriented:
  • the sender writes bytes into a TCP connection
  • the receiver reads bytes out of a TCP connection
• TCP does not transmit individual bytes over the Internet:
  • the source host buffers enough bytes to fill a reasonably sized packet
  • the destination TCP empties the contents into a buffer; the application reads from the buffer at its leisure
TCP data processing
[Figure: sending and receiving TCP buffers exchanging segments]
• Called “segments” because each one contains a segment of the byte stream
• The diagram shows only one direction; in reality connections are bi-directional
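A sketch of how buffered bytes could be cut into segments, each labeled with the byte-stream number of its first byte. Segmenter and Chunk are hypothetical names; a real TCP also takes the flow and congestion windows into account, which this sketch ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class Segmenter {
    /** A piece of the byte stream plus the stream number of its first byte. */
    static final class Chunk {
        final long seq;
        final byte[] data;
        Chunk(long seq, byte[] data) { this.seq = seq; this.data = data; }
    }

    /** Cuts the send buffer into payloads of at most mss bytes, numbering each by its first byte. */
    static List<Chunk> segment(byte[] sendBuffer, long firstSeq, int mss) {
        if (mss <= 0) throw new IllegalArgumentException("mss must be positive");
        List<Chunk> out = new ArrayList<>();
        for (int off = 0; off < sendBuffer.length; off += mss) {
            int end = Math.min(off + mss, sendBuffer.length);
            out.add(new Chunk(firstSeq + off, Arrays.copyOfRange(sendBuffer, off, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Chunk c : segment(new byte[3000], 0, 1460)) {
            System.out.println("seq=" + c.seq + " len=" + c.data.length);
        }
        // seq=0 len=1460, seq=1460 len=1460, seq=2920 len=80
    }
}
```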
TCP segment structure (32 bits wide)
  source port #  | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | receive window
  checksum       | Urg data pointer
  options (variable length)
  application data (variable length)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection establishment (setup, teardown commands)
• Sequence and acknowledgement numbers count bytes of data (not segments!)
• Receive window: # bytes receiver is willing to accept
• Checksum: Internet checksum (as in UDP)
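A sketch of reading the fields in this layout out of raw bytes with java.nio.ByteBuffer, which reads in big-endian (network) order by default. Only the fixed 20-byte part is parsed and options are skipped; the class TcpHeader and its sample() helper are hypothetical.

```java
import java.nio.ByteBuffer;

final class TcpHeader {
    final int srcPort, dstPort, headerLenBytes, window, checksum, urgentPtr;
    final long seq, ack;
    final boolean urg, ackValid, psh, rst, syn, fin;

    /** Parses the fixed 20-byte part of a TCP header. */
    TcpHeader(byte[] raw) {
        ByteBuffer b = ByteBuffer.wrap(raw);
        srcPort = b.getShort() & 0xFFFF;     // Java has no unsigned short: mask to 0..65535
        dstPort = b.getShort() & 0xFFFF;
        seq = b.getInt() & 0xFFFFFFFFL;      // number of the first data byte
        ack = b.getInt() & 0xFFFFFFFFL;      // next byte expected (meaningful if ACK is set)
        int word = b.getShort() & 0xFFFF;    // head len | not used | U A P R S F
        headerLenBytes = (word >>> 12) * 4;  // head len counts 32-bit words
        urg = (word & 0x20) != 0;
        ackValid = (word & 0x10) != 0;
        psh = (word & 0x08) != 0;
        rst = (word & 0x04) != 0;
        syn = (word & 0x02) != 0;
        fin = (word & 0x01) != 0;
        window = b.getShort() & 0xFFFF;      // receive window
        checksum = b.getShort() & 0xFFFF;
        urgentPtr = b.getShort() & 0xFFFF;
    }

    /** Builds a sample header: ports 1234 -> 23, Seq=42, ACK=79, ACK and PSH flags. */
    static byte[] sample() {
        ByteBuffer b = ByteBuffer.allocate(20);
        b.putShort((short) 1234).putShort((short) 23)
         .putInt(42).putInt(79)
         .putShort((short) ((5 << 12) | 0x18))     // 5 words = 20 bytes; ACK (0x10) + PSH (0x08)
         .putShort((short) 65535)                  // receive window
         .putShort((short) 0).putShort((short) 0); // checksum, urgent pointer
        return b.array();
    }

    public static void main(String[] args) {
        TcpHeader h = new TcpHeader(sample());
        System.out.println(h.srcPort + "->" + h.dstPort + " seq=" + h.seq + " ack=" + h.ack
                + " hlen=" + h.headerLenBytes + " ACK=" + h.ackValid + " SYN=" + h.syn);
    }
}
```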
TCP segment structure (same header layout)
• A connection is identified by the 4-tuple <SrcPort, SrcIPAddr, DstPort, DstIPAddr>
• A connection can be closed and later reopened; each instance is called an incarnation
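Demultiplexing by the 4-tuple can be sketched as a hash-map lookup. The FourTuple class, the string-typed addresses, and the socket names are assumptions made to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class Demux {
    /** <SrcPort, SrcIPAddr, DstPort, DstIPAddr> as carried by an arriving segment. */
    static final class FourTuple {
        final int srcPort, dstPort;
        final String srcIp, dstIp; // plain strings keep the sketch short
        FourTuple(int srcPort, String srcIp, int dstPort, String dstIp) {
            this.srcPort = srcPort; this.srcIp = srcIp; this.dstPort = dstPort; this.dstIp = dstIp;
        }
        @Override public boolean equals(Object o) {
            if (!(o instanceof FourTuple)) return false;
            FourTuple t = (FourTuple) o;
            return srcPort == t.srcPort && dstPort == t.dstPort
                    && srcIp.equals(t.srcIp) && dstIp.equals(t.dstIp);
        }
        @Override public int hashCode() { return Objects.hash(srcPort, srcIp, dstPort, dstIp); }
    }

    // Hypothetical connection table: 4-tuple -> name of the socket that owns the connection.
    private final Map<FourTuple, String> table = new HashMap<>();

    void register(FourTuple t, String socketName) { table.put(t, socketName); }

    /** Returns the socket for an arriving segment, or null if no connection matches. */
    String lookup(FourTuple t) { return table.get(t); }

    public static void main(String[] args) {
        Demux d = new Demux();
        // Two connections from the same client host to the same server port differ only in source port.
        d.register(new FourTuple(5000, "10.0.0.2", 80, "10.0.0.1"), "connA");
        d.register(new FourTuple(5001, "10.0.0.2", 80, "10.0.0.1"), "connB");
        System.out.println(d.lookup(new FourTuple(5001, "10.0.0.2", 80, "10.0.0.1"))); // connB
    }
}
```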
TCP segment structure (same header layout)
• Sequence numbers refer to the first byte of data in the segment
• ACK is the number of the next byte expected
• Both are used in the sliding window algorithm
• Flag bits: SYN, FIN, RESET, PUSH, URG, and ACK
• The checksum field is computed as in UDP: over the TCP header, the TCP data, and a pseudoheader (source and destination IP addresses, protocol number, and TCP segment length, taken from or derived from the IP header)
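A sketch of the checksum computation for IPv4, assuming the checksum field inside `segment` is zero when the checksum is first computed. The pseudoheader layout (source address, destination address, a zero byte, protocol 6, TCP length) follows RFC 793; TcpChecksum is a hypothetical name. A receiver that runs the same computation over a segment with a correct checksum already in place gets 0.

```java
final class TcpChecksum {
    /** Adds 16-bit big-endian words to acc; an odd trailing byte is padded with a zero byte. */
    private static long sum16(byte[] data, long acc) {
        for (int i = 0; i < data.length; i += 2) {
            int hi = data[i] & 0xFF;
            int lo = (i + 1 < data.length) ? data[i + 1] & 0xFF : 0;
            acc += (hi << 8) | lo;
        }
        return acc;
    }

    /** One's-complement checksum over the IPv4 pseudoheader plus the TCP header and data. */
    static int checksum(byte[] srcIp, byte[] dstIp, byte[] segment) {
        int len = segment.length;
        byte[] pseudo = {
            srcIp[0], srcIp[1], srcIp[2], srcIp[3],
            dstIp[0], dstIp[1], dstIp[2], dstIp[3],
            0, 6, (byte) (len >>> 8), (byte) len   // zero, protocol (TCP = 6), TCP length
        };
        long acc = sum16(segment, sum16(pseudo, 0));
        while ((acc >>> 16) != 0) acc = (acc & 0xFFFF) + (acc >>> 16); // fold carries back in
        return (int) (~acc & 0xFFFF);
    }

    public static void main(String[] args) {
        byte[] src = {10, 0, 0, 1};
        byte[] dst = {10, 0, 0, 2};
        byte[] seg = new byte[21];            // 20-byte header + 1 data byte
        seg[12] = 0x50;                       // head len = 5 words
        seg[20] = 'C';
        int c = TcpChecksum.checksum(src, dst, seg);
        seg[16] = (byte) (c >>> 8);           // checksum field lives at offset 16
        seg[17] = (byte) c;
        System.out.println(checksum(src, dst, seg)); // 0: verification succeeds
    }
}
```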
TCP seq. #’s and ACKs
• Seq. #’s: byte-stream “number” of the first byte in the segment’s data
• ACKs: seq # of the next byte expected from the other side; cumulative ACK
• Q: how does the receiver handle out-of-order segments? A: the TCP spec doesn’t say; it is up to the implementer
[Figure: simple telnet scenario between Host A and Host B over time]
  User types ‘C’: A → B: Seq=42, ACK=79, data = ‘C’
  B ACKs receipt of ‘C’ and echoes back ‘C’: B → A: Seq=79, ACK=43, data = ‘C’
  A ACKs receipt of the echoed ‘C’: A → B: Seq=43, ACK=80
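The sequence and acknowledgement numbers in the telnet scenario can be replayed with two counters per host. This is a sketch with hypothetical names that assumes in-order delivery and no loss.

```java
final class EchoScenario {
    /** Per-host view: next sequence number to send and next byte expected from the peer. */
    static final class Host {
        private long nextSeq, expected;
        Host(long nextSeq, long expected) { this.nextSeq = nextSeq; this.expected = expected; }

        /** Describes a segment carrying `bytes` data bytes, then advances nextSeq past them. */
        String send(int bytes) {
            String header = "Seq=" + nextSeq + ", ACK=" + expected;
            nextSeq += bytes;
            return header;
        }

        /** In-order data moves the cumulative ACK point forward by its length. */
        void receive(int bytes) { expected += bytes; }
    }

    /** Replays the exchange: A types 'C', B echoes it, A ACKs the echo. */
    static String[] run() {
        Host a = new Host(42, 79), b = new Host(79, 42);
        String s1 = a.send(1); b.receive(1); // A -> B, data = 'C'
        String s2 = b.send(1); a.receive(1); // B -> A, echoed 'C'
        String s3 = a.send(0);               // A -> B, ACK only
        return new String[] {s1, s2, s3};
    }

    public static void main(String[] args) {
        for (String s : run()) System.out.println(s);
        // Seq=42, ACK=79 / Seq=79, ACK=43 / Seq=43, ACK=80
    }
}
```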
TCP Round Trip Time and Timeout
• Q: how to set the TCP timeout value?
  • longer than RTT, but RTT varies
  • too short: premature timeout, unnecessary retransmissions
  • too long: slow reaction to segment loss
• Q: how to estimate RTT?
  • SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions
  • SampleRTT will vary, so we want the estimated RTT to be “smoother”: average several recent measurements, not just the current SampleRTT
Timeouts
• TCP uses a timeout/retransmit mechanism like GBN or SR
• How long should the timeout be?
  • larger than RTT
  • how much larger?
  • how do we estimate RTT in the first place?
• Do we associate a timer with each segment?
Estimating round-trip time
• Definition: SampleRTT for a segment is the amount of time between when the segment is “sent” (passed to IP) and when an acknowledgement for the segment is received
Estimating round-trip time
• Approach:
  • TCP takes one SampleRTT measurement at a time, not one for each segment sent
  • i.e., at any moment SampleRTT is being measured for only one of the transmitted but currently unACKed segments
  • TCP never calculates SampleRTT for a retransmitted segment
Estimating round-trip time
• Approach:
  • SampleRTT varies (depends on router congestion, load on end systems, etc.)
  • need some sort of average: EstimatedRTT
  • when TCP gets a new SampleRTT, it updates EstimatedRTT as a combination of the past value (first term) and the most recent sample (second term):
    $\mathrm{EstimatedRTT} = (1-\alpha)\,\mathrm{EstimatedRTT} + \alpha\,\mathrm{SampleRTT}$
Estimating round-trip time
• Approach:
  • EstimatedRTT is a combination of past estimates and the newest sample:
    $\mathrm{EstimatedRTT} = (1-\alpha)\,\mathrm{EstimatedRTT} + \alpha\,\mathrm{SampleRTT}$
  • the recommended value is $\alpha = 0.125$, so the formula becomes:
    $\mathrm{EstimatedRTT} = 0.875\,\mathrm{EstimatedRTT} + 0.125\,\mathrm{SampleRTT}$
Estimating round-trip time
• With the recommended $\alpha = 0.125$:
    $\mathrm{EstimatedRTT} = 0.875\,\mathrm{EstimatedRTT} + 0.125\,\mathrm{SampleRTT}$
• This places more weight on recent samples than on old samples. How? Expand the formula a few terms.
• Done because recent samples better reflect current network congestion
• Called an exponentially weighted moving average (EWMA):
  • the weight of a given SampleRTT decays exponentially fast
  • so the influence of a past sample decreases exponentially fast
• See next slide: variations in SampleRTT are smoothed out
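Expanding the recurrence a few terms makes the exponential decay explicit. Here $E_n$ denotes EstimatedRTT after the $n$-th sample and $S_n$ that SampleRTT:

```latex
\begin{aligned}
E_n &= (1-\alpha)\,E_{n-1} + \alpha\,S_n \\
    &= \alpha\,S_n + \alpha(1-\alpha)\,S_{n-1} + (1-\alpha)^2\,E_{n-2} \\
    &= \alpha \sum_{k=0}^{n-1} (1-\alpha)^k\,S_{n-k} + (1-\alpha)^n\,E_0
\end{aligned}
```

With $\alpha = 0.125$ the newest sample gets weight 0.125, the one before it 0.109375, then 0.095703125, each weight shrinking by a factor of 0.875 per step back in time.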
Example RTT estimation
[Figure: SampleRTT and EstimatedRTT plotted over time]
Variability of RTT
• DevRTT: an estimate of how much SampleRTT typically deviates from EstimatedRTT (RFC 6298 defines the same estimator under the name RTTVAR):
    $\mathrm{DevRTT} = (1-\beta)\,\mathrm{DevRTT} + \beta\,\lvert \mathrm{SampleRTT} - \mathrm{EstimatedRTT} \rvert$, typically $\beta = 0.25$
• If DevRTT is small, there is little fluctuation in SampleRTT
TCP Round Trip Time and Timeout
• Setting the timeout value for TCP: EstimatedRTT plus a “safety margin”
  • the interval must be ≥ EstimatedRTT, otherwise there will be unnecessary retransmissions
  • the interval must not be too large, otherwise retransmission of lost packets is unnecessarily delayed
  • large variation in SampleRTT → larger safety margin
    $\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\,\mathrm{DevRTT}$
TCP Round Trip Time and Timeout
• Setting the timeout value for TCP:
  • an initial TimeoutInterval value of 1 second is recommended [RFC 6298]
  • when a timeout occurs, double the value of TimeoutInterval (to avoid a premature timeout of the next segment)
  • when an ACK is received (or new application data restarts the timer), go back to the formula:
    $\mathrm{TimeoutInterval} = \mathrm{EstimatedRTT} + 4\,\mathrm{DevRTT}$
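A sketch of the whole timeout computation, in milliseconds. It follows the slide formulas with α = 0.125 and β = 0.25, plus two details taken from RFC 6298: the first sample initializes DevRTT to half its value, and DevRTT is updated before EstimatedRTT. RFC 6298's lower bound on the timeout is omitted, the interval is recomputed only when a fresh sample arrives, and the class name RttEstimator is hypothetical.

```java
final class RttEstimator {
    static final double ALPHA = 0.125; // weight of the newest SampleRTT
    static final double BETA = 0.25;   // weight of the newest deviation

    private double estimatedRtt, devRtt;   // milliseconds
    private boolean haveSample = false;
    private double timeoutInterval = 1000; // 1 second before any sample (RFC 6298)

    /** Feeds one SampleRTT; callers must not pass samples from retransmitted segments. */
    void onSample(double sampleRtt) {
        if (!haveSample) {
            // First sample: start the mean at the sample and the deviation at half of it.
            estimatedRtt = sampleRtt;
            devRtt = sampleRtt / 2;
            haveSample = true;
        } else {
            // Deviation is updated first, against the previous EstimatedRTT.
            devRtt = (1 - BETA) * devRtt + BETA * Math.abs(sampleRtt - estimatedRtt);
            estimatedRtt = (1 - ALPHA) * estimatedRtt + ALPHA * sampleRtt;
        }
        timeoutInterval = estimatedRtt + 4 * devRtt; // back to the formula
    }

    /** Timer expired: double the interval; it stays doubled until a fresh sample arrives. */
    void onTimeout() { timeoutInterval *= 2; }

    double estimatedRtt() { return estimatedRtt; }
    double devRtt() { return devRtt; }
    double timeoutInterval() { return timeoutInterval; }

    public static void main(String[] args) {
        RttEstimator r = new RttEstimator();
        r.onSample(100);
        r.onSample(200);
        System.out.println(r.estimatedRtt() + " " + r.devRtt() + " " + r.timeoutInterval());
        // 112.5 62.5 362.5
    }
}
```

With samples of 100 ms and then 200 ms, EstimatedRTT moves only an eighth of the way toward the new sample (112.5 ms), while the jump inflates DevRTT and therefore the safety margin.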
Chapter 3 outline
3.1 Transport-layer services
3.2 Multiplexing and demultiplexing
3.3 Connectionless transport: UDP
3.4 Principles of reliable data transfer
3.5 Connection-oriented transport: TCP
  • segment structure
  • reliable data transfer
  • flow control
  • connection management
3.6 Principles of congestion control
3.7 TCP congestion control
TCP reliable data transfer
• TCP creates rdt service on top of IP’s unreliable service:
  • pipelined segments
  • cumulative ACKs
  • a single retransmission timer (reduces overhead)
• Retransmissions are triggered by:
  • timeout events
  • duplicate ACKs
• Initially consider a simplified TCP sender:
  • ignore duplicate ACKs
  • ignore flow control, congestion control
TCP sender (simplified)

NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number NextSeqNum
    if (timer currently not running)
      start timer
    pass segment to IP
    NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    retransmit not-yet-acknowledged segment with smallest sequence number
    start timer

  event: ACK received, with ACK field value of y
    if (y > SendBase) {
      SendBase = y
      if (there are currently not-yet-acknowledged segments)
        start timer
    }
} /* end of loop forever */

• Comment:
  • SendBase-1: last cumulatively ACKed byte
  • Example: SendBase-1 = 71; y = 73, so the receiver wants bytes 73 and beyond; since y > SendBase, new data is ACKed, and every byte through 72 is now considered ACKed
TCP sender events (3 events):
• data rcvd from app:
  • create segment with seq #
  • seq # is byte-stream number of first data byte in segment
  • start timer if not already running (think of timer as for oldest unACKed segment)
  • expiration interval: TimeoutInterval, computed from EstimatedRTT and DevRTT
• timeout:
  • retransmit segment that caused timeout
  • restart timer
• ACK rcvd:
  • if it acknowledges previously unACKed segments, update what is known to be ACKed
  • start timer if there are outstanding segments
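The three events can be sketched in Java. Assumptions: the class SimpleTcpSender is hypothetical, a TreeMap keyed by sequence number stands in for the buffer of unACKed segments, the timer is reduced to a boolean, and 32-bit wraparound is ignored, as in the pseudocode. The main method replays the premature-timeout scenario.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class SimpleTcpSender {
    private long nextSeqNum, sendBase;
    private boolean timerRunning = false;                          // single timer, for the oldest unACKed segment
    private final TreeMap<Long, byte[]> unacked = new TreeMap<>(); // seq # -> payload, ordered by seq #
    final List<Long> sentLog = new ArrayList<>();                  // seq # of every segment passed to IP

    SimpleTcpSender(long initialSeqNum) {
        nextSeqNum = initialSeqNum;
        sendBase = initialSeqNum;
    }

    /** Event: data received from application above. */
    void onAppData(byte[] data) {
        unacked.put(nextSeqNum, data); // create segment with sequence number NextSeqNum
        timerRunning = true;           // start timer if not already running
        sentLog.add(nextSeqNum);       // pass segment to IP
        nextSeqNum += data.length;
    }

    /** Event: timer timeout. Resend only the unACKed segment with the smallest seq #. */
    void onTimeout() {
        Map.Entry<Long, byte[]> oldest = unacked.firstEntry();
        if (oldest != null) sentLog.add(oldest.getKey());
        timerRunning = oldest != null; // start timer
    }

    /** Event: ACK received with ACK field value y (cumulative: every byte below y arrived). */
    void onAck(long y) {
        if (y > sendBase) {
            sendBase = y;
            unacked.entrySet().removeIf(e -> e.getKey() + e.getValue().length <= y);
            timerRunning = !unacked.isEmpty(); // keep timing only if segments are outstanding
        }
    }

    long sendBase() { return sendBase; }
    boolean timerRunning() { return timerRunning; }

    public static void main(String[] args) {
        SimpleTcpSender s = new SimpleTcpSender(92);
        s.onAppData(new byte[8]);  // Seq=92, 8 bytes
        s.onAppData(new byte[20]); // Seq=100, 20 bytes
        s.onTimeout();             // premature timeout: only Seq=92 is resent
        s.onAck(100);
        s.onAck(120);
        System.out.println(s.sentLog + " SendBase=" + s.sendBase()); // [92, 100, 92] SendBase=120
    }
}
```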
TCP: retransmission scenarios (lost ACK)
[Figure: Host A and Host B over time]
• Host A sends Seq=92, 8 bytes data; Host B replies ACK=100, but the ACK is lost (X)
• A’s timer expires and A resends Seq=92, 8 bytes data
• B discards the data it already received and sends ACK=100 again
• SendBase = 100
TCP: retransmission scenarios (premature timeout)
[Figure: Host A and Host B over time]
• Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120
• A’s Seq=92 timer expires before the ACKs arrive: A resends the first segment (Seq=92, 8 bytes data) and resets the timer; it does not resend the second segment unless it times out again
• SendBase moves to 100, then to 120; B answers the resent segment with ACK=120
TCP retransmission scenarios (more): cumulative ACK
[Figure: Host A and Host B over time]
• Host A sends Seq=92, 8 bytes data and Seq=100, 20 bytes data
• ACK=100 is lost (X), but ACK=120 arrives before the timeout
• Because ACKs are cumulative, the sender knows the first segment also arrived: SendBase = 120 and nothing is resent
Timeout interval revisited
• Modifications (not in all TCP implementations):
  • on timeout, retransmit the segment with the smallest unACKed sequence number and double the timeout interval
  • if a timeout occurs again, double the interval again, and so on
  • when an ACK is received, go back to the formula based on EstimatedRTT and DevRTT
• Why? A limited form of congestion control:
  • a timeout is probably due to congestion
  • TCP acts politely and sends fewer packets
TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver | TCP receiver action
Arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed | Delayed ACK: wait up to 500 ms for the next segment; if no next segment, send ACK
Arrival of in-order segment with expected seq #; one other segment has ACK pending | Immediately send a single cumulative ACK, ACKing both in-order segments
Arrival of out-of-order segment with higher-than-expected seq #; gap detected | Immediately send a duplicate ACK, indicating the seq # of the next expected byte
Arrival of segment that partially or completely fills the gap | Immediately send ACK, provided that the segment starts at the lower end of the gap
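A sketch of the receiver side of this table. AckPolicy and its Action values are hypothetical names; the 500 ms delayed-ACK timer is modeled only as a callback, and re-ACKing already-received data at once is an assumption, since the table does not list that case.

```java
import java.util.Map;
import java.util.TreeMap;

final class AckPolicy {
    enum Action { DELAY, ACK_NOW, DUP_ACK_NOW }

    private long expected;                                             // next in-order byte expected
    private boolean ackPending = false;                                // in-order data awaiting a delayed ACK
    private final TreeMap<Long, Integer> outOfOrder = new TreeMap<>(); // buffered seq # -> length

    AckPolicy(long expected) { this.expected = expected; }

    /** Value the receiver would place in the ACK field. */
    long ackNumber() { return expected; }

    /** Decides how to ACK a data segment covering [seq, seq + len). */
    Action onSegment(long seq, int len) {
        if (seq > expected) {        // out-of-order: gap detected, buffer it, duplicate ACK now
            outOfOrder.put(seq, len);
            ackPending = false;      // the duplicate ACK also covers any pending in-order data
            return Action.DUP_ACK_NOW;
        }
        if (seq < expected) {        // old data (not in the table): re-ACK at once, an assumption
            ackPending = false;
            return Action.ACK_NOW;
        }
        boolean fillsGap = !outOfOrder.isEmpty(); // segment starts at the lower end of the gap
        expected += len;
        // Absorb buffered segments that are now contiguous with the in-order data.
        Map.Entry<Long, Integer> next;
        while ((next = outOfOrder.firstEntry()) != null && next.getKey() <= expected) {
            outOfOrder.pollFirstEntry();
            expected = Math.max(expected, next.getKey() + next.getValue());
        }
        if (fillsGap || ackPending) { // fills gap, or one cumulative ACK for both in-order segments
            ackPending = false;
            return Action.ACK_NOW;
        }
        ackPending = true;            // delayed ACK: wait up to 500 ms for the next segment
        return Action.DELAY;
    }

    /** Delayed-ACK timer fired with no further segment: true if an ACK must be sent now. */
    boolean onDelayTimer() {
        boolean send = ackPending;
        ackPending = false;
        return send;
    }

    public static void main(String[] args) {
        AckPolicy r = new AckPolicy(0);
        System.out.println(r.onSegment(0, 100) + " " + r.ackNumber());   // DELAY 100
        System.out.println(r.onSegment(100, 100) + " " + r.ackNumber()); // ACK_NOW 200
        System.out.println(r.onSegment(300, 100) + " " + r.ackNumber()); // DUP_ACK_NOW 200
        System.out.println(r.onSegment(200, 100) + " " + r.ackNumber()); // ACK_NOW 400
    }
}
```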
Fast Retransmit
• The time-out period is often relatively long: long delay before resending a lost packet
• Detect lost segments via duplicate ACKs:
  • the sender often sends many segments back-to-back
  • if a segment is lost, there will likely be many duplicate ACKs for it (see previous slide)
• If the sender receives 3 duplicate ACKs for the same data (4 identical ACKs in all), it assumes that the segment after the ACKed data was lost:
  • fast retransmit: resend the segment before its timer expires
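[Figure: fast retransmit between Host A and Host B over time]
• Host A sends seq # x1 through x5; x2 is lost (X)
• Host B answers x1 with ACK x1 and, because x2 is missing, answers x3, x4, and x5 with ACK x1 as well
• On these triple duplicate ACKs, A resends x2 before its timeout expires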
Fast retransmit algorithm: replaces the “ACK received” event in the previous code

event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else {  /* a duplicate ACK for an already ACKed segment */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
      resend segment with sequence number y  /* fast retransmit */
    }
  }
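A sketch of the duplicate-ACK counting. FastRetransmitSender is a hypothetical name; it counts only ACKs equal to SendBase, ignores stale ACKs below it, and omits the timer handling of the simplified sender.

```java
import java.util.ArrayList;
import java.util.List;

final class FastRetransmitSender {
    private long sendBase;
    private int dupAcks = 0;                            // duplicate ACKs seen for the current SendBase
    final List<Long> retransmitted = new ArrayList<>(); // seq #s resent by fast retransmit

    FastRetransmitSender(long sendBase) { this.sendBase = sendBase; }

    /** ACK received with ACK field value y. */
    void onAck(long y) {
        if (y > sendBase) {           // new data ACKed: reset the duplicate count
            sendBase = y;
            dupAcks = 0;
        } else if (y == sendBase) {   // duplicate ACK for already ACKed data
            dupAcks++;
            if (dupAcks == 3) {
                retransmitted.add(y); // fast retransmit: resend segment starting at y
            }
        }
        // y < sendBase: stale ACK, ignored in this sketch
    }

    public static void main(String[] args) {
        FastRetransmitSender s = new FastRetransmitSender(0);
        s.onAck(100);                             // original ACK: bytes below 100 arrived
        for (int i = 0; i < 4; i++) s.onAck(100); // duplicates triggered by later segments
        System.out.println(s.retransmitted);      // [100]
    }
}
```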
Go-Back-N or Selective Repeat?
• Looks like GBN:
  • TCP acknowledgements are cumulative
  • correctly received but out-of-order segments are not individually ACKed
  • the TCP sender need only maintain the sequence number of the smallest unACKed byte and the sequence number of the next byte to send
• Looks like SR:
  • many TCP implementations buffer correctly received but out-of-order segments
  • suppose the sender sends N segments and all arrive, but the ACK for one segment n (n < N) is lost:
    • GBN would retransmit segment n and all later segments
    • TCP retransmits at most segment n, and would not retransmit even that if the ACK for segment n+1 arrived before the timeout!