Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
  • segment structure
  • reliable data transfer
  • flow control
  • connection management
3.6 principles of congestion control
3.7 TCP congestion control
TransportLayer

TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
• point-to-point: one sender, one receiver
• reliable, in-order byte stream: no “message boundaries”
• pipelined: TCP congestion and flow control set the “window size”
• full duplex data: bi-directional data flow in same connection
  • MSS: maximum segment size
• connection-oriented: handshaking (exchange of control msgs) initiates sender/receiver state before data exchange
• flow controlled: sender will not overwhelm receiver

TCP segment structure (each header row is 32 bits wide)
• source port #, dest port #
• sequence number: counting by bytes of data (not segments!)
• acknowledgement number: counting by bytes of data (not segments!)
• head len, not used, flag bits U A P R S F, receive window
  • URG: urgent data (generally not used)
  • ACK: ACK # valid
  • PSH: push data now (generally not used)
  • RST, SYN, FIN: connection estab (setup, teardown commands)
  • receive window: # bytes rcvr willing to accept
• checksum, Urg data pointer
  • checksum: Internet checksum (as in UDP)
• options (variable length)
• application data (variable length)

TCP Sequence Number
• Indicates the position of a segment's data in the byte stream
• Every byte is sequenced
• Used for re-ordering packets and finding lost packets
• Initial Sequence Number (ISN) is randomly assigned for every TCP connection
• [Note]
  • SYN and FIN packets each consume one sequence number, although they do not include any data.
  • ACK packets without payload do not consume a sequence number.
KUT

TCP seq. numbers, ACKs
sequence numbers:
• byte stream “number” of first byte in segment’s data
acknowledgements:
• seq # of next byte expected from sender
• cumulative ACK
Q: how does the receiver handle out-of-order segments?
• A: TCP spec doesn’t say; it is up to the implementer
[Figure: outgoing segment from sender and incoming segment to sender, each with source port #, dest port #, sequence number, acknowledgement number, rwnd, checksum, urg pointer]
[Figure: sender sequence number space with window size N: sent ACKed | sent, not-yet ACKed (“in-flight”) | usable but not yet sent | not usable]

TCP seq. numbers, ACKs: simple telnet scenario
• Host A: user types ‘C’. A to B: Seq=42, ACK=79, data = ‘C’
• Host B: ACKs receipt of ‘C’, echoes back ‘C’. B to A: Seq=79, ACK=43, data = ‘C’
• Host A: ACKs receipt of echoed ‘C’. A to B: Seq=43, ACK=80

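The ACK numbers in the telnet scenario follow from one rule: the ACK is the sequence number of the next byte expected. A minimal sketch of that arithmetic, assuming a hypothetical helper `expected_ack` (not part of any TCP API) and 32-bit wraparound:

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def expected_ack(seq: int, payload_len: int, syn: bool = False, fin: bool = False) -> int:
    """ACK number a receiver returns for a segment: the next byte it expects."""
    # Each data byte consumes one sequence number; SYN and FIN consume one each.
    consumed = payload_len + int(syn) + int(fin)
    return (seq + consumed) % SEQ_SPACE


# Telnet scenario: A sends 'C' with Seq=42, B echoes 'C' with Seq=79.
print(expected_ack(42, len(b"C")))  # ACK sent by Host B
print(expected_ack(79, len(b"C")))  # ACK sent by Host A
```

A pure ACK carries no payload, so `expected_ack(43, 0)` stays at 43: the segment does not advance the sequence number.
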
TCP round trip time, timeout
Q: how to set TCP timeout value?
• longer than RTT
  • but RTT varies
• too short: premature timeout, unnecessary retransmissions
• too long: slow reaction to segment loss

TCP round trip time, timeout
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
  • ignore RTT sampling for retransmissions (note: see the next slide)
• SampleRTT will vary; we need to “smooth” the estimated RTT
  • average several recent measurements, not just the current SampleRTT

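TCP round trip time, timeout: Retransmission Ambiguity
[Figure: two timelines between A and B. In one, the original transmission is lost (X), A times out and retransmits, and the ACK follows the retransmission. In the other, the original transmission arrives, A times out and retransmits anyway, and the ACK arrives afterwards. Each timeline marks a candidate Sample RTT.]
• Which one is correct for “Sample RTT”? So, what should we do?
• Answer: just ignore RTT sampling for retransmissions
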
TCP round trip time, timeout
EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
• Exponential Weighted Moving Average (EWMA)
• influence of past sample decreases exponentially fast
• Recommended value [RFC 2988]: α = 0.125
[Figure: SampleRTT and EstimatedRTT, RTT (milliseconds) vs. time (seconds), RTT: gaia.cs.umass.edu to fantasia.eurecom.fr]

TCP round trip time, timeout: use a moving average!
• SampleRTT
• EstimatedRTT = (1 - α) * EstimatedRTT + α * SampleRTT
[Figure: measured SNR values #1-2 (dB vs. ms), smoothed with α = 0.1, 0.3, 0.6, 0.9]

TCP round trip time, timeout
• timeout interval: EstimatedRTT plus “safety margin”
  • large variation in EstimatedRTT -> larger safety margin
• estimate SampleRTT deviation from EstimatedRTT:
DevRTT = (1 - β) * DevRTT + β * |SampleRTT - EstimatedRTT|   (typically, β = 0.25)
TimeoutInterval = EstimatedRTT + 4 * DevRTT
(the first term is the estimated RTT, the second is the “safety margin”)

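The three formulas above can be combined into one small estimator. This is a sketch under stated assumptions, not a real stack's implementation: the class name `RttEstimator` is hypothetical, the initial DevRTT of half the first sample follows RFC 6298, and DevRTT is updated before EstimatedRTT (also the RFC 6298 ordering; the slides do not specify an order).

```python
class RttEstimator:
    """EWMA-based RTT estimate with a deviation-based safety margin."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2  # assumption: initial value from RFC 6298

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT (never from a retransmitted segment)."""
        # DevRTT uses the EstimatedRTT from before this sample is folded in.
        self.dev_rtt = (1 - self.beta) * self.dev_rtt + self.beta * abs(
            sample_rtt - self.estimated_rtt
        )
        self.estimated_rtt = (1 - self.alpha) * self.estimated_rtt + self.alpha * sample_rtt
        return self.timeout_interval

    @property
    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)  # milliseconds
for sample in (200.0, 110.0, 95.0):
    print(round(est.update(sample), 2))
```

A single outlier sample (200 ms here) moves EstimatedRTT only slightly but widens the safety margin noticeably, which is the intended behaviour.
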
TCP round trip time, timeout
[Figure: measurement of Internet delays for 100 successive packets at 1-second intervals]
[Figure: TCP timeout interval for the sampled Internet delays]

Karn’s Algorithm (for reference; not in the textbook)
• Rule 1: Ignore the measured RTT for retransmitted packets.
  • When retransmissions occur, the RTT estimate is not updated.
  • Resume updating the RTT estimate only after a segment is ACKed without having been retransmitted.
  • This removes the ambiguity from RTT measurements.
• Rule 2: The timeout interval should be doubled after each retransmission.
  • This is called “exponential back-off”.

Karn’s Algorithm (for reference; not in the textbook)
• Why is Rule 2 necessary?
• Suppose the timeout interval S is smaller than the real RTT.
  • If only Rule 1 is applied, every segment is retransmitted before its ACK arrives, so no RTT sample is accepted and TCP keeps using S as the timeout interval for a long time (or forever).
  • Many packets will be retransmitted.
  • More severe congestion occurs.
[Figure: Data 1 and Data 2 are each retransmitted after S, before the ACK arrives at the real RTT]

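The two rules can be sketched as a small timer object. This is an illustration only: `KarnTimeout`, the `recompute` callback and the 60-second cap are assumptions introduced here, not names or values from the slides.

```python
from typing import Callable


class KarnTimeout:
    """Tracks the timeout interval under Karn's two rules."""

    def __init__(self, timeout_interval: float, max_interval: float = 60.0):
        self.interval = timeout_interval
        self.max_interval = max_interval  # assumed upper bound on back-off

    def on_timeout(self) -> float:
        # Rule 2: double the interval after a retransmission (exponential back-off).
        self.interval = min(self.interval * 2, self.max_interval)
        return self.interval

    def on_ack(self, sample_rtt: float, was_retransmitted: bool,
               recompute: Callable[[float], float]) -> float:
        # Rule 1: an ACK for a retransmitted segment is ambiguous, so skip the sample
        # and keep the backed-off interval.
        if not was_retransmitted:
            self.interval = recompute(sample_rtt)
        return self.interval


timer = KarnTimeout(timeout_interval=1.0)
print(timer.on_timeout())                        # backed off once
print(timer.on_ack(2.5, True, lambda s: 2 * s))  # ambiguous sample ignored
print(timer.on_ack(2.5, False, lambda s: 2 * s)) # clean sample accepted
```

With Rule 2 in place, a timeout S that is shorter than the real RTT grows until an ACK arrives for a segment that was sent only once, at which point normal RTT estimation resumes.
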
TCP reliable data transfer
• TCP creates rdt service on top of IP’s unreliable service
  • pipelined segments
  • cumulative acks
  • single retransmission timer
• retransmissions triggered by:
  • timeout events
  • duplicate acks
• let’s initially consider simplified TCP sender:
  • ignore duplicate acks
  • ignore flow control, congestion control

TCP sender events:
data rcvd from app:
• create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running
  • think of timer as for oldest unacked segment
  • expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
ack rcvd:
• if ack acknowledges previously unacked segments
  • update what is known to be ACKed
  • start timer if there are still unacked segments

TCP sender (simplified)
initially: NextSeqNum = InitialSeqNum; SendBase = InitialSeqNum; then wait for event

event: data received from application above
    create segment, seq. #: NextSeqNum
    pass segment to IP (i.e., “send”)
    NextSeqNum = NextSeqNum + length(data)
    if (timer currently not running)
        start timer

event: timeout
    retransmit not-yet-acked segment with smallest seq. #
    start timer

event: ACK received, with ACK field value y
    if (y > SendBase) {
        SendBase = y
        /* SendBase-1: last cumulatively ACKed byte */
        if (there are currently not-yet-acked segments)
            start timer
        else
            stop timer
    }

• Example: SendBase-1 = 71 and y = 73, so the receiver wants byte 73 onward; y > SendBase (= 72), so all data up to byte 72 are ACKed.

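The three sender events can be exercised in a few lines of Python. This is a simplified sketch: the class `SimplifiedTcpSender` and its `sent` log are illustrative names, the timer is modelled as a boolean rather than a real clock, and sequence-number wraparound is ignored.

```python
class SimplifiedTcpSender:
    """Event handlers of the simplified sender (no dup ACKs, no flow/congestion control)."""

    def __init__(self, initial_seq_num: int):
        self.next_seq_num = initial_seq_num
        self.send_base = initial_seq_num
        self.unacked = {}           # seq # -> data of not-yet-acked segments
        self.timer_running = False  # stands in for the single retransmission timer
        self.sent = []              # log of (seq #, data) handed to IP

    def on_app_data(self, data: bytes) -> None:
        seq = self.next_seq_num
        self.unacked[seq] = data
        self.sent.append((seq, data))  # "pass segment to IP"
        self.next_seq_num += len(data)
        if not self.timer_running:
            self.timer_running = True

    def on_timeout(self) -> None:
        if self.unacked:
            seq = min(self.unacked)    # not-yet-acked segment with smallest seq #
            self.sent.append((seq, self.unacked[seq]))
            self.timer_running = True  # restart timer

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y
            # Cumulative ACK: drop every segment that ends at or before byte y-1.
            for seq in [s for s, d in self.unacked.items() if s + len(d) <= y]:
                del self.unacked[seq]
            self.timer_running = bool(self.unacked)  # restart if data outstanding, else stop


sender = SimplifiedTcpSender(initial_seq_num=92)
sender.on_app_data(b"x" * 8)   # Seq=92, 8 bytes
sender.on_app_data(b"y" * 20)  # Seq=100, 20 bytes
sender.on_timeout()            # premature timeout: only Seq=92 is resent
sender.on_ack(100)
sender.on_ack(120)
print(sender.send_base, sender.timer_running)
```
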
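TCP: retransmission scenarios
lost ACK scenario (SendBase=92):
• Host A sends Seq=92, 8 bytes of data
• Host B replies ACK=100, but the ACK is lost (X)
• Host A times out and resends Seq=92, 8 bytes of data
• Host B replies ACK=100 again; SendBase=100
premature timeout (SendBase=92):
• Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data
• Host B replies ACK=100 and ACK=120
• Host A times out before the ACKs arrive and resends Seq=92, 8 bytes of data
• ACK=100 arrives: SendBase=100; ACK=120 arrives: SendBase=120
• Host B answers the duplicate segment with ACK=120; SendBase stays at 120
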
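TCP: retransmission scenarios
cumulative ACK:
• Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data
• Host B replies ACK=100, which is lost (X), and then ACK=120
• ACK=120 arrives before the timeout, so nothing is retransmitted
• Host A continues with Seq=120, 15 bytes of data
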
TCP ACK generation [RFC 1122, RFC 2581]
event at receiver -> TCP receiver action
• arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed -> delayed ACK: wait up to 500ms for next segment; if no next segment within 500ms, send ACK
• arrival of in-order segment with expected seq #; one other segment has ACK pending -> immediately send single cumulative ACK, ACKing both in-order segments
• arrival of out-of-order segment with higher-than-expected seq #; gap detected -> immediately send duplicate ACK, indicating seq # of next expected byte
• arrival of segment that partially or completely fills gap -> immediately send cumulative ACK

TCP fast retransmit
• time-out period often relatively long:
  • long delay before resending lost packet
• detect lost segments via duplicate ACKs
  • sender often sends many segments back-to-back
  • if a segment is lost, there will likely be many duplicate ACKs
• TCP fast retransmit: if sender receives 3 duplicate ACKs for the same data (“triple duplicate ACKs”), resend unacked segment with smallest seq #
  • likely that unacked segment was lost, so don’t wait for timeout

TCP fast retransmit
• Host A sends Seq=92, 8 bytes of data, then Seq=100, 20 bytes of data; the second segment is lost (X)
• Host B returns ACK=100 four times: the original ACK plus triple duplicate ACKs
• Host A resends Seq=100, 20 bytes of data before the timeout expires
fast retransmit after sender receipt of triple duplicate ACK

TCP - Cumulative Acknowledgement
• Consider the following scenario (1/3)
• sender -> receiver: Seq. #=101, 201, 301, 401, 501, 601, each with 100 bytes of data
• receiver -> sender: Ack #=201, Ack #=401, Ack #=501, Ack #=601, Ack #=701

TCP - Cumulative Acknowledgement
• Consider the following scenario (2/3)
• sender -> receiver: Seq. #=101, 201, 301, 401, 501, 601, each with 100 bytes of data
• receiver -> sender: Ack #=201, repeated five times
• Timeout: sender retransmits Seq. #=201, 100 bytes of data

TCP - Cumulative Acknowledgement
• Consider the following scenario (3/3): Duplicate ACKs & Fast Retransmit
• sender -> receiver: Seq. #=101, 201, 301, 401, 501, 601, each with 100 bytes of data
• receiver -> sender: Ack #=201, repeated five times
• sender retransmits Seq. #=201, 100 bytes of data, without waiting for the timeout
• receiver -> sender: Ack #=701

Fast retransmit algorithm:
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else { /* y == SendBase: a duplicate ACK for already ACKed segment */
        increment count of “duplicate ACKs” received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y   /* fast retransmit */
        }
    }

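The duplicate-ACK counting in the algorithm above can be sketched as follows. `DupAckCounter` is a hypothetical name, and the sketch only decides when to retransmit; timers and the actual resend are left out.

```python
class DupAckCounter:
    """Decides when an incoming ACK should trigger a fast retransmit."""

    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_acks = 0

    def on_ack(self, y: int) -> bool:
        """Return True if the segment starting at byte y should be resent now."""
        if y > self.send_base:
            # New data acknowledged: advance SendBase and reset the counter.
            self.send_base = y
            self.dup_acks = 0
            return False
        if y == self.send_base:
            # Duplicate ACK for data that was already ACKed.
            self.dup_acks += 1
            return self.dup_acks == 3  # fire exactly once, on the third duplicate
        return False  # stale ACK below SendBase


# Scenario 3/3 from the cumulative-ACK slides: the segment with Seq #=201 is lost.
counter = DupAckCounter(send_base=101)
for ack in (201, 201, 201, 201, 201, 701):
    print(ack, counter.on_ack(ack))
```

The first ACK #=201 is a normal ACK; the retransmit fires on the fourth arrival of 201, which is the third duplicate.
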
More TCP Scenario (1/3): Segment Corruption
• sender starts timer and sends Seq: 1001, 200 bytes and Seq: 1201, 200 bytes
• receiver sends cumulative (delayed) ACK: 1401
• sender sends Seq: 1401, 200 bytes; segment 3 arrives corrupted
• sender restarts (updates) the timer for the unACKed packet (Seq: 1401)
• Timeout: sender resends Seq: 1401, 200 bytes
• receiver sends ACK: 1601. OK (everything is ok.)

More TCP Scenario (2/3): Lost Segment
• sender starts timer and sends Seq: 1001, 200 bytes and Seq: 1201, 200 bytes
• receiver sends ACK: 1401
• sender sends Seq: 1401, 200 bytes; segment 3 is lost
• sender restarts (updates) the timer for the unACKed packet (Seq: 1401)
• Timeout: sender resends Seq: 1401, 200 bytes
• receiver sends ACK: 1601. OK (everything is ok.)

More TCP Scenario (3/3): Cumulative ACK
• sender sends Seq: 1001, 200 bytes; Seq: 1201, 200 bytes; Seq: 1401, 200 bytes
• receiver sends ACK: 1401; this acknowledgement is lost
• receiver sends ACK: 1601, which also covers the lost ACK (everything is ok.)

Is TCP GBN or SR?
• If we must pick the one that TCP resembles more, TCP is closer to GBN (or: TCP is a mix of GBN and SR).
• However, TCP differs from GBN in several ways.

Is TCP GBN or SR?
1. GBN: ACK number is the seq # of the pkt being ACKed. TCP: ACK number is the next expected number.
2. GBN: no buffering at receiver. TCP: buffering at receiver.
3. GBN sender retransmits pkt n and all higher seq # pkts in the window at timeout(n). TCP retransmits only pkt n.
[Figure, illustrating points 1 and 2 when pkt2 is delayed:]
• rcv pkt3: no discard, buffering, send ACK2
• rcv pkt4: no discard, buffering, send ACK2
• rcv pkt5: no discard, buffering, send ACK2
• rcv pkt2: send ACK6

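The receiver behaviour in the figure (buffer out-of-order packets, always ACK the next expected number) can be sketched in a few lines. For readability the sketch numbers whole packets, as the figure does, rather than bytes; `BufferingReceiver` is an illustrative name.

```python
class BufferingReceiver:
    """TCP-style receiver: buffers out-of-order packets and sends cumulative ACKs."""

    def __init__(self, expected: int = 1):
        self.expected = expected  # next in-order packet number
        self.buffer = set()       # out-of-order packets held back

    def receive(self, pkt: int) -> int:
        """Accept a packet and return the ACK number (next expected packet)."""
        if pkt >= self.expected:
            self.buffer.add(pkt)  # no discard, unlike a GBN receiver
        # Deliver everything that is now in order.
        while self.expected in self.buffer:
            self.buffer.remove(self.expected)
            self.expected += 1
        return self.expected


rcv = BufferingReceiver()
for pkt in (1, 3, 4, 5, 2):
    print(f"rcv pkt{pkt}, send ACK{rcv.receive(pkt)}")
```

Once the missing pkt2 arrives, the ACK jumps straight to 6, so the sender has no reason to resend pkt3 to pkt5.
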
TCP flow control
• flow control: receiver controls sender, so sender won’t overflow receiver’s buffer by transmitting too much, too fast
• receive side of TCP connection has a receive buffer (TCP socket receiver buffers)
• application process may remove data from TCP socket buffers slower than TCP receiver receives it (sender is sending)
[Figure: receiver protocol stack. IP datagrams from sender pass through IP code and TCP code into the TCP socket receiver buffers in the OS (TCP data in buffer, plus currently unused buffer space), from which the application process reads.]

TCP flow control
• receiver “advertises” free buffer space by including rwnd value in TCP header of receiver-to-sender segments
  • RcvBuffer size set via socket options
  • many operating systems autoadjust RcvBuffer
• sender limits amount of unacked (“in-flight”) data to receiver’s rwnd value
• guarantees receive buffer will not overflow
• speed-matching service: matching send rate to receiving application’s drain rate
[Figure: receiver-side buffering. TCP segment payloads enter RcvBuffer, which holds buffered data plus free buffer space (rwnd); data leaves to the application process.]

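The window arithmetic behind flow control can be sketched as below. The variable names `last_byte_rcvd`, `last_byte_read`, `last_byte_sent` and `last_byte_acked` are assumptions borrowed from common textbook notation; they do not appear on these slides.

```python
def advertised_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Receiver side: free buffer space to advertise as rwnd."""
    buffered = last_byte_rcvd - last_byte_read  # data received but not yet read by the app
    return rcv_buffer - buffered


def sendable_bytes(rwnd: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """Sender side: how much more data may be sent without exceeding rwnd."""
    in_flight = last_byte_sent - last_byte_acked  # unacked ("in-flight") data
    return max(0, rwnd - in_flight)


rwnd = advertised_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(rwnd)
print(sendable_bytes(rwnd, last_byte_sent=5000, last_byte_acked=4000))
```

If the application stops reading, `advertised_window` shrinks toward zero and `sendable_bytes` follows, which is how the send rate ends up matched to the application's drain rate.
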
Connection Management
before exchanging data, sender/receiver “handshake”:
• agree to establish connection (each knowing the other willing to establish connection)
• agree on connection parameters
After the handshake, the application on each side (client and server) holds:
• connection state: ESTAB
• connection variables:
  • seq # client-to-server and server-to-client
  • rcvBuffer size at server, client
client: Socket clientSocket = new Socket("hostname","port number");
server: Socket connectionSocket = welcomeSocket.accept();

TCP 3-way handshake
• client (LISTEN -> SYNSENT): choose init seq num, x; send TCP SYN msg: SYNbit=1, Seq=x
• server (LISTEN -> SYN RCVD): choose init seq num, y; send TCP SYNACK msg, acking SYN: SYNbit=1, Seq=y, ACKbit=1; ACKnum=x+1
• client (SYNSENT -> ESTAB): received SYNACK(x+1) indicates server is live; send ACK for SYNACK: ACKbit=1, ACKnum=y+1; this segment may contain client-to-server data
• server (SYN RCVD -> ESTAB): received ACK(y+1) indicates client is live

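The numbers exchanged in the handshake can be traced with a small model. This is only a sketch of the header fields involved: the `Segment` dataclass and the function name are illustrative, and no real sockets are opened.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """The handful of TCP header fields that matter for the handshake."""
    syn: bool = False
    ack: bool = False
    seq: int = 0
    ack_num: int = 0


def three_way_handshake(x: int, y: int) -> list:
    """Return the three segments exchanged for client ISN x and server ISN y."""
    syn = Segment(syn=True, seq=x)                              # client -> server
    synack = Segment(syn=True, ack=True, seq=y, ack_num=x + 1)  # server -> client
    # The SYN consumed one sequence number, so the client's next seq is x + 1.
    final_ack = Segment(ack=True, seq=x + 1, ack_num=y + 1)     # client -> server
    return [syn, synack, final_ack]


for segment in three_way_handshake(x=1000, y=5000):
    print(segment)
```

In a real connection x and y are chosen randomly per connection; fixed values are used here only to make the +1 arithmetic easy to follow.
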
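TCP 3-way handshake: FSM
(transitions written as: event / action; Λ means no event or no action)
client side:
• closed -> SYN sent: Socket clientSocket = new Socket("hostname","port number"); / send SYN(seq=x)
• SYN sent -> ESTAB: receive SYNACK(seq=y,ACKnum=x+1) / send ACK(ACKnum=y+1)
server side:
• closed -> listen: Socket connectionSocket = welcomeSocket.accept(); / Λ
• listen -> SYN rcvd: receive SYN(x) / send SYNACK(seq=y,ACKnum=x+1); create new socket for communication back to client
• SYN rcvd -> ESTAB: receive ACK(ACKnum=y+1) / Λ
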
TCP: closing a connection
• client, server each close their side of connection
  • send TCP segment with FIN bit = 1
• respond to received FIN with ACK
  • on receiving FIN, ACK can be combined with own FIN
• simultaneous FIN exchanges can be handled
• Instead of FIN, the TCP layer can send a RST segment that terminates a connection if something is wrong.

TCP: closing a connection
• Modified 3-way handshake (or 4-way termination)
1. App1 -> App2: FIN, SN=X. App1: “I have no more data for you.”
2. App2 -> App1: ACK=X+1. App2: “OK, I understand you are done sending.”
   ... server can still send data to client ...
3. App2 -> App1: FIN, SN=Y. App2: “OK, now I’m also done sending data.”
4. App1 -> App2: ACK=Y+1. App1: “I understand. Goodbye.”

TCP: closing a connection
(client state and server state both start in ESTAB)
• client: clientSocket.close(); sends FINbit=1, seq=x; enters FIN_WAIT_1 (can no longer send but can receive data)
• server: sends ACKbit=1; ACKnum=x+1; enters CLOSE_WAIT (can still send data)
• client: on receiving the ACK, enters FIN_WAIT_2 (wait for server close)
• server: sends FINbit=1, seq=y; enters LAST_ACK (can no longer send data)
• client: sends ACKbit=1; ACKnum=y+1; enters TIME_WAIT (timed wait for 2*max segment lifetime), then CLOSED
• server: on receiving the ACK, enters CLOSED

TCP: closing a connection
• Why TIME_WAIT?
• It gives the client TCP enough time to make sure the ACK it sent to the server was received.
• If the ACK the client sent is lost, the server will retransmit its FIN.
• The client must still be in TIME_WAIT to receive that FIN; it then resends the ACK and restarts TIME_WAIT.
[Figure: client starts TIME_WAIT after sending the ACK and restarts TIME_WAIT when a retransmitted FIN arrives]

MSS (Maximum Segment Size)
• Link MTU vs. Path MTU vs. MSS
• Maximum Transmission Unit (MTU) is defined by the maximum payload size of the Layer 2 frame.
• Link MTU: the max packet size that can be transmitted over a link
• Path MTU: the minimum link MTU of all links in a path between a source and a destination
• The Layer 3 payload determines the Layer 4 Maximum Segment Size (MSS)

MSS (Maximum Segment Size)
• What is MSS?
• MSS: Maximum Segment Size
• Largest payload size that TCP can send for this connection.
• Usually, MSS is calculated as “Maximum Transmission Unit (MTU) - 40 bytes”: 20 bytes of IP header plus 20 bytes of TCP header, without options.
[Figure: frame layout showing the MAC header and the path MTU]

MSS (Maximum Segment Size)
• What is MSS?
• An example of MSS negotiation
  • In this example, both sides use 960 bytes as MSS.
• In the modern Internet, path MTU is usually 1500 bytes and MSS can be 1460 bytes
• Self-check: http://www.speedguide.net:8080

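The relation between link MTU, path MTU and MSS can be checked with a short calculation. This is a sketch under stated assumptions: 20-byte IP and TCP headers without options, and made-up link MTU values; the 1000-byte link is a guess chosen to reproduce the 960-byte MSS of the example, not a value from the slides.

```python
def path_mtu(link_mtus: list) -> int:
    """Path MTU is the smallest link MTU along the path."""
    return min(link_mtus)


def mss_from_mtu(mtu: int, ip_header: int = 20, tcp_header: int = 20) -> int:
    """MSS is what remains of the MTU after the IP and TCP headers."""
    return mtu - ip_header - tcp_header


print(mss_from_mtu(1500))                          # typical Ethernet path
print(mss_from_mtu(path_mtu([1500, 1000, 1500])))  # one narrower link on the path
```
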
Chapter 3 outline
3.1 transport-layer services
3.2 multiplexing and demultiplexing
3.3 connectionless transport: UDP
3.4 principles of reliable data transfer
3.5 connection-oriented transport: TCP
  • segment structure
  • reliable data transfer
  • flow control
  • connection management
3.6 & 3.7 principles of congestion control / TCP congestion control (Korean edition of the textbook: p. 301 and pp. 310-317)

Principles of congestion control
congestion:
• informally: “too many sources sending too much data too fast for network to handle”
• different from flow control!
• manifestations:
  • lost packets (buffer overflow at routers)
  • long delays (queueing in router buffers)
• a top-10 problem!

Principles of congestion control
• Flow Control vs. Congestion Control
  • Flow control (Src to Dest): limits amount of data that destination must buffer
  • Congestion control (Src to Dest, through the network): attempts to reduce buffer overflow inside the network