Outline • The Transport Layer • The TCP Protocol (RFC 793, 1122, 1323,...) • TCP Characteristics • TCP Connection setup • TCP Segments • TCP Sequence Numbers • TCP Sliding Window • Timeouts and Retransmission • (Congestion Control and Avoidance) • The UDP Protocol (RFC 768)
Well-known port numbers • Ports 0-1023 are managed by IANA, e.g.:
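Review of the transport layer [Figure: two users, Nick and Dave, on hosts Athena.MIT.edu and Leland.Stanford.edu, communicate through the Application, Transport, Network, and Link layers; the lower layers are labelled O.S., and each layer adds its own header (H) to the data (D) it carries.]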
Layering: The OSI Model [Figure: two end hosts, each with the layers 7 Application, 6 Presentation, 5 Session, 4 Transport, 3 Network, 2 Link, 1 Physical, connected through two routers that implement only layers 1-3 (Physical, Link, Network). Arrows mark layer-to-layer communication within a host and peer-layer communication between hosts.]
Layering: Our FTP Example [Figure: the 7-layer OSI Model (Application, Presentation, Session, Transport, Network, Link, Physical) beside the 4-layer Internet model (Application, Transport, Network, Link), annotated with the example stack: FTP, ASCII/Binary, TCP, IP, Ethernet or HDLC + V.35.]
TCP Characteristics • TCP is connection-oriented. • 3-way handshake used for connection setup/teardown. • TCP provides a stream-of-bytes service. • TCP is reliable: • Acknowledgements indicate delivery of data. • Checksums are used to detect corrupted data. • Sequence numbers detect missing, or mis-sequenced data. • Corrupted data is retransmitted after a timeout. • Mis-sequenced data is re-sequenced. • (Window-based) Flow control prevents over-run of receiver. • TCP uses congestion control to share network capacity among users.
TCP is connection-oriented [Figure: two message-sequence diagrams between an (Active) Client and a (Passive) Server. Connection Setup is a 3-way handshake: Syn, Syn + Ack, Ack. Connection Close/Teardown is a 2 x 2-way handshake: Fin, (Data +) Ack, Fin, Ack.]
TCP supports a “stream of bytes” service [Figure: Host A sends Byte 0, Byte 1, Byte 2, Byte 3, ..., Byte 80; Host B receives the same bytes in the same order.]
…which is emulated using TCP “segments” • Segment sent when: • Segment full (MSS bytes), • Not full, but times out, or • “Pushed” by application. [Figure: the bytes Byte 0 ... Byte 80 from Host A are carried to Host B inside TCP Data segments.]
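The "segment full" trigger above can be sketched in a few lines of Python. This is only an illustration of cutting a byte stream into MSS-sized pieces, not how a real TCP stack is written; the function name `segment_stream` and the tiny MSS of 32 bytes are assumptions chosen for the example.

```python
def segment_stream(data: bytes, mss: int, first_seq: int = 0):
    """Split a byte stream into (sequence_number, payload) pairs of at most mss bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(data), mss):
        payload = data[offset:offset + mss]
        # A segment is numbered by the first data byte it carries.
        segments.append((first_seq + offset, payload))
    return segments


stream = bytes(range(81))                  # Byte 0 ... Byte 80, as in the figure
segments = segment_stream(stream, mss=32)  # hypothetical, very small MSS
for seq, payload in segments:
    print(seq, len(payload))
```

The last segment is shorter than the MSS; in the slide's terms it would leave the sender because of a timeout or a push rather than because it is full.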
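The TCP Segment Format • IP datagram = IP Hdr + IP Data; the IP Data holds TCP Hdr + TCP Data • TCP header, 32 bits per row (bits 0-31): • Src port (bits 0-15), Dst port (bits 16-31) • Sequence # • Ack Sequence # • HLEN (4 bits), RSVD (6 bits), Flags (URG, ACK, PSH, RST, SYN, FIN), Window Size • Checksum, Urg Pointer • (TCP Options) • TCP Data • Src/dst port numbers and IP addresses uniquely identify socket • Checksum covers the TCP Header and Data + IP Addresses (the pseudo header, taken from the IP header, is used in the checksum)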
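The header layout above can be packed and unpacked with Python's `struct` module. This is a minimal sketch that assumes a 20-byte header without options; the helper names `build_header` and `parse_header` and the example port numbers are hypothetical.

```python
import struct

# Ports (16 bits each), sequence and ack numbers (32 bits each),
# HLEN/RSVD/flags (16 bits), window, checksum, urgent pointer (16 bits each).
TCP_HEADER = struct.Struct("!HHIIHHHH")    # "!" selects network byte order

FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def build_header(src_port, dst_port, seq, ack, flags, window, checksum=0, urg_ptr=0):
    """Pack a 20-byte TCP header (no options)."""
    flag_value = 0
    for name in flags:
        flag_value |= FLAG_BITS[name]
    hlen_words = 5                         # header length in 32-bit words
    hlen_and_flags = (hlen_words << 12) | flag_value
    return TCP_HEADER.pack(src_port, dst_port, seq, ack,
                           hlen_and_flags, window, checksum, urg_ptr)


def parse_header(raw: bytes) -> dict:
    """Unpack the fixed 20-byte part of a TCP header into a dict."""
    src, dst, seq, ack, hlen_and_flags, window, checksum, urg = TCP_HEADER.unpack(raw[:20])
    return {
        "src_port": src, "dst_port": dst, "seq": seq, "ack": ack,
        "hlen_bytes": (hlen_and_flags >> 12) * 4,
        "flags": sorted(n for n, bit in FLAG_BITS.items() if hlen_and_flags & bit),
        "window": window, "checksum": checksum, "urg_ptr": urg,
    }


syn = build_header(1234, 80, seq=1000, ack=0, flags=["SYN"], window=2048)
print(parse_header(syn))
```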
TCP segment structure • Fields, in 32-bit rows: source port #, dest. port #; sequence number; acknowledgement number; head len, not used, flags U A P R S F, rcvr window size; checksum, ptr urgent data; Options (variable length); application data (variable length) • sequence and acknowledgement numbers: counting by bytes of data (not segments!) • URG: urgent data (generally not used) • ACK: ACK # valid • PSH: push data now (generally not used) • RST, SYN, FIN: connection establishment (setup, teardown commands) • rcvr window size: # bytes rcvr willing to accept • checksum: Internet checksum (as in UDP) • Options, typically: maximum TCP payload (default is 536 bytes); window scale, selective acknowledgement
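The Internet checksum mentioned above is the 16-bit ones' complement of the ones' complement sum of the data, and for TCP it is computed over a pseudo header (source and destination IP addresses, protocol number, TCP length) followed by the TCP header and data. The sketch below assumes IPv4; the addresses and the segment contents are made up for illustration.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                    # pad odd-length input with a zero byte
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:                     # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo header plus the TCP header and data."""
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, socket.IPPROTO_TCP, len(segment))
    )
    return internet_checksum(pseudo_header + segment)


# Hypothetical segment: 20-byte header with the checksum field (bytes 16-17) zeroed, plus data.
header = struct.pack("!HHIIHHHH", 1234, 80, 1000, 0, (5 << 12) | 0x02, 2048, 0, 0)
segment = header + b"hello"
checksum = tcp_checksum("10.0.0.1", "10.0.0.2", segment)
filled = segment[:16] + struct.pack("!H", checksum) + segment[18:]

# A receiver recomputes the checksum over the filled-in segment and expects zero.
print(tcp_checksum("10.0.0.1", "10.0.0.2", filled) == 0)
```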
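Sequence Numbers • Numbering starts from the ISN (initial sequence number). • Sequence number = 1st byte of the TCP Data in the segment. • Ack sequence number = next expected byte. [Figure: Host A and Host B exchanging segments, each drawn as TCP HDR + TCP Data.]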
Initial Sequence Numbers [Figure: Connection Setup, 3-way handshake between an (Active) Client and a (Passive) Server: Syn + ISN_A, then Syn + Ack + ISN_B, then Ack.]
3-way Handshake for connection establishment • Host A to Host B: SYN, Seq_no = x • Host B to Host A: SYN, Seq_no = y, ACK, Ack_no = x+1 • Host A to Host B: Seq_no = x+1, ACK, Ack_no = y+1
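The sequence and acknowledgement numbers of the three segments follow directly from the two initial sequence numbers x and y. The sketch below only computes those numbers (modulo 2^32, since the fields are 32 bits wide); the dictionary layout and the function name `three_way_handshake` are assumptions for illustration.

```python
SEQ_SPACE = 2 ** 32   # sequence numbers are 32-bit and wrap around


def three_way_handshake(x: int, y: int):
    """Return the three handshake segments for initial sequence numbers x (A) and y (B)."""
    syn = {"from": "A", "flags": {"SYN"}, "seq": x}
    syn_ack = {"from": "B", "flags": {"SYN", "ACK"}, "seq": y,
               "ack": (x + 1) % SEQ_SPACE}       # the SYN consumes one sequence number
    ack = {"from": "A", "flags": {"ACK"}, "seq": (x + 1) % SEQ_SPACE,
           "ack": (y + 1) % SEQ_SPACE}
    return [syn, syn_ack, ack]


for segment in three_way_handshake(x=5000, y=9000):
    print(segment)
```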
TCP application example • Host B (Server): socket, bind, listen, accept (blocks) • Host A (Client): socket, connect (blocks) • A to B: SYN, Seq_no = x • B to A: SYN, Seq_no = y, ACK, Ack_no = x+1; connect returns on Host A • A to B: Seq_no = x+1, ACK, Ack_no = y+1; accept returns on Host B, which then calls read (blocks) • Host A: write (request message), then read (blocks) • Host B: read returns; write (reply message), then read (blocks) • Host A: read returns
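The same call sequence can be tried on one machine with Python's `socket` module, running the server side in a thread. This is a minimal sketch that assumes the loopback interface is available; the helper names `serve_once` and `run_exchange` and the message contents are hypothetical.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Server side: accept one connection, read a request, write a reply."""
    conn, _addr = listener.accept()              # blocks until a client has connected
    with conn:
        request = conn.recv(1024)                # blocks until the request arrives
        conn.sendall(b"reply to " + request)     # write the reply message


def run_exchange(message: bytes) -> bytes:
    """Client side: connect, write a request, read the reply until the server closes."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))          # port 0: let the O.S. pick a free port
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve_once, args=(listener,))
        server.start()
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(("127.0.0.1", port))  # blocks during the handshake
            client.sendall(message)              # write the request message
            chunks = []
            while True:
                chunk = client.recv(1024)        # blocks; b"" means the server closed
                if not chunk:
                    break
                chunks.append(chunk)
        server.join()
    return b"".join(chunks)


if __name__ == "__main__":
    print(run_exchange(b"hello"))
```

The client reads in a loop because TCP delivers a stream of bytes, so a single `recv` is not guaranteed to return a whole message.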
TCP Window control (segments exchanged between Host A and Host B) • t0: Seq_no = 1, Ack_no = 2000, Win = 2048, No Data • t1: Seq_no = 2000, Ack_no = 1, Win = 1024, Data = 2000-3023 • t2: Seq_no = 3024, Ack_no = 1, Win = 1024, Data = 3024-4047 • t3: Seq_no = 1, Ack_no = 4048, Win = 512, Data = 1-128 • t4: Seq_no = 4048, Ack_no = 129, Win = 1024, Data = 4048-4559
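The trace above can be checked against the rule that a sender may have at most one advertised window of unacknowledged data outstanding. The helper `bytes_allowed` is a hypothetical name, and the sketch looks only at the host that sends bytes 2000 onward.

```python
def bytes_allowed(last_ack: int, advertised_window: int, next_seq: int) -> int:
    """Bytes the sender may still transmit: advertised window minus bytes in flight."""
    in_flight = next_seq - last_ack
    return max(0, advertised_window - in_flight)


# Replaying the data sender's view of the trace.
print(bytes_allowed(last_ack=2000, advertised_window=2048, next_seq=2000))  # after t0
print(bytes_allowed(last_ack=2000, advertised_window=2048, next_seq=3024))  # after t1
print(bytes_allowed(last_ack=2000, advertised_window=2048, next_seq=4048))  # after t2
print(bytes_allowed(last_ack=4048, advertised_window=512, next_seq=4048))   # after t3
```

After t2 the window of 2048 bytes is used up, so the sender has to wait; the 512-byte window advertised at t3 is exactly the amount of data carried at t4 (bytes 4048-4559).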
Connection Termination (segments exchanged between Host A and Host B, in order) • FIN, seq = 5086 • ACK = 5087 • Data, seq = 303, ACK = 5087 (deliver 150 bytes) • ACK = 453 • FIN, seq = 453, ACK = 5087 • ACK = 454
TCP flow control • Window based • Sender cannot have more than one window of data outstanding without acknowledgements. • Window is the minimum of the receiver’s buffer (the advertised window) and the ‘congestion window’. • After a window of data is transmitted, in steady state, acks control sending rate.
TCP Flow control • Congestion window is increased gradually • At the beginning, set cwnd = 1 (TCP segment) • At the beginning, set threshold = 64K • For each ack, increase cwnd by 1 segment, which doubles cwnd every round-trip time, until the threshold is reached (slow start) • After that, increase cwnd by 1 for each window of acks (additive increase)
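The growth of the congestion window can be sketched by counting ACKs round by round. The simulation below measures cwnd and the threshold in segments rather than bytes and assumes no losses and one ACK per segment; the function name `cwnd_per_round` and the threshold of 8 segments are illustrative assumptions.

```python
def cwnd_per_round(rounds: int, threshold: int, cwnd: int = 1):
    """Return cwnd (in segments) at the start of each round-trip time."""
    history = []
    acks_since_increase = 0
    for _ in range(rounds):
        history.append(cwnd)
        for _ack in range(cwnd):          # one ACK per segment sent in this round
            if cwnd < threshold:
                cwnd += 1                 # slow start: +1 per ACK, doubling per round
            else:
                acks_since_increase += 1  # additive increase: +1 per window of ACKs
                if acks_since_increase >= cwnd:
                    cwnd += 1
                    acks_since_increase = 0
    return history


print(cwnd_per_round(rounds=7, threshold=8))
```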
Basic Control Model • Reduce speed when congestion is perceived • How is congestion signaled? • Either mark or drop packets • How much to reduce? • Increase speed otherwise • Probe for available bandwidth – how?
Phase Plots • Simple way to visualize behavior of competing connections over time [Figure: plot with axes User 1’s Allocation x1 and User 2’s Allocation x2.]
Phase Plots • What are desirable properties? • What if flows are not equal? [Figure: phase plot with axes User 1’s Allocation x1 and User 2’s Allocation x2, showing the Fairness Line, the Efficiency Line, the Optimal point, and the Overload and Underutilization regions.]
Additive Increase/Decrease • Both X1 and X2 increase/decrease by the same amount over time • Additive increase improves fairness and additive decrease reduces fairness [Figure: phase plot with points T0 and T1, the Fairness Line, and the Efficiency Line; axes User 1’s Allocation x1 and User 2’s Allocation x2.]
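Multiplicative Increase/Decrease • Both X1 and X2 increase or decrease by the same factor over time • The allocation moves along the line extended from the origin, so fairness stays constant [Figure: phase plot with points T0 and T1, the Fairness Line, and the Efficiency Line; axes User 1’s Allocation x1 and User 2’s Allocation x2.]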
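What is the Right Choice? • Constraints limit us to AIMD (additive increase, multiplicative decrease) • Can have multiplicative term in increase • AIMD moves towards optimal point [Figure: phase plot with successive allocations x0, x1, x2, the Fairness Line, and the Efficiency Line; axes User 1’s Allocation x1 and User 2’s Allocation x2.]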
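The claim that AIMD moves towards the optimal point can be checked numerically. The sketch below is a toy model with two users who see the same congestion signal at the same time: both add a fixed amount while the total is within capacity and both multiply by a factor below 1 when it is exceeded. The function name `aimd` and all numbers are assumptions chosen for illustration.

```python
def aimd(x1: float, x2: float, capacity: float, steps: int,
         increase: float = 1.0, decrease: float = 0.5):
    """Return the list of (x1, x2) allocations under synchronized AIMD."""
    trace = [(x1, x2)]
    for _ in range(steps):
        if x1 + x2 > capacity:
            # Overload: multiplicative decrease keeps the ratio x1/x2 but shrinks the gap.
            x1, x2 = x1 * decrease, x2 * decrease
        else:
            # Underutilization: additive increase keeps the gap but improves the ratio.
            x1, x2 = x1 + increase, x2 + increase
        trace.append((x1, x2))
    return trace


trace = aimd(x1=1.0, x2=40.0, capacity=50.0, steps=200)
first_gap = abs(trace[0][0] - trace[0][1])
last_gap = abs(trace[-1][0] - trace[-1][1])
print(first_gap, last_gap)
```

Each multiplicative decrease halves the difference between the two allocations while additive increase leaves it unchanged, so the allocations drift toward the fairness line while oscillating around the efficiency line.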
TCP Congestion Avoidance • Congestion avoidance: /* slowstart is over */ /* Congwin > threshold */ Until (loss event) { every w segments ACKed: Congwin++ } threshold = Congwin/2; Congwin = 1; perform slowstart
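The pseudocode above can be turned into a small round-by-round simulation. This sketch works in whole segments, treats one loop iteration as one round-trip time, and takes the rounds in which a loss occurs as an input; the function name `congwin_trace` and the example parameters are assumptions, not part of the slides.

```python
def congwin_trace(rounds: int, loss_rounds, threshold: int, congwin: int = 1):
    """Return Congwin (in segments) at the start of each round-trip time."""
    trace = []
    for rnd in range(rounds):
        trace.append(congwin)
        if rnd in loss_rounds:
            # Loss event: remember half the current window, restart from one segment.
            threshold = max(congwin // 2, 1)
            congwin = 1
        elif congwin < threshold:
            # Slow start: the window doubles each round, capped at the threshold.
            congwin = min(congwin * 2, threshold)
        else:
            # Congestion avoidance: one extra segment per window of ACKs.
            congwin += 1
    return trace


print(congwin_trace(rounds=12, loss_rounds={6}, threshold=8))
```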
TCP Congestion Control • When TCP sender sees loss in the network, TCP window is reduced (sending rate slowed) • In fact, TCP cuts the window size in half whenever a loss occurs and then slowly builds it back up
TCP Sliding Window • Retransmission policy is “Go Back N”. • Current window size is “advertised” by receiver (usually 4k – 8k bytes at connection set-up). [Figure: the byte stream divided into four regions: Data ACK’d, Outstanding Un-ack’d data, Data OK to send, and Data not OK to send yet; the Window Size covers the outstanding and OK-to-send regions.]
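The four regions of the sliding window can be expressed as a small classification function. The variable names `first_unacked` and `next_to_send` are assumptions for this sketch; they stand for the left edge of the window and the next byte the sender has not yet transmitted.

```python
def classify(byte_no: int, first_unacked: int, next_to_send: int, window: int) -> str:
    """Say which region of the sender's sliding window a byte number falls in."""
    if byte_no < first_unacked:
        return "ACK'd"
    if byte_no < next_to_send:
        return "outstanding, un-ACK'd"
    if byte_no < first_unacked + window:
        return "OK to send"
    return "not OK to send yet"


# Hypothetical state: bytes below 100 are acknowledged, bytes 100-149 are in flight,
# and the receiver advertises a window of 200 bytes.
for byte_no in (50, 120, 200, 300):
    print(byte_no, classify(byte_no, first_unacked=100, next_to_send=150, window=200))
```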
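TCP Sliding Window [Figure: two timing diagrams between Host A and Host B. (1) RTT > Window size: the window is used up before the first ACK returns, leaving a gap (marked ???) before more data can be sent. (2) RTT = Window size: the ACK returns just as the window is used up.]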
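The two cases above compare the time needed to transmit one window with the round-trip time. A rough, idealized estimate of how busy the sender can keep the link is the ratio of the two, capped at 1. The function name `link_utilization` and the example numbers are assumptions, and the model ignores losses, header overhead, and ACK transmission time.

```python
def link_utilization(window_bytes: int, link_bps: float, rtt_seconds: float) -> float:
    """Fraction of time the sender can transmit with a fixed window (idealized)."""
    window_time = window_bytes * 8 / link_bps   # time to put one window on the wire
    return min(1.0, window_time / rtt_seconds)


# Hypothetical example: 8 KB window, 10 Mb/s link, 100 ms round-trip time.
print(link_utilization(8192, 10_000_000, 0.1))
# Same window and link with a 5 ms round-trip time: the window covers the RTT.
print(link_utilization(8192, 10_000_000, 0.005))
```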
TCP: Retransmission and Timeouts • TCP uses an adaptive retransmission timeout value, because the RTT changes frequently: • Congestion • Changes in Routing [Figure: timing diagram between Host A and Host B with Data1, Data2, and their ACKs, labelled with the Round-trip time (RTT), the Estimated RTT, the Guard Band, and the Retransmission TimeOut (RTO).]
[Figure: probability density of the RTT for a small network and for a large network.]
Q: how to set TCP timeout value? • too short: premature timeout, unnecessary retransmissions • too long: slow reaction to segment loss • even worse: RTT fluctuates • Q: how to estimate RTT? • SampleRTT: measured time from segment transmission until ACK receipt • ignore retransmissions, cumulatively ACKed segments • SampleRTT will vary, want a “smoother” estimated RTT • use several recent measurements, not just current SampleRTT • Using the average of SampleRTT will generate many timeouts due to network variations • consider variance as well [Figure: frequency distribution of the RTT, with the TCP Timeout marked on the RTT axis.]
TCP: Retransmission and Timeouts • Picking the RTO is important: • Pick a value that’s too big and it will wait too long to retransmit a packet, • Pick a value too small, and it will unnecessarily retransmit packets. • The original algorithm for picking RTO: • EstimatedRTT = α * EstimatedRTT + (1 - α) * SampleRTT • RTO = 2 * EstimatedRTT • Characteristics of the original algorithm: • Variance is assumed to be fixed. • But in practice, variance increases as congestion increases.
TCP: Retransmission and Timeouts • Newer Algorithm includes estimate of variance in RTT: • Difference = SampleRTT - EstimatedRTT • EstimatedRTT = EstimatedRTT + (δ * Difference) • Deviation = Deviation + δ * ( |Difference| - Deviation ) • RTO = μ * EstimatedRTT + φ * Deviation • μ ≈ 1 • φ ≈ 4
TCP Timeout: Initial Timeout • Estimate the average of RTT: EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT • exponential weighted moving average • influence of given sample decreases exponentially fast • typical value of x: 0.125 • Estimate the variance of RTT: Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT| • Set initial timeout value: Timeout = EstimatedRTT + 4*Deviation
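The three formulas above translate almost directly into code. The sketch below makes two assumptions that the slide does not state: the estimator is seeded with the first sample and a deviation of zero, and the deviation update uses the already-updated EstimatedRTT (following the order in which the formulas are listed). The class name `RttEstimator` is hypothetical.

```python
class RttEstimator:
    """Exponentially weighted estimates of the RTT mean and deviation."""

    def __init__(self, first_sample: float, x: float = 0.125):
        self.x = x
        self.estimated_rtt = first_sample   # assumption: seed with the first sample
        self.deviation = 0.0                # assumption: start with zero deviation

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new timeout."""
        x = self.x
        self.estimated_rtt = (1 - x) * self.estimated_rtt + x * sample_rtt
        self.deviation = (1 - x) * self.deviation + x * abs(sample_rtt - self.estimated_rtt)
        return self.timeout()

    def timeout(self) -> float:
        return self.estimated_rtt + 4 * self.deviation


estimator = RttEstimator(first_sample=100.0)      # milliseconds, made-up samples
for sample in (100.0, 200.0, 100.0, 100.0):
    print(sample, round(estimator.update(sample), 3))
```

A single slow sample raises the timeout well above the average RTT because the deviation term is weighted by 4, and the timeout then decays gradually as normal samples return.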
An Example of Initial Timeout [Figure: plot showing the timeout value and the per-packet round-trip time.]
TCP: Retransmission and Timeouts: Karn’s Algorithm • Problem: How can we estimate RTT when packets are retransmitted? • Solution: On retransmission, don’t update estimated RTT (and double RTO). [Figure: two timing diagrams between Host A and Host B in which an ACK that arrives after a Retransmission is matched to the wrong transmission, giving a Wrong RTT Sample.]
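Karn's rule can be sketched as two event handlers: ACKs for retransmitted segments are ignored when estimating the RTT, and each timeout doubles the RTO. For brevity this sketch uses the simple rule RTO = 2 * EstimatedRTT with a smoothing weight of 0.875; that weight, the cap `max_rto`, and the class name `KarnTimer` are assumptions, not values from the slides.

```python
class KarnTimer:
    """Retransmission timer that skips ambiguous RTT samples and backs off on timeout."""

    def __init__(self, estimated_rtt: float, alpha: float = 0.875, max_rto: float = 64.0):
        self.alpha = alpha                  # weight of the old estimate (assumed value)
        self.max_rto = max_rto              # upper bound on the RTO (assumed value)
        self.estimated_rtt = estimated_rtt
        self.rto = 2 * estimated_rtt

    def on_ack(self, sample_rtt: float, retransmitted: bool) -> float:
        if retransmitted:
            # The ACK could belong to either transmission: do not update the estimate.
            return self.rto
        self.estimated_rtt = self.alpha * self.estimated_rtt + (1 - self.alpha) * sample_rtt
        self.rto = 2 * self.estimated_rtt
        return self.rto

    def on_timeout(self) -> float:
        # Exponential backoff: double the RTO for the retransmission.
        self.rto = min(2 * self.rto, self.max_rto)
        return self.rto


timer = KarnTimer(estimated_rtt=1.0)                  # seconds, made-up values
print(timer.on_timeout())                             # RTO doubles after a timeout
print(timer.on_ack(5.0, retransmitted=True))          # ambiguous sample is ignored
print(timer.on_ack(1.0, retransmitted=False))         # clean sample updates the estimate
```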
TL: TCP flow control enhancements • Solutions to silly window syndrome • Problem: sender sends in large blocks, but receiving application reads data 1 byte at a time • Clark (1982) • receiver avoidance • prevent receiver from advertising small windows • increase the advertised receiver window only in steps of at least min(MSS, RecvBuffer/2)
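Receiver-side avoidance can be sketched as a decision about when to announce newly freed buffer space. In the sketch below the receiver keeps advertising its current window until the free space exceeds it by at least min(MSS, RecvBuffer/2); the function and parameter names are assumptions for illustration.

```python
def next_advertised_window(free_buffer: int, current_window: int,
                           mss: int, recv_buffer: int) -> int:
    """Return the window to advertise, suppressing small increases."""
    step = min(mss, recv_buffer // 2)
    if free_buffer - current_window >= step:
        return free_buffer        # the increase is large enough to be worth announcing
    return current_window         # keep the old value rather than open a tiny window


# Hypothetical receiver: 8192-byte buffer, MSS 1460, window currently closed.
print(next_advertised_window(free_buffer=1, current_window=0, mss=1460, recv_buffer=8192))
print(next_advertised_window(free_buffer=1460, current_window=0, mss=1460, recv_buffer=8192))
```

With this rule, an application that reads one byte at a time does not cause the receiver to advertise a one-byte window after every read.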
TL: TCP flow control enhancements • Nagle’s algorithm (1984) • sender avoidance • prevent sender from unnecessarily sending small packets • http://www.rfc-editor.org/rfc/rfc896.txt • “Inhibit the sending of new TCP segments when new outgoing data arrives from the user if any previously transmitted data on the connection remains unacknowledged” • Allow only one outstanding small (not full sized) segment that has not yet been acknowledged • Works for idle connections (no deadlock) • Works for telnet (send one-byte packets immediately) • Works for bulk data transfer (delay sending)
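The rule quoted above reduces to a short decision function on the sender side. This sketch ignores the receiver's window and the congestion window, which a real sender would also check; the function name `nagle_can_send` and its parameters are assumptions.

```python
def nagle_can_send(pending_bytes: int, mss: int, unacked_bytes: int) -> bool:
    """Decide whether new outgoing data may be sent now under Nagle's algorithm."""
    if pending_bytes <= 0:
        return False              # nothing to send
    if pending_bytes >= mss:
        return True               # a full-sized segment can always go out
    # A small segment is allowed only when no earlier data is still unacknowledged.
    return unacked_bytes == 0


print(nagle_can_send(pending_bytes=1, mss=1460, unacked_bytes=0))        # idle connection
print(nagle_can_send(pending_bytes=1, mss=1460, unacked_bytes=1))        # keystroke in flight
print(nagle_can_send(pending_bytes=1460, mss=1460, unacked_bytes=5000))  # bulk transfer
```

While a small segment is in flight, further keystrokes accumulate in the send buffer and go out together once the ACK arrives.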
TCP MSS • Earlier • 536 bytes for non-local destinations (other network), derived from the 576-byte default IP datagram size • 1460 bytes for local destinations (same network) • Now • 1460 bytes and DF bit in IP header set • ICMP message “fragmentation required, but not permitted” triggers reduction of MSS • Workaround now • Reset DF bit to “0”
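The MSS values above follow from the link MTU minus the IP and TCP headers. The sketch below assumes IPv4 and TCP headers of 20 bytes each with no options, and it models the ICMP-triggered reduction simply as recomputing the MSS from the reported next-hop MTU; the function names and the 1400-byte MTU are hypothetical.

```python
IP_HEADER_BYTES = 20    # IPv4 header without options (assumption)
TCP_HEADER_BYTES = 20   # TCP header without options (assumption)


def mss_for_mtu(mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented IP datagram of size mtu."""
    if mtu <= IP_HEADER_BYTES + TCP_HEADER_BYTES:
        raise ValueError("MTU too small to carry any TCP data")
    return mtu - IP_HEADER_BYTES - TCP_HEADER_BYTES


def mss_after_icmp(current_mss: int, next_hop_mtu: int) -> int:
    """Reduce the MSS after a 'fragmentation required' message reporting next_hop_mtu."""
    return min(current_mss, mss_for_mtu(next_hop_mtu))


print(mss_for_mtu(1500))            # Ethernet MTU
print(mss_for_mtu(576))             # default IP datagram size
print(mss_after_icmp(1460, 1400))   # hypothetical tunnel with a 1400-byte MTU
```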