TCP [RFCs: 793, 1122, 1323, 2018, 2581]

fkristen

Presentation Transcript


  1. TCP [RFCs: 793, 1122, 1323, 2018, 2581]
     • point-to-point: one sender, one receiver
     • reliable, in-order byte stream: no “message boundaries”
     • pipelined: TCP congestion and flow control set window size
     • send & receive buffers
     • full duplex data: bi-directional data flow in same connection; MSS: maximum segment size
     • connection-oriented: handshaking (exchange of control msgs) initializes sender and receiver state before data exchange
     • flow controlled: sender will not overwhelm receiver
     ICSS420 - UDP/TCP

  2. TCP segment structure (header rows are 32 bits wide)
     • source port #, dest port #
     • sequence number, acknowledgement number: counting by bytes of data (not segments!)
     • head len, not used, flag bits U A P R S F, rcvr window size (# bytes rcvr willing to accept)
     • checksum: Internet checksum (as in UDP); ptr urgent data
     • Options (variable length)
     • application data (variable length)
     Flag bits:
     • URG: urgent data (generally not used)
     • ACK: ACK # valid
     • PSH: push data now (generally not used)
     • RST, SYN, FIN: connection estab (setup, teardown commands)
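
The field layout above can be sketched as a small parser for the 20-byte fixed part of the header. This is a minimal illustration, not a full implementation: the function name `parse_tcp_header` and the sample values are assumptions, and options are not decoded.

```python
import struct

def parse_tcp_header(segment: bytes) -> dict:
    """Decode the 20-byte fixed part of a TCP header (options are ignored)."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4   # head len is counted in 32-bit words
    flag_bits = offset_flags & 0x3F         # low six bits: U A P R S F
    names = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]  # bit 0 up to bit 5
    flags = [name for i, name in enumerate(names) if flag_bits & (1 << i)]
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,
        "header_len": header_len, "flags": flags,
        "window": window, "checksum": checksum, "urg_ptr": urg_ptr,
    }

# Hand-built SYN segment with illustrative values (checksum left at 0).
example = struct.pack("!HHIIHHHH", 1273, 23, 42, 0, (5 << 12) | 0x02, 4096, 0, 0)
hdr = parse_tcp_header(example)
print(hdr["flags"], hdr["header_len"], hdr["window"])
```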

  3. TCP Header Flags

  4. Reliability
     • TCP provides reliability by doing the following:
     • Data is broken down into chunks (the unit of information passed by TCP is called a segment).
     • TCP maintains a timer for each segment, waiting for the other end to acknowledge reception of the segment. If an acknowledgement is not received in time, the segment is retransmitted.
     • When TCP receives data from the other end of the connection, it sends an acknowledgement.
     • TCP maintains a checksum on its header and data.
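
The checksum mentioned in the last bullet is the 16-bit one's-complement Internet checksum. The sketch below computes it over an arbitrary byte string; the function name is an assumption, and a real TCP implementation would also include a pseudo-header with the IP addresses, which is omitted here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of 16-bit words, then complemented."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

payload = bytes.fromhex("0001f203f4f5f6f7")
csum = internet_checksum(payload)
# A receiver that sums the data together with the checksum should get zero.
check = internet_checksum(payload + csum.to_bytes(2, "big"))
```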

  5. Sequence Numbers
     • Because TCP sends data in variable length segments, and because retransmitted segments can include more data than the original, ACKs cannot easily refer to datagrams or segments.
     • TCP sequence numbers refer to a position in the data stream. The receiver ACKs the longest contiguous prefix of a stream that has been received correctly. ACKs specify the sequence number of the next octet that the receiver expects.
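
The "longest contiguous prefix" rule can be sketched as a function that takes the segments received so far and returns the ACK number. The function name and the (seq, length) tuple representation are assumptions made for illustration; sequence-number wraparound is ignored.

```python
def cumulative_ack(next_expected: int, segments: list[tuple[int, int]]) -> int:
    """Return the next octet expected, given (seq, length) pairs received in any order."""
    ack = next_expected
    for seq, length in sorted(segments):
        if seq > ack:                  # gap: everything beyond it is not yet ACKable
            break
        ack = max(ack, seq + length)   # extend the contiguous prefix
    return ack

in_order = cumulative_ack(92, [(92, 8), (100, 20)])   # both segments are contiguous
with_gap = cumulative_ack(92, [(120, 10), (92, 8)])   # bytes 100..119 are missing
```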

  6. TCP seq. #’s and ACKs
     • Seq. #’s: byte stream “number” of first byte in segment’s data
     • ACKs: seq # of next byte expected from other side; cumulative ACK
     Simple telnet scenario (time runs downward):
     • Host A -> Host B: user types ‘C’; Seq=42, ACK=79, data = ‘C’
     • Host B -> Host A: host ACKs receipt of ‘C’, echoes back ‘C’; Seq=79, ACK=43, data = ‘C’
     • Host A -> Host B: host ACKs receipt of echoed ‘C’; Seq=43, ACK=80

  7. TCP: reliable data transfer (simplified sender, assuming one way data transfer and no flow, congestion control)
     • wait for event
     • event: data received from application above: create, send segment
     • event: timer timeout for segment with seq # y: retransmit segment
     • event: ACK received, with ACK # y: ACK processing

  8. TCP Data Flow
     • Studies of TCP traffic usually find that on a packet-count basis about half of all segments contain bulk data and the other half contain interactive data.
     • On a byte-count basis the ratio is around 90% bulk and 10% interactive.
     • TCP handles both types of traffic, but different algorithms come into play for each.

  9. Rlogin
     • Each interactive keystroke normally generates a data packet.
     • Furthermore, the remote system echoes the characters typed by the client.
     • Segments per keystroke: data byte, ACK of data byte, echo of data byte, ACK of echoed byte

  10. The Problem
     • When using rlogin, 1 byte at a time normally flows from the client to the server.
     • This generates 41-byte packets: 20 bytes for the IP header, 20 bytes for the TCP header, and 1 byte of data.
     • These small packets, called tinygrams, are normally not a problem on LANs, since LANs are not congested.

  11. Delayed ACKs
     • Normally TCP does not send an ACK the instant it receives data.
     • Instead it delays sending the ACK, hoping to have some data going in the same direction as the ACK, so the ACK can be sent along with the data.
     • Most implementations use a delay of around 200 ms; RFC 1122 requires the delay to be less than 500 ms.
     • This at least piggybacks the ACK with data, but it does not solve the problem of small segments.

  12. Nagle Algorithm
     • A simple and elegant solution, called the Nagle Algorithm, was proposed in RFC 896.
     • This algorithm says that a TCP connection can have only one outstanding small segment that has not yet been acknowledged.
     • No additional small segments can be sent until the acknowledgement is received.
     • In the meantime TCP collects the small amounts of data and sends them in a single segment when the ACK arrives.

  13. Nagle Algorithm
     • This algorithm is self-clocking: the faster the ACKs come back, the faster the data is sent. On a slow WAN fewer segments are sent.
     • The definition of small is less than the maximum segment size.
     • On an Ethernet, the round-trip time for a single byte to be sent, ACKed, and echoed is around 16 ms, so we would need to type faster than about 60 characters per second (1 / 0.016 s ≈ 62) before this algorithm comes into play.
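
The rule "at most one unacknowledged small segment" can be sketched as a toy sender. This is a simplified model, not a real TCP stack: the class name `NagleSender` and its methods are hypothetical, and `on_ack` assumes the ACK covers the outstanding small segment.

```python
class NagleSender:
    """Toy model of the Nagle algorithm: one outstanding small segment at a time."""

    def __init__(self, mss: int):
        self.mss = mss
        self.buffer = b""            # data written by the application, not yet sent
        self.unacked_small = False   # True while a small segment awaits its ACK
        self.sent = []               # segments handed to the network, in order

    def write(self, data: bytes) -> None:
        self.buffer += data
        self._try_send()

    def on_ack(self) -> None:
        self.unacked_small = False
        self._try_send()

    def _try_send(self) -> None:
        # Full-sized segments are not "small" and are always sent.
        while len(self.buffer) >= self.mss:
            self.sent.append(self.buffer[:self.mss])
            self.buffer = self.buffer[self.mss:]
        # A small segment goes out only if none is outstanding.
        if self.buffer and not self.unacked_small:
            self.sent.append(self.buffer)
            self.buffer = b""
            self.unacked_small = True

sender = NagleSender(mss=536)
for ch in b"hello":
    sender.write(bytes([ch]))        # five one-byte writes, as with typed characters
before_ack = list(sender.sent)
sender.on_ack()
after_ack = list(sender.sent)
```

In this run the first keystroke is sent alone, and the four that arrive before its ACK are coalesced into one segment, which is the self-clocking behavior described above.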

  14. Disabling Nagle
     • There are times when the Nagle algorithm needs to be turned off:
     • X window mouse movement messages must be delivered without delay.
     • Terminals that send function keys as a sequence of characters.
     • Most socket-based APIs use the TCP_NODELAY socket option to disable the Nagle algorithm.
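
As a brief illustration of the last bullet, Python's standard `socket` module exposes the option as `socket.TCP_NODELAY`. The sketch sets it on an unconnected socket and reads it back; in practice it would be set on the socket actually used for the interactive connection.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# A non-zero value disables the Nagle algorithm for this socket.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
```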

  15. ACK generation [RFC 1122, RFC 2581]
     • Event: in-order segment arrival, no gaps, everything else already ACKed. TCP Receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK.
     • Event: in-order segment arrival, no gaps, one delayed ACK pending. TCP Receiver action: immediately send single cumulative ACK.
     • Event: out-of-order segment arrival, higher-than-expected seq. #, gap detected. TCP Receiver action: send duplicate ACK, indicating seq. # of next expected byte.
     • Event: arrival of segment that partially or completely fills gap. TCP Receiver action: immediate ACK if segment starts at lower end of gap.

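  16. TCP: retransmission scenarios (time runs downward)
     • Lost ACK scenario: Host A sends Seq=92, 8 bytes data; Host B replies ACK=100, which is lost (X); the Seq=92 timeout expires; Host A retransmits Seq=92, 8 bytes data; Host B replies ACK=100.
     • Premature timeout, cumulative ACKs: Host A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; Host B replies ACK=100 and ACK=120; the Seq=92 timeout expires before the ACKs arrive, so Host A retransmits Seq=92, 8 bytes data; Host B replies with the cumulative ACK=120.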

  17. TCP Round Trip Time and Timeout
     • Q: how to set TCP timeout value?
     • longer than RTT (note: RTT will vary)
     • too short: premature timeout, unnecessary retransmissions
     • too long: slow reaction to segment loss
     • Q: how to estimate RTT?
     • SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions, cumulatively ACKed segments
     • SampleRTT will vary, want estimated RTT “smoother”: use several recent measurements, not just current SampleRTT

  18. Simple Timeout and Retransmission
     • When a timeout occurs, should the timer value be modified?
     • Typically the timeout value is doubled for each retransmission.
     • This doubling is referred to as exponential backoff.
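
The doubling rule can be written out as a short schedule generator. The function name and the 64-second ceiling are assumptions for illustration; the slide itself only states that the value is doubled.

```python
def backoff_schedule(initial_rto: float, retries: int, max_rto: float = 64.0) -> list[float]:
    """Timeout used for each successive retransmission, doubling up to max_rto (assumed cap)."""
    return [min(initial_rto * 2 ** n, max_rto) for n in range(retries)]

schedule = backoff_schedule(1.0, 5)
capped = backoff_schedule(1.5, 8)
```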

  19. RTT Measurement
     • Fundamental to TCP’s timeout and retransmission is the measurement of the round-trip time (RTT) experienced on a given connection.
     • The RTT can change over time, so TCP should track these changes and modify its timeout accordingly.
     • TCP must measure the RTT between sending a byte with a particular sequence number and receiving an ACK that covers that sequence number.

  20. Smoothed RTT Estimator
     • The original TCP specification had TCP update a smoothed RTT estimator R from each new RTT measurement M:
     • $R \leftarrow \alpha R + (1 - \alpha) M$
     • where $\alpha$ is a smoothing factor with a recommended value of 0.9.
     • Given the smoothed estimator, which changes as the RTT changes, RFC 793 recommends setting the retransmission timeout to
     • $RTO = R\beta$ (where $\beta$ is a delay variance factor with a recommended value of 2).
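
A minimal sketch of the two formulas above, using the recommended values of 0.9 for the smoothing factor and 2 for the delay variance factor. The function names and the millisecond sample values are assumptions.

```python
ALPHA = 0.9   # smoothing factor
BETA = 2.0    # delay variance factor

def update_smoothed_rtt(r: float, m: float, alpha: float = ALPHA) -> float:
    """R <- alpha * R + (1 - alpha) * M"""
    return alpha * r + (1 - alpha) * m

def retransmission_timeout(r: float, beta: float = BETA) -> float:
    """RTO = R * beta"""
    return r * beta

r = 100.0                              # current smoothed RTT in ms (illustrative)
r = update_smoothed_rtt(r, 200.0)      # one slow sample moves R only 10% of the way
rto = retransmission_timeout(r)
```

Because each sample contributes only 10% of its value, a sudden jump in RTT reaches the estimator slowly, which is the weakness the next slide describes.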

  21. Problems
     • There are problems with the smoothed estimator: basically it can’t keep up with wide fluctuations in the RTT, causing unnecessary retransmissions.
     • In addition to the smoothed RTT we also need to keep track of the variance in the RTT measurements.
     • The mean deviation is a good approximation to the standard deviation and is easier to compute.
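
The slide does not give the formulas for the mean-deviation estimator. The sketch below uses the commonly cited form from Jacobson's work (gains of 1/8 and 1/4, RTO equal to the average plus four times the mean deviation); the class name and the initialization of the deviation to half the first sample are assumptions.

```python
class RttEstimator:
    """Smoothed RTT plus smoothed mean deviation (assumed Jacobson-style constants)."""

    def __init__(self, first_sample: float, g: float = 0.125, h: float = 0.25):
        self.g = g                    # gain for the average
        self.h = h                    # gain for the mean deviation
        self.a = first_sample         # smoothed RTT
        self.d = first_sample / 2     # smoothed mean deviation (assumed initial value)

    def update(self, m: float) -> float:
        err = m - self.a
        self.a += self.g * err
        self.d += self.h * (abs(err) - self.d)
        return self.rto()

    def rto(self) -> float:
        return self.a + 4 * self.d

est = RttEstimator(100.0)
steady = est.update(100.0)   # a sample equal to the average shrinks the deviation
spike = est.update(200.0)    # a large sample grows both the average and the deviation
```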

  22. Karn’s Algorithm
     • Say a packet is transmitted, a timeout occurs, the RTO is backed off, and the packet is retransmitted.
     • When the ACK comes in, is it for the first transmission or for the retransmission?
     • When a timeout and retransmission occur, the RTT estimators cannot be updated because we don’t know which transmission the ACK corresponds to.
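
The rule can be sketched as a timer object that doubles its RTO on timeout and refuses RTT samples for retransmitted segments. The class and method names are hypothetical, and the estimator update itself is left out.

```python
class KarnTimer:
    """Sketch of Karn's rule: never sample the RTT of a retransmitted segment."""

    def __init__(self, rto: float):
        self.rto = rto
        self.samples = []            # RTT samples accepted for the estimator

    def on_timeout(self) -> None:
        self.rto *= 2                # exponential backoff

    def on_ack(self, sample_rtt: float, was_retransmitted: bool) -> None:
        if was_retransmitted:
            return                   # ambiguous ACK: keep the backed-off RTO, record nothing
        self.samples.append(sample_rtt)

timer = KarnTimer(rto=1.0)
timer.on_timeout()                                  # segment retransmitted, RTO doubled
timer.on_ack(0.3, was_retransmitted=True)           # ignored
timer.on_ack(0.25, was_retransmitted=False)         # accepted
```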

  23. TCP Flow Control
     • TCP uses a sliding window protocol to handle ACKs and flow control.
     Window diagram over bytes 1 2 3 4 5 6 7 8 9 10 11 ..., from left to right:
     • sent and ACK’d
     • offered window (advertised by receiver): sent, not ACK’d, followed by the usable window (can send)
     • can’t send until the window moves
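
The usable window is the part of the offered window not already occupied by unacknowledged data. A minimal sketch, with hypothetical parameter names and no handling of sequence-number wraparound:

```python
def usable_window(last_ack: int, next_seq: int, offered_window: int) -> int:
    """Bytes the sender may still transmit without waiting for the window to move."""
    in_flight = next_seq - last_ack          # sent, not ACK'd
    return max(0, offered_window - in_flight)

# Offered window of 6 bytes starting at byte 4, with bytes 4..6 already sent.
can_send = usable_window(last_ack=4, next_seq=7, offered_window=6)
# Window fully consumed: nothing more can be sent until an ACK arrives.
blocked = usable_window(last_ack=1, next_seq=4097, offered_window=4096)
```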

  24. Movement of Window Edges
     • The window closes as the left edge advances to the right.
     • The window opens when the right edge moves to the right, allowing more data to be sent.
     • The window shrinks when the right edge moves to the left.

  25. Dynamics of Sliding Windows (exchange between Host A and Host B, in time order)
     • receiver: win 4096
     • sender: 1:1025(1024) ack 1
     • sender: 1025:2049(1024) ack 1
     • sender: 2049:3073(1024) ack 1
     • receiver: Ack 3073, win 1024
     • sender: 3073:4097(1024) ack 1
     • receiver: Ack 4097, win 4096

  26. Window Dynamics
     • The sender does not have to transmit a full window’s worth of data.
     • One segment from the receiver acknowledges data and slides the window to the right (window size is relative to the acknowledged sequence number).
     • The size of the window can decrease, but the right edge should not move to the left.
     • The receiver does not have to wait for a window to fill before sending an acknowledgement.

  27. TCP Connection Management
     • Recall: TCP sender, receiver establish “connection” before exchanging data segments
     • initialize TCP variables: seq. #s; buffers, flow control info (e.g. RcvWindow)
     • client: connection initiator: Socket clientSocket = new Socket("hostname", portNumber);
     • server: contacted by client: Socket connectionSocket = welcomeSocket.accept();
     Three way handshake:
     • Step 1: client end system sends TCP SYN control segment to server; specifies initial seq #
     • Step 2: server end system receives SYN, replies with SYNACK control segment; ACKs received SYN; allocates buffers; specifies server-to-receiver initial seq. #

  28. TCP Open
     • Client (Active open) -> Server (Passive open): SYN X
     • Server -> Client: SYN Y, ACK X+1
     • Client -> Server: ACK Y+1

  29. Closing A Connection
     • TCP uses a modified three-way handshake to close connections.
     • Each direction must be shut down independently. The rule is that either end can send a FIN when it is done sending data.
     • Once a connection has been closed in a given direction, TCP refuses to accept more data for that direction. Data can flow in the opposite direction until the sender closes it.

  30. TCP Close
     • Client -> Server: FIN X (client: Close Connection)
     • Server: Inform App (eof)
     • Server -> Client: ACK X+1 (can be piggybacked)
     • Server -> Client: FIN Y, ACK X+1 (server: Close Connection)
     • Client -> Server: ACK Y+1

  31. TCP Close
     • The end that first issues the close performs the active close and the other end performs the passive close.
     • Sometimes abnormal conditions arise that force an application to break a connection.
     • To reset a connection, one side initiates termination by sending a segment with the RST bit set. The other side responds to a reset segment by immediately aborting the connection.

  32. TCP Half Close
     • A half-close provides the ability for one end of the connection to terminate its output, while still receiving data from the other end.
     • Why is there a half-close? One example is the Unix rsh command, which executes a command on another system. Consider the following command:
     • kiev> rsh cobalt sort < datafile
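
The rsh/sort example can be mimicked in a few lines. This sketch uses `socket.socketpair()` so it runs in one process; on Unix that yields a local stream socket rather than a TCP connection, which is an assumption made to keep the example self-contained. The call that matters is `shutdown(socket.SHUT_WR)`, which on a TCP socket sends the FIN for one direction only.

```python
import socket

client, server = socket.socketpair()

# Client sends its input, then half-closes: no more output, but it can still read.
client.sendall(b"3\n1\n2\n")
client.shutdown(socket.SHUT_WR)

# Server reads until end-of-file; a sort cannot produce output before seeing all input.
chunks = []
while True:
    data = server.recv(1024)
    if not data:
        break
    chunks.append(data)
server.sendall(b"\n".join(sorted(b"".join(chunks).split())))
server.close()

# The client's receiving direction is still open, so the sorted result arrives.
result = client.recv(1024)
client.close()
```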

  33. TCP Half-Close

  34. Half-Open Connections
     • A TCP connection is said to be half-open if one end has closed or aborted the connection without the knowledge of the other end.
     • This can happen if one of the two hosts crashes (or is simply turned off).
     • As long as there is no attempt to transfer data across a half-open connection, the end that is still up won't detect that the other end has crashed.

  35. 2MSL Wait State
     • Every implementation must choose a value for the maximum segment lifetime (MSL). It is the maximum amount of time any segment can exist in the network before being discarded.
     • RFC 793 specifies the MSL as 2 minutes. Common implementation values, however, are 30 seconds, 1 minute, or 2 minutes.

  36. 2MSL Wait State
     • Given an MSL for an implementation, the rule is: when TCP performs an active close and sends the final ACK, that connection must stay in the TIME_WAIT state for twice the MSL.
     • Any delayed packets will be discarded before the same connection can be re-established.
     • An effect of this 2MSL wait is that while the TCP connection is in the 2MSL wait, the socket pair defining that connection cannot be reused.

  37. 2MSL Wait State
     • The client, which performs the active close, enters the 2MSL wait. This means that if we terminate a client and restart it immediately, the new client cannot reuse the same local port number.
     • Servers use well-known ports. If we terminate a server that has a connection established, and immediately try to restart the server, the server cannot assign its well-known port number to its end point.

  38. Quiet Time
     • The 2MSL state provides protection against delayed segments being interpreted as part of a new connection that uses the same local and foreign IP addresses and port numbers.
     • This works only if a host with connections in the 2MSL state does not crash.
     • RFC 793 states that TCP should not create any connections for MSL seconds after rebooting. This is called quiet time.

  39. TCP State Machine (diagram legend: client transitions, server transitions)

  40. TCP Connection Management (diagrams: TCP server lifecycle, TCP client lifecycle)

  41. TCP In Action
     • System with no active Telnet sessions.
     • The local address is output as *.23. This means that incoming requests will be accepted on any interface.
     • The remote address is output as *.*, which means the remote address and port are not known yet.

       Local Address        Remote Address       Swind Send-Q Rwind Recv-Q State
       -------------------- -------------------- ----- ------ ----- ------ -------
       *.23                 *.*                      0      0     0      0 LISTEN

  42. TCP In Action
     • Now a telnet session starts.
     • The second line is the ESTABLISHED connection. All four elements of the local and remote address are filled in for this connection. The local address corresponds to the interface on which the connection request arrived.

       Local Address        Remote Address       Swind Send-Q Rwind Recv-Q State
       -------------------- -------------------- ----- ------ ----- ------ -------
       *.23                 *.*                      0      0     0      0 LISTEN
       131.173.161.14.23    131.173.161.111.1273  4096      0  8760      0 ESTABLISHED

  43. TCP In Action
     • Now the telnet client is terminated:

       Local Address        Remote Address       Swind Send-Q Rwind Recv-Q State
       -------------------- -------------------- ----- ------ ----- ------ -------
       *.23                 *.*                      0      0     0      0 LISTEN
       131.173.161.14.23    131.173.161.111.1273  4096      0  8760      0 TIME_WAIT

     • Here is an example of a port that appears to be half-open:

       Local Address        Remote Address       Swind Send-Q Rwind Recv-Q State
       -------------------- -------------------- ----- ------ ----- ------ -------
       131.173.161.14.58567 131.173.160.11.80     8760      0  8760      0 CLOSE_WAIT

  44. TCP DOS Attacks
     • These denial-of-service attacks try to stall the TCP state machine in a particular state, either indefinitely or for a finite time.
     • Look for states for which there are no timers.
     • Consider a scenario where a host receives a TCP segment with both the SYN and the FIN bit set.
     • Most implementations transition to the CLOSE_WAIT state, which has no timer.

  45. SYN Attacks
     • SYN attacks (also known as SYN Flooding) take advantage of a flaw in how most hosts implement the three-way handshake.
     • When a host receives a SYN request, it must keep track of the partially opened connection in a "listen queue" for at least 75 seconds.
     • Many implementations can only keep track of a very limited number of connections (most track only 5 connections by default).
     • A malicious host can exploit the small size of the listen queue by sending multiple SYN requests to a host, but never replying to the SYN&ACK the other host sends back.

  46. IP Spoofing
     • IP Spoofing is an attack where an attacker pretends to be sending data from an IP address other than its own.
     • Many protocols and applications assume that the IP address contained in the IP header is valid.
     • There are two catches:
     • All communication is likely to be one-way. The remote host will send all replies to the spoofed source address, not to the host actually doing the spoofing.
     • An attacker needs to use the correct TCP sequence numbers if they plan on establishing a TCP connection with the attacked host (most common services, like Telnet, FTP, and r-commands, use TCP).

  47. Sequence Guessing
     • The sequence number used in TCP connections is a 32-bit number, so it would seem that the odds of guessing the correct initial sequence number (ISN) are exceedingly low.
     • If the ISN for a connection is assigned in a predictable way, it becomes relatively easy to guess.
     • In BSD 4.2, the ISN for a connection is assigned from a global counter. This counter is incremented by 128 each second, and by 64 after each new connection.
     • By first establishing a real connection to the victim, the attacker can determine the current state of the system's counter. The attacker then knows that the next ISN to be assigned by the victim is quite likely to be the ISN just observed, plus 64.
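
The arithmetic behind the last two bullets can be written as a small model of the counter. It only restates the increments given above (128 per second, 64 per connection); the function and parameter names are hypothetical.

```python
def predict_next_isn(observed_isn: int, seconds_elapsed: int = 0,
                     other_connections: int = 0) -> int:
    """Model of a global ISN counter: +128 per second, +64 after each connection."""
    counter = observed_isn + 64            # the observed connection itself bumps the counter
    counter += 64 * other_connections      # connections opened by others in the meantime
    counter += 128 * seconds_elapsed       # time-driven increments
    return counter % 2 ** 32               # the ISN is a 32-bit value

immediate = predict_next_isn(1000)
later = predict_next_isn(1000, seconds_elapsed=2, other_connections=1)
```

The small, fixed increments are what make the counter predictable; an ISN drawn unpredictably from the 32-bit space would not allow this calculation.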

  48. How to Use It
     • When the host receiving spoofed packets completes its part of the three-way handshake, it will send a SYN&ACK to the spoofed host.
     • This host will reject the SYN&ACK, because it never started a connection. The host indicates this by sending a reset command (RST), and the attacker's connection will be aborted.
     • To avoid this, the attacker can use a SYN attack to swamp the host it is imitating. The SYN&ACK sent by the attacked host will then be ignored, along with any other packets sent while the host is flooded.

  49. Connection Hijacking
     • Connection hijacking exploits a "desynchronized state" in TCP communication.
     • When the sequence number in a received packet is not the same as the expected sequence number, the connection is said to be “desynchronized”.
     • Depending on the actual value of the received sequence number, the TCP layer may either discard or buffer the packet.
     • When two hosts are desynchronized enough, they will discard (ignore) packets from each other. An attacker can then inject forged packets with the correct sequence numbers (and potentially modify or add commands to the communication).

  50. A Signature
     • Note that "ignored" packets may actually generate ACKs, rather than being completely ignored.
     • When the other end receives packets with incorrect sequence numbers, it replies with an ACK packet containing the sequence number it is expecting.
     • The receiver of these ACKs discards them, as they have the wrong sequence numbers.
     • The receiver then sends its own ACK to notify the sender.
     • Thus, a large number of ACKs are generated in this attack.
