This chapter explores the Reliable Stream Transport Service provided by the TCP protocol, discussing its properties, methods for ensuring reliability, and its role in the TCP/IP protocol suite. It covers features like virtual circuit connections, buffered transfer, full-duplex communication, sliding windows, and TCP implementation details.
Reliable Stream Transport Service (TCP) Chapter 12
We’ve looked at • Unreliable connectionless packet delivery service • And the IP protocol that defines it • Now we will examine • Reliable stream delivery • And the Transmission Control Protocol that defines it • TCP is presented as a part of TCP/IP • But it is an independent, general-purpose protocol • Can be adapted for use with other delivery systems
Need for Stream Delivery • At low levels, have unreliable packets • Lost, destroyed, discarded, duplicated, delayed • Size constraints affect efficient transfer • Applications need to send lots of data • Unreliability is tedious and annoying • Programmers must worry about errors • Goal of network protocol research • General purpose reliable stream delivery method
Properties of the Service • Interface between applications and TCP/IP has five characteristic features: • Stream Orientation • Sender provides stream of bits divided into bytes • Receiver is passed exact same sequence • Virtual Circuit Connection • Service provides illusion of dedicated circuit • “Call” setup from one application to the other • Two OSs talk and settle details • Continue to communicate during transfer • If error, detect and report to applications
Buffered Transfer • Application sends the stream in pieces of whatever size it wants • May be as small as a single octet • Protocol software wants efficient transfer • Small blocks of data: buffer until there is enough to fill a datagram • Large blocks of data: break into smaller pieces • Push mechanism • When transfer needs to happen before buffer is full • Application invokes a push • Data generated until then is sent immediately • At receiving end, is delivered without delay • Protocol software may still divide the stream in unexpected ways
Unstructured Stream • Applications cannot mark record boundaries • Must agree that stream service will be unstructured • Full Duplex Connection • Connections allow concurrent transfer both ways • Appears as two independent streams in opposite directions • Can terminate one direction without affecting other • Control information can be piggybacked on data
Providing Reliability • Want reliable transfer out of unreliable packet delivery system • Most reliable protocols use a single technique • Positive acknowledgement with retransmission • Recipient must send ACK message as it gets data • Sender keeps record of each packet sent and starts a timer for it • If timer expires before the ACK arrives, retransmits packet
Can also have duplicate packets • Network delays may cause premature retransmission • Both packets and ACKs can be duplicated • Usually solve by assigning sequence numbers • Receiver must remember which sequence numbers received • ACKs include the sequence numbers as well
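The scheme above can be sketched as a tiny deterministic simulation. This is a minimal sketch, not TCP itself: the function name `stop_and_wait` and the `lost_data` / `lost_acks` loss patterns are hypothetical, and a timeout is modeled simply as "no ACK came back for this attempt".

```python
def stop_and_wait(packets, lost_data=frozenset(), lost_acks=frozenset()):
    """Positive acknowledgement with retransmission, one packet at a time.

    lost_data / lost_acks are sets of (seq, attempt) pairs naming which
    transmissions or ACKs the network drops (an assumed, caller-chosen
    loss pattern). Returns (delivered payloads, total transmissions).
    """
    delivered = []        # what the receiver hands to the application
    seen = set()          # receiver remembers sequence numbers received
    transmissions = 0
    for seq, payload in enumerate(packets):
        attempt = 0
        acked = False
        while not acked:
            attempt += 1
            transmissions += 1
            if (seq, attempt) in lost_data:
                continue              # data lost: timer expires, retransmit
            if seq not in seen:       # first copy of this sequence number
                seen.add(seq)
                delivered.append(payload)
            # A duplicate copy is discarded but still acknowledged.
            acked = (seq, attempt) not in lost_acks
    return delivered, transmissions


if __name__ == "__main__":
    print(stop_and_wait(["a", "b", "c"], lost_data={(1, 1)}, lost_acks={(2, 1)}))
```

In the usage line, packet 1 is lost once and the ACK for packet 2 is lost once, so the sender transmits five times, yet the receiver delivers each payload exactly once because it tracks sequence numbers.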
Sliding Windows • Sending one packet and waiting for ACK wastes time • Full duplex circuit; have lots of idle time • Sliding window technique used • More complex form of positive ack & retrans • Use bandwidth more efficiently • Sender transmits multiple packets before ACK
Number of unacknowledged packets limited by window size • Performance depends upon window size • Size of 1: same as simple positive ack protocol • Increase size with goal of sending packets as fast as the network can handle • Conceptually, separate timer for each packet • Only unack’ed packets are retransmitted • Receiver has a similar window
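To see why performance depends on window size, here is a rough timing model. It assumes no loss, one packet transmitted per time slot, and an ACK that returns a fixed `rtt` slots after each send; the function name and these parameters are illustrative assumptions, not part of any standard.

```python
def completion_time(num_packets, window, rtt):
    """Time slot at which the last ACK arrives in a loss-free transfer.

    Assumptions (for illustration only): the sender can transmit one
    packet per slot, and every ACK returns exactly `rtt` slots after
    its packet was sent.
    """
    send = []                                   # send[i] = slot packet i leaves
    for i in range(num_packets):
        earliest = send[i - 1] + 1 if i > 0 else 0
        if i >= window:
            # Window is full until packet (i - window) has been ACKed.
            earliest = max(earliest, send[i - window] + rtt)
        send.append(earliest)
    return send[-1] + rtt


if __name__ == "__main__":
    for w in (1, 4, 8):
        print("window", w, "->", completion_time(8, w, 10), "slots")
```

With a window of 1 the sender idles for a full round trip after every packet, which is the simple positive acknowledgement protocol; larger windows fill that idle time.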
TCP • Is a communication protocol • NOT a piece of software • TCP is the standard • Various TCP software implements the standard • Standard includes: • Format of data and acknowledgments • Procedures for reliability • Distinguish multiple destinations on a machine • Error recovery procedures • Initiation and closing a TCP stream transfer
Standard does not include: • Details of application/TCP interface • Does not specify exact procedures to invoke for operations • Left unspecified for flexibility • TCP usually implemented in OS • Can use whatever interface given OS provides • Single specification for variety of machines • TCP assumes little about underlying system • Can be used with variety of packet delivery systems (including IP) • Dialup lines; LAN; high speed fiber; low speed WAN
Ports, Connections, & Endpoints • TCP resides above IP in the layering scheme
Multiple applications can communicate concurrently • Multiplexes and demultiplexes incoming msgs • Uses port numbers (like UDP discussion) • TCP ports more complex • Using the connection abstraction • Objects are virtual circuits, not ports • Connections identified by a pair of endpoints • Endpoint is pair of integers: (host, port) • host is IP address for a host • port is TCP port on that host
Pair of endpoints defines connection (128.9.0.32, 1184) and (128.10.2.3, 53) • A single TCP port can be shared by multiple connections on the same machine (128.2.254.139, 1012) and (128.10.2.3, 53) • No ambiguity • Incoming messages associated with connection, not port • Both endpoints used to identify appropriate connection • Makes things easier for programmers • Can provide concurrent service without unique ports • Example: Email • Multiple computers can send mail concurrently • Accepting program needs only one TCP port
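A minimal sketch of demultiplexing by connection rather than by port, using the endpoint pairs from the example above. The dictionary layout and the function names are assumptions made for illustration; real TCP software keeps far more state per connection.

```python
# Each connection is keyed by (local endpoint, remote endpoint),
# where an endpoint is a (host, port) pair.
connections = {}


def open_connection(local, remote):
    connections[(local, remote)] = []      # per-connection receive queue


def demultiplex(source, destination, data):
    """Deliver data that arrived at `destination` from `source`."""
    connections[(destination, source)].append(data)


server = ("128.10.2.3", 53)
client_a = ("128.9.0.32", 1184)
client_b = ("128.2.254.139", 1012)

# Two connections share the server's single TCP port 53.
open_connection(server, client_a)
open_connection(server, client_b)

demultiplex(client_a, server, b"from A")
demultiplex(client_b, server, b"from B")
```

Because both endpoints form the key, the two incoming messages land in different queues even though they target the same port.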
Passive & Active Opens • TCP is connection-oriented • Both endpoints must agree to participate • Passive open • Application at one end tells OS it will accept connection • OS assigns a TCP port number for its end • Active open • Done by application wishing to connect • Tells OS to establish a connection • Two TCP modules communicate • Establish and verify the connection; then pass data
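The TCP standard does not define the application interface, so the following is only one possible view of passive and active opens: a sketch using Python's standard `socket` module over the loopback address. The function name `open_and_send` is hypothetical, and binding to port 0 is used here so the OS picks the port.

```python
import socket


def open_and_send(message=b"hello"):
    """Open a loopback TCP connection and pass one message through it."""
    # Passive open: tell the OS we are willing to accept a connection.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))        # port 0: OS assigns the port
        listener.listen(1)
        port = listener.getsockname()[1]

        # Active open: ask the OS to establish the connection.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
            client.connect(("127.0.0.1", port))
            client_endpoint = client.getsockname()
            conn, peer = listener.accept()
            with conn:
                client.sendall(message)
                # Full duplex: close only the client-to-server direction.
                client.shutdown(socket.SHUT_WR)
                received = b""
                while chunk := conn.recv(1024):
                    received += chunk
    return received, peer == client_endpoint


if __name__ == "__main__":
    print(open_and_send())
```

The second return value checks that the endpoint the accepting side sees is the same (host, port) pair the connecting side was given.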
Segments, Streams, & Sequence Numbers • TCP views the data stream in segments • Segment contains sequence of octets • Usually each segment in one IP datagram • Two important problems: • Efficient transmission • Good use of available network • Flow control • End-to-end problem • Cannot overflow the receiver’s buffer
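Special sliding window protocol used • Solves both problems • Octets of the data stream are numbered sequentially • Sender keeps three pointers into the stream • 1st pointer: left edge of window; separates octets sent and ACKed from octets sent but not yet ACKed • 2nd pointer: right edge (end) of window • 3rd pointer: boundary inside the window between sent and unsent octets • Left to right along the stream, the pointers appear in the order 1, 3, 2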
Receiver maintains a similar window • Full duplex: SW at each end maintains 2 windows • Also allows window size to vary over time • Each ACK has window advertisement • Tells how many more octets willing to accept • Increased advertisement: • Sender can increase size of sliding window, send more • Decreased advertisement: • Sender decreases size of sliding window, stop at boundary • Extreme case: sends advertisement of zero, stops all
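The effect of a window advertisement on the sender can be written as one small calculation. This is a sketch under assumed names: `ack_number` is the next octet the receiver expects, `next_to_send` is the first unsent octet, and `advertised` is the window carried in the latest ACK.

```python
def usable_window(ack_number, next_to_send, advertised):
    """Octets the sender may still transmit before it must stop.

    The window's right edge sits `advertised` octets past the last
    acknowledged position; anything already sent counts against it.
    """
    right_edge = ack_number + advertised
    return max(0, right_edge - next_to_send)


if __name__ == "__main__":
    # 2000 octets are in flight (1001..3000).
    print(usable_window(1001, 3001, 4000))   # larger advertisement
    print(usable_window(1001, 3001, 1500))   # decreased advertisement
    print(usable_window(1001, 3001, 0))      # zero window: stop sending
```

A decreased or zero advertisement yields a usable window of 0, which is how the receiver halts the sender.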
This provides flow control • Essential in internet environment • Two independent flow problems: • End-to-end • Minicomputer communicating with mainframe • Intermediate systems • Routers need to control flow, too • Overloaded router condition is congestion • No explicit congestion control mechanism; uses sliding window • Good TCP implementation can detect & recover • Poor implementation can make it worse
TCP Segment Format • Unit of transfer between TCP software on two machines is the segment • Segments are exchanged to: • Establish connections • Transfer data • Send ACKs • May piggyback on a segment carrying data • Advertise window size • Close connections
Out of Band Data • Out of Band • Data sent without waiting for octets in the stream to be consumed by the receiver • Ex: to interrupt or abort a program • Use urgent bit and URGENT POINTER field • This data is consumed first, regardless of stream position
Maximum Segment Size Option • Not all segments will be of same size • But, must agree on a maximum size • Uses OPTIONS field • Can specify MSS (maximum segment size) • If on same network, may use size such that resulting datagrams match network MTU • If not, will attempt to discover the minimum MTU along the path • Or use 536 (default datagram size of 576, minus 40 octets of IP & TCP headers)
Choosing good MSS is difficult • Too large or too small are both bad • Too small: network utilization is low • Segments in datagram; datagram in frame • At least 40 octets of headers • Small amount of data gives poor utilization • Too large: large IP datagrams • Probably get fragmented somewhere • Cannot ACK partial segment • Must receive all fragments • More fragments increases probability of losing one
In theory, best MSS is when IP datagrams are as large as possible without being fragmented • Difficult to figure out: • Most implementations do not have a mechanism for doing so • Routes can change dynamically • This may change the MTU of the path • Optimum size depends on lower level headers • Segment size must be reduced to account for IP options
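The arithmetic behind these MSS trade-offs is short enough to sketch. The helper names are hypothetical; the 20-octet base header sizes are the standard minimum IP and TCP headers, which give the "at least 40 octets" figure above.

```python
IP_HEADER = 20    # minimum IP header, octets
TCP_HEADER = 20   # minimum TCP header, octets


def mss_for_mtu(mtu, ip_options=0, tcp_options=0):
    """Largest segment payload that fits one unfragmented datagram."""
    return mtu - IP_HEADER - ip_options - TCP_HEADER - tcp_options


def payload_fraction(payload):
    """Share of each datagram that is data, with minimum headers."""
    return payload / (payload + IP_HEADER + TCP_HEADER)


if __name__ == "__main__":
    print(mss_for_mtu(576))                  # the default case
    print(mss_for_mtu(1500))                 # e.g. an Ethernet-sized MTU
    print(mss_for_mtu(1500, ip_options=8))   # IP options shrink the segment
    print(round(payload_fraction(1), 3), round(payload_fraction(536), 3))
```

A one-octet payload uses about 2% of each datagram for data, while a 536-octet payload uses about 93%, which is the "too small" problem in numbers.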
Window Scaling Option • WINDOW field is 16 bits • Limits max window size to 64 Kbytes • Ok in early networks • Need more for networks with large delay • Option allows a larger size • Do not need to know details….
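Why a 16-bit window is too small for long-delay paths follows from a simple bound: a sender can have at most one window of data outstanding per round trip. The sketch below uses hypothetical helper names; the shift-based scaling and the limit of 14 come from the window scale option as defined in RFC 7323, not from the slides.

```python
MAX_UNSCALED_WINDOW = 2**16 - 1   # largest value a 16-bit WINDOW field holds


def max_throughput_bps(window_octets, rtt_seconds):
    """Upper bound on throughput: one full window per round trip."""
    return window_octets * 8 / rtt_seconds


def scaled_window(window_field, shift):
    """Effective window when the option's shift count is applied."""
    if not 0 <= shift <= 14:      # RFC 7323 limits the shift count to 14
        raise ValueError("shift count must be between 0 and 14")
    return window_field << shift


if __name__ == "__main__":
    # A 100 ms round trip caps an unscaled connection near 5.2 Mbit/s.
    print(max_throughput_bps(MAX_UNSCALED_WINDOW, 0.1))
    print(scaled_window(MAX_UNSCALED_WINDOW, 14))
```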
Timestamp Option • Used to: • Help compute delay on underlying network • Handle “wrap around” sequence numbers • Process: • Sender: • Places timestamp from its clock in message • Receiver: • Copies timestamp field into ack • Allows sender to compute elapsed time
TCP Checksum • CHECKSUM field contains 16-bit integer • Used to verify integrity of the segment (header and data) • Uses a pseudo header like UDP • Purpose of pseudo header is just the same • Verify segment has reached correct destination
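A sketch of the checksum computation for IPv4, assuming the standard Internet checksum (16-bit one's complement sum, as described in RFC 1071) and the usual 12-octet pseudo header: source address, destination address, a zero octet, protocol number 6, and the TCP length. The function names are hypothetical.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum of data, zero-padded to even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over pseudo header + segment.

    With the CHECKSUM field zeroed this yields the value to store;
    over a segment that already carries a valid checksum it yields 0.
    """
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, len(segment)))
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF
```

Because the destination address is part of the sum, a segment delivered to the wrong host fails verification even if its bits are intact.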
ACKs & Retransmission • Hard to refer to datagrams or segments • Variable length segments • Retransmitted segments may have more data than original • Instead, use position in stream • Based on sequence numbers
Cumulative acknowledgement scheme • Receiver collects arriving data octets • Reconstructs stream of sender • May have to reorder segments due to delivery • Will have reconstructed zero or more octets • May have other stream pieces present but out of order • Receiver ACKs longest contiguous prefix • ACK specifies the next octet expected to be received • Adv: • ACKs easy to generate and unambiguous • Lost ACKs may not force retransmission • Disadv: • Only send info about single position in the stream
Lack of information is inefficient • Imagine window that spans 5000 octets • Starts with position 101 in the stream • Sender has sent all data in five segments • Suppose first segment got lost • Receiver sends ACK as each segment arrives • All ACKs specify octet 101 as next expected • No way to tell sender that all the other data is there • Sender has two choices upon timeout: • Send all five segments over • Send only first segment, then wait for ACK to do anything else
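The scenario above can be replayed with a small receiver model. This is a sketch with assumptions: the five segments are taken to be 1000 octets each, segments never overlap, and the function name `cumulative_acks` is hypothetical.

```python
def cumulative_acks(start, arrivals):
    """ACK number sent after each arriving segment.

    start    -- first octet of the stream still expected
    arrivals -- (first_octet, length) pairs in arrival order
    """
    next_expected = start
    out_of_order = {}                 # first_octet -> length, held for later
    acks = []
    for first, length in arrivals:
        out_of_order[first] = length
        # Extend the contiguous prefix as far as buffered data allows.
        while next_expected in out_of_order:
            next_expected += out_of_order.pop(next_expected)
        acks.append(next_expected)    # ACK names the next octet expected
    return acks


if __name__ == "__main__":
    # Segment starting at 101 is lost; the other four arrive.
    later = [(1101, 1000), (2101, 1000), (3101, 1000), (4101, 1000)]
    print(cumulative_acks(101, later))
    # The retransmitted first segment finally arrives.
    print(cumulative_acks(101, later + [(101, 1000)]))
```

Every ACK repeats 101 until the missing segment shows up, at which point a single ACK jumps to 5101 and covers the whole window.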
Timeout and Retransmission • TCP has a timer for each segment • If timer goes off before ACK received – retrans • Different algorithm than other protocols • Due to internet environment • Cannot know how quickly ACKs should come • May span one or many networks • May encounter router delays • Must accommodate vast time differences
Adaptive Retransmission Algorithm • Used to accommodate varying delays • Monitors performance of each connection • Deduces reasonable values for timeouts • As performance changes, timeout value revised • Must collect data for the algorithm • Records time each segment sent & when ACK arrives • Computes elapsed time (sample round trip time) • Get new sample; adjust average round trip time for the connection • RTT stored as weighted average (usually) • New round trip samples change the average slowly
Example: RTT = (a * Old_RTT) + ((1 - a) * New_Round_Trip_Sample) • where a is the constant weighting factor; 0 < a < 1 • Choosing a value close to 1: • Weighted average only changed small amount • Immune to changes that last a short time • Choosing a value close to 0: • Weighted average responds quickly to changes in delay
Timeout value is a function of the current RTT • Early implementations used constant weighting factor, B (B > 1) • Timeout = B * RTT • Choosing a value for B is hard • Close to 1 • Timeout close to current RTT • Detects packet loss quickly • Any small delay may cause unnecessary retransmissions • Original specification recommended B=2 • Will look at better techniques for timeout
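The weighted average and the early timeout rule translate directly into code. A minimal sketch: the function names are hypothetical, and the sample values in the usage lines are made up to show how the weighting factor changes the response.

```python
def update_rtt(old_rtt, sample, a):
    """Weighted average: RTT = a * Old_RTT + (1 - a) * sample, 0 < a < 1."""
    return a * old_rtt + (1 - a) * sample


def timeout(rtt, b=2.0):
    """Early rule: Timeout = B * RTT, with B > 1 (original spec used 2)."""
    return b * rtt


if __name__ == "__main__":
    # One sample of 200 ms against an average of 100 ms.
    slow = update_rtt(100.0, 200.0, a=0.9)   # a near 1: barely moves
    fast = update_rtt(100.0, 200.0, a=0.1)   # a near 0: follows the sample
    print(slow, fast, timeout(slow), timeout(fast))
```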
Measuring Round Trip Samples • Measuring round trip sample seems trivial • But, TCP uses cumulative acknowledgement • ACK refers to data received, not datagram that carried it • Consider a retransmission: • Form segment; put in datagram; send; timer expires • Send again in second datagram • Get ACK: for which datagram? • Called acknowledgement ambiguity
Assume ACK belongs to earliest datagram • Make estimated round trip time grow • Incorrect if the original datagram was really lost • If many lost, estimate grows arbitrarily large • Assume ACK belongs to latest datagram • Send retransmission just before ACK arrives • Decreases the timeout time • Makes things worse; more retransmissions • Estimate will eventually stabilize • RTT will be slightly less than ½ of the correct value • Every segment sent twice even though no loss occurs
Karn’s Algorithm • If associating ACK with earliest or most recent are both wrong…what to do? • Do not update on retransmitted segments • Idea known as Karn’s Algorithm • Avoids ambiguous acknowledgement problem • Simplistic implementation can be a problem • Get sharp increase in delay; do some retransmissions • Ignore ACKs for retransmissions; no new estimate
Must also use a timer backoff strategy • Compute initial timeout with round trip estimate • If timer expires and causes retransmission, increase the timeout (within a bound) • Most implementations multiply timeout by 2 • Next segment timed with new timeout • Continues backoff until send segment without retransmitting • Computes new round trip estimate • Resets timeout accordingly • Shown to work well even with high packet loss
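Karn's algorithm combined with timer backoff can be sketched as a small state holder. The class name, the weighting factor 0.875, and the 64-second upper bound are assumptions for illustration; the doubling on each expiry follows the "multiply timeout by 2" practice noted above.

```python
class KarnTimer:
    """Retransmission timer with Karn's rule and exponential backoff."""

    def __init__(self, initial_rtt, a=0.875, b=2.0, max_timeout=64.0):
        self.rtt = initial_rtt
        self.a = a                        # weighting factor (assumed value)
        self.b = b                        # Timeout = B * RTT
        self.max_timeout = max_timeout    # bound on backoff (assumed value)
        self.timeout = b * initial_rtt

    def on_timer_expired(self):
        """Segment is being retransmitted: back off the timeout."""
        self.timeout = min(self.timeout * 2, self.max_timeout)

    def on_ack(self, sample, retransmitted):
        """ACK arrived; `retransmitted` says whether the segment was resent."""
        if retransmitted:
            return                        # ambiguous ACK: keep backed-off value
        # Unambiguous sample: update the estimate and reset the timeout.
        self.rtt = self.a * self.rtt + (1 - self.a) * sample
        self.timeout = self.b * self.rtt


if __name__ == "__main__":
    t = KarnTimer(initial_rtt=1.0)
    t.on_timer_expired()
    t.on_timer_expired()
    t.on_ack(sample=5.0, retransmitted=True)
    print(t.rtt, t.timeout)               # estimate untouched, timeout backed off
    t.on_ack(sample=5.0, retransmitted=False)
    print(t.rtt, t.timeout)
```

Keeping the backed-off timeout until a clean sample arrives is what prevents the "ignore ACKs, never update" trap described for the simplistic implementation.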
High Variance in Delay • Computations do not respond well to wide range of variation in delay • Variation in RTT • Proportional to 1/(1-network load) • Original TCP standard estimated RTT as shown earlier • Limiting B to 2 can adapt to loads of at most 30% • 1989 spec requires estimates of both average RTT and variance • Must use variance in place of constant B
Approximations are computationally easy • DIFF = SAMPLE - Old_RTT • Smoothed_RTT = Old_RTT + d * DIFF • DEV = Old_DEV + p * (|DIFF| - Old_DEV) • Timeout = Smoothed_RTT + e * DEV • Where: • DEV is the estimated mean deviation • d is a fraction between 0 & 1; controls effect on weighted average • p is a fraction between 0 & 1; controls effect on mean deviation • e is a factor controlling how much the deviation affects the round trip timeout • (Research suggests choosing d and p as inverse powers of 2, scaling the computation by 2^n, and using integer arithmetic, with: d = 1/2^3, p = 1/2^2, n = 3, and e = 4)
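The four update equations can be written out directly. This sketch uses floating point for readability rather than the scaled integer arithmetic mentioned above; the function name is hypothetical, and the defaults are the suggested d = 1/8, p = 1/4, e = 4.

```python
def update_estimates(old_rtt, old_dev, sample, d=1 / 8, p=1 / 4, e=4):
    """One step of the RTT and mean-deviation estimators.

    Returns (smoothed_rtt, dev, timeout).
    """
    diff = sample - old_rtt
    smoothed_rtt = old_rtt + d * diff
    dev = old_dev + p * (abs(diff) - old_dev)
    timeout = smoothed_rtt + e * dev
    return smoothed_rtt, dev, timeout


if __name__ == "__main__":
    rtt, dev = 100.0, 10.0
    for sample in (140.0, 100.0, 100.0):
        rtt, dev, rto = update_estimates(rtt, dev, sample)
        print(rtt, dev, rto)
```

A single late sample raises the timeout sharply through the deviation term, and the timeout then relaxes as later samples settle, which a fixed multiplier B cannot do.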