Reliable Stream Transport Service (TCP). Chapter 12. We’ve looked at Unreliable connectionless packet delivery service And the IP protocol that defines it Now we will examine Reliable stream delivery And the Transmission Control Protocol that defines it
Reliable Stream Transport Service (TCP) Chapter 12
We’ve looked at • Unreliable connectionless packet delivery service • And the IP protocol that defines it • Now we will examine • Reliable stream delivery • And the Transmission Control Protocol that defines it • TCP is presented as a part of TCP/IP • But it is an independent, general-purpose protocol • Can be adapted for use with other delivery systems
Need for Stream Delivery • At low levels, have unreliable packets • Lost, destroyed, discarded, duplicated, delayed • Size constraints affect efficient transfer • Applications need to send lots of data • Unreliability is tedious and annoying • Programmers must worry about errors • Goal of network protocol research • General purpose reliable stream delivery method
Properties of the Service • Interface between applications and TCP/IP has five characteristic features: • Stream Orientation • Sender provides stream of bits divided into bytes • Receiver is passed exact same sequence • Virtual Circuit Connection • Service provides illusion of dedicated circuit • “Call” setup from one application to the other • Two OSs talk and settle details • Continue to communicate during transfer • If error, detect and report to applications
Buffered Transfer • Applications send the stream in whatever size pieces they want • May be as small as a single octet • Protocol software wants efficient transfer • Small blocks of data: buffer until there is enough for a datagram • Large blocks of data: break into smaller pieces • Push mechanism • When transfer needs to happen before buffer is full • Application invokes a push • Data generated until then is sent immediately • At receiving end, is delivered without delay • Protocol software may divide stream in unexpected ways
Unstructured Stream • Applications cannot mark record boundaries • Must agree that stream service will be unstructured • Full Duplex Connection • Connections allow concurrent transfer both ways • Appears as two independent streams in opposite directions • Can terminate one direction without affecting other • Control information can be piggybacked on data
Providing Reliability • Want reliable transfer out of unreliable packet delivery system • Most reliable protocols use a single technique • Positive acknowledgement with retransmission • Recipient must send ACK message as it gets data • Sender keeps record of each packet sent • If timer expires for an ACK, retransmits packet
Can also have duplicate packets • Network delays may cause premature retransmission • Both packets and ACKs can be duplicated • Usually solve by assigning sequence numbers • Receiver must remember which sequence numbers received • ACKs include the sequence numbers as well
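The scheme on the last two slides can be sketched as a small deterministic simulation. This is only an illustration: the function name `stop_and_wait` and the `lost_data` / `lost_acks` sets (which say which attempts the network loses) are hypothetical, and a real sender would use a timer rather than a loop.

```python
def stop_and_wait(packets, lost_data=(), lost_acks=()):
    """Simulate positive acknowledgement with retransmission.

    packets:   payloads to deliver, in order.
    lost_data: (seq, attempt) pairs whose packet the network drops.
    lost_acks: (seq, attempt) pairs whose ACK the network drops.
    Returns (payloads delivered to the application, total transmissions).
    """
    lost_data, lost_acks = set(lost_data), set(lost_acks)
    delivered = []
    seen = set()          # receiver remembers which sequence numbers arrived
    transmissions = 0
    for seq, payload in enumerate(packets):
        attempt = 0
        acked = False
        while not acked:
            attempt += 1
            transmissions += 1
            if (seq, attempt) in lost_data:
                continue  # packet lost: sender's timer expires, retransmit
            if seq not in seen:
                seen.add(seq)             # first copy: pass data up
                delivered.append(payload)
            # A duplicate is discarded but still acknowledged.
            acked = (seq, attempt) not in lost_acks
    return delivered, transmissions
```

For example, `stop_and_wait(["a", "b", "c"], lost_data={(1, 1)}, lost_acks={(2, 1)})` delivers each payload exactly once even though packet 2 reaches the receiver twice.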
Sliding Windows • Sending one packet and waiting for ACK wastes time • Full duplex circuit; have lots of idle time • Sliding window technique used • More complex form of positive ack & retrans • Use bandwidth more efficiently • Sender transmits multiple packets before ACK
Number of unacknowledged packets limited by window size • Performance depends upon window size • Size of 1: same as simple positive ack protocol • Increase size with goal of sending packets as fast as the network can handle • Conceptually, separate timer for each packet • Only unack’ed packets are retransmitted • Receiver has a similar window
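A rough sketch of how the window limits what a sender may transmit before hearing back. It assumes no loss and that all ACKs for a batch arrive together, which is a simplification; `send_with_window` is a hypothetical name.

```python
def send_with_window(num_packets, window_size):
    """Return the batches of packet numbers sent per round trip (no loss)."""
    base = 0        # oldest unacknowledged packet
    next_seq = 0    # next packet that has not been sent yet
    rounds = []
    while base < num_packets:
        batch = []
        # Send until the number of unacknowledged packets hits the window size.
        while next_seq < num_packets and next_seq < base + window_size:
            batch.append(next_seq)
            next_seq += 1
        rounds.append(batch)
        base = next_seq   # ACKs arrive and the window slides forward
    return rounds
```

With a window of 1 this degenerates to the simple positive acknowledgement protocol (one packet per round trip); a window of 3 sends eight packets in three round trips.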
TCP • Is a communication protocol • NOT a piece of software • TCP is the standard • Various TCP software implements the standard • Standard includes: • Format of data and acknowledgments • Procedures for reliability • Distinguish multiple destinations on a machine • Error recovery procedures • Initiation and closing a TCP stream transfer
Standard does not include: • Details of application/TCP interface • Does not discuss exact procedures to invoke for operations • Left unspecified for flexibility • TCP usually implemented in OS • Can use whatever interface given OS provides • Single specification for variety of machines • TCP assumes little about underlying system • Can be used with variety of packet delivery systems (including IP) • Dialup lines; LAN; high speed fiber; low speed WAN
Ports, Connections, & Endpoints • TCP resides above IP in the layering scheme
Multiple applications can communicate concurrently • Multiplexes and demultiplexes incoming msgs • Uses port numbers (like UDP discussion) • TCP ports more complex • Using the connection abstraction • Objects are virtual circuits, not ports • Connections identified by a pair of endpoints • Endpoint is pair of integers: (host, port) • host is IP address for a host • port is TCP port on that host
Pair of endpoints defines connection (128.9.0.32, 1184) and (128.10.2.3, 53) • A single TCP port can be shared by multiple connections on the same machine (128.2.254.139, 1012) and (128.10.2.3, 53) • No ambiguity • Incoming messages associated with connection, not port • Both endpoints used to identify appropriate connection • Makes things easier for programmers • Can provide concurrent service without unique ports • Example: Email • Multiple computers can send mail concurrently • Accepting program needs only one TCP port
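One way to picture the demultiplexing described above is a table keyed by the pair of endpoints rather than by port alone. The class `ConnectionTable` and its method names are hypothetical; the addresses are the ones from the slide.

```python
class ConnectionTable:
    """Map (local endpoint, remote endpoint) pairs to connections."""

    def __init__(self):
        self._table = {}

    def open(self, local, remote, name):
        # Each endpoint is a (host, port) pair; the key is the pair of endpoints.
        self._table[(local, remote)] = name

    def demultiplex(self, dst, src):
        # An incoming segment is matched on both endpoints, not just dst port.
        return self._table.get((dst, src))


table = ConnectionTable()
server = ("128.10.2.3", 53)   # one TCP port shared by both connections
table.open(server, ("128.9.0.32", 1184), "connection A")
table.open(server, ("128.2.254.139", 1012), "connection B")
```

Both connections use port 53 on 128.10.2.3, yet a lookup with the sender's endpoint still selects exactly one of them.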
Passive & Active Opens • TCP is connection-oriented • Both endpoints must agree to participate • Passive open • Application at one end tells OS it will accept connection • OS assigns a TCP port number for its end • Active open • Done by application wishing to connect • Tells OS to establish a connection • Two TCP modules communicate • Establish and verify the connection; then pass data
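The TCP standard does not define the application interface, so the following is just one illustration using Python's standard `socket` module on the loopback address. The helper names `passive_open`, `echo_once`, and `active_open` are hypothetical.

```python
import socket
import threading


def passive_open(host="127.0.0.1"):
    """Tell the OS we will accept a connection; port 0 lets the OS pick a port."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind((host, 0))
    srv.listen(1)
    return srv


def echo_once(srv):
    """Accept one connection and send back whatever arrives first."""
    conn, _peer = srv.accept()
    with conn:
        conn.sendall(conn.recv(1024))


def active_open(address, payload):
    """Ask the OS to establish a connection, then pass data over it."""
    with socket.create_connection(address, timeout=5) as s:
        s.sendall(payload)
        return s.recv(1024)
```

A usage sketch: call `passive_open()`, run `echo_once` in a thread, then call `active_open(srv.getsockname(), b"hello")` from the other side.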
Segments, Streams, & Sequence Numbers • TCP views the data stream in segments • Segment contains sequence of octets • Usually each segment in one IP datagram • Two important problems: • Efficient transmission • Good use of available network • Flow control • End-to-end problem • Cannot overflow the receiver’s buffer
Special sliding window protocol used • Solves both problems • Octets of the data stream are numbered sequentially • 1st pointer: sent and ACKed vs sent and not ACKed • 2nd pointer: end of window • 3rd pointer: boundary between sent and unsent • [Figure: sliding window over the octet stream with pointers 1, 3, 2 marked left to right]
Receiver maintains a similar window • Full duplex: SW at each end maintains 2 windows • Also allows window size to vary over time • Each ACK has window advertisement • Tells how many more octets willing to accept • Increased advertisement: • Sender can increase size of sliding window, send more • Decreased advertisement: • Sender decreases size of sliding window, stop at boundary • Extreme case: sends advertisement of zero, stops all
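A minimal sketch of how a sender might combine the three pointers with the latest window advertisement. The function name and parameter names are hypothetical, and the advertisement is assumed to count octets from the first unacknowledged one.

```python
def octets_sendable(first_unacked, next_unsent, advertised_window):
    """How many more octets the sender may transmit right now.

    first_unacked:     1st pointer (boundary between ACKed and un-ACKed data)
    next_unsent:       3rd pointer (boundary between sent and unsent data)
    advertised_window: octets the receiver said it is willing to accept
    """
    window_end = first_unacked + advertised_window   # 2nd pointer
    return max(0, window_end - next_unsent)
```

If 200 octets are in flight and the receiver advertises 500, the sender may send 300 more; if the advertisement shrinks to 100 or to zero, the result is 0 and the sender stops.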
This provides flow control • Essential in internet environment • Two independent flow problems: • End-to-end • Minicomputer communicating with mainframe • Intermediate systems • Routers need to control flow, too • Overloaded router condition is congestion • No explicit congestion control mechanism; uses sliding window • Good TCP implementation can detect & recover • Poor implementation can make it worse
TCP Segment Format • Unit of TCP/IP sw transfer is segment • Establish connections • Transfer data • Send ACKs • May piggyback on a segment carrying data • Advertise window size • Close connections
Out of Band Data • Out of Band • Data sent without waiting for octets in the stream to be consumed by the receiver • Ex: to interrupt or abort a program • Use urgent bit and URGENT POINTER field • This data is consumed first, regardless of stream position
Maximum Segment Size Option • Not all segments will be of same size • But, must agree on a maximum size • Uses OPTIONS field • Can specify MSS (maximum segment size) • If on same network, may use size such that resulting datagrams match network MTU • If not, will attempt to discover the minimum MTU along the path • Or use 536 octets (the default datagram size of 576 octets, minus 40 octets of IP & TCP headers)
Choosing good MSS is difficult • Too large or too small are both bad • Too small: network utilization is low • Segments in datagram; datagram in frame • At least 40 octets of headers • Small amount of data gives poor utilization • Too large: large IP datagrams • Probably get fragmented somewhere • Cannot ACK partial segment • Must receive all fragments • More fragments increases probability of losing one
In theory, best MSS is when IP datagrams are as large as possible without being fragmented • Difficult to figure out: • Most implementations do not have a mechanism for doing so • Routes can change dynamically • This may change the MTU of the path • Optimum size depends on lower level headers • Segment size must be reduced to account for IP options
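The arithmetic behind these slides, as a small sketch. It assumes the minimum 20-octet IP header and 20-octet TCP header (the "at least 40 octets" above); the function names are hypothetical, and 1500 is used only as a familiar Ethernet MTU.

```python
IP_HEADER = 20    # octets, without IP options
TCP_HEADER = 20   # octets, without TCP options


def mss_for_mtu(mtu, ip_options=0, tcp_options=0):
    """Largest segment payload that fits in one unfragmented datagram."""
    return mtu - IP_HEADER - ip_options - TCP_HEADER - tcp_options


def utilization(data_octets):
    """Fraction of each datagram that is data (ignores frame headers)."""
    return data_octets / (data_octets + IP_HEADER + TCP_HEADER)
```

`mss_for_mtu(576)` gives the 536-octet default, `mss_for_mtu(1500)` gives 1460, and a one-octet segment has a utilization of only 1/41.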
Window Scaling Option • WINDOW field is 16 bits • Limits max window size to 64 Kbytes • Ok in early networks • Need more for networks with large delay • Option allows a larger size • Do not need to know details….
Timestamp Option • Used to: • Help compute delay on underlying network • Handle “wrap around” sequence numbers • Process: • Sender: • Places timestamp from its clock in message • Receiver: • Copies timestamp field into ack • Allows sender to compute elapsed time
TCP Checksum • CHECKSUM contains 16-bit integer • Uses a pseudo header like UDP • Purpose is just the same • Verify segment has reached correct destination
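A sketch of the computation, assuming IPv4 and the usual 16-bit one's-complement Internet checksum. The pseudo header here holds the source and destination IP addresses, a zero octet, the protocol number (6 for TCP), and the segment length; the function names are hypothetical.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data, padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:                      # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return total


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over pseudo header + segment (CHECKSUM field must be zero)."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, len(segment)))
    return (~ones_complement_sum(pseudo + segment)) & 0xFFFF
```

Because the destination address is part of the sum, a segment delivered to the wrong host fails verification: recomputing over a segment that already carries its checksum yields 0 only when the addresses match.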
ACKs & Retransmission • Hard to refer to datagrams or segments • Variable length segments • Retransmitted segments may have more data than original • Instead, use position in stream • Based on sequence numbers
Cumulative acknowledgement scheme • Receiver collects arriving data octets • Reconstructs stream of sender • May have to reorder segments due to delivery • Will have reconstructed zero or more octets • May have other stream pieces present but out of order • Receiver ACKs longest contiguous prefix • ACK specifies the next octet expected to be received • Adv: • ACKs easy to generate and unambiguous • Lost ACKs may not force retransmission • Disadv: • Only send info about single position in the stream
Lack of information is inefficient • Imagine window that spans 5000 octets • Starts with position 101 in the stream • Sender has sent all data in five segments • Suppose first segment got lost • Receiver sends ACK as each segment arrives • All ACKs specify octet 101 as next expected • No way to tell sender that all the other data is there • Sender has two choices upon timeout: • Send all five segments over • Send only first segment, then wait for ACK to do anything else
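The example above can be replayed with a small receiver model. It assumes the five segments carry 1000 octets each (the slide only gives the 5000-octet total) and ignores overlapping segments; `cumulative_acks` is a hypothetical name.

```python
def cumulative_acks(arrivals, start):
    """Return the ACK sent after each arrival.

    arrivals: (first_octet, length) pairs in the order segments arrive.
    start:    stream position of the first octet the receiver expects.
    Each ACK names the next octet expected, i.e. the end of the longest
    contiguous prefix received so far.
    """
    next_expected = start
    out_of_order = {}     # first_octet -> length of buffered segments
    acks = []
    for first, length in arrivals:
        out_of_order[first] = length
        # Extend the contiguous prefix as far as buffered data allows.
        while next_expected in out_of_order:
            next_expected += out_of_order.pop(next_expected)
        acks.append(next_expected)
    return acks


# First segment (octets 101-1100) is lost; the other four arrive,
# then the retransmitted first segment finally arrives.
arrivals = [(1101, 1000), (2101, 1000), (3101, 1000), (4101, 1000), (101, 1000)]
acks = cumulative_acks(arrivals, start=101)
```

Every ACK before the retransmission says 101, so the sender learns nothing about the 4000 octets already buffered; once the missing segment arrives, a single ACK jumps to 5101.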
Timeout and Retransmission • TCP has a timer for each segment • If timer goes off before ACK received – retrans • Different algorithm than other protocols • Due to internet environment • Cannot know how quickly ACKs should come • May span one or many networks • May encounter router delays • Must accommodate vast time differences
Adaptive Retransmission Algorithm • Used to accommodate varying delays • Monitors performance of each connection • Deduces reasonable values for timeouts • As performance changes, timeout value revised • Must collect data for the algorithm • Records time each segment sent & when ACK arrives • Computes elapsed time (sample round trip time) • Get new sample; adjust average round trip time for the connection • RTT stored as weighted average (usually) • New round trip samples change the average slowly
Example: RTT = (a * Old_RTT) + ((1 - a) * New_Round_Trip_Sample) where: a is the constant weighting factor; 0 < a < 1 • Choosing a value close to 1: • Weighted average only changed small amount • Immune to changes that last a short time • Choosing a value close to 0: • Weighted average responds quickly to changes in delay
Timeout value is a function of the current RTT • Early implementations used constant weighting factor, B (B > 1) • Timeout = B * RTT • Choosing a value for B is hard • Close to 1 • Timeout close to current RTT • Detects packet loss quickly • Any small delay may cause unnecessary retransmissions • Original specification recommended B=2 • Will look at better techniques for timeout
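The weighted average and the early timeout rule translate directly into code. This is only a sketch of the two formulas from the slides; the function names and the sample values (in milliseconds) are made up for illustration.

```python
def update_rtt(old_rtt, sample, a):
    """RTT = (a * Old_RTT) + ((1 - a) * New_Round_Trip_Sample), 0 < a < 1."""
    return a * old_rtt + (1 - a) * sample


def timeout_for(rtt, b=2.0):
    """Timeout = B * RTT, with B > 1 (original specification: B = 2)."""
    return b * rtt


slow = update_rtt(100.0, 200.0, a=0.9)   # a close to 1: average barely moves
fast = update_rtt(100.0, 200.0, a=0.1)   # a close to 0: average tracks sample
```

Starting from an average of 100 ms, one 200 ms sample moves the estimate to about 110 ms with a = 0.9 but to about 190 ms with a = 0.1.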
Measuring Round Trip Samples • Measuring round trip sample seems trivial • But, TCP uses cumulative acknowledgement • ACK refers to data received, not datagram that carried it • Consider a retransmission: • Form segment; put in datagram; send; timer expires • Send again in second datagram • Get ACK: for which datagram? • Called acknowledgement ambiguity
Assume ACK belongs to earliest datagram • Make estimated round trip time grow • Incorrect if the original datagram was really lost • If many lost, estimate grows arbitrarily large • Assume ACK belongs to latest datagram • Send retransmission just before ACK arrives • Decreases the timeout time • Makes things worse; more retransmissions • Estimate will eventually stabilize • RTT will be slightly less than ½ of the correct value • Every segment sent twice even though no loss occurs
Karn’s Algorithm • If associating ACK with earliest or most recent are both wrong…what to do? • Do not update on retransmitted segments • Idea known as Karn’s Algorithm • Avoids ambiguous acknowledgement problem • Simplistic implementation can be a problem • Get sharp increase in delay; do some retransmissions • Ignore ACKs for retransmissions; no new estimate
Must also use a timer backoff strategy • Compute initial timeout with round trip estimate • If timer expires and causes retransmission, increase the timeout (within a bound) • Most implementations multiply timeout by 2 • Next segment timed with new timeout • Continues backoff until send segment without retransmitting • Computes new round trip estimate • Resets timeout accordingly • Shown to work well even with high packet loss
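Karn's algorithm plus timer backoff might be sketched as follows. The class name `KarnTimer`, the weighting factor 0.875, and the 64-second bound are assumptions chosen for illustration; the doubling and the "no update on retransmitted segments" rule come from the slides.

```python
class KarnTimer:
    """Retransmission timer with Karn's rule and exponential backoff."""

    def __init__(self, initial_rtt, a=0.875, b=2.0, max_timeout=64.0):
        self.rtt = initial_rtt
        self.a = a                      # weighting factor for the average
        self.b = b                      # timeout = b * rtt
        self.max_timeout = max_timeout  # upper bound on backoff (assumed)
        self.timeout = b * initial_rtt

    def on_timeout(self):
        # Timer expired and caused a retransmission: back off.
        self.timeout = min(self.timeout * 2, self.max_timeout)

    def on_ack(self, sample, retransmitted):
        if retransmitted:
            return  # ambiguous ACK: no new estimate, keep backed-off timeout
        # Unambiguous sample: update the estimate and reset the timeout.
        self.rtt = self.a * self.rtt + (1 - self.a) * sample
        self.timeout = self.b * self.rtt
```

The backed-off timeout stays in force until a segment is acknowledged without having been retransmitted, at which point the timeout is recomputed from the fresh estimate.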
High Variance in Delay • Computations do not respond well to wide range of variation in delay • Variation in RTT • Proportional to 1/(1-network load) • Original TCP standard estimated RTT as shown earlier • Limiting B to 2 can adapt to loads of at most 30% • 1989 spec requires estimates of both average RTT and variance • Must use variance in place of constant B
Approximations are computationally easy
DIFF = SAMPLE - Old_RTT
Smoothed_RTT = Old_RTT + d * DIFF
DEV = Old_DEV + p * (|DIFF| - Old_DEV)
Timeout = Smoothed_RTT + e * DEV
Where: • DEV is the estimated mean deviation • d is a fraction between 0 & 1; controls effect on weighted average • p is a fraction between 0 & 1; controls effect on mean deviation • e is a factor controlling how much deviation affects the round trip timeout
(Research suggests d and p be inverse powers of 2; scale the computation by 2^n, use integer arithmetic, and: d = 1/2^3, p = 1/2^2, n = 3, and e = 4)
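A direct floating-point transcription of these update rules, using the suggested constants as defaults. The function name is hypothetical, and real implementations would use the scaled integer arithmetic mentioned above rather than floats.

```python
def update_timeout(old_rtt, old_dev, sample, d=1 / 8, p=1 / 4, e=4):
    """Return (smoothed RTT, mean deviation, timeout) after one sample."""
    diff = sample - old_rtt
    smoothed_rtt = old_rtt + d * diff
    dev = old_dev + p * (abs(diff) - old_dev)
    timeout = smoothed_rtt + e * dev
    return smoothed_rtt, dev, timeout
```

For instance, with an old RTT of 100, an old deviation of 10, and a new sample of 140 (all in the same time unit), the estimate moves to 105, the deviation to 17.5, and the timeout to 175.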