280 likes | 372 Views
EE 122: Transport Protocols. Kevin Lai October 16, 2002. Motivation. IP provides a weak, but efficient service model ( best-effort ) packets can be delayed, dropped, reordered, duplicated packets have limited size (why?) IP packets are addressed to a host
E N D
EE 122: Transport Protocols Kevin Lai October 16, 2002
Motivation • IP provides a weak, but efficient service model (best-effort) • packets can be delayed, dropped, reordered, duplicated • packets have limited size (why?) • IP packets are addressed to a host • how to decide which application gets which packets? • How should hosts send into the network? • every sends as fast as they can drop many packets, network is under-utilized (congestion collapse) laik@cs.berkeley.edu
Transport Protocol TCP/UDP Transport Layer • Provides more than the underlying network protocol • more reliability, in order delivery, at most once delivery • supports messages of arbitrary length • provide a way to decide which packets go to which applications (multiplexing/demultiplexing) • govern how hosts should send data to prevent congestion collapse (congestion control and avoidance) Networking Layer IP Link Layer Physical Layer laik@cs.berkeley.edu
UDP • User Datagram Protocol • minimalistic transport protocol • same best-effort service model as IP • messages can be larger than one packet, but still limited (64KB) • uses fragmentation • provides multiplexing/demultiplexing to IP • does not provide congestion control • advantage over TCP: does not increase end-to-end delay over IP • application example: video/audio streaming laik@cs.berkeley.edu
TCP • Transmission Control Protocol • reliable, in-order, and at most once delivery • messages can be of arbitrary length • provides multiplexing/demultiplexing to IP • provides congestion control and avoidance • increases end-to-end delay over IP • e.g., file transfer, chat laik@cs.berkeley.edu
Headers • IP header used for IP routing, fragmentation, error detection… • UDP header used for multiplexing/demultiplexing, error detection • TCP header used for multiplexing/demultiplexing, flow and congestion control Receiver Sender Application Application data data TCP UDP TCP UDP TCP/UDP data TCP/UDP data IP IP IP TCP/UDP data IP TCP/UDP data laik@cs.berkeley.edu
IP Header 0 4 8 16 19 31 Version HLen TOS Length • Comments • HLen – header length only in 32-bit words (5 <= HLen <= 15) • TOS (Type of Service): Differentiated Service (6 bits) Explicit Congestion Notification (ECN) (2 bits) • Length – the length of the entire datagram/segment; header + data • Flags: Don’t Fragment (DF) and More Fragments (MF) • Protocol: identifies the transport protocol • Header checksum - uses 1’s complement Identification Flags Fragment offset 20 bytes TTL Protocol Header checksum Source address Destination address Options (variable) Payload laik@cs.berkeley.edu
Fragmentation • What happens if router has to forward an IP packet that is larger than allowed by a data link layer? • Break the IP packet into smaller IP packets and provide a way to reassemble • set “more fragments” bit in all fragments but last • set the fragment offset of fragment to be offset (in 8-byte offsets) from beginning of original packet • set the packet len to be length of this fragment laik@cs.berkeley.edu
Fragmentation Issues • Sending host had better be changing the IP ID • Loose one fragment, loose them all • Reassembly is complex • requires per packet state • Only reassemble at destination • Fragmentation can be avoided using Path Maximum Transmission Unit Discovery (PMTU) • most TCP implementations use PMTU laik@cs.berkeley.edu
UDP Header 0 16 31 Destination port Source port • Source and destination ports use port address space • UDP length is UDP packet length (including UDP header and payload, but not IP header) • Optional UDP checksum is over UDP packet • why have UDP checksum in addition to IP checksum? • why not have just the UDP checksum? • why is the UDP checksum optional? UDP length UDP checksum Payload (variable) laik@cs.berkeley.edu
Port Addressing • Need to decide which application gets which packets • Solution: map each socket to a port • Client must know server’s port • separate 16-bit port address space for UDP and TCP • (src IP, src port, dst IP, dst port) uniquely identifies TCP connection • Well known ports(0-1023): everyone agrees which services run on these ports • e.g., ssh:22, http:80 • on UNIX, must be root to gain access to these ports (why?) • ephemeral ports(most 1024-65535): given to clients • e.g. chatclient gets one of these laik@cs.berkeley.edu
TCP Header 0 4 10 16 31 Destination port Source port • Sequence number, acknowledgement, and advertised window – used by sliding-window based flow control • Flags: • SYN, FIN – establishing/terminating a TCP connection • ACK – set when Acknowledgement field is valid • URG – urgent data; Urgent Pointer says where non-urgent data starts • PUSH – don’t wait to fill segment • RESET – abort connection Sequence number Acknowledgement Advertised window HdrLen Flags Checksum Urgent pointer Options (variable) Payload (variable) laik@cs.berkeley.edu
TCP Challenges • how to provide reliable, in-order, and at most once delivery? (sliding window) • need to synchronize sender and receiver (connection establishment) • e.g., exchange initial sequence numbers • prevent sender from sending too fast for receiver (flow control) • estimate RTT for flow control and timeouts • how to initially decide on sending rate (slow start) • estimate how much bandwidth is available in network (congestion avoidance) • slow down sending rate when we were sending too fast (congestion control) laik@cs.berkeley.edu
SYN, SeqNum = x SYN and ACK, SeqNum = y and Ack = x + 1 ACK, Ack = y + 1 Connection Establishment: How it works • Three-way handshake • Goal: agree on a set of parameters: the start sequence number for each side • Starting sequence numbers are random. Server Client (initiator) ActiveOpen connect() listen() PassiveOpen accept() allocatebuffer space laik@cs.berkeley.edu
Three-way Handshake: Rationale • Three-way handshare adds 1 RTT delay • Why not just start sending data immediately? • congestion control • network could be congested • SYN = 40 bytes, Data < 1500 bytes • packets which are dropped at a link waste the bandwidth of all previous links • smaller packets waste less bandwidth • SYN acts as cheap probe of network conditions laik@cs.berkeley.edu
More Rationale • protection from denial of service (1) • attacker could use one host to fake many SRC IP address (spoofing) and send many SYNs to server • server must devote resources (e.g., buffer space) for open connections • server would run out of resources and become very slow or crash • 3-way handshake requires client to reply before server allocates significant resources • protection from denial of service (2) • client and server begin connection using well-known sequence number instead of random one • attacker guesses sequence number, inserts bogus packets into stream laik@cs.berkeley.edu
Even More Rationale • protection from delayed packets • client connects to server twice in succession using the same port • a packet from the first connection is delayed and arrives during the second connection • if sequence numbers are close, old packet could be accepted laik@cs.berkeley.edu
segment 1 segment 2 segment 3 RTT (Round Trip Time) ACK 1 ACK 2 ACK 3 segment 4 segment 5 segment 6 Flow control: Window Size and Throughput wnd = 3 • Sliding-window based flow control: • Higher window higher throughput • Throughput = wnd/RTT • Remember: window size control throughput • How to determine effective window size? • How to detect packet loss?
Effective Window Size • Receiver window (MaxRcvBuf – maximum buffer size at receiver) • Sender window (MaxSendBuf – maximum buffer size at sender) AdvertisedWindow = MaxRcvBuffer – (LastByteRcvd – LastByteRead) EffectiveWindow = AdvertisedWindow – (LastByteSent – LastByteAcked) MaxSendBuffer >= LastByteWritten - LastByteAcked Sending Application Receiving Application MaxRcvBuffer MaxSendBuffer LastByteRead LastByteWritten NextByteExpected LastByteRcvd LastByteAcked LastByteSent sequence number increases sequence number increases laik@cs.berkeley.edu
Advertised Window = 0 • Sender cannot send any data receiver will not send acks receiver cannot notify sender that advertised window has grown • Solution: TCP Persist Timer • when sender gets advertised window == 0, it sets timer • if sender receives advertised window > 0, cancels timer • when timer expires, sender sends 1 byte payload to receiver • receiver must accept data 1 byte past window • receiver sends ack for byte before 1 byte • sender gets new advertised window laik@cs.berkeley.edu
Silly Window Syndrome (SWS) advWin=w advWin = w app: send 1 • Maximum Segment Size (MSS) = w • App sends of small segments and/or receiver advertises small window • causes small packets to be sent in network • small packets have high header overhead size=1 app: send w+1 app: read 1 advWin=w size=w-1 app: read w-1 advWin=w size=1 laik@cs.berkeley.edu
SWS Solution • Sender only sends if • no unacknowledged data, (Nagle’s algorithm) or • full packet to send • Receiver only sends new advertised window if • newAdvWin – oldAdvWin > min(MSS, 0.5*maxRcvBuf) laik@cs.berkeley.edu
Set timeout • If haven’t received ack by timeout, retransmit packet after last acked packet • How to set timeout? • Too long: connection has low throughput • Too short: retransmit packet that was just delayed • packet was probably delayed because of congestion • sending another packet too soon just makes congestion worse • Solution: make timeout proportional to RTT laik@cs.berkeley.edu
RTT Estimation • Use exponential averaging: SampleRTT EstimatedRTT laik@cs.berkeley.edu Time
Problem • How to differentiate between the real ACK, and ACK of the retransmitted packet Sender Receiver Sender Receiver Original Transmission Original Transmission SampleRTT ACK SampleRTT Retransmission Retransmission ACK laik@cs.berkeley.edu
Karn/Partridge Algorithm • Measure SampleRTT only for original transmissions • Exponential backoff for each retransmission, double EstimatedRTT laik@cs.berkeley.edu
Jacobson/Karels Algorithm • Problem: exponential average is not enough • one solution: use standard deviation (requires expensive square root computation) • use mean deviation instead laik@cs.berkeley.edu
Summary • IP • routing, fragmentation • UDP • Multiplexing/demultiplexing using ports • error detection • TCP • reliable, in order, at most once delivery • Connection establishment three way handshake • RTT exponential averaging and variance • Flow control based on sliding window protocol • Congestion control next lecture laik@cs.berkeley.edu