The Transport Layer: TCP and UDP. Chap 2.
Basic Philosophy of TCP/IP • Simple core, complex edge • Edge can be hosts, edge routers, network boundaries, etc. • Why? Scalability, and flexibility to handle different complexities at the edge
Internet Architecture • Mesh of separate networks connected at exchange points • Tier 1 providers carry full Internet routing tables, with no default route • Tier 2+ providers carry a subset of the routes and point a default route at their upstream provider
IPv4, IPv6 Header Format • [Figure: IPv4 header format] • [Figure: IPv6 header format]
Extension Headers • Extension Header Order • IPv6 header • Hop-by-Hop Options header • Destination Options header (options processed by the node in the IPv6 Destination Address field and by the nodes listed in the Routing header) • Routing header • Fragment header • Authentication header • Encapsulating Security Payload header • Destination Options header (options processed only by the final destination of the packet) • Upper-layer header
Text Representation of Address • X:X:X:X:X:X:X:X (X: a 16-bit hexadecimal group) ex) FEDC:BA98:7654:3210:FEDC:BA98:7654:3210 • To make addresses containing runs of zero bits easier to write, a special syntax ("::") compresses the zeros. ex) 1080:0:0:0:8:800:200C:417A -> 1080::8:800:200C:417A = a unicast addr. • FF01:0:0:0:0:0:0:101 -> FF01::101 = a multicast addr. • 0:0:0:0:0:0:0:1 -> ::1 = loopback addr. • 0:0:0:0:0:0:0:0 -> :: = unspecified addr. • X:X:X:X:X:X:d.d.d.d for a mixed environment of IPv4 and IPv6. ex) 0:0:0:0:0:0:13.1.68.3 and 0:0:0:0:0:FFFF:129.144.52.38
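A minimal sketch of the zero-compression rule above, assuming Python and its standard `ipaddress` module (the slides name no language). It parses the slide's example addresses and derives their compressed forms; the module writes hex digits in lowercase, so the output differs from the slide only in case. The variable names are illustrative.

```python
import ipaddress

# Full-form examples taken from the slide.
examples = [
    "1080:0:0:0:8:800:200C:417A",  # unicast
    "FF01:0:0:0:0:0:0:101",        # multicast
    "0:0:0:0:0:0:0:1",             # loopback
    "0:0:0:0:0:0:0:0",             # unspecified
]

# Map each full form to its "::"-compressed form.
compressed = {text: ipaddress.IPv6Address(text).compressed for text in examples}
for full, short in compressed.items():
    print(f"{full} -> {short}")

# Mixed notation: the last 32 bits written as a dotted-decimal IPv4 address.
mapped = ipaddress.IPv6Address("0:0:0:0:0:FFFF:129.144.52.38")
embedded_v4 = mapped.ipv4_mapped  # IPv4 address carried in the low 32 bits
print(mapped.compressed, "embeds", embedded_v4)
```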
Buffer Size and Limitation • Max. size of IP datagram • IPv4: 65535 bytes including header (20 bytes) • IPv6: 65535 bytes (payload) + 40 bytes (header) • MTU (Max. Transmission Unit) • the largest payload the link can carry (1500 bytes on Ethernet) • path MTU: the smallest MTU in the path between two hosts • Fragmentation • is performed if datagram size > link MTU • In IPv4: by host or router, in IPv6: only by the sending host • DF (don't fragment) bit in IPv4 header • may be used for path MTU discovery • Min. reassembly buffer size (guaranteed by any implementation) • 576 bytes in IPv4, 1500 bytes in IPv6 • MSS (Max. Segment Size): max TCP payload size to avoid IP fragmentation • In Ethernet, MSS = MTU (1500) - IP header (20 or 40) - TCP header (20) = 1460 B (IPv4) or 1440 B (IPv6)
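The MSS arithmetic on this slide can be checked with a short sketch. The function name `mss_for_mtu` and the constant names are made up for illustration, and the header sizes assume no IP or TCP options, as the slide does.

```python
# Fixed header sizes in bytes, assuming no IPv4 options, no IPv6
# extension headers, and no TCP options.
IP_HEADER_BYTES = {4: 20, 6: 40}
TCP_HEADER_BYTES = 20

def mss_for_mtu(mtu, ip_version):
    """Largest TCP payload that fits in one unfragmented IP datagram."""
    return mtu - IP_HEADER_BYTES[ip_version] - TCP_HEADER_BYTES

ethernet_mtu = 1500
print("IPv4 MSS:", mss_for_mtu(ethernet_mtu, 4))
print("IPv6 MSS:", mss_for_mtu(ethernet_mtu, 6))
```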
UDP • connection-less • datagram service • lack of reliability • No flow control, no congestion control • Supports multicasting • Less overhead than TCP • UDP user datagram format: Source Port, Destination Port, Length, UDP Checksum, Data
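As a rough illustration of the user datagram format, the sketch below packs and unpacks the four 16-bit header fields with Python's `struct` module. The helper names are hypothetical, and the checksum is left at 0 (which in IPv4 means "not computed") rather than calculated over the pseudo-header, so this is not a complete UDP implementation.

```python
import struct

# Four 16-bit fields in network byte order:
# source port, destination port, length, checksum.
UDP_HEADER = struct.Struct("!HHHH")

def build_udp_datagram(src_port, dst_port, payload, checksum=0):
    """Prepend an 8-byte UDP header; Length covers header plus data."""
    length = UDP_HEADER.size + len(payload)
    return UDP_HEADER.pack(src_port, dst_port, length, checksum) + payload

def parse_udp_datagram(datagram):
    """Split a datagram back into its header fields and data."""
    src_port, dst_port, length, checksum = UDP_HEADER.unpack_from(datagram)
    data = datagram[UDP_HEADER.size:length]
    return src_port, dst_port, length, checksum, data

datagram = build_udp_datagram(1500, 53, b"hello")
print(parse_udp_datagram(datagram))
```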
TCP Overview • Connection-oriented • Byte-stream • sending process writes some number of bytes • TCP breaks into segments and sends via IP • receiving process reads some number of bytes • Full duplex • Flow control: keep sender from overrunning receiver • Congestion control: keep sender from overrunning network
TCP Fundamental Objectives • Deliver data in sequence to receiver • Fill pipe from sender to receiver • avoid congestion at receiver and in network • ACK-driven • Sending rate clocked to arrival of ACKs • Impacted by • Large (bandwidth x delay) networks: more data must be in flight to fill the pipe • Packet loss: must recover while keeping the pipe full
TCP Connection Setup and Teardown • Establishment: Three-Way Handshake • Termination
Sliding Window • Each byte has a sequence number • ACKs are cumulative • Sending side: LastByteAcked <= LastByteSent <= LastByteWritten • Bytes between LastByteAcked and LastByteWritten must be buffered • Receiving side: NextByteRead < NextByteExpected <= LastByteRcvd + 1 • Bytes between NextByteRead and LastByteRcvd must be buffered
Keeping the Pipe Full • Wrap Around: 32-bit SequenceNum • Bandwidth & Time Until Wrap Around • Bytes in Transit: 16-bit AdvertisedWindow (< 64KB) • Bandwidth & Delay x Bandwidth Product

Bandwidth            Time Until Wrap Around    Delay x Bandwidth (RTT = 100ms)
T1 (1.5Mbps)         6.4 hours                 18KB
Ethernet (10Mbps)    57 minutes                122KB
T3 (45Mbps)          13 minutes                549KB
FDDI (100Mbps)       6 minutes                 1.2MB
STS-3 (155Mbps)      4 minutes                 1.8MB
STS-12 (622Mbps)     55 seconds                7.4MB
STS-24 (1.2Gbps)     28 seconds                14.8MB
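The table entries follow from two small formulas, sketched here in Python. The function names are illustrative; the sketch assumes the 32-bit sequence space counts bytes and that KB and MB in the table are binary units (1024-based), which reproduces the slide's figures to within rounding.

```python
SEQ_SPACE_BYTES = 2 ** 32  # 32-bit SequenceNum, one number per byte

def wrap_time_seconds(bandwidth_bps):
    """Time to send 2^32 bytes at the given link rate (bits per second)."""
    return SEQ_SPACE_BYTES * 8 / bandwidth_bps

def delay_bandwidth_bytes(bandwidth_bps, rtt_seconds=0.1):
    """Bytes that must be in flight to keep the pipe full for one RTT."""
    return bandwidth_bps * rtt_seconds / 8

for name, bps in [("T1", 1.5e6), ("Ethernet", 10e6), ("STS-12", 622e6)]:
    wrap = wrap_time_seconds(bps)
    bdp_kib = delay_bandwidth_bytes(bps) / 1024
    print(f"{name}: wraps in {wrap:.0f} s, delay x bandwidth {bdp_kib:.0f} KB")
```

Every link faster than T1 in the table needs more than 64KB in flight at a 100 ms RTT, which is more than a 16-bit AdvertisedWindow can express.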
RTO Estimation for Adaptive Retransmission • Jacobson/Karels Algorithm • New calculation for average RTT • Diff = SampleRTT - EstimatedRTT • EstimatedRTT = EstimatedRTT + (δ x Diff) • Deviation = Deviation + δ (|Diff| - Deviation) • where δ is a fraction between 0 and 1 (1/8) • Consider variance when setting timeout value • RTO = μ x EstimatedRTT + φ x Deviation • where μ = 1 and φ = 4
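A minimal sketch of the Jacobson/Karels update, assuming the usual symbols δ = 1/8, μ = 1 and φ = 4 for the gain and the two timeout weights. The class name `RtoEstimator` and the choice to pass the initial EstimatedRTT and Deviation to the constructor are assumptions for illustration; real stacks also clamp the RTO between minimum and maximum values, which is omitted here.

```python
class RtoEstimator:
    """Tracks a smoothed RTT and its mean deviation to derive the RTO."""

    def __init__(self, estimated_rtt, deviation, delta=1 / 8, mu=1.0, phi=4.0):
        self.estimated_rtt = estimated_rtt
        self.deviation = deviation
        self.delta = delta  # gain applied to each new sample
        self.mu = mu        # weight of the smoothed RTT in the timeout
        self.phi = phi      # weight of the deviation in the timeout

    def update(self, sample_rtt):
        """Fold one RTT sample into the estimates and return the new RTO."""
        diff = sample_rtt - self.estimated_rtt
        self.estimated_rtt += self.delta * diff
        self.deviation += self.delta * (abs(diff) - self.deviation)
        return self.rto()

    def rto(self):
        return self.mu * self.estimated_rtt + self.phi * self.deviation

# Times in milliseconds: a stable 100 ms path sees one slow 180 ms sample.
estimator = RtoEstimator(estimated_rtt=100.0, deviation=0.0)
first_rto = estimator.update(180.0)
print(estimator.estimated_rtt, estimator.deviation, first_rto)
```

One outlying sample moves the average only slightly (100 to 110 ms) but raises the timeout much more (100 to 150 ms), which is the point of including the deviation term.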
TCP Extensions • Implemented as header options • Store timestamp in outgoing segments • Use 32-bit timestamp to extend sequence space (PAWS) • Shift (scale) advertised window
TCP Congestion Control • Congestion control prevents a sender from overrunning the capacity of the network (e.g. links and routers) • TCP adapts sender's rate to network capacity and attempts to avoid potential congestion situations • Basic congestion control mechanisms that TCP supports are: • Slow start • Congestion avoidance • Fast retransmission • Fast recovery
TCP Slow-Start • Old TCP would "blast" a full advertised window's worth of segments into the network at connection startup, overrunning buffers in routers, links and hosts • TCP Slow-Start avoids this problem by sending a few packets at the beginning, waiting for the ACKs, and then gradually increasing the number of packets sent into the network • Slow-Start is invoked at connection setup (initial window), at restart after a long idle period (restart window), or at restart after a retransmit timeout (loss window)
TCP Congestion Control • TCP probes for congestion by sending more packets into the network until a timeout occurs or duplicate ACKs are received • If congestion occurs, the TCP sender(s) must reduce the amount of data sent into the network • Congestion avoidance operation: • Define a new state variable at the sender, Slow-Start Threshold (SSTHRESH) • When TCP detects congestion (time-out or duplicate ACK), set SSTHRESH = one-half of the current window size, and set CWND = 1 segment if a time-out occurred • TCP then Slow-Starts (exponential increase) up to SSTHRESH, and above it increases the window by at most one segment per round-trip time (each ACK adds MSS*MSS/CWND bytes). This is a linear increase • TCP Slow-Start and congestion avoidance are implemented together
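The combined slow-start and congestion-avoidance rules can be sketched as a small state machine. This is an illustrative model only: the class name `CongestionWindow` is made up, the window is kept in bytes (so "CWND = 1" becomes one MSS), and fast retransmit, fast recovery and the receiver's advertised window are left out.

```python
class CongestionWindow:
    """Byte-based CWND that grows per ACK and collapses on a timeout."""

    def __init__(self, mss, ssthresh):
        self.mss = mss
        self.cwnd = mss          # start with one segment
        self.ssthresh = ssthresh

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            # Slow start: one MSS per ACK, doubling CWND every round trip.
            self.cwnd += self.mss
        else:
            # Congestion avoidance: MSS*MSS/CWND per ACK, which adds up to
            # about one MSS per round trip (linear increase).
            self.cwnd += max(1, self.mss * self.mss // self.cwnd)

    def on_timeout(self):
        # Remember half the current window, then restart from one segment.
        self.ssthresh = self.cwnd // 2
        self.cwnd = self.mss

window = CongestionWindow(mss=1000, ssthresh=4000)
trace = []
for _ in range(4):
    window.on_ack()
    trace.append(window.cwnd)
window.on_timeout()
print(trace, window.cwnd, window.ssthresh)
```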
TCP Fast Retransmit and Fast Recovery • If the TCP receiver receives a segment out of order, it resends the ACK (duplicate ACK) for the last segment correctly received in order • Fast retransmission operation • If the TCP sender receives three duplicate ACKs in a row, this is a strong indication that a segment was lost • TCP sender retransmits the lost segment • This avoids having to wait for a time-out to resend the lost segment • Fast recovery operation • The fact that the TCP receiver is generating duplicate ACKs means that other segments have been received. This suggests that data is continuing to flow between the TCP sender and receiver • TCP sender is allowed to send one segment per duplicate ACK even if this exceeds the current window size • After fast retransmission, TCP performs congestion avoidance instead of slow-start • This avoids the throughput reduction associated with an initial slow-start
TCP Performance Objectives • Fill pipe with as many outstanding segments as possible before receiving an ACK • Delay (or RTT) x Bandwidth (or Throughput) <= 64KB with the 16-bit advertised window (no window scaling) • Minimize time in slow-start, or avoid it altogether • Recover from packet loss(es) and maintain the ACK clock without experiencing an RTO
TIME_WAIT State • MSL: maximum segment lifetime • 30 sec in BSD-derived implementations • 2 min in RFC 1122 • Reasons for the TIME_WAIT state (waiting for 2MSL) • to implement TCP’s full-duplex connection termination reliably • handles packets lost or duplicated during termination • to allow old duplicate segments to expire in the network • ensures that packets from the previous session are gone before the next TCP connection between the same hosts and ports is established
Association • Association • {protocol, local address, local port, foreign address, foreign port} • Socket = {address, port} • Notation: <IP address>:<port number> • Socket pair in TCP • uniquely identify every TCP connection in the Internet • (local IP address, local TCP port, foreign IP address, foreign TCP port) • Wildcard local address (e.g. *:21) • any choice of local addresses (INADDR_ANY)
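To make the socket pair and the wildcard notation concrete, the sketch below models how an arriving segment could be matched to a socket: a fully specified pair wins over a wildcard listener such as *:21. All names (`SocketPair`, `demultiplex`) and the addresses (taken from the documentation ranges 192.0.2.0/24 and 198.51.100.0/24) are made up for illustration; real kernels use their own lookup structures.

```python
from typing import NamedTuple, Optional

WILDCARD_ADDR = "*"  # any local or foreign address (cf. INADDR_ANY)
WILDCARD_PORT = 0    # any port

class SocketPair(NamedTuple):
    local_addr: str
    local_port: int
    foreign_addr: str
    foreign_port: int

_WILDCARDS = (WILDCARD_ADDR, WILDCARD_PORT, WILDCARD_ADDR, WILDCARD_PORT)

def demultiplex(sockets, segment) -> Optional[SocketPair]:
    """Return the most specific socket matching the segment, or None.

    `segment` is written from the receiver's view: local = destination,
    foreign = source.
    """
    best, best_score = None, -1
    for sock in sockets:
        score = 0
        for bound, actual, wildcard in zip(sock, segment, _WILDCARDS):
            if bound == wildcard:
                continue          # wildcard matches anything, scores nothing
            if bound != actual:
                score = -1        # a specified field disagrees: no match
                break
            score += 1
        if score > best_score:
            best, best_score = sock, score
    return best

listener = SocketPair(WILDCARD_ADDR, 21, WILDCARD_ADDR, WILDCARD_PORT)
connected = SocketPair("192.0.2.1", 21, "198.51.100.7", 1500)
sockets = [listener, connected]

existing = demultiplex(sockets, SocketPair("192.0.2.1", 21, "198.51.100.7", 1500))
new_conn = demultiplex(sockets, SocketPair("192.0.2.1", 21, "198.51.100.7", 1501))
no_match = demultiplex(sockets, SocketPair("192.0.2.1", 80, "198.51.100.7", 1500))
print(existing, new_conn, no_match)
```

Because the full four-tuple is compared, a second connection from the same client on port 1501 does not collide with the one on port 1500, which is what lets a concurrent server handle both on local port 21.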
TCP Port Numbers and Concurrent Servers (1) • [Figure: local and foreign {address, port} pairs for sockets on hosts 206.168.112.219 and 12.106.32.254, using ports 21 and 1500]
TCP Port Numbers and Concurrent Servers (2) • [Figure: local and foreign {address, port} pairs for sockets on hosts 206.168.112.219 and 12.106.32.254, using ports 21, 1500 and 1501]
TCP Output • Successful return from write means that we can reuse the application buffer; it does not mean the peer has received the data • TCP must keep a copy of our data in the socket send buffer until it is ACKed by the peer
UDP Output • Successful return from write means that the datagram or fragments of the datagram have been added to the datalink output queue
Standard Internet Services • See /etc/services • Services run on TCP or UDP • Distinguished by protocol and port number • A service name is a symbolic name for a port number