EECS 122: Introduction to Computer Networks Congestion Control Computer Science Division Department of Electrical Engineering and Computer Sciences University of California, Berkeley Berkeley, CA 94720-1776
Today’s Lecture: 10 [Figure: the five-layer protocol stack — Application, Transport, Network (IP), Link, Physical — annotated with the lecture numbers that cover each layer]
Big Picture • Where do IP routers belong? [Figure: taxonomy of communication networks — Communication Network splits into Switched and Broadcast; Switched splits into Packet-Switched and Circuit-Switched; Packet-Switched splits into Virtual Circuit and Datagram networks]
Packet (Datagram) Switching Properties • Expensive forwarding • Forwarding table size depends on number of different destinations • Must lookup in forwarding table for every packet • Robust • Link and router failure may be transparent for end-hosts • High bandwidth utilization • Statistical multiplexing • No service guarantees • Network allows hosts to send more packets than available bandwidth → congestion → dropped packets
Virtual Circuit (VC) Switching • Packets not switched independently • Establish virtual circuit before sending data • Forwarding table entry • (input port, input VCI, output port, output VCI) • VCI – Virtual Circuit Identifier • Each packet carries a VCI in its header • Upon a packet arrival at interface i • Input port uses i and the packet’s VCI v to find the routing entry (i, v, i’, v’) • Replaces v with v’ in the packet header • Forwards packet to output port i’
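The per-hop lookup-and-rewrite step described above can be sketched in a few lines of Python. The table contents here are hypothetical examples, not taken from the lecture's figure:

```python
# Sketch of VC forwarding: each switch maps (input port, input VCI)
# to (output port, output VCI) and rewrites the VCI in the header.
# Table entries below are illustrative, hypothetical values.

forwarding_table = {
    # (in_port, in_vci): (out_port, out_vci)
    (1, 7): (4, 1),
    (2, 11): (3, 7),
}

def forward(in_port, packet):
    """Look up the VC entry (i, v, i', v'), rewrite the VCI, return the output port."""
    out_port, out_vci = forwarding_table[(in_port, packet["vci"])]
    packet["vci"] = out_vci   # replace v with v' in the header
    return out_port           # forward on output port i'

pkt = {"vci": 7, "payload": b"data"}
port = forward(1, pkt)        # entry (1, 7) -> (4, 1)
```

Note that the table is keyed by (port, VCI) rather than by destination address, which is why its size depends on the number of circuits, not the number of destinations.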
VC Forwarding: Example [Figure: a source-to-destination path through three switches; each switch's forwarding table maps (in port, in-VCI) to (out port, out-VCI), e.g. (in 1, VCI 7) → (out 4, VCI 1), so the packet's VCI is rewritten at every hop]
VC Forwarding (cont’d) • A signaling protocol is required to set up the state for each VC in the routing table • A source needs to wait for one RTT (round trip time) before sending the first data packet • Can provide per-VC QoS • When we set the VC, we can also reserve bandwidth and buffer resources along the path
VC Switching Properties • Less expensive forwarding • Forwarding table size depends on number of different circuits • Must lookup in forwarding table for every packet • Much higher delay for short flows • 1 RTT delay for connection setup • Less Robust • End host must spend 1 RTT to establish new connection after link and router failure • Flexible service guarantees • Either statistical multiplexing or resource reservations
Circuit Switching • Packets not switched independently • Establish circuit before sending data • Circuit is a dedicated path from source to destination • E.g., old style telephone switchboard, where establishing circuit means connecting wires in all the switches along path • E.g., modern dense wave division multiplexing (DWDM) form of optical networking, where establishing circuit means reserving an optical wavelength in all switches along path • No forwarding table
Circuit Switching Properties • Cheap forwarding • No table lookup • Much higher delay for short flows • 1 RTT delay for connection setup • Less robust • End host must spend 1 RTT to establish new connection after link and router failure • Must use resource reservations
Summary • Routers • Key building blocks of today’s networks in general, and the Internet in particular • Main functionalities implemented by a router • Packet forwarding • Buffer management • Packet scheduling • Packet classification • Forwarding techniques • Datagram (packet) switching • Virtual circuit switching • Circuit switching
Starting New Lecture Congestion Control
What We Know We know: • How to process packets in a switch • How to route packets in the network • How to send packets reliably We don’t know: • How fast to send
What’s at Stake? • Send too slow: link is not fully utilized • wastes time • Send too fast: link is fully utilized but.... • queue builds up in router buffer (delay) • overflow buffers in routers • overflow buffers in receiving host (ignore) • Why are buffer overflows a problem? • packet drops (mine and others) • Interesting history....(Van Jacobson rides to the rescue)
Abstract View • We ignore the internal structure of the router and model it as having a single queue for a particular input-output pair [Figure: sending host A → buffer in router → receiving host B]
Three Congestion Control Problems • Adjusting to bottleneck bandwidth • Adjusting to variations in bandwidth • Sharing bandwidth between flows
Single Flow, Fixed Bandwidth • Adjust rate to match bottleneck bandwidth • without any a priori knowledge • could be a gigabit link, could be a modem [Figure: A → 100 Mbps bottleneck link → B]
Single Flow, Varying Bandwidth • Adjust rate to match instantaneous bandwidth • assuming you have a rough idea of the bandwidth [Figure: A → link with time-varying bandwidth BW(t) → B]
Multiple Flows Two Issues: • Adjust total sending rate to match bandwidth • Allocation of bandwidth between flows [Figure: three flows A1→B1, A2→B2, A3→B3 sharing a 100 Mbps link]
Reality Congestion control is a resource allocation problem involving many flows, many links, and complicated global dynamics
General Approaches • Send without care • many packet drops • not as stupid as it seems • Reservations • pre-arrange bandwidth allocations • requires negotiation before sending packets • low utilization • Pricing • don’t drop packets for the high-bidders • requires payment model
General Approaches (cont’d) • Dynamic Adjustment • probe network to test level of congestion • speed up when no congestion • slow down when congestion • suboptimal, messy dynamics, simple to implement • All three techniques have their place • but for generic Internet usage, dynamic adjustment is the most appropriate • due to pricing structure, traffic characteristics, and good citizenship
TCP Congestion Control • TCP connection has window • controls number of unacknowledged packets • Sending rate: ~Window/RTT • Vary window size to control sending rate
Congestion Window (cwnd) • Limits how much data can be in transit • Implemented as # of bytes • Described as # of packets in this lecture • MaxWindow = min(cwnd, AdvertisedWindow) • EffectiveWindow = MaxWindow – (LastByteSent – LastByteAcked) [Figure: byte stream with sequence numbers increasing left to right; LastByteAcked and LastByteSent delimit the bytes in flight within MaxWindow, and EffectiveWindow is the remaining unused portion]
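The two window formulas on this slide translate directly into code (variable names follow the slide; the concrete byte counts in the example are made up for illustration):

```python
def effective_window(cwnd, advertised_window, last_byte_sent, last_byte_acked):
    """How many more bytes the sender may put in flight right now."""
    max_window = min(cwnd, advertised_window)      # MaxWindow
    in_flight = last_byte_sent - last_byte_acked   # unacknowledged bytes
    return max(0, max_window - in_flight)          # EffectiveWindow

# e.g. cwnd = 8000 bytes, receiver advertises 6000, 2000 bytes unacked:
room = effective_window(8000, 6000, 12000, 10000)  # -> 4000 bytes
```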
Two Basic Components • Detecting congestion • Rate adjustment algorithm • depends on congestion or not • three subproblems within adjustment problem • finding fixed bandwidth • adjusting to bandwidth variations • sharing bandwidth
Detecting Congestion • Packet dropping is best sign of congestion • delay-based methods are hard and risky • How do you detect packet drops? ACKs • TCP uses ACKs to signal receipt of data • ACK denotes last contiguous byte received • actually, ACKs indicate next segment expected • Two signs of packet drops • No ACK after certain time interval: time-out • Several duplicate ACKs (ignore for now)
Rate Adjustment • Basic structure: • Upon receipt of ACK (of new data): increase rate • Upon detection of loss: decrease rate • But what increase/decrease functions should we use? • Depends on what problem we are solving
Problem #1: Single Flow, Fixed BW • Want to get a first-order estimate of the available bandwidth • Assume bandwidth is fixed • Ignore presence of other flows • Want to start slow, but rapidly increase rate until packet drop occurs (“slow-start”) • Adjustment: • cwnd initially set to 1 • cwnd++ upon receipt of ACK
Slow-Start • cwnd increases exponentially: cwnd doubles every time a full cwnd of packets has been sent • Each ACK releases two packets • Slow-start is called “slow” because of its starting point [Figure: timeline — with cwnd = 1, send segment 1; each ACK increments cwnd, so cwnd = 2 releases segments 2–3, cwnd = 4 releases segments 4–7, and cwnd reaches 8 after three RTTs]
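The doubling behavior can be checked with a tiny simulation: incrementing cwnd once per ACK, with a full window of ACKs arriving each RTT, doubles cwnd every RTT. This is a sketch of just the growth rule, not the full TCP state machine:

```python
def slow_start(rtts):
    """cwnd (in packets) after a given number of RTTs, starting from 1,
    with cwnd += 1 per ACK and a full cwnd of segments ACKed each RTT."""
    cwnd = 1
    for _ in range(rtts):
        acks = cwnd        # every segment in the window gets ACKed
        cwnd += acks       # +1 per ACK, so cwnd doubles each RTT
    return cwnd

print([slow_start(r) for r in range(4)])  # [1, 2, 4, 8]
```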
Problems with Slow-Start • Slow-start can result in many losses • roughly on the order of cwnd ≈ BW*RTT packets • Example: • at some point, cwnd is just enough to fill the “pipe” • after another RTT, cwnd is double its previous value • all the excess packets are dropped! • Therefore, we need a gentler adjustment algorithm once we have a rough estimate of the bandwidth
Problem #2: Single Flow, Varying BW • Want to be able to track available bandwidth, oscillating around its current value • Possible variations: (in terms of RTTs) • multiplicative increase or decrease: cwnd → a*cwnd • additive increase or decrease: cwnd → cwnd + b • Four alternatives: • AIAD: gentle increase, gentle decrease • AIMD: gentle increase, drastic decrease • MIAD: drastic increase, gentle decrease (too many losses) • MIMD: drastic increase and decrease
Problem #3: Multiple Flows • Want steady state to be “fair” • Many notions of fairness, but here all we require is that two identical flows end up with the same bandwidth • This eliminates MIMD and AIAD • AIMD is the only remaining solution!
Buffer and Window Dynamics [Figure: single flow with rate x from A to B through a bottleneck of capacity C = 50 pkts/RTT; the trace of x over time forms a sawtooth] • No congestion → x increases by one packet/RTT every RTT • Congestion → decrease x by factor of 2
AIMD Sharing Dynamics [Figure: two flows with rates x (A→B) and y (D→E) sharing a bottleneck] • No congestion → rate increases by one packet/RTT every RTT • Congestion → decrease rate by factor of 2 • Rates equalize → fair share
AIAD Sharing Dynamics [Figure: same two-flow topology, rates x and y] • No congestion → x increases by one packet/RTT every RTT • Congestion → decrease x by 1
AIMD [Figure: phase plot of (x, y) against the capacity line x + y = C — the trajectory converges to the fairness line] • Limit rates: x = y
AIAD [Figure: phase plot of (x, y) against the capacity line x + y = C — the trajectory moves parallel to the fairness line and never reaches it] • Limit rates: x and y depend on initial values
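A simple two-flow simulation makes the difference between the two phase plots concrete: AIMD converges to a fair share, while AIAD preserves whatever gap the flows started with. The capacity and starting rates below are illustrative:

```python
def simulate(x, y, capacity, multiplicative_decrease, rounds=200):
    """Two flows increase additively by 1 packet/RTT; on congestion
    (x + y > capacity) both back off, either by halving (AIMD)
    or by subtracting 1 (AIAD)."""
    for _ in range(rounds):
        if x + y > capacity:              # congestion signal
            if multiplicative_decrease:
                x, y = x / 2, y / 2       # AIMD: decrease by factor 2
            else:
                x, y = x - 1, y - 1       # AIAD: decrease by 1
        else:
            x, y = x + 1, y + 1           # additive increase
    return x, y

# start unfairly: x = 40, y = 10, capacity C = 50 packets/RTT
aimd = simulate(40, 10, 50, multiplicative_decrease=True)
aiad = simulate(40, 10, 50, multiplicative_decrease=False)
# AIMD: the gap halves at every congestion event, so rates equalize;
# AIAD: the 30 packet/RTT gap never closes
```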
Implementing AIMD • After each ACK • increment cwnd by 1/cwnd (cwnd += 1/cwnd) • as a result, cwnd increases by one only after a full cwnd of segments has been acknowledged • But need to decide when to leave slow-start and enter AIMD • use ssthresh variable
Slow Start/AIMD Pseudocode Initially: cwnd = 1; ssthresh = infinite; New ack received: if (cwnd < ssthresh) /* Slow Start*/ cwnd = cwnd + 1; else /* Congestion Avoidance */ cwnd = cwnd + 1/cwnd; Timeout: /* Multiplicative decrease */ ssthresh = cwnd/2; cwnd = 1;
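The pseudocode above maps almost line-for-line onto Python. As in the lecture, cwnd is counted in packets and ssthresh starts at infinity; fast retransmit is not included here:

```python
class TcpSender:
    """Minimal slow start / congestion avoidance state machine,
    following the lecture pseudocode."""

    def __init__(self):
        self.cwnd = 1.0
        self.ssthresh = float("inf")

    def on_new_ack(self):
        if self.cwnd < self.ssthresh:   # slow start: cwnd + 1 per ACK
            self.cwnd += 1
        else:                           # congestion avoidance (AIMD)
            self.cwnd += 1 / self.cwnd

    def on_timeout(self):               # multiplicative decrease
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0

s = TcpSender()
for _ in range(5):
    s.on_new_ack()                      # slow start: cwnd grows 1 -> 6
s.on_timeout()                          # ssthresh = 3, cwnd back to 1
```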
The big picture (with timeouts) [Figure: cwnd vs. time — slow start grows cwnd up to ssthresh, then AIMD takes over; each timeout sets ssthresh to half the current cwnd and restarts slow start from cwnd = 1]
Congestion Detection Revisited • Wait for Retransmission Time Out (RTO) • RTO kills throughput • In BSD TCP implementations, RTO is usually more than 500ms • the granularity of RTT estimate is 500 ms • retransmission timeout is RTT + 4 * mean_deviation • Solution: Don’t wait for RTO to expire
Fast Retransmits • Resend a segment after 3 duplicate ACKs • a duplicate ACK means that an out-of-sequence segment was received • Notes: • ACKs are for the next expected packet • packet reordering can cause duplicate ACKs • window may be too small to get enough duplicate ACKs [Figure: a segment is lost; each later segment that arrives triggers a duplicate ACK for the missing one, and after 3 duplicate ACKs the sender retransmits it without waiting for the RTO]
Fast Recovery: After a Fast Retransmit • ssthresh = cwnd / 2 • cwnd = ssthresh • instead of setting cwnd to 1, cut cwnd in half (multiplicative decrease) • for each dup ack arrival • dupack++ • MaxWindow = min(cwnd + dupack, AdvWin) • indicates packet left network, so we may be able to send more • receive ack for new data (beyond initial dup ack) • dupack = 0 • exit fast recovery • But when RTO expires still do cwnd = 1
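The fast retransmit / fast recovery rules from this slide can be layered onto the sender state. This is a sketch with cwnd counted in packets; the class and parameter names are assumptions for illustration:

```python
class FastRecoverySender:
    """Sketch of fast retransmit + fast recovery on top of AIMD.
    adv_win stands in for the receiver's advertised window."""

    def __init__(self, cwnd=10.0, adv_win=100.0):
        self.cwnd = cwnd
        self.ssthresh = float("inf")
        self.adv_win = adv_win
        self.dupacks = 0

    def on_dup_ack(self):
        self.dupacks += 1
        if self.dupacks == 3:           # fast retransmit triggers here
            self.ssthresh = self.cwnd / 2
            self.cwnd = self.ssthresh   # cut cwnd in half, don't reset to 1
        # each dup ack means a packet left the network, inflating the window
        return min(self.cwnd + self.dupacks, self.adv_win)  # MaxWindow

    def on_new_ack(self):               # ack for data beyond the hole
        self.dupacks = 0                # exit fast recovery
        self.cwnd += 1 / self.cwnd      # resume additive increase

    def on_timeout(self):               # an RTO still resets cwnd to 1
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
```

The window inflation (`cwnd + dupacks`) is what lets the sender keep transmitting during recovery instead of stalling until the retransmission is acknowledged.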
Fast Retransmit and Fast Recovery • Retransmit after 3 duplicate ACKs • Prevents expensive timeouts • Reduces slow starts • At steady state, cwnd oscillates around the optimal window size [Figure: cwnd vs. time — slow start, then an AIMD sawtooth; fast retransmit halves cwnd instead of resetting it to 1]
TCP Congestion Control Summary • Measure available bandwidth • slow start: fast, hard on network • AIMD: slow, gentle on network • Detecting congestion • timeout based on RTT • robust, causes low throughput • Fast Retransmit: avoids timeouts when few packets lost • can be fooled, maintains high throughput • Recovering from loss • Fast recovery: don’t set cwnd=1 with fast retransmits
Issues to Think About • What about short flows? (setting initial cwnd) • most flows are short • most bytes are in long flows • How does this work over wireless links? • packet reordering fools fast retransmit • loss is not always congestion related • High speeds? • to sustain 10 Gbps, TCP can afford roughly one packet loss every 90 minutes! • Why are losses bad? • Tornado codes can reconstruct data in proportion to the packets that get through. Why not send at the maximal rate? • Fairness: how do flows with different RTTs share a link?
Bonus Question • Why is TCP like Blanche DuBois? • Because it “relies on the kindness of strangers...” • What happens if not everyone cooperates?