The Internet’s architecture for managing congestion Damon Wischik, UCL www.wischik.com/damon
Some Internet History
• 1974: First draft of TCP/IP [“A protocol for packet network interconnection”, Vint Cerf and Robert Kahn]
• 1983: ARPANET switches on TCP/IP
• 1986: Congestion collapse
• 1988: Congestion control for TCP [“Congestion avoidance and control”, Van Jacobson]
[“A Brief History of the Internet”, the Internet Society]
End-to-end control
Internet congestion is controlled by the end-systems. The network operates as a dumb pipe. [“End-to-end arguments in system design” by Saltzer, Reed, Clark, 1981]
The TCP algorithm, running on the server, decides how fast to send data.
[Diagram: the user sends a request to the server (TCP); the server sends data to the user; the user returns acknowledgements to the server]
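TCP

    // Excerpt from a TCP sender's ACK handler; seqno is the sequence number being acknowledged.
    if (seqno > _last_acked) {                  // the ACK covers new data
        if (!_in_fast_recovery) {               // normal operation: grow the window and keep sending
            _last_acked = seqno;
            _dupacks = 0;
            inflate_window();
            send_packets(now);
            _last_sent_time = now;
            return;
        }
        if (seqno < _recover) {                 // partial ACK during fast recovery
            uint32_t new_data = seqno - _last_acked;
            _last_acked = seqno;
            if (new_data < _cwnd)               // deflate the window by the amount newly acknowledged
                _cwnd -= new_data;
            else
                _cwnd = 0;
            _cwnd += _mss;                      // then add back one segment
            retransmit_packet(now);
            send_packets(now);
            return;
        }
        // The ACK reaches _recover or beyond: leave fast recovery.
        uint32_t flightsize = _highest_sent - seqno;
        _cwnd = min(_ssthresh, flightsize + _mss);
        _last_acked = seqno;
        _dupacks = 0;
        _in_fast_recovery = false;
        send_packets(now);
        return;
    }
    if (_in_fast_recovery) {                    // duplicate ACK while recovering: inflate by one segment
        _cwnd += _mss;
        send_packets(now);
        return;
    }
    _dupacks++;                                 // duplicate ACK outside fast recovery
    if (_dupacks != 3) {
        send_packets(now);
        return;
    }
    // Third duplicate ACK: halve the threshold, retransmit, and enter fast recovery.
    _ssthresh = max(_cwnd / 2, (uint32_t)(2 * _mss));
    retransmit_packet(now);
    _cwnd = _ssthresh + 3 * _mss;
    _in_fast_recovery = true;
    _recover = _highest_sent;
    }                                           // closes the enclosing function, whose header is not shown on the slide

[Plot: traffic rate (0-100 kB/sec) against time (0-8 sec)]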
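The excerpt above calls `inflate_window()` without showing it. A minimal sketch of what such a function might do, assuming standard slow start and congestion avoidance, is given here. The `RenoWindow` struct and its byte-based fields are hypothetical stand-ins for the sender state, not the original code.

```cpp
#include <cstdint>

// Hypothetical stand-in for the sender state on the slide; all fields in bytes.
struct RenoWindow {
    uint32_t cwnd;      // congestion window
    uint32_t ssthresh;  // slow-start threshold
    uint32_t mss;       // maximum segment size
};

// Assumed behaviour of inflate_window(): called once for each ACK of new data.
void inflate_window(RenoWindow& w) {
    if (w.cwnd < w.ssthresh) {
        // Slow start: one MSS per ACK, so the window roughly doubles per round trip.
        w.cwnd += w.mss;
    } else {
        // Congestion avoidance: mss*mss/cwnd per ACK, about one MSS per round trip.
        uint32_t inc = static_cast<uint32_t>(
            (static_cast<uint64_t>(w.mss) * w.mss) / w.cwnd);
        w.cwnd += (inc > 0 ? inc : 1);  // always make at least one byte of progress
    }
}
```

Together with the halving on the third duplicate ACK, this additive increase is what would produce a sawtooth in a traffic-rate plot.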
How TCP shares capacity
[Plot against time: individual flow bandwidths, sum of flow bandwidths, and available bandwidth]
Motivation: buffer size
• Internet routers have buffers, to accommodate bursts in traffic.
• How big do the buffers need to be?
  • 3 GByte? (Rule of thumb: what Cisco does today)
  • 300 MByte? [Appenzeller, Keslassy, McKeown, 2004]
  • 30 kByte?
• Large buffers are unsustainable:
  • Data volumes double every 10 months
  • CPU speeds double every 18 months
  • Memory access speeds double every 10 years
Motivation: TCP’s teleology [Kelly, Maulloo, Tan, 1998]
• Consider several TCP flows sharing a single link
• Let $x_r$ be the mean bandwidth of flow $r$ [pkts/sec]. Let $y$ be the total bandwidth of all flows [pkts/sec]. Let $C$ be the total available capacity [pkts/sec]
• TCP and the network act so as to solve: maximise $\sum_r U(x_r) - P(y,C)$ over $x_r \ge 0$, where $y = \sum_r x_r$
[Plots: $U(x)$ against $x$; $P(y,C)$ against $y$, with $C$ marked]
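To see what the maximisation above implies for an individual flow, one can write out its first-order condition. This sketch assumes $U$ and $P$ are differentiable and that the optimum has every $x_r > 0$. The specific utility $U(x) = -2/(\mathrm{RTT}^2 x)$ and the reading of $\partial P/\partial y$ as a loss probability $p$ are illustrative assumptions, not taken from the slide.

```latex
\begin{align*}
\frac{\partial}{\partial x_r}\Big[\sum_s U(x_s) - P(y,C)\Big] = 0
  &\;\Longrightarrow\; U'(x_r) = \frac{\partial P}{\partial y}(y,C) =: p, \\
U(x) = -\frac{2}{\mathrm{RTT}^2\,x}
  &\;\Longrightarrow\; \frac{2}{\mathrm{RTT}^2\,x_r^2} = p
  \;\Longrightarrow\; x_r = \frac{1}{\mathrm{RTT}}\sqrt{\frac{2}{p}}.
\end{align*}
```

Under this assumed utility every flow sees the same marginal penalty $p$, and a flow with a larger RTT settles at a lower rate.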
Bad teleology
• $U(x)$: severe penalty for allocating too little bandwidth; little extra value attached to high-bandwidth flows
[Plot: $U(x)$ against $x$]
Bad teleology
• $U(x)$: flows with large RTT are satisfied with little bandwidth; flows with small RTT want more bandwidth
[Plot: $U(x)$ against $x$]
Bad teleology
• $P(y,C)$: no penalty unless links are overloaded
[Plot: $P(y,C)$ against $y$, with $C$ marked]
TCP’s teleology
• The network acts as if it’s trying to solve an optimization problem
• Is this what we want the Internet to optimize?
• Does it even succeed in performing the optimization?
[Plots: $U(x)$ against $x$; $P(y,C)$ against $y$, with $C$ marked]
Synchronization
• Desynchronized TCP flows: aggregate traffic is smooth; the network solves the optimization
• Synchronized TCP flows: aggregate traffic is bursty; the network oscillates about the optimum
[Plots against time, for each case: individual flow rates adding up to the aggregate traffic rate]
TCP traffic model
• When there are many TCP flows, the aggregate traffic rate $x_t$ varies smoothly, according to a differential equation [Misra, Gong, Towsley, 2000]
• The equation involves
  • $p_t$, the packet loss probability at time $t$,
  • RTT, the average round trip time
[Plot: aggregate traffic rate against time, for desynchronized and synchronized flows]
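The slide does not reproduce the differential equation. As a rough illustration, the sketch below integrates one commonly quoted window-based fluid model with feedback delay, $dW/dt = 1/\mathrm{RTT} - p\,W(t)\,W(t-\mathrm{RTT})/(2\,\mathrm{RTT})$, using Euler steps. This form, the constant loss probability `p`, and the name `simulate_window` are assumptions made for the example. In the talk $p_t$ would come from a queue model rather than being fixed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Euler integration of an assumed per-flow fluid model:
//   dW/dt = 1/RTT - p * W(t) * W(t - RTT) / (2 * RTT)
// W is the window in packets; the per-flow rate is W / RTT.
// Returns W at the end of the simulated interval.
double simulate_window(double p, double rtt, double duration, double dt = 1e-3) {
    assert(p > 0.0 && p <= 1.0 && rtt > 0.0 && dt > 0.0 && dt <= rtt);
    const std::size_t delay = static_cast<std::size_t>(std::round(rtt / dt));
    const std::size_t steps = static_cast<std::size_t>(duration / dt);
    std::vector<double> w(steps + 1, 1.0);  // w[k] is W at time k*dt, starting from 1 packet
    for (std::size_t k = 0; k < steps; ++k) {
        // Feedback is one round trip old; before that, use the initial window.
        const double delayed = (k >= delay) ? w[k - delay] : w[0];
        const double dw = 1.0 / rtt - p * w[k] * delayed / (2.0 * rtt);
        w[k + 1] = w[k] + dt * dw;
    }
    return w[steps];
}
```

With a fixed `p` the window settles at the fixed point of the equation, $W = \sqrt{2/p}$. For example, `simulate_window(0.02, 0.1, 30.0)` approaches 10 packets.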
Queue model
• How does packet loss probability $p_t$ depend on buffer size?
• There are two families of answers, depending on queueing delay:
  • Small buffers (queueing delay ≪ RTT)
  • Large buffers (queueing delay comparable to RTT)
Small buffers
As the optical fibre’s line rate increases
• queue size fluctuates more and more rapidly
• queue size distribution does not change (it depends only on link utilization, not on line rate)
[Plots: queue size (0-15 pkt) against time (0-5 sec), for queueing delays of 19 ms, 1.9 ms and 0.19 ms]
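One way to see why the distribution can depend on utilization alone is a textbook M/M/1/B queue, whose blocking probability is a function of the load $\rho = x/C$ and the buffer size only. The Poisson-arrival, exponential-service assumptions and the name `mm1b_loss` are choices made for this sketch. The talk's own queue model may differ.

```cpp
#include <cassert>
#include <cmath>

// Blocking probability of an M/M/1/B queue holding at most buffer_pkts packets
// (including the one in service): p = (1 - rho) * rho^B / (1 - rho^(B+1)).
// rho = x / C is the link utilization; the line rate C does not appear on its own.
double mm1b_loss(double rho, int buffer_pkts) {
    assert(rho > 0.0 && buffer_pkts >= 1);
    if (std::abs(rho - 1.0) < 1e-12) {
        // Limit of the formula as rho -> 1: all B+1 states are equally likely.
        return 1.0 / (buffer_pkts + 1);
    }
    return (1.0 - rho) * std::pow(rho, buffer_pkts) /
           (1.0 - std::pow(rho, buffer_pkts + 1));
}
```

Scaling both the traffic rate and the line rate by the same factor leaves `rho`, and hence the result, unchanged. Only the timescale of the fluctuations would shrink.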
Large buffers (queueing delay 200 ms)
• When $x_t < C$ the queue size is small ($C$ = line rate)
• No packet drops, so TCP increases $x_t$
• When $x_t > C$ the queue fills up and packets begin to get dropped
• TCP flows may ‘overshoot’, leading to synchronization
[Plot: queue size (0-160 pkt) against time (0-10 sec)]
Large buffers (queueing delay 200 ms)
• Drop probability depends on both traffic rate $x_t$ and queue size $q_t$
[Plot: queue size (0-160 pkt) against time (0-10 sec)]
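The dependence on both $x_t$ and $q_t$ can be illustrated with a simple fluid queue: it fills at rate $x - C$, and only the overflow beyond the buffer is dropped. This is a sketch under that fluid assumption. `QueueStep` and `fluid_queue_step` are hypothetical names, and the talk's actual queue equations are not shown on the slide.

```cpp
#include <algorithm>
#include <cassert>

struct QueueStep {
    double queue;  // queue size after the step [pkts]
    double loss;   // fraction of packets arriving in this step that were dropped
};

// One time step of an assumed fluid queue with arrival rate x, line rate C
// (both in pkts/sec), buffer size B [pkts], and current queue size q [pkts].
QueueStep fluid_queue_step(double q, double x, double C, double B, double dt) {
    assert(q >= 0.0 && q <= B && x >= 0.0 && C > 0.0 && dt > 0.0);
    double next = q + (x - C) * dt;                   // fills when x > C, drains when x < C
    const double dropped = std::max(0.0, next - B);   // overflow beyond the buffer
    next = std::min(std::max(next, 0.0), B);          // keep the queue within [0, B]
    const double arrived = x * dt;
    const double loss = (arrived > 0.0) ? dropped / arrived : 0.0;
    return {next, loss};
}
```

With a full queue and $x > C$ the loss fraction is $(x - C)/x$. With the same $x$ but a partly empty queue it is zero, which is why the rate alone does not determine the drop probability.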
Analysis
• Write down differential equations
  • for aggregate TCP traffic rate $x_t$
  • for queue dynamics and loss probability $p_t$, taking account of buffer size
• Calculate
  • average link utilization
  • average queue occupancy/delay
  • extent of synchronization and consequent loss of utilization, and jitter
[Gaurav Raina, PhD thesis, 2005]
Stability/instability analysis
• For some values of C*RTT, the dynamical system is stable
• For others it is unstable and there are oscillations (i.e. the flows are partially synchronized)
• When it is unstable, we can calculate the amplitude of the oscillations
[Plot: traffic rate $x_t/C$ against time]
Instability plot
[Plot: log10 of packet loss probability $p$ against traffic intensity $x/C$, showing the TCP throughput equation, the queue equation, and the extent of oscillations in $x/C$]
Instability plot
[Plots: log10 of packet loss probability $p$ against traffic intensity $x/C$, for C*RTT = 4 pkts, 20 pkts and 100 pkts]
Alternative buffer-sizing rules
• Intermediate buffers: buffer = bandwidth*delay / sqrt(#flows), or Large buffers: buffer = bandwidth*delay
• Large buffers with AQM: buffer = bandwidth*delay*{¼,1,4}
• Small buffers: buffer = {10,20,50} pkts
• Small buffers, ScalableTCP: buffer = {50,1000} pkts [Vinnicombe 2002] [T.Kelly 2002]
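The first two rules above are simple arithmetic, sketched here for concreteness. The function names and the choice of units (bits per second in, bytes out) are assumptions made for the example.

```cpp
#include <cassert>
#include <cmath>

// Large-buffer rule: buffer = bandwidth * delay, converted from bits to bytes.
double bdp_bytes(double bandwidth_bps, double rtt_s) {
    assert(bandwidth_bps > 0.0 && rtt_s > 0.0);
    return bandwidth_bps * rtt_s / 8.0;
}

// Intermediate-buffer rule: buffer = bandwidth * delay / sqrt(number of flows).
double sqrt_rule_bytes(double bandwidth_bps, double rtt_s, double n_flows) {
    assert(n_flows >= 1.0);
    return bdp_bytes(bandwidth_bps, rtt_s) / std::sqrt(n_flows);
}
```

For a 1 Gbit/s link with a 100 ms round trip time, `bdp_bytes(1e9, 0.1)` gives 12.5 MByte, and with 10,000 flows `sqrt_rule_bytes(1e9, 0.1, 10000)` gives 125 kByte, a hundredfold reduction.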
Conclusion
• The network acts to solve an optimization problem.
• We can choose which optimization problem, by choosing the right buffer size and by changing TCP’s code.
• It may or may not attain the solution.
• In order to make sure the network is stable, we need to choose the buffer size and TCP code carefully.
Prescription
• ScalableTCP in end-systems (need to persuade Microsoft, Linus). ScalableTCP gives more weight to high-bandwidth flows, and it has been shown to be stable.
• Much smaller buffers in routers (need to persuade BT/AT&T). With small buffers, the network likes to run with slightly lower utilization, hence lower delay.
[Plots: $U(x)$ against $x$; $P(y,C)$ against $y$, with $C$ marked]