330 likes | 431 Views
Experience with Loss-Based Congestion Controlled TCP Stacks. Yee-Ting Li University College London. Introduction. Transport of Data for next generation applications Network hardware is capable of Gigabits per second
E N D
Experience with Loss-Based Congestion Controlled TCP Stacks Yee-Ting Li University College London
Introduction • Transport of Data for next generation applications • Network hardware is capable of Gigabits per second • Current ‘Vanilla’ TCP not capable over long distances and high throughputs • New TCP Stacks have been introduced to rectify problem • Investigation into the performance, bottlenecks and deploy-ability of new algorithms
Transmission Control Protocol • Connection orientated • Reliable Transport of Data • Window based • Congestion and Flow Control to prevent network collapse • Provides ‘fairness’ between competing streams • 20 Years old • Originally designed for kbit/sec pipes
TCP Algorithms • Based on two algorithms to determine rate at which data is to be sent • Slowstart: probe for initial bandwidth • Congestion Avoidance: maintain a steady state transfer rate • Focus on Steady State: probe for increases in available bandwidth, whilst backing off if congestion is detected (through loss). • Maintained through a ‘congestion window’ cwnd that regulates the number of unacknowledged packets allowed on connection. • Size of window approx equals Bandwidth delay product • Determines the appropriate window size to set to obtain a bandwidth under a certain delay • Window = Bandwidth x Delay
Algorithms • Congestion Avoidance • For every packet (ack) received by sender • Cwnd cwnd + 1/cwnd • For when loss is detected (through dupacks) • Cwnd cwnd / 2 • Growth of cwnd determined by: • the RTT of the connection • When rtt is high, cwnd grows slowly (because of acking) • The loss rate on the line • High loss means that cwnd never achieved a large value • Capacity of the link • Allows for large cwnd value (when low loss)
Advantages Achieves good throughput Not changes to kernels required Disadvantages Have to manually tune the number of flows May induce extra loss on lossy networks Need to reprogram/recompile software Current Methods of Achieving High Throughput
New TCP Stacks • Modify the congestion control algorithm to improve response times • All based on modifying the cwnd growth and decrease values • Define: • a = increase of data packets per window of acks • b = decrease factor upon congestion • To maintain compatibility (and hence network stability and fairness), for small cwnd values: • Mode switch from Vanilla to New TCP
HSTCP • Designed by Sally Floyd • Determine a and b as a function of cwnd • a a(cwnd) • b b(cwnd) • Gradual improvement in throughput as we approach larger bandwidth delay products • Current implementation focused on performance upto 10Gb/sec – set linear relation between loss and throughput (response function)
Scalable TCP • Designed by Tom Kelly • Define a and b to be constant: • a: cwnd cwnd + a (per ack) • b: cwnd cwnd – b x cwnd • Intrinsic scaling property that has the same performance over any link (beyond the initial threshold) • Recommended settings • a = 1/100 • b = 1/8
H-TCP • Designed by Doug Leith and Robert Shorten • Define a mode switch so that after congestion we do normal Vanilla • After a predefined period ∆L, switch to a high performance a • ∆i≤ ∆L: a = 1 • ∆I> ∆L: a = 1 + (∆ - ∆L) + [(∆ - ∆L)/20]2 • Upon loss drop by • | [Bimax(k+1) - Bimax(k)] / Bimax(k) | > 0.2: b = 0.5 • Else: b = RTTmin/RTTmax
Implementation • All New Stacks have own implementation • Small differences between implementations means that we are comparing the kernel differences rather than just the algorithmic differences • Lead to development of ‘test platform’ kernel altAIMD • Implements all three stacks via simple sysctl switch. • Also incorporates switches for certain undesirable kernel ‘features’ • moderate_cwnd() • IFQ • Added extra features for testing/evaluation purposes • Appropriate Byte Counting (RFC3465) • Inducible packet loss (at recv) • Web100 TCP logging (cwnd etc)
UCL Manchester StarLight CERN Cisco 7600 Cisco 7600 Juniper Cisco 7600 Cisco 7600 Cisco 7600 Networks Under Test • Networks MB-NG DataTAG Bottleneck Capacity 1Gb/sec RTT 120msec Bottleneck Capacity 1Gb/sec RTT 6msec
Graph/Demo • Mode switch between stacks on constant packet drop { { { Vanilla TCP Scalable TCP HS-TCP
Comparison against theory • Response function
Self Similar Background Tests • Results skewed • Not comparing differences in TCP algorithms! • Not useful results!
SACK … • Look into what’s happening at the algorithmic level: • Strange hiccups in cwnd only correlation is SACK arrivals Scalable TCP on MB-NG with 200mbit/sec CBR Background
SACKS • Supplies the sender information about what segments the recv has • Sender infers the missing packets to resend • Aids recovery during loss and prevents timeouts • Current implementation in 2.4 and 2.6 does a walk through the entire sack list for each SACK • Very cpu intensive • Can be interrupted by arrival of next SACK which causes the SACK implementation to misbehave • Tests conducted with Tom Kelly’s SACK fast-path patch • Improves SACK processing, but still not sufficient
Periods of web100 silence due to high cpu utilization Logging done in userspace – kernel time taken up by tcp sack processing TCP resets cwnd SACK Processing overhead
Congestion Window Moderation • Linux TCP implementation adds ‘feature’ of moderate_cwnd() • Idea is to prevent large bursts of data packets under ‘dubious’ conditions • When an ACK acknowledges more than 3 packets (typically 2) • Adjusts cwnd to known number of packets ‘in-flight’ (plus extra 3 packets) • Under large cwnd sizes (high bandwidth delay products), throughput can be diminished as result
90% TCP AF moderate_cwnd(): Vanilla TCP moderate_cwnd ON moderate_cwnd OFF CWND Throughput
moderate_cwnd(): HS-TCP moderate_cwnd OFF moderate_cwnd ON 70% TCP AF 90% TCP AF
moderate_cwnd(): Scalable-TCP moderate_cwnd OFF moderate_cwnd ON 70% TCP AF 90% TCP AF
Multiple Streams Aggregate BW CoV
10 TCP Flows versus Self-Similar Background Aggregate BW CoV
10 TCP Flows versus Self-Similar Background BG Loss per TCP BW
Impact • Fairness: ratio of throughput achieved by one stack against another • Means that a fairness against vanilla tcp is defined by how much more throughput a new stacks gets more than vanilla • Doesn’t really consider deploy-ability of the stacks in real life – how does these stacks affect the existing traffic? (mostly vanilla tcp) • Redefine fairness in terms of the Impact: • Consider the affect of the background traffic only under different stacks • Vary against number of TCP Flows to determine impact(vanilla flows) throughput of n-Vanilla flows • BW impact = throughput of (n-1) Vanilla flows + 1 new TCP flow
Impact of 1 TCP Flow Throughput Throughput Impact
1 New TCP Impact CoV
Impact of 10 TCP Flows Throughput Throughput Impact
Summary • Comparison of actual TCP differences through test platform kernel • Problems with SACK implementations mean that it is difficult under loss to maintain high throughput (>500Mbit/sec) • Other problems exist with kernel implementation that hinder performance • Compare stacks under different artificial (and hence repeatable) conditions • Single stream: • Multiple stream: • Need to study over wider range of networks • Move tests onto real production environments