Implementing High Speed TCP (aka Sally Floyd’s)
Gareth Fairey & Yee-Ting Li
12th September 2002 @ Brighton
What is High Speed TCP?
• Changes the way TCP behaves at high speed (i.e. large cwnd)
• Standard TCP has two modes
    • Slow start (not very slow…)
    • Congestion Avoidance
• Focuses on the Congestion Avoidance region, i.e. when TCP knows (or thinks it knows…) how well the network behaves
• BUT only when we are at high speeds; otherwise do what Standard TCP does
• Readily deployable first step towards Equation Based Congestion Control
What does it do?
• Standard TCP uses two parameters
    • Increase parameter, a
    • Decrease parameter, b
    • i.e. AIMD(a, b)
• Standard TCP uses
    • a = 1
    • b = 0.5
• High Speed TCP introduces
    • a -> a(cwnd)
    • b -> b(cwnd)
    • i.e. the values of a and b depend on the current congestion window size
• If a grows with larger cwnd, we get back up to our ‘optimal’ cwnd size for the network path more quickly
• If b shrinks with larger cwnd, we don’t lose as much bandwidth to a small congestion window after a loss
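The AIMD(a, b) rule above can be sketched in a few lines of C. This is only an illustration in floating point, not kernel code: the helper names are hypothetical, and the values a = 5, b = 0.25 used in the note below are made-up stand-ins for "larger a, smaller b", not values from the High Speed TCP table.

```c
#include <assert.h>

/* Hypothetical helper names; not taken from the slides or the kernel. */

/* Additive increase: applied once per round-trip time without loss. */
static double aimd_increase(double cwnd, double a)
{
    assert(a > 0.0);
    return cwnd + a;
}

/* Multiplicative decrease: applied once per congestion event. */
static double aimd_decrease(double cwnd, double b)
{
    assert(b > 0.0 && b < 1.0);
    return cwnd * (1.0 - b);
}

/* Round trips needed to climb from `from` back up to `to` with increase a. */
static int rtts_to_recover(double from, double to, double a)
{
    int rtts = 0;

    while (from < to) {
        from = aimd_increase(from, a);
        rtts++;
    }
    return rtts;
}
```

With Standard TCP's AIMD(1, 0.5), a loss at cwnd = 1000 drops the window to 500 and takes 500 round trips to recover. With the illustrative AIMD(5, 0.25), the window drops to 750 and recovers in 50 round trips.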
What exactly does it do?
• Based on the TCP response function
    • Relates loss and throughput
• Uses the TCP response function to investigate certain parameters
    • High_Window, High_Loss: the largest cwnd needed for a target throughput x, and the loss rate required to sustain that throughput
    • Low_Window, Low_Loss: the smallest cwnd at which we switch away from Standard TCP, and the loss rate corresponding to that cwnd size
    • High_B: the value of the decrease parameter b at large cwnd, i.e. the smallest value b takes
• Equations transform these parameters into a table of a(cwnd) and b(cwnd)
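For reference, the equations in the HighSpeed TCP specification (later published as RFC 3649) that perform this transformation can be written as below. The slide's parameter names are used here; the RFC calls High_Loss, Low_Loss and High_B "High_P", "Low_P" and "High_Decrease". That the implementation described in these slides used exactly these equations is an assumption.

```latex
\begin{align}
S    &= \frac{\log(\mathrm{High\_Window}) - \log(\mathrm{Low\_Window})}
             {\log(\mathrm{High\_Loss}) - \log(\mathrm{Low\_Loss})} \\
w(p) &= \left(\frac{p}{\mathrm{Low\_Loss}}\right)^{S} \mathrm{Low\_Window} \\
b(w) &= (\mathrm{High\_B} - 0.5)\,
        \frac{\log(w) - \log(\mathrm{Low\_Window})}
             {\log(\mathrm{High\_Window}) - \log(\mathrm{Low\_Window})} + 0.5 \\
a(w) &= w^{2}\, p(w)\, \frac{2\, b(w)}{2 - b(w)}
\end{align}
```

Here p(w) is the loss rate obtained by inverting w(p). As a sanity check with the RFC's default Low_Window = 38 and Low_Loss = 10^-3: b(38) = 0.5 and a(38) = 1444 × 10^-3 × (1 / 1.5) ≈ 0.96, which rounds to Standard TCP's a = 1.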
Implementation of High Speed TCP
• It was decided to make this a compile-time option, so a corresponding option was added to the existing kernel configuration set-up.
• Only a few changes to the kernel source turned out to be necessary:
    • New code for calculating the a and b values.
    • Changes to the existing code that adjusts the congestion window size (cwnd), during the Congestion Avoidance phase only.
• The following slides show some details of our initial implementation, done against kernel 2.4.16.
Changing cwnd size
• From our inspection of the source, it became apparent that this only happens in the file net/ipv4/tcp_input.c, where each case is handled in a specific function:
    • Increasing the cwnd following receipt of an ACK happens in the function tcp_cong_avoid (this is where a will be used).
    • Decreasing the cwnd happens in the function tcp_cwnd_down (this is where b will be used).
• The following slides describe those changes.
Calculating suitable a and b values
• To find suitable a and b values for a given cwnd at run-time, we use a look-up table, which we populate as follows:
    • We defined a structure, hstcp_entry (in the file include/net/tcp.h), to contain cwnd and the corresponding a_val and b_val. [Note: b is between 0 and 1, so it is scaled to be between 0 and 256 and that value is stored instead.]
    • For a selection of different congestion window sizes covering the expected range, we calculated the corresponding a_val and b_val.
    • We defined an array (in the file net/ipv4/tcp_input.c) to contain these hstcp_entry structures, ordered by cwnd.
• For a given cwnd, since the entries are stored in order in the table, we can use binary search to find a suitable hstcp_entry. This is done in the function get_hstcp_entry, defined in the file net/ipv4/tcp_input.c.
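The look-up scheme described above might look roughly like the following user-space sketch. The struct layout is inferred from the slide text, since the real definition is not shown. The table rows are illustrative only and are not the authors' table. The names sketch_table and sketch_lookup are hypothetical, and the choice to return the first row for any cwnd below the table's range is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Layout assumed from the slide text; the real definition is not shown. */
struct hstcp_entry {
    unsigned int cwnd;   /* smallest cwnd this row applies to */
    unsigned int a_val;  /* additive increase, in segments */
    unsigned int b_val;  /* decrease factor b, scaled by 256 */
};

/* Illustrative rows only, sorted by cwnd; not the authors' table. */
static const struct hstcp_entry sketch_table[] = {
    {  38, 1, 128 },
    { 118, 2, 112 },
    { 221, 3, 104 },
    { 347, 4,  98 },
    { 495, 5,  93 },
};

#define SKETCH_TABLE_LEN (sizeof(sketch_table) / sizeof(sketch_table[0]))

/*
 * Binary search for the last row whose cwnd is <= the given cwnd.
 * A cwnd below the first row falls back to the first row, which
 * carries the Standard TCP values (a = 1, b = 0.5).
 */
static struct hstcp_entry sketch_lookup(unsigned int cwnd)
{
    size_t lo = 0, hi = SKETCH_TABLE_LEN - 1;

    assert(SKETCH_TABLE_LEN > 0);
    while (lo < hi) {
        /* Round the midpoint up so that lo = mid always makes progress. */
        size_t mid = lo + (hi - lo + 1) / 2;

        if (sketch_table[mid].cwnd <= cwnd)
            lo = mid;
        else
            hi = mid - 1;
    }
    return sketch_table[lo];
}
```

Returning the entry by value lets the caller write sketch_lookup(cwnd).a_val, in the same style as the accessor calls in the changed kernel code on the later slides.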
Changes to tcp_cong_avoid
• To achieve additive increase of cwnd during Congestion Avoidance, TCP needs to receive enough ACKs for a full congestion window before cwnd is incremented.
• In the Linux kernel this is achieved by counting the ACKs since the last change of cwnd and only incrementing cwnd when this counter reaches cwnd.
• In High Speed TCP, cwnd would be increased by a instead; alternatively, it could be incremented a times as often while ACKs are received.
Changes to tcp_cong_avoid
• Original code:

    static inline void tcp_cong_avoid(struct tcp_opt *tp)
    {
        if (tp->snd_cwnd <= tp->snd_ssthresh) {
            /* In "safe" area, increase. */
            if (tp->snd_cwnd < tp->snd_cwnd_clamp)
                tp->snd_cwnd++;
        } else {
            /* In dangerous area, increase slowly.
             * In theory this is tp->snd_cwnd += 1 / tp->snd_cwnd
             */
            if (tp->snd_cwnd_cnt >= tp->snd_cwnd) {
                if (tp->snd_cwnd < tp->snd_cwnd_clamp)
                    tp->snd_cwnd++;
                tp->snd_cwnd_cnt=0;
            } else
                tp->snd_cwnd_cnt++;
        }
        tp->snd_cwnd_stamp = tcp_time_stamp;
    }
Changes to tcp_cong_avoid
• Changed code:

    static inline void tcp_cong_avoid(struct tcp_opt *tp)
    {
        if (tp->snd_cwnd <= tp->snd_ssthresh) {
            /* In "safe" area, increase. */
            if (tp->snd_cwnd < tp->snd_cwnd_clamp)
                tp->snd_cwnd++;
        } else {
            /* In dangerous area, increase slowly.
             * In theory this is tp->snd_cwnd += 1 / tp->snd_cwnd
             */
            /* HSTCP: scale the ACK count by a(cwnd), so that cwnd is
             * incremented a times as often.
             */
            if ((tp->snd_cwnd_cnt * get_hstcp_val(tp->snd_cwnd).a_val) >= tp->snd_cwnd) {
                if (tp->snd_cwnd < tp->snd_cwnd_clamp)
                    tp->snd_cwnd++;
                tp->snd_cwnd_cnt=0;
            } else
                tp->snd_cwnd_cnt++;
        }
        tp->snd_cwnd_stamp = tcp_time_stamp;
    }
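To see the effect of scaling the ACK counter, the Congestion Avoidance branch can be exercised in user space. The sketch below uses a minimal stand-in struct (not the kernel's struct tcp_opt), passes a in directly instead of looking it up, and omits the snd_cwnd_clamp check. All sketch_ names are hypothetical.

```c
#include <assert.h>

/* Minimal stand-in for the two fields used here; not struct tcp_opt. */
struct sketch_tp {
    unsigned int snd_cwnd;      /* congestion window, in segments */
    unsigned int snd_cwnd_cnt;  /* ACKs counted since the last increment */
};

/* Process one ACK in Congestion Avoidance, using the scaled-counter test. */
static void sketch_on_ack(struct sketch_tp *tp, unsigned int a_val)
{
    assert(a_val >= 1);
    if (tp->snd_cwnd_cnt * a_val >= tp->snd_cwnd) {
        tp->snd_cwnd++;
        tp->snd_cwnd_cnt = 0;
    } else {
        tp->snd_cwnd_cnt++;
    }
}

/* Count the ACKs needed before cwnd first grows by one segment. */
static unsigned int sketch_acks_per_increment(unsigned int cwnd,
                                              unsigned int a_val)
{
    struct sketch_tp tp = { cwnd, 0 };
    unsigned int acks = 0;

    while (tp.snd_cwnd == cwnd) {
        sketch_on_ack(&tp, a_val);
        acks++;
    }
    return acks;
}
```

Starting from cwnd = 100, the window grows after 101 ACKs with a = 1 but after only 26 ACKs with a = 4, so the increase per round trip is roughly a segments.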
Changes to tcp_cwnd_down
• Once a congestion event has occurred, the cwnd is reduced to adapt to the observed state of the network.
• Traditionally, it is halved.
• With High Speed TCP, it is proposed that the proportion of the decrease will depend on cwnd.
Changes to tcp_cwnd_down
• Original source:

    static void tcp_cwnd_down(struct tcp_opt *tp)
    {
        int decr = tp->snd_cwnd_cnt + 1;

        tp->snd_cwnd_cnt = decr&1;
        decr >>= 1;

        if (decr && tp->snd_cwnd > tp->snd_ssthresh/2)
            tp->snd_cwnd -= decr;

        tp->snd_cwnd = min(tp->snd_cwnd, tcp_packets_in_flight(tp)+1);
        tp->snd_cwnd_stamp = tcp_time_stamp;
    }
Changes to tcp_cwnd_down
• Changed source:

    static void tcp_cwnd_down(struct tcp_opt *tp)
    {
        int decr = tp->snd_cwnd_cnt + 1;

        tp->snd_cwnd_cnt = decr&1;
        /* HSTCP: scale by b(cwnd), stored as b * 256, instead of halving. */
        decr = (int)((decr * get_hstcp_val(tp->snd_cwnd).b_val)>>8);

        if (decr && tp->snd_cwnd > tp->snd_ssthresh/2)
            tp->snd_cwnd -= decr;

        tp->snd_cwnd = min(tp->snd_cwnd, tcp_packets_in_flight(tp)+1);
        tp->snd_cwnd_stamp = tcp_time_stamp;
    }
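The fixed-point arithmetic behind the b_val scaling can be looked at in isolation. The sketch below applies the decrease to the whole window in one step, which is a simplification: tcp_cwnd_down itself works on a small per-ACK decrement. The sketch_ names are hypothetical, and b_val = 26 (roughly b = 0.1) is an illustrative value.

```c
#include <assert.h>

#define SKETCH_B_SHIFT 8  /* b is stored as b * 256, per the slides */

/* Segments to remove from cwnd for a decrease factor b = b_val / 256. */
static unsigned int sketch_decrease_amount(unsigned int cwnd,
                                           unsigned int b_val)
{
    assert(b_val <= (1u << SKETCH_B_SHIFT));
    /* Multiply first, then shift, so the fraction is not lost early. */
    return (cwnd * b_val) >> SKETCH_B_SHIFT;
}

/* Window size after one congestion event. */
static unsigned int sketch_cwnd_after_loss(unsigned int cwnd,
                                           unsigned int b_val)
{
    return cwnd - sketch_decrease_amount(cwnd, b_val);
}
```

With cwnd = 1000, b_val = 128 gives the traditional halving to 500, while b_val = 26 leaves 899. The shift truncates, so the decrease is always rounded down; for a very small operand the scaled decrement can round to zero.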
Initial results
• Implemented on a P3 450 MHz
    • NOT Gigabit
• Patched with Web100 alpha 1.2
• Conducted tests since… yesterday!
• WAN tests from UCL to RAL, CERN, Daresbury, Amsterdam & Manchester
• Early results… basic analysis
What next?
• Need to develop an ‘advanced test program’ to fully explore the HSTCP performance space
• GUY has a thorough Network Simulator analysis of stock HSTCP; need to compare results
• Need to explore the parameter space with different values of Low_Loss, Low_Window; High_Window, High_Loss; High_B
• Implement /proc hooks to enable easy configuration of HSTCP parameters
• Investigate the performance cost of the lookup table on hosts
• More results! Especially on GigE.
• Expand tests to America, esp. SLAC (high delay)
• Investigate fairness compared to other TCP implementations