Learn about FAST TCP, an architecture and algorithm designed to optimize network communication for high throughput and stability. Explore experimental evaluations, loss recovery techniques, and the use of MaxNet and SUPA FAST.
FAST TCP Bartek Wydrowski Steven Low netlab.CALTECH.edu
Acks & Collaborators • Internet2: Almes, Shalunov • Abilene GigaPoPs: GATech, NCSU, PSC, Seattle, Washington • Cisco: Aiken, Doraiswami, McGugan, Smith, Yip • Level(3): Fernes • LANL: Wu • Caltech: Bunn, Choe, Doyle, Hegde, Jin, Li, Low, Newman, Papadoupoulous, Ravot, Singh, Tang, J. Wang, Wei, Wydrowski, Xia • UCLA: Paganini, Z. Wang • StarLight: deFanti, Winkler • CERN: Martin • SLAC: Cottrell • PSC: Mathis
Outline • Background, motivation • FAST TCP • Architecture and algorithms • Experimental evaluations • Loss recovery • MaxNet, SUPA FAST
ns-2 simulation DataTAG Network: CERN (Geneva) – StarLight (Chicago) – SLAC/Level3 (Sunnyvale) average utilization 95% 1G 27% 19% txq=100 txq=100 txq=10000 Linux TCP Linux TCP FAST capacity = 1Gbps; 180 ms round trip latency; 1 flow C. Jin, D. Wei, S. Ravot, etc (Caltech, Nov 02) Performance at large windows 10Gbps capacity = 155Mbps, 622Mbps, 2.5Gbps, 5Gbps, 10Gbps; 100 ms round trip latency; 100 flows J. Wang (Caltech, June 02)
Average Queue vs Buffer Size Dummynet • capacity = 800Mbps • Delay = 200ms • 1 flow • Buffer size: 50, …, 8000 pkts (S. Hegde, B. Wydrowski, etc, Caltech)
Congestion control Example congestion measure pl(t) • Loss (Reno) • Queueing delay (Vegas) pl(t) xi(t)
pl(t) • AQM: • DropTail • RED • REM/PI • AVQ xi(t) TCP: • Reno • Vegas TCP/AQM • Congestion control is a distributed asynchronous algorithm to share bandwidth • It has two components • TCP: adapts sending rate (window) to congestion • AQM: adjusts & feeds back congestion information • They form a distributed feedback control system • Equilibrium & stability depends on both TCP and AQM • And on delay, capacity, routing, #connections
Packet & flow level: Reno TCP • Packet level • ACK: W ← W + 1/W • Loss: W ← W - 0.5W • Flow level (pkts) • Equilibrium (Mathis formula) • Dynamics
Reno TCP • Packet level • Designed and implemented first • Flow level • Understood afterwards • Flow level dynamics determines • Equilibrium: performance, fairness • Stability • Design flow level equilibrium & stability • Implement flow level goals at packet level
Reno TCP • Packet level • Designed and implemented first • Flow level • Understood afterwards • Flow level dynamics determines • Equilibrium: performance, fairness • Stability Packet level design of FAST, HSTCP, STCP guided by flow level properties
Packet level • Reno AIMD(1, 0.5) • ACK: W ← W + 1/W • Loss: W ← W - 0.5W • HSTCP AIMD(a(w), b(w)) • ACK: W ← W + a(w)/W • Loss: W ← W - b(w)W • STCP MIMD(a, b) • ACK: W ← W + 0.01 • Loss: W ← W - 0.125W • FAST
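The packet-level rules on this slide can be sketched as per-ACK and per-loss window updates. This is a minimal illustration rather than any kernel's implementation: the function names are hypothetical, and HSTCP's a(w) and b(w) are passed in as callables because the slides do not give their values.

```python
def on_ack_reno(w):
    """Reno AIMD(1, 0.5): each ACK adds 1/w."""
    return w + 1.0 / w


def on_loss_reno(w):
    """Reno halves the window on loss."""
    return w - 0.5 * w


def on_ack_hstcp(w, a):
    """HSTCP: additive increase scaled by a(w); `a` is a caller-supplied function."""
    return w + a(w) / w


def on_loss_hstcp(w, b):
    """HSTCP: multiplicative decrease by b(w); `b` is a caller-supplied function."""
    return w - b(w) * w


def on_ack_stcp(w):
    """STCP MIMD(1/100, 1/8): each ACK adds a constant 0.01."""
    return w + 0.01


def on_loss_stcp(w):
    """STCP cuts the window by one eighth on loss."""
    return w - 0.125 * w


def one_rtt_of_acks(w, on_ack):
    """Apply on_ack once per packet in the starting window (roughly one RTT)."""
    for _ in range(int(w)):
        w = on_ack(w)
    return w
```

Starting from w = 100, one RTT of ACKs grows Reno by just under one packet and STCP by one packet. At w = 10,000 Reno still gains about one packet per RTT while STCP gains about 100, which is the sense in which STCP's increase is multiplicative.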
Flow level: Reno, HSTCP, STCP, FAST • Similar flow level equilibrium (pkts/sec) • a = 1.225 (Reno), 0.120 (HSTCP), 0.075 (STCP)
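The equilibrium formula itself did not survive on this slide, so the sketch below assumes the commonly published response-function form x = a / (RTT · p^k) for the loss-based protocols. The a values are the ones listed on the slide; the exponents k (0.5, 0.835, 1) are assumptions taken from the published Reno, HSTCP and STCP response functions, and the names `EQUILIBRIUM` and `equilibrium_rate` are illustrative only.

```python
# (a, k) per protocol, for an assumed equilibrium rate x = a / (RTT * p**k).
# a: from the slide. k: assumed from published response functions.
EQUILIBRIUM = {
    "Reno": (1.225, 0.5),
    "HSTCP": (0.120, 0.835),
    "STCP": (0.075, 1.0),
}


def equilibrium_rate(protocol, loss_prob, rtt_s):
    """Equilibrium sending rate in packets per second."""
    a, k = EQUILIBRIUM[protocol]
    return a / (rtt_s * loss_prob ** k)


if __name__ == "__main__":
    # Compare the three protocols at the same loss probability and RTT.
    for name in EQUILIBRIUM:
        print(name, round(equilibrium_rate(name, 1e-6, 0.1)), "pkts/sec")
```

Under these assumptions the three protocols share one functional form and differ only in the constant and the exponent, so at small loss probabilities STCP sustains the highest rate and Reno the lowest.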
Flow level: Reno, HSTCP, STCP, FAST • Common flow level dynamics! • window adjustment = control gain × flow level goal • Different gain k and utility Ui • They determine equilibrium and stability • Different congestion measure pi • Loss probability (Reno, HSTCP, STCP) • Queueing delay (Vegas, FAST)
Implementation strategy • Common flow level dynamics • window adjustment = control gain × flow level goal • Small adjustment when close, large far away • Need to estimate how far current state is wrt target • Scalable • Window adjustment independent of pi • Depends only on current window • Difficult to scale
Difficulties at large window • Equilibrium problem • Packet level: AI too slow, MD too drastic • Flow level: required loss probability too small • Dynamic problem • Packet level: must oscillate on binary signal • Flow level: unstable at large window
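A worked example makes the equilibrium problem concrete. The sketch below applies the Reno relation W = 1.225 / sqrt(p) to a 10 Gbps path with 100 ms round trip latency; the 1500-byte packet size and the function name are assumptions made for illustration.

```python
def reno_large_window(capacity_bps, rtt_s, pkt_bytes=1500):
    """Return (window in packets, required loss probability, recovery time in s).

    window:      packets needed to fill the pipe (capacity * RTT)
    loss_prob:   p such that 1.225 / sqrt(p) equals that window
    recovery_s:  time for additive increase (1 pkt/RTT) to regain W/2 after a loss
    """
    window = capacity_bps * rtt_s / (8 * pkt_bytes)
    loss_prob = (1.225 / window) ** 2
    recovery_s = (window / 2) * rtt_s
    return window, loss_prob, recovery_s


if __name__ == "__main__":
    w, p, t = reno_large_window(10e9, 0.1)
    print(f"window = {w:.0f} pkts, p = {p:.2e}, recovery = {t / 60:.0f} min")
```

With these inputs the window is about 83,000 packets, the loss probability has to stay near 2 × 10^-10, and a single halving takes over an hour of additive increase to undo. That is the "AI too slow, MD too drastic" and "required loss probability too small" problem in numbers.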
Problem: no target • Reno: AIMD(1, 0.5) • ACK: W ← W + 1/W • Loss: W ← W - 0.5W • HSTCP: AIMD(a(w), b(w)) • ACK: W ← W + a(w)/W • Loss: W ← W - b(w)W • STCP: MIMD(1/100, 1/8) • ACK: W ← W + 0.01 • Loss: W ← W - 0.125W
Solution: estimate target • FAST: scalable to any w* • [State diagram: Slow Start, FAST Conv, Equil, Loss Rec]
Difficulties at large window • Equilibrium problem • Packet level: AI too slow, MD too drastic • Flow level: required loss probability too small • Dynamic problem • Packet level: must oscillate on binary signal • Flow level: unstable at large window
Problem: binary signal • TCP: oscillation
Solution: multibit signal • FAST: stabilized
Difficulties at large window • Equilibrium problem • Packet level: AI too slow, MD too drastic • Flow level: required loss probability too small • Dynamic problem • Packet level: must oscillate on binary signal • Flow level: unstable at large window • Use multi-bit signal! Stabilize flow dynamics!
Outline • Background, motivation • FAST TCP • Architecture and algorithms • Experimental evaluations • Loss recovery • MaxNet, SUPA FAST
<RTT timescale RTT timescale Loss recovery Architecture
Architecture Each component • designed independently • upgraded asynchronously
Architecture Each component • designed independently • upgraded asynchronously Window Control
Window control algorithm • Full utilization • regardless of bandwidth-delay product • Globally stable • exponential convergence • Fairness • weighted proportional fairness • parameter a
Window control algorithm • [Equation labels: target backlog, measured backlog]
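The equation on this slide was lost in extraction. As a hedged sketch, the code below uses the window update given in the published FAST TCP papers, w ← min(2w, (1 - γ)w + γ(baseRTT/RTT · w + α)), where α plays the role of the target backlog and w - (baseRTT/RTT)·w is the measured backlog. The single-link fluid queue model, the parameter values and the function names are assumptions made for illustration.

```python
def fast_update(w, base_rtt, rtt, alpha, gamma=0.5):
    """One periodic FAST window update (form taken from the published papers).

    alpha is the target number of packets queued in the path;
    w - (base_rtt / rtt) * w is this flow's measured backlog.
    """
    target = (base_rtt / rtt) * w + alpha
    return min(2 * w, (1 - gamma) * w + gamma * target)


def simulate(capacity_pps, base_rtt, alpha, steps=200, w0=100.0):
    """Single flow on one link, using a simplified fluid queue (an assumption)."""
    w = w0
    pipe = capacity_pps * base_rtt  # packets in flight that fit without queueing
    for _ in range(steps):
        backlog = max(0.0, w - pipe)             # packets sitting in the queue
        rtt = base_rtt + backlog / capacity_pps  # propagation + queueing delay
        w = fast_update(w, base_rtt, rtt, alpha)
    return w, max(0.0, w - pipe)


if __name__ == "__main__":
    # About 800 Mbps with 1500-byte packets and 120 ms RTT: an 8000-packet pipe.
    w, backlog = simulate(capacity_pps=8000 / 0.12, base_rtt=0.12, alpha=200.0)
    print(f"window = {w:.1f} pkts, backlog = {backlog:.1f} pkts")
```

In this model the window settles where the measured backlog equals α, so the flow fills the pipe and keeps a fixed number of packets queued regardless of the bandwidth-delay product. The adjustment is large when the window is far from that point and shrinks geometrically as it approaches.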
Outline • Background, motivation • FAST TCP • Architecture and algorithms • Experimental evaluations • Loss recovery • MaxNet, SUPA FAST
Dynamic sharing: 3 flows FAST Linux Dynamic sharing on Dummynet • capacity = 800Mbps • delay=120ms • 3 flows • iperf throughput • Linux 2.4.x (HSTCP: UCL)
Dynamic sharing: 3 flows FAST Linux Steady throughput HSTCP BIC
30min queue FAST Linux loss throughput Dynamic sharing on Dummynet • capacity = 800Mbps • delay=120ms • 14 flows • iperf throughput • Linux 2.4.x (HSTCP: UCL) HSTCP STCP
30min queue Room for mice ! FAST Linux loss throughput HSTCP HSTCP BIC
small window 800pkts large window 8000 Aggregate throughput Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts
Fairness Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts
stable in diverse scenarios Stability Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts
Responsiveness Dummynet: cap = 800Mbps; delay = 50-200ms; #flows = 1-14; 29 expts
I2LSR, SC2004 Bandwidth Challenge Harvey Newman’s group, Caltech http://dnae.home.cern.ch/dnae/lsr4-nov04 OC48 OC192 November 8, 2004 Caltech and CERN transferred • 2,881 GBytes in one hour (6.86Gbps) • between Geneva - US - Geneva (25,280 km) • through LHCnet/DataTag, Abilene and CENIC backbones • using 18 FAST TCP streams • on Linux 2.6.9 kernel with 9000-byte MTU • at 174 Pbm/s
Internet2 Abilene Weather Map OC48 OC192 7.1G: GENV-PITS-LOSA-SNVA-STTL-DNVR-KSCY-HSTON-ATLA-WASH-NYCM-CHIN-GENV Newman’s group, Caltech
“Ultrascale” protocol development: FAST TCP • Based on TCP Vegas • Uses end-to-end delay and loss to dynamically adjust the congestion window • Defines an explicit equilibrium Capacity = OC-192 9.5Gbps; 264 ms round trip latency; 1 flow BW use 50% BW use 79% BW use 30% BW use 40% Linux TCP, Westwood+, BIC TCP, FAST (Yang Xia, Caltech)
FAST backs off to make room for Reno Periodic losses every 10mins (Yang Xia, Harvey Newman, Caltech)
Linux Experiment by Yusung Kim KAIST, Korea, Oct 2004 • Dummynet • Capacity = 622Mbps • Delay=200ms • Router buffer size = 1BDP (11,000 pkts) • 1 flow • Application: iperf • BIC, FAST, HSTCP, STCP, Reno (Linux), CUBIC http://netsrv.csc.ncsu.edu/yskim/single_traffic/curves/
RTT RTT = 400ms double baseRTT FAST Throughput Yusung Kim, KAIST, Korea 10/2004 • All can achieve high throughput except Reno • FAST adds negligible queueing delay • Loss-based control (almost) fills buffer … • adding delay and reducing ability to absorb bursts HSTCP BIC
queue FAST FAST cwnd Yusung Kim, KAIST, Korea 10/2004 • FAST needs smaller buffer at both routers and hosts • Loss-based control limited at host in these expts HSTCP BIC
Outline • Background, motivation • FAST TCP • Architecture and algorithms • Experimental evaluations • Loss recovery • MaxNet, SUPA FAST
Loss Recovery Section Overview • Linux & TCP loss recovery has problems; esp. in non-congestion loss environments. • New Loss Architecture: • Determining packet loss & PIF • Decoupled window control • Testing in high loss environment • Receiver window issues • Forward Retransmission • SACK processing optimization • Reorder Detection • Testing in small buffer environment
New Loss Recovery Architecture • New Architecture for loss recovery motivated by new environments: • High loss wireless, 802.11, Satellite • Low loss, but large BDP • Measure of Path ‘difficulty’ should be extended • BDLP: Bandwidth x Delay x (1/(1-Loss))
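The BDLP measure can be computed directly from the slide's formula. The sketch below is illustrative: the function name and the sample inputs (622 Mbps, 200 ms) are assumptions. One reading of the 1/(1-Loss) factor is the expected number of transmissions per delivered packet when losses are independent.

```python
def bdlp_bits(bandwidth_bps, delay_s, loss):
    """Bandwidth x Delay x 1/(1 - Loss), in bits.

    With loss = 0 this reduces to the ordinary bandwidth-delay product.
    """
    if not 0.0 <= loss < 1.0:
        raise ValueError("loss must be in [0, 1)")
    return bandwidth_bps * delay_s / (1.0 - loss)


if __name__ == "__main__":
    for loss in (0.0, 0.01, 0.1, 0.5):
        mbit = bdlp_bits(622e6, 0.2, loss) / 1e6
        print(f"loss = {loss:4.2f}  BDLP = {mbit:7.1f} Mbit")
```

A path with 50% loss scores twice the plain bandwidth-delay product, so under this measure high-loss wireless or satellite paths and low-loss, large-BDP paths can be compared on one scale.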
Periodic losses every 10mins (Yang Xia, Harvey Newman, Caltech)
Haystack - 1 Flow (Atlanta -> Japan) • Iperf used to generate traffic • Sender is a Xeon 2.6 GHz • Window was constant • Burstiness in rate is due to host processing and ACK spacing