This case study explores the factors affecting TCP performance over wireless networks, analyzing the impact of link-layer mechanisms and network protocol processing. It investigates issues such as bottleneck identification, ack holding, repeated data, and suppressed fast retransmit. The study also highlights problems with collisions and MAC-layer rate adaptation.
Diagnosing Wireless TCP Performance Problems: A Case Study • Tianbo Kuang, Fang Xiao, and Carey Williamson • University of Calgary
Agenda • Motivation • Background • TCP • IEEE 802.11b Wireless LAN (WLAN) • Universal Serial Bus (USB) • Experimental Methodology • Results
Motivation • TCP performance often degrades over wireless networks; the reasons are “well known” • Solutions to improve TCP performance over wireless links exist, but how well do they work in a real wireless LAN environment? • How do link-layer mechanisms interact with TCP and affect the overall performance? • Where is the bottleneck in the network protocol processing path, and why?
Background - TCP • Widely used on the Internet (e.g. Web) • Connection-oriented, reliable byte stream • Window-based flow control • Slow start and congestion avoidance • Fast retransmission, fast recovery • Other extensions, including TCP SACK • Many different versions in use
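As a rough illustration of the slow start and congestion avoidance bullets above, the following sketch traces a congestion window in units of segments. It is a simplified textbook model, not the behaviour of any particular TCP version; the function name `cwnd_trace`, the per-round granularity, and the timeout reaction (halve the threshold, restart from one segment) are assumptions made for illustration.

```python
def cwnd_trace(rounds, ssthresh, loss_rounds=()):
    """Return the congestion window (in segments) at the start of each round trip.

    Hypothetical helper: one loop iteration models one round-trip time.
    """
    cwnd = 1
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            # Assumed timeout-style reaction: halve the threshold, restart at 1 segment.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            # Slow start: the window doubles every round trip, capped at ssthresh.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: the window grows by one segment per round trip.
            cwnd += 1
    return trace


if __name__ == "__main__":
    print(cwnd_trace(8, ssthresh=8))
    print(cwnd_trace(8, ssthresh=8, loss_rounds=(5,)))
```

The second call shows why a timeout is costly: the window collapses to one segment and has to be rebuilt from scratch.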
Background – IEEE 802.11b • An “Ethernet-like” LAN standard (11 Mbps) • Infrastructure mode and ad hoc mode • Carrier-sense multiple access with collision avoidance (CSMA/CA) to reduce collisions • MAC-layer: positive acknowledgment and retransmissions (to recover from channel errors) • Dynamic rate adaptation: can choose data transmission rate of 1, 2, 5.5, or 11 Mbps
Background – USB • Widely used industry standard for connecting a computer to its peripherals (bus topology) • Lots of USB-based (wireless) network cards • Data transfers managed by the Host Controller (HC) • Synchronous bus: 1 msec slots for transfers • Transfer requests are handled using vertical and horizontal linked-list data structures • Two processing modes for HC: Breadth-First or Depth-First • Full Speed Bandwidth Reclamation (FSBR)
Background – USB (cont’d) • [Figure: Breadth-First (BF) vs. Depth-First (DF) processing] • Queued mode (keep HC busy) • Transfer size: 64 – 1023 bytes each
Background – USB (cont’d) • [Figure: FSBR] • Queued mode (keep HC busy) • Transfer size: 64 – 1023 bytes each
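To see why the USB side can become a bottleneck, a back-of-the-envelope calculation helps. The sketch below assumes, purely for illustration, that a fixed number of transfers complete in each 1 msec slot; the slides do not state how many transfers the HC actually completes per slot, so `transfers_per_frame` and the function name `usb_ceiling_mbps` are hypothetical.

```python
FRAMES_PER_SECOND = 1000  # one 1 msec slot per frame, as stated on the slide


def usb_ceiling_mbps(transfer_bytes, transfers_per_frame=1):
    """Upper bound on payload throughput in Mbps.

    Assumption: exactly `transfers_per_frame` transfers of `transfer_bytes`
    bytes complete in every 1 msec slot (a hypothetical simplification).
    """
    bits_per_second = transfer_bytes * 8 * transfers_per_frame * FRAMES_PER_SECOND
    return bits_per_second / 1e6


if __name__ == "__main__":
    for size in (64, 1023):
        print(size, "bytes per slot:", usb_ceiling_mbps(size), "Mbps")
```

Under this assumption, a single small transfer per slot caps throughput well below the 11 Mbps wireless rate, which is the kind of situation that queuing more work for the HC is meant to avoid.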
Experimental Methodology • Experimental Setup (HW/SW) • Laptop – Compaq Evo 719c with multiport USB wireless card (Linux 2.4) • Access point – Lucent RG-1000 • Stationary host on Ethernet LAN – SunOS 5.8 • Run netperf on laptop and netserver on wired host • SnifferPro 4.6 wireless “sniffer” and tcpdump • Experimental Factors • USB mode, driver settings • Wireless channel (distance) between laptop and AP • [Figure: experimental setup; labels: data, netperf, laptops, tcpdump, netserver, AP, Ethernet, Sniffer, acks]
Initial Result • The Windows 2000 implementation of TCP is more than 3 times faster than Linux TCP! • Reason: Linux driver bug (2 Mbps vs 11 Mbps) • Throughput by OS: Linux 1.52 Mbps; Windows 2000 5.11 Mbps
Results – USB Experiments • With FSBR disabled, USB is the bottleneck • With FSBR enabled (the default in Linux), the wireless network is the bottleneck • Queued mode makes no difference with FSBR on, but helps when FSBR is turned off • Queued mode (even with FSBR turned on) may be very important when higher speed wireless link is used (e.g. IEEE 802.11a)
Results – TCP Problems • The “ack holding” problem • A bug in the NIC firmware or interrupt driver of Linux OS causes excessive delays (> 100 ms) • This leads to a spurious TCP timeout • The retransmission of previously acked data! • Actually just an artifact of tcpdump observation • The lack of a TCP “fast retransmit” after receiving three duplicate Acks • A deliberate (but not well-known) feature of TCP
Results – TCP “ack holding” • [Figure: packet traces; labels: (laptop), (wired), (sniffer), (kernel)]
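The link between a held ACK and a spurious timeout can be sketched with a standard smoothed-RTT retransmission timer. The code below follows the general shape of the RFC 6298 estimator; the class name `RtoEstimator`, the function `timer_fires`, the 0.2 s floor, and the sample values are illustrative assumptions, not values taken from the measured system.

```python
class RtoEstimator:
    """Smoothed RTT / RTT-variance estimator in the style of RFC 6298 (sketch)."""

    def __init__(self, min_rto=0.2, alpha=0.125, beta=0.25):
        self.srtt = None          # smoothed round-trip time, seconds
        self.rttvar = None        # round-trip time variation, seconds
        self.min_rto = min_rto    # assumed lower bound on the timer, seconds
        self.alpha = alpha
        self.beta = beta

    def update(self, rtt):
        """Feed one RTT sample (seconds) and return the new RTO."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            self.rttvar = (1 - self.beta) * self.rttvar + self.beta * abs(self.srtt - rtt)
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * rtt
        return self.rto()

    def rto(self):
        return max(self.min_rto, self.srtt + 4 * self.rttvar)


def timer_fires(estimator, ack_arrival_delay):
    """True if the ACK arrives later than the current RTO, so the sender times out."""
    return ack_arrival_delay > estimator.rto()


if __name__ == "__main__":
    est = RtoEstimator()
    for _ in range(10):
        est.update(0.005)          # a LAN-like 5 ms RTT (illustrative)
    print(est.rto(), timer_fires(est, 0.3), timer_fires(est, 0.05))
```

If the segment was in fact delivered and only its ACK was held, the timeout is spurious: the sender retransmits data that the receiver already has.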
Results – TCP “repeated data” • The spurious TCP timeout was not properly detected • Caused by an initialization bug in the Linux TCP implementation • The “repeated data” problem is an artifact induced by the presence of a link-layer buffer
Results – TCP “suppressed FR” • This is a deliberate feature to prevent a false fast retransmit after a timeout • This situation is quite likely to occur in a wireless environment • It’s not a bug, but a feature! (correct)
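A minimal sketch of the suppression rule, assuming it resembles the NewReno-style check that ignores duplicate ACKs which do not cover data sent before the last timeout. This is an illustration of the idea only, not the Linux 2.4 code; the class `Sender`, its method names, and the `recover` variable are hypothetical.

```python
class Sender:
    """Toy sender that decides what to do with each incoming ACK (sketch)."""

    DUPACK_THRESHOLD = 3

    def __init__(self):
        self.last_ack = 0       # highest cumulative ACK seen so far
        self.dupacks = 0        # consecutive duplicate ACKs
        self.highest_sent = 0   # highest sequence number sent so far
        self.recover = 0        # highest sequence number sent when the last timeout fired

    def on_send(self, seq_end):
        self.highest_sent = max(self.highest_sent, seq_end)

    def on_timeout(self):
        # Remember how far we had sent; duplicate ACKs below this point are
        # likely echoes of the timeout retransmission, not signs of a new loss.
        self.recover = self.highest_sent
        self.dupacks = 0

    def on_ack(self, ack):
        if ack > self.last_ack:
            self.last_ack = ack
            self.dupacks = 0
            return "new_ack"
        self.dupacks += 1
        if self.dupacks == self.DUPACK_THRESHOLD:
            if ack > self.recover:
                return "fast_retransmit"
            return "suppressed"
        return "dup_ack"


if __name__ == "__main__":
    s = Sender()
    s.on_send(10000)
    s.on_ack(1000)
    s.on_timeout()
    print([s.on_ack(1000) for _ in range(3)])
```

Without the timeout, the same three duplicate ACKs would trigger a fast retransmit; after it, they are ignored until the cumulative ACK moves past `recover`.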
Results – Wireless Problems • We observed unusually high collision rates on the wireless channel for TCP transfers, which we call the TCP data/ACK collision problem • Scenario: laptop and AP are 1 m apart • For TCP, MAC-layer retransmit rate: 4.58-4.73% • For UDP, MAC-layer retransmit rate: 0.47-0.98% • In general, a retransmission rate of 1.75%-7.2% has been seen for other vendor HW/SW (N = 1) • For TCP, disabling MAC-layer retransmission degrades throughput by 23%
Results – Wireless Problems • The MAC-layer rate adaptation problem • Scenario: laptop and AP are 100 m apart • Lousy TCP throughput, lots of retransmits • Reason: the multiplicative increase and multiplicative decrease (MIMD) bandwidth probing mechanism causes network thrashing and wastes battery power • The small congestion window causes temporary deadlock if the TCP receiver uses delayed Ack
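The thrashing behaviour can be illustrated with a toy rate controller over the four 802.11b rates. The slides do not describe the card's actual algorithm, so the step-up and step-down thresholds, the function name `adapt`, and the channel model (every frame above `max_ok_rate` fails) are assumptions for illustration only.

```python
RATES_MBPS = [1, 2, 5.5, 11]  # 802.11b data rates


def adapt(max_ok_rate, frames, up_after=3, down_after=1, start_index=3):
    """Return the rate used for each frame.

    Hypothetical policy: step down one rate after `down_after` consecutive
    failures, step up one rate after `up_after` consecutive successes.
    Hypothetical channel: a frame succeeds only if its rate <= max_ok_rate.
    """
    idx = start_index
    successes = failures = 0
    history = []
    for _ in range(frames):
        rate = RATES_MBPS[idx]
        history.append(rate)
        if rate <= max_ok_rate:
            successes += 1
            failures = 0
            if successes >= up_after and idx < len(RATES_MBPS) - 1:
                idx += 1        # probe the next higher rate
                successes = 0
        else:
            failures += 1
            successes = 0
            if failures >= down_after and idx > 0:
                idx -= 1        # back off to the next lower rate
                failures = 0
    return history


if __name__ == "__main__":
    print(adapt(max_ok_rate=2, frames=12))
```

On a channel that only sustains 2 Mbps, the controller keeps probing 5.5 Mbps and failing, so a steady fraction of frames is lost to probing. Each such loss costs a MAC-layer retransmission, which is one plausible reading of the "thrashing" described above.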
Conclusions • TCP performance on WLAN can be wacky! (at least for the Compaq Multiport 802.11b USB wireless card under Linux 2.4) • Several factors can affect overall performance • A poorly configured USB could be the bottleneck • A Linux TCP implementation bug makes TCP unable to recognize the first spurious timeout • A poor MAC-layer rate adaptation algorithm can cause a “network thrashing” problem • TCP’s data/ACK structure may induce excessive collisions at the MAC layer on wireless LANs