Fault Tolerance for WLAN. Speaker : Mark Yang. 93.04.27. Outline. Hardware Fault Tolerance Dependability enhancement for IEEE 802.11 wireless LAN with redundancy techniques Tolerance to Access-Point Failures in Dependable Wireless Local-Area Networks Comparison Software Fault Tolerance
Fault Tolerance for WLAN
Speaker: Mark Yang
93.04.27
Outline
• Hardware Fault Tolerance
  • Dependability enhancement for IEEE 802.11 wireless LAN with redundancy techniques
  • Tolerance to Access-Point Failures in Dependable Wireless Local-Area Networks
  • Comparison
• Software Fault Tolerance
  • TCP-DCR: A novel protocol for tolerating wireless channel errors
  • Implementation of Explicit Wireless Loss Notification Using MAC-Layer Information
  • Comparison
• Simulation
• Conclusion
2 / 40
Hardware (1)
Dependability enhancement for IEEE 802.11 wireless LAN with redundancy techniques
2003 International Conference on Dependable Systems and Networks (Proceedings), 22-25 June 2003, pages 521-528
3 / 40
Hardware (1) – Abstract
• Proposes tolerating the existence of "shadow regions", rather than preventing them, in order to enhance connection dependability.
• A redundant AP is placed in the shadow region to serve the mobile stations that roam into that region.
• DS configuration: the secondary AP is connected to the same distribution system as the primary AP.
• Forwarding configuration: the secondary AP acts as a wireless forwarding bridge for traffic between the mobile stations in the shadow region and the primary AP.
4 / 40
Hardware (1) – DS configuration
• An additional AP is placed in the shadow area and uses the same frequency as the primary AP.
• The secondary AP forwards data between the mobile stations in the shadow area and the primary AP. The two APs communicate over the distribution system.
5 / 40
Hardware (1) – Forwarding configuration • The secondary AP is placed at a certain location where it could communicate with both the mobile terminal in the shadow area and with the primary AP. • The secondary AP thus could forward the packet transmissions in both directions. 6 / 40
Hardware (1) – Aspects
• Since the beacon interval is less than 100 ms, the maximum detection delay for link failures is 100 ms.
• A mobile station in the shadow region may transmit data only when it is granted a TXOP by the secondary AP.
• The primary AP sends the TXOP specification to the secondary AP.
• Simultaneously, the primary AP broadcasts the same channel reservation message in the cell in the form of a QoS-Poll frame.
• All stations in the non-shadowed area receive the QoS-Poll frame and defer any transmission attempt until the channel reservation time is over.
• With the channel reserved, the secondary AP sends the QoS-Poll frame to the mobile stations in the shadow region sequentially, so that they can send their data packets free of collisions.
7 / 40
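The reservation sequence on this slide can be sketched in Python. This is only an illustration under stated assumptions: the names `Txop`, `schedule_shadow_polls` and `may_transmit` are hypothetical, and splitting the reserved interval into equal slots is an assumption, since the slide only says that shadow stations are polled sequentially.

```python
from dataclasses import dataclass


@dataclass
class Txop:
    station: str
    start_ms: float
    end_ms: float


def schedule_shadow_polls(reservation_start_ms, reservation_ms, shadow_stations):
    """Split one reserved interval into back-to-back TXOPs, one per shadow station.

    Equal-sized slots are an assumption made for this sketch.
    """
    if not shadow_stations:
        return []
    slot = reservation_ms / len(shadow_stations)
    return [
        Txop(sta, reservation_start_ms + i * slot, reservation_start_ms + (i + 1) * slot)
        for i, sta in enumerate(shadow_stations)
    ]


def may_transmit(station, now_ms, txops, in_shadow):
    """Shadow stations send only inside their own TXOP.

    Stations outside the shadow region defer for the whole reservation.
    """
    if not in_shadow:
        reserved = bool(txops) and txops[0].start_ms <= now_ms < txops[-1].end_ms
        return not reserved
    return any(t.station == station and t.start_ms <= now_ms < t.end_ms for t in txops)


txops = schedule_shadow_polls(0.0, 6.0, ["sta1", "sta2", "sta3"])
```

With a 6 ms reservation and three shadow stations, each station gets a 2 ms slot, and a station in the non-shadowed area may transmit again once the reservation ends.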
Hardware (1) – Fault models • Reliability: • Availability: • Survivability: 8 / 40
Hardware (1) – Numerical examples
• $\text{Availability} = 1 - 10^{-2} = 0.99$
• $1/\lambda_{os} \approx 2.8$ hours
9 / 40
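As a rough reading of the availability figure on this slide, the Python sketch below converts an unavailability of 10^-2 into availability and into expected downtime over a period. The function names and the one-year period are illustrative assumptions and do not come from the paper.

```python
HOURS_PER_YEAR = 365 * 24  # assumed observation period for the example


def availability_from_unavailability(unavailability):
    """Availability is the complement of the unavailability."""
    return 1.0 - unavailability


def expected_downtime_hours(availability, period_hours=HOURS_PER_YEAR):
    """Expected time spent unavailable during the given period."""
    return (1.0 - availability) * period_hours


availability = availability_from_unavailability(1e-2)
downtime = expected_downtime_hours(availability)
```

Under these assumptions, an availability of 0.99 corresponds to roughly 87.6 hours of downtime per year.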
Hardware (2)
Tolerance to Access-Point Failures in Dependable Wireless Local-Area Networks
Ninth IEEE International Workshop on Object-Oriented Real-Time Dependable Systems (Proceedings), 1-3 Oct. 2003, pages 136-143
10 / 40
Hardware (2) – Abstract
• Enhances the dependability of wireless networks by tolerating AP failures, and develops and evaluates a new fault-detection approach based on signal-to-noise ratio.
• Detection of AP failures:
  • Beacon-frame monitoring
  • Signal-to-noise ratio
• Three techniques to recover from AP failures:
  • Access-Point Replication
  • Overlapping-Coverage
  • Link-Multiplexing
11 / 40
Hardware (2) – Beacon-frame monitoring
• Handoff mechanism in 802.11 WLAN:
  • Passive scanning: a mobile station sweeps from channel to channel to detect Beacon frames, which the APs transmit periodically.
  • Active scanning: a mobile station actively seeks out APs by broadcasting ProbeRequest frames on every channel.
• User mobility must be distinguished from AP failure.
• User mobility:
  • Only a few users try to hand off to a new AP at any given point in time.
  • Active scanning is employed to discover new APs.
• AP failure:
  • The number of users trying to hand off to a new AP can be relatively large.
  • Passive scanning is used instead to detect the presence of a new AP.
12 / 40
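The distinction between user mobility and AP failure can be sketched as a count of stations attempting handoff within a short window. This is a minimal illustration: the function name, the window length and the threshold of five stations are hypothetical values chosen for the example, not figures from the paper.

```python
def choose_scan_mode(handoff_attempts, now_s, window_s=1.0, many_users_threshold=5):
    """Guess the cause of recent handoffs and pick a scanning method.

    handoff_attempts: iterable of (timestamp_s, station_id) tuples.
    Returns (cause, scan_mode). The threshold is an assumed tuning value.
    """
    recent_stations = {
        station
        for timestamp, station in handoff_attempts
        if now_s - window_s <= timestamp <= now_s
    }
    if len(recent_stations) >= many_users_threshold:
        # Many stations leave the same AP at once: treat it as an AP failure
        # and use passive scanning, as the slide suggests.
        return "ap_failure", "passive"
    # Only a few stations are moving: ordinary mobility, active scanning.
    return "user_mobility", "active"


few = [(9.2, "sta1"), (9.8, "sta2")]
many = [(9.0 + 0.1 * i, f"sta{i}") for i in range(8)]
```

Calling `choose_scan_mode(few, now_s=10.0)` selects active scanning, while `choose_scan_mode(many, now_s=10.0)` selects passive scanning.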
Hardware (2) – Signal-to-noise ratio
• The strength of the signal that a mobile station receives from an AP is used as an indicator of the AP's "up/down" status.
• The fault-recovery mechanism is initiated if the signal-to-noise ratio (SNR) drops suddenly.
13 / 40
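A sudden SNR drop can be detected, for example, by comparing each new sample with a moving average of recent samples. The sketch below is one possible reading of the slide: the class name, the window size and the drop threshold in dB are assumptions, not values from the paper.

```python
from collections import deque


class SnrDropDetector:
    """Flag a suspected AP failure when SNR falls sharply below its recent average."""

    def __init__(self, window=5, drop_db=15.0):
        self.samples = deque(maxlen=window)  # recent "healthy" SNR samples
        self.drop_db = drop_db               # assumed threshold for a sudden drop

    def update(self, snr_db):
        """Feed one SNR sample; return True if recovery should be initiated."""
        suspected = False
        if len(self.samples) == self.samples.maxlen:
            baseline = sum(self.samples) / len(self.samples)
            suspected = baseline - snr_db >= self.drop_db
        if not suspected:
            # Only healthy samples update the baseline.
            self.samples.append(snr_db)
        return suspected


detector = SnrDropDetector(window=3, drop_db=10.0)
flags = [detector.update(snr) for snr in [30.0, 29.0, 31.0, 30.0, 12.0]]
```

In this run only the last sample, which lies 18 dB below the running average, is flagged.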
Hardware (2) – Access-Point Replication
• An additional AP is designated as a backup and is activated once the primary AP fails.
• Drawbacks:
  • The latency involved in detecting AP failures and performing the fail-over (authenticate → ACK → re-association request → re-association response) is relatively large (7.03 seconds).
  • Additional infrastructure cost: the backup AP might not be actively used under fault-free conditions.
14 / 40
Hardware (2) – Overlapping-Coverage
• If one AP fails, the mobile stations associated with it can be transferred to another AP whose coverage area intersects with that of the failed AP.
• In IEEE 802.11, the channels used by neighboring APs should be separated by at least five channels; this limited availability of channels can result in shadow areas.
• Drawbacks:
  • Some spare capacity must be reserved at each AP to take over the additional users it will have to support if a neighboring AP (with overlapping coverage) fails.
  • The latency involved in detecting an AP failure and switching to a functional AP is relatively large.
15 / 40
Hardware (2) – Link-Multiplexing
• Uses redundant communication paths from a mobile station, with each path connecting a distinct wireless network-interface card at the mobile station to a distinct AP.
• Link-multiplexing compared with link-replication:
  • The total bandwidth used for communication can remain the same as that used by a single link.
  • The average message-transmission delay increases because of the multiplexing and demultiplexing.
16 / 40
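The multiplexing idea can be sketched as dealing chunks of a message round-robin across the available links and reassembling them by sequence number at the other end. The function names, the chunk size and the round-robin policy are assumptions made for this illustration; the paper's middleware may distribute data differently.

```python
def multiplex(payload: bytes, n_links: int, chunk: int = 4):
    """Deal fixed-size chunks round-robin over n_links.

    Returns one list per link, holding (sequence_number, chunk_bytes) tuples.
    """
    links = [[] for _ in range(n_links)]
    for seq, offset in enumerate(range(0, len(payload), chunk)):
        links[seq % n_links].append((seq, payload[offset:offset + chunk]))
    return links


def demultiplex(links):
    """Merge the chunks from all links and restore the original byte order."""
    chunks = sorted(
        (item for link in links for item in link),
        key=lambda item: item[0],
    )
    return b"".join(data for _, data in chunks)


message = b"fault tolerance for WLAN"
per_link = multiplex(message, n_links=2)
restored = demultiplex(per_link)
```

Each byte is sent exactly once, which matches the point that total bandwidth stays the same as for a single link; the sorting step at the receiver is where the extra delay mentioned on the slide would arise.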
Hardware (2) – Link-Multiplexing (cont.) • Requires additional software be installed at the client & server. • Utilize a library interpositioning (interceptor) approach to capture the network layer calls made by the application, and can be embedded inside a middleware layer at both the client and the server. • Fault-detection. • Intercepting network layer calls made by the application. • Multiplexing/de-multiplexing data from/to the application. 17 / 40
Hardware – Comparison 18 / 40
Software (1)
TCP-DCR: A novel protocol for tolerating wireless channel errors
Accepted for publication in IEEE Transactions on Mobile Computing (February 2004)
http://www.crhc.uiuc.edu/wireless/groupPubs.html
19 / 40
Software (1) – Abstract
• TCP-DCR delays the triggering of the congestion response algorithms for a small, bounded period of time T, to allow link-level retransmissions to recover losses caused by channel errors.
• If the packet is not recovered by link-level retransmission by the end of the delay period, TCP-DCR triggers the congestion recovery algorithms of fast retransmission and recovery.
• Simulations show that TCP-DCR:
  • Does not affect fairness towards native implementations of TCP.
  • Performs significantly better when channel errors contribute more towards packet losses in the network.
20 / 40
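The delayed response can be sketched as a small ACK-driven state machine. This is a simplified illustration, not the authors' implementation: the class name, the duplicate-ACK threshold of three, starting the delay period at the first duplicate ACK, and checking the deadline only when an ACK arrives are all assumptions of this sketch.

```python
class DcrSender:
    """Toy model of a sender that delays its congestion response by delay_t seconds."""

    def __init__(self, delay_t, dupack_threshold=3):
        self.delay_t = delay_t
        self.dupack_threshold = dupack_threshold
        self.last_ack = None
        self.dupacks = 0
        self.delay_start = None  # time of the first duplicate ACK (assumed start of T)

    def on_ack(self, ack_no, now):
        """Process one ACK and return the action taken."""
        if ack_no != self.last_ack:
            # New data acknowledged: a pending loss, if any, has been recovered.
            was_pending = self.delay_start is not None
            self.last_ack, self.dupacks, self.delay_start = ack_no, 0, None
            return "dcr_cancel" if was_pending else "new_ack"
        self.dupacks += 1
        if self.delay_start is None:
            self.delay_start = now
        if self.dupacks < self.dupack_threshold:
            return "dupack"
        if now - self.delay_start < self.delay_t:
            # Still inside T: give link-level retransmission a chance.
            return "delay_response"
        return "fast_retransmit_and_recovery"


sender = DcrSender(delay_t=0.5)
actions = [
    sender.on_ack(10, 0.00),
    sender.on_ack(10, 0.10),
    sender.on_ack(10, 0.20),
    sender.on_ack(10, 0.30),
    sender.on_ack(20, 0.40),
]
```

In this run the third duplicate ACK arrives inside T, so the response is delayed, and the later new ACK cancels it. If the third duplicate ACK had arrived after T, the sketch would return `fast_retransmit_and_recovery` instead.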
Software (1) – Behavior 21 / 40
Software (1) – Choice of T
• Timeline marks: t0, t0+(RTT/2 − rtt/2), t0+RTT/2
• t0+RTT/2+rtt/2: BS receives indication that the packet is lost
• t0+RTT/2+rtt: packet is recovered at the receiver
• t0+RTT+rtt: sender receives an ACK for the packet
• The sender would have to delay the congestion response by at least (t0+RTT+rtt) − (t0+RTT) = rtt.
• The interpacket delays are non-zero and the TCP sender may not know the value of rtt, so the lower bound of T is one RTT.
• The retransmission timeout is usually set to RTT plus four times the RTT variation. T should be chosen so that unnecessary retransmission timeouts are avoided, so the upper bound of T is one RTT.
22 / 40
Software (1) – Simulation No Congestion Losses 23 / 40
Software (1) – Simulation (cont.)
Only Congestion Losses
• 12 TCP-SACK flows & 12 TCP-DCR flows
• Figure labels: congestion, 10 Mbps, 5 ms
24 / 40
Software (1) – Simulation (cont.)
Channel Errors & Congestion Losses
• 12 TCP-SACK flows & 12 TCP-DCR flows
• TCP-DCR flows can make use of the link bandwidth that the TCP-SACK flows do not utilize effectively.
25 / 40
Software (2)
Implementation of Explicit Wireless Loss Notification Using MAC-Layer Information
2003 IEEE Wireless Communications and Networking (WCNC 2003), vol. 2, 16-20 March 2003, pages 1339-1343
26 / 40
Software (2) – Abstract
• TCP suffers a significant degradation in performance over wireless networks because it does not distinguish wireless link loss from congestion loss.
• To overcome this problem, the Explicit Wireless Loss Notification (EWLN) scheme explicitly informs the TCP sender of wireless link loss.
• The EWLN scheme uses information from the MAC layer and takes into account the interplay with the error recovery mechanism at the link layer.
• The sender's congestion control mechanism can be decoupled from the retransmission mechanism and set to react only to congestion-related losses.
27 / 40
Software (2) – MAC Protocol
• Figure labels: MAC Protocol; Link-level retry; Comparing the seqNo of the current and buffered packets; To mobile terminal; To next node
28 / 40
Software (2) – Receiver
• Normal
• Congestion error: Ewln_flag = 0
• Wireless link error that link-level retransmission cannot recover: Ewln_flag = 1
• Send duplicate ACK
29 / 40
Software (2) – Sender
• Retransmit the packet upon receiving the first ACK with EWLN set.
• To avoid duplicate transmissions, retransmit only on the first ACK that carries EWLN.
30 / 40
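The sender rule can be sketched as follows. This is a hedged illustration rather than the paper's code: the class and action names are hypothetical, and the congestion branch (halving `cwnd` on the third duplicate ACK without EWLN) is a simplified stand-in for standard TCP fast retransmit.

```python
class EwlnSender:
    """Toy sender that separates wireless-loss retransmission from congestion control."""

    def __init__(self, cwnd=10):
        self.cwnd = cwnd
        self.ewln_retransmitted = set()  # ACK numbers already retransmitted due to EWLN
        self.dupack_count = {}           # duplicate ACKs seen without EWLN

    def on_dupack(self, ack_no, ewln_flag):
        """Handle one duplicate ACK and return the action taken."""
        if ewln_flag:
            if ack_no in self.ewln_retransmitted:
                # Later EWLN ACKs for the same packet are ignored to avoid duplicates.
                return "ignore"
            self.ewln_retransmitted.add(ack_no)
            # Wireless loss: retransmit, but leave the congestion window alone.
            return "retransmit_keep_cwnd"
        self.dupack_count[ack_no] = self.dupack_count.get(ack_no, 0) + 1
        if self.dupack_count[ack_no] == 3:
            # Simplified congestion response (assumption of this sketch).
            self.cwnd = max(1, self.cwnd // 2)
            return "fast_retransmit_halve_cwnd"
        return "wait"


sender = EwlnSender(cwnd=10)
wireless = [sender.on_dupack(4, True), sender.on_dupack(4, True)]
cwnd_after_wireless = sender.cwnd
congestion = [sender.on_dupack(7, False) for _ in range(3)]
```

Here the first EWLN-marked ACK triggers a single retransmission with `cwnd` unchanged, while three ordinary duplicate ACKs still trigger the congestion response.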
Software (2) – Example
• Two packet losses occur over a wireless link in a single transmission window.
• Figure labels, in order: Receive 1; Not receive 2; Not receive 2; Not receive 4; Not receive 4; Not receive 4; If lose again?; Not receive 4; Receive 6
31 / 40
Software (2) – Simulation
No wireless link error
32 / 40
Software (2) – Simulation (cont.) congestion 33 / 40
Software – Comparison 34 / 40
Simulation – Environment
• Paper:
  • TCP-DCR: A novel protocol for tolerating wireless channel errors
• Software:
  • Linux 9 + NS 2.26 (DCR: modify tcp-sack1.cc)
• Topology:
• Tcl (additional setup):
  • Error Model (exponential)
  • Link Level Retransmission (LL/LLSnoop)
35 / 40
Simulation – DCR code verify
• Annotations:
  • 1st dupack
  • 70.2242 − 69.9014 < 0.553: LL retransmission
  • 85.656 − 85.4978 > 0.144: Fast recovery
  • Time out
• Log:
ack no 1131 received at 69.7674, cwnd=19
ack no 1131 received at 69.9014, cwnd=19
dcr start at 69.9014 [ack no=1131, delay time=0.553]
ack no 1131 received at 70.2204, cwnd=19
ack no 1131 received at 70.2242, cwnd=19
delay fast recovery at 70.2242! [ack no=1131]
ack no 1140 received at 70.7637, cwnd=19
dcr cancel at 70.7637 [ack no=1140]

ack no 4128 received at 85.4946, cwnd=19
ack no 4129 received at 85.4962, cwnd=19
ack no 4129 received at 85.4978, cwnd=19
dcr start at 85.4978 [ack no=4129, delay time=0.144]
ack no 4129 received at 85.6545, cwnd=19
ack no 4129 received at 85.656, cwnd=19
fast recovery begin at 85.656, dcr cancel! [ack no=4129]
ack no 4142 received at 85.6816, cwnd=9

ack no 1417 received at 87.2862, cwnd=22
ack no 1417 received at 87.4168, cwnd=22
dcr start at 87.4168 [ack no=1417, delay time=0.849]
ack no 1429 received at 91.4158, cwnd=1
dcr cancel at 91.4158 [ack no=1429]
ack no 1429 received at 91.4769, cwnd=2
36 / 40
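The two timing annotations on this slide can be re-checked mechanically from the log. The short Python sketch below parses the "dcr start" lines and the decision line that follows each one, then compares the elapsed time with the logged delay time. The regular expressions and the function name are written for this example only and assume the log format shown on the slide.

```python
import re

LOG = """\
dcr start at 69.9014 [ack no=1131, delay time=0.553]
delay fast recovery at 70.2242! [ack no=1131]
dcr start at 85.4978 [ack no=4129, delay time=0.144]
fast recovery begin at 85.656, dcr cancel! [ack no=4129]
"""

START = re.compile(r"dcr start at ([\d.]+) \[ack no=(\d+), delay time=([\d.]+)\]")
DECISION = re.compile(r"(delay fast recovery|fast recovery begin) at ([\d.]+)")


def check_decisions(log):
    """Return (ack_no, elapsed, delay, logged_decision, within_delay) per DCR episode."""
    results = []
    pending = None
    for line in log.splitlines():
        match = START.search(line)
        if match:
            pending = (int(match.group(2)), float(match.group(1)), float(match.group(3)))
            continue
        match = DECISION.search(line)
        if match and pending:
            ack_no, start, delay = pending
            elapsed = float(match.group(2)) - start
            results.append((ack_no, round(elapsed, 4), delay, match.group(1), elapsed < delay))
            pending = None
    return results


episodes = check_decisions(LOG)
```

For ack 1131 the elapsed time is about 0.3228 s, which is below the 0.553 s delay, so fast recovery is delayed. For ack 4129 the elapsed time is about 0.1582 s, which exceeds the 0.144 s delay, so fast recovery begins.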
Simulation – Performance (1) Almost all errors are recovered by "Link level retransmission". Some "Fast-recovery" and "Timeout" events still happen. 37 / 40
Simulation – Performance (2) Almost all errors are recovered by "Link level retransmission". Some "Fast-recovery" and "Timeout" events still happen. 38 / 40
Simulation – Performance (3) Almost all errors are recovered by "Link level retransmission". Some "Fast-recovery" and "Timeout" events still happen. 39 / 40
Conclusion
• Papers for WLAN Fault Tolerance
  • Hardware Fault Tolerance: less
  • Software Fault Tolerance: more
• Simulation / Experiment method
  • Hardware: Numerical examples or Experiment
  • Software: NS (Network Simulation tool)
40 / 40