Improving Loss Resilience with Multi-Radio Diversity in Wireless Networks

Improving Loss Resilience with Multi-Radio Diversity in Wireless Networks, by Allen Miu, Hari Balakrishnan and C. E. Koksal. Appeared in ACM MOBICOM 2005, where it was a candidate for the best paper award.

What is the paper about? • The paper looks at multi-radio diversity and exploits the fact that the fading experienced at different radios differs, in order to improve performance in WLANs. • Its major contribution is a new technique called “Frame Combining”, which combines two erroneous copies of a frame, received at two different radios, to reconstruct the transmitted frame. • The effort is thorough: the authors implement the proposed approach, provide analytical insights, and run simulations to study the problem in more depth.

What did I learn from this paper? • Insights into how much, and why, multi-radio diversity can improve performance. • To some degree, how any kind of diversity affects “rate control”. • Some interesting interactions between the physical, MAC and network layers, which I will highlight in the presentation.

The MRD System • The idea is to use multiple radios or multiple APs in a wireless LAN to receive transmitted frames simultaneously. • Why does this help? The radios are not spatially co-located. • Thus, the wireless channels to the two radios differ (path diversity). • Thus, the errors experienced at one radio differ from those at the other. • So, you get two copies of a transmitted frame, possibly with errors at different locations. • Of course, if one of the copies is error-free, it can be delivered to the IP layer (selection diversity). • If not, can we combine the corrupted copies?

A High-Level View • Note: unless I state otherwise, all figures are from the paper itself. • Of the two APs, one is called the active AP: it is in communication with the client. • The others passively listen and try to gather frames.

What are the challenges? • How does frame combining (done at a layer that sits above the MAC) interact with the 802.11 MAC and PHY layers? • Specifically, frame combining may take some time, so how can you acknowledge, and provide retransmissions for, “reconstructed” or “salvaged” frames? • How does frame combining work with automatic rate control? • How are the bit errors distributed? Can frames even be salvaged, and if so, how? • The paper addresses these questions while proposing a new technique for frame combining.

Frame Combining • The idea is simple. Divide each frame into NB blocks of fixed size (the last block may have fewer bits). • Say we want to reconstruct the frame from two received copies. • Clearly, if either copy is correct (its CRC checks), the packet is received successfully. • Otherwise, look at the blocks that differ: for example, the ith block of the first copy might differ from the ith block of the second copy. • Assemble candidate frames by taking each differing block from one copy or the other. If a candidate passes the CRC check, combining succeeds.
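The combining search above can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes, hypothetically, that each copy ends in a 4-byte little-endian CRC-32 (computed with `zlib.crc32`) over the rest of the frame, and that the NB blocks cover the whole frame, trailer included. The names `split_blocks`, `crc_ok` and `combine_frames` are illustrative.

```python
import itertools
import zlib


def split_blocks(frame: bytes, nb: int) -> list:
    """Split a frame into nb blocks of equal size; the last may be shorter."""
    size = -(-len(frame) // nb)  # ceiling division
    return [frame[i * size:(i + 1) * size] for i in range(nb)]


def crc_ok(frame: bytes) -> bool:
    """Hypothetical trailer: last 4 bytes are the CRC-32 of the rest (little-endian)."""
    return zlib.crc32(frame[:-4]) == int.from_bytes(frame[-4:], "little")


def combine_frames(copy1: bytes, copy2: bytes, nb: int):
    """Try every mix of the differing blocks; return (frame or None, CRC checks used)."""
    b1, b2 = split_blocks(copy1, nb), split_blocks(copy2, nb)
    differing = [i for i in range(nb) if b1[i] != b2[i]]
    checks = 0
    # 2**D candidates; the all-0 and all-1 choices are the two copies themselves.
    for choice in itertools.product((0, 1), repeat=len(differing)):
        blocks = list(b1)
        for idx, take_second in zip(differing, choice):
            if take_second:
                blocks[idx] = b2[idx]
        candidate = b"".join(blocks)
        checks += 1
        if crc_ok(candidate):
            return candidate, checks
    return None, checks


payload = bytes(range(60))
frame = payload + zlib.crc32(payload).to_bytes(4, "little")  # 64 bytes, 16-byte blocks
copy1 = bytearray(frame)
copy1[3] ^= 0xFF    # burst in block 0 of the first copy
copy2 = bytearray(frame)
copy2[40] ^= 0x0F   # burst in block 2 of the second copy
recovered, checks = combine_frames(bytes(copy1), bytes(copy2), nb=4)
print(recovered == frame, checks)
```

Because both copies pass through the same loop, selection diversity (one copy already correct) falls out as a special case of the search.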

More about frame combining • What should the block size be? • The combining method above is simple, but its running time is exponential in the number of differing blocks D. • With two copies, you need up to 2^D CRC checks. • So, clearly you want to keep D small, which means reducing the total number of blocks NB, i.e., increasing B, the number of bits per block. • However, this reduces the probability of successful combining. Why? With larger blocks, it becomes more likely that both copies have errors in the same block, and such a block cannot be repaired by choosing between the copies.
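A back-of-the-envelope sketch of this trade-off, assuming a hypothetical 1500-byte frame: since D can be at most NB, the worst case costs 2^NB CRC checks, while B shrinks as NB grows. The function name `block_tradeoff` is illustrative.

```python
def block_tradeoff(frame_bytes: int, nb: int) -> tuple:
    """Return (B, worst-case CRC checks) for a frame split into nb blocks."""
    frame_bits = 8 * frame_bytes
    bits_per_block = -(-frame_bits // nb)  # B, rounded up; last block may be shorter
    worst_case_checks = 2 ** nb            # every block differs, so D = NB
    return bits_per_block, worst_case_checks


for nb in (6, 10, 16):
    b, checks = block_tradeoff(1500, nb)
    print(f"NB={nb:2d}  B={b:5d} bits  worst case={checks} CRC checks")
```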

Analyzing Frame Combining • Let the frame combining failure probability be pf. • Assume a bit error model in which errors occur in bursts of b bits. • pf is the fraction of frames that cannot be corrected by combining, out of those that could not be corrected by soft selection. • The analysis assumes that the losses observed at the two receivers are independent of each other; the paper corroborates this assumption with experimental results. • The errors are clustered and occur periodically.

Bit Errors • The authors argue that the observed error pattern is consistent with the modulation and coding used. • QAM-64 modulation on OFDM with a rate 2/3 code. • With QAM-64, each subcarrier carries 6 bits per symbol. • So each OFDM symbol, spread over ~50 subcarriers (48 data subcarriers in 802.11a/g), carries ~50 × 6 = 300 bits. • Since a rate 2/3 code is used, 3 symbols are decoded at a time, i.e., each subcarrier carries 3 symbols. • Thus, roughly 900-bit transmission patterns repeat across the subcarriers. • Since each subcarrier experiences a similar, roughly static fade from symbol to symbol, the error pattern repeats about every 1000 bits.

Assumptions made for computing pf • To compute the frame combining failure probability pf, the authors make the following assumptions: • Each error burst is exactly b bits long. • The number of bits per block, B, is much larger than b, and thus: • the probability that two error bursts overlap is negligible; • they ignore the possibility that a burst spreads over more than one block, i.e., each burst is completely contained within a block; • the number of bursts a block can hold is not limited. • Given that b ~ 300 bits and a block is more than, say, 200 bytes (1600 bits), these are reasonable assumptions.

Notations and some details • D_{b,i}: the number of b-bit error sequences (bursts) in a given frame received at receiver R_i. • N1 and N2: the sets of blocks that contain errors in the copies received at receivers 1 and 2 (two receivers are considered). • The intersection N1 ∩ N2 is the set of blocks that have errors in both copies. • If N1 ∩ N2 is empty, every block is correct in at least one copy, so the frame can be recovered by combining.
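The recoverability condition can be expressed directly with Python sets. This sketch takes bit-error positions for each copy (hypothetical inputs), maps them to block indices to obtain N1 and N2, and checks whether their intersection is empty; it assumes the exhaustive combining search and a CRC with no false positives. The second example shows a burst straddling a block boundary, a case the analysis ignores.

```python
def error_blocks(error_bits, frame_bits: int, nb: int) -> set:
    """Set of block indices (N_i) containing at least one bit error."""
    block_size = -(-frame_bits // nb)  # bits per block, rounded up
    return {pos // block_size for pos in error_bits}


def combinable(errors1, errors2, frame_bits: int, nb: int) -> bool:
    """True iff no block is corrupted in both copies (N1 ∩ N2 is empty)."""
    n1 = error_blocks(errors1, frame_bits, nb)
    n2 = error_blocks(errors2, frame_bits, nb)
    return n1.isdisjoint(n2)


frame_bits, nb = 12000, 6                 # 1500-byte frame, 2000-bit blocks
burst1 = range(100, 400)                  # 300-bit burst in block 0 (copy 1)
print(combinable(burst1, range(5000, 5300), frame_bits, nb))  # True: blocks 0 and 2
print(combinable(burst1, range(1800, 2100), frame_bits, nb))  # False: spans blocks 0 and 1
```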

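Computing pf • First, assume that error bursts (b-bit sequences) occur uniformly over the frame. • Let frame 1 contain d1 bursts and frame 2 contain d2 bursts. • Then, the probability that at least d blocks are in error in both copies is bounded by a ratio of two counts: the number of ways to place the bursts so that at least d blocks are shared, over the total number of ways to place them. Why is this true?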

This is similar to the problem of putting d1 balls (possibly more than NB) into NB buckets. • We want to count the number of ways in which the balls can be placed in the buckets. • Some buckets may hold multiple balls, so some buckets may be empty.

Denominator (stars and bars): line up the d1 balls together with the NB - 1 dividers between buckets. • Each placement corresponds to choosing which d1 of these NB + d1 - 1 positions hold balls, so there are C(NB + d1 - 1, d1) placements. • The same holds for frame 2, with d2 in place of d1. • Numerator: • First, place d red balls (first frame) and d blue balls (second frame) in d blocks, one of each per block. • Distribute the remaining balls over all blocks. • Thus, at least d blocks contain errors in both frames. • It is an upper bound, since some configurations are counted more than once.
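Written out, the counting argument gives the following conditional bound. This is a reconstruction from the numerator and denominator described on this slide (the slide's own formula is not reproduced here), using N_B for NB and the notation D_{b,i}, N_1, N_2 defined earlier, for d ≤ min(d_1, d_2):

```latex
\Pr\left(|N_1 \cap N_2| \ge d \,\middle|\, D_{b,1} = d_1,\ D_{b,2} = d_2\right)
\le
\frac{\binom{N_B}{d}\binom{N_B + d_1 - d - 1}{d_1 - d}\binom{N_B + d_2 - d - 1}{d_2 - d}}
     {\binom{N_B + d_1 - 1}{d_1}\binom{N_B + d_2 - 1}{d_2}}
```

For d = 1 the ratio simplifies to N_B d_1 d_2 / ((N_B + d_1 - 1)(N_B + d_2 - 1)); with d_1 = d_2 = 1 it equals 1/N_B, the exact probability that two single bursts land in the same block.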

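Given this, we remove the conditioning on d1 and d2 by averaging the bound over the distributions of D_{b,1} and D_{b,2} (independent, by the earlier assumption). pf corresponds to the case d = 1: at least one block is in error in both copies. • Note: this is an upper bound on the frame combining failure probability.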
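A small Python sketch that evaluates this bound numerically. It assumes (our reading, not the paper's exact formula) that pf is conditioned on both copies being in error, as in the definition of pf given earlier, that D_{b,1} and D_{b,2} are independent, and that their distributions are supplied as lists with `pmf[k] = P(D = k)`; the example distribution is hypothetical. Each conditional term is capped at 1, since the counting bound can exceed 1 when bursts outnumber blocks.

```python
from math import comb


def conditional_bound(nb: int, d1: int, d2: int, d: int = 1) -> float:
    """Upper bound on P(|N1 ∩ N2| >= d | D1 = d1, D2 = d2), capped at 1."""
    if d > min(d1, d2):
        return 0.0
    num = comb(nb, d) * comb(nb + d1 - d - 1, d1 - d) * comb(nb + d2 - d - 1, d2 - d)
    den = comb(nb + d1 - 1, d1) * comb(nb + d2 - 1, d2)
    return min(1.0, num / den)


def pf_upper_bound(nb: int, pmf1: list, pmf2: list) -> float:
    """Average the d = 1 bound over independent burst-count distributions,
    conditioned on both copies containing at least one burst."""
    total = 0.0
    for d1 in range(1, len(pmf1)):
        for d2 in range(1, len(pmf2)):
            total += pmf1[d1] * pmf2[d2] * conditional_bound(nb, d1, d2)
    return total / (sum(pmf1[1:]) * sum(pmf2[1:]))


pmf = [0.5, 0.3, 0.15, 0.05]   # hypothetical P(D = 0), ..., P(D = 3)
for nb in (1, 2, 4, 6, 10, 20):
    print(nb, round(pf_upper_bound(nb, pmf, pmf), 3))
```

For NB much larger than the burst counts, each term behaves like d1 d2 / NB, so the bound keeps falling as NB grows but with diminishing returns.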

Looking at pf • If the burst length b is small, errors are spread more uniformly over the frame, and even for large NB the probability of combining successfully is small. • With bursty errors (as observed), pf decreases as NB grows. • But beyond a certain point, increasing NB does not help much.

What does this mean? • NB must be kept small to limit complexity. • NB can be set to a small value (6-10) and performance is still acceptable.

Retransmissions • Clearly, link layer ACKs alone can lead to wrong conclusions: a missing link layer ACK reflects only the reception at the active AP, while the frame may still be recovered by MRD. • So the authors disable link layer retransmissions. • Retransmissions are always invoked by the MRD layer. • If packet reception is successful, a synchronous ACK is received (at the link layer). • Otherwise, either the frame had errors or the ACK failed. • A frame with errors is recovered using soft decision or frame combining if possible; if this fails, the frame may be stored in the hope of combining it with later retransmitted copies. • The sender expects an ACK sometime in the future. If no ACK arrives before a timeout, it can request an ACK that explicitly indicates success or failure. • This is called an RFA (request for ACK).

RFA • The RFA must explicitly identify which frame is in question. • A flag in the frame header indicates an RFA. • When a link layer ACK fails, the MRDS (sender) simply stores the packet and proceeds with subsequent transmissions. • It can make up to N further transmissions, counted from the first unACKed frame. • A frame is removed from the buffer after K retransmissions. • If the MRD-ACK indicates a frame recovery failure, the frame is retransmitted. • If no MRD-ACK (higher layer) is received, the frame is retransmitted after a timeout.
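The sender-side bookkeeping from the last two slides can be sketched as a small state machine. This is an illustrative model under our own assumptions, not the authors' driver code: frames are tracked only by sequence number, `window` plays the role of N and `max_retx` the role of K, and the class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MrdSender:
    window: int     # N: transmissions allowed, counted from the oldest unACKed frame
    max_retx: int   # K: drop a frame after this many retransmissions
    pending: dict = field(default_factory=dict)  # seq -> retransmissions so far
    next_seq: int = 0

    def can_send_new(self) -> bool:
        """Stay within N frames of the oldest frame still awaiting an MRD-ACK."""
        return not self.pending or self.next_seq - min(self.pending) < self.window

    def send_new(self, link_acked: bool) -> int:
        """Transmit a new frame; without a link-layer ACK, buffer it and move on."""
        seq = self.next_seq
        self.next_seq += 1
        if not link_acked:
            self.pending[seq] = 0
        return seq

    def _retry(self, seq: int) -> bool:
        """Count one retransmission, or drop the frame once K is reached."""
        if self.pending[seq] >= self.max_retx:
            del self.pending[seq]
            return False
        self.pending[seq] += 1
        return True

    def on_mrd_ack(self, base_seq: int, bits: list) -> list:
        """Apply a cumulative MRD-ACK bit vector; return seqs to retransmit."""
        retx = []
        for offset, ok in enumerate(bits):
            seq = base_seq + offset
            if seq not in self.pending:
                continue
            if ok:
                del self.pending[seq]        # recovered on the receiving side
            elif self._retry(seq):
                retx.append(seq)             # recovery failed: retransmit
        return retx

    def on_timeout(self):
        """No MRD-ACK in time: retransmit the oldest frame with the RFA flag set."""
        if not self.pending:
            return None
        seq = min(self.pending)
        return seq if self._retry(seq) else None


s = MrdSender(window=4, max_retx=2)
s.send_new(link_acked=True)    # seq 0: link-layer ACKed, nothing buffered
s.send_new(link_acked=False)   # seq 1: buffered
s.send_new(link_acked=False)   # seq 2: buffered
print(s.on_mrd_ack(1, [True, False]))  # [2]: seq 1 recovered, seq 2 retransmitted
```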

Link layer ACKs? • Why did they not disable link layer ACKs? • First, they are needed for (virtual) carrier sensing. • Second, synchronous ACKs already have a reserved channel, so their loss is less probable. • MRDS ACKs, on the other hand, must contend for the channel, so they may be lost or delayed.

Rate Adaptation • With autorate (rate adaptation), the data rate is lowered when losses occur and raised when packets are delivered successfully. • This does not work well with MRD: the rate controller counts only frames link-layer ACKed by the active AP and ignores receptions at the other radios. • The authors implement their protocol on the Atheros 5212 chipset, driven by the Multiband Atheros Driver for WiFi (MADWiFi). • They modify the driver to fit the MRD implementation.

How? • The original MADWiFi driver invokes TXCALLBACK to update numtx and numtxok after each frame transmission. • The rate is adjusted every T seconds. • If the frame delivery rate is above 90% for S consecutive observation periods, the bit rate is increased. • If the frame delivery rate drops below a threshold D, the rate is reduced. • The authors introduce a new function, MRD_CALLBACK. • It is a very simple fix: also count the transmissions acknowledged by MRD-ACKs. • Since MRD-ACKs are cumulative, the sender has a clear picture of what was received even if some MRD-ACKs are lost.
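A simplified sketch of the delivery-ratio rate controller described above and of the MRD_CALLBACK fix. The class and method names, the threshold D, the period count S and the starting rate are placeholders we chose; the real MADWiFi code and its thresholds may differ. The rate list is the standard 802.11a/g OFDM rate set. The MRD change is only in `mrd_callback`: frames that missed the link-layer ACK but were later confirmed by an MRD-ACK also count as delivered (our reading of the slide).

```python
RATES_MBPS = [6, 9, 12, 18, 24, 36, 48, 54]  # 802.11a/g OFDM rates


class RateController:
    def __init__(self, start_idx: int, down_thresh: float = 0.5, up_periods: int = 2):
        self.idx = start_idx
        self.up_thresh = 0.9            # "above 90 %" from the slide
        self.down_thresh = down_thresh  # D (placeholder value)
        self.up_periods = up_periods    # S (placeholder value)
        self.good_periods = 0
        self.numtx = 0
        self.numtxok = 0

    def tx_callback(self, link_acked: bool) -> None:
        """Called after every transmission (the TXCALLBACK role)."""
        self.numtx += 1
        self.numtxok += int(link_acked)

    def mrd_callback(self, recovered: int) -> None:
        """MRD_CALLBACK role: credit frames confirmed only by an MRD-ACK."""
        self.numtxok += recovered

    def end_period(self) -> int:
        """Called every T seconds; returns the rate (Mbps) for the next period."""
        if self.numtx == 0:
            return RATES_MBPS[self.idx]  # nothing sent: keep the current rate
        ratio = self.numtxok / self.numtx
        if ratio > self.up_thresh:
            self.good_periods += 1
            if self.good_periods >= self.up_periods:
                self.idx = min(self.idx + 1, len(RATES_MBPS) - 1)
                self.good_periods = 0
        else:
            self.good_periods = 0
            if ratio < self.down_thresh:
                self.idx = max(self.idx - 1, 0)
        self.numtx = self.numtxok = 0
        return RATES_MBPS[self.idx]


def run_period(rc, link_acks, mrd_recovered):
    for acked in link_acks:
        rc.tx_callback(acked)
    rc.mrd_callback(mrd_recovered)
    return rc.end_period()


acks = [True] * 4 + [False] * 6      # only 4 of 10 frames link-ACKed
plain, mrd = RateController(start_idx=4), RateController(start_idx=4)
print(run_period(plain, acks, 0))    # 18: 4/10 < D, drops from 24 Mbps
print(run_period(mrd, acks, 6))      # 24: 10/10 > 90 %, first good period
print(run_period(mrd, acks, 6))      # 36: second good period, rate rises
```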

Implementation • Pentium PCs running Linux kernel 2.4.20, with 802.11a/b/g wireless interfaces based on the Atheros 5212 chipset. • They modified the MADWiFi driver. • In their experiments, they set the retry limit to zero at the MAC layer (so there are no MAC-layer retransmissions). • However, this also disables CSMA backoff, which they leave for future work. • There is some discussion of the implications. • Since MRD-ACKs are not ACKed, they are transmitted in broadcast mode.

Implementation (contd.) • CTX: the Combiner Transmit header. • NTX: the number of attempted transmissions, plus 1 RFA bit indicating that the sender has pending frames. • seq: the sequence number. • useq: the oldest transmitted data frame in the MRDS that has not been ACKed.

Identifying combinable frames • The MAC-layer source address and the seq number in CTX identify copies of the same network-layer frame. • Since the MRDS has to identify frames correctly, a 4-byte CRC protects the MAC and CTX headers. • If either of these headers is corrupted, the frame is dropped.
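A sketch of how a combiner could key copies by (source address, seq) and drop frames whose headers are corrupted. The CTX field widths and ordering are hypothetical, since the slides do not give them: one byte holding a 7-bit NTX and the RFA bit, then 16-bit seq and useq, followed by a CRC-32 (via `zlib.crc32`) over the MAC header and CTX. The source address is read from Address 2 (bytes 10-15) of the 802.11 MAC header, which for an uplink frame from the client is the client's address.

```python
import struct
import zlib

CTX_FMT = "!BHH"  # hypothetical layout: NTX << 1 | RFA, seq, useq (5 bytes)


def build_ctx(mac_header: bytes, ntx: int, rfa: bool, seq: int, useq: int) -> bytes:
    """Pack CTX and append a CRC-32 covering the MAC header and CTX."""
    ctx = struct.pack(CTX_FMT, (ntx << 1) | int(rfa), seq, useq)  # ntx must be < 128
    return ctx + struct.pack("!I", zlib.crc32(mac_header + ctx))


def combine_key(mac_header: bytes, ctx_and_crc: bytes):
    """Return (source address, seq) identifying copies, or None to drop the frame."""
    ctx, (crc,) = ctx_and_crc[:5], struct.unpack("!I", ctx_and_crc[5:9])
    if zlib.crc32(mac_header + ctx) != crc:
        return None                       # MAC or CTX header corrupted
    _, seq, _ = struct.unpack(CTX_FMT, ctx)
    return mac_header[10:16], seq         # Address 2 of the 802.11 header


client = bytes.fromhex("020000000001")
mac = bytes(4) + bytes.fromhex("00aabbccddee") + client + bytes(8)  # 24-byte header
ctx = build_ctx(mac, ntx=1, rfa=False, seq=42, useq=40)
print(combine_key(mac, ctx) == (client, 42))  # True
corrupted = bytearray(ctx)
corrupted[1] ^= 0x01                          # flip one bit of seq
print(combine_key(mac, bytes(corrupted)))     # None
```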

MRDS-ACK Implementation • A magic value distinguishes the MRD-ACK packet from other downlink data payloads. • N-bit transmit state: indicates the success or failure of up to N consecutive frames. • seq: the seq value of the first data frame in the bit vector being acknowledged. • A link layer checksum is used to detect errors in this packet.
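The MRD-ACK payload can be sketched with Python's `struct` module as well. The magic value, the 16-bit field sizes and N = 32 are hypothetical choices; the link-layer checksum is handled by the NIC and is not modeled. Bit i of the vector reports the frame with sequence number seq + i.

```python
import struct

MRD_ACK_MAGIC = 0x4D52   # hypothetical magic value
N = 32                   # hypothetical bit-vector width
ACK_FMT = "!HHI"         # magic, seq of first frame, N-bit transmit state


def build_mrd_ack(base_seq: int, received: list) -> bytes:
    """Encode success (True) / failure (False) for up to N frames from base_seq."""
    bits = 0
    for i, ok in enumerate(received[:N]):
        if ok:
            bits |= 1 << i
    return struct.pack(ACK_FMT, MRD_ACK_MAGIC, base_seq, bits)


def parse_mrd_ack(payload: bytes):
    """Return (base_seq, [N booleans]), or None if this is ordinary downlink data."""
    if len(payload) != struct.calcsize(ACK_FMT):
        return None
    magic, base_seq, bits = struct.unpack(ACK_FMT, payload)
    if magic != MRD_ACK_MAGIC:
        return None
    return base_seq, [bool((bits >> i) & 1) for i in range(N)]


base, state = parse_mrd_ack(build_mrd_ack(100, [True, False, True]))
print(base, state[:4])   # 100 [True, False, True, False]
```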

Experiments • The authors run experiments with low variability (LOVAR), where the client is static, and high variability (HIVAR), where the client is mobile. • Two APs: one is the Master and the other is the Monitor. • The client runs in 802.11 Managed mode (i.e., the LAN access configuration). • They set NB (the number of blocks) to six (see the discussion of complexity trade-offs).

Setup for HIVAR experiments

Some Results • Note that only a small fraction of frames were recovered by the combining process. • Throughput with just R1 or R2 was 8.25 and 6.42 Mbps, respectively. • With MRD-R1 and MRD-R2, the average throughputs were 18.7 and 18.36 Mbps. • This is still lower than the 31 Mbps UDP throughput computed theoretically.

Failure of frame combining • They run thorough simulations to see why combining recovered so few frames. • They argue that NB was too small. • They also look at complexity versus efficiency trade-offs, which I will not discuss here.

Rest • I won't discuss the rest of the paper, but hopefully this has shown what it contains. • The final part covers the LOVAR experiments and what happens there, and finally some discussion of backoffs: if CSMA backoffs were invoked, link-layer losses would wrongly cause nodes to back off, • even though the packets may still be recovered.
