Rateless Wireless Networking Decoder. Mikhail Volkov Edison Achelengwa Minjie Chen. Cortex: a rateless wireless system. Very recent work here at CSAIL (Perry, 2011) Use a novel rateless code called spinal code
Rateless Wireless Networking Decoder Mikhail Volkov Edison Achelengwa Minjie Chen
Cortex: a rateless wireless system • Very recent work here at CSAIL (Perry, 2011) • Uses a novel rateless code called a spinal code • Encoder and decoder agree on a seed s0, a hash function h and an IQ constellation mapping
Spinal Encoder • Wish to transmit a message M = m1m2... mn • Break the message into k-bit segments Mi • Apply h to generate a spine
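The spine construction on this slide can be sketched as follows. This is a minimal illustration, not the project's code: the recurrence s_i = h(s_{i-1}, M_i) is assumed from the slide's description, and truncated SHA-256 is a stand-in for the Salsa-based hash used in the project. The names `h` and `make_spine` are hypothetical.

```python
import hashlib


def h(state: int, segment: int) -> int:
    """Stand-in hash (assumption): SHA-256 truncated to a 64-bit spine value.

    The slides use a Salsa-based hash; SHA-256 is used here only because
    it ships with the Python standard library.
    """
    data = state.to_bytes(8, "big") + segment.to_bytes(2, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def make_spine(message_bits: str, k: int, s0: int = 0) -> list:
    """Split the message into k-bit segments M_i and chain them through h."""
    if len(message_bits) % k != 0:
        raise ValueError("message length must be a multiple of k")
    spine = []
    state = s0
    for i in range(0, len(message_bits), k):
        segment = int(message_bits[i:i + k], 2)  # k-bit segment M_i (k <= 16 here)
        state = h(state, segment)                # s_i depends on s_{i-1} and M_i
        spine.append(state)
    return spine
```

Because each spine value depends on all earlier segments, two messages that share a prefix share the corresponding prefix of their spines and diverge from the first differing segment onward.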
Spinal Encoder • Encoder performs passes over the spine, each time generating new constellation points • These constellation points are sent across an AWGN channel
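One way to picture the passes and the channel is the sketch below. It assumes each pass derives fresh symbol bits from the spine value and the pass index, and maps them linearly onto a square I/Q grid. The hash (SHA-256), the grid mapping and the names `symbol_bits`, `map_to_iq`, `encode_pass` and `awgn` are illustrative assumptions, not taken from the slides.

```python
import hashlib
import random


def symbol_bits(spine_value: int, pass_index: int, nbits: int) -> int:
    """Derive nbits (<= 32) pseudo-random bits for one spine value and pass (stand-in hash)."""
    data = spine_value.to_bytes(8, "big") + pass_index.to_bytes(4, "big")
    word = int.from_bytes(hashlib.sha256(data).digest()[:4], "big")
    return word >> (32 - nbits)


def map_to_iq(bits: int, c: int) -> complex:
    """Map 2*c bits to a point on a 2^c by 2^c grid spanning [-1, 1] on each axis."""
    levels = (1 << c) - 1
    i_bits = bits >> c
    q_bits = bits & levels
    return complex(2 * i_bits / levels - 1, 2 * q_bits / levels - 1)


def encode_pass(spine, pass_index: int, c: int = 3):
    """One pass over the spine: one constellation point per spine value."""
    return [map_to_iq(symbol_bits(s, pass_index, 2 * c), c) for s in spine]


def awgn(symbols, sigma: float, rng: random.Random):
    """Add independent Gaussian noise to the I and Q components."""
    return [x + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for x in symbols]
```

A sender would keep calling `encode_pass` with increasing `pass_index` until the receiver decodes, which is what makes the code rateless.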
Spinal Decoder • Decoder knows s0 so it can generate the 2^k possible candidate symbols s1 using h • Each time the decoder receives a symbol y, it keeps the B best symbols from the 2^k candidates using ML • The transmitted message is estimated as the one with the lowest ML cost
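The pruned search described here can be sketched as a beam search. The sketch assumes squared Euclidean distance as the ML cost on an AWGN channel, a single pass, and that each of the B survivors is expanded into 2^k children (up to B·2^k candidates per step). The hash and symbol mapping are stand-ins, and all function names are hypothetical.

```python
import hashlib


def h(state: int, segment: int) -> int:
    """Stand-in hash (assumption): SHA-256 truncated to 64 bits."""
    data = state.to_bytes(8, "big") + segment.to_bytes(2, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def symbol(state: int, c: int) -> complex:
    """Stand-in mapping f: low 2*c bits of the spine value as an I/Q grid point."""
    levels = (1 << c) - 1
    i_bits = (state >> c) & levels
    q_bits = state & levels
    return complex(2 * i_bits / levels - 1, 2 * q_bits / levels - 1)


def encode(bits: str, k: int, c: int = 8, s0: int = 0):
    """Single-pass encoder: one symbol per k-bit segment."""
    state, out = s0, []
    for i in range(0, len(bits), k):
        state = h(state, int(bits[i:i + k], 2))
        out.append(symbol(state, c))
    return out


def decode(received, k: int, B: int, c: int = 8, s0: int = 0) -> str:
    """Keep the B lowest-cost paths after each received symbol."""
    beam = [(0.0, s0, "")]  # (accumulated cost, spine state, decoded bits)
    for y in received:
        candidates = []
        for cost, state, bits in beam:
            for seg in range(1 << k):  # enumerate all 2^k possible segments
                nxt = h(state, seg)
                dist = abs(y - symbol(nxt, c)) ** 2  # ML cost under Gaussian noise
                candidates.append((cost + dist, nxt, bits + format(seg, f"0{k}b")))
        candidates.sort(key=lambda t: t[0])  # compare costs, then select
        beam = candidates[:B]
    return beam[0][2]  # lowest-cost message estimate
```

With no noise, the true path always has cost zero, so it survives pruning unless another candidate happens to map to exactly the same constellation point. A denser grid (larger `c`) or additional passes makes such collisions less likely.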
Objectives • Implement the decoder on an FPGA • Evaluate the feasibility of Cortex in a real communications system • Identify the key performance bottlenecks and lay out a clear strategy for building a practical Cortex system
Micro-architecture • Interface • Takes stream of constellation symbols as input • Outputs a message (192-bit packet) • Decoding Stages • Code Enumeration • Add-Compare-Select • Suggestion Update • Spine Evaluator Update • Get output message
[Figure: block diagram of mkDecoder. Inputs: I and Q bit streams via rcv (put). Rules and queues: doEnumerate, doACS, suggupd, evalupd, getOutMsg; toACSQ, updateSymQ, outbitsQ. Sub-modules: Sorting module, Spine Evaluator (mkSalsa h(*), Symbol Mapper f(*), backtrackMem), Puncturing Scheduler. Interface types shown: EnumReq, Vect(B*2^k, EnumResp), Vect(B*2^k, MarkedCost), Vect(B, MarkedCost), Vect(B, Mark). Output: out_msg (get).]
Micro-architecture • Sub-modules • Puncturing Scheduler • Spine Evaluator • Sorter • Backtrack Memory
Practical Salsa Implementation • In practice we cannot have infinite precision floating point numbers • Salsa produces two outputs: a 64-bit spine and 512-bit arrays of symbol bits
Development and Testing • Three-step development and testing plan • Critical to our success with 3 people under time constraints Step 1: Develop Decoder backbone with dummy Sorter and Spine Evaluator. Develop Sorter and Spine Evaluator independently. - Sorter tested with MATLAB. - Spine Evaluator (and Salsa) tested with Python.
Development and Testing Step 2: Integrate Decoder with Sorter and Spine Evaluator. Ensure correctness at the architectural level: - Modules instantiate correctly - Rules fire as expected, no deadlocks etc. - Timing is correct - Bits flowing end-to-end
Development and Testing Step 3: Ensure correctness at the semantic level, i.e. “bit-by-bit debugging” - Encode string with Python encoder to produce symbols - Decode symbols and compare results [Figure: test setup with Python Encoder, AWGN Channel, Python Decoder and Bluespec Decoder]
Development and Testing • Finally, the algorithm was tested by adding noise to the transmitted symbols • Strictly not our concern, as long as our implementation agreed with the source code • Algorithm worked very well • Actually “outdid” the reference code at one point: the Python code crashed but our decoder correctly decoded the message!
Performance Analysis – FPGA frequency • The maximum frequency of the synthesized FPGA design is 98.035 MHz. • Different Salsa implementations give the same FPGA frequency.
Performance Analysis - Area • Sorter and SpineEvaluator take the most area
Performance Analysis - Area • Our implementation actually fits on the FPGA (taking roughly 30% of the total area). • Different Salsa implementations don’t vary much in device utilization.
Performance Analysis - Code • The source code totaled 3104 lines. Of these, 1135 lines (36.6%) were test code and 1969 lines (63.4%) were non-test code.
How much better can we do? • We used a naive O(n^2) algorithm for the sorter module. A better algorithm might reduce the number of cycles per step from 149 to 32 in the best case, which would give roughly 5 times better performance (149/32 ≈ 4.7) and improve the bit rate to 7.5 Mbit/s. • Given the current area requirement of Salsa, we can have B (B = 4) separate hashing modules running in parallel. This would give 4 times better performance and improve the bit rate to 7.5 × 4 = 30 Mbit/s. • If we had sufficient area on the FPGA, we could have B × 2^k = 32 hash modules running in parallel. This would give 32 times better performance and improve the bit rate to 7.5 × 32 = 240 Mbit/s.
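The projections on this slide are simple scaling arithmetic, reproduced below as a quick check. The sketch assumes throughput scales linearly with the number of parallel hash modules, as the slide does. The value k = 3 is inferred from B × 2^k = 32 with B = 4 rather than stated in the slides, and the function name is hypothetical.

```python
def projected_rates(base_mbps: float, B: int, k: int) -> dict:
    """Scale a base bit rate by the number of parallel hash modules (linear scaling assumed)."""
    return {
        "improved_sorter": base_mbps,                  # one hash module
        "B_hash_modules": base_mbps * B,               # B modules in parallel
        "B_times_2k_modules": base_mbps * B * 2 ** k,  # one module per candidate
    }


# Cycle counts quoted on the slide for the naive and best-case sorter
sorter_speedup = 149 / 32

# k = 3 is inferred from B * 2^k = 32 with B = 4 (assumption)
rates = projected_rates(7.5, B=4, k=3)
print(f"sorter speedup: {sorter_speedup:.2f}x")
for name, mbps in rates.items():
    print(f"{name}: {mbps} Mbit/s")
```

The speedup from the sorter alone comes out slightly under 5, so the 7.5 Mbit/s figure is best read as a rounded best-case estimate.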