90 likes | 329 Views
BIT MATRIX TRANSPOSE WITH TENSOR PRODUCT AND PERFECT SHUFFLING FOR SOFTWARE DEFINED RADIO. ECE 734 VLSI Array Structures for Digital Signal Processing Jui-Chieh (Jerry) Lin University of Wisconsin-Madison May. 8 th , 2013. Bit Matrix Transpose. What is “bit matrix transpose?”
E N D
BIT MATRIX TRANSPOSE WITH TENSOR PRODUCT AND PERFECT SHUFFLINGFOR SOFTWARE DEFINED RADIO ECE 734 VLSI Array Structures for Digital Signal Processing Jui-Chieh (Jerry) Lin University of Wisconsin-Madison May. 8th, 2013
Bit Matrix Transpose • What is “bit matrix transpose?” • Transpose the matrix in which each element is one bit • Application of “bit matrix transpose” • Interleaver in software defined radio • Helps combat burst error in forward error correction • Not so efficient on word-based processor • A “word” is a register performs parallel bit-operations together • Inherent mismatch on data format
Related Work: Bit Operations on Processor is NOT Easy • Bit tricks • Using bit level manipulation tricks • Limited applications and high engineering overhead • b = ((b * 0x0802 & 0x22110) | (b * 0x8020 & 0x88440)) * 0x10101 >> 16; // reverse a byte in an integer b • Look up table • Store bits positions, i.e. pattern, in memory and look up • Expensive memory storage (a few bytes per bit location) • Wifi 288 bits interleaver needs 288 of 9-bit-entries • LTE-Advanced 6144 bit interleaver needs 6144 13bit-entries
Perfect Shuffling in Modern Processors • Perfect shuffling operator Sm,n • Transpose m×n matrix to n×mmatrix • Perfect shuffling (S16,2) on Texas Instruments’ C64 S16,2 =
Tensor Product: Proposed Framework for Bit Matrix Transpose • Tensor product • An efficient method to verify the implementation • Example perfect shuffle operator with tensor product
Matrix Transpose Implementation • Perfect shuffling in tensor product • S32,32 = ((S32,2I16)•(I32S16,2))5 • Corresponding implementation in ANSI C: for i :=0 to i :=4 { for j :=0 to j :=3 { _DEAL( reg [ j ] ) for k:=0 to k:=16 { reg [ k ] = PACKH2( reg [2k] , reg [2k+1] ) reg [ k+16] = PACK2( reg [ 2k ] , reg [ 2k+1]) } } }
Preliminary Results (Square Bit Matrix Transpose) • Algorithm transformation of Bit Matrix Transpose • Bit parallel algorithms perform over 9X cycle reduction Around 10x
Summary • Bit matrix transpose • 9x speed up comparing to naïve method on Texas Instruments' C64x • Future work • Rectangular bit-matrix transpose