Advanced Bit Manipulation Instructions for Commodity Processors
Yedidya Hilewitz and Ruby B. Lee
Princeton Architecture Laboratory for Multimedia and Security, Department of Electrical Engineering, Princeton University
Background and Motivation
• Advanced bit manipulations are not well supported by commodity microprocessors
• These operations are instead performed using “programming tricks” (see Hacker’s Delight)
• Bit manipulations play a role in applications of increasing importance
• We propose adding direct support for a few key bit manipulation operations to accelerate these applications

New Instructions
• Permutation
  • Butterfly and Inverse Butterfly
• Bit Gather and Bit Scatter
  • Parallel Extract and Parallel Deposit
• Bit Matrix Multiply
• Other bit manipulation instructions (not covered here)
  • Bit matrix transpose
  • Population count

Applications (and Speedup)
• Cryptography
• Random number generation
  • Von Neumann Extractor
  • Toeplitz Matrix Multiply
• Steganography
• Cryptanalysis (Gaussian elimination)
• Other applications:
  • Binary compression
  • Binary image morphology
  • Bioinformatics
  • Communications coding
  • FFT
  • Finite field arithmetic
  • Integer compression
  • Pattern matching
  • Other applications suggested by you!
• Speedups: up to 2.24×, 9.9×, 14.9× and 2.92×

New Shifter Architecture
• A brand new shifter architecture replaces the existing shifter with a new unit that directly supports bit manipulation operations
• The new shifter performs:
  • basic shifter operations: shift, rotate, extract and deposit
  • multimedia shift-permute operations: mix
  • advanced bit manipulation operations: bfly, ibfly, pex, pdep
• Yedidya Hilewitz and Ruby B. Lee, “A New Basis for Shifters in General-Purpose Processors for Existing and Advanced Bit Manipulations,” to appear in IEEE Transactions on Computers.
• Yedidya Hilewitz and Ruby B. Lee, “Performing Advanced Bit Manipulations Efficiently in General-Purpose Processors,” Proceedings of the 18th IEEE Symposium on Computer Arithmetic (ARITH-18), June 2007.

Butterfly and Inverse Butterfly
• Butterfly: lg(n) stages, each of n 2:1 MUXes split into n/2 pairs that either pass through or swap their inputs
• Inverse Butterfly: the mirror image of the butterfly, with the same stages in reverse order
• bfly + ibfly = a general permutation network: any of the n! permutations of n bits can be done with one pass through each of the two instructions

Example Applications
• Cryptography: permutations in ciphers and hash functions, e.g., TDES
• Random Number Generators: extract bits from a source of entropy
  • Von Neumann Extractor (Intel RNG): given the bit-pair sequence {x₂ᵢ, x₂ᵢ₊₁} from the entropy pool, extract x₂ᵢ if the two bits differ
  • Toeplitz Matrix Multiply Extractor: multiply the bit string from the entropy pool by a binary Toeplitz matrix
• LSB Steganography: embed a secret message in the least significant bits of an image or audio file

Ongoing and Future Work
• Identify new applications where bit manipulation instructions are useful (e.g., LFSR and FCSR RNGs, software radio)
• Implementation
  • Refine the current circuit implementation
  • Integrate the new shifter into a scalable crypto co-processor (PAX)

Parallel Extract and Parallel Deposit
• Parallel Extract (bit gather): extracts the bits of r2 flagged by 1’s in r3, then compresses and right-justifies them in the result register

Bit Matrix Multiply
• bmm.n C = B, A
• A, B, C: n × n bit matrices, with C = A × B mod 2:
  for i from 1 to n
    for j from 1 to n
      $c_{i,j} = a_{i,1} b_{1,j} \oplus a_{i,2} b_{2,j} \oplus \cdots \oplus a_{i,n} b_{n,j}$
• The bmm.8 unit (pictured above) can be directly incorporated into the ALU (<¼ size)
• Yedidya Hilewitz and Ruby B. Lee, “Achieving Very Fast Bit Matrix Multiplication in Commodity Microprocessors,” Princeton University Department of Electrical Engineering Technical Report CE-L2007-006, August 2007.
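The butterfly and inverse butterfly stages can be modeled in software as a sequence of "delta swaps." The Python sketch below is illustrative only: it assumes 64-bit words, butterfly stage distances of 32, 16, 8, 4, 2, 1 (reversed for the inverse butterfly), and a hypothetical control format in which bit j of a stage's mask swaps the pair (j, j + d). The actual instructions' control-bit layout may differ.

```python
MASK64 = (1 << 64) - 1

# For each swap distance d: the bit positions j with (j & d) == 0,
# i.e. the lower element of every pair (j, j + d) in that stage.
LOW = {
    32: 0x00000000FFFFFFFF,
    16: 0x0000FFFF0000FFFF,
    8: 0x00FF00FF00FF00FF,
    4: 0x0F0F0F0F0F0F0F0F,
    2: 0x3333333333333333,
    1: 0x5555555555555555,
}

BFLY_DISTANCES = (32, 16, 8, 4, 2, 1)  # assumed butterfly stage order


def _stage(x, d, ctrl):
    """One network stage: swap bits j and j + d wherever ctrl has bit j set
    (hypothetical control encoding); every other pair passes through."""
    t = ((x >> d) ^ x) & ctrl & LOW[d]  # 1 where the pair differs and is swapped
    return (x ^ t ^ (t << d)) & MASK64


def bfly(x, ctrl):
    """Butterfly: lg(64) = 6 stages; ctrl[i] configures stage i."""
    for d, c in zip(BFLY_DISTANCES, ctrl):
        x = _stage(x, d, c)
    return x


def ibfly(x, ctrl):
    """Inverse butterfly: the same stages applied in reverse order."""
    for d, c in zip(reversed(BFLY_DISTANCES), reversed(ctrl)):
        x = _stage(x, d, c)
    return x
```

Because each stage only passes or swaps, it is its own inverse, so ibfly with the same control masks undoes bfly. Setting every control bit makes each stage swap all of its pairs, which maps bit j to bit 63 - j and therefore reverses the word.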
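A bit-serial reference model of Parallel Extract and its inverse, Parallel Deposit, makes the semantics concrete. This Python sketch assumes 64-bit registers with bit 0 as the least significant bit; the operand names r2 and r3 follow the poster's figure. The loops only define the results and say nothing about how the new shifter computes them.

```python
def pex(r2, r3, width=64):
    """Parallel extract (bit gather): gather the bits of r2 at positions where
    r3 holds a 1 and pack them, in order, into the low end of the result."""
    result, k = 0, 0
    for j in range(width):
        if (r3 >> j) & 1:
            result |= ((r2 >> j) & 1) << k
            k += 1
    return result


def pdep(r2, r3, width=64):
    """Parallel deposit (bit scatter): place the low-order bits of r2, in order,
    at the positions where r3 holds a 1; all other result bits are 0."""
    result, k = 0, 0
    for j in range(width):
        if (r3 >> j) & 1:
            result |= ((r2 >> k) & 1) << j
            k += 1
    return result


# The mask selects bits 4-7, so pex gathers the high nibble and pdep puts it back.
print(bin(pex(0b10110110, 0b11110000)))  # 0b1011
print(bin(pdep(0b1011, 0b11110000)))     # 0b10110000
```

For any word x and mask m, pdep(pex(x, m), m) equals x & m, which is the sense in which the two instructions undo each other.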
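The Bit Matrix Multiply loop can be checked against a small software model of bmm.8. This sketch assumes a hypothetical packing in which row i of an 8 × 8 bit matrix is byte i of a 64-bit integer and bit j of that byte is element (i, j), with indices starting at 0. Row i of C is the XOR of the rows of B selected by the 1 bits in row i of A, which is the same XOR-of-ANDs as the c_{i,j} formula.

```python
IDENTITY8 = 0x8040201008040201  # 8x8 identity matrix under the packing below


def bmm8(a, b):
    """Multiply two 8x8 bit matrices over GF(2): C = A x B mod 2.

    Assumed packing (hypothetical, for illustration): row i of a matrix is
    byte i of a 64-bit integer, and bit j of that byte is element (i, j).
    """
    c = 0
    for i in range(8):
        a_row = (a >> (8 * i)) & 0xFF
        c_row = 0
        for k in range(8):
            if (a_row >> k) & 1:                 # a[i][k] == 1
                c_row ^= (b >> (8 * k)) & 0xFF   # add row k of B, mod 2
        c |= c_row << (8 * i)
    return c
```

Multiplying by IDENTITY8 on either side returns the other operand unchanged, and addition mod 2 means that summing an even number of equal rows cancels to zero.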
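The Von Neumann extractor maps naturally onto Parallel Extract: build a mask that flags x₂ᵢ whenever its pair differs, then gather the flagged bits. A minimal sketch, assuming bit 0 is the least significant bit and x₂ᵢ sits at the even position of each pair; pex here is a software stand-in for the instruction, and von_neumann_64 is a hypothetical helper name.

```python
def pex(r2, r3, width=64):
    """Software stand-in for parallel extract (bit gather)."""
    result, k = 0, 0
    for j in range(width):
        if (r3 >> j) & 1:
            result |= ((r2 >> j) & 1) << k
            k += 1
    return result


EVEN_BITS = 0x5555555555555555  # positions 0, 2, 4, ..., 62


def von_neumann_64(x):
    """Von Neumann extractor over the 32 bit pairs (x[2i], x[2i+1]) of a 64-bit word.

    Returns (bits, count): the extracted x[2i] values packed at the low end,
    and how many pairs differed (i.e. how many output bits are valid).
    """
    differ = (x ^ (x >> 1)) & EVEN_BITS  # bit 2i is 1 iff x[2i] != x[2i+1]
    return pex(x, differ), bin(differ).count("1")
```

Pairs 00 and 11 contribute nothing; a caller would append the low `count` bits of the result to its output stream.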
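The Toeplitz Matrix Multiply Extractor is a matrix-vector product over GF(2). The sketch below is a plain bit-level reference under stated assumptions, not a bmm-based implementation: the matrix is given by a hypothetical diag list of m + n - 1 bits, with T[i][j] = diag[i - j + n - 1] so that every diagonal is constant.

```python
def toeplitz_extract(bits, diag, m):
    """Multiply an n-bit input vector by an m x n binary Toeplitz matrix, mod 2.

    bits: the n input bits from the entropy pool (list of 0/1 ints)
    diag: m + n - 1 bits defining the matrix; T[i][j] = diag[i - j + n - 1],
          so each diagonal of T is constant (the Toeplitz property)
    """
    n = len(bits)
    if len(diag) != m + n - 1:
        raise ValueError("diag must have m + n - 1 bits")
    out = []
    for i in range(m):
        acc = 0
        for j in range(n):
            acc ^= diag[i - j + n - 1] & bits[j]  # AND, then XOR-accumulate
        out.append(acc)
    return out
```

Only m + n - 1 bits are needed to describe the whole m × n matrix, and each output bit is the parity of the input bits selected by one row.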
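LSB steganography reduces to one Parallel Deposit to embed and one Parallel Extract to recover. For illustration this sketch assumes 8-bit samples packed eight to a 64-bit word and one message bit per sample; pex and pdep are software stand-ins, and embed_byte and extract_byte are hypothetical helper names.

```python
LSB_MASK = 0x0101010101010101  # least significant bit of each of eight bytes


def pex(r2, r3, width=64):
    """Software stand-in for parallel extract (bit gather)."""
    result, k = 0, 0
    for j in range(width):
        if (r3 >> j) & 1:
            result |= ((r2 >> j) & 1) << k
            k += 1
    return result


def pdep(r2, r3, width=64):
    """Software stand-in for parallel deposit (bit scatter)."""
    result, k = 0, 0
    for j in range(width):
        if (r3 >> j) & 1:
            result |= ((r2 >> k) & 1) << j
            k += 1
    return result


def embed_byte(samples, secret):
    """Hide one secret byte in the LSBs of eight 8-bit samples packed in a word."""
    cleared = samples & ~LSB_MASK            # keep the 7 high bits of every sample
    return cleared | pdep(secret, LSB_MASK)  # scatter the 8 secret bits into the LSBs


def extract_byte(stego):
    """Recover the hidden byte by gathering the LSB of each sample."""
    return pex(stego, LSB_MASK)
```

Only the low bit of each sample changes, so the cover data is perturbed by at most 1 per sample while the whole byte is written and read with a single deposit and a single extract.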
Parallel Deposit
• Parallel Deposit (bit scatter): deposits the right-justified bits from r2 into the result register, at the positions flagged by 1’s in r3
• Yedidya Hilewitz and Ruby B. Lee, “Fast Bit Gather, Bit Scatter and Bit Permutation Instructions for Commodity Microprocessors,” to appear in Journal of VLSI Signal Processing Systems.
• Yedidya Hilewitz and Ruby B. Lee, “Fast Bit Compression and Expansion with Parallel Extract and Parallel Deposit Instructions,” Proceedings of the IEEE 17th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), pp. 65-72, September 11-13, 2006 (Best Paper Award).

Summary and Conclusions
• Advanced bit manipulations play an important role in many applications
• We have introduced a few select bit manipulation instructions that speed up these applications
• We have evolved the shifter into a new design that uses butterfly and inverse butterfly datapaths to support both basic and advanced bit manipulation instructions
• Advanced bit manipulations are no longer esoteric “programming tricks”; they are supported directly by microprocessors at only a marginal cost