Performing Advanced Bit Manipulations Efficiently in General-Purpose Processors
Yedidya Hilewitz and Ruby B. Lee
Princeton Architecture Lab for Multimedia and Security
Department of Electrical Engineering, Princeton University
18th IEEE Symposium on Computer Arithmetic (ARITH-18), Montpellier, France, June 25-27, 2007
Background and Motivation
• Advanced bit manipulations are not well supported by commodity microprocessors
• These operations are performed using “programming tricks” (cf. Hacker’s Delight)
• Bit manipulations play a role in applications of increasing importance
• We propose a new shifter architecture that replaces the conventional shifter with a unit that directly supports bit manipulation operations
Outline
• Background and motivation
• Advanced bit manipulation operations
• Delineation and example usage
• New shift-permute functional unit
• Summary and conclusions
Advanced Bit Manipulation Instructions
• Bit Permutation
  • Butterfly (bfly) and Inverse Butterfly (ibfly)
• Bit Gather and Bit Scatter
  • Parallel Extract and Parallel Deposit
• Any of the n! permutations of n bits can be performed with one pass through the bfly and ibfly instructions
• bfly + ibfly = general permutation circuit
• Each circuit has lg(n) stages of n 2:1 MUXes, split into n/2 pairs that either pass through or swap their inputs
[Figures: 8-bit Butterfly; 8-bit Inverse Butterfly]
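
The stage structure described above can be sketched in software. This is a minimal model under stated assumptions, not the circuit from the slides: a word is a Python list with index 0 as the least significant bit, each stage takes one control bit per pair (1 = swap, 0 = pass through) ordered by the lower index of the pair, and the names `apply_stage`, `ibfly` and `bfly` are hypothetical helpers.

```python
def apply_stage(bits, distance, controls):
    """One stage: pair bit i with bit i + distance; swap the pair when its control bit is 1."""
    out = list(bits)
    pair = 0
    for i in range(len(bits)):
        if (i & distance) == 0:  # i is the lower element of a pair
            if controls[pair]:
                out[i], out[i + distance] = bits[i + distance], bits[i]
            pair += 1
    return out


def ibfly(bits, stage_controls):
    """Inverse butterfly (assumed ordering): pair distances 1, 2, 4, ..., n/2."""
    distance = 1
    for controls in stage_controls:
        bits = apply_stage(bits, distance, controls)
        distance *= 2
    return bits


def bfly(bits, stage_controls):
    """Butterfly (assumed ordering): pair distances n/2, ..., 4, 2, 1."""
    distance = len(bits) // 2
    for controls in stage_controls:
        bits = apply_stage(bits, distance, controls)
        distance //= 2
    return bits


# Swapping every pair in every stage reverses the 8-element word.
all_swap = [[1] * 4] * 3
print("".join(ibfly(list("abcdefgh"), all_swap)))  # → hgfedcba
```

With n = 8 there are lg(8) = 3 stages of 4 pairs each, so a full configuration is 12 control bits per circuit.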
Bit Gather (Parallel Extract) and Bit Scatter (Parallel Deposit)
• Parallel Extract
  • pex r1 = r2, r3
  • Extracts the bits of r2 flagged by 1’s in r3, then compresses and right-justifies them in the result register
  • Parallel extract maps to the ibfly datapath
• Parallel Deposit
  • pdep r1 = r2, r3
  • Deposits the right-justified bits of r2 into the result register at the positions flagged by 1’s in r3
  • Parallel deposit maps to the bfly datapath
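
The semantics of the two instructions can be written as a bit-serial reference model. This is only a sketch of what pex and pdep compute, assuming unsigned Python integers stand in for registers; it says nothing about how the ibfly and bfly datapaths implement them, and the function names simply mirror the mnemonics.

```python
def pex(src, mask):
    """Gather the bits of src flagged by 1s in mask, right-justified in the result."""
    result, out_pos = 0, 0
    while mask:
        if mask & 1:
            result |= (src & 1) << out_pos
            out_pos += 1
        src >>= 1
        mask >>= 1
    return result


def pdep(src, mask):
    """Scatter the low-order bits of src to the positions flagged by 1s in mask."""
    result, pos = 0, 0
    while mask:
        if mask & 1:
            result |= (src & 1) << pos
            src >>= 1  # consume one source bit per flagged position
        mask >>= 1
        pos += 1
    return result


print(bin(pex(0b10110100, 0b11001100)))   # → 0b1001
print(bin(pdep(0b1001, 0b11001100)))      # → 0b10000100
```

The two operations are inverses on the flagged positions: `pdep(pex(x, m), m) == x & m` for any `x` and `m`.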
Example Usage: Bioinformatics - DNA Sequence Reversal
• DNA bases A, C, G and T are represented by two-bit codes
• Reversing a DNA sequence is equivalent to reversing the order of the bit pairs
  • This is a bfly or ibfly permutation
• 1 ibfly instruction is equivalent to 11-23 ALU and shifter instructions
  • 2×(and, and, shift, shift, or) + byte reverse instruction, at minimum
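
The 11-instruction software baseline can be illustrated as follows. This is a sketch under assumptions not stated on the slide: a 32-bit word holding sixteen bases, the code assignment A=00, C=01, G=10, T=11, and the first base stored in the most significant bit pair. `encode` and `reverse_bases_32` are hypothetical helper names.

```python
CODES = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}  # assumed encoding


def encode(seq):
    """Pack a string of bases into an integer, first base in the highest bit pair."""
    word = 0
    for base in seq:
        word = (word << 2) | CODES[base]
    return word


def reverse_bases_32(word):
    """Reverse the order of the sixteen 2-bit codes in a 32-bit word."""
    # Step 1: swap adjacent bit pairs inside every nibble (and, and, shift, shift, or).
    word = ((word & 0x33333333) << 2) | ((word & 0xCCCCCCCC) >> 2)
    # Step 2: swap adjacent nibbles inside every byte (and, and, shift, shift, or).
    word = ((word & 0x0F0F0F0F) << 4) | ((word & 0xF0F0F0F0) >> 4)
    # Step 3: reverse the byte order (a single instruction where a byte-reverse op exists).
    return int.from_bytes(word.to_bytes(4, "big"), "little")


seq = "ACGTACGTAACCGGTT"
assert reverse_bases_32(encode(seq)) == encode(seq[::-1])
```

Steps 1 and 2 reverse the four bit pairs inside each byte, and step 3 reverses the bytes, which together reverse all sixteen pairs.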
Advanced Bit Manipulation Functional Unit
• We propose adding a new functional unit to directly perform advanced bit manipulations
• To minimize cost, this new functional unit is intended to replace the shifter unit
• The shifter currently performs the basic bit manipulation operations
• Our new functional unit represents an evolution of shifter designs
Basic Bit Manipulation Operations
• shift r1 = r2, s
• extract r1 = r2, pos, len
• mix r1 = r2, r3
• rotate r1 = r2, s
• deposit r1 = r2, pos, len
Parallel Extract and Parallel Deposit
[Figures: Parallel Extract; Parallel Deposit]
Evolution of Shifter Designs
• Barrel Shifter
• Log Shifter
• Our proposed design
New Shifter Design
• An inverse butterfly (or butterfly) circuit enhanced with an extra multiplexer stage is the basis of the new shifter design
• We will show that either the butterfly or the inverse butterfly circuit can perform rotations on its own
• Rotation is the basic operation underlying shift, extract, deposit and mix
• The other basic bit manipulation operations are modeled as rotate plus one of:
  • zeroing
  • sign bit propagation
  • merging
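
The "rotate plus post-processing" model can be sketched in a few lines. This is an illustration under assumptions: a 32-bit word, shift amounts in the range 0 to 31, and a `deposit` that merges into an explicit destination value `dst`, which the slide's three-operand form does not show. The function names are hypothetical, and mix is omitted because the slides do not define its subword semantics.

```python
N = 32
MASK = (1 << N) - 1


def rotr(x, s):
    """Right rotation of an N-bit word, the one primitive everything else builds on."""
    s %= N
    return ((x >> s) | (x << (N - s))) & MASK


def shr_logical(x, s):
    """Rotate + zeroing of the s high bits that wrapped around."""
    return rotr(x, s) & (MASK >> s)


def shr_arith(x, s):
    """Rotate + sign bit propagation into the s high bits."""
    high = MASK & ~(MASK >> s)
    rotated = rotr(x, s)
    return (rotated | high) if (x >> (N - 1)) else (rotated & ~high)


def extract(x, pos, length):
    """Rotate the field down to bit 0 + zeroing of everything above it."""
    return rotr(x, pos) & ((1 << length) - 1)


def deposit(dst, src, pos, length):
    """Rotate the right-justified field up to pos + merging with dst."""
    field = ((1 << length) - 1) << pos
    return (rotr(src, N - pos) & field) | (dst & ~field & MASK)


print(hex(extract(0x12345678, 8, 8)))          # → 0x56
print(hex(deposit(0xFFFFFFFF, 0xAB, 8, 8)))    # → 0xffffabff
```

In each helper the rotation does the data movement and a mask decides which positions are zeroed, sign-filled or merged, which is the role the slide assigns to the extra multiplexer stage.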
New Shift-Permute Functional Unit Implementation
Configuring Inverse Butterfly for Rotations
• Hard problem: generating the control bits for rotations on the inverse butterfly circuit
• We derive an expression for the control bits as a recursive function of the shift amount, s, and the stage number, j
Example: Right Rotation by 5 on 8-bit Inverse Butterfly Circuit
• After each stage, the input is right rotated by 5 (mod subcircuit width) within each subcircuit
Example: Right Rotation by 5 on 8-bit Inverse Butterfly Circuit
• After stage 1, the input is right rotated by 5 (mod 2) = 1 within each 2-bit subcircuit
Example: Right Rotation by 5 on 8-bit Inverse Butterfly Circuit
• After stage 2, the input is right rotated by 5 (mod 4) = 1 within each 4-bit subcircuit
• Bits that wrapped at the output of the previous stage are swapped
Example: Right Rotation by 5 on 8-bit Inverse Butterfly Circuit
• After stage 3, the input is right rotated by 5
• Bits that wrapped at the output of the previous stage are passed through
Rotations in General on an n-bit Inverse Butterfly Circuit
• Shift amount s < n/2 → swap the bits that wrapped, pass through the rest
• Shift amount s ≥ n/2 → pass through the bits that wrapped, swap the rest
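
The swap/pass-through rule above can be checked with a small simulation. This is a sketch, not the authors' control bit expression or circuit: it applies the rule at every stage j using the shift amount modulo the subcircuit width 2^j, models a word as a list with index 0 as the least significant bit, and uses the hypothetical names `ibfly` and `rotation_controls`.

```python
def ibfly(bits, stage_controls):
    """Inverse butterfly on a list of bits; pair distances 1, 2, 4, ...; control 1 = swap."""
    distance = 1
    for controls in stage_controls:
        out, pair = list(bits), 0
        for i in range(len(bits)):
            if (i & distance) == 0:  # i is the lower element of a pair
                if controls[pair]:
                    out[i], out[i + distance] = bits[i + distance], bits[i]
                pair += 1
        bits, distance = out, distance * 2
    return bits


def rotation_controls(n, s):
    """Control bits that right-rotate an n-bit word by s on the inverse butterfly."""
    stages = []
    h = 1  # half-width of the subcircuits formed at the current stage
    while h < n:
        r = s % h        # rotation already present inside each half
        t = s % (2 * h)  # rotation required inside each 2h-bit subcircuit
        # Position i of a half holds a wrapped bit when i + r >= h.
        # t < h: swap the wrapped bits; t >= h: swap the bits that did not wrap.
        block = [int((i + r >= h) != (t >= h)) for i in range(h)]
        stages.append(block * (n // (2 * h)))  # same pattern in every subcircuit
        h *= 2
    return stages


# Right rotation by 5 on 8 bits; each element is labeled with its original position.
print(rotation_controls(8, 5))                            # → [[1, 1, 1, 1], [0, 1, 0, 1], [1, 1, 1, 0]]
print(ibfly(list(range(8)), rotation_controls(8, 5)))     # → [5, 6, 7, 0, 1, 2, 3, 4]
```

The three control vectors reproduce the worked example: stage 1 swaps every pair (rotation by 1 in each 2-bit subcircuit), stage 2 swaps only the wrapped bit of each half, and stage 3 passes the wrapped bit through while swapping the other three pairs.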
Circuit Implementation of Rotation Control Bit Generator
Comparison to Barrel and Log Shifters
Summary and Conclusions
• We proposed evolving the shifter to a new design using butterfly and inverse butterfly datapaths
• The new shifter subsumes the basic shifter, the multimedia shift-permute unit and the advanced bit manipulation unit
• We have shown how to perform basic shifter operations on these datapaths
  • Rotation control bit generator
  • Extra multiplexer stage for masking and merging
• Use of the new shifter design in future microprocessor implementations allows for increased capabilities at only marginal cost