This presentation covers the optimization methods for Verilog modules used in Palindrome Checkers. It explores loop unrolling techniques and strategies for improved efficiency. Various strategies, including shift registers and non-blocking instructions, are discussed for enhancing runtime performance.
Reconfigurable Computing (EN2911X, Fall07) Lab 2 presentations Prof. Sherief Reda Division of Engineering, Brown University http://ic.engin.brown.edu
Runtimes by different teams: 4.2 seconds, 14 seconds, 33 seconds, 300 seconds, 305 seconds, 320 seconds
Cesare Ferri Rotor Le Palindrome Checker
Part I : Verilog Module
DECOMPOSE THE NUMBER INTO DIGITS
Room for loop unrolling here.

always @(posedge CLOCK_50)
begin
  not_palindrome = 1'd0; len = 0; tmp = number; // reset
  // Decompose the number into decimal digits, least significant first
  for (i = 0; i < 9; i = i + 4'd1)
  begin
    if (tmp > 0)
    begin
      modulo = tmp % 4'd10;
      tmp = tmp / 10;
      vector[len % 9] = modulo;
      len = len + 1;
    end
  end
  th = (len >> 1);
  // Compare digit j with its mirror digit len-1-j
  for (j = 0; j < th; j = j + 4'd1)
  begin
    tmp2 = (len-1) - j;
    tmp3 = vector[j]; tmp4 = vector[tmp2];
    if (tmp3 != tmp4) not_palindrome = 1'b1;
  end
  result = ~(not_palindrome);
end
Part I : Verilog Module
COMPARE THE DIGITS STORED IN THE VECTOR
Loop unrolling, again. The always block is the same as on the previous slide; its comparison stage is:

  th = (len >> 1);
  // Compare digit j with its mirror digit len-1-j
  for (j = 0; j < th; j = j + 4'd1)
  begin
    tmp2 = (len-1) - j;
    tmp3 = vector[j]; tmp4 = vector[tmp2];
    if (tmp3 != tmp4) not_palindrome = 1'b1;
  end
  result = ~(not_palindrome);
Optimized Verilog Code
Do loop unrolling to compare digits (4-digit case):

if (digits[0] != digits[3] || digits[1] != digits[2]) not_palindrome = 1'b1;
Unsolved things
• Our running time now depends on the way that we extract digits from the number
• Some ideas to improve?
  • Using a shift register
  • Using non-blocking assignments
Palindrome Homework Summary ENGN2911X Aaron Mandle Bryant Mairs
Setup
• Two-cycle fixed-length custom instruction
• Operates on 20 numbers at a time
• Returns total palindromes in that 20-number block
Process
• Combinatorial conversion from binary to BCD
• Check number of digits
• Compare digits based on length
• Total up number of valid palindromes
Binary to BCD Conversion
• Built using blocks of conditional add-3 modules and shifts
• Add-3 modules:
  • 4-bit input
  • Adds 3 if input was 5 or greater
  • Based on adding 6 to numbers > 9 (adding 3 before the shift doubles to 6 after it)
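The add-3 rule can be modelled in software as a shift-and-add-3 loop (often called "double dabble"). The sketch below is a sequential C model for illustration only, not the combinational Verilog used in the lab; the name `bin_to_bcd` and the packed 10-digit result (one digit per 4 bits) are assumptions.

```c
#include <stdint.h>

/* Software model of shift-and-add-3 binary-to-BCD conversion.
 * Hypothetical helper: returns up to 10 BCD digits packed 4 bits each,
 * least significant digit in bits 3..0. */
uint64_t bin_to_bcd(uint32_t bin)
{
    uint64_t bcd = 0;
    for (int i = 31; i >= 0; i--) {
        /* Conditional add-3 stage: any digit >= 5 would exceed 9 once
         * doubled, so add 3 now (which becomes +6 after the shift). */
        for (int d = 0; d < 10; d++) {
            uint64_t digit = (bcd >> (4 * d)) & 0xF;
            if (digit >= 5)
                bcd += (uint64_t)3 << (4 * d);
        }
        /* Shift stage: double the BCD value and bring in the next bit. */
        bcd = (bcd << 1) | ((bin >> i) & 1u);
    }
    return bcd;
}
```

For example, `bin_to_bcd(255)` yields the packed value `0x255`. In hardware the two nested loops unroll into a fixed network of add-3 blocks and wiring, which is why the conversion can be purely combinational.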
module checkPalindrome(data, result);
  input [31:0] data;
  output [31:0] result;

  wire [3:0] digits [10:0];
  wire [3:0] digCount;

  // Binary-to-BCD conversion; digits[0] is the least significant digit.
  // A module instantiation needs an instance name (bcd_conv here).
  bin2bcd bcd_conv({digits[9], digits[8], digits[7], digits[6], digits[5],
                    digits[4], digits[3], digits[2], digits[1], digits[0]}, data);

  // Number of significant decimal digits
  assign digCount = digits[9] != 0 ? 10 :
                    digits[8] != 0 ? 9 :
                    digits[7] != 0 ? 8 :
                    digits[6] != 0 ? 7 :
                    digits[5] != 0 ? 6 :
                    digits[4] != 0 ? 5 :
                    digits[3] != 0 ? 4 :
                    digits[2] != 0 ? 3 :
                    digits[1] != 0 ? 2 : 1;

  // Compare mirrored digit pairs for the detected length (lengths 1 to 9 only)
  assign result =
    digCount == 1 ||
    digCount == 2 && (digits[0] == digits[1]) ||
    digCount == 3 && (digits[0] == digits[2]) ||
    digCount == 4 && (digits[0] == digits[3] && digits[1] == digits[2]) ||
    digCount == 5 && (digits[0] == digits[4] && digits[1] == digits[3]) ||
    digCount == 6 && (digits[0] == digits[5] && digits[1] == digits[4] && digits[2] == digits[3]) ||
    digCount == 7 && (digits[0] == digits[6] && digits[1] == digits[5] && digits[2] == digits[4]) ||
    digCount == 8 && (digits[0] == digits[7] && digits[1] == digits[6] && digits[2] == digits[5] && digits[3] == digits[4]) ||
    digCount == 9 && (digits[0] == digits[8] && digits[1] == digits[7] && digits[2] == digits[6] && digits[3] == digits[5]);
endmodule
For all solutions
Finding the length of the decimal representation (# digits) by:

typedef unsigned long UINT;

/* Index of the most significant decimal digit (number of digits minus 1) */
inline UINT GetMSDFIndx(UINT n)
{
    return (n >= 100000000 ? 8 :
           (n >= 10000000  ? 7 :
           (n >= 1000000   ? 6 :
           (n >= 100000    ? 5 :
           (n >= 10000     ? 4 :
           (n >= 1000      ? 3 :
           (n >= 100       ? 2 :
           (n >= 10        ? 1 : 0))))))));
}
Software Only Solutions
Times:
• On laptop (Intel 2333 MHz): 8 secs.
• On NIOS (100 MHz): 3500 secs.
Inherently sequential.
Early false detection: quit the computation as soon as we find two digits that do not match. Brings down the expected # of divide operations to less than 2.2.
Software Only Solutions
Observations:
1. Detect whether the MSD is a given number without division.
   MSD test: d is the MSD of a number n of length L if and only if $d \cdot 10^{L-1} \le n < (d+1) \cdot 10^{L-1}$.
   E.g. $4 \cdot 10^{3} \le 4765 < 5 \cdot 10^{3}$.
2. "Cut out" the MSD: $4765 - 4 \cdot 10^{3} = 765$, and continue.
Algorithm: find one LSD after another, compare with MSDs, quit early if not a palindrome.
Runs in 8 seconds on laptop.
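The MSD test and the early exit can be sketched in C as follows. This is a minimal reconstruction from the two observations, not the team's actual source; the names `POW10`, `num_digits` and `is_palindrome_msd_test` are assumptions, and 64-bit intermediates are used only to keep the products from overflowing.

```c
#include <stdint.h>

/* Powers of ten, indexed by exponent (illustrative lookup table). */
static const uint64_t POW10[10] = {
    1ULL, 10ULL, 100ULL, 1000ULL, 10000ULL, 100000ULL,
    1000000ULL, 10000000ULL, 100000000ULL, 1000000000ULL
};

/* Number of decimal digits of n, found by comparisons only. */
static int num_digits(uint32_t n)
{
    int len = 1;
    while (len < 10 && n >= POW10[len])
        len++;
    return len;
}

/* Returns 1 if n is a decimal palindrome, 0 otherwise. */
int is_palindrome_msd_test(uint32_t n)
{
    int len = num_digits(n);
    uint64_t r = n;                 /* remaining middle part, len digits wide */
    while (len > 1) {
        uint64_t p = POW10[len - 1];
        uint64_t q = r / 10;        /* the one division per digit pair */
        uint64_t lsd = r - q * 10;  /* least significant digit */
        /* MSD test: lsd is the MSD iff lsd*p <= r < (lsd+1)*p. */
        if (r < lsd * p || r >= (lsd + 1) * p)
            return 0;               /* early false detection */
        /* Cut out the MSD and drop the LSD: (r - lsd*p) / 10. */
        r = q - lsd * POW10[len - 2];
        len -= 2;
    }
    return 1;
}
```

Tracking `len` explicitly, rather than recomputing it from `r`, keeps inner zeros in place: for 1001 the middle part after the first pair is "00", which still has width 2.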
Software Only Solutions
On NIOS, division is really expensive.
Division-free algorithm: don't test the MSD, find it with binary search.
Software Only Solutions
On NIOS, division is really expensive.
Algorithm:
• Start from the left
• Find half of the digits
• Compute the palindrome whose left half matches these digits
• Compare to the tested number
Lose the early false detection, but still better than division.
Runs in 3500 secs on NIOS 100 MHz.
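The division-free variant might look like the C sketch below: each leading digit is found by binary search over 0..9, the mirrored palindrome is built with multiplies and adds, and the result is compared with the input. It is an illustration under assumptions, not the team's code; `POW10`, `num_digits`, `find_msd` and `is_palindrome_div_free` are hypothetical names, and 64-bit intermediates are used for simplicity.

```c
#include <stdint.h>

/* Powers of ten, indexed by exponent (illustrative lookup table). */
static const uint64_t POW10[10] = {
    1ULL, 10ULL, 100ULL, 1000ULL, 10000ULL, 100000ULL,
    1000000ULL, 10000000ULL, 100000000ULL, 1000000000ULL
};

/* Number of decimal digits of n, found by comparisons only. */
static int num_digits(uint32_t n)
{
    int len = 1;
    while (len < 10 && n >= POW10[len])
        len++;
    return len;
}

/* Binary search for the largest d in [0, 9] with d * p <= r.
 * Requires r < 10 * p. Uses only compares, multiplies and shifts. */
static uint64_t find_msd(uint64_t r, uint64_t p)
{
    uint64_t lo = 0, hi = 9;
    while (lo < hi) {
        uint64_t mid = (lo + hi + 1) >> 1;
        if (r >= mid * p)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/* Returns 1 if n is a decimal palindrome, 0 otherwise. */
int is_palindrome_div_free(uint32_t n)
{
    int len = num_digits(n);
    int half = (len + 1) >> 1;      /* digits to extract from the left */
    uint64_t r = n;                 /* what is left after cutting out MSDs */
    uint64_t pal = 0;               /* palindrome built from the left half */
    for (int i = 0; i < half; i++) {
        int hi_pos = len - 1 - i;   /* position of the current MSD */
        uint64_t d = find_msd(r, POW10[hi_pos]);
        r -= d * POW10[hi_pos];
        pal += d * POW10[hi_pos];
        if (hi_pos != i)            /* mirror it, except the middle digit */
            pal += d * POW10[i];
    }
    return pal == n;
}
```

For 12321 the left half gives the digits 1, 2, 3, the constructed palindrome is 12321, and the comparison succeeds; for 12345 the same digits are found, so the constructed value 12321 differs from the input.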
Using the Hardware
A general trick to divide by a constant without using division. Based on a trick I read in "Hacker's Delight" for dividing by 3.
Demonstrated on divide by 10:
• Given: number $n < 2^{30}$
• Needed: $\lfloor n/10 \rfloor$
• Algorithm: multiply n by $(2^{31}+2)/10$ = 0xCCCCCCD, and then shift right 31 positions.
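Division Free divide by 10
Algorithm: multiply $n < 2^{30}$ by $(2^{31}+2)/10$ = 0xCCCCCCD, and then shift right 31 positions.
Proof: The above algorithm outputs
$$\left\lfloor n \cdot \frac{2^{31}+2}{10} \cdot \frac{1}{2^{31}} \right\rfloor = \left\lfloor \frac{n}{10} + \frac{2n}{10 \cdot 2^{31}} \right\rfloor = \left\lfloor \frac{n}{10} \right\rfloor$$
The last equality holds because:
• $n < 2^{30}$ implies $2n < 2^{31}$, so $\frac{2n}{10 \cdot 2^{31}} < \frac{1}{10}$
• $\lfloor n/10 \rfloor \le n/10 \le \lfloor n/10 \rfloor + 9/10$
• Therefore $\lfloor n/10 \rfloor \le \frac{n}{10} + \frac{2n}{10 \cdot 2^{31}} < \lfloor n/10 \rfloor + 1$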
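As a quick software check of the claim, the multiply-and-shift can be written in C with a 64-bit product. The function name `div10` is an assumption for illustration; the constant and shift are the ones stated on the slide, and the result is only guaranteed for n below 2^30.

```c
#include <stdint.h>

/* floor(n / 10) for n < 2^30 without a divide instruction:
 * multiply by (2^31 + 2) / 10 = 0xCCCCCCD, then shift right by 31.
 * The product can need up to 58 bits, so it is formed in 64 bits. */
uint32_t div10(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCDu) >> 31);
}
```

For instance, `div10(4765)` gives 476, the same as `4765 / 10`. In hardware the shift is free (it is just a choice of wires), so the cost is a single multiplier.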
Divide by Constant
Similarly, to divide n by a constant C, we need to find P and R such that:
• $2^{P} + R \equiv 0 \pmod{C}$
• $R \cdot n < 2^{P}$
And then multiply n by $(2^{P} + R)/C$, and shift right P positions.
Found the constants for all powers of 10 needed.
Algorithm worst-case register-to-register delay: 25 ns.
Run time: 33 secs.
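One way to find such constants offline is a small search over P, sketched below in C. This is an assumed helper, not the tool the team used: `DivConst`, `find_div_const` and `div_by_const` are hypothetical names, `n_bound` is an exclusive upper bound on n, and the search itself uses ordinary division because it runs once at design time. The sketch does not check that n times the multiplier fits in 64 bits, which holds for the powers of 10 and the 30-bit range discussed here.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    unsigned p;   /* shift amount P */
    uint64_t r;   /* correction R, with 2^P + R divisible by C */
    uint64_t m;   /* multiplier (2^P + R) / C */
} DivConst;

/* Search for the smallest P (up to 62) and matching R such that
 * 2^P + R = 0 mod c and R * n < 2^P for every n < n_bound.
 * Returns 1 and fills *out on success, 0 otherwise. */
int find_div_const(uint32_t c, uint64_t n_bound, DivConst *out)
{
    assert(c != 0 && n_bound >= 1 && n_bound <= (1ULL << 32));
    for (unsigned p = 0; p <= 62; p++) {
        uint64_t pow2 = 1ULL << p;
        uint64_t rem = pow2 % c;
        uint64_t r = rem ? c - rem : 0;     /* smallest R making 2^P + R a multiple of c */
        if (r * (n_bound - 1) < pow2) {     /* worst case is the largest n */
            out->p = p;
            out->r = r;
            out->m = (pow2 + r) / c;
            return 1;
        }
    }
    return 0;
}

/* Apply the constants: floor(n / C) as a multiply and a shift. */
uint64_t div_by_const(uint32_t n, const DivConst *k)
{
    return ((uint64_t)n * k->m) >> k->p;
}
```

Called with `c = 10` and `n_bound = 1ULL << 30`, the search returns P = 31, R = 2 and the multiplier 0xCCCCCCD, matching the divide-by-10 case.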
EN2911X Lab 2: Palindromes Brian Reggiannini and Chris Erway
Checking a palindrome
All combinational logic!
• Step 1: Convert 30-bit integer to 37-bit binary-coded decimal (BCD) format
• Step 2: Detect the length of decimal number
• Step 3: Compare pairs of digits with XOR
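The three steps can be modelled in C as shown below. This is a software sketch of the idea only, not the team's hardware: the names `to_bcd`, `bcd_length` and `bcd_is_palindrome` are assumptions, and `to_bcd` is a division-based stand-in for the combinational converter.

```c
#include <stdint.h>

/* Step 1 (stand-in): pack the decimal digits of n, 4 bits per digit,
 * least significant digit in bits 3..0. The hardware does this
 * combinationally; division is used here only for brevity. */
uint64_t to_bcd(uint32_t n)
{
    uint64_t bcd = 0;
    int shift = 0;
    do {
        bcd |= (uint64_t)(n % 10) << shift;
        n /= 10;
        shift += 4;
    } while (n != 0);
    return bcd;
}

/* Step 2: length = position of the highest nonzero digit, plus one. */
int bcd_length(uint64_t bcd)
{
    for (int d = 9; d >= 1; d--) {
        if ((bcd >> (4 * d)) & 0xF)
            return d + 1;
    }
    return 1;
}

/* Step 3: XOR each mirrored pair of digits and OR the differences;
 * the number is a palindrome when no pair differs. */
int bcd_is_palindrome(uint64_t bcd)
{
    int len = bcd_length(bcd);
    unsigned diff = 0;
    for (int i = 0; i < len / 2; i++) {
        unsigned lo = (unsigned)((bcd >> (4 * i)) & 0xF);
        unsigned hi = (unsigned)((bcd >> (4 * (len - 1 - i))) & 0xF);
        diff |= lo ^ hi;
    }
    return diff == 0;
}
```

For example, `bcd_is_palindrome(to_bcd(12321))` returns 1 and `bcd_is_palindrome(to_bcd(12345))` returns 0. In logic, the loop becomes a fixed set of 4-bit XOR comparators selected by the detected length.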
Integration with Nios II
• Worst-case propagation delay: 43 ns, 5 cycles
• Don't want to wait!
• Use 32-bit PIO interface
• Array of 25 palindrome-checking units
• Write out 32-bit start value…
• Read back # of total palindromes found (from next 25)
• While Nios is waiting: increment loop counter
Results
• Original C program: 49.59 s/billion
• Unoptimized Nios C program: 7842 s/100 million
• Final result: 4.2 s/billion (420000036 cycles @ 100 MHz)
• Total logic elements: 23,039 / 33,216 (69%)