Function Evaluation Using Tables and Small Multipliers CS252A, Spring 2005 Jason Fong
Overview • Want to obtain values of elementary functions • sin(x), cos(x), e^x • Full lookup table would be too large • Bipartite and multipartite tables • Split into multiple smaller tables and add values to obtain an approximation
Table Method With Small Multipliers • Similar to multipartite method • Approximate using 5th order Taylor expansion • Use a set of smaller tables and some small multipliers • Better precision for same amount of hardware when compared to bipartite and multipartite methods
Taylor Series • Approximates the value of f(x) near x = a • More terms give a better approximation • But not directly applicable for table values
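The "more terms give a better approximation" point can be checked numerically. This is a minimal Python sketch, assuming f(x) = e^x (so every derivative at a equals e^a); the helper name `taylor_exp` and the sample values of x and a are illustrative, not from the slides.

```python
import math

def taylor_exp(x, a, terms):
    """Sum the first `terms` Taylor terms of e^x expanded about x = a."""
    # For e^x every derivative at a equals e^a, so term i is e^a * (x - a)^i / i!.
    return sum(math.exp(a) * (x - a) ** i / math.factorial(i) for i in range(terms))

x, a = 0.53, 0.5   # x is close to the expansion point a
errors = [abs(taylor_exp(x, a, t) - math.exp(x)) for t in range(1, 6)]
for t, err in enumerate(errors, start=1):
    print(f"{t} term(s): error = {err:.3e}")
```

Each added term shrinks the error by roughly a factor of (x - a), which is why a small distance from a matters as much as the number of terms.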
Making a Taylor Series Useful • Split n-bit input x into x0, x1, x2, x3, x4 • x0, x1, x2, x3 are each k bits wide • x4 is p bits wide • 4k+p = n • p < k • Use first 5 terms, and set a = x0 • Rearrange terms into groups that each depend on only two parts of x • Reduces possible values for each group • Reduces number of rows in a group’s table of values
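The input split can be sketched in Python as follows. It assumes x0 holds the most significant bits (consistent with using x0 as the expansion point a) and x4 the least significant p bits; the function names are hypothetical.

```python
N, K, P = 23, 5, 3           # example widths: 4*K + P == N

def split_input(x):
    """Split an N-bit integer x into (x0, x1, x2, x3, x4), most significant first."""
    assert 0 <= x < (1 << N)
    x4 = x & ((1 << P) - 1)            # lowest P bits
    rest = x >> P
    parts = []
    for _ in range(4):                 # peel off four K-bit fields, low to high
        parts.append(rest & ((1 << K) - 1))
        rest >>= K
    x3, x2, x1, x0 = parts
    return x0, x1, x2, x3, x4

def join_parts(x0, x1, x2, x3, x4):
    """Inverse of split_input: concatenate the five fields back into x."""
    high = (((((x0 << K) | x1) << K) | x2) << K) | x3
    return (high << P) | x4

x = 0b10101_00011_11111_00000_101
print(split_input(x))                  # (21, 3, 31, 0, 5)
```

`join_parts` is included only to show that the split loses no information.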
Resulting Formula • Each term depends on only two parts of x • Compute all possible values of each term and create a lookup table with those values • Lookup table row number obtained by concatenating input values • Some terms require small multiplications • Add together all terms to get the function value
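Forming a table row number by concatenating two input parts might look like the Python sketch below. The table contents are a placeholder (e^x sampled at the leading bits). The real entries come from the rearranged Taylor formula, which this transcript does not reproduce. `row_index` is a hypothetical helper name.

```python
import math

K = 5  # width of x0 and x1 in the example configuration

def row_index(hi, lo, lo_bits=K):
    """Concatenate two bit fields into one table row number."""
    return (hi << lo_bits) | lo

# Illustrative table only: each row stores e^(x0*2^-K + x1*2^-2K).
# It stands in for a table whose entries depend on (x0, x1) alone.
table = [0.0] * (1 << (2 * K))
for x0 in range(1 << K):
    for x1 in range(1 << K):
        table[row_index(x0, x1)] = math.exp(x0 * 2.0**-K + x1 * 2.0**-(2 * K))

value = table[row_index(3, 17)]        # one lookup, no arithmetic on x needed
```

Because each table sees only two k-bit fields, it has 2^(2k) rows regardless of the full input width n.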
Input Restrictions • x is in a fixed-point format • x is in the range [0,1) • Range reductions common in approximation methods • Apply transformation to reduce range of input • Obtain approximation • Apply another transformation to obtain final value
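The reduce / approximate / reconstruct pattern can be illustrated for e^x in Python. This is a sketch under assumptions: the identity e^x = 2^m * e^r with r in [0, ln 2) is one common reduction (the slides do not say which one is used), and `exp_core` is a stand-in that calls `math.exp` where the table-based evaluator would sit.

```python
import math

N = 23  # fractional bits of the fixed-point input, as in the example configuration

def exp_core(x_fixed):
    """Stand-in for the table-based evaluator: e^x for x = x_fixed / 2^N in [0, 1)."""
    assert 0 <= x_fixed < (1 << N)
    return math.exp(x_fixed / (1 << N))

def exp_any(x):
    # 1) Reduce: x = m*ln(2) + r with r in [0, ln 2), which lies inside [0, 1).
    m = math.floor(x / math.log(2))
    r = x - m * math.log(2)
    # 2) Approximate on the reduced range, after truncating r to N fractional bits.
    r_fixed = max(0, int(r * (1 << N)))
    core = exp_core(r_fixed)
    # 3) Reconstruct: e^x = 2^m * e^r.
    return math.ldexp(core, m)

print(exp_any(5.25), math.exp(5.25))
```

The truncation to N fractional bits limits the relative accuracy to about 2^-23, independent of how good the core evaluator is.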
Area Reduction in Tables • n = 23, k = 5, p = 3 (g = number of guard bits) • Full lookup table: • 2^n entries, each 4k+p bits • ~8 million rows • Smaller tables: • 2^(2k) entries of 4k+p+g bits (Table A) • 2^(2k) entries of 2k+p+g bits (Table B) • 2 × 2^(2k) entries of k+p+g bits (Tables C and E) • 2^(p+k) entries of p+g bits (Table D) • ~5000 rows
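The row counts above follow from simple arithmetic, sketched here in Python. The guard-bit count g is not given on this slide, so it is left as a free parameter. The function names are hypothetical.

```python
def table_rows(n=23, k=5, p=3):
    """Row counts of the full table and of each smaller table."""
    assert 4 * k + p == n
    full = 2 ** n
    split = {
        "A": 2 ** (2 * k),
        "B": 2 ** (2 * k),
        "C": 2 ** (2 * k),
        "E": 2 ** (2 * k),
        "D": 2 ** (p + k),
    }
    return full, split

def table_bits(g, n=23, k=5, p=3):
    """Total storage in bits of the smaller tables for g guard bits."""
    _, rows = table_rows(n, k, p)
    width = {"A": 4 * k + p + g, "B": 2 * k + p + g,
             "C": k + p + g, "E": k + p + g, "D": p + g}
    return sum(rows[t] * width[t] for t in rows)

full, split = table_rows()
print(full, sum(split.values()))       # 8388608 4352
```

The exact total is 4352 rows, the same order of magnitude as the slide's "~5000", against 8388608 rows for the full table.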
Multipliers • Two small multipliers: • k × (k+p+g) bits • k × (p+g) bits • One operand is less than ¼ the width of the input • Modern FPGAs include small multipliers
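To make the operand sizes concrete, here is a small Python sketch of the k × (k+p+g) multiplication. It assumes unsigned operands (the slides do not state signedness) and takes g as a parameter. `small_multiply` is a hypothetical name.

```python
K, P = 5, 3   # example configuration; K = 5 is less than 23 / 4

def small_multiply(x_part, table_value, g):
    """k x (k+p+g) unsigned multiply; the product needs at most 2k+p+g bits."""
    assert 0 <= x_part < (1 << K)
    assert 0 <= table_value < (1 << (K + P + g))
    product = x_part * table_value
    assert product < (1 << (2 * K + P + g))   # an a-bit by b-bit product fits in a+b bits
    return product

print(small_multiply(31, 1023, g=2))   # 31713
```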
Implementation • Java program calculates values of tables • Function evaluator implemented using Altera Quartus II • Size and delay measurements for Altera Stratix II FPGA
Building Table Values • Java program generates Verilog code implementing each lookup table • Iterate through each combination of (x0,x1), (x0, x2), etc. and calculate the corresponding value of the table • Check correctness by iterating through all values of x and comparing with function’s real value
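The generator step can be sketched as follows. The slides describe a Java program. This Python version only illustrates the idea: it uses deliberately tiny sizes and a placeholder entry function (e^x at the row's leading bits) rather than the real table formulas, and the module name and port names are hypothetical.

```python
import math

def build_table(k, width, entry):
    """Tabulate entry(x0, x1) as unsigned `width`-bit integers, row = {x0, x1}."""
    rows = []
    for x0 in range(1 << k):
        for x1 in range(1 << k):
            value = entry(x0, x1)
            assert 0 <= value < (1 << width), "entry does not fit the table width"
            rows.append(value)
    return rows

def to_verilog(name, rows, addr_bits, width):
    """Emit a combinational lookup table as a Verilog case statement."""
    lines = [
        f"module {name}(input [{addr_bits - 1}:0] addr, output reg [{width - 1}:0] data);",
        "  always @(*) begin",
        "    case (addr)",
    ]
    for addr, value in enumerate(rows):
        lines.append(f"      {addr_bits}'d{addr}: data = {width}'d{value};")
    lines += [f"      default: data = {width}'d0;", "    endcase", "  end", "endmodule"]
    return "\n".join(lines)

K, FRAC = 2, 8   # tiny illustrative sizes so the output stays readable

def entry(x0, x1):
    # Placeholder entry: e^x at the row's leading bits, rounded to FRAC fractional bits.
    return round(math.exp(x0 * 2.0**-K + x1 * 2.0**-(2 * K)) * (1 << FRAC))

rows = build_table(K, FRAC + 2, entry)   # 2 integer bits suffice because e^x < 4 here
verilog = to_verilog("table_a", rows, 2 * K, FRAC + 2)
print(verilog)
```

Generating the case statement from the same code that computes the values keeps the hardware table and the software reference model in sync, which is what makes the exhaustive check against the function's real value meaningful.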
Guard Bits • Can find worst-case number of guard bits required based on logic structure • May not actually need all the guard bits • Adjust guard bit value and find minimum needed for a particular function
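The "adjust and find the minimum" search can be sketched in Python with an exhaustive error check. To keep the run short, this example swaps in a much simpler scheme than the one in the slides: a two-table (bipartite-style) approximation of e^x on a 12-bit input with a 10-bit result. All sizes, names, and the error bound of one unit in the last place are assumptions made for illustration.

```python
import math

K = 4                 # x = x0.x1.x2, three K-bit fields, 12 input bits
OUT = 10              # fractional bits kept in the final result

def max_error(g, f=math.exp, df=math.exp):
    """Worst-case |approx - f(x)| over all inputs with g guard bits in the tables."""
    scale = 1 << (OUT + g)
    mid = 2.0 ** -(2 * K + 1)          # centre of the range spanned by x2
    worst = 0.0
    for x0 in range(1 << K):
        slope = df(x0 * 2.0**-K + 2.0 ** -(K + 1))
        # Table B(x0, x2): first-order correction, rounded to OUT+g fractional bits.
        b = [round(slope * (x2 * 2.0 ** -(3 * K) - mid) * scale) for x2 in range(1 << K)]
        for x1 in range(1 << K):
            base = x0 * 2.0**-K + x1 * 2.0 ** -(2 * K)
            a = round(f(base + mid) * scale)       # Table A(x0, x1)
            for x2 in range(1 << K):
                total = a + b[x2]                  # exact integer add, guard bits included
                rounded = (total + ((1 << g) >> 1)) >> g   # drop guard bits with rounding
                result = rounded / (1 << OUT)
                worst = max(worst, abs(result - f(base + x2 * 2.0 ** -(3 * K))))
    return worst

def min_guard_bits(bound, max_g=8):
    """Smallest g whose exhaustive worst-case error meets the bound, or None."""
    for g in range(max_g + 1):
        if max_error(g) <= bound:
            return g
    return None

g_min = min_guard_bits(2.0 ** -OUT)
print("minimum guard bits:", g_min)
```

A worst-case analysis of this toy scheme guarantees the bound with 2 guard bits. The exhaustive search reports whether fewer are enough for this particular function, which is the same trade-off the slide describes.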
Results • Synthesized for an Altera Stratix II • Device resources: 12480 ALUTs, 96 DSP blocks (used as multipliers) • f(x) = e^x, n = 23 • 2143 ALUTs (17%) • 4 DSP blocks (4%) • 23 ns delay
Possible Improvements • Optimize final adder • Currently using a generic parallel adder • Not all operands are the same width • Can optimize by making a custom adder • Merge multiplications into the final adder • Move partial product arrays into the adder • Change splitting of the x input • Improves table size • More complicated formulas for table values
References • D. Defour, F. de Dinechin, and J.-M. Muller, "A New Scheme for Table-Based Evaluation of Functions," Proc. 36th Asilomar Conf. Signals, Systems, and Computers, Nov. 2002 • F. de Dinechin and A. Tisserand, "Multipartite Table Methods," IEEE Transactions on Computers, March 2005 • M. Ercegovac and T. Lang, Digital Arithmetic, Ch. 10