Automating Transformations from Floating Point to Fixed Point for Implementing Digital Signal Processing Algorithms Kyungtae Han Ph.D. Defense Committee Members: Prof. Ross Baldick (Dept. of ECE) Prof. Brian L. Evans (Dept. of ECE), advisor Prof. Margarida F. Jacome (Dept. of ECE) Prof. Earl E. Swartzlander (Dept. of ECE) Prof. Robert A. van de Geijn (Dept. of CS) Computer Engineering Curriculum Track Dept. of Electrical and Computer Engineering The University of Texas at Austin May 9th, 2006
Outline • Introduction • Background • Contributions • Optimize fixed-point wordlengths • Reduce power consumption in arithmetic • Automate transformations of systems • Conclusion
Introduction Implementing Digital Signal Processing Algorithms [Figure: digital signal processing algorithms → floating-point program (floating-point processor) → code conversion → fixed point with uniform wordlength (fixed-point processor) → wordlength optimization → fixed point with optimized wordlength (fixed-point ASIC); each hardware target is rated for price ($) and power consumption on a low (L) to high (H) scale] ASIC: application-specific integrated circuit
Introduction Transformations to Fixed Point • Advantages • Lower hardware complexity • Lower power consumption • Faster speed in processing • Disadvantages • Introduces distortion due to quantization error • Search for optimum wordlength by trial and error is time-consuming • Research goals • Automate transformations to fixed point • Control distortion vs. complexity tradeoffs [Figure: transformation flow: floating-point program → code conversion → wordlength optimization → fixed point (optimized wordlength)]
Background Fixed-Point Data Format • Integer wordlength (IWL) • Number of bits assigned to integer representation • Fractional wordlength (FWL) • Number of bits assigned to fraction • Wordlength (WL) • Total number of bits: WL = IWL + FWL, with the sign bit counted in IWL • SystemC format (www.systemc.org) [Figure: fixed-point word S X X X X X with sign bit S; the binary point separates the integer wordlength from the fractional wordlength] • Examples: π = 3.14159…(10) [floating point]; 3.140625(10) = 011.001001(2) [WL=9; IWL=3; FWL=6]; 3.141479492(10) = 011.0010010000111(2) [WL=16; IWL=3; FWL=13]
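The two π examples on this slide can be reproduced with a few lines of Python. This is a minimal sketch, assuming quantization by truncation (rounding toward negative infinity) and saturation on overflow; the function names `to_fixed` and `to_binary` are hypothetical and are not part of SystemC or of the released software.

```python
import math

def to_fixed(value, wl, iwl):
    """Quantize value to a signed fixed-point word (sign bit counted in iwl).

    Returns (raw integer, represented real value). Truncation and
    saturation are assumptions of this sketch.
    """
    fwl = wl - iwl                          # fractional wordlength
    raw = math.floor(value * 2 ** fwl)      # truncate toward -infinity
    lo, hi = -(2 ** (wl - 1)), 2 ** (wl - 1) - 1
    raw = max(lo, min(hi, raw))             # saturate to the wl-bit range
    return raw, raw / 2 ** fwl

def to_binary(raw, wl, iwl):
    """Two's complement bit string with the binary point after iwl bits."""
    bits = format(raw & (2 ** wl - 1), f"0{wl}b")
    return bits[:iwl] + "." + bits[iwl:]

raw9, val9 = to_fixed(math.pi, wl=9, iwl=3)      # val9 == 3.140625
raw16, val16 = to_fixed(math.pi, wl=16, iwl=3)   # val16 == 3.1414794921875
```

Increasing FWL from 6 to 13 bits moves the representable value from 3.140625 to 3.1414794921875, which is the distortion side of the tradeoff discussed on the following slides.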
Background Distortion vs. Complexity Tradeoffs • Shorter wordlength may increase application distortion and decrease implementation complexity • Minimize implementation cost • Minimize application distortion [Figure: application distortion d(w) vs. implementation complexity c(w), showing the feasible region and the optimal tradeoff curve]
Background Wordlength Optimization Constraints • Distortion constraint: application-specific distortion d(w) limited by Dmax • Complexity constraint: implementation complexity c(w) limited by Cmax • Enforcing both constraints bounds the search to a finite region [Figure: two plots of application-specific distortion d(w) vs. implementation complexity c(w), one marking Dmax and one marking Cmax]
Background Wordlength Optimization • Wordlengths of signals (variables) in a digital system are collected in a vector w • Multiple objective optimization • Single objective optimization
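The formulas for this slide are not present in the transcript. One plausible way to write the two formulations, using only the symbols d(w), c(w), and Dmax from the neighboring slides, is sketched below; the exact form used in the talk is an assumption.

```latex
% Wordlength vector for N signals (N is a symbol introduced here)
\mathbf{w} = (w_1, w_2, \dots, w_N)

% Multiple objective optimization: minimize distortion and complexity jointly
\min_{\mathbf{w}} \; \bigl( d(\mathbf{w}),\; c(\mathbf{w}) \bigr)

% Single objective optimization: minimize complexity under a distortion limit
\min_{\mathbf{w}} \; c(\mathbf{w})
\quad \text{subject to} \quad d(\mathbf{w}) \le D_{\max}
```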
Background Genetic Algorithm • Evolutionary algorithm • Inspired by Holland 1975 • Mimic processes of plant and animal evolution • Find optimum of a complex function [Figure: genetic algorithm cycle with blocks for function evaluation, genes with measure, selection, parental genes, mating, child genes, mutation, and new gene pool] [From Greg Rohling’s Ph.D. Defense 2004]
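The cycle of evaluation, selection, mating, and mutation can be sketched in Python as follows. This is a generic single-objective illustration on integer vectors, not the algorithm used in the thesis; the function name `genetic_minimize`, the truncation selection scheme, single-point crossover, and all default parameters are assumptions.

```python
import random

def genetic_minimize(fitness, n_genes, lo, hi, pop_size=20, generations=40,
                     mutation_rate=0.1, seed=0):
    """Minimize fitness over integer vectors in [lo, hi]^n_genes (n_genes >= 2)."""
    rng = random.Random(seed)
    # Initial gene pool: random wordlength-like vectors
    pop = [[rng.randint(lo, hi) for _ in range(n_genes)] for _ in range(pop_size)]
    best = min(pop, key=fitness)
    for _ in range(generations):
        # Function evaluation + selection: keep the better half as parents
        parents = sorted(pop, key=fitness)[:pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_genes - 1)        # mating: single-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally replace a gene with a random value
            child = [rng.randint(lo, hi) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children                                # new gene pool
        best = min(pop + [best], key=fitness)         # remember the best seen so far
    return best
```

Because the best individual seen so far is always retained, running more generations with the same seed can never return a worse result than running fewer.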
Background Pareto Optimality • Pareto optimality: “best that could be achieved without disadvantaging at least one group” [Allan Schick 1970] • Pareto optimal set is set of nondominated solutions • E is dominated by C as all objectives for C are less than corresponding objectives for E • Solutions A, B, C, D are nondominated (not dominated by any solution) • Pareto front is boundary (tradeoff curve) that connects Pareto optimal set solutions [Figure: solutions A through I plotted as Objective 2 vs. Objective 1, with nondominated and dominated solutions marked differently and the Pareto front passing through the nondominated solutions]
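The dominance test described above is short enough to state in code. The sketch below assumes both objectives are minimized and uses the common definition (no worse in every objective, strictly better in at least one); the coordinates for points A through I are hypothetical, chosen only so that A, B, C, and D come out nondominated as on the slide.

```python
def dominates(a, b):
    """True if solution a is no worse than b in every objective and better in one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(points):
    """Return the nondominated subset of points (all objectives minimized)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Hypothetical (objective 1, objective 2) values for the labeled solutions
solutions = {"A": (1, 9), "B": (2, 6), "C": (4, 4), "D": (7, 2), "E": (5, 5),
             "F": (6, 4), "G": (3, 8), "H": (3, 7), "I": (2, 10)}
front = pareto_front(list(solutions.values()))
```

With these values, `front` contains exactly the points for A, B, C, and D, and E is removed because C is smaller in both objectives.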
Contribution #1 Search for Optimum Wordlength • Complete search • Search whole space • Impractical in systems with many variables • Gradient-based search • Utilizes gradient information to determine next candidates • Complexity measure (CM) [Sung and Kum, 1995] • Distortion measure (DM) [Han et al., 2001] • Complexity-and-distortion measure (CDM) [Han and Evans, 2004] (proposed) • Guided random search • Genetic algorithm for single objective [Leban and Tasic, 2000] • Multiple objective genetic algorithm (proposed)
Contribution #1 Complexity-and-Distortion Measure • Weighted combination of measures • Single objective function • Gradient-based search • Initialization • Iterative greedy search based on complexity and distortion gradient information
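The slide's objective function is not preserved in this transcript. A minimal sketch of a greedy search driven by a weighted combination of normalized complexity and distortion is given below. The objective form `a_c * c(w)/c_ref + a_d * d(w)/d_ref`, the choice to start from the minimum wordlengths and add one bit per step, and the name `cdm_greedy_search` are all assumptions made for illustration; the toy models in the usage lines are hypothetical as well.

```python
def cdm_greedy_search(complexity, distortion, n, w_min, w_max, d_max,
                      a_c=0.5, a_d=0.5):
    """Greedy wordlength search on a weighted complexity-and-distortion objective.

    complexity, distortion: callables taking a list of n wordlengths.
    Stops as soon as distortion(w) <= d_max.
    """
    w = [w_min] * n                       # initialization (assumed: minimum wordlengths)
    c_ref = complexity([w_max] * n)       # normalization constants (assumed)
    d_ref = distortion(w)

    def objective(v):
        return a_c * complexity(v) / c_ref + a_d * distortion(v) / d_ref

    while distortion(w) > d_max:
        # Candidate moves: add one bit to a single variable
        candidates = []
        for i in range(n):
            if w[i] < w_max:
                v = w.copy()
                v[i] += 1
                candidates.append(v)
        if not candidates:
            raise ValueError("distortion constraint not reachable within w_max")
        w = min(candidates, key=objective)  # steepest descent on the objective
    return w

# Hypothetical toy models: cost grows with total bits, noise shrinks per bit
toy_complexity = lambda w: sum(w)
toy_distortion = lambda w: sum(2.0 ** -b for b in w)
result = cdm_greedy_search(toy_complexity, toy_distortion,
                           n=3, w_min=2, w_max=16, d_max=0.1)
```

With these symmetric toy models the search returns `[5, 5, 5]`. Like any greedy method, it follows one path through the search space and can stop at a local optimum.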
Contribution #1 Case Study: Filter Design • Infinite impulse response (IIR) filter • Complexity measure: Area model of field-programmable gate array (FPGA) [Constantinides, Cheung, and Luk 2003] • Distortion measure: Root mean square (RMS) error • Seven fixed-point variables (indicated by slashes) [Figure: IIR filter block diagram with input x[n], output y[n], one delay element, and coefficients b0, b1, and -a1]
Contribution #1 Case Study: Gradient-Based Search • CDM could lead to lower complexity and fewer simulations compared to DM and CM [Table: search results for each method] * Maximum distortion measured by root mean square (RMS) error is 0.1 ** Complete search: 16^7 = 268,435,456 simulations (8.5 years at 1 second per simulation)
Contribution #1 Case Study: Genetic Algorithm • Search Pareto optimal set (nondominated) • Handles multiple objectives: error and area [Figure: Pareto front of error vs. area at the 100th generation (9,000 simulations), 250th generation (22,500 simulations), and 500th generation (45,000 simulations)] * Population for one generation: 90 LUT: Lookup table
Contribution #1 Case Study: Comparison • Superpose gradient-based search (GS) results on GA results [Figure: GS results overlaid on GA results at the 50th generation (4,500 simulations) and the 500th generation (45,000 simulations)] * Required RMSmax values for gradient-based search are Dmax = {0.12, 0.1, 0.08} • GS methods can get stuck in a local minimum • GS methods reduce running time (CDM: 145 simulations)
Contribution #1 Comparison of Proposed Methods
Contribution #2 Lower Power Consumption in DSP • Minimize power dissipation because battery power and cooling capacity are limited • Multipliers are often a major source of dynamic power consumption in typical DSP applications • Multi-precision multipliers can select smaller multipliers (8, 16 or 24 bits) to reduce power consumption • Wordlength reduction to select any word size [Han, Evans, and Swartzlander 2004] (proposed)
Contribution #2 Wordlength Reduction in Multiplication • Input data wordlength reduction • Fewer bits are often enough to represent the operands, e.g. π x π ≈ 9 • Truncation • Signed right shift • Move toward the least significant bit (LSB) • Sign bit is extended for arithmetic right shift [Figure: data word with the sign bit marked]
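Both reduction operations can be illustrated on Python integers, which behave like two's complement words under `>>`. The sketch below assumes that truncation clears the removed least significant bits while leaving the remaining bits in place; the helper names are hypothetical, and the operand is π stored with WL=16, IWL=3, FWL=13.

```python
def truncate(x, n):
    """Clear the n least significant bits of x (two's complement truncation)."""
    return (x >> n) << n

def signed_right_shift(x, n):
    """Arithmetic right shift: bits move toward the LSB, sign bit is replicated."""
    return x >> n

FWL = 13
pi_q = 25735                      # pi in WL=16, IWL=3, FWL=13 (3.1414794921875)
pi_t = truncate(pi_q, 8)          # keep only the 8 most significant bits
product = (pi_t * pi_t) / 2 ** (2 * FWL)   # 3.125 * 3.125

# 16-bit pattern of a negative operand after an 8-bit signed right shift
shifted_bits = format(signed_right_shift(-pi_q, 8) & 0xFFFF, "016b")
```

Here `pi_t` represents 3.125 and `product` is 9.765625, compared with π² ≈ 9.8696 at full precision. In `shifted_bits` the upper bits are all copies of the sign bit.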
Contribution #2 Power Reduction via Wordlength Reduction • Power dissipation • Switching power consumption • Static power consumption • Switching power consumption • Switching activity parameter, α • Reduce α by wordlength reduction Relationship between reduced wordlength and switching parameter α in power consumption?
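For reference, the standard first-order model of CMOS switching power makes the role of α explicit. The symbols C_L, V_DD, and f_clk are the conventional ones and are not taken from the slide.

```latex
% Switching (dynamic) power of a CMOS circuit
P_{\text{switching}} = \alpha \, C_L \, V_{DD}^{2} \, f_{\text{clk}}
% \alpha     : switching activity (expected transitions per clock cycle)
% C_L        : switched load capacitance
% V_{DD}     : supply voltage
% f_{clk}    : clock frequency
```

With capacitance, supply voltage, and clock frequency fixed, switching power scales linearly with α, which is why reducing bit transitions at the multiplier inputs can reduce power.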
Contribution #2 Analytical Method • Consider stream of data for one of the multiplicands • Compare two adjacent numbers in stream after reduction • Expectation of bit switching, x, with probability Px • L-bit input data • Truncate input data to M bits (N bits are removed) • N-bit signed right shift in L-bit input (Y is sign bit) [Figure: L-bit word split into M retained bits and N removed bits, shown for truncation and for signed right shift with sign extension]
Contribution #2 Analytical Method [Figure: analytical results comparing no reduction with wordlength reduction, for input wordlength L = 16]
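The expectation of bit switching can be checked numerically for a small word size. The sketch below is not the derivation from the talk; it assumes adjacent words in the stream are independent and uniformly distributed over all L-bit two's complement values, and it uses L = 8 and N = 4 so that every value can be enumerated. The function name `expected_switching` is hypothetical.

```python
def expected_switching(reduce, L):
    """Expected bit flips between two independent uniform L-bit words.

    reduce: function applied to each signed value before it reaches the
    multiplier input. By linearity of expectation, the result is the sum
    over bit positions of P(bit differs) = 2 * p * (1 - p), where p is the
    fraction of words with that bit set.
    """
    mask = (1 << L) - 1
    words = [reduce(x) & mask for x in range(-(1 << (L - 1)), 1 << (L - 1))]
    total = 0.0
    for bit in range(L):
        p_one = sum((w >> bit) & 1 for w in words) / len(words)
        total += 2 * p_one * (1 - p_one)
    return total

L, N = 8, 4
no_reduction = expected_switching(lambda x: x, L)            # 4.0
truncation = expected_switching(lambda x: (x >> N) << N, L)  # 2.0
right_shift = expected_switching(lambda x: x >> N, L)        # 4.0
```

Under these assumptions truncation halves the expected switching (only the M = 4 retained bits toggle), while a signed right shift leaves it unchanged because the N vacated positions are filled with copies of the sign bit, which toggles as often as any other bit. This is consistent with the summary slide, which reports no power reduction from signed right shift in the Wallace multiplier.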
Contribution #2 Dynamic Power Consumption for Wallace Multiplier (1 MHz) • 16-bit x 16-bit multiplier (simulated on Xilinx XC3S200-5FT256 FPGA) • Wallace multiplier used in TI 320C64 DSP [Figure: dynamic power consumption when truncating the first argument and when truncating the second argument (recode, nonrecode); annotated reduction of 56%]
Contribution #2 Dynamic Power Consumption for Radix-4 Modified Booth Multiplier (1 MHz) • 16-bit x 16-bit multiplier (simulated on Xilinx XC3S200-5FT256 FPGA) • Radix-4 modified Booth multiplier used in TI 320C62 DSP • Swapping could have benefit [Figure: dynamic power consumption when truncating the first argument and when truncating the second argument (recode, nonrecode); annotated reduction of 31% and sensitivity of 13%]
Contribution #2 Summary of Contribution #2 • Truncation to 8 bits reduces estimated power consumption by 56% in Wallace and 31% in Booth 16-bit multipliers • Signed right shift exhibits no estimated power reduction in the Wallace multiplier (for any shift) and a 25% reduction in Booth multipliers (for an 8-bit shift) • Power consumption in tree-based multipliers • Highly depends on input data • Simulation of all switching activity matches analysis of switching activity in reduced multiplicands in the Wallace multiplier • Operand swapping can reduce power consumption • In the Booth multiplier, the non-recoded operand is 13% more sensitive in power consumption
Contribution #3 Automating Transformations from Floating Point to Fixed Point • Existing fixed-point tools • Support fixed-point simulation • Convert floating-point code to raw fixed-point code • Manually find optimum wordlength by trial and error • Automating transformations • Fully automate conversion and wordlength optimization process (proposed) • Fixed-point tools • SNU gFix, Autoscaler • CoWare SPW HDS • Synopsys CoCentric • MATLAB Fixed-point toolbox • MATLAB Fixed-point blockset • AccelChip DSP synthesis • Catalytic RMS, MCS [Figure: floating-point program → code conversion → wordlength optimization → wordlength-optimized fixed-point program]
Contribution #3 Automatic Transformation Flow • Code generation • Parse floating-point program • Generate a raw fixed-point program and auxiliary programs (top, objective, cost, etc.) • Range estimation • Estimate range to avoid overflow (analytical or simulation-based) • Determine integer wordlength (IWL) • Wordlength optimization • Optimize wordlength according to the given input and error specification (analytical or simulation-based) • Determine fractional wordlength (FWL) [Figure: flow: code generation → range estimation → wordlength optimization]
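The simulation-based range estimation step can be sketched as follows: record the values a signal takes during simulation, then pick the smallest IWL whose two's complement range covers them. This is an illustrative sketch, not the released tool's implementation; the function name `integer_wordlength` is hypothetical, and it assumes a signed format in which the sign bit counts toward IWL, so that IWL bits cover [-2^(IWL-1), 2^(IWL-1)).

```python
def integer_wordlength(samples):
    """Smallest signed IWL (sign bit included) that avoids overflow on samples."""
    lo, hi = min(samples), max(samples)
    iwl = 1                                   # sign bit alone covers [-1, 1)
    # Grow until both the largest and the smallest observed value fit
    while hi >= 2 ** (iwl - 1) or lo < -(2 ** (iwl - 1)):
        iwl += 1
    return iwl

observed = [-3.2, 0.5, 3.14159, 1.0]          # hypothetical simulated signal values
iwl = integer_wordlength(observed)            # 3, matching the pi examples earlier
```

A range observed in simulation only covers the inputs that were simulated, so a designer may want to add a guard bit when the input data is not representative.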
Contribution #3 Code Generation for Fixed-Point Program • Adder function in MATLAB
(a) Floating-point program for adder
    function [c] = adder(a, b)
    c = 0;
    c = a + b;
(b) Raw fixed-point program (wordlengths determined by designers with trial and error)
    function [c] = adder_fx(a, b)
    c = 0;
    a = fi(a, 1, 32, 16);   % signed, WL = 32, FWL = 16
    b = fi(b, 1, 32, 16);
    c = fi(c, 1, 32, 16);
    c(:) = a + b;           % assign into c so that it keeps its fixed-point type
(c) Converted fixed-point program for automating optimization (proposed)
    function [c] = adder_fx(a, b, numtype)
    c = 0;
    a = fi(a, numtype.a);   % wordlengths are passed in instead of hard-coded
    b = fi(b, numtype.b);
    c = fi(c, numtype.c);
    c(:) = a + b;
• fi(a, S, WL, FWL) is a constructor function for a fixed-point object in the fixed-point toolbox [S: Signed, WL: Wordlength, FWL: Fraction length]
Contribution #3 Automating Transformation Environment for Wordlength Optimization • Given floating-point program and options, auxiliary programs are automatically generated • Given input data, optimum wordlength is searched [Figure: block diagram with input data, top program, floating-point program, fixed-point program, evaluation program (objectives), range estimation, complexity estimation, error estimation, and a search engine (gradient-based or genetic algorithm) that outputs the optimum wordlength]
Contribution #3 Demo of Released Software
Conclusion • Search for optimum wordlength • Gradient-based search with the complexity-and-distortion measure reduces execution time, but solutions could be trapped in a local optimum • Genetic algorithm can find the distortion vs. complexity tradeoff curve, but it requires longer execution time • Reduce power consumption by data wordlength reduction of multiplicands • Automate transformations from floating-point programs to fixed-point programs • Free software release is available at www.ece.utexas.edu/~bevans/projects/wordlength/converter/
End Thank you!