Alexandru Bârleanu, Vadim Băitoiu and Andrei Stan

Floating-point to fixed-point code conversion with variable trade-off between computational complexity and accuracy loss
Alexandru Bârleanu, Vadim Băitoiu and Andrei Stan
Technical University “Gh. Asachi”, Iaşi, Romania
15th International Conference on System Theory, Control and Computing (Joint conference of SINTES15, SACCS11, SIMSIS15)
October 14-16, 2011, Sinaia, Romania

Motivation
• Embedded microprocessors:
  • No dedicated floating-point hardware
  • Limited processing capabilities
• Emulated floating-point arithmetic:
  • Unnecessarily high accuracy
  • Long execution time
• Fixed-point code written manually:
  • Error-prone
  • Significant accuracy loss

Existing work
• For FPGA
  • The main problem is fractional word-length optimization
  • The search space grows exponentially with the number of fixed-point variables
  • Search techniques (often sophisticated) are necessary:
    • Greedy algorithms
    • Genetic algorithms
    • Simulated annealing
  • Optimization objectives: accuracy loss, area
• For microcontrollers, C language
  • Existing solutions:
    • The fixed-point format is supplied by the user (in annotations, for example)
    • The fixed-point format is determined through simulations, subject to accuracy constraints, for example
  • Available integer types in C: only 16/32/64-bit signed/unsigned
  • Optimization objectives: accuracy loss, number of (scaling) operations

Problem formulation
The problem is constructed from practical considerations:
• Input – a digital filter:
  • Filter structure: Direct-Form I
  • Constant floating-point coefficients
  • Known input bounds (low/high values)
• Output – ANSI-C integer code:
  • Ideally, the result is the same as the one floating-point code would have produced
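To make the target of the conversion concrete, here is a minimal sketch of what integer code for an FIR filter can look like. It is not the code generated by the presented tool: it assumes a single Q15 coefficient format and 16-bit samples, whereas the method described here chooses a format per dataflow node. The names `fir_direct_form_q15` and `COEFF_FWL` are hypothetical, and C99 fixed-width types are used for readability where strict ANSI C would use `short` and `long`. For an FIR filter, Direct-Form I reduces to a tapped delay line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed fractional word-length of the coefficients (Q15 format). */
#define COEFF_FWL 15

/* Hypothetical helper: computes y[n] = sum over k of b[k] * x[n-k] using
 * integers only. history[0] is the newest sample x[n], history[k] is x[n-k]. */
int32_t fir_direct_form_q15(const int16_t *coeffs_q15,
                            const int16_t *history,
                            size_t taps)
{
    int64_t acc = 0;  /* wide accumulator so the running sum cannot overflow */
    size_t k;

    assert(taps > 0);
    for (k = 0; k < taps; ++k) {
        /* A 16x16-bit product always fits in 32 bits. */
        acc += (int32_t)coeffs_q15[k] * history[k];
    }
    /* Drop the fractional bits. Right-shifting a negative value is
     * implementation-defined in C; an arithmetic shift is assumed here. */
    return (int32_t)(acc >> COEFF_FWL);
}
```

With coefficients 0.5 and 0.25 (16384 and 8192 in Q15) and samples 100 and 200, the function returns 100, the same value floating-point arithmetic gives for this input.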

Building the dataflow
• Initial state – very long fractional parts
  • Multiply operators overflow
  • Add operators have unaligned terms
• Changing the dataflow – making nodes representable in C
  • Resolving overflows in any operator
  • Aligning summation terms
Recursive method calls – bottom-up action
Example: making a node's run-time integer interval smaller (scaling)
Before scaling:
  Run-time integer interval: [0; 4 400 000 000]
  Fractional word-length: 27
  Datatype: none (using only 16/32-bit integers)
  Floating-point interval: [0; 32.782...]
After scaling:
  Run-time integer interval: [0; 2 200 000 000]
  Fractional word-length: 26
  Datatype: unsigned long
  Floating-point interval: [0; 32.782...]
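The scaling example can be reproduced with a small sketch. The `node_range` type and the `scale_to_u32` function are hypothetical names introduced for illustration; the actual tool is written in Java and works on dataflow nodes. The sketch halves the run-time integer interval, and lowers the fractional word-length by one, until the upper bound fits in a 32-bit unsigned integer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of a dataflow node's value range. */
typedef struct {
    uint64_t max_value;  /* upper bound of the run-time integer interval */
    int fwl;             /* fractional word-length, in bits */
} node_range;

/* Halve the interval (one right shift per step) until it fits in 32 bits.
 * The represented floating-point interval stays the same, because each
 * removed integer bit is matched by one removed fractional bit. */
node_range scale_to_u32(node_range n)
{
    while (n.max_value > UINT32_MAX) {
        assert(n.fwl > 0);  /* this sketch does not model negative word-lengths */
        n.max_value >>= 1;
        n.fwl -= 1;
    }
    return n;
}
```

Starting from an upper bound of 4 400 000 000 with a fractional word-length of 27, one step is enough: the result is 2 200 000 000 with a fractional word-length of 26, which fits in a 32-bit `unsigned long`.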

Dataflow transformation philosophy
• Overflow avoidance (not optional!)
• Run-time integer interval reduction (together with FWL)
• Discarding of least significant bits (multiple ways)
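The slides do not list the ways of discarding least significant bits. As an illustration, two common candidates are truncation and rounding to nearest, which differ in operator count and in error interval. The function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Truncation: one shift. The result is never above the exact quotient and
 * is less than one LSB (of the result) below it. */
uint32_t discard_truncate(uint32_t value, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    return value >> bits;
}

/* Round to nearest: one add plus one shift. The error is at most half an
 * LSB of the result. The caller must ensure that adding the rounding
 * constant does not wrap around. */
uint32_t discard_round(uint32_t value, unsigned bits)
{
    assert(bits > 0 && bits < 32);
    return (value + ((uint32_t)1 << (bits - 1))) >> bits;
}
```

For the value 23 with 3 bits discarded (23/8 = 2.875), truncation yields 2 and rounding yields 3. Rounding costs one extra operator but halves the error bound, which is the kind of trade-off a cost function can weigh.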

Selecting the optimal dataflow transformation
• Increase or decrease the node run-time integer interval
• Construct multiple dataflow transformation variants (alternative dataflow fragments)
• Compare candidate dataflow transformation variants using a linear cost function
• Ideal values:
  • Number of cycles
  • SQNR loss, error distribution...
• Analytically computed values:
  • Size of error interval
  • Number of operators
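A linear cost function over the two analytically computed values can be sketched as follows. The struct, the function names, and the weight parameters `w_error` and `w_ops` are assumptions made for illustration; the slides do not give the exact form of the cost function or its coefficients.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical metrics of one candidate dataflow transformation variant. */
typedef struct {
    double error_interval_size;  /* analytically computed error interval size */
    double operator_count;       /* number of operators in the fragment */
} variant_metrics;

/* Linear cost: a weighted sum of the two metrics. */
double linear_cost(variant_metrics m, double w_error, double w_ops)
{
    return w_error * m.error_interval_size + w_ops * m.operator_count;
}

/* Return the index of the variant with the lowest cost
 * (the first one wins in case of a tie). */
size_t select_cheapest(const variant_metrics *variants, size_t count,
                       double w_error, double w_ops)
{
    size_t best = 0;
    size_t i;

    assert(count > 0);
    for (i = 1; i < count; ++i) {
        if (linear_cost(variants[i], w_error, w_ops) <
            linear_cost(variants[best], w_error, w_ops)) {
            best = i;
        }
    }
    return best;
}
```

Given one variant with error size 0.5 and 1 operator and another with error size 0.25 and 3 operators, weights (1, 0) select the more accurate second variant, while weights (0, 1) select the cheaper first one. Varying the weights is what moves the result along the complexity/accuracy trade-off.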

Varying the cost function coefficients (example)
Filter
• Response type: bandpass
• Type: FIR
• Order: 40
Target/Compilation
• Processor: ARM Cortex-M3
• Compiler: IAR C/C++ 5.41 for ARM (Kickstart)
• Optimizations: medium
[Plot: SQNR loss versus time (cycles); 4 dataflows shown out of 18 found]
For comparison, the floating-point code takes 3984-4078 cycles.

Implementation insights
• Language: Java SE 1.6
• Techniques: OOP, polymorphism
• Analytical estimation of run-time integer intervals, dataflow complexity, and node error intervals
• Dataflows are transformed using Change instances (not by copying large dataflow portions and modifying them)
• Change instances are invertible (apply/undo)
• Change instances can be combined with logical AND and OR
• Dataflow visualization: dot (graph description language)

Usage example
Filter properties
• Response type: highpass
• Type: FIR
• Order: 30
• Designed with: Matlab FDATool
Conversion information
• Number of dataflows produced by varying the cost function coefficients: 158 (18 different)
• Total transformation time: 2.44 s
Performance of fixed-point function #7
• Distortion (SQNR loss): 3.1e-05 dB
• Speed test:
  • Device: MSP430F149
  • Compiler: IAR 5.10 (Kickstart)
  • Compiler opt.: High speed
  • Factor: 11.5

Testing

Results
[Plot: accuracy loss versus number of cycles, with the floating-point code shown for reference]
• Variable trade-off between complexity and accuracy
• SQNR loss: 1e-5...1e-1 dB
• Speed factor: 3...15 (or more if compiler optimizations are applied)
• Constant execution time (no jitter – more determinism)

Conclusions
An innovative floating-point to fixed-point conversion method for the C language is proposed:
• A very good speed factor is obtained (integer code compared with floating-point code).
• Very good accuracy is obtained for FIR filters.
• The conversion algorithm is designed to use variable cost functions. It is possible to specify, for example, that complexity is important and accuracy loss is unimportant when building the integer dataflow.
• The conversion time is very short. This happens because:
  • Dataflow metrics are estimated analytically
  • Dataflow nodes cache information (run-time integer interval, error interval)
  • The automatic dataflow search algorithm uses a heuristic to generate as few identical dataflows as possible

