Profile-directed speculative optimization of reconfigurable floating point data paths

Workshop on Reconfigurable Computing 2008
Ashley Brown, 27th Jan 2008

Introduction
• Computational science requires reproducible and accurate results
• IEEE-754 is a compromise
  • Broad range of values
  • Many special cases
• Idea: use profiling to reduce range and remove special cases
  • Generate floating-point data-paths for FPGAs which are smaller and faster
  • BUT KEEP RESULTS CONSISTENT WITH IEEE-754

Advantages of Smaller Floating Point
• Embedded Systems
  • Do the same work for a lower cost
  • Implement IEEE-754 compliant floating point where it may not have been possible before
• High performance
  • Do more work with the same hardware
  • Increase in parallel execution on FPGAs
• No need to sacrifice IEEE-754 compliance

Four Pictures to Explain: #1

Four Pictures to Explain: #2

Four Pictures to Explain: #3

Four Pictures to Explain: #4
[Figure panels: Pre-optimisation, Post-optimisation]

Optimisation Technique
• Remove features from the floating-point unit:
  • Operand alignment
  • Normalisation
  • Operand swap
• If a removed feature turns out to be required, detect this and fall back to an alternative solution:
  • Software-based, on an embedded or host processor
  • Hardware-based full implementation, for larger designs
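The detect-and-fall-back idea can be illustrated with a small behavioural model. This is only a sketch under stated assumptions, not the hardware described in the talk: the names `speculative_add`, `max_align` and `max_norm` are hypothetical, the shift limits are counted in bit positions, and zero, infinity and NaN operands are conservatively sent to the fallback path.

```python
import math

def exponent(x: float) -> int:
    """Unbiased binary exponent e with 2**e <= |x| < 2**(e + 1), for finite non-zero x."""
    return math.frexp(x)[1] - 1

def speculative_add(a: float, b: float, max_align: int, max_norm: int):
    """Model a reduced adder; returns (result, used_fallback).

    max_align and max_norm are hypothetical limits on the alignment and
    normalisation shifts that the reduced data-path still supports.
    """
    # Assumption of this sketch: special values always take the fallback path.
    if a == 0.0 or b == 0.0 or not (math.isfinite(a) and math.isfinite(b)):
        return a + b, True
    ea, eb = exponent(a), exponent(b)
    # Alignment shift needed = difference between the operand exponents.
    if abs(ea - eb) > max_align:
        return a + b, True
    r = a + b
    # Normalisation shift needed after cancellation = drop in exponent.
    if r == 0.0 or not math.isfinite(r) or max(ea, eb) - exponent(r) > max_norm:
        return r, True
    return r, False
```

In this model both paths return the IEEE-754 double-precision sum, so the flag only records how often the full implementation would have been needed. For instance, `speculative_add(1.0, 1024.0, 4, 4)` reports a fallback because the operands differ by ten binary orders of magnitude.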

Optimisation Options

The stages of optimisation
• Profile target application with training datasets
  • Source usually FORTRAN, C
  • Identify frequently-executed blocks
  • Check for good value-locality
• Generate reduced-size floating point datapath
  • Reduced operand alignment hardware
  • Reduced normalisation hardware
• Error checking: execute with additional datasets, check error rates

FloatWatch Profiler
• Valgrind-based value profiler
• Can return some metrics of interest here:
  • Floating point value ranges
  • Ratio of floating point operands
• Each has uses for optimisation!
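To make the two metrics concrete, the sketch below computes them from a list of operand pairs. It does not reproduce FloatWatch's interface or output format. The function `profile` and its input format are assumptions made for illustration, with value ranges summarised as a histogram of binary exponents and operand ratios as a histogram of exponent differences.

```python
import math
from collections import Counter

def binary_exponent(x: float) -> int:
    """Unbiased binary exponent of a finite, non-zero float."""
    return math.frexp(x)[1] - 1

def profile(pairs):
    """Return (value_exponents, ratio_exponents) histograms for operand pairs.

    value_exponents counts how often each operand exponent occurs (value range).
    ratio_exponents counts |exp(a) - exp(b)|, a coarse log2 of the operand ratio.
    """
    value_exponents = Counter()
    ratio_exponents = Counter()
    for a, b in pairs:
        # Zeros and non-finite values have no meaningful exponent; skip them here.
        if a == 0.0 or b == 0.0 or not (math.isfinite(a) and math.isfinite(b)):
            continue
        ea, eb = binary_exponent(a), binary_exponent(b)
        value_exponents[ea] += 1
        value_exponents[eb] += 1
        ratio_exponents[abs(ea - eb)] += 1
    return value_exponents, ratio_exponents
```

A narrow `value_exponents` histogram suggests the exponent range could be reduced, while a `ratio_exponents` histogram concentrated near zero suggests that little alignment hardware is needed.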

VFLOAT Library
• VHDL variable-precision floating-point library
• Initially developed by Belanovic at Northeastern, continued development under the supervision of Leeser
• Allows basic customisation of precision, exponent bit widths
• Further customisations added for our optimisations:
  • Operand alignment
  • Normalisation
• Performance is lower than vendor-specific libraries

Data-path Generator
• Takes user-selected data-path and generates VHDL implementation
• Assembles modified version of the RPL library, customised to allow removal of various items
• Builds hardware/software integration layer
  • C library for software
  • VHDL for hardware
• Does not modify the software source automatically (yet)

Proof-of-Concept Testing
• Original application modified to call C library (usually from FORTRAN)
• Data sent to hardware, calculated, and returned
• Software waits for response
• No data-aggregation or hardware-side error detection occurs
• Software layer performs same calculation for verification
• Overall error rate reported
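The verification step can be sketched as a harness that compares each returned result, bit for bit, against the same calculation done in software. Everything here is a stand-in: `reduced_add` is a hypothetical software model of a reduced adder (it drops the smaller operand when the alignment shift exceeds `max_align`), not the real hardware interface, and `error_rate` is an illustrative name.

```python
import math
import struct

def bits(x: float) -> bytes:
    """Raw IEEE-754 double encoding, so comparisons are bit-exact."""
    return struct.pack(">d", x)

def reduced_add(a: float, b: float, max_align: int = 4) -> float:
    """Hypothetical stand-in for the hardware: limited alignment shift."""
    ea, eb = math.frexp(a)[1], math.frexp(b)[1]
    if abs(ea - eb) > max_align:
        # The smaller operand is shifted out entirely in this model.
        return a if abs(a) > abs(b) else b
    return a + b

def error_rate(hw_add, pairs) -> float:
    """Fraction of operand pairs where hw_add differs from the software sum."""
    mismatches = sum(1 for a, b in pairs if bits(hw_add(a, b)) != bits(a + b))
    return mismatches / len(pairs)
```

With the four pairs `(1.0, 2.0)`, `(1.0, 1024.0)`, `(1.0, 2.0**60)` and `(3.0, 0.5)`, only the second is miscalculated by this model, because in the third the smaller operand is lost to rounding in the reference sum as well.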

‘ydl_pij’
• ‘ydl_pij’ is an iterative solver for quantum mechanics, using the “Molecular Mechanics – Valence Bond” method
• Datasets of various sizes are available, allowing a variety of test cases to be used
• Initial profiling and testing use separate datasets

‘ydl_pij’: Profiling (Hot Code Section)
[Figure annotation: Narrow value ranges]

‘ydl_pij’: Identification
• FloatWatch identifies the regions of code executing the most operations
• In this case, these show narrow value ranges
• Create optimised datapaths for testing
  • Maximum operand alignment reduced to 2^n, where n is in the range [1, 6]
  • Normalisation hardware modified similarly
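A sweep over the candidate alignment limits can be sketched as follows. The sketch reads the limit as 2^n bit positions for n from 1 to 6, which is an interpretation of the slide rather than a confirmed detail, and it estimates the re-execution rate as the fraction of operand pairs whose exponent difference exceeds that limit. The names `reexecution_rate` and `sweep` are hypothetical.

```python
import math

def exponent_gap(a: float, b: float) -> int:
    """Absolute difference of the binary exponents of two non-zero finite floats."""
    return abs(math.frexp(a)[1] - math.frexp(b)[1])

def reexecution_rate(pairs, max_align: int) -> float:
    """Fraction of pairs needing a larger alignment shift than max_align."""
    exceeded = sum(1 for a, b in pairs if exponent_gap(a, b) > max_align)
    return exceeded / len(pairs)

def sweep(pairs):
    """Estimated re-execution rate for each limit 2**n, n in [1, 6]."""
    return {n: reexecution_rate(pairs, 2 ** n) for n in range(1, 7)}
```

Running `sweep` on operand pairs recorded from a training dataset gives one rate per candidate limit. Those rates can then be weighed against the area each limit saves.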

‘ydl_pij’: Error Rate
[Figure label: Not profiled]

‘ydl_pij’: Error Rate and Size
• 20% size reduction with negligible re-execution rate (< 0.5%)
• 27% size reduction with 3% re-execution rate
• Size reduction permits a ~40% increase in parallelism due to better space usage
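As a rough cross-check of how a size reduction translates into parallelism, the sketch below assumes a fixed area budget that units fill perfectly, which real FPGA placement does not guarantee. The function name `extra_parallelism` is hypothetical.

```python
def extra_parallelism(size_reduction: float) -> float:
    """Relative gain in unit count when each unit shrinks by size_reduction.

    Assumes a fixed area budget and perfect packing (a simplification).
    """
    return 1.0 / (1.0 - size_reduction) - 1.0

for reduction in (0.20, 0.27):
    gain = extra_parallelism(reduction)
    print(f"{reduction:.0%} smaller units -> {gain:.0%} more units in the same area")
```

Under this simplified model the two reductions correspond to roughly 25% and 37% more units, which is in the same region as the ~40% quoted on the slide.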

‘ydl_pij’: Area saving for one F.P. adder/subtractor
[Figure panels: Pre-optimisation, Post-optimisation]

Coming Soon
• Per-operation optimisations
  • Currently only at data-path level
• Optimisation of operand-swap hardware
• Per-operation exponent customisation (size, bias)
• Performance evaluation using state-of-the-art FPGA accelerator hardware
• Implementation of error detection and re-execution
• Potential for even greater size reductions
