Profile-directed speculative optimization of reconfigurable floating point data paths
Workshop on Reconfigurable Computing at 2008
Ashley Brown, 27th Jan 2008
27th Jan 2008 | Ashley Brown
Introduction
• Computational science requires reproducible and accurate results
• IEEE-754 is a compromise:
  • Broad range of values
  • Many special cases
• Idea: use profiling to reduce range and remove special cases
• Generate floating-point data-paths for FPGAs which are smaller and faster
• BUT KEEP RESULTS CONSISTENT WITH IEEE-754
Advantages of Smaller Floating Point
• Embedded systems:
  • Do the same work at lower cost
  • Implement IEEE-754-compliant floating point where it may not have been possible before
• High performance:
  • Do more work with the same hardware
  • Increased parallel execution on FPGAs
  • No need to sacrifice IEEE-754 compliance
Four Pictures to Explain: #1 [figure not reproduced]
Four Pictures to Explain: #2 [figure not reproduced]
Four Pictures to Explain: #4 [figure not reproduced; panels: Pre-optimisation, Post-optimisation]
Optimisation Technique
• Remove features from the floating-point unit:
  • Operand alignment
  • Normalisation
  • Operand swap
• If an operation turns out to require a removed feature, detect this and fall back to an alternative solution:
  • Software-based, on the embedded/host processor
  • Hardware-based full implementation, for larger designs
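The detect-and-fall-back idea can be sketched for a double-precision adder. This is a minimal model under stated assumptions, not the generated hardware: `reduced_unit_can_add` and `speculative_add` are hypothetical names, `max_align` and `max_norm` are illustrative shifter limits, operand swap is not modelled, and Python's own float addition stands in for both the reduced unit and the full fall-back, so the sketch only shows the decision of when a fall-back would be triggered.

```python
import struct

FRAC_BITS = 52        # stored fraction bits of an IEEE-754 double
EXP_ALL_ONES = 0x7FF  # biased exponent reserved for infinity and NaN


def fields(x: float) -> tuple[int, int, int]:
    """Split a double into (sign, biased exponent, fraction)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits >> 63, (bits >> FRAC_BITS) & EXP_ALL_ONES, bits & ((1 << FRAC_BITS) - 1)


def reduced_unit_can_add(a: float, b: float, max_align: int, max_norm: int) -> bool:
    """True if a + b stays within a hypothetical reduced adder's limits:
    normal operands, an alignment shift of at most max_align and a
    normalisation left shift of at most max_norm. Overflow and underflow
    of the result are not modelled."""
    sa, ea, fa = fields(a)
    sb, eb, fb = fields(b)
    # Special-case handling is removed: zero, subnormal, inf and NaN fall back.
    if ea in (0, EXP_ALL_ONES) or eb in (0, EXP_ALL_ONES):
        return False
    align = abs(ea - eb)
    if align > max_align:
        return False  # reduced alignment shifter cannot line up the operands
    # Exact signed sum of the significands, both scaled to the smaller exponent.
    emin = min(ea, eb)
    ma = ((1 << FRAC_BITS) | fa) << (ea - emin)
    mb = ((1 << FRAC_BITS) | fb) << (eb - emin)
    total = abs((-ma if sa else ma) + (-mb if sb else mb))
    if total == 0:
        return False  # exact cancellation gives zero, itself a special case
    # Left shift that brings the leading 1 back to the larger operand's position;
    # negative when the sum carries out (a 1-bit right shift, always supported).
    norm = FRAC_BITS + align - (total.bit_length() - 1)
    return norm <= max_norm


def speculative_add(a: float, b: float, max_align: int = 8, max_norm: int = 8) -> tuple[float, bool]:
    """Return (sum, used_fallback). Python's float addition stands in for both
    the reduced unit and the full IEEE-754 fall-back; only the dispatch is modelled."""
    return a + b, not reduced_unit_can_add(a, b, max_align, max_norm)
```

With the default limits of 8, adding 1.0 and 2.0 stays on the reduced unit, while adding 1.0 and 2**-20 needs a 20-bit alignment shift and is handed to the fall-back path. Computing 1.0 - 0.999 needs a 10-bit normalisation shift, so it falls back unless `max_norm` is at least 10.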
Optimisation Options [slide content not reproduced]
The stages of optimisation
• Profile the target application with training datasets
  • Source is usually FORTRAN or C
• Identify frequently-executed blocks
• Check for good value locality
• Generate a reduced-size floating-point datapath:
  • Reduced operand-alignment hardware
  • Reduced normalisation hardware
• Error checking: execute with additional datasets and check error rates
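The block-selection steps can be illustrated with a small sketch. It assumes a profiler has already produced per-block operation counts and observed values; `select_hot_blocks`, `exponent_span`, the block names and the 90% coverage default are hypothetical, and using exponent spread as the locality measure is an assumption rather than the criterion used in this work.

```python
import math


def select_hot_blocks(op_counts: dict[str, int], coverage: float = 0.9) -> list[str]:
    """Fewest code blocks, most-executed first, that together account for
    at least `coverage` of all profiled floating-point operations."""
    total = sum(op_counts.values())
    chosen: list[str] = []
    covered = 0
    for name, count in sorted(op_counts.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= coverage * total:
            break
        chosen.append(name)
        covered += count
    return chosen


def exponent_span(values) -> int:
    """Spread of binary exponents over the non-zero finite values seen in a
    block; a small span is taken here as a sign of good value locality."""
    exps = [math.frexp(v)[1] for v in values if v != 0.0 and math.isfinite(v)]
    return max(exps) - min(exps) if exps else 0
```

For hypothetical counts {"loop_a": 700, "loop_b": 250, "init": 50}, the default coverage selects loop_a and loop_b; values 1.0, 3.0 and 0.3 have an exponent span of 3.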
FloatWatch Profiler
• Valgrind-based value profiler
• Reports several metrics of interest here:
  • Floating-point value ranges
  • Ratio between floating-point operands
• Each has uses for optimisation!
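Both metrics can be approximated offline from recorded operand pairs. This is not FloatWatch's implementation or output format: `profile_operations` is a hypothetical helper that buckets values by binary exponent (via `math.frexp`) and operand pairs by exponent difference, which is within one of log2|a/b| and equals the alignment shift an adder would need.

```python
import math
from collections import Counter


def exponent_of(x: float) -> int:
    """Binary exponent e with 2**e <= |x| < 2**(e + 1); x must be non-zero and finite."""
    return math.frexp(x)[1] - 1


def profile_operations(ops):
    """ops: iterable of (a, b) operand pairs seen at one operation site.
    Returns (histogram of operand exponents, histogram of exponent differences)."""
    value_hist: Counter = Counter()
    ratio_hist: Counter = Counter()
    for a, b in ops:
        usable = [v for v in (a, b) if v != 0.0 and math.isfinite(v)]
        for v in usable:
            value_hist[exponent_of(v)] += 1
        if len(usable) == 2:
            # Exponent difference: within one of log2|a/b|, and the adder's alignment shift.
            ratio_hist[exponent_of(a) - exponent_of(b)] += 1
    return value_hist, ratio_hist
```

A narrow value histogram suggests a reduced exponent range, and a ratio histogram concentrated near zero suggests that small alignment shifters will rarely be exceeded.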
VFLOAT Library
• VHDL variable-precision floating-point library
• Initially developed by Belanovic at Northeastern University; development continued under the supervision of Leeser
• Allows basic customisation of precision and exponent bit width
• Further customisations added for our optimisations:
  • Operand alignment
  • Normalisation
• Performance is lower than that of vendor-specific libraries
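Choosing the precision and exponent width fixes the range a format can represent. The slides do not give VFLOAT's encoding details, so this sketch assumes an IEEE-754-style layout (bias 2^(e-1) - 1, biased exponent 0 reserved for zero and subnormals, all-ones reserved for infinity and NaN); `format_range` is a hypothetical helper.

```python
def format_range(exp_bits: int, frac_bits: int) -> tuple[float, float, float]:
    """(largest finite value, smallest normal value, machine epsilon) of an
    IEEE-style format, assuming bias 2**(exp_bits - 1) - 1 and reserved
    all-zeros / all-ones exponent codes."""
    bias = (1 << (exp_bits - 1)) - 1
    emax = (1 << exp_bits) - 2 - bias  # largest non-reserved biased code, unbiased
    emin = 1 - bias                    # smallest non-reserved biased code, unbiased
    largest = (2 - 2.0 ** -frac_bits) * 2.0 ** emax
    smallest_normal = 2.0 ** emin
    epsilon = 2.0 ** -frac_bits        # spacing of representable values in [1, 2)
    return largest, smallest_normal, epsilon
```

For example, format_range(5, 10) gives a largest finite value of 65504, the familiar half-precision limit, and format_range(11, 52) reproduces the double-precision limits.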
Data-path Generator
• Takes a user-selected data-path and generates a VHDL implementation
• Assembles a modified version of the RPL library, customised to allow removal of various items
• Builds the hardware/software integration layer:
  • C library for software
  • VHDL for hardware
• Does not modify the software source automatically (yet)
Proof-of-Concept Testing
• Original application modified to call the C library (usually from FORTRAN)
• Data sent to hardware, calculated, and returned
• Software waits for the response
• No data aggregation or hardware-side error detection occurs
• Software layer performs the same calculation for verification
• Overall error rate reported
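The verification step amounts to recomputing each operation in software and counting disagreements. A minimal sketch, assuming results are compared bit for bit; `error_rate` and `same_bits` are hypothetical names, and `hw_add` stands for whatever call returns the accelerator's result.

```python
import struct
from typing import Callable, Iterable


def same_bits(x: float, y: float) -> bool:
    """Bit-for-bit equality, so NaNs and signed zeros are compared exactly."""
    return struct.pack("<d", x) == struct.pack("<d", y)


def error_rate(pairs: Iterable[tuple[float, float]],
               hw_add: Callable[[float, float], float]) -> float:
    """Fraction of operations where the accelerator result differs from the
    software recomputation of the same operation."""
    total = mismatches = 0
    for a, b in pairs:
        total += 1
        if not same_bits(hw_add(a, b), a + b):
            mismatches += 1
    return mismatches / total if total else 0.0
```

Passing a stand-in unit that silently drops very small addends, for example, makes every such operation count as an error.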
‘ydl_pij’
• ‘ydl_pij’ is an iterative solver for quantum mechanics, using the “Molecular Mechanics – Valence Bond” method
• Datasets of various sizes are available, allowing a variety of test cases to be used
• Initial profiling and testing use separate datasets
‘ydl_pij’: Profiling (Hot Code Section) [figure not reproduced; annotation: “Narrow value ranges”]
‘ydl_pij’: Identification
• FloatWatch identifies the regions of code executing the most operations
• In this case, these regions show narrow value ranges
• Create optimised datapaths for testing:
  • Maximum operand-alignment shift reduced to 2^n, where n is in the range [1, 6]
  • Normalisation hardware modified similarly
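Profiled operand pairs can be used to estimate how often each candidate alignment limit would force a re-execution. A sketch, assuming the limit is a maximum shift of 2^n for n in 1..6 and that zero or non-finite operands are handled separately; `predicted_reexecution` and `alignment_shift` are hypothetical, and the normalisation limit is not modelled.

```python
import math


def alignment_shift(a: float, b: float) -> int:
    """Exponent difference of two operands: the shift an adder applies to
    line up the smaller one."""
    return abs(math.frexp(a)[1] - math.frexp(b)[1])


def predicted_reexecution(pairs, ns=range(1, 7)) -> dict[int, float]:
    """For each n, the fraction of operand pairs whose alignment shift exceeds
    2**n, i.e. operations a unit limited to that shift would hand back."""
    shifts = [alignment_shift(a, b) for a, b in pairs
              if a != 0.0 and b != 0.0 and math.isfinite(a) and math.isfinite(b)]
    if not shifts:
        return {n: 0.0 for n in ns}
    return {n: sum(s > 2 ** n for s in shifts) / len(shifts) for n in ns}
```

For operand pairs (1.0, 1.5), (1.0, 0.1), (1.0, 2**-10) and (1.0, 2**-40) the shifts are 0, 4, 10 and 40, so n = 2 predicts a 50% re-execution rate and n = 6 predicts none.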
‘ydl_pij’: Error Rate [figure not reproduced; annotation: “Not profiled”]
‘ydl_pij’: Error Rate and Size
• 20% size reduction with a negligible re-execution rate (< 0.5%)
• 27% size reduction with a 3% re-execution rate
• The size reduction permits a ~40% increase in parallelism, since more units fit in the same space
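A quick check of the parallelism figure, assuming the number of adder/subtractor units that fit on the device scales inversely with the area of one unit (routing and control overheads ignored):

```latex
\frac{1}{1 - 0.20} = 1.25, \qquad \frac{1}{1 - 0.27} \approx 1.37
```

That is roughly 25% more units for the 20% reduction and 37% more for the 27% reduction, the latter close to the quoted ~40%.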
‘ydl_pij’: Area saving for one floating-point adder/subtractor [figure not reproduced; panels: Pre-optimisation, Post-optimisation]
Coming Soon
• Per-operation optimisations (currently only at data-path level)
• Optimisation of operand-swap hardware
• Per-operation exponent customisation (size, bias)
• Performance evaluation using state-of-the-art FPGA accelerator hardware
• Implementation of error detection and re-execution
• Potential for even greater size reductions
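Per-operation exponent customisation could, for instance, size the exponent field to the profiled exponent range and pick a bias that places that range at the bottom of the usable codes. The following is a hypothetical sketch (`custom_exponent` is not from the work described), assuming IEEE-754-style reserved codes for zero and for infinity/NaN.

```python
def custom_exponent(lo: int, hi: int) -> tuple[int, int]:
    """Smallest exponent width and a bias covering unbiased exponents lo..hi,
    keeping biased codes 0 and all-ones reserved as in IEEE-754."""
    if lo > hi:
        raise ValueError("empty exponent range")
    needed = hi - lo + 1               # distinct exponents to represent
    width = 1
    while (1 << width) - 2 < needed:   # usable codes exclude all-zeros and all-ones
        width += 1
    bias = 1 - lo                      # maps lo to biased code 1
    return width, bias
```

custom_exponent(-1022, 1023) returns (11, 1023), the double-precision parameters, while a profiled range of -3..4 needs only a 4-bit exponent with bias 4.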