
Profile-directed speculative optimization of reconfigurable floating point data paths

Workshop on Reconfigurable Computing at 2008. Ashley Brown, 27th Jan 2008.


Presentation Transcript


  1. Profile-directed speculative optimization of reconfigurable floating point data paths
     Workshop on Reconfigurable Computing at 2008
     Ashley Brown, 27th Jan 2008

  2. Introduction
     • Computational science requires reproducible and accurate results
     • IEEE-754 is a compromise
       • Broad range of values
       • Many special cases
     • Idea: use profiling to reduce range and remove special cases
     • Generate floating-point data-paths for FPGAs which are smaller and faster
     • BUT KEEP RESULTS CONSISTENT WITH IEEE-754

  3. Advantages of Smaller Floating Point
     • Embedded Systems
       • Do the same work for a lower cost
       • Implement IEEE-754 compliant floating point where it may not have been possible before
     • High performance
       • Do more work with the same hardware
       • Increase in parallel execution on FPGAs
       • No need to sacrifice IEEE-754 compliance

  4. Four Pictures to Explain: #1

  5. Four Pictures to Explain: #2

  6. Four Pictures to Explain: #3

  7. Four Pictures to Explain: #4 (figure panels: Pre-optimisation, Post-optimisation)

  8. Optimisation Technique
     • Remove features from the floating-point unit:
       • Operand alignment
       • Normalisation
       • Operand swap
     • If a removed feature turns out to be required, detect this and fall back to an alternative solution:
       • Software-based on embedded/host processor
       • Hardware-based full implementation for larger designs
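
The detect-and-fall-back idea on this slide can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware described in the talk: the function name `speculative_add`, the parameters `max_align` and `max_norm`, and the rule that the first operand must have the larger exponent (standing in for removed operand-swap hardware) are all hypothetical. In the model both paths compute `a + b`; what it shows is when the reduced unit would have to give up.

```python
import math


def binary_exponent(x):
    # math.frexp(x) returns (m, e) with x == m * 2**e and 0.5 <= |m| < 1.
    return math.frexp(x)[1]


def speculative_add(a, b, max_align=4, max_norm=4):
    """Model a reduced adder. Returns (result, used_fallback).

    Hypothetical limits: max_align bounds the alignment shift and
    max_norm bounds the normalisation shift the reduced unit supports.
    """
    full_result = a + b  # stands in for the full IEEE-754 implementation

    # Special cases (zero, infinity, NaN) are left to the full path.
    if a == 0.0 or b == 0.0 or not (math.isfinite(a) and math.isfinite(b)):
        return full_result, True

    ea, eb = binary_exponent(a), binary_exponent(b)

    # No operand-swap hardware: assume the first operand must be the larger.
    if eb > ea:
        return full_result, True

    # Reduced alignment hardware: exponent gap must be small.
    if ea - eb > max_align:
        return full_result, True

    if full_result == 0.0 or not math.isfinite(full_result):
        return full_result, True

    # Reduced normalisation hardware: cancellation must not shift too far.
    if ea - binary_exponent(full_result) > max_norm:
        return full_result, True

    return full_result, False
```

For example, `speculative_add(1.0, 1.5)` stays on the reduced path, while `speculative_add(1.0, 1e-10)` (large exponent gap) and `speculative_add(1.0, -0.999)` (heavy cancellation) both report a fallback.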

  9. Optimisation Options

  10. The stages of optimisation
      • Profile target application with training datasets
        • Source usually FORTRAN, C
        • Identify frequently-executed blocks
        • Check for good value-locality
      • Generate reduced-size floating point datapath
        • Reduced operand alignment hardware
        • Reduced normalisation hardware
      • Error checking: execute with additional datasets, check error rates

  11. FloatWatch Profiler
      • Valgrind-based value profiler
      • Can return some metrics of interest here:
        • Floating point value ranges
        • Ratio of floating point operands
      • Each has uses for optimisation!
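
To make the two metrics concrete, here is a minimal sketch of how value ranges and operand ratios could be summarised from a recorded list of operand pairs. It does not use or reproduce FloatWatch's interface or output format; `profile_operands`, `smallest_alignment` and the `coverage` parameter are hypothetical names, and the operand ratio is approximated by the gap between binary exponents.

```python
import math
from collections import Counter


def binary_exponent(x):
    # math.frexp(x) returns (m, e) with x == m * 2**e and 0.5 <= |m| < 1.
    return math.frexp(x)[1]


def profile_operands(pairs):
    """Return (exponent range, histogram of exponent gaps) for operand pairs."""
    exponents = []
    gaps = Counter()
    for a, b in pairs:
        if a == 0.0 or b == 0.0:
            continue  # zeros are a special case; this sketch skips them
        ea, eb = binary_exponent(a), binary_exponent(b)
        exponents += [ea, eb]
        gaps[abs(ea - eb)] += 1
    value_range = (min(exponents), max(exponents)) if exponents else None
    return value_range, gaps


def smallest_alignment(gaps, coverage=0.99):
    """Smallest exponent gap that covers the requested share of operations."""
    total = sum(gaps.values())
    covered = 0
    for gap in sorted(gaps):
        covered += gaps[gap]
        if covered >= coverage * total:
            return gap
    return None  # no operations were profiled


training = [(1.0, 1.5), (2.0, 3.0), (1.0, 4.0), (1.0, 1024.0)]
value_range, gaps = profile_operands(training)
```

A narrow `value_range` suggests the exponent range could be reduced, and a histogram concentrated at small gaps suggests that a small alignment shifter would serve most operations.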

  12. VFLOAT Library
      • VHDL variable-precision floating-point library
      • Initially developed by Belanovic at Northeastern; development continued under the supervision of Leeser
      • Allows basic customisation of precision and exponent bit widths
      • Further customisations added for our optimisations:
        • Operand alignment
        • Normalisation
      • Performance is lower than vendor-specific libraries

  13. Data-path Generator
      • Takes user-selected data-path and generates VHDL implementation
      • Assembles a modified version of the RPL library, customised to allow removal of various items
      • Builds hardware/software integration layer
        • C library for software
        • VHDL for hardware
      • Does not modify the software source automatically (yet)

  14. Proof-of-Concept Testing
      • Original application modified to call C library (usually from FORTRAN)
      • Data sent to hardware, calculated, and returned
      • Software waits for response
      • No data-aggregation or hardware-side error detection occurs
      • Software layer performs same calculation for verification
      • Overall error rate reported
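
The verification loop described on this slide can be sketched as follows. The real setup calls a C library that talks to the FPGA; here `hardware_add` is a hypothetical callable standing in for that round trip, and `truncating_adder` is an invented stand-in that simply drops the smaller operand when the exponent gap exceeds its limit. Results are compared bit for bit, on the assumption that a result counts as an error whenever it differs from the software result at all.

```python
import math
import struct


def same_bits(x, y):
    # Bitwise comparison distinguishes +0.0 from -0.0 and lets NaN equal NaN.
    return struct.pack("<d", x) == struct.pack("<d", y)


def error_rate(operands, hardware_add):
    """Fraction of additions where the hardware result differs from software."""
    if not operands:
        return 0.0
    mismatches = 0
    for a, b in operands:
        expected = a + b            # software layer repeats the calculation
        actual = hardware_add(a, b)  # hypothetical hardware round trip
        if not same_bits(actual, expected):
            mismatches += 1
    return mismatches / len(operands)


def truncating_adder(a, b, max_align=4):
    # Invented stand-in for a reduced data path without error detection.
    gap = abs(math.frexp(a)[1] - math.frexp(b)[1])
    return a if gap > max_align else a + b


dataset = [(1.0, 2.0), (1.0, 1e-3), (3.0, 0.5), (8.0, 0.25)]
rate = error_rate(dataset, truncating_adder)
```

With this toy dataset two of the four additions exceed the alignment limit, so `rate` is 0.5.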

  15. ‘ydl_pij’
      • ‘ydl_pij’ is an iterative solver for quantum mechanics, using the “Molecular Mechanics – Valence Bond” method
      • Datasets of various sizes are available, allowing a variety of test cases to be used
      • Initial profiling and testing use separate datasets

  16. ‘ydl_pij’: Profiling (Hot Code Section) (figure annotation: Narrow value ranges)

  17. ‘ydl_pij’: Identification
      • FloatWatch identifies the regions of code executing the most operations
        • In this case, these show narrow value ranges
      • Create optimised datapaths for testing
        • Maximum operand alignment reduced to 2^n, where n is in the range [1, 6]
        • Normalisation hardware modified similarly
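
One way to picture the sweep over candidate data paths is to estimate, for each n from 1 to 6, how often a test dataset would exceed the alignment limit. This sketch assumes the limit is 2**n bit positions (the reading of the slide taken here) and that an operation falls outside the reduced unit exactly when its binary exponent gap exceeds that limit; `fallback_rate` and `sweep` are hypothetical names, and pairs containing zero are ignored.

```python
import math


def fallback_rate(pairs, max_align):
    """Share of non-zero operand pairs whose exponent gap exceeds max_align."""
    counted = 0
    exceeded = 0
    for a, b in pairs:
        if a == 0.0 or b == 0.0:
            continue  # zero operands need no alignment in this sketch
        counted += 1
        gap = abs(math.frexp(a)[1] - math.frexp(b)[1])
        if gap > max_align:
            exceeded += 1
    return exceeded / counted if counted else 0.0


def sweep(pairs, exponents=range(1, 7)):
    """Map each n to the fallback rate for an alignment limit of 2**n."""
    return {n: fallback_rate(pairs, 2 ** n) for n in exponents}


test_pairs = [(1.0, 1.5), (1.0, 8.0), (1.0, 1024.0), (1.0, 2.0 ** 40)]
rates = sweep(test_pairs)
```

Running the sweep on a dataset separate from the training data mirrors the talk's practice of profiling and testing with different inputs.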

  18. ‘ydl_pij’ Error Rate (figure annotation: Not profiled)

  19. ‘ydl_pij’: Error Rate and Size
      • 20% size reduction with negligible re-execution rate (< 0.5%)
      • 27% size reduction with 3% re-execution rate
      • Size reduction permits ~40% increase in parallelism due to better space usage
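
As a rough check on how a size reduction relates to parallelism, the following sketch computes how many more units fit in a fixed area if each unit shrinks by a given fraction. It assumes area scales linearly with unit count and ignores routing and placement, so it is only a first-order estimate; the function name `extra_units` is hypothetical. The ~40% figure on the slide is attributed to better space usage, which this simple ratio does not model.

```python
def extra_units(size_reduction):
    """Fractional increase in units per fixed area for a given size reduction."""
    if not 0.0 <= size_reduction < 1.0:
        raise ValueError("size_reduction must be in [0, 1)")
    return 1.0 / (1.0 - size_reduction) - 1.0


gain_20 = extra_units(0.20)  # first-order gain for the 20% reduction
gain_27 = extra_units(0.27)  # first-order gain for the 27% reduction
```

Under these assumptions the 20% reduction gives about 25% more units and the 27% reduction about 37% more.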

  20. ‘ydl_pij’: Area saving for one F.P. adder/subtractor (figure panels: Pre-optimisation, Post-optimisation)

  21. Coming Soon
      • Per-operation optimisations
        • Currently only at data-path level
      • Optimisation of operand-swap hardware
      • Per-operation exponent customisation (size, bias)
      • Performance evaluation using state-of-the-art FPGA accelerator hardware
      • Implementation of error detection and re-execution
      • Potential for even greater size reductions
