Profiling floating point value ranges for reconfigurable implementation Workshop on Reconfigurable Computing at 2007 Ashley Brown, 28th Jan 2007 28th Jan 2007 | Ashley Brown
Floating Point on FPGAs • Two distinct sets of requirements • Embedded systems (often as/alongside DSPs) • High precision often not important (video/audio processing) • Fixed point implementations possible • Scientific computation • High precision extremely important • Reduction in precision or conversion to single prec. must be done with great care
Our Focus • Scientific applications • MORPHY: “automated topological analysis of a molecular electron density” • ‘ydl_pij’ (MMVB): Iterative solver for computational chemistry • SPECFP95 benchmarks • Only mildly interesting: they do not have multiple datasets • SPECFP2000 to follow
The Problem • Double-precision floating point on FPGAs uses a lot of area • Density is improving, but we still want to squeeze more in! • Re-using hardware can reduce concurrency • Scientific applications: typically 64-bit floating point • Often full precision is (believed to be) required • Is this really the case? • We have more options than single or double
Current Solutions for F.P. minimisation • Finding ‘minimal precision’: • Tools such as BitSize • User selects precision for some operands; the tool calculates the rest • Test vectors used to gauge errors • Reducing hardware area: • Replacing floating point with fixed point, transparently to the user (Cheung et al.) • The solution above would make the scientists cry. • Any butchery of the floating point hardware must be justified and checked
FloatWatch • Valgrind-based value profiler • Can return a number of metrics: • Floating point value ranges • Variation between 32-bit and 64-bit F.P. executions • Difference in magnitude between F.P. operations • Each metric has uses for optimisation!
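As a rough illustration of the first two metrics, the Python sketch below buckets values by binary exponent and measures the worst relative error introduced by storing them in single precision. It works on a plain list of values rather than on instrumented x86 code, and the function names are hypothetical, not part of FloatWatch.

```python
import math
import struct
from collections import Counter

def exponent_bucket(x: float):
    """Binary exponent e with x = m * 2**e and 0.5 <= |m| < 1; None marks zero."""
    if x == 0.0:
        return None
    return math.frexp(x)[1]

def profile(values):
    """Value-range histogram: how many values fall into each binary exponent."""
    return Counter(exponent_bucket(v) for v in values)

def to_single(x: float) -> float:
    """Round an IEEE double to IEEE single precision and widen it back."""
    return struct.unpack("f", struct.pack("f", x))[0]

def single_vs_double_error(values):
    """Largest relative error caused by holding each value in 32-bit form."""
    worst = 0.0
    for v in values:
        if v != 0.0:
            worst = max(worst, abs(to_single(v) - v) / abs(v))
    return worst
```

`struct.pack("f", ...)` raises `OverflowError` for values beyond the single-precision range, which is itself a useful signal that 32-bit storage is unsafe.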
What does this tell us? • Alpha is constant (though that could have been found from the source) • Memory operands all fall within the same range • Result falls within the same range as memory operands • Intermediate values result in a shift in the range • Optimisation: we do not need double precision • A custom floating point format would suffice
FloatWatch • Operates on x86 binaries under Valgrind • x86 machine code converted to simplified SSA • FloatWatch inserts instrumentation code after floating point operations • SSA converted back to x86 and cached • Outputs a data file with selected metrics • Processing script produces HTML+JavaScript report
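The key idea, running each floating point operation and then recording its result against the operation's site, can be mimicked at source level. The sketch below is a hypothetical Python analogue, not Valgrind or FloatWatch code: `RangeProfiler`, `op` and the site labels are invented for illustration.

```python
import math
import operator
from collections import defaultdict

class RangeProfiler:
    """Per-site min/max binary exponent, standing in for instrumentation
    inserted after each floating point operation (hypothetical)."""

    def __init__(self):
        self.ranges = defaultdict(lambda: [None, None])  # site -> [min_exp, max_exp]
        self.zeros = defaultdict(int)                     # site -> zero-result count

    def record(self, site, value):
        if value == 0.0:
            self.zeros[site] += 1
            return
        e = math.frexp(value)[1]
        lo, hi = self.ranges[site]
        self.ranges[site] = [e if lo is None else min(lo, e),
                             e if hi is None else max(hi, e)]

    def op(self, site, fn, *args):
        """Run the operation, then record its result (instrumentation 'after' the op)."""
        result = fn(*args)
        self.record(site, result)
        return result

prof = RangeProfiler()
acc = 0.0
for x in [0.5, 1.0, 2.0]:
    y = prof.op("mul@line10", operator.mul, x, 3.0)
    acc = prof.op("add@line11", operator.add, acc, y)
```

Keying by site rather than by value is what later allows results to be viewed per instruction or per source line.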
Report • Dynamic HTML interface • Copy HTML file from computing cluster to desktop, no installation required • Select/deselect source lines, SSA “instructions” • Dynamic in-page graph • Table for exporting to gnuplot, Excel etc. • View value ranges at instruction, source line, function, file and application levels.
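Viewing ranges at several levels amounts to merging instruction-level histograms under coarser keys. The sketch below assumes a hypothetical key layout of (file, function, source line, instruction index) and integer exponent buckets; it is not the report script's actual data format.

```python
from collections import Counter

# Hypothetical layout: (file, function, source_line, instruction_index)
#   -> Counter(binary exponent -> count)
LEVELS = {"application": 0, "file": 1, "function": 2, "line": 3, "instruction": 4}

def roll_up(histograms, level):
    """Merge histograms by truncating each key to the chosen level."""
    depth = LEVELS[level]
    merged = {}
    for key, hist in histograms.items():
        merged.setdefault(key[:depth], Counter()).update(hist)
    return merged

def to_table(hist):
    """'exponent<TAB>count' rows sorted by exponent, for gnuplot or a spreadsheet."""
    return "\n".join(f"{e}\t{n}" for e, n in sorted(hist.items()))
```

At the application level every key truncates to the empty tuple, so the whole program collapses into one histogram.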
Optimisation Opportunities • Reduce floating point unit • Reduced precision • Restricted normalisation • Use an alternative representation • Non-standard floating point (e.g. 48-bit) • Fixed point • Dual fixed-point • Minimisation of redundancy • Remove denormal handling unless required • Remove or predict zero-value calculations
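Deciding whether denormal handling can be dropped, or whether zero-valued calculations are common enough to predict, starts from a classification of observed results. The sketch below is illustrative: it classifies results only, whereas a real decision would also need to check operands.

```python
import math
import sys
from collections import Counter

def classify(x: float) -> str:
    """Classify a double as 'zero', 'subnormal', 'normal' or 'special' (inf/NaN)."""
    if x == 0.0:
        return "zero"
    if math.isinf(x) or math.isnan(x):
        return "special"
    if abs(x) < sys.float_info.min:   # below the smallest normal double
        return "subnormal"
    return "normal"

def redundancy_report(results):
    """Summarise whether denormal support was exercised and how often results were zero."""
    counts = Counter(classify(r) for r in results)
    return {
        "needs_denormal_handling": counts["subnormal"] > 0,
        "zero_fraction": counts["zero"] / len(results) if results else 0.0,
    }
```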
Reduce Hardware • Example using MORPHY • F.P. values are interesting • Most confined to a narrow range • Different data sets do not vary the range • Full range of double precision floating point not required • Reduce Exponent
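How far the exponent can shrink can be estimated from the observed range. The sketch below assumes IEEE-style conventions (bias 2^(w-1) - 1, all-zeros and all-ones exponent fields reserved) and ignores subnormals; the helper names are hypothetical.

```python
import math

def exponent_range(values):
    """Min/max IEEE unbiased exponent E (x = 1.f * 2**E) over non-zero values."""
    exps = [math.frexp(v)[1] - 1 for v in values if v != 0.0]
    return min(exps), max(exps)   # raises ValueError if every value is zero

def exponent_bits_needed(e_min: int, e_max: int) -> int:
    """Smallest exponent width w whose normal range [1 - bias, bias],
    with bias = 2**(w - 1) - 1, covers e_min..e_max."""
    w = 2
    while True:
        bias = 2 ** (w - 1) - 1
        if 1 - bias <= e_min and e_max <= bias:
            return w
        w += 1
```

Sanity checks: the full double range (-1022..1023) needs 11 bits and the single range (-126..127) needs 8, matching the IEEE formats; values spanning 0.001 to 1000 need only 5.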
Reduce Hardware – Alignment/Normalisation • Most expensive step: shifting for add/subtract • Operand alignment • Normalisation • Set limits on alignment to reduce hardware size • Trap to software to perform other alignments • Provisional results: only shift-by-4 required for some applications
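For an add or subtract, the alignment shift is the difference between the operands' exponents, so profiling operand pairs shows how often a limited shifter would suffice and how often the hardware would trap to software. A minimal sketch, covering alignment only (not post-subtraction normalisation); the function names are hypothetical.

```python
import math

def alignment_shift(a: float, b: float):
    """Right shift needed to align the smaller operand of a + b.
    Returns None when an operand is zero (no alignment needed)."""
    if a == 0.0 or b == 0.0:
        return None
    return abs(math.frexp(a)[1] - math.frexp(b)[1])

def fraction_within(pairs, limit):
    """Fraction of additions a shifter limited to `limit` bits could handle
    without trapping to software."""
    shifts = [s for s in (alignment_shift(a, b) for a, b in pairs) if s is not None]
    if not shifts:
        return 1.0
    return sum(s <= limit for s in shifts) / len(shifts)
```

For example, 1.0 + 16.0 needs a 4-bit shift and 1.0 + 2**-10 needs 10, so a shift-by-4 unit would handle the first and trap on the second.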
Alternative Representations #1: Custom Floating Point • No need to use 64- or 32-bit • Use a compromise instead: maybe 48-bit is enough? • IEEE Double: 1 sign | 11 exponent | 52 mantissa (64 bits) • Custom: 1 sign | 9 exponent | 38 mantissa (48 bits) • IEEE Single: 1 sign | 8 exponent | 23 mantissa (32 bits) • Maybe we can drop the sign bit?
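One way to check such a 48-bit layout in software is to round doubles into it and back. The sketch below assumes an IEEE-style bias of 255 for the 9-bit exponent and round-to-nearest-even on the mantissa. Both are assumptions, not details from the talk, and zero, subnormals, infinities and NaN are rejected rather than encoded.

```python
import struct

EXP_BITS, MAN_BITS = 9, 38            # 1 + 9 + 38 = 48 bits
BIAS = 2 ** (EXP_BITS - 1) - 1        # 255: IEEE-style bias (assumption)

def encode48(x: float) -> int:
    """Round a normal double into the 48-bit sign|exponent|mantissa layout."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63
    field = (bits >> 52) & 0x7FF
    if field in (0, 0x7FF):
        raise ValueError("zero, subnormals, inf and NaN are not handled in this sketch")
    exp = field - 1023                          # unbiased exponent
    man = bits & ((1 << 52) - 1)                # 52-bit fraction
    drop = 52 - MAN_BITS                        # 14 low fraction bits are discarded
    kept, rest = man >> drop, man & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rest > half or (rest == half and kept & 1):   # round to nearest, ties to even
        kept += 1
        if kept == 1 << MAN_BITS:               # rounding carried out of the fraction
            kept, exp = 0, exp + 1
    if not (1 - BIAS <= exp <= BIAS):
        raise OverflowError("exponent outside the 48-bit format's normal range")
    return (sign << 47) | ((exp + BIAS) << MAN_BITS) | kept

def decode48(word: int) -> float:
    """Expand a 48-bit word back into an (exactly representable) double."""
    sign = -1.0 if word >> 47 else 1.0
    exp = ((word >> MAN_BITS) & ((1 << EXP_BITS) - 1)) - BIAS
    man = word & ((1 << MAN_BITS) - 1)
    return sign * (1 + man / 2 ** MAN_BITS) * 2.0 ** exp
```

With 38 fraction bits the round-trip relative error is at most 2**-39, against 2**-53 for double and 2**-24 for single.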
Alternative Representations #2: Fixed Point • For very narrow ranges, fixed point may be an option • Must be treated with extreme care • Dual fixed-point format provides another possibility • Two different formats: different fixed point positions • 1 bit reserved to switch between formats
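A dual fixed-point word can be sketched as one selector bit plus a two's complement value whose binary point position depends on that bit. The 16-bit width and the 12 and 4 fractional bits below are arbitrary illustrative choices, not parameters from the talk.

```python
N = 16            # total word width (assumed)
F0, F1 = 12, 4    # fractional bits: format 0 = precise/narrow, format 1 = wide (assumed)
VAL_BITS = N - 1  # one bit reserved as the format selector

def _fits(q: int) -> bool:
    """True if q fits in a VAL_BITS-wide two's complement field."""
    return -(1 << (VAL_BITS - 1)) <= q < (1 << (VAL_BITS - 1))

def encode_dfx(x: float) -> int:
    """Use the high-precision format if the value fits, else the wide-range one."""
    for sel, frac in ((0, F0), (1, F1)):
        q = round(x * (1 << frac))
        if _fits(q):
            return (sel << VAL_BITS) | (q & ((1 << VAL_BITS) - 1))
    raise OverflowError("value outside dual fixed-point range")

def decode_dfx(word: int) -> float:
    sel = word >> VAL_BITS
    q = word & ((1 << VAL_BITS) - 1)
    if q >= 1 << (VAL_BITS - 1):       # sign-extend the two's complement value
        q -= 1 << VAL_BITS
    return q / (1 << (F1 if sel else F0))
```

Here format 0 covers roughly ±4 in steps of 2**-12 and format 1 covers roughly ±1024 in steps of 2**-4, so the profiled range determines whether such a split is acceptable.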
“Pipeline Prediction” • Similar concept to branch prediction • Build a selection of pipelines with different performance characteristics • Slow but generic version • Fast version with limited range, reduced operand alignment • Compromise in between • Predict which version is best to use (how?)
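The slide leaves the prediction mechanism open. One possible answer, borrowed directly from branch prediction, is a 2-bit saturating counter per operation site. The sketch below is purely hypothetical: it chooses between just a fast and a generic pipeline and learns from whether the fast pipeline's range limits were respected.

```python
class PipelinePredictor:
    """Per-site 2-bit saturating counters, as in a bimodal branch predictor.
    A counter value >= 2 predicts the fast (limited-range) pipeline."""

    def __init__(self):
        self.counters = {}

    def predict_fast(self, site) -> bool:
        return self.counters.get(site, 2) >= 2   # start weakly 'fast' (assumption)

    def update(self, site, fast_was_ok: bool) -> None:
        c = self.counters.get(site, 2)
        self.counters[site] = min(c + 1, 3) if fast_was_ok else max(c - 1, 0)

def count_mispredictions(predictor, site, outcomes):
    """outcomes[i] is True when operation i stayed within the fast pipeline's limits."""
    misses = 0
    for ok in outcomes:
        if predictor.predict_fast(site) != ok:
            misses += 1
        predictor.update(site, ok)
    return misses
```

The two kinds of misprediction cost differently: choosing the slow pipeline needlessly only loses throughput, while choosing the fast one for an out-of-range operation would require a replay on the generic pipeline.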
True Reconfiguration – Temporal Profiling • Value ranges can vary for different application phases • Potential to reconfigure hardware as phases change • Test applications have not shown this behaviour so far • Small kernels only • Full applications would be expected to show this behaviour
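Temporal profiling could be approximated by computing exponent ranges over fixed-size windows of the value stream and flagging windows whose range no longer overlaps the previous one. The windowing and the "disjoint range means new phase" rule are illustrative assumptions, not FloatWatch features.

```python
import math

def windowed_ranges(values, window):
    """(min, max) binary exponent of non-zero values in consecutive windows."""
    out = []
    for i in range(0, len(values), window):
        exps = [math.frexp(v)[1] for v in values[i:i + window] if v != 0.0]
        out.append((min(exps), max(exps)) if exps else None)
    return out

def phase_changes(ranges):
    """Indices of windows whose range is disjoint from the previous window's."""
    changes = []
    for i in range(1, len(ranges)):
        a, b = ranges[i - 1], ranges[i]
        if a and b and (b[0] > a[1] or b[1] < a[0]):
            changes.append(i)
    return changes
```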
Profiling Results – SPECFP95 ‘mgrid’ • [Figure: value-range profile, annotated “Operations producing zero” and “Two ranges: similar shapes”]
Range Close-up
Profiling Results – SPECFP95 ‘swim’ • [Figure: value-range profile, annotated “Sawtooth caused by multiplication”]
‘swim’ Close-up
Profiling Results – MMVB • As with MORPHY, ranges are similar between datasets
Problems with this approach • No guarantees that values do not occur outside identified ranges • Not all applications will demonstrate behaviour similar to MORPHY • Value ranges could vary wildly with different datasets • Valgrind is slow
Future Work • State-based profiling: • profile functions based on call-stack • allows context-dependent configurations • Active simulation • Test new representations to check for rounding errors • Use results in practice • FPGA implementations for real applications • Modelling of large-scale deployments
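State-based profiling keys each recorded range by the current call stack rather than by the operation alone, so the same function can show different ranges in different contexts. The sketch below tracks an explicit, hypothetical stack of labels in Python; a Valgrind tool would instead derive the stack from the running binary.

```python
import math
from collections import defaultdict
from contextlib import contextmanager

class StackProfiler:
    """Value ranges keyed by the call stack active when each value was produced."""

    def __init__(self):
        self.stack = []
        self.ranges = defaultdict(lambda: [math.inf, -math.inf])  # stack -> [min_exp, max_exp]

    @contextmanager
    def frame(self, name):
        """Push a call-stack label for the duration of a block."""
        self.stack.append(name)
        try:
            yield
        finally:
            self.stack.pop()

    def record(self, value):
        if value == 0.0:
            return
        e = math.frexp(value)[1]
        r = self.ranges[tuple(self.stack)]
        r[0], r[1] = min(r[0], e), max(r[1], e)

prof = StackProfiler()

def kernel(x):
    with prof.frame("kernel"):
        y = x * x
        prof.record(y)
        return y

with prof.frame("setup"):
    kernel(0.5)
with prof.frame("solve"):
    kernel(1000.0)
```

The same `kernel` ends up with two separate ranges, one per calling context, which is the information a context-dependent configuration would need.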
Any Questions? • [Image: “Jezebel”, 1916 Dennis ‘N’ Type Fire Engine, Royal College of Science Motor Club, Imperial College Union, SW7]