
Binary-Level Tools for Floating-Point Correctness Analysis

Binary-Level Tools for Floating-Point Correctness Analysis. Michael Lam, LLNL Summer Intern 2011. Bronis de Supinski, Mentor. Background. Floating-point represents real numbers as (± sgnf × 2^exp): sign bit, exponent, significand (“mantissa” or “fraction”).


Presentation Transcript


  1. Binary-Level Tools for Floating-Point Correctness Analysis • Michael Lam, LLNL Summer Intern 2011 • Bronis de Supinski, Mentor

  2. Background • Floating-point represents real numbers as (± sgnf × 2^exp) • Sign bit • Exponent • Significand (“mantissa” or “fraction”) • Floating-point numbers have finite precision • Single-precision: 24 bits (~7 decimal digits) • Double-precision: 53 bits (~16 decimal digits) • [Figure: bit layouts. IEEE single (32 bits): sign bit, exponent (8 bits), significand (23 bits). IEEE double (64 bits): sign bit, exponent (11 bits), significand (52 bits).]
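The field layout on this slide can be checked directly. The sketch below is an illustration and is not from the presentation; the helper names `decode_single` and `decode_double` are made up. It uses Python's `struct` module to split a value into its sign, biased exponent, and stored fraction bits. The stored fractions are 23 and 52 bits wide; the 24 and 53 bits of precision quoted above include the implicit leading 1 of normalized numbers.

```python
import struct

def decode_single(x):
    """Return (sign, biased exponent, fraction) of x as an IEEE single."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF          # 8 exponent bits, bias 127
    fraction = bits & ((1 << 23) - 1)       # 23 stored significand bits
    return sign, exponent, fraction

def decode_double(x):
    """Return (sign, biased exponent, fraction) of x as an IEEE double."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF         # 11 exponent bits, bias 1023
    fraction = bits & ((1 << 52) - 1)       # 52 stored significand bits
    return sign, exponent, fraction
```

For example, `decode_double(-2.0)` yields a sign of 1, a biased exponent of 1024 (that is, 2^1), and an all-zero fraction.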

  3. Example • π ≈ 3.141592… • [Figure: single-precision and double-precision bit patterns; images courtesy of BinaryConvert.com]

  4. Example • 1/10 = 0.1 (not exactly representable in binary) • [Figure: single-precision and double-precision bit patterns; images courtesy of BinaryConvert.com]
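The 1/10 example can be reproduced without the images. This is a small sketch, not part of the original slides, and `to_single` is a hypothetical helper: it rounds a Python float (a double) to single precision and back, then measures how far the single-precision 0.1 lies from the double-precision 0.1. Neither value equals 1/10 exactly; the comparison only shows how much coarser the single is.

```python
import struct

def to_single(x):
    """Round a Python float (IEEE double) to the nearest IEEE single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

single_tenth = to_single(0.1)

# Gap between the single-precision and double-precision approximations of 1/10.
error = abs(single_tenth - 0.1)

# Values with a short binary expansion survive the round trip unchanged.
half_is_exact = to_single(0.5) == 0.5
```

The gap is on the order of 1e-9, consistent with the roughly 7 decimal digits quoted for single precision.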

  5. Motivation • Finite precision causes round-off error • Compromises ill-conditioned calculations • Hard to detect and diagnose • Increasingly important as HPC scales • Need to balance speed (singles) and accuracy (doubles) • Double-precision may still fail on long-running computations

  6. Previous Solutions • Analytical (Wilkinson, et al.) • Requires numerical analysis expertise • Conservative static error bounds are largely unhelpful • Ad-hoc • Run experiments at different precisions • Increase precision where necessary • Tedious and time-consuming

  7. Our Approach • Run Dyninst-based mutator • Find floating-point instructions • Insert new code or a call to shared library • Run instrumented program • Analysis augments/replaces original program • Store results in a log file • View output with GUI

  8. Advantages • Automated (vs. manual) • Minimize developer effort • Ensure consistency and correctness • Binary-level (vs. source-level) • Include shared libraries without source code • Include compiler optimizations • Runtime (vs. compile time) • Dataset and communication sensitivity

  9. Previous Work • Cancellation detection • Logs numerical cancellation of binary digits • Alternate-precision analysis • Simulates re-compiling with different precision

  10. Summer Contributions • Cancellation detection • Improved support for multi-core analysis • Overflow detection • New tool for logging integer overflow • Possibilities for extension and incorporation into floating-point analysis • Alternate-precision analysis • New “in-place” analysis • Much-improved performance and robustness

  11. Cancellation • Loss of significant digits during subtraction operations • Cancellation is a symptom, not the root problem • Indicates that a loss of information has occurred that may cause problems later • Example: 1.613647 (7 digits) - 1.613635 (7 digits) = 0.000012 (2 digits; 5 digits cancelled) • Example: 1.613647 (7 digits) - 1.613647 (7 digits) = 0.000000 (0 digits; all digits cancelled) • Example: 1.6136473 - 1.6136467 = 0.0000006

  12. Cancellation Detector • Instrument every addition and subtraction • Simple exponent-based test for cancellation • Log the results to an output file
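The slides do not give the detector's exact test, so the following is only a guess at what a "simple exponent-based test" could look like. It compares the larger operand exponent with the result exponent and reports the difference as the number of cancelled binary digits. The function name `cancelled_bits` and the choice to report 53 bits for an exactly zero result are assumptions of this sketch.

```python
import math

DOUBLE_PRECISION_BITS = 53  # assumption: operands are IEEE doubles

def cancelled_bits(a, b):
    """Estimate the binary digits cancelled when computing a + b.

    Pass a negated b to model a subtraction.
    """
    if a == 0.0 or b == 0.0:
        return 0                      # nothing to cancel against
    result = a + b
    if result == 0.0:
        return DOUBLE_PRECISION_BITS  # every significant bit cancelled
    # math.frexp returns (mantissa, exponent) with 0.5 <= |mantissa| < 1.
    exp_in = max(math.frexp(a)[1], math.frexp(b)[1])
    exp_out = math.frexp(result)[1]
    return max(0, exp_in - exp_out)
```

Applied to the first example from the Cancellation slide, `cancelled_bits(1.613647, -1.613635)` reports 17 bits, which is about 5 decimal digits (17 × log10(2) ≈ 5.1).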

  13. Contributions • Better support for multi-core • Log to multiple files • Future work: exploring GUI aggregation schemes • Ran experiments on AMG2006

  14. Contributions • New proof-of-concept tool • Instruments all instructions that set OF (the overflow flag) • Log instruction pointer to output • Works on integer instructions • Introduces ~10x overhead • Future work • Pruning false positives • Overflow/underflow detection on floating-point instructions • NaN/Inf detection on floating-point instructions
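For readers unfamiliar with the overflow flag, here is a rough model of when an integer addition sets OF on a two's-complement machine. It is an illustration only; the actual tool instruments machine instructions and reads the hardware flag, and `add_sets_overflow` is a made-up name. The rule modeled here is that OF is set when both operands have the same sign and the truncated result has the opposite sign.

```python
def add_sets_overflow(a, b, bits=64):
    """Model the overflow flag for a signed two's-complement add of width `bits`."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    ua, ub = a & mask, b & mask       # operand bit patterns
    r = (ua + ub) & mask              # truncated result, as hardware keeps it
    # Operands agree in sign (~(ua ^ ub)) and the result disagrees (ua ^ r).
    return bool(~(ua ^ ub) & (ua ^ r) & sign)
```

The hardware sets this flag from the bit patterns alone, whether or not the program treats the operands as signed, which may be one source of the false positives listed under future work.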

  15. Alternate-precision Analysis • Previous approach • Replace floating-point values with a pointer • “Shadow” values allocated on heap • Disadvantages • Major change in program semantics (copying vs. aliasing) • Lots of pointer-related bugs • Required function calls and use of a garbage collector • Large performance impact (>200-300x) • Increased memory usage (>1.5x)

  16. Contributions • New shadow-value analysis scheme • Narrowed focus: doubles → singles • In-place downcast conversion (no heap allocations) • Flag in the high bits to indicate replacement • [Figure: a 64-bit double undergoes downcast conversion into a “replaced double”: the high 32 bits hold the flag 0x7FF4DEAD (labelled “Non-signalling NaN”) and the low 32 bits hold the single.]
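The replaced-double layout can be mimicked at the bit level. This sketch is not the tool's code (the tool emits machine code); it assumes, following the figure, that the flag 0x7FF4DEAD occupies the high 32 bits and the single occupies the low 32 bits. All function names are hypothetical.

```python
import math
import struct

FLAG = 0x7FF4DEAD  # high-word marker taken from the slide

def downcast_in_place(x):
    """Return the 64-bit pattern of a 'replaced double' holding x as a single.

    struct.pack raises OverflowError if x is too large for a single.
    """
    single_bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (FLAG << 32) | single_bits

def is_replaced(bits64):
    """Check the high 32 bits for the replacement flag."""
    return (bits64 >> 32) == FLAG

def extract_single(bits64):
    """Recover the single stored in the low 32 bits."""
    return struct.unpack("<f", struct.pack("<I", bits64 & 0xFFFFFFFF))[0]

def as_double(bits64):
    """Reinterpret the 64-bit pattern as an IEEE double."""
    return struct.unpack("<d", struct.pack("<Q", bits64))[0]
```

Because the high word has an all-ones exponent and a nonzero fraction, the whole 64-bit pattern reads as a NaN when interpreted as a double, so uninstrumented code that consumes a replaced value should produce a visibly wrong result rather than a plausible number.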

  17. Contributions • Simpler analysis • Instrument instructions w/ double-precision operands • Check and replace operands • Replace double-precision opcodes • Fix up flags if necessary • Streamlined instrumentation • Insert “binary blobs” of optimized machine code • Pre-generated by mini-assembler inside mutator • Prevents overhead of added function calls • No memory overhead

  18. Example • gvec[i,j] = gvec[i,j] * lvec[3] + gvar
      movsd 0x601e38(%rax, %rbx, 8) → %xmm0
      mulsd -0x78(%rsp) → %xmm0
      addsd -0x4f02(%rip) → %xmm0
      movsd %xmm0 → 0x601e38(%rax, %rbx, 8)

  19. Example • gvec[i,j] = gvec[i,j] * lvec[3] + gvar
      movsd 0x601e38(%rax, %rbx, 8) → %xmm0
      check/replace -0x78(%rsp) and %xmm0
      mulss -0x78(%rsp) → %xmm0
      check/replace -0x4f02(%rip) and %xmm0
      addss -0x20dd43(%rip) → %xmm0
      movsd %xmm0 → 0x601e38(%rax, %rbx, 8)

  20. Challenges • Currently handled • %rip- and %rsp-relative addressing • %rflags preservation • Math functions from libm • Bitwise operations (AND/OR/XOR/BTC) • Size and type conversions • Compiler optimization levels • Packed instructions • [Figure: a 128-bit XMM register holds four IEEE singles or two IEEE doubles; after downcast conversion, each double slot holds the flag 0x7FF4DEAD followed by an IEEE single.]

  21. Challenges • Future work • 80-bit “long double” precision • 16-bit IEEE half-precision • 128-bit IEEE quad-precision • Width-dependent random number generation • Non-gcc compilers • Arcane floating-point hacks • Sqrt: (1<<29) + (tmp >> 1) - (1<<22) • Fast InvSqrt: 0x5f3759df - (val >> 1)
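To see why such hacks are awkward for precision replacement, consider the Fast InvSqrt expression from the slide. The sketch below is the widely known form of the trick, written here for illustration; the single Newton-Raphson refinement step is part of the usual recipe and not something the slide shows. The magic constant only makes sense when `val` is the 32-bit pattern of an IEEE single, so the code depends on the exact width and layout of the value.

```python
import struct

def fast_inv_sqrt(x):
    """Approximate 1/sqrt(x) for positive x using the 32-bit integer trick."""
    # Reinterpret the single-precision bits of x as an unsigned integer.
    val = struct.unpack("<I", struct.pack("<f", x))[0]
    # The expression from the slide, kept within 32 bits.
    val = (0x5F3759DF - (val >> 1)) & 0xFFFFFFFF
    # Reinterpret the integer as a single again: a rough first guess.
    y = struct.unpack("<f", struct.pack("<I", val))[0]
    # One Newton-Raphson step to refine the guess.
    return y * (1.5 - 0.5 * x * y * y)
```

For instance, `fast_inv_sqrt(4.0)` lands within about 0.2% of 0.5. Applying the same shift and constant to a 64-bit double pattern would give a meaningless result, which is presumably why the slide lists these hacks as an open challenge.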

  22. Results • Runs correctly on Sequoia kernels and other examples: AMGmk 4x, CrystalMk 4x, IRSmk 7x, UMTmk 3x, LULESH 4x • “Real” code with manageable overhead • Future work: more optimization • Future work: run on full benchmarks

  23. Conclusion • Cancellation detection • Improved support for multi-core analysis • Overflow detection • New tool for logging integer overflow • Possibilities for extension and incorporation into floating-point analysis • Alternate-precision analysis • New “in-place” analysis • Much-improved performance and robustness

  24. Future Goals • Selective analysis • Data-centric (variables or matrices) • Control-centric (basic blocks or functions) • Analysis search space • Minimize precision • Maximize accuracy • Goal: Tool for automated floating-point precision analysis and recommendation

  25. Acknowledgements • Jeff Hollingsworth, University of Maryland (Advisor) • Bronis de Supinski, LLNL (Mentor) • Tony Baylis, LLNL (Supervisor) • Barry Rountree, LLNL • Matt Legendre, LLNL • Greg Lee, LLNL • Dong Ahn, LLNL • Thank you!

  26. Bitfield Templates • [Figure templates: 64-bit double, 32-bit single, and 128-bit XMM register layouts (four IEEE singles or two IEEE doubles), with downcast conversion placing the flag 0x7FF4DEAD above an IEEE single.]
