Dynamic Floating-Point Cancellation Detection

Michael O. Lam (Presenter), Jeffrey K. Hollingsworth, G. W. Stewart
University of Maryland, College Park

Background (Floating-Point Representation 101)
• Floating-point represents real numbers as (± sig × 2^exp)
  • Sign bit
  • Significand (“mantissa” or “fraction”)
  • Exponent
• Floating-point numbers have finite binary precision
  • Single-precision: 24 binary digits (~7 decimal digits)
  • Double-precision: 53 binary digits (~16 decimal digits)
• Examples:
  • π ≈ 3.141592… → 11.0010010… (binary)
  • 1/10 = 0.1 → 0.0001100110… (binary)
[Image from Wikipedia (“Single precision”)]
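The two binary expansions above can be reproduced with a short sketch. The helper names `to_single` and `binary_expansion` are illustrative assumptions, not part of the presented tool; single precision is emulated by round-tripping a Python float (a double) through a 4-byte encoding.

```python
import math
import struct
from fractions import Fraction

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def binary_expansion(x: float, digits: int) -> str:
    """Return x >= 0 in binary, truncated to `digits` fractional bits."""
    whole = int(x)
    frac = Fraction(x) - whole  # exact value of the stored double
    bits = []
    for _ in range(digits):
        frac *= 2
        bit = int(frac)
        bits.append(str(bit))
        frac -= bit
    return f"{whole:b}." + "".join(bits)

print(binary_expansion(math.pi, 7))   # binary digits of pi
print(binary_expansion(0.1, 10))      # binary digits of 1/10
# 1/10 has no finite binary expansion, so neither format stores it exactly.
print(Fraction(0.1) == Fraction(1, 10), to_single(0.1) == 0.1)
```

Because 1/10 repeats forever in binary, both the single and the double representation of 0.1 carry a small round-off error before any arithmetic has taken place.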

Motivation
• Finite precision causes round-off error
  • Compromises ill-conditioned calculations
  • Hard to detect and diagnose
• Increasingly important as HPC scales
• Need to balance speed and accuracy
  • Lower precision is faster
  • Higher precision is more accurate
• Industry-standard double precision may still fail on long-running computations

Previous Solutions
• Analytical
  • Requires numerical analysis expertise
  • Conservative static error bounds are largely unhelpful
• Ad-hoc
  • Run experiments at different precisions
  • Increase precision where necessary
  • Tedious and time-consuming
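A minimal sketch of the ad-hoc approach, assuming a toy workload (repeatedly adding 0.1) that is not taken from the talk. The same loop runs once in double precision and once in emulated single precision; `to_single` and `accumulate` are hypothetical helper names.

```python
import struct

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def accumulate(step: float, n: int, single: bool) -> float:
    """Add `step` to a running total n times, optionally rounding to single."""
    if single:
        step = to_single(step)
    total = 0.0
    for _ in range(n):
        total += step
        if single:
            total = to_single(total)  # emulate a single-precision accumulator
    return total

exact = 10000.0
double_sum = accumulate(0.1, 100_000, single=False)
single_sum = accumulate(0.1, 100_000, single=True)
print("double error:", abs(double_sum - exact))
print("single error:", abs(single_sum - exact))
```

Comparing the two errors by hand for every suspicious loop is the tedious part that the instrumentation approach aims to automate.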

Instrumentation Solution
• Automated (vs. manual)
  • Minimize developer effort
  • Ensure consistency and correctness
• Binary-level (vs. source-level)
  • Include shared libraries without source code
  • Include compiler optimizations
• Runtime (vs. compile-time)
  • Dataset and communication sensitivity

Solution Components
• Dyninst-based instrumentation utility (“mutator”)
  • Cross-platform
  • No special hardware required
  • Stack walking and binary rewriting
• Shared library with runtime analysis routines
  • Flexibility and ease of development
• Java-based log viewer GUI
  • Cross-platform
  • Minimal development effort

Analysis Process
• Run mutator
  • Find floating-point instructions
  • Insert calls to shared library
• Run instrumented program
  • Executes analysis alongside original program
  • Stores results in a log file
• View output with GUI

Analysis Types
• Cancellation detection
• Shadow-value analysis

Cancellation
• Loss of significant digits during subtraction operations
• Cancellation is a symptom, not the root problem
  • Indicates that a loss of information has occurred that may cause problems later

Examples (significant digits in parentheses):

    1.613647 (7)            1.613647 (7)
  - 1.613635 (7)          - 1.613647 (7)
    0.000012 (2)            0.000000 (0)
  (5 digits cancelled)    (all digits cancelled)

    1.6136473
  - 1.6136467
    0.0000006
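One way to read the last example is that both operands round to the same 7-digit value, so the stored subtraction yields zero while the true difference does not. The sketch below illustrates that reading with Python's `decimal` module at 7 digits of precision; this interpretation of the slide is an assumption.

```python
from decimal import Context, Decimal

seven_digits = Context(prec=7)  # mimics ~7 decimal digits of single precision

# Rounding each operand to 7 significant digits discards the last digit.
a = seven_digits.create_decimal("1.6136473")
b = seven_digits.create_decimal("1.6136467")

stored_diff = seven_digits.subtract(a, b)                   # what the machine sees
exact_diff = Decimal("1.6136473") - Decimal("1.6136467")    # what was intended

print(a, b, stored_diff, exact_diff)
```

The subtraction itself is exact here. The information was lost earlier, when the operands were rounded, which is why cancellation is described as a symptom rather than the root problem.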

Detecting Cancellation
• For each addition/subtraction:
  • Extract value of each operand
  • Calculate result and compare magnitudes (binary exponents)
  • If e_ans < max(e_x, e_y) there is a cancellation
• For each cancellation event:
  • Calculate “priority”: max(e_x, e_y) - e_ans
  • If above threshold, save event information to log
  • For some events, record operand values
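The steps above can be sketched in a few lines. This is not the tool's implementation, which works on machine instructions through Dyninst; it only mirrors the exponent comparison on Python floats (doubles). The function names, the log format, and the choice to report a full 53-bit priority when the result is exactly zero are assumptions.

```python
import math

DOUBLE_BITS = 53  # significand width of a Python float (IEEE double)

def binary_exponent(v: float) -> int:
    """Exponent e with v = m * 2**e and 0.5 <= |m| < 1, for nonzero v."""
    return math.frexp(v)[1]

def cancellation_priority(x: float, y: float, subtract: bool = True) -> int:
    """Return max(e_x, e_y) - e_ans, or 0 when no bits were cancelled."""
    ans = x - y if subtract else x + y
    if x == 0.0 or y == 0.0:
        return 0  # a zero operand cannot cancel anything
    e_max = max(binary_exponent(x), binary_exponent(y))
    if ans == 0.0:
        return DOUBLE_BITS  # assumption: treat total cancellation as all bits
    return max(0, e_max - binary_exponent(ans))

def log_if_cancelled(log, x, y, threshold, subtract=True):
    """Perform the operation; record it when priority reaches the threshold."""
    priority = cancellation_priority(x, y, subtract)
    if priority >= threshold:
        log.append((priority, x, y))  # operand values kept for later review
    return x - y if subtract else x + y

events = []
log_if_cancelled(events, 1.613647, 1.613635, threshold=10)
log_if_cancelled(events, 5.0, 1.0, threshold=10)
print(events)
```

For the first example from the Cancellation slide this reports a priority of 17 bits, which is roughly 5 decimal digits and agrees with the "5 digits cancelled" annotation.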

Experiments
• Gaussian elimination
  • Benefits of partial pivoting
  • Differing runtime behavior of popular algorithms

Gaussian Elimination
A → [L,U]
• Partial pivoting
  • Nominally to avoid division by zero
  • Also avoids inaccurate results from small pivots
  • This can be detected using cancellation
[Figure label: swap]
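A small illustration of how a tiny pivot surfaces as a cancellation, assuming a 2×2 system chosen for this sketch (it is not one of the talk's experiments). The `solve` routine counts subtractions whose priority reaches a threshold; all names and the threshold of 10 bits are hypothetical.

```python
import math

def priority(x: float, y: float) -> int:
    """Bits cancelled by x - y; 53 is assumed when the result is exactly zero."""
    if x == 0.0 or y == 0.0:
        return 0
    e_max = max(math.frexp(x)[1], math.frexp(y)[1])
    ans = x - y
    return 53 if ans == 0.0 else max(0, e_max - math.frexp(ans)[1])

def solve(a, b, pivot: bool, threshold: int = 10):
    """Gaussian elimination; returns (solution, number of logged cancellations)."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    events = 0

    def sub(x, y):
        nonlocal events
        if priority(x, y) >= threshold:
            events += 1
        return x - y

    for k in range(n - 1):
        if pivot:  # partial pivoting: bring the largest entry to the diagonal
            p = max(range(k, n), key=lambda i: abs(a[i][k]))
            a[k], a[p] = a[p], a[k]
            b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k + 1, n):
                a[i][j] = sub(a[i][j], m * a[k][j])
            a[i][k] = 0.0
            b[i] = sub(b[i], m * b[k])
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = b[i]
        for j in range(i + 1, n):
            s = sub(s, a[i][j] * x[j])
        x[i] = s / a[i][i]
    return x, events

A = [[1e-20, 1.0], [1.0, 1.0]]
B = [1.0, 2.0]
print(solve(A, B, pivot=False))
print(solve(A, B, pivot=True))
```

Without pivoting, the huge multiplier swamps the second row (loss of data), and back substitution then computes 1 - 1, a total cancellation that is logged while the first unknown comes out as 0 instead of roughly 1. With pivoting the solution is accurate and nothing is logged.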

[Figure labels: pivot, loss of data, cancellation]

Gaussian Cancellation
[Figure: Cancellation Counts]

Gaussian Elimination
• This suggests that cancellation can be used to detect the effects of a small pivot
• Useful in sparse elimination with limited ability to pivot
• Threshold must be kept high enough

Gaussian Elimination
A → [L,U]
[Figure panels: Classical, Bordered]

[Figure panels: Classical, Bordered. Axis labels: "Size of diagonal elements", "Iterations of algorithm"]

Gaussian Elimination
• Classical method: many small cancellations
• Bordered method: fewer but larger cancellations
• Our tool can detect these differences and inform the developer, who can then make decisions regarding which algorithm to use

Other Results
• Approximate nearest neighbor
  • More cancellations in denser point sets
• SPEC benchmarks milc and lbm
  • Cancellations in error calculations indicate good results
• SPEC benchmark povray
  • Cancellations indicate color black

Conclusions
• It is important to vary the threshold
  • Most calculations have background cancellations
  • Small cancellations can hide large ones
• Cancellation results require interpretation by someone who is familiar with the algorithm
• Properly employed, cancellation detection can help find “trouble spots” in numerical codes
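To make the threshold point concrete, the sketch below counts how many events would be logged at several thresholds for a handful of made-up subtractions. The operand pairs and the function names are assumptions chosen for illustration.

```python
import math

def priority(x: float, y: float) -> int:
    """Bits cancelled by x - y; 53 is assumed when the result is exactly zero."""
    if x == 0.0 or y == 0.0:
        return 0
    e_max = max(math.frexp(x)[1], math.frexp(y)[1])
    ans = x - y
    return 53 if ans == 0.0 else max(0, e_max - math.frexp(ans)[1])

def counts_by_threshold(pairs, thresholds):
    """Map each threshold to the number of subtractions at or above it."""
    priorities = [priority(x, y) for x, y in pairs]
    return {t: sum(p >= t for p in priorities) for t in thresholds}

pairs = [
    (1.0, 0.75),               # 2 bits: background cancellation
    (3.0, 2.5),                # 2 bits: background cancellation
    (5.0, 1.0),                # 0 bits: no cancellation
    (1.613647, 1.613635),      # 17 bits
    (1.0, 1.0 + 2.0 ** -40),   # 40 bits: the event worth inspecting
]
print(counts_by_threshold(pairs, [1, 10, 30]))
```

At a threshold of 1 the severe event is buried among routine ones; raising the threshold filters out the background until only the large cancellation remains.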

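Ongoing Research
• Shadow value analysis
  • Replace floating-point numbers with pointers to auxiliary information (higher precision, etc.)

    #include <stdio.h>
    double x = 1.0;                       /* global value that gets a shadow */
    void func(void) { double y = 4.0; x = x + y; }
    int main(void) { func(); printf("%f", x); return 0; }

[Diagram: the values 1.000, 4.000, and 5.000, with a “shadow value” label]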
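The idea can be sketched at the language level by pairing each double with an exact rational shadow. This is only an analogy under stated assumptions: the described approach replaces values with pointers inside an instrumented binary, and the class name `Shadowed` and its methods are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction

@dataclass(frozen=True)
class Shadowed:
    value: float      # what the original program computes
    shadow: Fraction  # higher-precision "shadow value" carried alongside

    @classmethod
    def of(cls, literal: str) -> "Shadowed":
        """Build from a decimal literal so the shadow is exact."""
        return cls(float(literal), Fraction(literal))

    def __add__(self, other: "Shadowed") -> "Shadowed":
        return Shadowed(self.value + other.value, self.shadow + other.shadow)

    def __sub__(self, other: "Shadowed") -> "Shadowed":
        return Shadowed(self.value - other.value, self.shadow - other.shadow)

    def error(self) -> Fraction:
        """Absolute difference between the double and its shadow."""
        return abs(Fraction(self.value) - self.shadow)

# Mirrors the slide: x = 1.0, y = 4.0, x = x + y.
x = Shadowed.of("1.0") + Shadowed.of("4.0")
print(x.value, float(x.error()))

# A case where the double drifts away from its shadow.
total = Shadowed.of("0")
for _ in range(10):
    total = total + Shadowed.of("0.1")
print(total.value, float(total.error()))
```

In the slide's example the double and the shadow agree exactly, while ten additions of 0.1 already leave a measurable gap between the two.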

Shadow Value Analysis
• Current status: allows programmers to automatically test their entire program in different precisions
• Next step: selectively instrument particular code blocks or data structures
• Goal: automated floating-point analysis and recommendation framework

Thank you!
• Code available upon request
• Questions?

