Dynamic Floating-Point Cancellation Detection Michael O. Lam (Presenter) Jeffrey K. Hollingsworth G. W. Stewart University of Maryland, College Park
Background (Floating-Point Representation 101) • Floating-point represents real numbers as (± sig × 2^exp) • Sign bit • Significand (“mantissa” or “fraction”) • Exponent • Floating-point numbers have finite binary precision • Single-precision: 24 binary digits (~7 decimal digits) • Double-precision: 53 binary digits (~16 decimal digits) • Examples: • π ≈ 3.141592… (decimal) ≈ 11.0010010… (binary) • 1/10 = 0.1 (decimal) ≈ 0.0001100110… (binary) Image from Wikipedia (“Single precision”)
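The representation above can be explored with a few lines of C. This is a minimal sketch, not part of the original slides: the helper names `binary_exponent`, `significand_of`, and `single_rounding_error` are made up for illustration, and the code relies only on the standard `frexp` and `fabs` functions and the `<float.h>` constants.

```c
#include <float.h>
#include <math.h>

/* Significand widths as reported by the C implementation
   (24 and 53 on IEEE 754 platforms, matching the slide). */
enum { SINGLE_BITS = FLT_MANT_DIG, DOUBLE_BITS = DBL_MANT_DIG };

/* Exponent e such that |x| = m * 2^e with 1 <= m < 2.
   Assumes x is finite and nonzero. */
int binary_exponent(double x) {
    int e;
    frexp(x, &e);      /* frexp normalizes to 0.5 <= |m| < 1 ... */
    return e - 1;      /* ... so shift by one for the 1.xxx form */
}

/* Significand m in [1, 2) for finite, nonzero x. */
double significand_of(double x) {
    int e;
    return 2.0 * fabs(frexp(x, &e));
}

/* Absolute error introduced by storing a double in single precision. */
double single_rounding_error(double x) {
    return fabs((double)(float)x - x);
}
```

With these helpers, π has binary exponent 1 (11.001… = 1.1001… × 2^1) and 1/10 has binary exponent -4. `single_rounding_error(0.1)` is nonzero because 0.1 has no finite binary expansion, while `single_rounding_error(0.5)` is exactly zero.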
Motivation • Finite precision causes round-off error • Compromises ill-conditioned calculations • Hard to detect and diagnose • Increasingly important as HPC scales • Need to balance speed and accuracy • Lower precision is faster • Higher precision is more accurate • Industry-standard double precision may still fail on long-running computations
Previous Solutions • Analytical • Requires numerical analysis expertise • Conservative static error bounds are largely unhelpful • Ad-hoc • Run experiments at different precisions • Increase precision where necessary • Tedious and time-consuming
Instrumentation Solution • Automated (vs. manual) • Minimize developer effort • Ensure consistency and correctness • Binary-level (vs. source-level) • Include shared libraries without source code • Include compiler optimizations • Runtime (vs. compile-time) • Dataset and communication sensitivity
Solution Components • Dyninst-based instrumentation utility (“mutator”) • Cross-platform • No special hardware required • Stack walking and binary rewriting • Shared library with runtime analysis routines • Flexibility and ease of development • Java-based log viewer GUI • Cross-platform • Minimal development effort
Analysis Process • Run mutator • Find floating-point instructions • Insert calls to shared library • Run instrumented program • Executes analysis alongside original program • Stores results in a log file • View output with GUI
Analysis Types • Cancellation detection • Shadow-value analysis
Cancellation • Loss of significant digits during subtraction operations • Cancellation is a symptom, not the root problem • Indicates that a loss of information has occurred that may cause problems later • Examples (significant digits in parentheses): • 1.613647 (7) - 1.613635 (7) = 0.000012 (2) (5 digits cancelled) • 1.613647 (7) - 1.613647 (7) = 0.000000 (0) (all digits cancelled) • 1.6136473 - 1.6136467 = 0.0000006
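The "symptom, not the root problem" point can be illustrated in C. In the sketch below (the function name `float_difference_rel_error` is hypothetical), the operands are first rounded to single precision and then subtracted. For operands this close together the subtraction itself is exact; the error in the result comes entirely from the earlier rounding of the operands, which cancellation makes visible.

```c
#include <math.h>

/* Relative error of the single-precision difference x - y, measured
   against the same difference computed in double precision. */
double float_difference_rel_error(double x, double y) {
    float fx = (float)x;       /* information is lost here ...           */
    float fy = (float)y;       /* ... and here, when rounding to 24 bits */
    float fdiff = fx - fy;     /* the subtraction only exposes that loss */
    double reference = x - y;  /* double-precision reference value       */
    return fabs(((double)fdiff - reference) / reference);
}
```

For the last pair on the slide, `float_difference_rel_error(1.6136473, 1.6136467)` is on the order of 10^-3 to 10^-2, several orders of magnitude above the single-precision unit round-off of roughly 6 × 10^-8.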
Detecting Cancellation • For each addition/subtraction: • Extract value of each operand • Calculate result and compare magnitudes (binary exponents) • If e_ans < max(e_x, e_y) there is a cancellation • For each cancellation event: • Calculate “priority”: max(e_x, e_y) - e_ans • If above threshold, save event information to log • For some events, record operand values
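The detection rule above can be sketched as a plain C function. This is an illustrative reading of the slide, not the tool's actual runtime routine: the names `cancellation_priority` and `should_log` are assumptions, exponents are taken from the standard `frexp`, and reporting a fully cancelled result (exactly zero) as `DBL_MANT_DIG` bits is a choice made here, since the slide does not say how that case is scored.

```c
#include <float.h>
#include <math.h>

/* Binary exponent of a finite, nonzero double, as reported by frexp. */
static int exponent_of(double x) {
    int e;
    frexp(x, &e);
    return e;
}

/* Priority of the cancellation in x + y (pass -y for a subtraction):
   max(e_x, e_y) - e_ans, or 0 when no cancellation occurred. */
int cancellation_priority(double x, double y) {
    double ans = x + y;
    if (x == 0.0 || y == 0.0)
        return 0;                      /* nothing to cancel against */
    int ex = exponent_of(x);
    int ey = exponent_of(y);
    int emax = ex > ey ? ex : ey;
    if (ans == 0.0)
        return DBL_MANT_DIG;           /* assumption: all bits cancelled */
    int eans = exponent_of(ans);
    return eans < emax ? emax - eans : 0;
}

/* An event is logged only when its priority is above the threshold. */
int should_log(int priority, int threshold) {
    return priority > threshold;
}
```

For the slide's example, `cancellation_priority(1.613647, -1.613635)` gives 17 binary digits, which is about 5 decimal digits, matching "5 digits cancelled". An ordinary sum such as `cancellation_priority(1.0, 2.0)` gives 0.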
Experiments • Gaussian elimination • Benefits of partial pivoting • Differing runtime behavior of popular algorithms
Gaussian Elimination A → [L,U] • Partial pivoting • Nominally to avoid division by zero • Also avoids inaccurate results from small pivots • This can be detected using cancellation
[Figure annotations: swap, pivot, loss of data, cancellation]
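The effect of a small pivot can be reproduced on a textbook 2×2 system. The sketch below is not the experiment from the talk; the function `solve2x2` and the test matrix are assumptions chosen for illustration. With a pivot of 1e-20, elimination without pivoting first loses the original entries of the second row (loss of data), and back substitution then computes 1 - 1 = 0 (cancellation), giving a wrong first component.

```c
#include <math.h>

/* Solve the 2x2 system a*x = b by Gaussian elimination followed by
   back substitution. If use_pivoting is nonzero, rows are swapped so
   that the larger first-column entry becomes the pivot. The inputs
   are copied, so the caller's arrays are left untouched. */
void solve2x2(double a[2][2], double b[2], int use_pivoting, double x[2]) {
    double m[2][2] = {{a[0][0], a[0][1]}, {a[1][0], a[1][1]}};
    double r[2] = {b[0], b[1]};

    if (use_pivoting && fabs(m[1][0]) > fabs(m[0][0])) {
        for (int j = 0; j < 2; j++) {          /* swap the two rows */
            double t = m[0][j]; m[0][j] = m[1][j]; m[1][j] = t;
        }
        double t = r[0]; r[0] = r[1]; r[1] = t;
    }

    double mult = m[1][0] / m[0][0];           /* huge if the pivot is tiny */
    double u22 = m[1][1] - mult * m[0][1];     /* loss of data: m[1][1] is swamped */
    double r2 = r[1] - mult * r[0];
    x[1] = r2 / u22;
    x[0] = (r[0] - m[0][1] * x[1]) / m[0][0];  /* cancellation for a tiny pivot */
}
```

For the system with rows (1e-20, 1) and (1, 1) and right-hand side (1, 2), the true solution is very close to (1, 1). In IEEE double precision, calling `solve2x2` without pivoting returns (0, 1), while calling it with pivoting returns (1, 1).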
Gaussian Cancellation [Figure: cancellation counts]
Gaussian Elimination • This suggests that cancellation can be used to detect the effects of a small pivot • Useful in sparse elimination with limited ability to pivot • Threshold must be kept high enough
Gaussian Elimination A → [L,U] [Figure: two panels, “Classical” and “Bordered”]
[Figure: two plots, “Classical” and “Bordered”; axis labels: size of diagonal elements, iterations of algorithm]
Gaussian Elimination • Classical method: many small cancellations • Bordered method: fewer but larger cancellations • Our tool can detect these differences and inform the developer, who can then make decisions regarding which algorithm to use
Other Results • Approximate nearest neighbor • More cancellations in denser point sets • SPEC benchmarks milc and lbm • Cancellations in error calculations indicate good results • SPEC benchmark povray • Cancellations indicate color black
Conclusions • It is important to vary the threshold • Most calculations have background cancellations • Small cancellations can hide large ones • Cancellation results require interpretation by someone who is familiar with the algorithm • Properly employed, cancellation detection can help find “trouble spots” in numerical codes
Ongoing Research • Shadow value analysis • Replace floating-point numbers with pointers to auxiliary information (higher precision, etc.) • Example:
    #include <stdio.h>
    double x = 1.0;
    void func(void) { double y = 4.0; x = x + y; }
    int main(void) { func(); printf("%f", x); return 0; }
[Figure labels: 1.000, “shadow value”, 4.000, 5.000]
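The idea of carrying auxiliary information alongside each value can be sketched in C. This is a simplified illustration only: the tool described here works at the binary level and replaces values with pointers, whereas this sketch stores a single-precision value and a double-precision shadow side by side in a struct. The type `shadowed` and the functions `sv_make`, `sv_add`, and `sv_rel_error` are hypothetical names.

```c
#include <math.h>

/* A program value together with its higher-precision shadow. */
typedef struct {
    float  value;   /* what the original program computes      */
    double shadow;  /* the same quantity at a higher precision */
} shadowed;

/* Create a shadowed value from a double-precision constant. */
shadowed sv_make(double x) {
    shadowed s = { (float)x, x };
    return s;
}

/* Perform the addition in both precisions. */
shadowed sv_add(shadowed a, shadowed b) {
    shadowed r = { a.value + b.value, a.shadow + b.shadow };
    return r;
}

/* Relative difference between the program value and its shadow
   (absolute difference when the shadow is zero). */
double sv_rel_error(shadowed s) {
    double diff = fabs((double)s.value - s.shadow);
    return s.shadow == 0.0 ? diff : diff / fabs(s.shadow);
}
```

Adding the shadowed versions of 1.0 and 4.0 gives 5.0 in both precisions with zero relative error, mirroring the slide's example. A value such as 0.1 already shows a nonzero error, because its single-precision and double-precision roundings differ.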
Shadow Value Analysis • Current status: allows programmers to automatically test their entire program in different precisions • Next step: selectively instrument particular code blocks or data structures • Goal: automated floating-point analysis and recommendation framework
Thank you! • Code available upon request • Questions?