Should We Dump Flop/s?
David H Bailey
Lawrence Berkeley National Laboratory, USA
This talk is available at: http://crd.lbl.gov/~dhbailey/dhbtalks/flops.pdf
Using Flop/s As A Metric for Performance
Advantages:
• Its usage is traditional and well understood in the HPC community; data are available for several decades of progress.
• The flop count for a given algorithm or application is fairly well defined, although care has to be taken to avoid abuse: we should base the flop count on the best practical serial algorithm.
Disadvantages:
• A focus on flop/s at the expense of other system parameters can lead to system designs that are poorly balanced for real workloads.
• Using a measured flop count (e.g., from a hardware performance monitor) may lead to perverse outcomes, such as inefficient algorithms that exhibit artificially high flop/s rates.
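The last point can be made concrete with a small counting sketch. It is an illustration only, not taken from the talk. The polynomial-evaluation example, the function names `horner`, `naive_powers` and `rates`, and the assumed machine speed `MACHINE_FLOPS_PER_SEC` are all hypothetical. Both routines compute the same value, but the wasteful one performs many more flops, so a rate based on the measured count looks just as good while the rate credited against the best serial algorithm is far lower.

```python
# Toy model: count floating-point operations by hand instead of timing anything.

def horner(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) with Horner's rule; return (value, flops)."""
    flops = 0
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c  # one multiply and one add
        flops += 2
    return result, flops


def naive_powers(coeffs, x):
    """Same polynomial, but every power of x is recomputed from scratch."""
    flops = 0
    result = 0.0
    for k, c in enumerate(coeffs):
        power = 1.0
        for _ in range(k):
            power *= x  # k multiplies to rebuild x**k
            flops += 1
        result += c * power  # one multiply and one add
        flops += 2
    return result, flops


# Assumption for illustration: a machine that retires flops at a fixed rate.
MACHINE_FLOPS_PER_SEC = 1.0e9


def rates(measured_flops, reference_flops):
    """Return (rate from measured flops, rate credited with the reference count)."""
    seconds = measured_flops / MACHINE_FLOPS_PER_SEC
    return measured_flops / seconds, reference_flops / seconds


coeffs = [1.0] * 101  # degree-100 polynomial
value_h, flops_h = horner(coeffs, 0.5)
value_n, flops_n = naive_powers(coeffs, 0.5)
measured_rate_n, credited_rate_n = rates(flops_n, flops_h)

print(f"Horner flops: {flops_h}, naive flops: {flops_n}")
print(f"naive measured rate: {measured_rate_n:.3e} flop/s")
print(f"naive credited rate: {credited_rate_n:.3e} flop/s")
```

Under this model, the naive routine reports the full machine rate when its own flops are counted. Crediting it only with the Horner count, i.e. the best practical serial algorithm, exposes the wasted work.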
Using Mop/s as a Performance Metric
Advantages:
• A focus on memory operations per second in comparing systems may result in systems better suited for many real-world scientific computations.
Disadvantages:
• There is NO objective system-independent way to assess the mop count for a given algorithm or architecture.
• A focus on mop/s at the expense of other system parameters can lead to system designs that are poorly balanced for real workloads.
• Using measured memory operation counts (e.g., from a hardware performance monitor) may lead to perverse outcomes, such as grossly cache-inefficient algorithms that exhibit artificially high mop/s rates.
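A toy cache model suggests why the mop count is not system-independent. This is a sketch under simplifying assumptions, not a model of any real machine: a fully associative LRU cache with one word per line. The function name `count_memory_traffic` and the trace are hypothetical. The same access trace produces very different numbers of operations reaching main memory, depending only on how much cache is assumed.

```python
from collections import OrderedDict


def count_memory_traffic(trace, cache_lines):
    """Count accesses that miss a fully associative LRU cache of `cache_lines` words."""
    cache = OrderedDict()
    misses = 0
    for addr in trace:
        if addr in cache:
            cache.move_to_end(addr)  # mark as most recently used
        else:
            misses += 1  # this access must go to main memory
            cache[addr] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used word
    return misses


# Hypothetical workload: 100 sweeps over a 1024-word array.
trace = list(range(1024)) * 100

big_cache_traffic = count_memory_traffic(trace, cache_lines=2048)
small_cache_traffic = count_memory_traffic(trace, cache_lines=512)

print("main-memory accesses, array fits in cache:", big_cache_traffic)
print("main-memory accesses, array does not fit: ", small_cache_traffic)
```

In this model the array is fetched once when it fits in the cache and on every sweep when it does not, so the count of memory operations for identical code depends on the cache size chosen.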
How Do We Define Mop Count for a Given Application?
• The mop count is inextricably tied to the architecture.
• Mop count can vary by a factor of 100 depending on how much cache is available.
• Unit-stride, constant-stride and random-stride data are handled very differently from system to system.
• Naive schemes to count mops for a given algorithm or implementation (e.g., number of flops performed × 3) reduce to using an inflated flop count as the metric.
• One possibility: using Erich Strohmaier’s APEX-map as the basis for the mop count; it measures the distribution of the distance of one memory operation to the next.
• But using APEX-map to perform these measurements is very expensive, and the resulting figure is highly one-dimensional.
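The idea of a "distribution of the distance of one memory operation to the next" can be illustrated with a few lines of code. This sketch is not APEX-map and makes no claim about how that benchmark works internally. The function name `distance_histogram` and the three synthetic address streams are assumptions for illustration.

```python
import random
from collections import Counter


def distance_histogram(addresses):
    """Histogram of absolute address distances between consecutive memory operations."""
    return Counter(abs(b - a) for a, b in zip(addresses, addresses[1:]))


n = 1000
unit_stride = list(range(n))                 # a[0], a[1], a[2], ...
constant_stride = [8 * i for i in range(n)]  # a[0], a[8], a[16], ...
rng = random.Random(0)                       # fixed seed so the run is repeatable
random_access = [rng.randrange(8 * n) for _ in range(n)]

unit_hist = distance_histogram(unit_stride)
stride_hist = distance_histogram(constant_stride)
random_hist = distance_histogram(random_access)

print("unit stride:     ", dict(unit_hist))
print("constant stride: ", dict(stride_hist))
print("random access:    distinct distances =", len(random_hist))
```

The unit-stride and constant-stride streams each collapse to a single distance, while the random stream spreads over many. All three contain the same number of memory operations, which is one reason a bare mop count says little about how a given system will handle them.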
Bottom Line: Don’t Dump Flop/s
• There is NO intrinsic memory operation count for a given algorithm or architecture.
• Mop/s, if anything, has significantly more potential for abuse than flop/s.
• Perhaps in the future someone can devise an architecture-independent metric to assess the “work done” in a large scientific application.
• Until then, flop/s is the best we have.