Benchmarking FORTRAN / C / C++

For Linux 2.0 (Debian and Red Hat) and Windows NT 4.0 • C. Leggett

Compilers
• Linux: Debian 2.0
  • g++: egcs-2.91.57 19980901 (egcs-1.1 release)
  • KCC: 3.3c -- June 24, 1998
  • f77
  • g77: egcs-2.91.57 19980901 (egcs-1.1 release)
• Linux: Red Hat 5.1
  • g++: egcs-2.90.27 980315 (egcs-1.0.2 release)
  • g++: egcs-2.91.57 19980901 (egcs-1.1 release)
  • KCC: 3.3c -- June 24, 1998
  • g77: egcs-2.90.27 980315 (egcs-1.0.2 release)
• Windows NT 4.0 (sp 3)
  • Micro$loth Visual C++ v6.0

Haney Kernels
• Measures relative performance of FORTRAN, C, and C++
• C code is compiled by the C++ compiler
• 3 Kernels:
  • Complex Matrix Multiply
    • Uses complex classes and operator overloading
  • Real Matrix Multiply
    • Uses real matrix classes with storage management and indexing
  • Vector Operations
    • Uses array classes and operator overloading
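To make the "array classes and operator overloading" idea concrete, here is a minimal sketch of the kind of contrast the vector kernel measures. The `Vec` class and `add_c` function are hypothetical illustrations, not the code used in the Haney kernels.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical array class (an assumption for illustration only).
class Vec {
public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}
    std::size_t size() const { return data_.size(); }
    double& operator[](std::size_t i) { return data_[i]; }
    double operator[](std::size_t i) const { return data_[i]; }
private:
    std::vector<double> data_;
};

// C++ style: the overloaded operator builds and returns a temporary.
Vec operator+(const Vec& a, const Vec& b) {
    Vec r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] + b[i];
    return r;
}

// C style: the caller supplies the output buffer, so no temporary is created.
void add_c(const double* a, const double* b, double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}
```

The extra temporary in `operator+` is one plausible source of the overhead such a kernel would expose; how much of it a given compiler removes is exactly what the benchmark measures.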

Haney Kernels: Red Hat (egcs 1.0)

Haney Kernels: Debian (egcs 1.1)

Haney Kernels • FORTRAN is usually faster than C, which is usually faster than C++ • g77 is faster than f77, which makes heavy use of f2c • Debian’s version of g++ (egcs-1.1) is more recent and considerably faster than the Red Hat 5.1 version (egcs-1.0.2) • The KAI compiler is not all it’s cracked up to be

Bench++ Suite • Written by Joe Orost <joseph.orost@att.com> http://www.research.att.com/~orost/bench_plus_plus.html • Measures the cost of various C++ features and compares C/C++ performance • Incorporates many ‘standard’ benchmarks such as the Dhrystone, Whetstone, Hennessy, OOPACK, and Stepanov benchmarks

Bench++ • Dhrystone • Whetstone • Hennessy benchmarks (11)

Bench++: Composite • Tracker (float) • Tracker (double) • Tracker (float + int) • Orbit • Kalman • Centroid

Bench++: Dynamic Allocation • malloc & free: 1000 ints • malloc & init & free: 1000 ints • new & delete: 1000 ints • new & init & delete: 1000 ints • alloca: 1000 ints (FAIL) • alloca & init: 1000 ints
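As a rough illustration of the "allocate & init & free" cases above, the following sketch shows the two portable code paths being compared. The function names are hypothetical and this is not the Bench++ source; `alloca` is left out because it is not portable.

```cpp
#include <cstdlib>

// malloc & init & free: n ints (hypothetical helper, not from Bench++).
long sum_malloc(int n) {
    int* p = static_cast<int*>(std::malloc(n * sizeof(int)));
    if (!p) return -1;                       // allocation failed
    long sum = 0;
    for (int i = 0; i < n; ++i) { p[i] = i; sum += p[i]; }
    std::free(p);
    return sum;
}

// new & init & delete: n ints (hypothetical helper, not from Bench++).
long sum_new(int n) {
    int* p = new int[n];
    long sum = 0;
    for (int i = 0; i < n; ++i) { p[i] = i; sum += p[i]; }
    delete[] p;
    return sum;
}
```

A timing harness would call each function many times in a loop and compare the elapsed time; both do the same work apart from the allocator call.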

Bench++: Exceptions • local exception caught • class method exception caught • procedure exception caught: 3 deep • procedure exception caught: 4 deep • declared proc exception caught: 4 deep • proc exception caught: 4 deep, re-thrown at each level • proc exception caught: implemented using setjmp/longjmp
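The "N deep" and setjmp/longjmp cases can be pictured with a sketch like the one below, which unwinds three call levels in two ways. All names here are hypothetical, and the global `g_env` is an assumption made to keep the example short; it is not how Bench++ itself is written.

```cpp
#include <csetjmp>
#include <stdexcept>

static std::jmp_buf g_env;  // hypothetical global jump target

// C++ exception thrown three calls deep.
void level3_throw() { throw std::runtime_error("fail"); }
void level2_throw() { level3_throw(); }
void level1_throw() { level2_throw(); }

// The same control transfer done with longjmp. This is only safe here
// because no objects with destructors live in the skipped frames.
void level3_jump() { std::longjmp(g_env, 1); }
void level2_jump() { level3_jump(); }
void level1_jump() { level2_jump(); }

bool caught_exception() {
    try {
        level1_throw();
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}

bool caught_longjmp() {
    if (setjmp(g_env) == 0) {
        level1_jump();      // does not return normally
        return false;
    }
    return true;            // reached after longjmp
}
```

Unlike exceptions, `longjmp` does not run destructors for the frames it skips, which is part of why it can be cheaper and also why it is rarely a drop-in replacement.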

Bench++: Coding Style • boolean assignment • boolean if • 2-way if/else • 2-way switch • 10-way if/else • 10-way switch • 10-way sparse switch • 10-way virtual function call
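The three dispatch styles being compared (if/else chain, switch, virtual function call) can be sketched as follows, reduced to three cases for brevity. The function and class names are hypothetical and not taken from Bench++.

```cpp
// if/else chain: cases are tested one after another.
int by_if(int k) {
    if (k == 0) return 10;
    else if (k == 1) return 11;
    else if (k == 2) return 12;
    return -1;
}

// switch: the compiler may emit a jump table for dense case values.
int by_switch(int k) {
    switch (k) {
        case 0: return 10;
        case 1: return 11;
        case 2: return 12;
        default: return -1;
    }
}

// Virtual dispatch: the case is selected by the object's dynamic type.
struct Handler {
    virtual ~Handler() {}
    virtual int value() const = 0;
};
struct Case0 : Handler { int value() const override { return 10; } };
struct Case1 : Handler { int value() const override { return 11; } };
struct Case2 : Handler { int value() const override { return 12; } };

int by_virtual(const Handler& h) { return h.value(); }
```

All three produce the same result for the same case; the benchmark question is only how much each form costs per call.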

Bench++: I/O Timing • iostream.getline: 20 char buffer • iostream.>>: 20 chars in loop • iostream.<<: 20 char buffer • iostream.<<: 20 chars in loop • istrstream.>>: int • istrstream.>>: float • fstream.open/fstream.close

Bench++: Machine Level Features • packed bit arrays • unpacked bit arrays • packed bit ops in loop • unpacked bit ops in loop • int conversion • 10 float conversion • bit fields • bit fields and packed bit arrays • pack and unpack class objects

Bench++: Loop Overhead • “for” loop • “while” loop • infinite loop w/ break • 5-iteration loop

Bench++: Optimizer Performance • constant propagation • local common sub-expression • global common sub-expression • unnecessary copy • code motion • induction variable • reduction in strength • dead code • loop jamming • redundant code • unreachable code • string ops
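Two of the optimizations listed above, code motion and reduction in strength, can be illustrated by writing the same loop naively and then hand optimized. This is a hypothetical sketch of the idea, not a Bench++ test.

```cpp
// Naive form: x * y is loop-invariant and i * 4 is a multiply per iteration.
long naive(const int* a, int n, int x, int y) {
    long sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += a[i] + x * y + i * 4;
    }
    return sum;
}

// Hand-optimized form of the same loop.
long hand_optimized(const int* a, int n, int x, int y) {
    long sum = 0;
    const int xy = x * y;   // code motion: invariant hoisted out of the loop
    int step = 0;           // reduction in strength: i * 4 becomes repeated addition
    for (int i = 0; i < n; ++i) {
        sum += a[i] + xy + step;
        step += 4;
    }
    return sum;
}
```

If the compiler performs both optimizations itself, the two functions should run at about the same speed; a large gap suggests the optimizer missed one of them.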

Bench++: Procedure Calls • procedure call: no args • procedure call: no args, catches exceptions • static class method call: no args, catches exceptions • inline procedure call: no args • static class method call: 1 int arg: catches exception • static class method call: 1 int *arg: catches exception • static class method call: 1 int &arg: catches exception • procedure call: no pars, called thru pointer, catch exception • procedure call: 10 int arg: catch exception • procedure call: 20 int arg: catch exception • procedure call: 10 (3-int) arg: catch exception • procedure call: 20 (3-int) arg: catch exception • class method call: 1 this arg: catch exception • virtual class method call: 1 this arg: catch exception • virtual const class method call: 1 this arg: catch exception • ibid, called in loop to check lookup optimization

Bench++: Abstraction • max: C++ style • max: C style • matrix: C++ style • matrix: C style • iterator: C++ style • iterator: C style • complex: C++ style • complex: C style • Stepanov C++ Abstraction

Bench++ • float matrix multiply vs integer • double covariance matrix vs float • float & int covariance matrix vs float • new/delete vs malloc/free

Bench++ • 4 deep exception handled vs 3 deep • declared exception handled vs not declared • 4 deep rethrown exception vs 4 deep • 4 deep setjmp/longjmp vs 4 deep exception

Bench++ • if test vs logical equation • 2-way switch vs 2-way if/else • 10-way switch vs 10-way if/else • 10-way sparse switch vs 10-way if/else • 10-way sparse switch vs 10-way switch • 10-way virtual function vs 10-way switch

Bench++ • 20-iostream.>> vs 20 char iostream.getline & gcount • 20-iostream.<< vs 20 char iostream.<< • istrstream.>> a float from local string vs int • boolean operations on bit arrays vs byte arrays • boolean operations on bits in loop vs bit arrays • boolean operations on bytes in loop vs byte arrays • while loop vs for loop • simple loop w/break vs for loop

Bench++: Hand optimized vs compiler optimized (higher is better) • Constant Propagation • Local Common-sub • Global Common-sub • Unnecessary copy • Code Motion • Induction Variable • Reduction in Strength • Dead Code • Loop Jamming • Redundant Code • Unreachable Code

Bench++ • Static class method call vs local procedure call • Inline procedure call vs inlineable local procedure call • Static class method call w/ 1 int* par vs int par • Static class method call w/ 1 int& par vs int par • Call thru a procedure variable vs local procedure call • Static class method call w/ 10 int pars vs 1 int par • Static class method call w/ 20 int pars vs 1 int par • Static class method call w/ 10 3-int pars vs 10 1-int pars • Static class method call w/ 20 3-int pars vs 20 1-int pars • Class method call w/ this par vs static class method call w/ int par • Virtual class method call vs class method call • Virtual const class method call vs class method call • Loop of virtual const class method call vs no loop

Bench++ • C++ style Max vs C style • C++ style Matrix vs C style • C++ style Iterator vs C style • C++ style Complex vs C style

Bench++ • Stepanov abstraction level n (12 .. 1) vs level 0

Bench++: Stepanov Abstraction • Level 0: use a simple Fortran-like loop • Level 1,3,5,7,9,11: use doubles • Level 2,4,6,8,10,12: use Double - double wrapped in a class • Level 1,2: use regular pointers • Level 3,4: use pointers wrapped in a class • Level 5,6: use pointers wrapped in a reverse-iterator adapter • Level 7,8: use wrapped pointers wrapped in a reverse-iterator adapter • Level 9,10: use pointers wrapped in a reverse-iterator adapter wrapped in a reverse-iterator adapter • Level 11,12: use wrapped pointers wrapped in a reverse-iterator adapter wrapped in a reverse-iterator adapter
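The layering described above can be sketched with three of the levels: the plain loop, a `Double` wrapper accessed through regular pointers, and plain doubles accessed through a reverse-iterator adapter. This is an illustrative approximation; it uses `std::reverse_iterator` as a stand-in for the adapter and a hypothetical `accumulate_generic`, whereas the actual Stepanov benchmark defines its own types.

```cpp
#include <cstddef>
#include <iterator>

// Level 0 style: a simple Fortran-like loop.
double sum_level0(const double* a, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i) s += a[i];
    return s;
}

// "Double": a double wrapped in a class.
struct Double {
    double value;
    Double(double v = 0.0) : value(v) {}
};
inline Double operator+(const Double& x, const Double& y) {
    return Double(x.value + y.value);
}

// Generic accumulation over any iterator and value type (hypothetical helper).
template <class Iter, class T>
T accumulate_generic(Iter first, Iter last, T init) {
    for (; first != last; ++first) init = init + *first;
    return init;
}

// Level 2 style: Double values through regular pointers.
double sum_wrapped(const Double* a, std::size_t n) {
    return accumulate_generic(a, a + n, Double()).value;
}

// Level 5 style: doubles through a reverse-iterator adapter.
double sum_reversed(const double* a, std::size_t n) {
    typedef std::reverse_iterator<const double*> RIt;
    return accumulate_generic(RIt(a + n), RIt(a), 0.0);
}
```

Each level computes the same sum; a compiler with a zero "abstraction penalty" would generate code for the wrapped versions that is as fast as the level 0 loop.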

Bench++: Some Conclusions • There is a significant increase in speed between versions 1.0 and 1.1 of egcs. • VC++ handles exceptions very well; g++ and KCC do not, though KCC does better at declared exceptions. • alloca is much faster than new/delete. Too bad you can’t use it portably. • switch is much faster than if/else, but virtual functions don’t have much of a penalty. • KCC does not do well with I/O (cf. g++), but reading characters as strings (iostream.getline) improves performance. • KCC has trouble optimizing some simple loop structures. • KCC handles procedure calls very badly, but shows a lower overhead than g++ or VC++ when large numbers of parameters are passed. • On Linux platforms, the C++ compilers do not optimize such things as dead and redundant code, code motion, and local common sub-expressions. • Abstraction carries a serious performance cost, but KCC tends to handle the complex class well.

Observations on VC++ • Visual C++ 6.0 is not obviously superior, nor is it dramatically inferior. Much like the other compilers, it does well in some areas and poorly in others: • VC++ handles abstraction and procedure calls badly • VC++ handles exceptions well • VC++ does I/O well, except for opening and closing files • VC++ handles simple optimizations well (i.e., constant propagation, common sub-expressions, redundant code, etc.)

Alternatives to Improving Compiler Optimization • Decrease the level of abstraction: write kernels in low-level languages such as C or Fortran • Put C++ wrappers around low-level kernels to retain some advantages of C++. Only useful for large chunks of code. • Use macros and templates cleverly. Details at: http://annwm.lbl.gov/~leggett/bench/
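The wrapper approach can be sketched as a thin C++ class that owns the storage and forwards the heavy work to a C-style kernel. `matmul_kernel` and `Matrix` are hypothetical names chosen for this example; in practice the kernel might be an existing C or Fortran routine.

```cpp
#include <cstddef>
#include <vector>

// Low-level kernel with C linkage (hypothetical): c = a * b for
// n x n row-major matrices. No classes, no temporaries.
extern "C" void matmul_kernel(const double* a, const double* b,
                              double* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (std::size_t k = 0; k < n; ++k) s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
    }
}

// Thin C++ wrapper: keeps storage management and indexing in a class,
// but crosses into the kernel once per multiply rather than per element.
class Matrix {
public:
    explicit Matrix(std::size_t n) : n_(n), data_(n * n, 0.0) {}
    std::size_t size() const { return n_; }
    double& operator()(std::size_t i, std::size_t j) { return data_[i * n_ + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data_[i * n_ + j]; }

    Matrix multiply(const Matrix& rhs) const {
        Matrix out(n_);
        matmul_kernel(data_.data(), rhs.data_.data(), out.data_.data(), n_);
        return out;
    }
private:
    std::size_t n_;
    std::vector<double> data_;
};
```

Because the abstraction boundary is crossed once per call, the wrapper overhead is amortized over the whole kernel, which is why this approach only pays off for large chunks of code.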
