
Benchmarking FORTRAN / C / C++

Benchmarking FORTRAN / C / C++. For LINUX 2.0 (Debian and Red Hat), and Windows NT 4.0. C. Leggett. Compilers. Linux: Debian 2.0 g++: egcs-2.91.57 19980901 (egcs-1.1 release) KCC: 3.3c -- June 24, 1998 f77 g77: egcs-2.91.57 19980901 (egcs-1.1 release) Linux: Red Hat 5.1


Presentation Transcript


  1. Benchmarking FORTRAN / C / C++ • For Linux 2.0 (Debian and Red Hat) and Windows NT 4.0 • C. Leggett

  2. Compilers • Linux: Debian 2.0 • g++: egcs-2.91.57 19980901 (egcs-1.1 release) • KCC: 3.3c -- June 24, 1998 • f77 • g77: egcs-2.91.57 19980901 (egcs-1.1 release) • Linux: Red Hat 5.1 • g++: egcs-2.90.27 980315 (egcs-1.0.2 release) • g++: egcs-2.91.57 19980901 (egcs-1.1 release) • KCC: 3.3c -- June 24, 1998 • g77: egcs-2.90.27 980315 (egcs-1.0.2 release) • Windows NT 4.0 (sp 3) • Micro$loth Visual C++ v6.0

  3. Haney Kernels • Measures relative performance of FORTRAN, C, and C++ • C code is compiled by C++ compiler • 3 Kernels: • Complex Matrix Multiply • Use complex classes and operator overloading • Real Matrix Multiply • Use real matrix classes with storage management and indexing • Vector Operations • Use array classes and operator overloading
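The contrast the vector-operations kernel measures can be sketched as follows. This is a minimal illustration, not code from the Haney kernels: the `Vec` class and the `scale_add_c` function are hypothetical names, and the sketch assumes a C++11 compiler rather than the 1998 compilers listed above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical array class, written for illustration only.
class Vec {
public:
    explicit Vec(std::size_t n, double v = 0.0) : data_(n, v) {}
    std::size_t size() const { return data_.size(); }
    double& operator[](std::size_t i) { return data_[i]; }
    double operator[](std::size_t i) const { return data_[i]; }
private:
    std::vector<double> data_;
};

// C++ style: every overloaded operator builds and returns a temporary Vec.
Vec operator+(const Vec& a, const Vec& b) {
    assert(a.size() == b.size());
    Vec r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = a[i] + b[i];
    return r;
}

Vec operator*(double s, const Vec& a) {
    Vec r(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) r[i] = s * a[i];
    return r;
}

// C style: one fused loop over raw pointers, no temporaries.
void scale_add_c(double s, const double* x, const double* y,
                 double* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = s * x[i] + y[i];
}
```

With these definitions, `Vec z = 3.0 * x + y;` reads like the mathematics but allocates two temporaries, while `scale_add_c(3.0, xs, ys, out, n)` computes the same values in a single pass.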

  4. Haney Kernels: Red Hat (egcs 1.0)

  5. Haney Kernels: Debian (egcs 1.1)

  6. Haney Kernels • FORTRAN is usually faster than C, which is usually faster than C++ • g77 is faster than f77, which makes heavy use of f2c • Debian’s version of g++ (egcs-1.1) is more recent and considerably faster than the Red Hat 5.1 version (egcs-1.0.2) • The KAI compiler is not all it’s cracked up to be

  7. Bench++ Suite • Written by Joe Orost <joseph.orost@att.com> http://www.research.att.com/~orost/bench_plus_plus.html • Measures the cost of various C++ features, and compares C and C++ performance • Incorporates many ‘standard’ benchmarks such as the Dhrystone, Whetstone, Hennessy, OOPACK, and Stepanov benchmarks

  8. Bench++ • Dhrystone • Whetstone • Hennessy benchmarks (11)

  9. Bench++: Composite • Tracker (float) • Tracker (double) • Tracker (float + int) • Orbit • Kalman • Centroid

  10. Bench++: Dynamic Allocation • malloc & free: 1000 ints • malloc & init & free: 1000 ints • new & delete: 1000 ints • new & init & delete: 1000 ints • alloca: 1000 ints (FAIL) • alloca & init: 1000 ints
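A rough sketch of how the "malloc & init & free" and "new & init & delete" cases might be timed is shown below. It is not the Bench++ source: the function names and the `seconds_per_call` helper are made up for this example, and it uses `std::chrono` from C++11, which the compilers in this study did not have.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>

// Allocate n ints with malloc, initialize them, free them; return their sum.
long malloc_init_free(std::size_t n) {
    int* p = static_cast<int*>(std::malloc(n * sizeof(int)));
    if (p == nullptr) return -1;
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        p[i] = static_cast<int>(i);
        sum += p[i];
    }
    std::free(p);
    return sum;
}

// The same work using new[] and delete[].
long new_init_delete(std::size_t n) {
    int* p = new int[n];
    long sum = 0;
    for (std::size_t i = 0; i < n; ++i) {
        p[i] = static_cast<int>(i);
        sum += p[i];
    }
    delete[] p;
    return sum;
}

// Hypothetical timing helper: average wall-clock seconds per call of f.
template <typename F>
double seconds_per_call(F f, int reps) {
    assert(reps > 0);
    // Storing into a volatile makes it less likely that the optimizer
    // discards the calls; it is not a guarantee.
    volatile long sink = 0;
    const auto t0 = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) sink = f();
    const auto t1 = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration<double>(t1 - t0).count() / reps;
}
```

A call such as `seconds_per_call([] { return new_init_delete(1000); }, 10000)` gives one number per case; the ratio between cases is the quantity the later comparison slides report.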

  11. Bench++: Exceptions • Local exception caught • class method exception caught • procedure exception caught: 3 deep • procedure exception caught: 4 deep • declared proc exception caught: 4 deep • proc exception caught: 4 deep, re-thrown at each level • proc exception caught: implemented using setjmp/longjmp
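The "n deep" exception case and its setjmp/longjmp counterpart can be sketched like this. The function names are invented for illustration and are not taken from Bench++. Note that `longjmp` does not run destructors, so this form is only safe when the skipped frames hold no objects with non-trivial destructors, as is the case here.

```cpp
#include <cassert>
#include <csetjmp>
#include <stdexcept>

static std::jmp_buf g_env;  // jump target shared by the longjmp variant

// Recurse `depth` levels, then throw.
void throw_at_depth(int depth) {
    assert(depth >= 0);
    if (depth == 0) throw std::runtime_error("deep");
    throw_at_depth(depth - 1);
}

// Catch an exception thrown `depth` calls below this frame.
bool caught_exception(int depth) {
    try {
        throw_at_depth(depth);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}

// Recurse `depth` levels, then jump straight back to the setjmp point.
void jump_at_depth(int depth) {
    assert(depth >= 0);
    if (depth == 0) std::longjmp(g_env, 1);
    jump_at_depth(depth - 1);
}

// setjmp returns 0 when called directly and nonzero after a longjmp.
bool caught_longjmp(int depth) {
    if (setjmp(g_env) != 0) return true;
    jump_at_depth(depth);
    return false;
}
```

Timing `caught_exception(4)` against `caught_longjmp(4)` in a loop would give a comparison in the spirit of the last bullet above.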

  12. Bench++: Coding Style • boolean assignment • boolean if • 2-way if/else • 2-way switch • 10-way if/else • 10-way switch • 10-way sparse switch • 10-way virtual function call
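The three dispatch styles compared here can be illustrated with a reduced 3-way example (the benchmark itself uses 10 ways). All names below are hypothetical, and `override` assumes C++11.

```cpp
#include <cassert>

// Dispatch by if/else chain: cases are tested one after another.
int via_if_else(int k) {
    if (k == 0) return 10;
    else if (k == 1) return 11;
    else if (k == 2) return 12;
    else return -1;
}

// Dispatch by switch: dense cases may compile to a jump table.
int via_switch(int k) {
    switch (k) {
        case 0: return 10;
        case 1: return 11;
        case 2: return 12;
        default: return -1;
    }
}

// Dispatch by virtual function: one indirect call through the vtable.
struct Op {
    virtual ~Op() {}
    virtual int value() const = 0;
};
struct Op0 : Op { int value() const override { return 10; } };
struct Op1 : Op { int value() const override { return 11; } };
struct Op2 : Op { int value() const override { return 12; } };

int via_virtual(const Op& op) {
    return op.value();
}
```

All three return the same values for the same selector; only the dispatch mechanism, and therefore the cost being measured, differs.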

  13. Bench++: I/O Timing • iostream.getline: 20 char buffer • iostream.>>: 20 chars in loop • iostream.<<: 20 char buffer • iostream.<<: 20 chars in loop • istrstream.>>: int • istrstream.>>: float • fstream.open/fstream.close
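The first two input cases differ only in how the 20 characters are extracted, which a short sketch can make concrete. The function names are hypothetical, and `std::istringstream` stands in for the input stream here as an assumption; the benchmark's own setup is not reproduced.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Read one line into a fixed buffer with a single getline call.
std::string read_line_buffered(std::istream& in) {
    char buf[32];
    in.getline(buf, sizeof buf);  // always null-terminates buf
    return std::string(buf);
}

// Read `count` characters one at a time with operator>>.
// Note that operator>> skips whitespace before each character.
std::string read_char_by_char(std::istream& in, int count) {
    assert(count >= 0);
    std::string s;
    char c;
    for (int i = 0; i < count && (in >> c); ++i) s += c;
    return s;
}
```

Both functions return the same string for a 20-character line without spaces; the second pays the per-call overhead of formatted extraction 20 times.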

  14. Bench++: Machine Level Features • Packed bit arrays • unpacked bit arrays • packed bit ops in loop • unpacked bit ops in loop • int conversion • 10 float conversion • bit fields • bit fields and packed bit arrays • pack and unpack class objects

  15. Bench++: Loop Overhead • “for” loop • “while” loop • infinite loop w/ break • 5-iteration loop

  16. Bench++: Optimizer Performance • Constant propagation • local common sub-expression • global common sub-expression • unnecessary copy • code motion • induction variable • reduction in strength • dead code • loop jamming • redundant code • unreachable code • string ops

  17. Bench++: Procedure Calls • procedure call: no args • procedure call: no args, catches exceptions • static class method call: no args, catches exceptions • inline procedure call: no args • static class method call: 1 int arg: catches exception • static class method call: 1 int *arg: catches exception • static class method call: 1 int &arg: catches exception • procedure call: no pars, called thru pointer, catches exception • procedure call: 10 int arg: catches exception • procedure call: 20 int arg: catches exception • procedure call: 10 (3-int) arg: catches exception • procedure call: 20 (3-int) arg: catches exception • class method call: 1 this arg: catches exception • virtual class method call: 1 this arg: catches exception • virtual const class method call: 1 this arg: catches exception • ibid, called in loop to check lookup optimization

  18. Bench++: Abstraction • max: C++ style • max: C style • matrix: C++ style • matrix: C style • iterator: C++ style • iterator: C style • complex: C++ style • complex: C style • Stepanov C++ Abstraction

  19. Bench++ • float matrix multiply vs integer • double covariance matrix vs float • float & int covariance matrix vs float • new/delete vs malloc/free

  20. Bench++ • 4 deep exception handled vs 3 deep • declared exception handled vs not declared • 4 deep rethrown exception vs 4 deep • 4 deep setjmp/longjmp vs 4 deep exception

  21. Bench++ • if test vs logical equation • 2-way switch vs 2-way if/else • 10-way switch vs 10-way if/else • 10-way sparse switch vs 10-way if/else • 10-way sparse switch vs 10-way switch • 10-way virtual function vs 10-way switch

  22. Bench++ • 20-iostream.>> vs 20 char iostream.getline & gcount • 20-iostream.<< vs 20 char iostream.<< • istrstream.>> a float from local string vs int • boolean operations on bit arrays vs byte arrays • boolean operations on bits in loop vs bit arrays • boolean operations on bytes in loop vs byte arrays • while loop vs for loop • simple loop w/break vs for loop

  23. Bench++ • Constant Propagation • Local Common-sub • Global Common-sub • Unnecessary copy • Code Motion • Induction Variable • Reduction in Strength • Dead Code • Loop Jamming • Redundant Code • Unreachable Code • Hand optimized vs compiler optimized (higher is better)
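A "hand optimized vs compiler optimized" pair might look like the following sketch, which combines code motion and reduction in strength in one loop. The function names are invented for this example and the loop body is not taken from Bench++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// As written: a * b is loop-invariant and i * 4 is a multiply per iteration.
// A good optimizer should hoist the former and strength-reduce the latter.
long sum_as_written(const std::vector<long>& x, long a, long b) {
    long s = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        s += x[i] * (a * b) + static_cast<long>(i) * 4;
    }
    return s;
}

// Hand optimized: the invariant product is computed once (code motion)
// and the multiply by 4 becomes a running addition (reduction in strength).
long sum_hand_optimized(const std::vector<long>& x, long a, long b) {
    const long ab = a * b;
    long offset = 0;
    long s = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        s += x[i] * ab + offset;
        offset += 4;
    }
    return s;
}
```

If the compiler performs both transformations itself, the two functions should run at about the same speed; a large gap suggests the optimization was missed.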

  24. Bench++

  25. Bench++ • Static class method call vs local procedure call • Inline procedure call vs inlineable local procedure call • Static class method call w/ 1 int* par vs int par • Static class method call w/ 1 int& par vs int par • Call thru a procedure variable vs local procedure call • Static class method call w/ 10 int pars vs 1 int par • Static class method call w/ 20 int pars vs 1 int par • Static class method call w/ 10 3-int pars vs 10 1-int pars • Static class method call w/ 20 3-int pars vs 20 1-int pars • Class method call w/ this par vs static class method call w/ int par • Virtual class method call vs class method call • Virtual const class method call vs class method call • Loop of virtual const class method call vs no loop

  26. Bench++ • C++ style Max vs C style • C++ style Matrix vs C style • C++ style Iterator vs C style • C++ style Complex vs C style

  27. Bench++ • Stepanov abstraction level n (12 .. 1) vs level 0

  28. Bench++: Stepanov Abstraction • Level 0: use a simple Fortran-like loop • Level 1,3,5,7,9,11: use doubles • Level 2,4,6,8,10,12: use Double (a double wrapped in a class) • Level 1,2: use regular pointers • Level 3,4: use pointers wrapped in a class • Level 5,6: use pointers wrapped in a reverse-iterator adapter • Level 7,8: use wrapped pointers wrapped in a reverse-iterator adapter • Level 9,10: use pointers wrapped in a reverse-iterator adapter wrapped in a reverse-iterator adapter • Level 11,12: use wrapped pointers wrapped in a reverse-iterator adapter wrapped in a reverse-iterator adapter
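Three of these levels can be sketched in a few lines. This is an illustration in the spirit of the Stepanov benchmark, not its source: the function names are made up, and `std::reverse_iterator` from the standard library stands in for the benchmark's own adapter.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <numeric>

// A double wrapped in a class, as used by the even-numbered levels.
struct Double {
    double value;
    Double(double v = 0.0) : value(v) {}
};
inline Double operator+(Double a, Double b) {
    return Double(a.value + b.value);
}

// Level 0: simple Fortran-like loop.
double sum_level0(const double* first, std::size_t n) {
    double r = 0.0;
    for (std::size_t i = 0; i < n; ++i) r += first[i];
    return r;
}

// Roughly level 2: Double elements reached through regular pointers.
double sum_wrapped_double(const Double* first, const Double* last) {
    return std::accumulate(first, last, Double(0.0)).value;
}

// Roughly level 5: doubles reached through a reverse-iterator adapter.
double sum_reverse_iterator(const double* first, const double* last) {
    typedef std::reverse_iterator<const double*> Rev;
    return std::accumulate(Rev(last), Rev(first), 0.0);
}
```

Each function adds the same numbers; the benchmark reports how much slower the abstract versions run than the level 0 loop.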

  29. Bench++: Some Conclusions • There is a significant increase in speed between versions 1.0 and 1.1 of egcs. • vC++ handles exceptions very well; g++ and KCC do not, though KCC does better at declared exceptions. • alloca is much faster than new/delete. Too bad you can’t use it portably. • switch is much faster than if/else, but virtual functions don’t have much of a penalty. • KCC does not do well with I/O (cf. g++), but reading characters as strings (iostream.getline) improves performance. • KCC has trouble optimizing some simple loop structures. • KCC handles procedure calls very badly, but shows a lower overhead than g++ or vC++ when large numbers of parameters are passed. • The C++ compilers on the Linux platforms do not optimize such things as dead and redundant code, code motion, and local common sub-expressions. • Abstraction has serious performance consequences, but KCC tends to handle the complex class well.

  30. Observations on VC++ • Visual C++ 6.0 is not obviously superior. Neither is it incredibly inferior. Much like the other compilers, it does well in some areas and poorly in others: • vC++ handles abstraction and procedure calls badly • vC++ handles exceptions well • vC++ does I/O well, except for opening and closing files • vC++ handles simple optimization well (i.e. constant propagation, common sub-expressions, redundant code, etc.)

  31. Alternatives to Improving Compiler Optimization • Decrease the level of abstraction - write kernels in low level languages such as C or Fortran • Put C++ wrappers around low level kernels to retain some advantages of C++. Only useful for large chunks of code. • Use macros and templates cleverly. Details at: http://annwm.lbl.gov/~leggett/bench/
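The second alternative, a C++ wrapper around a low-level kernel, could look something like this minimal sketch. The `Matrix` class and `matmul_kernel` function are hypothetical; in practice the kernel might live in a separate C or Fortran file, whereas here it is plain C-style code in the same translation unit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Low-level kernel, C style: c = a * b for row-major n x n matrices.
void matmul_kernel(const double* a, const double* b, double* c,
                   std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            double s = 0.0;
            for (std::size_t k = 0; k < n; ++k) s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
    }
}

// Thin C++ wrapper: owns the storage and offers indexing and operator*,
// but hands the whole multiplication to the kernel in a single call.
class Matrix {
public:
    explicit Matrix(std::size_t n) : n_(n), data_(n * n, 0.0) {}
    std::size_t size() const { return n_; }
    double& operator()(std::size_t i, std::size_t j) { return data_[i * n_ + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data_[i * n_ + j]; }

    Matrix operator*(const Matrix& rhs) const {
        assert(n_ == rhs.n_);
        Matrix out(n_);
        matmul_kernel(data_.data(), rhs.data_.data(), out.data_.data(), n_);
        return out;
    }

private:
    std::size_t n_;
    std::vector<double> data_;
};
```

The abstraction cost is paid once per `operator*` call rather than once per element, which is why the slide notes that this approach only helps for large chunks of work.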
