
Compiling and Using the “best” R

Compiling and Using the “best” R. Vipin Sachdeva, IBM Computational Science Division. Improving R performance through hardware (number of cores etc.; Intel quad-core Q6600 @ 2.4 GHz), compilers (Intel versus GNU), and compiler flags (unoptimized versus optimized).


Presentation Transcript


  1. Compiling and Using the “best” R
     • Vipin Sachdeva, IBM Computational Science Division

  2. Improving R performance
     • Performance improvements:
       • Hardware (number of cores etc.)
         • Intel quad-core Q6600 @ 2.4 GHz
       • Compilers
         • Intel versus GNU
         • Compiler flags (unoptimized versus optimized)
       • Libraries (BLAS)
         • netlib BLAS, GotoBLAS2, Intel MKL, Intel MKL-SMP

  3. Benchmark for R
     • R-benchmark-25.R
       • http://r.research.att.com/benchmarks/R-benchmark-25.R
     • Measures timings for
       • B = A' * A
       • C = A / B'
       • Eigenvalues, determinant, Cholesky, inverse (BLAS)
     • Needs the SuppDists package
     • Run with: ./Rscript --vanilla R-benchmark-25.R

  4. Base R
     • ./configure --prefix=/home/vsachde/R-install
     • Summary printed by configure:
         Source directory:          .
         Installation directory:    /home/vsachde/R-project/all-R/GNU-R/R-native-unoptimized
         C compiler:                gcc -std=gnu99 -g -O2
         Fortran 77 compiler:       gfortran -g -O
         C++ compiler:              g++ -g -O2
         Fortran 90/95 compiler:    gfortran -g -O
         Obj-C compiler:
         Interfaces supported:      X11, tcltk
         External libraries:        readline
         Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
         Options enabled:           static R library, shared BLAS, R profiling, Java
         Recommended packages:      yes
     • Callouts on the slide: compiler flags; GNU compilers; external libraries being used.

  5. Somewhat Optimized R
     • export optim_flags="-O3 -funroll-loops -ffast-math -march=core2"
     • CC="gcc" CFLAGS="$optim_flags" CXX="g++" CXXFLAGS="$optim_flags" F77="gfortran" FFLAGS="$optim_flags" FC="gfortran" FCFLAGS="$optim_flags" ./configure --prefix=$installdir
     • The configure summary then reports:
         C compiler:                gcc -std=gnu99 -O3 -funroll-loops -ffast-math -march=core2
         Fortran 77 compiler:       gfortran -O3 -funroll-loops -ffast-math -march=core2
         C++ compiler:              g++ -O3 -funroll-loops -ffast-math -march=core2
         Fortran 90/95 compiler:    gfortran -O3 -funroll-loops -ffast-math -march=core2
     • Compilers can be changed through the variables CC, CXX, F77
     • CC=icc CXX=icpc F77=ifort will use the Intel compilers.
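The compiler switch described on slide 5 can be sketched as a small dry-run helper. This is only an illustration: `r_configure_cmd` is a hypothetical function name, not part of R's build system, and it prints the command instead of running it. Flags are passed through unchanged, so flags written for GCC may need adjusting before they are given to the Intel compilers.

```shell
#!/bin/sh
# Sketch: print the configure command from slide 5 for a compiler family.
# r_configure_cmd is a hypothetical helper; nothing is executed.
r_configure_cmd() {
    family=$1   # "gnu" or "intel"
    flags=$2    # optimization flags, applied to all four compilers
    prefix=$3   # installation directory
    case $family in
        gnu)   cc=gcc; cxx=g++;  fc=gfortran ;;
        intel) cc=icc; cxx=icpc; fc=ifort ;;
        *) echo "unknown compiler family: $family" >&2; return 1 ;;
    esac
    printf 'CC="%s" CFLAGS="%s" CXX="%s" CXXFLAGS="%s" F77="%s" FFLAGS="%s" FC="%s" FCFLAGS="%s" ./configure --prefix=%s\n' \
        "$cc" "$flags" "$cxx" "$flags" "$fc" "$flags" "$fc" "$flags" "$prefix"
}

# Example: the GNU build from the slide, installed under $HOME/R-install
# (the prefix here is an assumed location).
optim_flags="-O3 -funroll-loops -ffast-math -march=core2"
r_configure_cmd gnu "$optim_flags" "$HOME/R-install"
```

Printing the command first makes it easy to compare the GNU and Intel variants before committing to a long build. Note that -ffast-math relaxes IEEE floating-point semantics, so numerical results may differ slightly from an unoptimized build.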

  6. Linking external BLAS with R
     • R uses unoptimized routines for linear algebra unless it is linked with an external BLAS.
     • ./configure --with-blas=<location of BLAS lib> (configure then tries to link the BLAS library)
     • Various sources of BLAS
       • Netlib BLAS: generic and unoptimized
       • GotoBLAS2: optimized and multi-threaded
       • Intel MKL: optimized library from Intel (sequential)
       • Intel MKL-SMP: multi-threaded
       • Many others, including ACML and ATLAS.
     • Kernel performance varies with the BLAS library used.

  7. Linking external BLAS with R
     • If everything goes well, the configure summary lists BLAS under "External libraries":
         Source directory:          .
         Installation directory:    /home/vsachde/R-project/all-R/GNU-R/R-netlib-blas
         C compiler:                gcc -std=gnu99 -O3 -funroll-loops -ffast-math -march=core2
         Fortran 77 compiler:       gfortran -O3 -funroll-loops -ffast-math -march=core2
         C++ compiler:              g++ -O3 -funroll-loops -ffast-math -march=core2
         Fortran 90/95 compiler:    gfortran -O3 -funroll-loops -ffast-math -march=core2
         Obj-C compiler:
         Interfaces supported:      X11, tcltk
         External libraries:        readline, BLAS(generic)
         Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
         Options enabled:           static R library, R profiling, Java
         Recommended packages:      yes
     • Callout on the slide: BLAS was linked in properly.
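Checking the summary by eye is easy to get wrong, because "shared BLAS" can also appear on the "Options enabled" line. A minimal sketch of an automated check follows, assuming the configure output was saved to a file (for example with `./configure ... | tee configure.out`); `has_external_blas` is a hypothetical helper name.

```shell
#!/bin/sh
# has_external_blas FILE
# Succeeds if the saved configure summary in FILE lists BLAS on its
# "External libraries:" line, as in "readline, BLAS(generic)".
# Hypothetical helper; the file name is whatever the output was saved as.
has_external_blas() {
    grep 'External libraries:' "$1" | grep -q 'BLAS('
}
```

A possible use after configuring: `has_external_blas configure.out && echo "external BLAS linked" || echo "external BLAS NOT linked"`.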

  8. Linking external BLAS with R
     • What does --with-blas do?
       • configure builds a small test program (conftest) and tries to link it against dgemm_ from the given library:
           configure:28567: checking for dgemm_ in /home/vsachde/R-project/all-blas/GNU-blas/netlib-blas/libblas_GNU.a
           configure:28588: gcc -std=gnu99 -o conftest -g -O2 -I/usr/local/include -L/usr/local/lib64 conftest.c /home/vsachde/R-project/all-blas/GNU-blas/netlib-blas/libblas_GNU.a -lgfortran -lm -ldl -lm >&5
           configure:28595: result: yes
     • If the above linking step fails
       • The installation won't fail, but BLAS will not be linked in.
       • The summary at the end won't show external BLAS linking.
       • Search for dgemm in config.log and look for errors.
     • Advice: compile static libraries, as they are easier to link.
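The advice to search config.log for dgemm can be sketched as a small helper that prints the outcome of the dgemm_ check. It assumes the log has the shape shown on the slide (a "checking for dgemm_" line followed later by a "result:" line); `dgemm_check_result` is a hypothetical name.

```shell
#!/bin/sh
# dgemm_check_result LOG
# Prints the text after "result:" for the first dgemm_ check in LOG
# (e.g. "yes" or "no"). Prints nothing if no such check is found.
# Hypothetical helper, based on the log layout shown on slide 8.
dgemm_check_result() {
    awk '/checking for dgemm_/ { seen = 1; next }
         seen && /result:/     { sub(/^.*result: */, ""); print; exit }' "$1"
}
```

When the result is not "yes", the compiler or linker errors normally sit between the two matched lines, so something like `grep -n -A 20 'checking for dgemm_' config.log` is a reasonable next step.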

  9. Linking with different BLAS
     • Netlib BLAS
       • Download the source from netlib.org; unoptimized.
     • GotoBLAS2
       • Download from the TACC website
       • Optimized and multi-threaded
       • Turn off CPU throttling to compile.
     • Intel MKL
       • Sequential and SMP
     • The linking step is the same for most BLAS libraries except the Intel ones.

  10. Linking with Intel MKL libs
     • export MKLPATH=/opt/intel/Compiler/11.1/072/mkl/lib/em64t/
     • Intel MKL sequential:
         --with-blas="-Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -Wl,--end-group -lpthread"
     • Intel MKL SMP:
         --with-blas="-Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread"
     • Intel MKL SMP and GotoBLAS2 should show performance improvements on a quad-core (run 4 threads).
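The two MKL link lines differ only in the threading library and the extra runtime libraries, which a short sketch makes explicit. `mkl_blas_arg` is a hypothetical helper; the library names are the ones shown on slide 10 and may differ in other MKL releases.

```shell
#!/bin/sh
# mkl_blas_arg MODE MKLPATH
# Prints the --with-blas argument from slide 10 for MODE "sequential" or "smp".
# Hypothetical helper; library names are copied from the slide.
mkl_blas_arg() {
    mode=$1
    mklpath=${2%/}   # drop one trailing slash to avoid "//" in paths
    case $mode in
        sequential) thread=libmkl_sequential.a;   extra="-lpthread" ;;
        smp)        thread=libmkl_intel_thread.a; extra="-liomp5 -lpthread" ;;
        *) echo "unknown mode: $mode (use sequential or smp)" >&2; return 1 ;;
    esac
    printf '%s\n' "--with-blas=-Wl,--start-group $mklpath/libmkl_intel_lp64.a $mklpath/$thread $mklpath/libmkl_core.a -Wl,--end-group $extra"
}
```

It could then be used as `./configure "$(mkl_blas_arg smp "$MKLPATH")"`. For the 4-thread runs, the thread count of OpenMP-based builds is commonly set through the OMP_NUM_THREADS environment variable, for example `OMP_NUM_THREADS=4 ./Rscript --vanilla R-benchmark-25.R`; whether a given BLAS build honors it is an assumption worth checking.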

  11. Performance – Single-thread BLAS

  12. Performance – BLAS
     • Benchmark run time went down by 15-20X (i.e., performance improved) through compilers, compiler options and hardware (4 threads)
     • Revolution R uses Intel MKL-SMP

  13. Results
     • Generic R can be optimized for performance.
     • Intel MKL libraries give the best performance, with the freely available GotoBLAS2 a close second.
     • Experiment with LAPACK as well.
     • Question: How important is performance to R users?
