Compiling and Using the “best” R
Vipin Sachdeva
IBM Computational Science Division
Improving R performance
• Performance improvements:
  • Hardware (number of cores, etc.)
    • Intel Q6600 quad-core @ 2.4 GHz
  • Compilers
    • Intel versus GNU
  • Compiler flags (unoptimized versus optimized)
  • Libraries (BLAS)
    • Netlib BLAS, GotoBLAS2, Intel MKL, Intel MKL-SMP
Benchmark for R
• R-benchmark-25.R
  • http://r.research.att.com/benchmarks/R-benchmark-25.R
• Measures timings for
  • B = A' * A
  • C = A / B'
  • Eigenvalues, determinant, Cholesky, inverse (BLAS)
• Needs the SuppDists package
• ./Rscript --vanilla R-benchmark-25.R
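Since the same benchmark is run against several R builds, the invocation above can be wrapped in a small helper. This is a minimal sketch, not part of the original slides: the function name `run_benchmark` and its argument layout are assumptions, and it only presumes that each build provides an executable `Rscript` in the directory you pass in.

```shell
# Run the benchmark script with one particular R build and save the output.
# Arguments (all hypothetical conventions, not from the slides):
#   $1 = directory that contains the Rscript executable of the build to test
#   $2 = path to R-benchmark-25.R
#   $3 = log file that receives the benchmark output
run_benchmark() {
    rscript_dir=$1
    bench_script=$2
    log_file=$3

    # Refuse to continue if this build has no Rscript executable.
    if [ ! -x "$rscript_dir/Rscript" ]; then
        echo "Rscript not found in $rscript_dir" >&2
        return 1
    fi

    # --vanilla skips saved workspaces and profile files, as in the slide.
    "$rscript_dir/Rscript" --vanilla "$bench_script" > "$log_file" 2>&1
}
```

For example, `run_benchmark "$installdir/bin" R-benchmark-25.R netlib-blas.log` would keep one log per build, which makes the timings easier to compare afterwards.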
Base R
• ./configure --prefix=/home/vsachde/R-install

  Source directory:          .
  Installation directory:    /home/vsachde/R-project/all-R/GNU-R/R-native-unoptimized
  C compiler:                gcc -std=gnu99 -g -O2
  Fortran 77 compiler:       gfortran -g -O
  C++ compiler:              g++ -g -O2
  Fortran 90/95 compiler:    gfortran -g -O
  Obj-C compiler:
  Interfaces supported:      X11, tcltk
  External libraries:        readline
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:           static R library, shared BLAS, R profiling, Java
  Recommended packages:      yes

• Slide callouts on this summary: the compiler flags, the GNU compilers, and the external libraries being used
Somewhat Optimized R
• export optim_flags="-O3 -funroll-loops -ffast-math -march=core2"
• CC="gcc" CFLAGS="$optim_flags" CXX="g++" CXXFLAGS="$optim_flags" F77="gfortran" FFLAGS="$optim_flags" FC="gfortran" FCFLAGS="$optim_flags" ./configure --prefix=$installdir

  C compiler:                gcc -std=gnu99 -O3 -funroll-loops -ffast-math -march=core2
  Fortran 77 compiler:       gfortran -O3 -funroll-loops -ffast-math -march=core2
  C++ compiler:              g++ -O3 -funroll-loops -ffast-math -march=core2
  Fortran 90/95 compiler:    gfortran -O3 -funroll-loops -ffast-math -march=core2

• Compilers can be changed through the variables CC, CXX, F77, and FC
  • CC=icc CXX=icpc F77=ifort selects the Intel compilers.
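Switching between the GNU and Intel compilers means changing the same handful of variables each time, so it can help to collect them in one place. The following is a sketch under stated assumptions: `with_compilers` is a hypothetical helper, not something R or configure provides, and the caller is responsible for passing flags that the chosen compiler actually understands.

```shell
# with_compilers FAMILY FLAGS COMMAND [ARGS...]
# Runs COMMAND with CC/CXX/F77/FC and the matching *FLAGS variables set.
#   FAMILY = "gnu" (gcc, g++, gfortran) or "intel" (icc, icpc, ifort)
#   FLAGS  = one quoted string of optimization flags, used for every language
with_compilers() {
    family=$1
    flags=$2
    shift 2

    case $family in
        gnu)   cc=gcc; cxx=g++;  fc=gfortran ;;
        intel) cc=icc; cxx=icpc; fc=ifort ;;
        *)     echo "unknown compiler family: $family" >&2; return 1 ;;
    esac

    # env passes the settings to the command only; the calling shell's
    # own environment is left untouched.
    env CC="$cc" CXX="$cxx" F77="$fc" FC="$fc" \
        CFLAGS="$flags" CXXFLAGS="$flags" FFLAGS="$flags" FCFLAGS="$flags" \
        "$@"
}
```

With this in place, the optimized GNU build from the slide could be started as `with_compilers gnu "$optim_flags" ./configure --prefix="$installdir"`.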
Linking external BLAS with R
• R uses unoptimized routines for linear algebra unless it is linked with an external BLAS.
• ./configure --with-blas=<location of BLAS lib>
  • With this option, configure tries to link the BLAS library.
• Various sources of BLAS
  • Netlib BLAS: generic and unoptimized
  • GotoBLAS2: optimized and multi-threaded
  • Intel MKL: optimized library from Intel (sequential)
  • Intel MKL-SMP: multi-threaded
  • Many others, including ACML and ATLAS
• Kernel performance varies with the library used.
Linking external BLAS with R
• If everything goes well:

  Source directory:          .
  Installation directory:    /home/vsachde/R-project/all-R/GNU-R/R-netlib-blas
  C compiler:                gcc -std=gnu99 -O3 -funroll-loops -ffast-math -march=core2
  Fortran 77 compiler:       gfortran -O3 -funroll-loops -ffast-math -march=core2
  C++ compiler:              g++ -O3 -funroll-loops -ffast-math -march=core2
  Fortran 90/95 compiler:    gfortran -O3 -funroll-loops -ffast-math -march=core2
  Obj-C compiler:
  Interfaces supported:      X11, tcltk
  External libraries:        readline, BLAS(generic)
  Additional capabilities:   PNG, JPEG, TIFF, NLS, cairo
  Options enabled:           static R library, R profiling, Java
  Recommended packages:      yes

• "External libraries" now lists BLAS(generic): BLAS was linked in properly.
Linking external BLAS with R
• What does --with-blas do?
  • configure tries to link a small test program (conftest) that references dgemm_ against the BLAS library.

  configure:28567: checking for dgemm_ in /home/vsachde/R-project/all-blas/GNU-blas/netlib-blas/libblas_GNU.a
  configure:28588: gcc -std=gnu99 -o conftest -g -O2 -I/usr/local/include -L/usr/local/lib64 conftest.c /home/vsachde/R-project/all-blas/GNU-blas/netlib-blas/libblas_GNU.a -lgfortran -lm -ldl -lm >&5
  configure:28595: result: yes

• If the above linking step fails
  • Installation won't fail, but BLAS will not be linked in.
  • The summary at the end won't show external BLAS linking.
  • Search for dgemm in config.log and look for errors.
• Advice: compile static libraries, as they are easier to link.
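Because a failed BLAS link does not stop the build, it is worth checking config.log right after configure finishes. The sketch below automates the "search for dgemm" advice. The function name `check_blas_link` is made up for illustration, and the parsing assumes the log looks like the excerpt above, i.e. a "checking for dgemm_" line followed later by a "result:" line.

```shell
# Report whether configure's dgemm_ link test succeeded.
#   $1 = path to config.log (defaults to ./config.log)
# Returns 0 on "yes", 1 on "no", 2 if no dgemm_ check was found.
check_blas_link() {
    log=${1:-config.log}

    # Find the first "checking for dgemm_" line, then print the last word
    # of the first "result:" line that follows it.
    result=$(awk '/checking for dgemm_/ { found = 1 }
                  found && /result:/   { print $NF; exit }' "$log")

    case $result in
        yes) echo "BLAS link test passed" ;;
        no)  echo "BLAS link test failed"; return 1 ;;
        *)   echo "no dgemm_ check found in $log"; return 2 ;;
    esac
}
```

If the check reports a failure, the compiler and linker messages between the "checking" and "result" lines in config.log usually show which symbol or library was missing.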
Linking with different BLAS
• Netlib BLAS
  • Download the source from netlib.org; unoptimized.
• GotoBLAS2
  • Download from the TACC website
  • Optimized and multi-threaded
  • Turn off CPU throttling to compile.
• Intel MKL
  • Sequential and SMP
• The linking step is the same for most BLAS libraries, except the Intel libraries.
Linking with Intel MKL libs
• export MKLPATH=/opt/intel/Compiler/11.1/072/mkl/lib/em64t/
• Intel MKL sequential:
  --with-blas="-Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_sequential.a $MKLPATH/libmkl_core.a -Wl,--end-group -lpthread"
• Intel MKL SMP:
  --with-blas="-Wl,--start-group $MKLPATH/libmkl_intel_lp64.a $MKLPATH/libmkl_intel_thread.a $MKLPATH/libmkl_core.a -Wl,--end-group -liomp5 -lpthread"
• Intel MKL SMP and GotoBLAS2 should show performance improvements on the quad-core machine (run 4 threads).
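The two MKL link lines differ only in the threading library and the trailing system libraries, so they can be generated from one place. This is a sketch that reproduces the link lines from the slide; `mkl_blas_arg` is a hypothetical helper name, and the library file names are taken as-is from the slide, so they may differ in other MKL releases.

```shell
# Print the value for --with-blas for a static Intel MKL link.
#   $1 = "sequential" or "smp"
#   $2 = directory holding the MKL static libraries (e.g. $MKLPATH)
mkl_blas_arg() {
    mode=$1
    mklpath=${2%/}   # drop one trailing slash so paths don't contain "//"

    case $mode in
        sequential) thread_lib=libmkl_sequential.a;   extra="-lpthread" ;;
        smp)        thread_lib=libmkl_intel_thread.a; extra="-liomp5 -lpthread" ;;
        *)          echo "unknown mode: $mode" >&2; return 1 ;;
    esac

    # The start/end group lets the linker resolve the circular
    # dependencies between the three static MKL libraries.
    printf '%s\n' "-Wl,--start-group $mklpath/libmkl_intel_lp64.a $mklpath/$thread_lib $mklpath/libmkl_core.a -Wl,--end-group $extra"
}
```

A possible call is `./configure --with-blas="$(mkl_blas_arg smp "$MKLPATH")"`. For the SMP build, the thread count is commonly set through the `OMP_NUM_THREADS` or `MKL_NUM_THREADS` environment variable, for example `OMP_NUM_THREADS=4` on the quad-core machine.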
Performance: BLAS
• Benchmark run time went down by 15-20X through compilers, compiler options, and hardware (4 threads).
• Revolution R uses Intel MKL-SMP.
Results
• Generic R can be optimized for performance.
• Intel MKL libraries give the best performance, with the freely available GotoBLAS2 a close second.
• Experiment with LAPACK as well.
• Question: How important is performance for R users?