Linear Algebra Libraries: BLAS, LAPACK, ScaLAPACK, PLASMA, MAGMA
Shirley Moore
svmoore@utep.edu
CPS5401 Fall 2013
svmoore.pbworks.com
November 12, 2013
Learning Objectives
After completing this lesson, you should be able to:
• List and describe advantages of using linear algebra libraries
• List the types of computations performed by linear algebra libraries
• Describe the functionality of the BLAS
• Locate and use documentation on linear algebra libraries for your platform
• Insert calls to linear algebra library routines into your program, then compile and run the resulting program
• Describe current research on numerical linear algebra for multicore and heterogeneous architectures
Numerical Linear Algebra
• Algorithms for performing matrix operations on computers
• Widely used in scientific, engineering, and financial applications
• Fundamental algorithms:
  • Basic matrix and vector operations
  • LU decomposition
  • QR decomposition
  • Singular value decomposition
  • Eigenvalue problems
BLAS
• Basic Linear Algebra Subprograms
• De facto standard (all implementations use the same calling interface)
• First published in 1979
• http://www.netlib.org/blas/
• BLAS Quick Reference Guide: http://www.netlib.org/lapack/lug/node145.html
• Tuned versions implemented by vendors (Intel MKL, AMD ACML, Cray LibSci, IBM ESSL)
• Routines to perform basic operations such as vector and matrix multiplication
BLAS Functionality and Levels
• Level 1: vector operations of the form y ← αx + y, as well as scalar dot products and vector norms, among other things
• Level 2: matrix-vector operations of the form y ← αAx + βy, as well as solving Tx = y for x with T triangular, among other things (see the call sketch after this list)
• Level 3: matrix-matrix operations of the form C ← αAB + βC, as well as solving B ← αT⁻¹B for triangular matrices T, among other things; this level contains the widely used General Matrix Multiply (GEMM) operation
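To make the levels concrete, here is a minimal sketch using the CBLAS C interface. The header and link names vary by implementation (the reference BLAS from netlib provides cblas.h and links with -lcblas -lblas; vendor libraries such as Intel MKL use their own headers), so treat this as a sketch under those assumptions. It exercises a Level 1 DAXPY and DDOT and a Level 2 DGEMV.

    #include <stdio.h>
    #include <cblas.h>   /* CBLAS interface; header/link names vary by vendor */

    int main(void) {
        /* Level 1: y <- alpha*x + y (DAXPY), then a dot product (DDOT) */
        double x[3] = {1.0, 2.0, 3.0};
        double y[3] = {4.0, 5.0, 6.0};
        cblas_daxpy(3, 2.0, x, 1, y, 1);       /* y = 2*x + y = {6, 9, 12} */
        double d = cblas_ddot(3, x, 1, y, 1);  /* d = x . y = 60 */

        /* Level 2: y <- alpha*A*x + beta*y (DGEMV); A is 2x3, column-major */
        double A[6] = {1, 4, 2, 5, 3, 6};      /* columns of [[1,2,3],[4,5,6]] */
        double z[2] = {0.0, 0.0};
        cblas_dgemv(CblasColMajor, CblasNoTrans, 2, 3,
                    1.0, A, 2, x, 1, 0.0, z, 1);

        printf("dot = %g, z = {%g, %g}\n", d, z[0], z[1]);  /* z = {14, 32} */
        return 0;
    }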
General Matrix Multiply (GEMM)
• xGEMM(TRANSA, TRANSB, M, N, K, ALPHA, A, LDA, B, LDB, BETA, C, LDC) computes C ← α·op(A)·op(B) + β·C, where op(X) is X or its transpose (a call sketch follows this list)
• TRANSA and TRANSB determine whether the matrices A and B are to be transposed
• M is the number of rows in matrix C and, depending on TRANSA, the number of rows in the original matrix A or its transpose
• N is the number of columns in matrix C and, depending on TRANSB, the number of columns in matrix B or its transpose
• K is the number of columns in matrix A (or its transpose) and the number of rows in matrix B (or its transpose)
• LDA, LDB, and LDC specify the leading dimension of each matrix as laid out in memory, i.e., the stride between the start of successive columns (for column-major storage) or rows (for row-major storage)
• Precision prefix (x): S for single, D for double, C for single complex, Z for double complex
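A minimal Level 3 sketch through the CBLAS interface (again assuming cblas.h and reference-BLAS link names; vendors differ): it computes C ← A·B for a 2×3 matrix A and a 3×2 matrix B in row-major storage, so M=2, N=2, K=3, and the leading dimensions are the row widths.

    #include <stdio.h>
    #include <cblas.h>   /* CBLAS interface; with MKL include mkl.h instead */

    int main(void) {
        /* C <- 1.0*A*B + 0.0*C with A (2x3), B (3x2), C (2x2), row-major */
        double A[6] = {1, 2, 3,
                       4, 5, 6};
        double B[6] = { 7,  8,
                        9, 10,
                       11, 12};
        double C[4] = {0, 0,
                       0, 0};
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 3,            /* M, N, K */
                    1.0, A, 3,          /* ALPHA, A, LDA */
                    B, 2,               /* B, LDB */
                    0.0, C, 2);         /* BETA, C, LDC */
        /* expect C = {58, 64, 139, 154} */
        printf("C = {%g, %g, %g, %g}\n", C[0], C[1], C[2], C[3]);
        return 0;
    }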
LAPACK
• Linear Algebra PACKage
• www.netlib.org/lapack/
• De facto standard
• Successor to the linear equations and linear least-squares routines of LINPACK and the eigenvalue routines of EISPACK
• Routines for solving systems of linear equations, linear least-squares problems, eigenvalue problems, and singular value decomposition
• Routines implementing the associated matrix factorizations such as LU, QR, Cholesky, and Schur decompositions (a Cholesky sketch follows this list)
• Handles real and complex matrices in both single and double precision
• Depends on the BLAS to exploit caches effectively on modern cache-based architectures
• Tuned versions implemented in vendor libraries (e.g., AMD ACML, Intel MKL, Cray LibSci, IBM ESSL)
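As a small illustration of one of the factorizations listed above, this sketch calls DPOTRF through the LAPACKE C interface (shipped with the reference LAPACK; header and link names vary by platform) to compute the Cholesky factor of a symmetric positive definite matrix. The matrix values are made up for illustration.

    #include <stdio.h>
    #include <lapacke.h>  /* LAPACKE C interface to LAPACK; availability varies */

    int main(void) {
        /* Factor the SPD matrix A = L*L^T in place (DPOTRF) */
        double A[9] = {4, 2, 2,
                       2, 5, 3,
                       2, 3, 6};   /* row-major 3x3, symmetric positive definite */
        lapack_int info = LAPACKE_dpotrf(LAPACK_ROW_MAJOR, 'L', 3, A, 3);
        if (info != 0) { fprintf(stderr, "dpotrf failed: %d\n", (int)info); return 1; }
        /* the lower triangle of A now holds L */
        printf("L[0][0] = %g\n", A[0]);  /* sqrt(4) = 2 */
        return 0;
    }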
LAPACK Naming Scheme
A LAPACK subroutine name has the form pmmaaa, where:
• p is a one-letter code denoting the type of numerical constants used: S and D stand for real floating-point arithmetic in single and double precision, while C and Z stand for complex arithmetic in single and double precision
• mm is a two-letter code denoting the kind of matrix expected by the algorithm. The actual data are stored in a different format depending on the kind; e.g., for the code DI (diagonal) the subroutine expects a vector of length n containing the elements on the diagonal, while for the code GE (general) it expects an n×n array containing the entries of the matrix
• aaa is a one- to three-letter code describing the algorithm implemented in the subroutine; e.g., SV denotes a subroutine to solve a linear system, while R denotes a rank-1 update
• For example, the subroutine to solve a linear system with a general (non-structured) matrix using real double-precision arithmetic is called DGESV (see the sketch after this list)
• For details, see the LAPACK Users' Guide at www.netlib.org/lapack/lug/
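Putting the naming scheme to work, here is a minimal sketch that calls DGESV (D = double precision, GE = general matrix, SV = solve) through the LAPACKE C interface; link flags and header location vary by platform, and the matrix and right-hand side are made-up illustration values.

    #include <stdio.h>
    #include <lapacke.h>  /* LAPACKE C interface; link names vary by platform */

    int main(void) {
        /* Solve A*x = b; A is overwritten by its LU factors, b by x */
        double A[9] = {3, 1, 2,
                       6, 3, 4,
                       3, 1, 5};   /* row-major 3x3 general matrix */
        double b[3] = {0, 1, 3};
        lapack_int ipiv[3];        /* pivot indices from the LU factorization */
        lapack_int info = LAPACKE_dgesv(LAPACK_ROW_MAJOR, 3, 1, A, 3, ipiv, b, 1);
        if (info != 0) { fprintf(stderr, "dgesv failed: %d\n", (int)info); return 1; }
        printf("x = {%g, %g, %g}\n", b[0], b[1], b[2]);  /* expect {-1, 1, 1} */
        return 0;
    }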
ACML
• AMD Core Math Library
• http://developer.amd.com/tools/cpu-development/amd-core-math-library-acml/
• ACML consists of the following main components:
  • A full implementation of the Level 1, 2, and 3 Basic Linear Algebra Subprograms (BLAS), with optimizations for AMD Opteron processors
  • A full suite of Linear Algebra (LAPACK) routines
  • A comprehensive suite of Fast Fourier Transforms (FFTs) in single, double, single-complex, and double-complex data types
  • Fast scalar, vector, and array transcendental math routines
  • Random number generators in both single and double precision
• Installed in /shared/acml-5.0.0 on Griffin
ScaLAPACK
• Scalable Linear Algebra PACKage
• www.netlib.org/scalapack/
• Library of high-performance linear algebra routines for distributed-memory parallel machines
• Solves dense and banded linear systems, least-squares problems, eigenvalue problems, and singular value problems
• Key ideas:
  • Block-cyclic data distribution for dense matrices and block data distribution for banded matrices, parameterizable at runtime (illustrated in the sketch after this list)
  • Block-partitioned algorithms to ensure high levels of data reuse
• Efficient low-level communication implemented by the BLACS (Basic Linear Algebra Communication Subprograms)
• Will run on any machine with BLAS, LAPACK, and BLACS
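To make the block-cyclic idea concrete, here is a self-contained sketch in plain C (no ScaLAPACK calls; the 2×3 process grid and block sizes are made-up illustration parameters) that computes which process in a Pr×Pc grid owns global matrix entry (i, j) under the 2-D block-cyclic mapping ScaLAPACK uses: blocks along each dimension are dealt out round-robin to the process rows and columns.

    #include <stdio.h>

    /* 2-D block-cyclic ownership: blocks of size nb along a dimension are
       dealt out cyclically to the nprocs processes in that dimension. */
    static int owner(int global, int nb, int nprocs) {
        return (global / nb) % nprocs;
    }

    int main(void) {
        int Pr = 2, Pc = 3;      /* hypothetical 2x3 process grid */
        int mb = 2, nb = 2;      /* hypothetical row/column block sizes */
        int n = 8;               /* 8x8 global matrix */
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++)
                printf("(%d,%d) ", owner(i, mb, Pr), owner(j, nb, Pc));
            printf("\n");       /* prints the owning process of each entry */
        }
        return 0;
    }

This distribution balances load, since every process eventually touches every region of the matrix, while keeping local blocks large enough for efficient Level 3 BLAS on each node.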
Current Efforts
• Parallel Linear Algebra Software for Multicore Architectures (PLASMA)
  • www.netlib.org/plasma/
  • icl.cs.utk.edu/plasma/
• Matrix Algebra on GPU and Multicore Architectures (MAGMA)
  • icl.cs.utk.edu/magma/
• OpenBLAS
  • http://c2.com/cgi/wiki?OpenBlas