180 likes | 344 Views
LAPACK3E – A Fortran 90-enhanced version of LAPACK. Edward Anderson SAIC Technology Solutions Anderson.Edward@epa.gov August 7, 2003. LAPACK overview. A Fortran 77 library for numerical linear algebra operations Developed 1988-1992 with support from NSF and DOE
E N D
LAPACK3E – A Fortran 90-enhanced version of LAPACK Edward Anderson SAIC Technology Solutions Anderson.Edward@epa.gov August 7, 2003
LAPACK overview • A Fortran 77 library for numerical linear algebra operations • Developed 1988-1992 with support from NSF and DOE • LAPACK Users’ Guide published by SIAM • Several updates: LAPACK 2 (1995), LAPACK 3 (1999) • Follow-on packages: ScaLAPACK (1997), LAPACK95 (2000) • 20 million hits at www.netlib.org/lapack • Adopted by NAG, IMSL, and Mathworks • Incorporated into scientific libraries by Cray, SGI, Sun, HP, Digital, … • … but not IBM
Why not IBM? • ESSL has much of the same functionality as LAPACK • ESSL was developed concurrently and released first • 23 routines from LAPACK have been adopted into ESSL • Several unfortunate name space clashes Example: SGEEV finds the eigenvalues and eigenvectors of a real general matrix • Same name in both LAPACK and ESSL • Same operation • Different argument lists
Why not use LAPACK 3? • LAPACK 3 has some known bugs. • The last update was in 2000. • LAPACK 3 is not thread safe. • Differences in default data sizes not handled well by the Fortran 77 interface: • SGESV on Cray T3E becomes DGESV on IBM SP • But could be SGESV if you compile with –qrealsize=8 • Fortran 90 interfaces in LAPACK95 require features that were incompletely implemented in LAPACK 3.
What is LAPACK3E? Project to integrate my performance enhancements with LAPACK 3 using LAPACK95-style generic interfaces Suggested performance improvements LAPACK 3 (1999) LAPACK95 (2000) Thread safety & other good stuff Recent bug fixes LAPACK3E (2002)
Key LAPACK3E features • Thread safety (remove SAVE statements) • Parameterized constants • Common source for single and double precision • Generic interfaces as in LAPACK95 • Fully compatible with LAPACK 3 • Working towards full compatibility with LAPACK95 LAPACK3E must be built using Fortran 90!
SAVE statements in LAPACK Two contexts: • Reverse communication Add arguments to calling list and rename • Computed constants, e.g., LOGICAL FIRST DATA FIRST / .TRUE. / SAVE FIRST, … IF( FIRST ) THEN … FIRST = .FALSE. END IF Make the computed constants parameters to reduce overhead Compute constants first time only to reduce overhead
Replicated constants The LAPACK auxiliary routine SLAMCH computes floating point model parameters that are intrinsics in Fortran 90: EPS = SLAMCH( ‘Epsilon’ ) @ EPSILON( 1.0 ) SAFMIN = SLAMCH( ‘Safe minimum’ ) @ TINY( 1.0 ) SAFMAX = SLAMCH( ‘Overflow’ ) @ HUGE( 1.0 ) Many scaling parameters are used repeatedly, but not defined consistently. SMLNUM is variously computed as SAFMIN SAFMIN*( N / EPS ) SAFMIN*REAL( MAX( 1, N ) ) SQRT( SAFMIN / EPS ) SAFMIN / EPS SQRT( SAFMIN ) / EPS
New module: LA_CONSTANTS (also: LA_CONSTANTS32 for 32-bit constants) KIND for floating point data: WP (=8 for LA_CONSTANTS, =4 for LA_CONSTANTS32) Floating point real and complex constants: ZERO, ONE, TWO, CZERO, CONE, … Floating point model parameters: EPS, SAFMIN, SAFMAX, ULP Scaling constants derived from model parameters: SMLNUM, BIGNUM, RTMIN, RTMAX Character prefixes of type-specific names for error handling: SPREFIX, CPREFIX
Common source for different KINDs For common source LAPACK subroutines: • Include a file of preprocessor name mappings and USE the LA_CONSTANTS module: #include “lapacknames.inc” SUBROUTINE SGETRF( … ) USE LA_CONSTANTS • Parameterize all declarations by KIND: REAL(WP) instead of “REAL” or “DOUBLE PRECISION” • Parameterize all floating-point constants by KIND Most defined in LA_CONSTANTS, others specified as, e.g., REAL(WP), PARAMETER :: FUDGE = 2.8_WP Rename sgetrf.f to sgetrf.F to use the preprocessor
Compiling two versions from one source On IBM, files with .F extension invoke the preprocessor: xlf –WF,-DLA_REALSIZE=4 –c sgetrf.F xlf –o dgetrf.o –c sgetrf.F On Cray, files with .F or .F90 extension invoke the preprocesor: f90 –F –DLA_REALSIZE=4 –o hgetrf.o –c sgetrf.F f90 –c sgetrf.F If the defined constant LA_REALSIZE is not set, the 64-bit version is compiled.
#include “lapacknames.inc” MODULE LA_SCFOO INTERFACE LA_FOO SUBROUTINE SFOO( X ) USE LA_CONSTANTS, ONLY: WP REAL(WP), INTENT(INOUT) :: X(*) END SUBROUTINE SFOO SUBROUTINE CFOO( X ) USE LA_CONSTANTS, ONLY: WP COMPLEX(WP), INTENT(INOUT) :: X(*) END SUBROUTINE CFOO END INTERFACE ! LA_FOO END MODULE LA_SCFOO Common source generic interfaces Following LAPACK95, create generic interfaces (in a module) for all BLAS and LAPACK routines: Preprocessorinstructions Genericname Specificinterfaces
Gluing together type-specific interfaces Compile la_scfoo.o and la_dzfoo.o from common source. Compile la_xfoo.f: MODULE LA_XFOO USE LA_SCFOO USE LA_DZFOO END MODULE LA_XFOO Use the generic interface: PROGRAM MAIN PROGRAM MAIN USE LA_CONSTANTS32 EXTERNAL SFOO USE LA_XFOO REAL X(100) REAL(WP) :: X(100) CALL SFOO(X) CALL LA_FOO(X)
Pros and cons of generic interface • Provides compile time type matching /argument checking • Supports simpler interfaces through overloading • Allows same call for different default real sizes • Interfaces must match all arguments in type, kind, and rank • Adds overhead of extra call for non-default interface
Mismatched interfaces The calling site must match one of the interface specs for every argument exactly in type, kind, and rank. If it doesn’t match, you can • Match the interface to the call • Match the call to the interface LAPACK3E modules define both the natural interace and a “point” interface for BLAS and LAPACK generic interfaces. • Natural interface: just like the subroutine definition • Point interface: all arrays are indexed (such as A(I,J) or X(1)) • If the calling site doesn’t match the natural interface, index all the arrays to use the point interface • Point interface is default – natural interface is a wrapper to it
Point and natural interfaces Point interfaces allow argument matching by position and type without rank for use with indexed arrays. MODULE LA_XCOPY INTERFACE LA_COPY ! Point interface for xCOPY1 SUBROUTINE SCOPY1( N, X, Y ) USE LA_CONSTANTS32, ONLY: WP INTEGER, INTENT(IN) :: N REAL(WP), INTENT(IN) :: X REAL(WP), INTENT(OUT) :: Y END SUBROUTINE SCOPY1 MODULE PROCEDURE SCOPY1_NAT END INTERFACE ! LA_COPY PRIVATE SCOPY1_NAT CONTAINS ! Natural interface for xCOPY1 SUBROUTINE SCOPY1_NAT( N, X, Y ) USE LA_CONSTANTS32, ONLY: WP INTEGER, INTENT(IN) :: N REAL(WP), INTENT(IN) :: X(*) REAL(WP), INTENT(OUT) :: Y(*) CALL SCOPY1( N, X(1), Y(1) ) END SUBROUTINE SCOPY1_NAT END MODULE LA_XCOPY
Performance improvements • Parallel linear system solves with NRHS > 1, using OpenMP • Vastly better SLASSQ • Cleaner SLARTG, SLARFG • Faster SGEBAL • Faster SSTEIN (using MGS) • UPLO argument to CPTSV/CTPSVX (only incompatibility with LAPACK 3) • Call Level 3 LAPACK routines, not Level 2 directly Reference: LAPACK Working Note 158, December 2002 www.netlib.org/lapack/lawns/downloads
Availability and future work What’s available: • Version 1.1 at http://www.netlib.org/lapack3e • Compiled libraries for CRAY T3E, IBM SP, and Sun • Earlier versions of this talk • Soon: Common source generic interfaces In progress/Future work: • Add modernized interfaces of LAPACK95 • Simplify installation process • Extend Fortran 90 improvements to test/timing code