
Towards the Design of an Automatically Tuned Linear Algebra Library

Javier Cuenca, José González (Department of Ingeniería y Tecnología de Computadores); Domingo Giménez (Department of Informática y Sistemas). University of Murcia, Spain.


Presentation Transcript


  1. Towards the Design of an Automatically Tuned Linear Algebra Library. Javier Cuenca, José González (Department of Ingeniería y Tecnología de Computadores); Domingo Giménez (Department of Informática y Sistemas). University of Murcia, Spain

  2. Current Situation of Linear Algebra Parallel Routines. Linear algebra: highly optimizable operations, but the optimizations are platform specific. Traditional method: hand-optimization for each platform • Time-consuming • Incompatible with hardware evolution • Incompatible with changes in the system (architecture and basic libraries) • Unsuitable for systems with variable workload • Misuse by non-expert users

  3. Solutions to this situation? Some groups and projects: ATLAS, GrADS, LAWRA, FLAME, I-LIB. But the problem is very complex.

  4. Our approach • Parameterised routines: system parameters and algorithmic parameters • System parameters are obtained at installation time, using an analytical model of the routine and simple installation routines, with a reduced number of executions at installation time • Algorithmic parameters are obtained from the analytical model, using the system parameters obtained in the installation process

  5. Our approach: the scheme [Diagram. Stages: DESIGN, INSTALLATION, LIBRARY. Roles: LAR-DESIGNER, SYSTEM MANAGER. Elements: LAR MODELLING, LAR, IMPLEMEN. OF LAR-ERs, LAR-MOD, LAR-ERs, BL, EXECUT. OF LAR-ERs, LAR-IF, OAP SELECTION, LAR-SPF, LAR-OAPF, INCLUSION PROCESS]

  6. Design: Modelling the LAR [Diagram. Stage: DESIGN. Role: LAR-DESIGNER. Elements: LAR MODELLING, LAR, LAR-MOD]

  7. LAR-MOD: Analytical Model of LAR. The behaviour of the algorithm on the platform is defined by Texec = f(SPs, n, APs) • SPs = f(n, APs): System Parameters • APs: Algorithmic Parameters • n: Problem Size
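
Slide 7 states that the system parameters themselves depend on the problem size and the algorithmic parameters. One possible way to represent that dependence, sketched here in Python under assumptions not taken from the talk, is a table of values measured at a few (n, b) points with a nearest-point lookup. The table contents and the name `lookup_sp` are hypothetical.

```python
def lookup_sp(table: dict, n: int, b: int) -> float:
    """Return the value stored for the measured (n, b) point closest to the request.

    Closeness is judged first on the problem size n, then on the block size b.
    """
    key = min(table, key=lambda nb: (abs(nb[0] - n), abs(nb[1] - b)))
    return table[key]


# Made-up arithmetic costs (seconds per operation), keyed by (n, b).
arithmetic_cost = {
    (512, 32): 1.2e-9,
    (512, 64): 1.0e-9,
    (1024, 32): 1.3e-9,
    (1024, 64): 1.1e-9,
}

# A request for n = 900, b = 60 falls back on the (1024, 64) measurement.
value = lookup_sp(arithmetic_cost, 900, 60)
```

A real installation would probably interpolate between measured points rather than snapping to the nearest one; the lookup is kept minimal here.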

  8. LAR-MOD: Analytical Model of LAR. System Parameters (SPs) [Diagram labels: Hardware Platform (• Physical Characteristics • Current Conditions), Basic libraries, LARs Performance]

  9. LAR-MOD: Analytical Model of LAR. System Parameters (SPs). Two kinds of SPs: Communication System Parameters (CSPs) and Arithmetic System Parameters (ASPs) [Diagram labels: Hardware Platform (• Physical Characteristics • Current Conditions), Basic libraries, LARs Performance]

  10. LAR-MOD: Analytical Model of LAR. System Parameters (SPs). Two kinds of SPs: Communication System Parameters (CSPs): ts, start-up time; tw, word-sending time. Arithmetic System Parameters (ASPs) [Diagram labels: Hardware Platform (• Physical Characteristics • Current Conditions), Basic libraries, LARs Performance]

  11. LAR-MOD: Analytical Model of LAR. System Parameters (SPs). Two kinds of SPs: Communication System Parameters (CSPs). Arithmetic System Parameters (ASPs): tc, arithmetic cost; using BLAS: k1, k2 and k3 [Diagram labels: Hardware Platform (• Physical Characteristics • Current Conditions), Basic libraries, LARs Performance]
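
To make the roles of ts, tw and tc concrete, the following Python sketch combines them into an execution-time estimate. The cost formula is a toy chosen for illustration, not the model used by the authors: it assumes n³ arithmetic operations shared evenly by p processors and, when p > 1, one message of n·b words per block step. The names `SystemParams` and `t_exec` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SystemParams:
    ts: float  # start-up time of a message (seconds)
    tw: float  # word-sending time (seconds per word)
    tc: float  # cost of one arithmetic operation (seconds)


def t_exec(sp: SystemParams, n: int, b: int, p: int) -> float:
    """Toy estimate of the execution time (illustrative formula only)."""
    # Arithmetic term: n**3 operations divided among p processors.
    arithmetic = sp.tc * n**3 / p
    # Communication term: n/b block steps, each sending n*b words.
    steps = n / b
    communication = 0.0 if p == 1 else steps * (sp.ts + sp.tw * n * b)
    return arithmetic + communication


sp = SystemParams(ts=1e-4, tw=1e-7, tc=1e-9)
sequential = t_exec(sp, n=1000, b=50, p=1)
parallel = t_exec(sp, n=1000, b=50, p=4)
```

With these made-up parameter values the parallel estimate is lower than the sequential one, because the saving in the arithmetic term outweighs the added communication term.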

  12. LAR-MOD: Analytical Model of LAR. System Parameters (SPs). How to estimate each SP? 1) Obtain the kernel of the performance cost of the LAR 2) Make an Estimation Routine from this kernel [Diagram labels: Hardware Platform (• Physical Characteristics • Current Conditions), Basic libraries, LARs Performance]

  13. Design [Diagram. Stage: DESIGN. Role: LAR-DESIGNER. Elements: LAR MODELLING, LAR, LAR-MOD]

  14. Design: Making the LAR-ERs [Diagram. Stage: DESIGN. Role: LAR-DESIGNER. Elements: LAR MODELLING, LAR, IMPLEMEN. OF LAR-ERs, LAR-MOD, LAR-ERs]

  15. LAR-ERs: Estimation Routines. Arithmetic System Parameters (ASPs): the Estimation Routine is built from the Computation Kernel of the LAR • Similar storage scheme • Similar quantity of data. Communication System Parameters (CSPs): the Estimation Routine is built from the Communication Kernel of the LAR • Similar kind of communication • Similar quantity of data
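
As an illustration of an estimation routine for an arithmetic system parameter, the Python sketch below times a small matrix-multiplication kernel and divides by its operation count. This is an assumption-laden stand-in: the talk does not show the actual routines, a real one would time the BLAS kernel used by the LAR with a matching storage scheme and data size, and the names `matmul_kernel` and `estimate_tc` are hypothetical.

```python
import time


def matmul_kernel(n: int) -> None:
    """Multiply two n x n matrices stored as lists of rows (2 * n**3 flops)."""
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            aik = a[i][k]
            for j in range(n):
                c[i][j] += aik * b[k][j]


def estimate_tc(n: int = 40, repeats: int = 3) -> float:
    """Return the best observed time per floating-point operation (seconds)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul_kernel(n)
        best = min(best, time.perf_counter() - start)
    return best / (2 * n**3)


tc = estimate_tc()
```

Taking the minimum over a few repetitions reduces the influence of transient load; on a system with variable workload one might prefer to record the current conditions instead.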

  16. Design [Diagram. Stage: DESIGN. Role: LAR-DESIGNER. Elements: LAR MODELLING, LAR, IMPLEMEN. OF LAR-ERs, LAR-MOD, LAR-ERs]

  17. Design: Process has finished. HAND-MADE, ONLY ONCE [Diagram. Stage: DESIGN. Role: LAR-DESIGNER. Elements: LAR MODELLING, LAR, IMPLEMEN. OF LAR-ERs, LAR-MOD, LAR-ERs]

  18. Installation: Running the LAR-ERs [Diagram. Stages: DESIGN, INSTALLATION. Roles: LAR-DESIGNER, SYSTEM MANAGER. Elements: LAR MODELLING, LAR, IMPLEMEN. OF LAR-ERs, LAR-MOD, LAR-ERs, BL, EXECUT. OF LAR-ERs, LAR-IF, LAR-SPF]

  19. Installation: obtaining the OAP [Diagram. Stages: DESIGN, INSTALLATION. Roles: LAR-DESIGNER, SYSTEM MANAGER. Elements: LAR MODELLING, LAR, IMPLEMEN. OF LAR-ERs, LAR-MOD, LAR-ERs, BL, EXECUT. OF LAR-ERs, LAR-IF, OAP SELECTION, LAR-SPF, LAR-OAPF]

  20. Installation: obtaining the OAP. Algorithmic Parameters (APs): once the SP values are known, the optimum values for the APs (OAP) are calculated • b: block size • p: number of processors • r × c: logical topology, grid configuration (logical 2D mesh)
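
The selection step can be sketched as an exhaustive search over candidate block sizes and grid shapes that minimises a cost model. The Python below is a minimal sketch under stated assumptions: `toy_model` is a made-up formula (not the authors' model), its default ts, tw and tc values are invented, and `select_oap` is a hypothetical name.

```python
from itertools import product


def toy_model(n, b, r, c, ts=1e-4, tw=1e-7, tc=1e-9):
    """Hypothetical cost model, for illustration only."""
    p = r * c
    arithmetic = tc * n**3 / p
    if p == 1:
        return arithmetic
    # n/b block steps; communication grows with the grid perimeter (r + c).
    communication = (n / b) * (ts + tw * n * b) * (r + c) / 2
    return arithmetic + communication


def select_oap(n, max_procs, block_sizes, model=toy_model):
    """Return (time, b, r, c) minimising the model over all r * c <= max_procs."""
    best = None
    grid_sides = range(1, max_procs + 1)
    for b, r, c in product(block_sizes, grid_sides, grid_sides):
        if r * c > max_procs:
            continue
        t = model(n, b, r, c)
        if best is None or t < best[0]:
            best = (t, b, r, c)
    return best


t, b, r, c = select_oap(n=1000, max_procs=4, block_sizes=[16, 32, 64])
```

Because the model is cheap to evaluate, trying every candidate is affordable here; only the model is evaluated, not the routine itself.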

  21. Installation [Diagram. Stages: DESIGN, INSTALLATION. Roles: LAR-DESIGNER, SYSTEM MANAGER. Elements: LAR MODELLING, LAR, IMPLEMEN. OF LAR-ERs, LAR-MOD, LAR-ERs, BL, EXECUT. OF LAR-ERs, LAR-IF, OAP SELECTION, LAR-SPF, LAR-OAPF]

  22. Installation: putting it all together [Diagram. Stages: DESIGN, INSTALLATION, LIBRARY. Roles: LAR-DESIGNER, SYSTEM MANAGER. Elements: LAR MODELLING, LAR, IMPLEMEN. OF LAR-ERs, LAR-MOD, LAR-ERs, BL, EXECUT. OF LAR-ERs, LAR-IF, OAP SELECTION, LAR-SPF, LAR-OAPF, INCLUSION PROCESS]

  23. Installation process finished [Diagram. Stages: DESIGN, INSTALLATION, LIBRARY. Roles: LAR-DESIGNER, SYSTEM MANAGER. Elements: LAR MODELLING, LAR, IMPLEMEN. OF LAR-ERs, LAR-MOD, LAR-ERs, BL, EXECUT. OF LAR-ERs, LAR-IF, OAP SELECTION, LAR-SPF, LAR-OAPF, INCLUSION PROCESS]

  24. Experiments • LAR: Least Squares Toeplitz Routine. Platform: Network of PCs • LAR: One-sided Block Jacobi Method to solve the Symmetric Eigenvalue Problem. Platform: SGI Origin 2000 • LAR: Gaussian elimination. Platform: NoW (heterogeneous system) • LAR: block LU factorization. Platforms: IBM SP2, SGI Origin 2000, NoW. Basic Libraries: reference BLAS, machine BLAS, ATLAS

  25. LU on IBM SP2. Quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4 and 8 processors. [Plot not included in the transcript]

  26. LU on Origin 2000. Quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4, 8 and 16 processors. [Plot not included in the transcript]

  27. LU on NoW. Quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4 processors, using machine BLAS and ATLAS as basic libraries. [Plot not included in the transcript]
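
The quotient reported on slides 25 to 27 can be computed as follows. This is a small Python sketch; the timings are made-up numbers for illustration, not results from the talk, and `model_quotient` is a hypothetical name.

```python
def model_quotient(measured: dict, model_choice) -> float:
    """Time obtained with the model's parameters divided by the best measured time."""
    optimum = min(measured.values())
    return measured[model_choice] / optimum


# Made-up execution times (seconds) for three block sizes.
measured = {16: 2.4, 32: 2.0, 64: 2.2}

# Suppose the model selected b = 64 while the measured optimum is b = 32.
quotient = model_quotient(measured, 64)
```

A quotient of 1.0 means the model picked the measured optimum; values above 1.0 give the relative slowdown caused by the model's choice.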

  28. Future Work • We are trying to develop a methodology valid for a wide range of systems and to include it in the design of linear algebra libraries; this requires analysing the methodology on more systems and with more routines • The basic linear algebra library to use can be considered as another parameter • An installation strategy common to a set of routines must be developed • At the moment we are analysing routines individually, but it could be preferable to analyse algorithmic schemes • We are working on the design of a strategy for parameter selection in dynamic systems
