Automatic Optimisation of Parallel Linear Algebra Routines in Systems with Variable Load
Jack Dongarra, Kenneth Roche, Javier Cuenca, Domingo Giménez, José González
Optimisation of Linear Algebra Routines
• Traditional method: hand-optimisation for each platform
• Time-consuming
• Incompatible with hardware evolution
• Incompatible with changes in the system (architecture and basic libraries)
• Unsuitable for systems with variable load
• Misuse by non-expert users
Our Approach
DESIGN
• Modelling the Linear Algebra Routine (LAR): Texec = f(SP, AP, n)
  SP: System Parameters
  AP: Algorithmic Parameters
  n: Problem size
INSTALLATION
• Estimation of SP
RUN-TIME
• Selection of AP values
• Execution of LAR
Our Approach
• Static Model of LAR: situation of the platform at installation time
• Dynamic Model of LAR: situation of the platform at run-time
LARs
• Jacobi methods for the symmetric eigenvalue problem
• Gauss elimination
• LU factorisation
• QR factorisation
Platforms
• Cluster of Workstations
• Cluster of PCs
• SGI Origin 2000
• IBM SP2
DESIGN PROCESS
• LAR: Linear Algebra Routine
• Made by the LAR Designer
• Example of LAR: Parallel Block LU factorisation
Modelling the LAR
DESIGN: LAR → Modelling the LAR → MODEL
• Made by the LAR Designer
• Only once per LAR
• MODEL: Texec = f(SP, AP, n)
  SP: System Parameters
  AP: Algorithmic Parameters
  n: Problem size
Example, LAR: Parallel Block LU factorisation
• SP: k3, k2, ts, tw
• AP: p, b
• n: Problem size
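The model formula itself is not shown in this text, so the following is only a minimal sketch of what a model of the form Texec = f(SP, AP, n) could look like for a parallel block LU factorisation. The cost terms are illustrative assumptions (a leading 2n³/3 operation count shared by p processes, a lower-order term that grows with b, and a broadcast cost per block step), not the authors' formula. The names `SystemParams`, `AlgParams` and `t_exec` are hypothetical, and p is written as an r × c grid.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class SystemParams:
    """SP named on the slide: k3, k2 (arithmetic costs), ts, tw (communication costs)."""
    k3: float  # cost of one operation in a level-3 BLAS kernel
    k2: float  # cost of one operation in a level-2 BLAS kernel
    ts: float  # start-up time of one message
    tw: float  # time to send one word


@dataclass(frozen=True)
class AlgParams:
    """AP named on the slide: p processes (here an r x c grid) and block size b."""
    r: int
    c: int
    b: int

    @property
    def p(self) -> int:
        return self.r * self.c


def t_exec(sp: SystemParams, ap: AlgParams, n: int) -> float:
    """Illustrative (assumed) cost model, not the formula from the slides."""
    # Assumed arithmetic part: 2n^3/3 operations shared by p processes, plus a
    # panel term of order b*n^2 that uses the slower level-2 cost.
    t_arith = (2.0 * n**3 / 3.0) * sp.k3 / ap.p + ap.b * n**2 * sp.k2 / ap.p
    # Assumed communication part: one tree broadcast per block step along the
    # rows and columns of the grid (no cost on a grid dimension of size 1).
    steps = math.ceil(n / ap.b)
    hops = math.log2(ap.r) + math.log2(ap.c)
    t_comm = steps * hops * (sp.ts + ap.b * n * sp.tw / max(ap.r, ap.c))
    return t_arith + t_comm
```

Whatever the exact terms, the point of such a model is that SP values come from measurements on the platform, while AP values are free variables that can be chosen to minimise the predicted time for a given n.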
Implementation of SP-Estimators
DESIGN: MODEL → Implementation of SP-Estimators → SP-Estimators
Estimators of Arithmetic-SP
• Computation kernel of the LAR
• Similar storage scheme
• Similar quantity of data
Estimators of Communication-SP
• Communication kernel of the LAR
• Similar kind of communication
• Similar quantity of data
INSTALLATION PROCESS
• Only once per platform
• Done by the System Manager
Estimation of Static-SP
INSTALLATION: SP-Estimators + Basic Libraries + Installation-File → Estimation of Static-SP → Static-SP-File
Basic Libraries
• Basic Communication Library: MPI, PVM
• Basic Linear Algebra Library: reference-BLAS, machine-specific-BLAS, ATLAS
Installation File
• SP values are obtained using the information (n and AP values) in this file.
Example
Platform: Cluster of Pentium III + Fast Ethernet
Basic Libraries: ATLAS and MPI

Estimation of the Static-SP k3-static (in sec)
Block size    16      32      64      128
k3-static     0.0038  0.0033  0.0030  0.0027

Estimation of the Static-SP tw-static (in sec)
Message size (Kbytes)  32     256    1024   2048
tw-static              0.700  0.690  0.680  0.675
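The two tables show that each static SP is stored as a function of an AP-dependent quantity: block size for k3, message size for tw. A minimal sketch of how a lookup in a Static-SP-File might work is given below. It assumes the value at the nearest measured point is used; the slides do not say how values between measured points are obtained, and the dictionary layout and the name `lookup_static_sp` are hypothetical.

```python
from typing import Mapping

# Values copied from the tables above (Pentium III cluster, ATLAS and MPI).
K3_STATIC = {16: 0.0038, 32: 0.0033, 64: 0.0030, 128: 0.0027}   # keyed by block size
TW_STATIC = {32: 0.700, 256: 0.690, 1024: 0.680, 2048: 0.675}   # keyed by message size (Kbytes)


def lookup_static_sp(table: Mapping[int, float], size: int) -> float:
    """Return the value measured at the point closest to `size`.

    Nearest-point selection is an assumption made for this sketch.
    """
    nearest = min(table, key=lambda measured: abs(measured - size))
    return table[nearest]
```

For example, `lookup_static_sp(K3_STATIC, 64)` returns the value measured for block size 64, and a block size of 100 falls back to the entry for 128.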
RUN-TIME PROCESS: Static approach
RUN-TIME: MODEL + Static-SP-File → Selection of Optimum AP → Optimum-AP → Execution of LAR
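Selection of the optimum AP can be sketched as a search over candidate AP values: evaluate the model for each candidate and keep the one with the smallest predicted time. The slides do not state the search strategy, so the exhaustive search, the candidate sets and the function names below are assumptions. `model` stands for any callable that implements Texec = f(SP, AP, n) with the SP values already bound.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple

AP = Tuple[int, int, int]  # (r, c, b): process grid r x c and block size b


def candidate_grids(max_nodes: int) -> List[Tuple[int, int]]:
    """All grids (r, c) with r >= c and r * c <= max_nodes.

    Restricting to r >= c is a simplification chosen for this sketch.
    """
    return [(r, c)
            for r in range(1, max_nodes + 1)
            for c in range(1, r + 1)
            if r * c <= max_nodes]


def select_optimum_ap(model: Callable[[AP, int], float],
                      n: int,
                      max_nodes: int,
                      block_sizes: Iterable[int]) -> AP:
    """Return the AP with the smallest modelled execution time for size n."""
    candidates = [(r, c, b)
                  for (r, c), b in product(candidate_grids(max_nodes), block_sizes)]
    return min(candidates, key=lambda ap: model(ap, n))
```

Because the model is cheap to evaluate compared with running the routine, trying every candidate at run-time adds little overhead when the candidate set is this small.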
RUN-TIME PROCESS: Dynamic approach
Call to NWS
RUN-TIME: Call to NWS → NWS Information
The NWS (Network Weather Service) is called and it reports:
• the fraction of available CPU (fCPU)
• the current word-sending time (tw-current) for specific n and AP values (n0, AP0)
Then the fraction of available network is calculated (the formula is not preserved in this transcript).
Dynamic Adjustment of SP
RUN-TIME: Static-SP-File + NWS Information → Dynamic Adjustment of SP → Current-SP
The values of the SP are adjusted according to the current situation (the formulas are not preserved in this transcript).
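The adjustment formulas are not shown in this text. As a hedged sketch only, one plausible rule is to divide each arithmetic cost by the fraction of available CPU and each communication cost by the fraction of available network, with that fraction taken as tw-static / tw-current measured at (n0, AP0). Both rules are assumptions, as are the function names. No real NWS call is made here: the NWS readings are passed in as plain numbers.

```python
def network_fraction(tw_static: float, tw_current: float) -> float:
    """Assumed definition: installation-time word-sending time over the current one."""
    if tw_static <= 0 or tw_current <= 0:
        raise ValueError("word-sending times must be positive")
    return tw_static / tw_current


def adjust_sp(static_sp: dict, f_cpu: float, f_net: float) -> dict:
    """Scale static SP values to the current load (assumed scaling rule).

    Arithmetic parameters (k3, k2) grow as the available CPU shrinks;
    communication parameters (ts, tw) grow as the available network shrinks.
    """
    if not (0 < f_cpu <= 1) or f_net <= 0:
        raise ValueError("f_cpu must be in (0, 1] and f_net must be positive")
    return {
        "k3": static_sp["k3"] / f_cpu,
        "k2": static_sp["k2"] / f_cpu,
        "ts": static_sp["ts"] / f_net,
        "tw": static_sp["tw"] / f_net,
    }
```

Under this rule a node with half of its CPU available is modelled as having arithmetic costs twice as high as at installation time, which is what lets the same model be reused unchanged at run-time.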
Selection of Optimum AP
RUN-TIME: MODEL + Current-SP → Selection of Optimum AP → Optimum-AP

Execution of LAR
Complete process:
DESIGN: LAR → Modelling the LAR → MODEL → Implementation of SP-Estimators → SP-Estimators
INSTALLATION: SP-Estimators + Basic Libraries + Installation-File → Estimation of Static-SP → Static-SP-File
RUN-TIME: Call to NWS → NWS Information; Static-SP-File + NWS Information → Dynamic Adjustment of SP → Current-SP; MODEL + Current-SP → Selection of Optimum AP → Optimum-AP → Execution of LAR
Platform load: different situations studied

              nodo1  nodo2  nodo3  nodo4  nodo5  nodo6  nodo7  nodo8
Situation A
  CPU avail.  100%   100%   100%   100%   100%   100%   100%   100%
  tw-current  0.7 sec (all nodes)
Situation B
  CPU avail.  80%    80%    80%    80%    100%   100%   100%   100%
  tw-current  0.8 sec (nodo1-nodo4), 0.7 sec (nodo5-nodo8)
Situation C
  CPU avail.  60%    60%    60%    60%    100%   100%   100%   100%
  tw-current  1.8 sec (nodo1-nodo4), 0.7 sec (nodo5-nodo8)
Situation D
  CPU avail.  60%    60%    60%    60%    100%   100%   80%    80%
  tw-current  1.8 sec (nodo1-nodo4), 0.7 sec (nodo5-nodo6), 0.8 sec (nodo7-nodo8)
Situation E
  CPU avail.  60%    60%    60%    60%    100%   100%   50%    50%
  tw-current  1.8 sec (nodo1-nodo4), 0.7 sec (nodo5-nodo6), 4.0 sec (nodo7-nodo8)
Optimum AP for the different situations studied

Block size
        Situations of the Platform Load
n       A     B     C     D     E
1024    32    32    64    64    64
2048    64    64    64    128   128
3072    64    64    128   128   128

Number of nodes to use, p = r × c
        Situations of the Platform Load
n       A     B     C     D     E
1024    4×2   4×2   2×2   2×2   2×1
2048    4×2   4×2   2×2   2×2   2×1
3072    4×2   4×2   2×2   2×2   2×1
Conclusions and Future Work
• The proposed methodology is viable both in systems where the load is stable and in systems where it varies.
• Software like NWS is suitable for adjusting the system parameter values obtained at installation time.
• The heterogeneous load case offers many more possibilities than the one studied.