100 likes | 244 Views
Automatically Tuned Linear Algebra Software (ATLAS). R. Clint Whaley. University of Tennessee www.netlib.org/atlas. What is ATLAS. A package that adapts to differing architectures via AEOS techniques Initially, supply BLAS
E N D
Automatically Tuned Linear Algebra Software (ATLAS) R. Clint Whaley University of Tennessee www.netlib.org/atlas
What is ATLAS • A package that adapts to differing architectures via AEOS techniques • Initially, supply BLAS • Automated Empirical Optimization of Software (AEOS) • Machine searches opt space • Finds application-apparent architecture • AEOS requires: • Method of code variation • Code generation • Multiple implement. • Parameterization • Sophisticated Timers • Robust search heuristic
Why ATLAS is needed • BLAS require many man-hours / platform • Only done if financial incentive is there • Many platforms will never have an optimal version • Lags behind hardware • May not be affordable by everyone • Improves vendor code • Allows for portably optimal codes • Obsolescence insurance • Operations may be important, but not general enough for standard
ATLAS Software • Coming soon • pthread support • Open source kernels • SSE & 3DNOW! • GOTO ev5/6 BLAS • Performance for banded and packed • More LAPACK • Coming not-so-soon • Sparse support • User customization • Currently provided • Full BLAS (C & F77) • Level 3 BLAS • Generated GEMM • 1-2 hours install time per precision • Recursive GEMM-based L3 BLAS • Antoine Petitet • Level 2 BLAS • GEMV & GER ker • Level 1 BLAS • Some LAPACK • LU, LLt
N K N A M B M K * NB C Algorithmic Approach for Matrix Multiply • Only generated code is on-chip multiply • All BLAS operations written in terms of generated on-chip multiply • All transpose cases coerced through data copy to 1 case of on-chip multiply • Only 1 case generated per platform
Algorithmic approach for Level 3 BLAS Recursive TRMM • Recur down to L1 cache block size • Need kernel at bottom of recursion • Use gemm-based kernel for portability 0 0 0 0 0 0 0