A Case for Source-Level Transformations in MATLAB Vijay Menon and Keshav Pingali Cornell University The MaJic Project at Illinois/Cornell George Almasi Luiz De Rose David Padua
MATLAB • High-Level Interpreted Language for Numerical Computing • Matrix is 1st class type • Library of numerical functions • Application Domains • Image Processing • Structural Mechanics • Computational Finance
The Problem • Development is fast... • ~10X as concise as C/Fortran • Performance is slow! • ~10X as slow as C/Fortran • Conventional Approach: • Rewrite • Compile
Our Approach: Source-Level Optimization • Apply high-level transformations directly on MATLAB codes • Significant performance benefit for: • interpreted code • compiled code
Outline • Overheads in MATLAB • Conventional Compilation • Source-Level Optimization • Comparison • Implementation Status
Outline • Overheads in MATLAB • Type/Shape Checking • Memory Management • Array Bounds Checking • Conventional Compilation • Source-Level Optimization • Comparison • Implementation Status
Type/Shape Checking • MATLAB has no type/shape declarations • Consider: A * B • Interpreter checks operands to decide how to perform multiply (*) • Type • Real*Real • Real*Complex • Complex*Complex • Shape • Scalar*Scalar • Scalar*Matrix • Matrix*Matrix
Type/Shape Checking • Consider: for i = 1:n y = y + a * x(i); end • Loops perform redundant checks and magnify interpreter overhead
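The checks listed on the two slides above can be pictured as ordinary MATLAB tests. The following is only a sketch of the decisions involved, not a description of how the interpreter is implemented; the variable names type_case and shape_case are hypothetical.

```matlab
% Illustrative only: the kind of tests needed before evaluating A * B.
A = [1 2; 3 4];      % real matrix
B = 2 + 3i;          % complex scalar

% Type check: real or complex operands?
if isreal(A) && isreal(B)
    type_case = 'Real*Real';
elseif isreal(A) || isreal(B)
    type_case = 'Real*Complex';
else
    type_case = 'Complex*Complex';
end

% Shape check: scalar or matrix operands?
if isscalar(A) && isscalar(B)
    shape_case = 'Scalar*Scalar';
elseif isscalar(A) || isscalar(B)
    shape_case = 'Scalar*Matrix';
else
    shape_case = 'Matrix*Matrix';
end

C = A * B;           % the multiply itself, once the case is known
```

Inside a loop such as y = y + a * x(i), tests of this kind are repeated on every iteration even though their outcome never changes.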
Memory Management: Dynamic Resizing • Consider: x(10) = 10; • C/Fortran: x must have >= 10 elements • MATLAB: x is resized if needed • Memory reallocated • Data copied
Memory Management: Dynamic Resizing • MATLAB dynamically grows arrays: for i = 1 : 1000 x(i) = i; end • Every iteration triggers resize! • 1,000 memory allocations • ~500,000 elements copied • Execution Time: • x is undefined: 14.2 seconds • x is already defined: 0.37 seconds
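The allocation and copy counts above follow from simple arithmetic. The sketch below simulates the loop under the assumption that each out-of-bounds write grows the array by exactly one element and copies all existing elements; a real allocator may behave differently. The counters allocations, copied and len are hypothetical names.

```matlab
% Simulate growth of x in: for i = 1:1000, x(i) = i; end
n = 1000;
allocations = 0;     % number of reallocations
copied = 0;          % total elements copied during reallocations
len = 0;             % current length of x (x starts undefined)

for i = 1:n
    if i > len                       % write past the end forces a resize
        allocations = allocations + 1;
        copied = copied + len;       % existing elements move to the new block
        len = i;
    end
end

fprintf('%d allocations, %d elements copied\n', allocations, copied);
```

Under these assumptions the copy total is 0 + 1 + ... + 999 = 499,500, which matches the figure of roughly 500,000 on the slide.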
Array Bounds Checking • Consider array indexing: x(i) = y(i); • Failed Bounds Check on • x(i) can trigger resize • y(i) can trigger error
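A small sketch of the two outcomes, using made-up three-element arrays: an out-of-range write grows the array, while an out-of-range read raises an error. The flag read_failed is a hypothetical name used only to record the error.

```matlab
x = [1 2 3];
y = [4 5 6];

x(5) = 9;            % write past the end: x is resized, the gap is zero-filled
                     % x is now [1 2 3 0 9]

read_failed = false;
try
    v = y(5);        % read past the end: run-time error
catch
    read_failed = true;
end
```

Because both cases must be detected, every indexed access carries a bounds check.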
Array Bounds Checking • In a loop: for i = 3:100 x(i) = x(i-1) + x(i-2); end • Interpreter performs redundant checks • Compiler work: • Nonresizable arrays: Gupta PLDI’90 • Resizable arrays: more difficult
Common Theme • Loops magnify overheads • every iteration: redundant checks, resizes, … • MATLAB interprets naively • computes as is • no reorganization to optimize
Outline • Overheads in MATLAB • Conventional Compilation • Compile to C/Fortran • Rely on C/Fortran compiler for optimization • Source-Level Optimization • Comparison • Implementation Status
MATLAB Compilers • Compile to C/C++/Fortran • MCC -> C (The MathWorks) • MATCOM -> C++ (Mathtools) • FALCON -> F90 (U of Illinois) • Native compiler generates executable code: • Link back into MATLAB environment • Run as stand-alone program
The MCC Compiler • Safe Optimization: • Type Inference - no declarations in MATLAB • Eliminate Type Checks / Reduce Storage • Specialize for real input variables • Always legal! • Unsafe Optimization: • Assume all data is real • Eliminate all bounds checks - disallow resizing • User must ensure legality!
Falcon Benchmarks • Collected by DeRose from MATLAB users at Illinois/NCSA • Element/Loop Intensive • CN - Crank-Nicolson PDE Solver • Di - Dirichlet PDE Solver • FD - Finite Difference PDE Solver • Ga - Galerkin PDE Solver • IC - Incomplete Cholesky Factorization • Memory Intensive • AQ - Adaptive Quadrature w/ Simpson’s Rule • EC - Euler-Cromer 2 body problem • RK - Runge-Kutta 2 body problem • Library Intensive • CG - Conjugate Gradients Iterative Solver • Mei - 3D Surface Generation • QMR - Quasi-Minimal Residual • SOR - Successive Over-Relaxation
MCC: Unsafe Optimizations Note: User must ensure legality!
Outline • Overheads in MATLAB • Conventional Compilation • Source-Level Optimization • Vectorization • Preallocation • Expression Optimization • Comparison • Implementation Status
Vectorization • Loops are expensive • Overheads are magnified • Idea: Eliminate Loops • Map loops to higher-level matrix operations • Interpreter uses efficient libraries • BLAS • LINPACK/EISPACK
Example of Vectorization • In Galerkin, 98% of execution time is spent in: for i = 1:N for j = 1:N phi(k) = phi(k) + a(i,j)*x(i)*y(j); end end
Vectorized Code • In Optimized Galerkin: phi(k) = phi(k) + x*a*y'; • Fragment Speedup: 260 • Program Speedup: 110 • Note: Not always possible!
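The equivalence between the loop nest and the matrix expression can be checked on small inputs. This sketch assumes that x and y are 1-by-N row vectors and that the inner term is a(i,j)*x(i)*y(j), which is the sum that x*a*y' computes; the test data and the names acc_loop and acc_vec are made up for illustration.

```matlab
N = 4;
a = magic(N);        % any N-by-N matrix will do
x = 1:N;             % row vector
y = N:-1:1;          % row vector

% Loop form: accumulate one scalar term per (i,j) pair.
acc_loop = 0;
for i = 1:N
    for j = 1:N
        acc_loop = acc_loop + a(i,j)*x(i)*y(j);
    end
end

% Vectorized form: one vector-matrix product and one dot product.
acc_vec = x*a*y';
```

The vectorized form makes a single pass through the interpreter and hands the arithmetic to library routines, which is where the reported speedup comes from.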
Preallocation • Eliminate Dynamic Resizing • Try to predict eventual size of array • Insert early allocation when possible: • x = zeros(1000,1); • Resizing will not be triggered
Example of Preallocation • In Euler-Cromer, 87% of time spent in: for i = 1:N r(i) = … th(i) = … t(i) = … k(i) = … p(i) = … … end
Preallocated Code • In Optimized Euler-Cromer: r = zeros(1,N); ... for i = 1:N r(i) = … … end • Fragment Speedup: 7 • Program Speedup: 4
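A minimal sketch of the transformation follows. The loop body i^2 is a stand-in, since the slide elides the actual Euler-Cromer update, and the names r_grow and r_pre are hypothetical.

```matlab
N = 1000;

% Without preallocation: r_grow is resized on every iteration.
clear r_grow
for i = 1:N
    r_grow(i) = i^2;         % placeholder for the elided update
end

% With preallocation: one allocation, then only in-bounds writes.
r_pre = zeros(1, N);
for i = 1:N
    r_pre(i) = i^2;          % same placeholder computation
end
```

Both loops produce the same 1-by-N result; only the second avoids the repeated reallocation and copying. Wrapping each loop in tic and toc gives a rough timing comparison, though the gap depends on the MATLAB version.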
Expression Optimization • MATLAB interprets expressions naïvely in left-to-right order • Simple restructuring can significantly affect execution time, e.g.: • A*B*x : O(n^3) flops • A*(B*x) : O(n^2) flops
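The cost difference can be made concrete with approximate flop counts, assuming A and B are n-by-n and x is n-by-1 (the slide does not state the shapes, so this is an assumption). A dense matrix-matrix product costs about 2*n^3 flops and a matrix-vector product about 2*n^2.

```matlab
n = 200;
A = rand(n);
B = rand(n);
x = rand(n, 1);

y1 = A*B*x;          % evaluated as (A*B)*x: matrix-matrix product first
y2 = A*(B*x);        % two matrix-vector products

% Approximate operation counts for each ordering.
flops_left  = 2*n^3 + 2*n^2;   % (A*B) then (...)*x
flops_right = 4*n^2;           % B*x then A*(...)

% The two orderings agree up to rounding error.
rel_err = norm(y1 - y2) / norm(y1);
```

For n = 200 the left-to-right order performs roughly 100 times more arithmetic for the same result.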
Example of Expression Optimization • In QMR, 70% of execution time is spent in: w = A'*q; • A : 420x420 matrix • q, w : 420x1 vectors • A' = transpose(A)
Expression Optimized Code • In Optimized QMR: A'*q == (q'*A)' w = (q'*A)'; • Transpose 2 vectors instead of 1 matrix • Fragment Speedup: 20 • Program Speedup: 3
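The identity can be checked numerically with the sizes given for QMR. The random data and the names w1, w2 and max_diff are illustrative only. Newer interpreters may already avoid forming the explicit transpose, so the speedup reported here need not reproduce.

```matlab
n = 420;
A = rand(n);
q = rand(n, 1);

w1 = A'*q;           % as written: may form the n-by-n transpose first
w2 = (q'*A)';        % transposes only the two vectors

max_diff = max(abs(w1 - w2));   % should be at rounding-error level
```

The identity also holds for complex data, because the ' operator is the conjugate transpose on both sides.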
Point #1: • Source optimizations can outperform MCC
Point #2: • Source optimizations complement MCC
Benefits of Source-Level Optimizations • Vectorization • Directly eliminates loop overhead • Move work to hand-optimized BLAS • Preallocation • Eliminates resizing overhead • Enables MCC array bounds elimination • Expression Optimization • Uses algebraic info unavailable in C/Fortran
Implementation Status • Illinois/Cornell MaJic system • Just-in-time MATLAB interpreter/compiler • Incorporates Source-Level Transformation • Semantic Optimization (Menon/Pingali ICS’99) • Vectorization/BLAS call generation • Expression Optimization • Preallocation/Bounds Check Optimization (Work in progress)
Conclusion • Source-level optimizations are important for enhancing the performance of MATLAB code, whether it is only interpreted or later compiled
Unsafe Type Check Removal • Correct on 11/12 Codes
Unsafe Bounds Check Removal • Correct on 7/12 Codes