270 likes | 279 Views
This paper proposes a data-dependence-based vectorizer that generates efficient vectorized source code in Matlab by utilizing higher-level matrix operations. It also addresses the challenges of dimensionality and pattern recognition in vectorization.
E N D
A Dimension Abstraction Approach to Vectorization in Matlab Neil Birkbeck Jonathan Levesque Jose Nelson Amaral Computing ScienceUniversity of AlbertaEdmonton, Alberta, Canada
Problem • Problem Statement: • Generate equivalent, error-free vectorized source code for Matlab source while utilizing higher level matrix operations when possible to improve efficiency.
Motivation • Loop-based code is slower than vector code in Matlab. • Why? • interpretive overhead (type/shape checking,…) • resizing of arrays in loops n=1000; for i=1:n, A(i)=B(i)+C(i); end 5x faster! n=1000; A(1:n)=B(1:n)+C(1:n); • Vectorization also useful for compiled Matlab code, where optimized vector routines could be substituted.
Related Work • Data dependence vectorization • Allen & Kennedy’s Codegen algorithm • Build data dependence graph • Topological visit strongly connected components • Abstract Matrix Form (AMF) [Menon & Pingali] • axioms used to transform array code • take advantage of matrix multiplication • Not clear if it is easily extensible or allows for vectorization of irregular access (e.g., access to the diagonal)
If this is not true the vectorized code will introduce an error! Incorrect Vectorization • Example 1: Pull out of loop. Index variable substitution (i1:n) for i=1:n, a(i)=b(i)+c(i); end a(1:n)=b(1:n)+c(1:n) • Vectorization correct if a,b, and c are row vectors or column vectors
Incorrect Vectorization for i=1:n, x(i)=y(i,h)*z(h,i); end • Example 2: • Matlab is untyped • Vectorization depends on whether h is a vector or scalar. • If h is a scalar: • Otherwise: x(1:n)=y(1:n,h).*z(h,1:n)’; x(1:n)=sum(y(1:n,h).*z(h,1:n)’,2);
Overview of Solution Data dependence-based vectorizer Vectorizable statement Knowledge of Shape of variables Propagate dimensionality up parse tree Dimensions Agree? Yes Perform Transformations No Leave statement in loop Output Vector statement
More Specifically Examples: • Represent dimensionality of expressions as list of symbols 1 or “*” (>1) • Assume known for variables. • Propagate up parse tree according to Matlab rules • Compatibility: • dim(A)≈dim(B) when the lists are equivalent (after removal of redundant 1’s)
Vectorized Dimensionality • Vectorized dimensionality: • representation of dimensions after vectorization of a loop • denoted dimi for loop with index variable i • Introduce new symbol ri for index variable i for i=1:n, a(i)=10+i; end
Vectorized Dimensionality • Expressions with incompatible vectorized dimensionality should not be vectorized. • When do dimensionalities agree? • Assignment expressions: elhs=erhs • dimi(elhs)≈dimi(erhs) || erhs≈(1) • Element-wise binary operators: e=elhsΘerhs • dimi(elhs) ≈(1)||dimi(erhs)≈(1)||dimi(elhs)≈dim(erhs) Θ in {+,-,.*,…} Dimensionalities are compatible or erhs is scalar Dimensionalities are compatible or either is scalar
Vectorized Dimensionality • Rules very restrictive: • Assume dim(A)=dim(B)=dim(C)=(*,*) for i=1:100, for j=1:100 A(i,j)=B(j,i)+C(i,j); end end dimi,j(B)=(rj,ri) dimi,j(C)=(ri,rj) Vectorization fails because (ri,rj) is not compatible with (rj,ri)
Transpose Transformation • Extension to utilize transpose when necessary is straightforward: • For assignment: • if dimi(A)≈reverse(dimi(B)) then A=BT is allowable for i=1:m, for j=1:n A(i,j)=B(j,i); end end dimi,j(A)=reverse(dimi,j(B))=(ri,rj) A(1:m,1:n)=(B(1:n,1:m))’
Transpose Transformation • Extension to utilize transpose when necessary is straightforward: • Similar for pointwise operations: • if dimi(A)≈reverse(dimi(B)) then AΘBT is allowable, propagate dimi(AΘBT)=dimi(A) • if dimi(reverse(A))≈dimi(A) then ATΘB is allowable, propagate dimi(ATΘB)=dimi(B)
Pattern Database • Dimensionality disagreement at binary operators inhibits vectorization. • Recognizing patterns (consisting of operator type and operand dimensionalities) can be used to identify a transformation enabling vectorization. • lhs operation rhs output • (ri, rj) Θ(ri,1) (ri, rj) Pattern: for i=1:m, for j=1:n, A(i,j)=B(i,j)+C(i); end end Transformed Result B(i,j)+C(i); B(1:m,1:n)+repmat(C(1:m),1,n);
Pattern Database • Diagonal access pattern: • lhs operation rhs output • (ri, ri) (index) nil (1, ri) Pattern: for i=1:n, a(i)=A(i,i)*b(i); end a(1:n)=A((1:n)+size(A,1)*((1:n)-1)).*b(1:n); Column major indexing of A
for i1=…, for i2=…, … for ik=… A(J)=A(J)+E; … end end end Loop nest variables I={i1,i2,…,ik} J is a subset of E for i=1:m, for j=1:n, a(i)=a(i)+B(i,j); end end I={i,j} J={i} Additive Reduction Statements • Additive-reduction statements use a loop variable to perform an accumulation. • Not all loop nest index variables appear in output dimensionality
for i=1:m a=a+10; end I={i},J={} I-J={i} ρ(10)={} ri not in dimi(10) Reduce: 10m*10, ρ(m*10)={ri} Vectorize: a=a+m*10; for i=1:m a=a+b(i); end I={i},J={} I-J={i} ρ(b(i))={} ri in dimi(b(i))=(ri,1) Reduce: b(i)sum(b(i),1); Vectorize: a=a+sum(b(1:m)); Additive Reduction (Solution) • Maintain/propagate dimensionality and reduced variables for an expression. • ρ(E) denotes the reduced variables for expression E • When checking statement A(J)=A(J)+E • ensure dimi1,i2,…,ikA(J)≈dimi1,i2,…,ik(E) and ρ(E)=I-J • any variable ri in I-J but not in ρ(E) must be reduced
for i=1:m for j=1:n a(i)=a(i)+B(i,j)*x(j); end end • j is used for reduction • dimi,j(B(i,j))=(ri,rj) • dimi,j (x(j))=(rj) a(1:m)=a(1:m)+… B(1:m,1:n)*x(1:n); Additive Reduction via Matrix Multiplication • Matrix multiplication can be used to perform reductions on e=elhs*erhs , provided: • dimi1,…,ik(elhs)=(Sl,rk) • dimi1,…,ik(erhs)=(rk,Sr) • rk is a reduction variable. • Implies: • dimi1,…,ik(e)=(Sl,Sr) • ρ(e)=union(ρ(elhs), ρ(erhs),{rk})
Use matrix multiplication to reduce rj ρ(a(i,j)*b(j))={rj}, dimi,j(a(i,j)*b(j))=(ri) Additive Reduction Example • Additive reduction example: ρ(a(i,j))={}, dimi,j(a(i,j))=(ri,rj) ρ(b(j))={}, dimi,j(b(j))=(rj) rj is reduction variable for i=1:m, for j=1:n, d(i)=d(i)+a(i,j)*b(j)+c(i,j) end end ρ(c(i,j))={}, dimi,j(c(i,j))={ri,rj} Need to reduce rj: c(i,j)sum(c(i,j),2); ρ(a(i,j)*b(j)+sum(c(i,j),2))={rj}, dimi,j(a(i,j)*b(j)+sum(c(i,j),2)=(ri,rj) Dimensionality and reduced variables agree, now replace index variables: d(1:m)=d(1:m)+a(1:m,1:n)*b(1:n)+sum(c(1:m,1:n),2);
Implementation Prototype Vectorized Loop Original Loop • Pattern database and corresponding transformations are specified in modular end-user extensible manner. Vectorizer Embedded Control Statements Code Generator Octave Parser yes Dimension Check Success no no Create DDG Vectorize Statement yes
Results • Source-to-source transformation • Timing results averaged over 100 runs: • Platform: • Matlab 7.2.0.283 • 3.0 GHz Pentium D Processor
Results • Histogram Equalization: Input source Vectorized Result h=hist(im(:),[0:255]);%histogram heq=255*cumsum(h(:))/sum(h(:)); for i=1:size(im,1), for j=1:size(im,2), im2(i,j)=heq(im(i,j)+1); end end h=hist(im(:),[(0:255)]); heq=255*cumsum(h(:))/sum(h(:)); im2(1:size(im,1),1:size(im,2))=... heq(im(1:size(im,1),1:size(im,2))+1); • For monochrome 8-bit 800x600 image: • original/vectorized: • Entire routine: 0.178s/0.114s (speedup: 1.56) • Loop Portion only: 0.0814s/0.0176s (speedup: 4.6)
Results (Menon & Pingali Examples) for k=1:p, for j=1:(i-1), X(i,k)=X(i,k)-L(i,j)*X(j,k); end end X(i,1:p)=X(i,1:p)-L(i,1:i-1)*X(1:i-1,1:p); for i=1:N,for j=1:N phi(k)=phi(k)+a(i,j)*x_se(i)*f(j); end end phi(k)=phi(k)+sum(a(1:N,1:N)’* x_se(1:N).*f(1:N),1); for i=1:n,for j=1:n, for k=1:n,for l=1:n y(i)=y(i)+x(j)*A(i,k)* B(l,k)*C(l,j); end end end end y(1:n)=y(1:n)+x(1:n)’*... (A(1:n,1:n)*B(1:n,1:n)’*C(1:n,1:n))’;
Remaining Issues/Future Work • Each pattern transformation is local; no optimization over entire statement. • e.g., we do not optimize and distribute transposes • Control flow within loop • Function calls • functions are treated as pointwise operators (correct for many predefined arithmetic functions) • Incorporate our analysis directly with shape analysis
Summary • Contributions: • A simple method to prevent incorrect vectorization in Matlab • A user extensible operator/dimensionality pattern database can be used to improve vectorization • These patterns can make use of higher level semantics (e.g., matrix multiplication) or diagonal accesses in vectorization.
Acknowledgements • Funding provided by NSERC • Grateful for reviewers comments and suggestions
Thank You Questions?