Matrix Multiplication (i,j,k)

for i = 1 to n do
  for j = 1 to n do
    for k = 1 to n do
      C[i,j] = C[i,j] + A[i,k] x B[k,j]
    endfor
  endfor
endfor
(i,j,k) Memory Map
[Figure: the matrices A, B, and C annotated with the loop indices i, k, and j.]
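As a rough illustration, the (i,j,k) loop nest above can be written as runnable Python. This is only a sketch: the function name `matmul_ijk` is made up for the example, matrices are assumed to be square lists of rows, and indices are 0-based rather than 1-based as on the slide.

```python
def matmul_ijk(A, B, n):
    """Return C = A x B for n x n matrices stored as lists of rows."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # A[i][k] walks along a row of A; B[k][j] walks down a column of B.
                C[i][j] += A[i][k] * B[k][j]
    return C


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_ijk(A, B, 2))  # [[19.0, 22.0], [43.0, 50.0]]
```

In this order the innermost loop reads B one column element at a time, which is the access pattern the later (i,k,j) variant changes.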
Scalar Architecture
[Figure: functional units, registers, cache memory, memory bus, main memory.]
Cache lines: the matrix is stored by rows, so consecutive elements of a row are adjacent in memory (the stride-1 dimension) and share cache lines.
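To make the stride-1 idea concrete, here is a small sketch of how a row-stored n x n matrix maps onto a flat array. The helper name `offset` and the 0-based indexing are assumptions made for the example.

```python
def offset(i, j, n):
    """Flat index of element (i, j) in an n x n matrix stored by rows (0-based)."""
    return i * n + j


n = 4
# Stepping along j (within a row) moves one element in memory: stride 1.
row_stride = offset(0, 1, n) - offset(0, 0, n)
# Stepping along i (down a column) jumps a whole row: stride n.
col_stride = offset(1, 0, n) - offset(0, 0, n)
print(row_stride, col_stride)  # 1 4
```

A loop whose innermost index is j therefore touches neighbouring addresses, while one whose innermost index selects the row jumps n elements per step.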
Matrix Multiplication (i,k,j): Improve Spatial Locality

for i = 1 to n do
  for k = 1 to n do
    for j = 1 to n do
      C[i,j] = C[i,j] + A[i,k] x B[k,j]
    endfor
  endfor
endfor
(i,k,j) Memory Map
[Figure: the matrices A, B, and C annotated with the loop indices i, k, and j.]
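A minimal Python sketch of the (i,k,j) ordering follows, under the same assumptions as the pseudocode (square matrices) plus 0-based lists of rows. The name `matmul_ikj` is illustrative only.

```python
def matmul_ikj(A, B, n):
    """Return C = A x B using the (i,k,j) loop order."""
    C = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(n):
            a = A[i][k]  # constant for the whole inner loop
            for j in range(n):
                # Both C[i][j] and B[k][j] advance along a row (stride 1).
                C[i][j] += a * B[k][j]
    return C


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_ikj(A, B, 2))  # [[19.0, 22.0], [43.0, 50.0]]
```

Each C[i][j] still accumulates its terms in increasing k, so the result matches the (i,j,k) order; only the order of memory accesses changes.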
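Matrix Multiplication (i,k,j): Improve Temporal Locality

$$
\begin{bmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{bmatrix}
=
\begin{bmatrix} A_{11} & A_{12} & A_{13} \\ A_{21} & A_{22} & A_{23} \\ A_{31} & A_{32} & A_{33} \end{bmatrix}
\times
\begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{21} & B_{22} & B_{23} \\ B_{31} & B_{32} & B_{33} \end{bmatrix}
$$

$$
C_{11} = A_{11} \times B_{11} + A_{12} \times B_{21} + A_{13} \times B_{31}
$$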
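The block identity for C11 can be checked numerically. The sketch below assumes a 6 x 6 matrix split into a 3 x 3 grid of 2 x 2 submatrices. The helpers `matmul`, `add`, and `block` and the test data are invented for the example, and block indices are 0-based in the code.

```python
def matmul(X, Y):
    """Plain product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def add(X, Y):
    """Elementwise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def block(M, bi, bj, s):
    """Extract the s x s submatrix at block row bi, block column bj (0-based)."""
    return [row[bj * s:(bj + 1) * s] for row in M[bi * s:(bi + 1) * s]]


n, s = 6, 2  # 3 x 3 grid of 2 x 2 blocks
A = [[i + j for j in range(n)] for i in range(n)]
B = [[i * j + 1 for j in range(n)] for i in range(n)]
C = matmul(A, B)

# C11 = A11 x B11 + A12 x B21 + A13 x B31
C11 = matmul(block(A, 0, 0, s), block(B, 0, 0, s))
for K in (1, 2):
    C11 = add(C11, matmul(block(A, 0, K, s), block(B, K, 0, s)))

print(C11 == block(C, 0, 0, s))  # True
```

Integer entries are used so the comparison is exact. Each small block product reuses the same few submatrices many times, which is the temporal locality the slide refers to.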
Submatrix Multiplication (i,k,j)

for it = 1 to n by s do
  for kt = 1 to n by s do
    for jt = 1 to n by s do
      for i = it to min(it+s-1,n) do
        for k = kt to min(kt+s-1,n) do
          for j = jt to min(jt+s-1,n) do
            C[i,j] = C[i,j] + A[i,k] x B[k,j]
          endfor
        endfor
      endfor
    endfor
  endfor
endfor
(i,k,j) Memory Map
[Figure: s x s submatrices of A, B, and C, annotated with the tile indices it, kt, and jt.]
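The submatrix loop nest can be sketched in Python as follows. The name `matmul_tiled` is hypothetical, and indices are 0-based, so the inclusive bound `min(it+s-1,n)` from the pseudocode becomes the exclusive bound `min(it + s, n)`.

```python
def matmul_tiled(A, B, n, s):
    """Return C = A x B, sweeping s x s tiles in (it, kt, jt) order."""
    C = [[0.0] * n for _ in range(n)]
    for it in range(0, n, s):
        for kt in range(0, n, s):
            for jt in range(0, n, s):
                # Multiply one tile of A by one tile of B into one tile of C.
                for i in range(it, min(it + s, n)):
                    for k in range(kt, min(kt + s, n)):
                        a = A[i][k]
                        for j in range(jt, min(jt + s, n)):
                            C[i][j] += a * B[k][j]
    return C


n = 5  # deliberately not a multiple of the tile size
A = [[float(i + j) for j in range(n)] for i in range(n)]
B = [[float(i * j + 1) for j in range(n)] for i in range(n)]
reference = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
print(matmul_tiled(A, B, n, 2) == reference)  # True
```

The `min` bounds handle the ragged last tile when n is not a multiple of s. The tile size would normally be chosen so that the three active tiles fit in cache together, but that choice depends on the machine and is not given in the slides.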
Multiprocessor Architecture
[Figure: two CPUs, each with its own cache memory, connected by a memory bus to main memory.]
Parallel (i,k,j): Inner loop

for i = 1 to n do
  for k = 1 to n do
    parfor j = 1 to n do
      C[i,j] = C[i,j] + A[i,k] x B[k,j]
    endparfor
  endfor
endfor
Parallel (i,k,j): Inner loop memory mapping
[Figure: the matrices A, B, and C annotated with the loop indices i and k.]
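One way to picture the inner-loop `parfor` is with a thread pool that splits the j range into contiguous chunks. This is a structural sketch only: the function name, the `workers` parameter, and the chunked schedule are assumptions, and CPython threads will not give a real speedup for this kind of loop.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_par_inner(A, B, n, workers=2):
    """Return C = A x B with the j loop split across worker threads."""
    C = [[0.0] * n for _ in range(n)]
    # Assumed schedule: contiguous chunks of the j range, one per worker.
    chunk = max(1, (n + workers - 1) // workers)
    ranges = [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]

    def update(i, k, lo, hi):
        a = A[i][k]
        for j in range(lo, hi):
            C[i][j] += a * B[k][j]  # each chunk writes a disjoint part of row i

    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(n):
            for k in range(n):
                # parfor j: fork the chunks, then join before the next k.
                futures = [pool.submit(update, i, k, lo, hi) for lo, hi in ranges]
                for f in futures:
                    f.result()
    return C


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_par_inner(A, B, 2))  # [[19.0, 22.0], [43.0, 50.0]]
```

Note that the fork and join happen once per (i, k) pair, so there are n x n synchronization points, each covering only n multiply-adds.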
Parallel (i,k,j): Outer loop

parfor i = 1 to n do
  for k = 1 to n do
    for j = 1 to n do
      C[i,j] = C[i,j] + A[i,k] x B[k,j]
    endfor
  endfor
endparfor
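The outer-loop version can be sketched the same way, with each task computing one full row of C. The names `row_of_c` and `matmul_par_outer` and the `workers` parameter are assumptions for the example, and the thread pool again shows the structure rather than real performance.

```python
from concurrent.futures import ThreadPoolExecutor


def row_of_c(A, B, n, i):
    """Compute row i of C = A x B with the (k, j) loops run sequentially."""
    row = [0.0] * n
    for k in range(n):
        a = A[i][k]
        for j in range(n):
            row[j] += a * B[k][j]
    return row


def matmul_par_outer(A, B, n, workers=2):
    """Return C = A x B with the i loop distributed over worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # parfor i: rows of C are independent, so one join at the end suffices.
        return list(pool.map(lambda i: row_of_c(A, B, n, i), range(n)))


A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(matmul_par_outer(A, B, 2))  # [[19.0, 22.0], [43.0, 50.0]]
```

Each task now carries n x n multiply-adds, and `pool.map` returns the rows in index order, so the tasks share nothing except read-only access to A and B.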
Parallel (i,k,j): Submatrix

parfor it = 1 to n by s do
  for kt = 1 to n by s do
    for jt = 1 to n by s do
      for i = it to min(it+s-1,n) do
        for k = kt to min(kt+s-1,n) do
          for j = jt to min(jt+s-1,n) do
            C[i,j] = C[i,j] + A[i,k] x B[k,j]
          endfor
        endfor
      endfor
    endfor
  endfor
endparfor
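Combining the two ideas, the parallel submatrix version hands each band of s rows to a separate task, and each task sweeps its tiles sequentially. As before, this is a hedged sketch: `matmul_par_tiled`, `row_band`, and `workers` are made-up names, indices are 0-based, and the thread pool only illustrates the decomposition.

```python
from concurrent.futures import ThreadPoolExecutor


def matmul_par_tiled(A, B, n, s, workers=2):
    """Return C = A x B with the outer tile loop (it) run in parallel."""
    C = [[0.0] * n for _ in range(n)]

    def row_band(it):
        # Sequential (kt, jt) tile sweep for rows it .. min(it + s, n) - 1.
        for kt in range(0, n, s):
            for jt in range(0, n, s):
                for i in range(it, min(it + s, n)):
                    for k in range(kt, min(kt + s, n)):
                        a = A[i][k]
                        for j in range(jt, min(jt + s, n)):
                            C[i][j] += a * B[k][j]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # parfor it: bands write disjoint rows of C, so no locking is needed.
        list(pool.map(row_band, range(0, n, s)))
    return C


n = 5
A = [[float(i + j) for j in range(n)] for i in range(n)]
B = [[float(i * j + 1) for j in range(n)] for i in range(n)]
reference = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
print(matmul_par_tiled(A, B, n, 2) == reference)  # True
```

With this decomposition the number of parallel tasks is about n / s, so the tile size trades cache reuse against the amount of available parallelism.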