240 likes | 261 Views
Linear Triangular System. L – lower triangular matrix, nonsingular. Lx=b L: nxn nonsingular lower triangular b: known vector b(1) = b(1)/L(1,1) For i=2:n b(i) = (b(i)-L(i,1:i-1)b(1:i-1))/L(i,i) end. Forward substitution, row version. Triangular System.
E N D
Linear Triangular System L – lower triangular matrix, nonsingular Lx=b L: nxn nonsingular lower triangular b: known vector b(1) = b(1)/L(1,1) For i=2:n b(i) = (b(i)-L(i,1:i-1)b(1:i-1))/L(i,i) end Forward substitution, row version
Triangular System Column version (column sweep method): As soon as a variable is solved, its effect can be subtracted from subsequent equations Lx = b for j=1:n-1 b(j) = b(j)/L(j,j) b(j+1:n) = b(j+1:n)-b(j)L(j+1:n,j) end b(n) = b(n)/L(n,n) Forward substitution, column version Column version is more amenable to parallel computing
Triangular System: Parallel L b L b block Block cyclic As soon as x_i (or a few x_i variables) is computed, the value is passed downward to neighboring cpus; As soon as a cpu receives x_i value, it passes the value downward to neighboring cpus; Then local b vector is updated. Disadvantage: load imbalance, about 50% cpus are active on average Remedy: cyclic or block cyclic distribution of rows.
Triangular System: Inversion A – NxN lower triangular Divide A into equal blocks Can inverse A recursively: Inverse A1; Inverse A2; Compute X by matrix multiplication Matrix multiplication
Triangular System: Inversion First phase: invert diagonal elements of A 2nd phase: compute 2x2 diagonal blocks of A^(-1) … K-th phase: compute diagonal 2^(k-1) x 2^(k-1) blocks of A^(-1) Essentially matrix multiplications; K-th phase: N/2^(k-1) pairs of 2^(k-2)x2^(k-2) matrix multiplications Can do in parallel on P=K^3 processors
Gaussian Elimination Ax = b A = LU, L – unit lower triangular U – upper triangular Ax = b LUx = b Ly = b, Ux = y Especially with multiple rhs or solve same equations (same coefficient matrix) many times
LU Factorization A = LU A – nxn matrix A(1:k,1:k) nonsingular for k=1:n-1 (kij) version For k=1:n-1 for i=k+1:n A(i,k) = A(i,k)/A(k,k) for j=k+1:n A(i,j) = A(i,j) – A(i,k)A(k,j) end end end or For k=1:n-1 A(k+1:n,k) = A(k+1:n,k)/A(k,k) A(k+1:n,k+1:n) = A(k+1:n,k+1:n)- A(k+1:n,k)A(k,k+1:n) end After factorization, L is in strictly lower triangular part of A, U is in upper triangular part of A (including diagonal) A(k,k) is the pivot
Factorization Breakdown If A(k,k)=0 at any stage breakdown, LU factorization may not exist even if A is nonsingular Theorem: Assume A is nxn matrix. (1) A has an LU factorization if A(1:k,1:k) is non-singular for all k=1:n-1. (2) If the LU factorization exists and A is non-singular, then the LU factorization is unique. Avoid method breakdown pivoting Pivoting is also necessary to improve accuracy. Small pivot increased errors Make sure no large entries appear in L or U. Use large pivots.
Block LU Factorization A – nxn matrix, n = r*N A11 – rxr matrix, A22 – (n-r)x(n-r) matrix, A12 – rx(n-r) matrix, A21 – (n-r)xr A11 = L11*U11 A12 = L11*U12 U12 A21 = L21*U11 L21 A22 = L21*U12+L22*U22 A22-L21*U12 = A’ = L22*U22 LU factorization iteratively
Block LU Factorization A – nxn matrix A(1:k,1:k) is non-singular for k=1:n-1 1<= r <= n Upon completion, A(i,j) overwritten by L(i,j) for i>j; A(i,j) overwritten by U(i,j) for i<=j s = 1 While s <= n q = min(n,s+r-1) Use scalar algorithm to LU-factorize A(s:q,s:q) into L and U Solve LZ = A(s:q,q+1:n) for Z and overwrite A(s:q,q+1:n) with Z Solve WU = A(q+1:n,s:q) for W and overwrite A(q+1:n,s:q) with W A(q+1:n,q+1:n) = A(q+1:n,q+1:n) – WZ s = q+1 End Matrix multiplication accounts for significant fraction of operations
Permutation Matrix Permutation matrix: identity matrix with its rows re-ordered. p = [4 1 3 2] encodes permutation matrix P p(k) is the column index of the “1” in k-th row PA: row-permuted version of A AP: column-permuted version of A Interchange permutation matrix: identity matrix with two rows swapped Row 1 and 4 swapped EA: swap rows 1 and 4 of A AE: swap columns 1 and 4 of A
Permutation Matrix A permutation matrix can be expressed as a series of row interchanges: If E_{k} is the interchange permutation matrix with rows k and p(k) interchanged, Then P can be encoded by vector p(1:n). If x(1:n) is a vector, then Px can be computed using p(1:n) For k=1:n swap x(k) and x(p(k)) End p(1:n) vector is useful for pivoting
Partial Pivoting Pivoting is crucial to preventing breakdown and improving accuracy Partial pivoting: choose largest element in a column (or row) and interchange rows (columns) Swap rows 1 and 3 Swap rows 2 and 3
LU Factorization with Row Partial Pivoting A – nxn matrix After factorization, strictly lower triangular part of A contains L; upper triangular part contains U; vector p(1:n-1) contains permutation operations in partial pivoting Algorithm F2: For k=1:n-1 Determine s with k<=s<=n s.t. |A(s,k)| is largest among |A(k:n,k)| swap A(k,1:n) and A(s,1:n) p(k) = s if A(k,k) != 0 A(k+1:n,k) = A(k+1:n,k)/A(k,k) A(k+1:n,k+1:n) = A(k+1:n,k+1:n)-A(k+1:n,k)A(k,k+1:n) end end
How to Use Factorized A Solve Ax = b Using LU factorization of row partial pivoting Need to swap elements of b according to partial pivoting information in p(1:n-1) Assume A is LU factorized with row partial pivoting using algorithm F2: For k=1:n-1 swap b(k) and b(p(k)) End Solve Ly = b Solve Ux = y L - unit lower triangular matrix whose lower triangular part is the same as that of A; U - upper triangular part of A (including diagonal)
LU Factorization With Row Partial Pivoting A – nxn matrix After factorization, strictly lower triangular part of A contains multipliers; upper triangular part contains U; vector p(1:n-1) contains permutation operations in partial pivoting Algorithm F1: For k=1:n-1 Determine s with k<=s<=n s.t. |A(s,k)| is largest among |A(k:n,k)| swap A(k,k:n) and A(s,k:n) // only difference with F2 p(k) = s if A(k,k) != 0 A(k+1:n,k) = A(k+1:n,k)/A(k,k) A(k+1:n,k+1:n) = A(k+1:n,k+1:n)-A(k+1:n,k)A(k,k+1:n) end end
How to Use Factorized A Solve Ax = b Using LU factorization of partial pivoting Need to swap elements of b according to partial pivoting information in p(1:n-1) Need to multiply appropriate coefficients information in lower triangular part of A Assume A is LU factorized with partial pivoting using algorithm F1: For k=1:n-1 swap b(k) and b(p(k)) b(k+1:n) = b(k+1:n) – b(k)A(k+1:n,k) End Solve Ux = b U - upper triangular part of A (including diagonal)
Column Partial Pivoting Column partial pivoting: search row k for the largest element, exchange that column with column k. A – nxn matrix After factorization, strictly lower triangular part of A contains L; upper triangular part contains U; vector p(1:n-1) contains permutation operations in partial pivoting Algorithm G: For k=1:n-1 Determine s with k<=s<=n s.t. |A(k,s)| is largest among |A(k,k:n)| swap A(1:n,k) and A(1:n,s) p(k) = s if A(k,k) != 0 A(k+1:n,k) = A(k+1:n,k)/A(k,k) A(k+1:n,k+1:n) = A(k+1:n,k+1:n)-A(k+1:n,k)A(k,k+1:n) end end
How to Use Factorized A Solve Ax = b Using LU factorization with column partial pivoting Need to swap elements of x according to partial pivoting information in p(1:n-1) Assume A is LU factorized with column partial pivoting using algorithm G: Solve Ly = b Solve Ux = y For k=n-1:-1:1 swap x(k) and x(p(k)) end L - unit lower triangular matrix whose lower triangular part is the same as that of A; U - upper triangular part of A (including diagonal)
Complete Pivoting Complete pivoting: the largest element in submatrix A(k:n,k:n) is permuted into (k,k) as the pivot Need a row interchange and a column interchange A – nxn matrix p(1:n-1) – vector encoding row interchanges q(1:n-1) – vector encoding column interchanges After factorization, lower triangular part of A contains L, upper triangular part of A contains U (including diagonal)
LU Factorization with Complete Pivoting LU factorization with complete pivoting For k=1:n-1 Determine s (k<=s<=n) and t (k<=t<=n) s.t. |A(s,t)| is largest among |A(i,j)| for i=k:n, j=k:n swap A(k,1:n) and A(s,1:n) swap A(1:n,k) and A(1:n,t) p(k) = s q(k) = t if A(k,k) != 0 A(k+1:n,k) = A(k+1:n)/A(k,k) A(k+1:n,k+1:n) = A(k+1:n,k+1:n)-A(k+1:n,k)A(k,k+1:n) end end
How to Use Factorized A Solve Ax = b By LU factorization with complete pivoting Suppose A is LU factorized with complete pivoting, p(1:n-1) and q(1:n-1) are permutation encoding vectors for k=1:n-1 swap b(k) and b(p(k)) End Solve Ly = b for y Solve Ux = y for x For k=n-1:-1:1 swap x(k) and x(q(k)) End L and U are lower and upper triangular parts of factorized A
Parallelization of Gaussian Elimination A(k,k) Row-wise 1D block decomposition At step k, the processor holding the pivot sends row k: A(k,k:n) to bottom neighboring processor; At each processor, forward data immediately to bottom neighbor upon receiving data from top processor; then update its own data; then wait for data from top neighbor Disadvantage: load imbalance Remedy: row-wise block cyclic distribution
Parallelization with Partial Pivoting Row-wise block/block-cyclic decomposition Gaussian elimination with column partial pivoting More difficult with row partial pivoting Pivoting search on the processor holding row k, no communication among processors; Column index of the new pivot element together with row k: A(k:n) need to be sent out; On each processor, upon receiving data from top neighbor, forward immediately to bottom neighbor, and swap column k and new pivot column of own data; update own data; wait data from top neighbor;