1 / 24

Linear Triangular System

Linear Triangular System. L – lower triangular matrix, nonsingular. Lx=b L: nxn nonsingular lower triangular b: known vector b(1) = b(1)/L(1,1) For i=2:n b(i) = (b(i)-L(i,1:i-1)b(1:i-1))/L(i,i) end. Forward substitution, row version. Triangular System.

erik
Download Presentation

Linear Triangular System

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Linear Triangular System L – lower triangular matrix, nonsingular Lx=b L: nxn nonsingular lower triangular b: known vector b(1) = b(1)/L(1,1) For i=2:n b(i) = (b(i)-L(i,1:i-1)b(1:i-1))/L(i,i) end Forward substitution, row version

  2. Triangular System Column version (column sweep method): As soon as a variable is solved, its effect can be subtracted from subsequent equations Lx = b for j=1:n-1 b(j) = b(j)/L(j,j) b(j+1:n) = b(j+1:n)-b(j)L(j+1:n,j) end b(n) = b(n)/L(n,n) Forward substitution, column version Column version is more amenable to parallel computing

  3. Triangular System: Parallel L b L b block Block cyclic As soon as x_i (or a few x_i variables) is computed, the value is passed downward to neighboring cpus; As soon as a cpu receives x_i value, it passes the value downward to neighboring cpus; Then local b vector is updated. Disadvantage: load imbalance, about 50% cpus are active on average Remedy: cyclic or block cyclic distribution of rows.

  4. Triangular System: Inversion A – NxN lower triangular Divide A into equal blocks Can inverse A recursively: Inverse A1; Inverse A2; Compute X by matrix multiplication  Matrix multiplication

  5. Triangular System: Inversion First phase: invert diagonal elements of A 2nd phase: compute 2x2 diagonal blocks of A^(-1) … K-th phase: compute diagonal 2^(k-1) x 2^(k-1) blocks of A^(-1) Essentially matrix multiplications; K-th phase: N/2^(k-1) pairs of 2^(k-2)x2^(k-2) matrix multiplications Can do in parallel on P=K^3 processors

  6. Gaussian Elimination Ax = b A = LU, L – unit lower triangular U – upper triangular Ax = b  LUx = b  Ly = b, Ux = y Especially with multiple rhs or solve same equations (same coefficient matrix) many times

  7. LU Factorization A = LU A – nxn matrix A(1:k,1:k) nonsingular for k=1:n-1 (kij) version For k=1:n-1 for i=k+1:n A(i,k) = A(i,k)/A(k,k) for j=k+1:n A(i,j) = A(i,j) – A(i,k)A(k,j) end end end or For k=1:n-1 A(k+1:n,k) = A(k+1:n,k)/A(k,k) A(k+1:n,k+1:n) = A(k+1:n,k+1:n)- A(k+1:n,k)A(k,k+1:n) end After factorization, L is in strictly lower triangular part of A, U is in upper triangular part of A (including diagonal) A(k,k) is the pivot

  8. Factorization Breakdown If A(k,k)=0 at any stage  breakdown, LU factorization may not exist even if A is nonsingular Theorem: Assume A is nxn matrix. (1) A has an LU factorization if A(1:k,1:k) is non-singular for all k=1:n-1. (2) If the LU factorization exists and A is non-singular, then the LU factorization is unique. Avoid method breakdown  pivoting Pivoting is also necessary to improve accuracy. Small pivot  increased errors Make sure no large entries appear in L or U. Use large pivots.

  9. Block LU Factorization A – nxn matrix, n = r*N A11 – rxr matrix, A22 – (n-r)x(n-r) matrix, A12 – rx(n-r) matrix, A21 – (n-r)xr A11 = L11*U11 A12 = L11*U12  U12 A21 = L21*U11  L21 A22 = L21*U12+L22*U22  A22-L21*U12 = A’ = L22*U22 LU factorization iteratively

  10. Block LU Factorization A – nxn matrix A(1:k,1:k) is non-singular for k=1:n-1 1<= r <= n Upon completion, A(i,j) overwritten by L(i,j) for i>j; A(i,j) overwritten by U(i,j) for i<=j s = 1 While s <= n q = min(n,s+r-1) Use scalar algorithm to LU-factorize A(s:q,s:q) into L and U Solve LZ = A(s:q,q+1:n) for Z and overwrite A(s:q,q+1:n) with Z Solve WU = A(q+1:n,s:q) for W and overwrite A(q+1:n,s:q) with W A(q+1:n,q+1:n) = A(q+1:n,q+1:n) – WZ s = q+1 End Matrix multiplication accounts for significant fraction of operations

  11. Permutation Matrix Permutation matrix: identity matrix with its rows re-ordered. p = [4 1 3 2] encodes permutation matrix P p(k) is the column index of the “1” in k-th row PA: row-permuted version of A AP: column-permuted version of A Interchange permutation matrix: identity matrix with two rows swapped Row 1 and 4 swapped EA: swap rows 1 and 4 of A AE: swap columns 1 and 4 of A

  12. Permutation Matrix A permutation matrix can be expressed as a series of row interchanges: If E_{k} is the interchange permutation matrix with rows k and p(k) interchanged, Then P can be encoded by vector p(1:n). If x(1:n) is a vector, then Px can be computed using p(1:n) For k=1:n swap x(k) and x(p(k)) End p(1:n) vector is useful for pivoting

  13. Partial Pivoting Pivoting is crucial to preventing breakdown and improving accuracy Partial pivoting: choose largest element in a column (or row) and interchange rows (columns) Swap rows 1 and 3 Swap rows 2 and 3

  14. LU Factorization with Row Partial Pivoting A – nxn matrix After factorization, strictly lower triangular part of A contains L; upper triangular part contains U; vector p(1:n-1) contains permutation operations in partial pivoting Algorithm F2: For k=1:n-1 Determine s with k<=s<=n s.t. |A(s,k)| is largest among |A(k:n,k)| swap A(k,1:n) and A(s,1:n) p(k) = s if A(k,k) != 0 A(k+1:n,k) = A(k+1:n,k)/A(k,k) A(k+1:n,k+1:n) = A(k+1:n,k+1:n)-A(k+1:n,k)A(k,k+1:n) end end

  15. How to Use Factorized A Solve Ax = b Using LU factorization of row partial pivoting Need to swap elements of b according to partial pivoting  information in p(1:n-1) Assume A is LU factorized with row partial pivoting using algorithm F2: For k=1:n-1 swap b(k) and b(p(k)) End Solve Ly = b Solve Ux = y L - unit lower triangular matrix whose lower triangular part is the same as that of A; U - upper triangular part of A (including diagonal)

  16. LU Factorization With Row Partial Pivoting A – nxn matrix After factorization, strictly lower triangular part of A contains multipliers; upper triangular part contains U; vector p(1:n-1) contains permutation operations in partial pivoting Algorithm F1: For k=1:n-1 Determine s with k<=s<=n s.t. |A(s,k)| is largest among |A(k:n,k)| swap A(k,k:n) and A(s,k:n) // only difference with F2 p(k) = s if A(k,k) != 0 A(k+1:n,k) = A(k+1:n,k)/A(k,k) A(k+1:n,k+1:n) = A(k+1:n,k+1:n)-A(k+1:n,k)A(k,k+1:n) end end

  17. How to Use Factorized A Solve Ax = b Using LU factorization of partial pivoting Need to swap elements of b according to partial pivoting  information in p(1:n-1) Need to multiply appropriate coefficients  information in lower triangular part of A Assume A is LU factorized with partial pivoting using algorithm F1: For k=1:n-1 swap b(k) and b(p(k)) b(k+1:n) = b(k+1:n) – b(k)A(k+1:n,k) End Solve Ux = b U - upper triangular part of A (including diagonal)

  18. Column Partial Pivoting Column partial pivoting: search row k for the largest element, exchange that column with column k. A – nxn matrix After factorization, strictly lower triangular part of A contains L; upper triangular part contains U; vector p(1:n-1) contains permutation operations in partial pivoting Algorithm G: For k=1:n-1 Determine s with k<=s<=n s.t. |A(k,s)| is largest among |A(k,k:n)| swap A(1:n,k) and A(1:n,s) p(k) = s if A(k,k) != 0 A(k+1:n,k) = A(k+1:n,k)/A(k,k) A(k+1:n,k+1:n) = A(k+1:n,k+1:n)-A(k+1:n,k)A(k,k+1:n) end end

  19. How to Use Factorized A Solve Ax = b Using LU factorization with column partial pivoting Need to swap elements of x according to partial pivoting  information in p(1:n-1) Assume A is LU factorized with column partial pivoting using algorithm G: Solve Ly = b Solve Ux = y For k=n-1:-1:1 swap x(k) and x(p(k)) end L - unit lower triangular matrix whose lower triangular part is the same as that of A; U - upper triangular part of A (including diagonal)

  20. Complete Pivoting Complete pivoting: the largest element in submatrix A(k:n,k:n) is permuted into (k,k) as the pivot Need a row interchange and a column interchange A – nxn matrix p(1:n-1) – vector encoding row interchanges q(1:n-1) – vector encoding column interchanges After factorization, lower triangular part of A contains L, upper triangular part of A contains U (including diagonal)

  21. LU Factorization with Complete Pivoting LU factorization with complete pivoting For k=1:n-1 Determine s (k<=s<=n) and t (k<=t<=n) s.t. |A(s,t)| is largest among |A(i,j)| for i=k:n, j=k:n swap A(k,1:n) and A(s,1:n) swap A(1:n,k) and A(1:n,t) p(k) = s q(k) = t if A(k,k) != 0 A(k+1:n,k) = A(k+1:n)/A(k,k) A(k+1:n,k+1:n) = A(k+1:n,k+1:n)-A(k+1:n,k)A(k,k+1:n) end end

  22. How to Use Factorized A Solve Ax = b By LU factorization with complete pivoting Suppose A is LU factorized with complete pivoting, p(1:n-1) and q(1:n-1) are permutation encoding vectors for k=1:n-1 swap b(k) and b(p(k)) End Solve Ly = b for y Solve Ux = y for x For k=n-1:-1:1 swap x(k) and x(q(k)) End L and U are lower and upper triangular parts of factorized A

  23. Parallelization of Gaussian Elimination A(k,k) Row-wise 1D block decomposition At step k, the processor holding the pivot sends row k: A(k,k:n) to bottom neighboring processor; At each processor, forward data immediately to bottom neighbor upon receiving data from top processor; then update its own data; then wait for data from top neighbor Disadvantage: load imbalance Remedy: row-wise block cyclic distribution

  24. Parallelization with Partial Pivoting Row-wise block/block-cyclic decomposition Gaussian elimination with column partial pivoting More difficult with row partial pivoting Pivoting search on the processor holding row k, no communication among processors; Column index of the new pivot element together with row k: A(k:n) need to be sent out; On each processor, upon receiving data from top neighbor, forward immediately to bottom neighbor, and swap column k and new pivot column of own data; update own data; wait data from top neighbor;

More Related