Linear Triangular System

Linear Triangular System L – lower triangular matrix, nonsingular Lx=b L: nxn nonsingular lower triangular b: known vector b(1) = b(1)/L(1,1) For i=2:n b(i) = (b(i)-L(i,1:i-1)b(1:i-1))/L(i,i) end Forward substitution, row version

Triangular System Column version (column sweep method): As soon as a variable is solved, its effect can be subtracted from subsequent equations Lx = b for j=1:n-1 b(j) = b(j)/L(j,j) b(j+1:n) = b(j+1:n)-b(j)L(j+1:n,j) end b(n) = b(n)/L(n,n) Forward substitution, column version Column version is more amenable to parallel computing

Triangular System: Parallel L b L b block Block cyclic As soon as x_i (or a few x_i variables) is computed, the value is passed downward to neighboring cpus; As soon as a cpu receives x_i value, it passes the value downward to neighboring cpus; Then local b vector is updated. Disadvantage: load imbalance, about 50% cpus are active on average Remedy: cyclic or block cyclic distribution of rows.

Triangular System: Inversion A – NxN lower triangular Divide A into equal blocks Can inverse A recursively: Inverse A1; Inverse A2; Compute X by matrix multiplication  Matrix multiplication

Triangular System: Inversion First phase: invert diagonal elements of A 2nd phase: compute 2x2 diagonal blocks of A^(-1) … K-th phase: compute diagonal 2^(k-1) x 2^(k-1) blocks of A^(-1) Essentially matrix multiplications; K-th phase: N/2^(k-1) pairs of 2^(k-2)x2^(k-2) matrix multiplications Can do in parallel on P=K^3 processors

Gaussian Elimination Ax = b A = LU, L – unit lower triangular U – upper triangular Ax = b  LUx = b  Ly = b, Ux = y Especially with multiple rhs or solve same equations (same coefficient matrix) many times

LU Factorization A = LU A – nxn matrix A(1:k,1:k) nonsingular for k=1:n-1 (kij) version For k=1:n-1 for i=k+1:n A(i,k) = A(i,k)/A(k,k) for j=k+1:n A(i,j) = A(i,j) – A(i,k)A(k,j) end end end or For k=1:n-1 A(k+1:n,k) = A(k+1:n,k)/A(k,k) A(k+1:n,k+1:n) = A(k+1:n,k+1:n)- A(k+1:n,k)A(k,k+1:n) end After factorization, L is in strictly lower triangular part of A, U is in upper triangular part of A (including diagonal) A(k,k) is the pivot

Factorization Breakdown If A(k,k)=0 at any stage  breakdown, LU factorization may not exist even if A is nonsingular Theorem: Assume A is nxn matrix. (1) A has an LU factorization if A(1:k,1:k) is non-singular for all k=1:n-1. (2) If the LU factorization exists and A is non-singular, then the LU factorization is unique. Avoid method breakdown  pivoting Pivoting is also necessary to improve accuracy. Small pivot  increased errors Make sure no large entries appear in L or U. Use large pivots.

Block LU Factorization A – nxn matrix, n = r*N A11 – rxr matrix, A22 – (n-r)x(n-r) matrix, A12 – rx(n-r) matrix, A21 – (n-r)xr A11 = L11*U11 A12 = L11*U12  U12 A21 = L21*U11  L21 A22 = L21*U12+L22*U22  A22-L21*U12 = A’ = L22*U22 LU factorization iteratively

Block LU Factorization A – nxn matrix A(1:k,1:k) is non-singular for k=1:n-1 1<= r <= n Upon completion, A(i,j) overwritten by L(i,j) for i>j; A(i,j) overwritten by U(i,j) for i<=j s = 1 While s <= n q = min(n,s+r-1) Use scalar algorithm to LU-factorize A(s:q,s:q) into L and U Solve LZ = A(s:q,q+1:n) for Z and overwrite A(s:q,q+1:n) with Z Solve WU = A(q+1:n,s:q) for W and overwrite A(q+1:n,s:q) with W A(q+1:n,q+1:n) = A(q+1:n,q+1:n) – WZ s = q+1 End Matrix multiplication accounts for significant fraction of operations

Permutation Matrix Permutation matrix: identity matrix with its rows re-ordered. p = [4 1 3 2] encodes permutation matrix P p(k) is the column index of the “1” in k-th row PA: row-permuted version of A AP: column-permuted version of A Interchange permutation matrix: identity matrix with two rows swapped Row 1 and 4 swapped EA: swap rows 1 and 4 of A AE: swap columns 1 and 4 of A

Permutation Matrix A permutation matrix can be expressed as a series of row interchanges: If E_{k} is the interchange permutation matrix with rows k and p(k) interchanged, Then P can be encoded by vector p(1:n). If x(1:n) is a vector, then Px can be computed using p(1:n) For k=1:n swap x(k) and x(p(k)) End p(1:n) vector is useful for pivoting

Partial Pivoting Pivoting is crucial to preventing breakdown and improving accuracy Partial pivoting: choose largest element in a column (or row) and interchange rows (columns) Swap rows 1 and 3 Swap rows 2 and 3

LU Factorization with Row Partial Pivoting A – nxn matrix After factorization, strictly lower triangular part of A contains L; upper triangular part contains U; vector p(1:n-1) contains permutation operations in partial pivoting Algorithm F2: For k=1:n-1 Determine s with k<=s<=n s.t. |A(s,k)| is largest among |A(k:n,k)| swap A(k,1:n) and A(s,1:n) p(k) = s if A(k,k) != 0 A(k+1:n,k) = A(k+1:n,k)/A(k,k) A(k+1:n,k+1:n) = A(k+1:n,k+1:n)-A(k+1:n,k)A(k,k+1:n) end end

How to Use Factorized A Solve Ax = b Using LU factorization of row partial pivoting Need to swap elements of b according to partial pivoting  information in p(1:n-1) Assume A is LU factorized with row partial pivoting using algorithm F2: For k=1:n-1 swap b(k) and b(p(k)) End Solve Ly = b Solve Ux = y L - unit lower triangular matrix whose lower triangular part is the same as that of A; U - upper triangular part of A (including diagonal)

LU Factorization With Row Partial Pivoting A – nxn matrix After factorization, strictly lower triangular part of A contains multipliers; upper triangular part contains U; vector p(1:n-1) contains permutation operations in partial pivoting Algorithm F1: For k=1:n-1 Determine s with k<=s<=n s.t. |A(s,k)| is largest among |A(k:n,k)| swap A(k,k:n) and A(s,k:n) // only difference with F2 p(k) = s if A(k,k) != 0 A(k+1:n,k) = A(k+1:n,k)/A(k,k) A(k+1:n,k+1:n) = A(k+1:n,k+1:n)-A(k+1:n,k)A(k,k+1:n) end end

How to Use Factorized A Solve Ax = b Using LU factorization of partial pivoting Need to swap elements of b according to partial pivoting  information in p(1:n-1) Need to multiply appropriate coefficients  information in lower triangular part of A Assume A is LU factorized with partial pivoting using algorithm F1: For k=1:n-1 swap b(k) and b(p(k)) b(k+1:n) = b(k+1:n) – b(k)A(k+1:n,k) End Solve Ux = b U - upper triangular part of A (including diagonal)

Column Partial Pivoting Column partial pivoting: search row k for the largest element, exchange that column with column k. A – nxn matrix After factorization, strictly lower triangular part of A contains L; upper triangular part contains U; vector p(1:n-1) contains permutation operations in partial pivoting Algorithm G: For k=1:n-1 Determine s with k<=s<=n s.t. |A(k,s)| is largest among |A(k,k:n)| swap A(1:n,k) and A(1:n,s) p(k) = s if A(k,k) != 0 A(k+1:n,k) = A(k+1:n,k)/A(k,k) A(k+1:n,k+1:n) = A(k+1:n,k+1:n)-A(k+1:n,k)A(k,k+1:n) end end

How to Use Factorized A Solve Ax = b Using LU factorization with column partial pivoting Need to swap elements of x according to partial pivoting  information in p(1:n-1) Assume A is LU factorized with column partial pivoting using algorithm G: Solve Ly = b Solve Ux = y For k=n-1:-1:1 swap x(k) and x(p(k)) end L - unit lower triangular matrix whose lower triangular part is the same as that of A; U - upper triangular part of A (including diagonal)

Complete Pivoting Complete pivoting: the largest element in submatrix A(k:n,k:n) is permuted into (k,k) as the pivot Need a row interchange and a column interchange A – nxn matrix p(1:n-1) – vector encoding row interchanges q(1:n-1) – vector encoding column interchanges After factorization, lower triangular part of A contains L, upper triangular part of A contains U (including diagonal)

LU Factorization with Complete Pivoting LU factorization with complete pivoting For k=1:n-1 Determine s (k<=s<=n) and t (k<=t<=n) s.t. |A(s,t)| is largest among |A(i,j)| for i=k:n, j=k:n swap A(k,1:n) and A(s,1:n) swap A(1:n,k) and A(1:n,t) p(k) = s q(k) = t if A(k,k) != 0 A(k+1:n,k) = A(k+1:n)/A(k,k) A(k+1:n,k+1:n) = A(k+1:n,k+1:n)-A(k+1:n,k)A(k,k+1:n) end end

How to Use Factorized A Solve Ax = b By LU factorization with complete pivoting Suppose A is LU factorized with complete pivoting, p(1:n-1) and q(1:n-1) are permutation encoding vectors for k=1:n-1 swap b(k) and b(p(k)) End Solve Ly = b for y Solve Ux = y for x For k=n-1:-1:1 swap x(k) and x(q(k)) End L and U are lower and upper triangular parts of factorized A

Parallelization of Gaussian Elimination A(k,k) Row-wise 1D block decomposition At step k, the processor holding the pivot sends row k: A(k,k:n) to bottom neighboring processor; At each processor, forward data immediately to bottom neighbor upon receiving data from top processor; then update its own data; then wait for data from top neighbor Disadvantage: load imbalance Remedy: row-wise block cyclic distribution

Parallelization with Partial Pivoting Row-wise block/block-cyclic decomposition Gaussian elimination with column partial pivoting More difficult with row partial pivoting Pivoting search on the processor holding row k, no communication among processors; Column index of the new pivot element together with row k: A(k:n) need to be sent out; On each processor, upon receiving data from top neighbor, forward immediately to bottom neighbor, and swap column k and new pivot column of own data; update own data; wait data from top neighbor;

Linear Triangular System

Linear Triangular System

Presentation Transcript

Triangular Trade

Triangular Arrangement

Linear Control system

Solving Linear Systems of Equations - Triangular Form

Triangular Trade

Linear Equation System

Triangular Trade

Linear Positioning System

Triangular Trade

Linear System

Non linear system

Linear System

Linear System

Triangular Trade

Linear Triangular System

Triangular Arrangement

TRIANGULAR RELATIONSHIPS

TRIANGULAR TRADE

Linear vs Non-linear System