Linear Algebra and Complexity Chris Dickson CAS 746 - Advanced Topics in Combinatorial Optimization McMaster University, January 23, 2006
Introduction • Review of linear algebra • Complexity of determinant, inverse, systems of equations, etc. • Gaussian elimination method for solving linear systems • complexity • solvability • numerical issues • Iterative methods for solving linear systems • Jacobi Method • Gauss-Seidel Method • Successive Over-Relaxation Methods
Review • A subset L of Rn (respectively Qn) is called a linear hyperplane if L = { x | ax = 0 } for some nonzero row vector a in Rn (Qn). • A set U of vectors in Rn (respectively Qn) is called a linear subspace of Rn if the following properties hold: • The zero vector is in U • If x ∈ U then λx ∈ U for any scalar λ • If x, y ∈ U then x + y ∈ U • Important subspaces • Null Space: null(A) = { x | Ax = 0 } • Image: im(A) = { b | Ax = b for some x } • A basis for a subspace L is a set of linearly independent vectors that spans L
Review • Theorem 3.1. Each linear subspace L of Rn is generated by finitely many vectors and is also the intersection of finitely many linear hyperplanes. • Proof: • If L is a subspace, then a maximal set of linearly independent vectors from L forms a basis. • Also, L is the intersection of the hyperplanes { x | aix = 0 }, where a1, ..., ak are linearly independent vectors generating L* := { z | zx = 0 for all x ∈ L }.
Review • Corollary 3.1a. For each matrix A there exist vectors x1, ..., xt such that: Ax = 0 if and only if x = λ1x1 + ... + λtxt for certain rationals λ1, ..., λt. • If x1, ..., xt are linearly independent they form a fundamental system of solutions to Ax = 0. • Corollary 3.1b. The system Ax = b has a solution iff yb = 0 for each vector y with yA = 0. [Fundamental theorem of linear algebra, Gauss 1809]
Review • Proof: • (⇒) If Ax = b, then for any y with yA = 0 we have yb = yAx = 0. • (⇐) Let im(A) = { z | Ax = z for some x }. By Theorem 3.1 there exists a matrix C such that im(A) = { z | Cz = 0 }. Then CAx = 0 for every x, so CA is all-zero, and each row of C is a vector y with yA = 0. Thus: (yA = 0 ⇒ yb = 0) implies Cb = 0, hence b ∈ im(A), i.e. there is a solution to Ax = b.
Review • Corollary 3.1c. If x0 is a solution to Ax = b, then there are vectors x1, ..., xt such that Ax = b iff x = x0 + λ1x1 + ... + λtxt for certain rationals λ1, ..., λt. • Proof: Ax = b iff A(x - x0) = 0. Corollary 3.1a then gives (x - x0) = λ1x1 + ... + λtxt, i.e. x = x0 + λ1x1 + ... + λtxt.
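To make Corollary 3.1c concrete, here is a small NumPy sketch (the function name solution_set, the tolerance, and the 2 x 3 example system are illustrative, not from the slides): it computes a particular solution x0 and a basis x1, ..., xt of null(A), so that every solution of Ax = b is x0 plus a combination of the xi.

```python
import numpy as np

def solution_set(A, b, tol=1e-10):
    """Return (x0, N) so that { x | Ax = b } = { x0 + N @ lam }, or None if
    Ax = b has no solution.  x0 is a particular solution; the columns of N
    span null(A), i.e. they play the role of x1, ..., xt in Corollary 3.1c."""
    x0 = np.linalg.lstsq(A, b, rcond=None)[0]    # least-squares / particular solution
    if np.linalg.norm(A @ x0 - b) > tol:
        return None                              # b is not in im(A): no solution
    _, s, Vh = np.linalg.svd(A)
    r = int(np.sum(s > tol))                     # numerical rank of A
    N = Vh[r:].T                                 # null-space basis as columns
    return x0, N

# A rank-1 example: every solution is x0 plus a combination of two null-space vectors.
A = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0]])
b = np.array([6.0, 12.0])
x0, N = solution_set(A, b)
print(np.allclose(A @ (x0 + N @ np.array([2.0, -1.0])), b))   # True
```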
Review • Cramer’s Rule: If A is an invertible (n x n)-matrix, the solution to Ax = b is given by: x1 = detA1 / detA … xn = detAn / detA • Ai is defined by replacing the ith column of A by the column vector b.
Review • Example: 5x1 + x2 - x3 = 4; 9x1 + x2 - x3 = 1; x1 - x2 + 5x3 = 2. Here, x2 = det(A2) / det(A) = (-166) / (-16) = 10.375.
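The arithmetic in this example can be checked mechanically. A hedged NumPy sketch (the helper cramer is illustrative and only sensible for tiny systems):

```python
import numpy as np

# System from the example: 5x1 + x2 - x3 = 4, 9x1 + x2 - x3 = 1, x1 - x2 + 5x3 = 2
A = np.array([[5.0, 1.0, -1.0],
              [9.0, 1.0, -1.0],
              [1.0, -1.0, 5.0]])
b = np.array([4.0, 1.0, 2.0])

def cramer(A, b):
    """Solve Ax = b via Cramer's rule: x_i = det(A_i) / det(A)."""
    detA = np.linalg.det(A)
    x = np.empty(len(b))
    for i in range(len(b)):
        Ai = A.copy()
        Ai[:, i] = b                     # replace the i-th column of A by b
        x[i] = np.linalg.det(Ai) / detA
    return x

print(cramer(A, b))              # second component is 166/16 = 10.375
print(np.linalg.solve(A, b))     # agrees with a direct solve
```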
Sizes and Characterizations • Rational number: r = p / q (p ∈ Z, q ∈ N, p and q relatively prime); its size is 1 + ⌈log2(|p|+1)⌉ + ⌈log2(q+1)⌉. • Rational vector: c = (γ1, γ2, ..., γn) • Rational matrix • The size of a vector or matrix is, up to a term for its dimensions, the sum of the sizes of its entries. We only consider rational entries for vectors and matrices.
Sizes and Characterizations • Theorem 3.2. Let A be a square rational matrix of size σ. Then the size of det(A) is less than 2σ. • Proof: Let A = (pij/qij) for i,j = 1..n, with each pij, qij relatively prime, and let det A = p/q with p, q relatively prime. Then q divides ∏i,j qij, so q ≤ ∏i,j qij. • From the definition of the determinant, |det A| ≤ ∏i,j (|pij| + 1).
Sizes and Characterizations • Noting that |p| = |det A| · q, we combine the two formulas to bound p and q: q ≤ ∏i,j qij and |p| ≤ ∏i,j (|pij| + 1) · ∏i,j qij. • Plugging these bounds into the definition of the size of a rational number, we get size(det A) < 2σ.
Sizes and Characterizations • Several important corollaries follow from the theorem. • Corollary 3.2a. The inverse A-1 of a nonsingular (rational) matrix A has size polynomially bounded by the size of A. • Proof: Entries of A-1 are quotients of subdeterminants of A; apply the theorem. • Corollary 3.2b. If the system Ax = b has a solution, it has one of size polynomially bounded by the sizes of A and b. • Proof: Since the size of A-1 is polynomially bounded, so is the size of A-1b (restricting to a maximal nonsingular subsystem in the general case). This tells us that Corollary 3.1b provides a good characterization.
Sizes and Characterizations • What if there isn't a solution to Ax = b? Is this problem still polynomially characterized? Yes! • Corollary 3.2c says so: if there is no solution, we can specify a vector y with yA = 0 and yb = 1. By Corollary 3.2b, such a vector exists and is polynomially bounded by the sizes of A and b. • Gaussian elimination provides a polynomial-time method for finding a solution (or certifying that none exists). • One more thing left to show: do we know the solution set will be polynomially bounded in size?
Sizes and Characterizations • Corollary 3.2d. Let A be a rational m x n matrix and let b be a rational column vector such that each row of the matrix [A b] has size at most φ. If Ax = b has a solution, then: { x | Ax = b } = { x0 + λ1x1 + ... + λtxt | λi ∈ R } (*) for certain rational vectors x0, ..., xt of size at most 4n²φ. • Proof: By Cramer's Rule, there exist x0, ..., xt satisfying (*) whose components are quotients of subdeterminants of [A b] of order at most n. By Theorem 3.2, each such determinant has size less than 2nφ, so each component of xi has size less than 4nφ. Since each xi has n components, the size of xi is at most 4n²φ.
Gaussian Elimination • Given a system of linear equations:
a11x1 + a12x2 + ... + a1nxn = b1
a21x1 + a22x2 + ... + a2nxn = b2
...
an1x1 + an2x2 + ... + annxn = bn
We subtract appropriate multiples of the first equation from the other equations to get:
a11x1 + a12x2 + ... + a1nxn = b1
a'22x2 + ... + a'2nxn = b'2
...
a'n2x2 + ... + a'nnxn = b'n
We then recursively solve the system of the last n-1 equations.
Gaussian Elimination • Operations allowed on matrices: • Adding a multiple of one row to another row • Permuting rows or columns • Forward Elimination: Given a matrix Ak of the form [ B C ; 0 D ], where B is non-singular, upper triangular, and of order k. We choose a nonzero component of D called the pivot element. Without loss of generality, assume that this pivot element is D11. Next, add rational multiples of the first row of D to the other rows of D so that D11 becomes the only non-zero element in the first column of D.
Gaussian Elimination • Back Substitution: After the FE stage, we wish to transform the matrix into the form [ Δ C' ; 0 0 ], where Δ is non-singular and diagonal. • We accomplish this by adding multiples of the kth row to rows 1, ..., k-1 so that Akk becomes the only non-zero entry in the kth column, for k = r, r-1, ..., 1 (r = rank(A)). A compact sketch of both stages follows.
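This is a minimal sketch of the two stages for a square non-singular system, assuming NumPy and that every diagonal pivot encountered is nonzero (no row or column permutations); it reuses the system from the Cramer's rule example.

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve Ax = b for a square, non-singular A; assumes nonzero pivots."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)

    # Forward elimination: zero out everything below the pivot A[k, k].
    for k in range(n):
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]

    # Back substitution: solve for x_n, ..., x_1 from the triangular system.
    x = np.zeros(n)
    for k in range(n - 1, -1, -1):
        x[k] = (b[k] - A[k, k + 1:] @ x[k + 1:]) / A[k, k]
    return x

A = np.array([[5.0, 1.0, -1.0], [9.0, 1.0, -1.0], [1.0, -1.0, 5.0]])
b = np.array([4.0, 1.0, 2.0])
print(gaussian_elimination(A, b))   # matches the Cramer's rule example
```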
Gaussian Elimination • Theorem 3.3. For rational data, the Gaussian elimination method is a polynomial-time method. • The proof is found in the book; it is very long-winded! • Since the number of operations in GE is polynomially bounded, it only remains to show that the numbers involved do not grow exponentially in size. • A corollary of the theorem is that other operations on matrices are polynomially solvable, as they depend on solving a system of linear equations: • calculating the determinant of a rational matrix • determining the rank of a rational matrix • calculating the inverse of a rational matrix • testing a set of rational vectors for linear independence • solving a system of rational linear equations
Gaussian Elimination • Numerical Considerations • A naive pivot strategy always uses D11 as the pivot. However, this can lead to numerical problems when we do not have exact arithmetic (i.e., floating-point numbers). • Partial pivoting always selects the largest element (in absolute value) in the first column of D as the pivot (we swap rows to move it into position D11). • Scaled partial pivoting is like partial pivoting, but it scales the pivot column and row so that D11 = 1. • Full pivoting uses the largest value in the entire D submatrix as the pivot; we then swap rows and columns to put this value in position D11. • Studies show that full pivoting usually incurs too much overhead to be worthwhile. A sketch of partial pivoting follows.
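A hedged sketch of how partial pivoting changes the forward-elimination loop: before eliminating column k, the row whose entry in that column is largest in absolute value is swapped into the pivot position. The function name is illustrative.

```python
import numpy as np

def forward_elimination_partial_pivot(A, b):
    """Forward elimination with partial pivoting; returns the upper-triangular
    system (A', b'), ready for back substitution."""
    A, b = A.astype(float).copy(), b.astype(float).copy()
    n = len(b)
    for k in range(n):
        # Pick the row with the largest |entry| in column k as the pivot row.
        p = k + int(np.argmax(np.abs(A[k:, k])))
        if A[p, k] == 0.0:
            continue                      # whole column is zero: nothing to eliminate
        A[[k, p]] = A[[p, k]]             # swap rows k and p in A ...
        b[[k, p]] = b[[p, k]]             # ... and in b
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    return A, b

# A zero in the top-left corner forces a row swap before elimination.
U, c = forward_elimination_partial_pivot(np.array([[0.0, 1.0], [2.0, 3.0]]),
                                         np.array([1.0, 5.0]))
print(U, c)
```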
Gaussian Elimination • Other methods: • Gauss-Jordan Method: This method combines the FE and BS stages of Gaussian elimination. • inferior to ordinary Gaussian elimination • LU Decomposition: Uses GE to decompose the original matrix A into the product of a lower and an upper triangular matrix. • This is the method of choice when we must solve many systems with the same matrix A but with different b's. • We only need to perform FE once, then do back substitution for each right-hand side. • Iterative Methods: Want to solve Ax = b by determining a sequence of vectors x0, x1, ... that eventually converges to a solution x*.
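A sketch of the factor-once, solve-many pattern, assuming SciPy is available: lu_factor performs the elimination (with partial pivoting) once, and lu_solve reuses the factors for each right-hand side. The matrix reuses the earlier example.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

A = np.array([[5.0, 1.0, -1.0], [9.0, 1.0, -1.0], [1.0, -1.0, 5.0]])

# Factor once (this is where the O(n^3) elimination work happens) ...
lu, piv = lu_factor(A)

# ... then reuse the factors for many right-hand sides (O(n^2) each).
for b in (np.array([4.0, 1.0, 2.0]), np.array([1.0, 0.0, 0.0])):
    print(lu_solve((lu, piv), b))
```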
Iterative Methods • Given Ax = b and a starting vector x0, the goal is to find a sequence of iterates x1, x2, ... converging to an x* with Ax* = b. • For most methods, the first step is to split A as A = L + D + U, where L is strictly lower triangular, D is diagonal, and U is strictly upper triangular.
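A one-line-per-part NumPy sketch of the splitting; the 3 x 3 matrix here is illustrative (the slide's own example matrix is an image and is not reproduced).

```python
import numpy as np

A = np.array([[4.0, -1.0, 0.0],
              [-1.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])

L = np.tril(A, k=-1)         # strictly lower triangular part
D = np.diag(np.diag(A))      # diagonal part
U = np.triu(A, k=1)          # strictly upper triangular part
assert np.allclose(L + D + U, A)
```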
Iterative Methods • Now we can define how the iteration proceeds. In general, split A = M - N with M easy to invert and iterate M xk+1 = N xk + b, i.e. xk+1 = M-1(N xk + b). • Jacobi Iteration: MJ = D, NJ = -(L + U). Alternatively, the book presents the iteration xk+1 = xk + (b - A xk). These two are equivalent because the book assumes aii = 1, which means D = D-1 = I (the identity). A sketch of the Jacobi iteration follows.
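A sketch of the Jacobi iteration in NumPy, assuming no diagonal entry of A is zero; the test matrix and right-hand side are illustrative and are not the slide's own example.

```python
import numpy as np

def jacobi(A, b, x0, iters=50):
    """Jacobi iteration x_{k+1} = D^{-1} (b - (L + U) x_k)."""
    D = np.diag(A)                       # diagonal entries a_ii as a vector
    R = A - np.diag(D)                   # L + U: everything off the diagonal
    x = x0.astype(float).copy()
    for _ in range(iters):
        x = (b - R @ x) / D              # all components updated from the old x
    return x

A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = A @ np.array([1.0, 2.0, 3.0])        # exact solution is [1, 2, 3]
print(jacobi(A, b, x0=b))
```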
Iterative Methods • Example: Solve for x in Ax = b using Jacobi's Method (the 3 x 3 system A, b is given on the original slide). • Using the iteration from the previous slide and x0 = b, we obtain:
x1 = [-5 0.667 1.5]T
x2 = [0.833 -1.0 -1.0]T
...
x10 = [0.9999 0.9985 1.9977]T
converging to x* = [1 1 2]T.
Iterative Methods • Example 2: Solve for x in Ax = b using Jacobi's Method (the 3 x 3 system A, b is given on the original slide). • Using x0 = b, we obtain:
x1 = [-44.5 -49.5 -13.4]T
x2 = [112.7 235.7 67.7]T
...
x10 = [2.82 4.11 1.39]T x 10^6
even though x* = [1 1 1]T.
Iterative Methods • What happened? The sequence diverged! • The matrix in the second example was NOT diagonally dominant. • Strict diagonal row dominance is sufficient for convergence: the absolute value of each diagonal element is greater than the sum of the absolute values of the other elements in its row. • It is not necessary, though, as example 1 did not have strict diagonal dominance. • Alternatively, the book says that the Jacobi Method will converge if all eigenvalues of (I - A) are less than 1 in absolute value. • Again, this is a sufficient condition. It can be checked that the eigenvalues of the matrix in the first example do not satisfy this condition. Both checks are sketched below.
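A sketch of both convergence checks in NumPy (the function name and test matrices are illustrative). The spectral-radius test is applied to the Jacobi iteration matrix -D^-1(L + U), which reduces to the book's "eigenvalues of I - A" condition when all aii = 1.

```python
import numpy as np

def jacobi_convergence_checks(A):
    """Return (strictly_diagonally_dominant, spectral_radius_below_one)."""
    D = np.diag(np.diag(A))
    off = A - D                                   # L + U
    dominant = np.all(np.abs(np.diag(A)) > np.sum(np.abs(off), axis=1))
    T = -np.linalg.solve(D, off)                  # Jacobi iteration matrix -D^{-1}(L+U)
    rho = np.max(np.abs(np.linalg.eigvals(T)))    # spectral radius
    return bool(dominant), bool(rho < 1)

A_good = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
A_bad = np.array([[1.0, 5.0, 0.0], [5.0, 1.0, 5.0], [0.0, 5.0, 1.0]])
print(jacobi_convergence_checks(A_good))   # (True, True)
print(jacobi_convergence_checks(A_bad))    # (False, False): Jacobi will diverge
```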
Iterative Methods • Gauss-Seidel Method: Computes xk+1 component by component, using the components already updated in the current sweep. The ith component is xk+1,i = ( bi - Σj<i aij xk+1,j - Σj>i aij xk,j ) / aii.
Iterative Methods • Gauss-Seidel Method: Alternatively, if we split A = L + D + U, then we obtain the iteration MG xk+1 = NG xk + b, where MG = D + L and NG = -U. • Again, it can be shown that the book's component-by-component method is equivalent to this method. A sketch follows.
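A component-by-component Gauss-Seidel sketch in NumPy, assuming nonzero diagonal entries; the test system is illustrative.

```python
import numpy as np

def gauss_seidel(A, b, x0, iters=50):
    """One Gauss-Seidel sweep per outer iteration; equivalent to
    (D + L) x_{k+1} = b - U x_k."""
    n = len(b)
    x = x0.astype(float).copy()
    for _ in range(iters):
        for i in range(n):
            # The freshly updated components x[:i] are used immediately,
            # unlike in the Jacobi iteration.
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
    return x

A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = A @ np.array([1.0, 2.0, 3.0])
print(gauss_seidel(A, b, x0=b))          # converges to [1, 2, 3]
```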
Iterative Methods • Example: Gauss-Seidel Method • Starting with x0 = b, we obtain the sequence:
x1 = [-4.666 -5.116 3.366]T
x2 = [1.477 -2.788 1.662]T
x3 = [1.821 -0.404 1.116]T
...
x12 = [1.000 1.000 1.000]T
Iterative Methods • It should also be noted that in both the Gauss-Seidel and Jacobi Methods, the choice of M is important. • Since each iteration involves inverting (or solving with) M, we should choose M so that this is easy. • In the Jacobi Method, M is a diagonal matrix: easy to invert! • In the Gauss-Seidel Method, M is lower triangular: slightly harder. Although its inverse is still lower triangular, we must perform a little more work, and in practice one solves with M by forward substitution rather than forming the inverse (see the sketch below). • Alternatively, use the book's strategy of computing the iterates component by component.
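A sketch of the matrix-form Gauss-Seidel step using a triangular solve (forward substitution) instead of an explicit inverse, assuming SciPy; the system is the same illustrative one used above.

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = A @ np.array([1.0, 2.0, 3.0])

M = np.tril(A)                # Gauss-Seidel's M = D + L (lower triangular)
N = -np.triu(A, k=1)          # N = -U

x = b.copy()
for _ in range(50):
    # Forward substitution with M costs O(n^2); no explicit inverse is formed.
    x = solve_triangular(M, N @ x + b, lower=True)
print(x)                      # approaches [1, 2, 3]
```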
Iterative Methods • Successive Over-Relaxation Method (SOR): Write matrices L and U (L lower triangular with diagonal ω-1Δ, U upper triangular) such that A = L + U, for some appropriately chosen ω > 0 (Δ is the diagonal of A). • A related class of iterative relaxation methods for approximating the solution to Ax = b can now be defined.
Iterative Methods • Algorithm: pick a violated equation aix = bi and set xk+1 = xk + ω (bi - ai xk) / ||ai||² · aiT. • For a certain ω with 0 < ω ≤ 2. • Convergent if 0 < ω < 2 and Ax = b has a solution.
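A sketch of this relaxation step in NumPy; the "pick the most violated equation" rule and the stopping tolerance are assumptions made for the example, and the test system is illustrative.

```python
import numpy as np

def relaxation(A, b, x0, omega=1.0, iters=200, tol=1e-8):
    """Repeatedly pick a violated equation a_i x = b_i and move x toward its
    hyperplane by a factor omega (omega = 1 projects, omega = 2 reflects)."""
    x = x0.astype(float).copy()
    for _ in range(iters):
        r = b - A @ x                        # residual of every equation
        i = int(np.argmax(np.abs(r)))        # choose the most violated equation
        if abs(r[i]) < tol:
            break                            # all equations (nearly) satisfied
        a = A[i]
        x = x + omega * r[i] / (a @ a) * a   # step toward the hyperplane a_i x = b_i
    return x

A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])
b = A @ np.array([1.0, 2.0, 3.0])
print(relaxation(A, b, x0=np.zeros(3), omega=1.0))   # approaches [1, 2, 3]
```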
Iterative Methods • There are many different choices that can be made for the relaxation methods. • Choice of ω: • ω = 2: xk+1 is the reflection of xk in the hyperplane aix = bi • ω = 1: xk+1 is the projection of xk onto the hyperplane aix = bi • How to choose the violated equation: • Is it best to just pick any violated equation? • Should we have a strategy? (first, best, worst, etc.) • Stopping criteria: • How close is close enough?
Iterative Methods • Why use iterative methods? • Sometimes they are more numerically stable, particularly when the matrix A is positive semi-definite and diagonally dominant (this is the case for the linear least-squares problem). • When an exact solution is not necessary and we only need to be 'close enough'. • Which one to use? • Compared with Jacobi iteration, the Gauss-Seidel Method converges faster, usually by a factor of 2.
References • Schrijver, A. Theory of Linear and Integer Programming. John Wiley & Sons, 1998. • Qiao, S. Linear Systems. CAS 708 Lecture Slides, 2005.