
ECE 530 – Analysis Techniques for Large-Scale Electrical Systems


Presentation Transcript


  1. ECE 530 – Analysis Techniques for Large-Scale Electrical Systems Lecture 19: Least Squares Prof. Tom Overbye Dept. of Electrical and Computer Engineering University of Illinois at Urbana-Champaign overbye@illinois.edu Special Guest Lecture by Dr. Hao Zhu

  2. Announcements • HW 6 is due Thursday November 7

  3. Least Squares • So far we have considered the solution of Ax = b in which A is a square matrix; as long as A is nonsingular there is a single solution • That is, we have the same number of equations (m) as unknowns (n) • Many problems are overdetermined, in which there are more equations than unknowns (m > n) • Overdetermined systems are usually inconsistent, in which no value of x exactly solves all the equations • Underdetermined systems have more unknowns than equations (m < n); they never have a unique solution but are usually consistent

  4. Method of Least Squares • The least squares method is a solution approach for determining an approximate solution for an overdetermined system • If the system is inconsistent, then not all of the equations can be exactly satisfied • The difference for each equation between its exact solution and the estimated solution is known as the error • Least squares seeks to minimize the sum of the squares of the errors • Weighted least squares allows different weights for the equations

  5. Least Squares Solution History • The method of least squares developed from trying to estimate actual values from a number of measurements • Several persons in the 1700's, starting with Roger Cotes in 1722, presented methods for trying to decrease model errors from using multiple measurements • Legendre presented a formal description of the method in 1805; evidently Gauss claimed he did it in 1795 • Method is widely used in power systems, with state estimation the best known application, dating from Fred Schweppe's work in 1970 • State estimation is covered in ECE 573

  6. Least Squares and Sparsity • In many contexts least squares is applied to problems that are not sparse. For example, using a number of measurements to optimally determine a few values • Regression analysis is a common example, in which a line or other curve is fit to potentially many points • Each measurement impacts each model value • In the classic power system application of state estimation the system is sparse, with measurements only directly influencing a few states • Power system analysis classes have tended to focus on solution methods aimed at sparse systems; we'll consider both sparse and nonsparse solution methods
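
As an illustration of the regression case, fitting a straight line y ≈ c0 + c1*t to m data points is an overdetermined system in two unknowns, and every measurement touches every unknown (a dense problem). A minimal NumPy sketch; the data values below are made up purely for illustration:

```python
import numpy as np

# Hypothetical measurements: m = 5 points, n = 2 unknowns (intercept, slope)
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
b = np.array([1.1, 1.9, 3.2, 3.9, 5.1])

# Each row of A is [1, t_i], so A is 5 x 2 and Ax = b is overdetermined
A = np.column_stack([np.ones_like(t), t])

coeffs, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print("intercept, slope =", coeffs)
```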

  7. Least Squares Problem • Consider the overdetermined system Ax = b, where A is an m x n matrix with m > n, or equivalently, written row by row, (a^i)^T x = b_i for i = 1, ..., m

  8. Least Squares Solution • We write (a^i)^T for row i of A, and a^i is a column vector • Here, m >= n and the solution we are seeking is the x that minimizes ||Ax - b||_p, where p denotes some norm • Since usually an overdetermined system has no exact solution, the best we can do is determine an x that minimizes the desired norm

  9. Example 1: Choice of p • We discuss the choice of p in terms of a specific example • Consider the equation Ax = b with A a 3 x 1 matrix and b a 3-vector (hence three equations and one unknown) • We consider three possible choices for p:

  10. Example 1: Choice of p (i) p = 1 (ii) p = 2 (iii) p = infinity

  11. The Least Squares Problem • In general, ||Ax - b||_p is not differentiable for p = 1 or p = infinity • The choice of p = 2 has become well established given its least-squares fit interpretation • We next motivate the choice of p = 2 by first considering the least-squares problem

  12. The Least Squares Problem • The problem min_x ||Ax - b||_2^2 is tractable for two major reasons: (i) the function f(x) = ||Ax - b||_2^2 is differentiable in x; and

  13. The Least Squares Problem (ii) the 2-norm is preserved under orthogonal transformations: ||Q(Ax - b)||_2 = ||Ax - b||_2 with Q an arbitrary orthogonal matrix; that is, Q satisfies Q^T Q = Q Q^T = I

  14. The Least Squares Problem • We introduce next the basic underlying assumption: A is full rank, i.e., the columns of A constitute a set of linearly independent vectors • This assumption implies that the rank of A is n, because n <= m since we are dealing with an overdetermined system • Fact: The least-squares solution x* satisfies the normal equations A^T A x* = A^T b

  15. Proof of Fact • Since by definition the least-squares solution x* minimizes f(x) = ||Ax - b||_2^2 = (Ax - b)^T (Ax - b), at the optimum the derivative of this function vanishes: grad f(x*) = 2 A^T (A x* - b) = 0, which gives A^T A x* = A^T b

  16. Implications • The underlying assumption is that A is full rank • Therefore, the fact that A^T A is positive definite (p.d.) follows from considering any x != 0 and evaluating x^T (A^T A) x = (Ax)^T (Ax) = ||Ax||_2^2 > 0 (nonzero because the columns of A are linearly independent), which is the definition of a p.d. matrix • We use the shorthand A^T A > 0 for A^T A being a symmetric, positive definite matrix

  17. Implications • The underlying assumption that A is full rank, and therefore that A^T A is p.d., implies that there exists a unique least-squares solution x* = (A^T A)^{-1} A^T b • Note: we use the inverse in a conceptual, rather than a computational, sense • The formulation A^T A x* = A^T b is known as the normal equations, with the solution conceptually straightforward

  18. Implications • An important implication of positive definiteness is that we can factor A^T A since A^T A > 0 • The expression A^T A = G^T G, with G upper triangular, is called the Cholesky factorization of the symmetric positive definite matrix A^T A

  19. Least Squares Solution Algorithm Step 1: Compute the lower triangular part of A^T A Step 2: Obtain the Cholesky factorization A^T A = G^T G Step 3: Compute the right-hand side A^T b Step 4: Solve for y using forward substitution in G^T y = A^T b, and for x using backward substitution in G x = y
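
A minimal sketch of these four steps using NumPy/SciPy (the function and variable names are mine, not from the slides); scipy.linalg.cholesky with lower=False returns an upper-triangular G with A^T A = G^T G:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def normal_equation_solve(A, b):
    """Least-squares solve of an overdetermined Ax = b via the normal equations."""
    AtA = A.T @ A                      # Step 1: form A^T A
    G = cholesky(AtA, lower=False)     # Step 2: Cholesky factor, A^T A = G^T G
    Atb = A.T @ b                      # Step 3: right-hand side A^T b
    y = solve_triangular(G.T, Atb, lower=True)   # Step 4a: forward substitution, G^T y = A^T b
    x = solve_triangular(G, y, lower=False)      # Step 4b: back substitution, G x = y
    return x

# Quick check against NumPy's built-in least-squares routine
A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.9])
print(normal_equation_solve(A, b), np.linalg.lstsq(A, b, rcond=None)[0])
```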

  20. Practical Considerations • The two key problems that arise in practice with the triangularization procedure are: (i) while A may be sparse, A^T A is much less sparse and consequently requires more computing resources for the solution; (ii) A^T A may be numerically less well-conditioned than A • We must deal with these two problems

  21. Example 2: Loss of Sparsity • Assume a sparse B matrix for a network, with nonzeros only between buses that are directly connected • Then in B^T B second neighbors are now connected! But large networks are still sparse, just not as sparse
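
The slide's specific B matrix is not reproduced in this transcript, but the fill-in effect is easy to see with any small chain (radial) network, where B is tridiagonal; a sketch with a hypothetical 6-bus chain:

```python
import numpy as np

# Hypothetical 6-bus chain network: B couples only first neighbors (tridiagonal)
n = 6
B = 2.0 * np.eye(n)
for i in range(n - 1):
    B[i, i + 1] = B[i + 1, i] = -1.0

BtB = B.T @ B
print((B != 0).astype(int))    # tridiagonal: first neighbors only
print((BtB != 0).astype(int))  # pentadiagonal: second neighbors now connected
```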

  22. Numerical Conditioning • To understand the point on numerical ill-conditioning, we need to introduce some terminology • We define the 2-norm of a matrix B to be ||B||_2 = max over x != 0 of ||Bx||_2 / ||x||_2

  23. Numerical Conditioning • Equivalently, ||B||_2 = sqrt(lambda_max(B^T B)), where each lambda_i is a root of the polynomial det(B^T B - lambda*I) = 0 • In words, the 2-norm of B is the square root of the largest eigenvalue of B^T B

  24. Numerical Conditioning • The conditioning number of a matrix B is defined as kappa(B) = ||B||_2 ||B^{-1}||_2 • A well-conditioned matrix has a small value of kappa(B), close to 1; the larger the value of kappa(B), the more pronounced is the ill-conditioning
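
Both quantities are easy to compute numerically; a small sketch (the 2 x 2 matrix below is arbitrary, chosen only to show the calls):

```python
import numpy as np

B = np.array([[1.0, 1.0],
              [1.0, 1.0001]])  # nearly rank-deficient, so poorly conditioned

# 2-norm: square root of the largest eigenvalue of B^T B (largest singular value)
norm2 = np.sqrt(np.max(np.linalg.eigvalsh(B.T @ B)))
print(norm2, np.linalg.norm(B, 2))   # the two values agree

# Conditioning number kappa(B) = ||B||_2 * ||B^{-1}||_2
print(np.linalg.cond(B, 2))          # large value: ill-conditioned
```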

  25. Numerical Conditioning • The ill-conditioned nature of A^T A may severely impact the accuracy of the computed solution • We illustrate the fact that an ill-conditioned matrix A^T A results in highly sensitive solutions of least-squares problems with the following example:

  26. Example 3: Ill-Conditioned A^T A • Consider the matrix A and form the product A^T A

  27. Example 3: Ill-Conditioned A^T A • We consider a “noise” in A to be a small matrix dA

  28. Example 3: Ill-Conditioned A^T A • The noise leads to an error E in the computation of A^T A, with E = (A + dA)^T (A + dA) - A^T A • Assume that there is no noise in b, that is, db = 0

  29. Example 3: Ill-Conditioned A^T A • The resulting error in solving the normal equations is independent of db since it is caused purely by dA • Let x be the true solution of the normal equations A^T A x = A^T b, so that x = [1 0]^T

  30. Example 3: Ill-Conditioned A^T A • Let x' be the solution of the system with the error arising due to dA, i.e., the solution of the perturbed normal equations (A^T A + E) x' = A^T b • Therefore x' can be computed and compared with x

  31. Example 3: Ill-Conditioned A^T A • This gives the error x' - x between the perturbed and true solutions • Therefore the relative error ||x' - x||_2 / ||x||_2 can be evaluated • Now, consider the conditioning number of A^T A

  32. Example 3: Ill-Conditioned A^T A • Since the relative error in x grows with the conditioning number of A^T A • The product of the conditioning number and the relative error in the data bounds the relative error in the solution • Thus the conditioning number is a major contributor to the error in the computation of x • In other words, the sensitivity of the solution to any error, be it data entry or of a numerical nature, is very dependent on the conditioning number
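
The transcript does not reproduce the numbers from Example 3, but the effect is easy to reproduce with any nearly-collinear A: a tiny perturbation of A^T A moves the normal-equation solution by a large relative amount. A sketch, with matrices chosen only to mimic the flavor of the example, not its actual values:

```python
import numpy as np

# Nearly collinear columns -> A^T A is badly conditioned
A = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
b = A @ np.array([1.0, 0.0])          # exact solution is x = [1, 0]^T

AtA, Atb = A.T @ A, A.T @ b
x = np.linalg.solve(AtA, Atb)

E = 1e-6 * np.array([[0.0, 1.0], [1.0, 0.0]])   # small "noise" in A^T A
x_noisy = np.linalg.solve(AtA + E, Atb)

rel_err = np.linalg.norm(x_noisy - x) / np.linalg.norm(x)
print(np.linalg.cond(AtA), rel_err)   # huge conditioning number, large relative error
```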

  33. What can be done? • Introduce a regularization term into the LS cost • Ridge regression (l2-norm regularization): minimize ||Ax - b||_2^2 + lambda ||x||_2^2 • At the optimum, the derivative vanishes, giving (A^T A + lambda*I) x* = A^T b • The inverse now involves a different matrix, (A^T A + lambda*I), improving the conditioning
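
A minimal sketch of the ridge solution x* = (A^T A + lambda*I)^{-1} A^T b next to the plain normal-equation approach; the value lambda = 0.01 and the test matrix are my choices, not taken from the slides:

```python
import numpy as np

def ridge_solve(A, b, lam):
    """Ridge regression: minimize ||Ax - b||^2 + lam * ||x||^2."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

A = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
b = A @ np.array([1.0, 0.0])

print(np.linalg.cond(A.T @ A))                      # ill-conditioned
print(np.linalg.cond(A.T @ A + 0.01 * np.eye(2)))   # much better conditioned
print(ridge_solve(A, b, 0.01))                      # slightly biased, but stable
```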

  34. Example 4: Ridge regression • Recalling Example 3 • Compare the ridge regression solution (with a small lambda) versus the ordinary LS solution • The true solution is x = [1 0]^T

  35. Example 4: Ridge regression • Now with the noise matrix dA included • Compare the ridge regression solution (with the same lambda) versus the ordinary LS solution

  36. Regularization • Can be used for solving underdetermined systems too • The level of regularization is important! • Large lambda: better conditioning number, but less accurate solution • Small lambda: close to LS, but not improving the conditioning • Recent trend: sparsity regularization using the l1 norm

  37. The Least Squares Problem • With this background we proceed to the typical schemes in use for solving least squares problems, all along paying adequate attention to the numerical aspects of the solution approach • If the matrix is full, then often the best solution approach is to use a singular value decomposition (SVD) to form a matrix known as the pseudo-inverse of the matrix • We'll cover this later, after first considering the sparse problem • We first review some fundamental building blocks and then present the key results useful for the sparse matrices common in state estimation
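
As a preview of the SVD approach covered later, the pseudo-inverse solution can be formed directly from the SVD (or obtained via np.linalg.pinv); a minimal sketch with an arbitrary 3 x 2 test matrix:

```python
import numpy as np

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.9])

# Pseudo-inverse built from the SVD: A = U diag(s) V^T, so A^+ = V diag(1/s) U^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ ((U.T @ b) / s)

print(x_svd, np.linalg.pinv(A) @ b)   # both give the least-squares solution
```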

  38. Householder Matrices and Vectors • Consider the n x n matrix P = I - 2 v v^T / (v^T v), where the nonzero vector v is called a Householder vector • Note that the definition of P in terms of the vector v implies the following properties for P: Symmetry: P^T = P; Orthonormality: P^T P = P P^T = I

  39. Householder Matrices and Vectors • Let x in R^n be an arbitrary vector; then P x = x - 2 (v^T x / v^T v) v • Now, suppose we want P x to be a multiple of e1, the first unit vector, and so P x is a linear combination of the x and v vectors • Then v is a linear combination of x and e1, and we write v = x + alpha*e1, so that P x = x - 2 (v^T x / v^T v)(x + alpha*e1)

  40. Householder Matrices and Vectors • Here v^T x = x^T x + alpha*x_1 and v^T v = x^T x + 2*alpha*x_1 + alpha^2 • Therefore, P x = (1 - 2 (x^T x + alpha*x_1)/(x^T x + 2*alpha*x_1 + alpha^2)) x - 2*alpha (v^T x / v^T v) e1

  41. Householder Matrices and Vectors • For the coefficient of x to vanish, we require that 1 - 2 (x^T x + alpha*x_1)/(x^T x + 2*alpha*x_1 + alpha^2) = 0, or alpha^2 = x^T x, so that alpha = +/- ||x||_2 • Consequently v = x +/- ||x||_2 e1, so that P x = -/+ ||x||_2 e1, a multiple of e1 • Thus the determination of v is straightforward
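
A minimal sketch of this construction in NumPy; choosing alpha with the same sign as x_1 is a standard refinement (it avoids cancellation in v_1 = x_1 + alpha) that is not spelled out on the slide:

```python
import numpy as np

def householder(x):
    """Return v and P = I - 2 v v^T / (v^T v) such that P @ x is a multiple of e1."""
    x = np.asarray(x, dtype=float)
    alpha = np.copysign(np.linalg.norm(x), x[0])   # alpha = +/- ||x||_2, sign chosen to avoid cancellation
    v = x.copy()
    v[0] += alpha                                  # v = x + alpha * e1
    P = np.eye(len(x)) - 2.0 * np.outer(v, v) / (v @ v)
    return v, P

x = np.array([3.0, 4.0])
v, P = householder(x)
print(P @ x)     # approximately [-5, 0]: a multiple of e1
print(P @ P.T)   # approximately the identity: P is symmetric and orthogonal
```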

  42. Example 4: Construction of P • Assume we are given the vector x • Then alpha and v follow from the formulas above

  43. Example 4: Construction of P • Then P is formed from v • It follows that P x is a multiple of e1
