Last lecture summary

• independent vectors • rank – the number of independent columns/rows in a matrix • Rank of this matrix is 2! • Thus, this matrix is noninvertible (singular). • The column space and the row space always have the same dimension, the rank. • Here row2 = row1 + row3, so only two rows are independent and the rank is 2.

Column space – the space given by the columns of the matrix and all their combinations. • Columns of a matrix span the column space. • We’re highly interested in a set of vectors that spans a space and is independent. Such a set of vectors is called a basis for a vector space. • A basis is not unique. • Every basis of a space has the same number of vectors; that number is the dimension of the space. • Rank is the dimension of the column space.

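• column space C(A), dim C(A) = r • null space N(A), dim N(A) = n – r (A is m × n) • row space $C(A^T)$, dim $C(A^T)$ = r • left null space $N(A^T)$, dim $N(A^T)$ = m – r • $C(A) \perp N(A^T)$: column space and left null space are orthogonal complements • $C(A^T) \perp N(A)$: row space and null space are orthogonal complements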

G. Strang, Introduction to linear algebra

orthogonal = perpendicular, dot product $a^T b = a_1 b_1 + a_2 b_2 + \dots = 0$ • length of the vector $|a| = \sqrt{|a|^2} = \sqrt{a^T a}$ • If subspace S is orthogonal to subspace T, then every vector in S is orthogonal to every vector in T.

Four possibilities for Ax = b (A is m × n with rank r)
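For reference, the four cases are usually stated by comparing the rank r with m and n. The table below is written out under the assumption that this heading refers to that standard classification; the labels in the middle column are only descriptive.

```latex
\[
\begin{array}{lll}
r = m = n      & \text{square and invertible} & Ax = b \text{ has exactly 1 solution} \\
r = m < n      & \text{short and wide}        & Ax = b \text{ has } \infty \text{ solutions} \\
r = n < m      & \text{tall and thin}         & Ax = b \text{ has 0 or 1 solution} \\
r < m,\ r < n  & \text{not full rank}         & Ax = b \text{ has 0 or } \infty \text{ solutions}
\end{array}
\]
```

The third row (tall and thin, independent columns) is the case least squares deals with.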

Least squares problem: introduction. Based on the excellent video lectures by Gilbert Strang, MIT, http://ocw.mit.edu/OcwWeb/Mathematics/18-06Spring-2005/VideoLectures/detail/lecture15.htm (Lecture 15)

I want to solve Ax = b when there is no solution. WHAT??

So b is not in the column space of A. • This problem is not rare, it’s actually quite typical. • It appears when the number of equations is bigger than the number of unknowns (i.e. m > n for an m × n matrix A). • So what can you tell me about the rank, what can the rank be? • It can’t be m; it can be n or even less. • So the column space is only a small part of $R^m$, and there will be a lot of right-hand sides (RHS) b with no solution!

Example • You measure the position of a satellite buzzing around. • There are six parameters giving the position. • You measure the position 1000 times. • And you want to solve Ax = b, where A is 1000 × 6. • In many problems we’ve got too many equations with noisy RHSs (b). • So I can’t expect to solve Ax = b exactly, because there’s a measurement error in b. But there’s information too. There’s a lot of information about x in there. • So I’d like to separate the noise from the information.

One way to solve the problem is to throw away some measurements until we get a nice square, non-singular matrix. • That’s not satisfactory: there’s no reason to say that these measurements are perfect and those measurements are useless. • We want to use all the measurements to get the best information. • But how?

Now I want you to jump ahead to the matrix that will play a key role. It is the matrix $A^T A$. • What can you tell me about this matrix? • shape? • square • dimension? • n × n • symmetric or not? • symmetric • Now we can ask more about the matrix. The answers will come later in the lecture. • Is it invertible? • If not, what’s its null space? • Now let me tell you in advance what equation to solve when you can’t solve Ax = b: • multiply both sides by $A^T$ from the left, and you get $A^T A x = A^T b$. But this x is not the same as the x in Ax = b, so let’s call it $\hat{x}$, because I am hoping this one will have a solution: $A^T A \hat{x} = A^T b$. • And I will say it’s my best solution. This is going to be my plan.

So you see why I am so interested in the matrix $A^T A$ and its invertibility. • Now let’s ask ourselves when $A^T A$ is invertible, and do it by example. • A is a 3 × 2 matrix, i.e. 3 equations in 2 unknowns • rank = 2 • Does Ax equal b? When can we solve it? • Only if b is in the column space of A, • i.e. if b is a combination of the columns of A. • The combinations just fill up a plane, • but most vectors b will not be on that plane.

So I am saying I will work with the matrix $A^T A$. • Help me, what is $A^T A$ for this A? • Is this $A^T A$ invertible? • Yes • However, $A^T A$ is not always invertible! • Propose an A such that $A^T A$ is not invertible. • Generally, if I have two matrices, each with rank r, their product can’t have rank higher than r. And in our case rank(A) = 1, so rank($A^T A$) can’t be more than 1, and the 2 × 2 matrix $A^T A$ is singular.

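This always happens: rank($A^T A$) = rank(A). • $A^T A$ and A have the same rank and the same number of columns, so their null spaces have the same dimension; in fact $N(A^T A) = N(A)$. • So $A^T A$ is invertible exactly if N(A) = {0}, i.e. N(A) contains only the zero vector. That means exactly when the columns of A are independent.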
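A quick numerical check of the claim above can be sketched in plain Python. The two 3 × 2 matrices are hypothetical examples chosen for illustration (one with independent columns, one with the second column equal to 3 times the first); for a 2 × 2 matrix, "invertible" is the same as "determinant is not zero".

```python
def transpose(M):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def gram(A):
    """Return A^T A."""
    return matmul(transpose(A), A)

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

# Hypothetical example: independent columns, rank 2.
A_indep = [[1, 1], [1, 2], [1, 5]]
# Hypothetical example: column 2 = 3 * column 1, rank 1.
A_dep = [[1, 3], [1, 3], [1, 3]]

G_indep = gram(A_indep)   # 2 x 2 and symmetric
G_dep = gram(A_dep)       # 2 x 2 and symmetric

print(G_indep, det2(G_indep))  # nonzero determinant: invertible
print(G_dep, det2(G_dep))      # zero determinant: singular
```

With dependent columns the determinant of $A^T A$ comes out as zero, which matches "invertible exactly when the columns of A are independent".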

Projections. Based on the excellent video lectures by Gilbert Strang, MIT, http://ocw.mit.edu/OcwWeb/Mathematics/18-06Spring-2005/VideoLectures/detail/lecture15.htm (Lecture 15)

• I want to find the point on the line through a that is closest to b. • My space is what? • 2D plane • Is the line through a a subspace? • Yes, it is, one-dimensional. • So where is such a point? • [Figure: vector b, the line through a, the projection p = xa on the line, and the error e = b – p] • So we say we projected the vector b onto the line through a, i.e. we projected b onto a subspace. And how did we get it? • Orthogonality • e is the error, i.e. how much I am wrong by, and it is perpendicular to a. • And we know that the projection p is some multiple of a, p = xa. We want to find the number x.

Key point is that a is perpendicular to e. • So I have $a^T e = a^T (b - p) = a^T (b - xa) = 0$ • So after some simple math we get $x\, a^T a = a^T b$, i.e. $x = \dfrac{a^T b}{a^T a}$, and therefore $p = a x = a \dfrac{a^T b}{a^T a}$. • I may look at the problem from another point of view. • The projection from b to p is carried out by some matrix called the projection matrix P. • p = Pb • What is the P for our case? →

Projection matrix $P = \dfrac{a a^T}{a^T a}$ • What’s its column space? • What does the column space of a matrix A tell us? • If you multiply the matrix A by anything, you always land in its column space. That’s what the column space is. • So where am I if I do Pb? • I am on the line through a. The column space of P is the line through a.

What is the rank of P? • one • Column times row is a rank-one matrix: every column of $a a^T$ is a multiple of the column vector a, so a is a basis for its column space.

P is symmetric. Show me why. • What happens if I do the projection twice, i.e. I multiply by P and then by P again ($P \times P = P^2$)?

[Figure: vector b, the line through a, the projection p = xa = Pb on the line, and the error e = b – p] • So if I project b and then project again, what happens? • I stay put. • So $P^2 = P$: the projection matrix is idempotent.

Summary: if I want to project onto a line, there are three formulas to remember: • $x = \dfrac{a^T b}{a^T a}$, $p = a x$, $P = \dfrac{a a^T}{a^T a}$ • And properties of P: • $P = P^T$, $P = P^2$
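The projection onto a line (x = aᵀb / aᵀa, p = xa, P = aaᵀ / aᵀa) can be tried out on numbers. This is a minimal sketch in plain Python with exact fractions; the vectors `a` and `b` and all helper names are hypothetical choices for illustration.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(M, v):
    """Matrix times vector."""
    return [dot(row, v) for row in M]

def matmul(X, Y):
    """Matrix times matrix."""
    return [[dot(row, col) for col in zip(*Y)] for row in X]

def project_onto_line(a, b):
    """Return (x, p, P) for the projection of b onto the line through a."""
    aa = dot(a, a)
    x = Fraction(dot(a, b), aa)                             # x = a^T b / a^T a
    p = [x * ai for ai in a]                                # p = x a
    P = [[Fraction(ai * aj, aa) for aj in a] for ai in a]   # P = a a^T / a^T a
    return x, p, P

# Hypothetical vectors, not taken from the lecture.
a = [1, 2, 2]
b = [1, 1, 1]

x, p, P = project_onto_line(a, b)
e = [bi - pi for bi, pi in zip(b, p)]   # error e = b - p

print("x =", x)
print("p =", p)
print("a^T e =", dot(a, e))             # should be 0: e is perpendicular to a
print("P symmetric:", P == [list(col) for col in zip(*P)])
print("P^2 == P:", matmul(P, P) == P)
```

Exact fractions are used so that the checks aᵀe = 0 and P² = P hold exactly instead of up to rounding.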

More dimensions • Three formulas again, but different: we won’t have a single line, but a plane, a 3D or an n-dimensional subspace. • You may be asking why I actually project. • Because Ax = b may have no solution. • I am given a problem with more equations than unknowns; I can’t solve it. • The problem is that Ax is in the column space, but b does not have to be. • So I change the vector b into the closest vector in the column space of A. • So I solve $A\hat{x} = p$ instead! • p is the projection of b onto the column space. • The hat indicates that I am not looking for the x from Ax = b (that x actually does not exist), but for $\hat{x}$, the best possible x.

I must figure out what the good projection is here: what is the good RHS that is in the column space and is as close as possible to b? • Let’s move into 3D space, where I have a vector b that I want to project onto a plane (i.e. a subspace of the 3D space).

[Figure: vector b above the plane of a1 and a2, its projection p in the plane, and the error e = b – p perpendicular to the plane] • This plane is the column space of the matrix A, whose columns are a1 and a2. • The projection p is some combination of the basis vectors: $p = \hat{x}_1 a_1 + \hat{x}_2 a_2 = A\hat{x}$, and I am looking for $\hat{x}$. • So now I’ve got hold of the problem. The problem is to find the right combination of the columns so that the error vector $(b - A\hat{x})$ is perpendicular to the plane.

[Figure: b, its projection p in the plane of a1 and a2, and the error e = b – p] • I write again the main point: • Projection is $p = A\hat{x}$ • Problem is to find $\hat{x}$ • Key is that $e = b - A\hat{x}$ is perpendicular to the plane • So I am looking for two equations, because I have $\hat{x}_1$ and $\hat{x}_2$. • And e is perpendicular to the plane, so it must be perpendicular to each vector in the plane. It must be perpendicular to a1 and a2! • So which two eqs. do I have? Help me. • $a_1^T (b - A\hat{x}) = 0$ and $a_2^T (b - A\hat{x}) = 0$, in matrix form $A^T (b - A\hat{x}) = 0$.

A word about subspaces. • In what subspace does $(b - A\hat{x})$ lie? • Well, this is actually the vector e, so I have $A^T e = 0$. Thus in which space is e? • In $N(A^T)$! • And from the last lecture, what do we know about $N(A^T)$? • It is perpendicular to C(A).

e is in $N(A^T)$ • e is ⊥ to C(A) • [Figure: b, p in the plane of a1 and a2, and e = b – p perpendicular to the plane] • It all fits perfectly. We are all happy, aren’t we?

OK, we’ve got the equation, let’s solve it: $A^T A \hat{x} = A^T b$ (the normal equations). • $A^T A$ is an n by n matrix. • As in the line case, we must get answers to three questions: • What is $\hat{x}$? • What is the projection p? • What is the projection matrix P?

• $\hat{x}$ is what? Help me. • $\hat{x} = (A^T A)^{-1} A^T b$ • What is the projection $p = A\hat{x}$? • $p = A (A^T A)^{-1} A^T b$ • What’s the projection matrix in p = Pb? • $P = A (A^T A)^{-1} A^T$

$P = A (A^T A)^{-1} A^T = A A^{-1} (A^T)^{-1} A^T = I$: can I do this? • Apparently not, but why not? What did I do wrong? • A is not a square matrix, so it does not have an inverse. • Of course, this formula also works if A is a square invertible n × n matrix. • Then its column space is the whole what? • $R^n$ • Then b is already in the column space, I am projecting b onto the whole $R^n$, so P = I.

Also $P = P^T$ and $P = P^2$ hold. Prove $P^2 = P$! • So we have all the formulas. • And when will I use these equations? When I have more equations (measurements) than unknowns. • Least squares, fitting by a line.
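One way to check both properties for the projection matrix $P = A (A^T A)^{-1} A^T$ is to write the products out. The sketch below assumes A has independent columns, so that $(A^T A)^{-1}$ exists, and uses the fact that the inverse of a symmetric matrix is symmetric.

```latex
\[
\begin{aligned}
P^T &= \left( A (A^T A)^{-1} A^T \right)^T
     = A \left( (A^T A)^{-1} \right)^T A^T
     = A (A^T A)^{-1} A^T = P, \\
P^2 &= A (A^T A)^{-1} \underbrace{A^T A \, (A^T A)^{-1}}_{=\,I} A^T
     = A (A^T A)^{-1} A^T = P.
\end{aligned}
\]
```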

Moore-Penrose Pseudoinverse
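As a brief pointer, and assuming A has independent columns (so that $A^T A$ is invertible), the matrix that produces $\hat{x}$ from b is the Moore-Penrose pseudoinverse of A. The symbol $A^+$ is the usual notation and is introduced here only for illustration.

```latex
\[
A^{+} = (A^T A)^{-1} A^T, \qquad
\hat{x} = A^{+} b, \qquad
A^{+} A = I, \qquad
P = A A^{+}.
\]
```

For a square invertible A this reduces to the ordinary inverse, $A^{+} = A^{-1}$.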

Least squares calculation. Based on the excellent video lectures by Gilbert Strang, MIT, http://ocw.mit.edu/OcwWeb/Mathematics/18-06Spring-2005/VideoLectures/detail/lecture16.htm (Lecture 16)

Projection matrix recap • The projection matrix $P = A (A^T A)^{-1} A^T$ projects a vector b to the nearest point in the column space (i.e. Pb). • Let’s have a look at two extreme cases: • If b is in the column space, then Pb = b. Why? • What does it mean that b is in the column space of A? • b is a linear combination of the columns of A, i.e. b is of the form Ax. • So $Pb = PAx = A (A^T A)^{-1} A^T A x = Ax = b$.

If b is ⊥ to the column space of A, then Pb = 0. Why? • What vectors are perpendicular to the column space? • Vectors in $N(A^T)$ • $Pb = A (A^T A)^{-1} A^T b = 0$, because $A^T b = 0$ • [Figure: b split into p = Pb in C(A) and e = (I – P)b in $N(A^T)$] • p + e = b and p = Pb, so b – e = Pb and e = (I – P)b. • That’s a projection too: the projection onto the perpendicular space $N(A^T)$. • When P projects onto one subspace, I – P projects onto the perpendicular subspace.
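That I – P is again a projection, and that the two pieces p = Pb and e = (I – P)b are perpendicular, follows from $P^T = P$ and $P^2 = P$ alone. A short check, written as a sketch:

```latex
\[
\begin{aligned}
(I - P)^T &= I - P^T = I - P, \\
(I - P)^2 &= I - 2P + P^2 = I - 2P + P = I - P, \\
p^T e &= (Pb)^T (I - P) b = b^T P^T (I - P) b = b^T (P - P^2) b = 0.
\end{aligned}
\]
```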

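points (x, y): (1,1) (2,2) (3,2) • [Figure: the three points in the x–y plane; the points in the picture are shifted for better readability.] • OK, I want to find the matrix A; once we have A, we can do all we need. • I am looking for the best line (smallest overall error) y = a + bx, meaning I am looking for a and b. • Equations: a + b = 1, a + 2b = 2, a + 3b = 2, • but this system can’t be solved. • In matrix form this is $A = \begin{bmatrix} 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix}$, unknowns $\begin{bmatrix} a \\ b \end{bmatrix}$, right-hand side $\begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}$.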

In other words, the best solution is the line with the smallest errors in all points. • So I want to minimize the length |Ax – b|, which is the length of the error |e|; actually I want to minimize the never-negative quantity $|Ax - b|^2$, • so the overall error is the sum of squares $|e_1|^2 + |e_2|^2 + |e_3|^2$. • [Figure: the measured values b1, b2, b3, the points p1, p2, p3 on the line, and the vertical errors e1, e2, e3] • What are those p1, p2, p3? They are the values on the line. If I put them in the equations a + b = p1, a + 2b = p2, a + 3b = p3, I can solve them. The vector [p1, p2, p3] is in the column space.

Least squares – traditional way • least squares problem (method of least squares): the sum of squares of the errors is minimized • points (x, y): (1,1) (2,2) (3,2) • I am looking for a line: a + bx = y • Equations: a + b = 1, a + 2b = 2, a + 3b = 2 • [Figure: the three points in the x–y plane]

points (x, y): (1,1) (2,2) (3,2) • [Figure: the measured values b1, b2, b3, the points p1, p2, p3 on the line, and the errors e1, e2, e3] • So if there is a solution, each point lies on that line: • a + b = 1, a + 2b = 2, a + 3b = 2 • However, there is apparently no solution, no line on which all three points lie. • The optimal line a + bx will go somewhere between the points. Thus for each point there will be some error (i.e. the value on the line will differ from the required value of the point). • Therefore, with e = b – p, the errors are: • e1 = 1 – (a + b), e2 = 2 – (a + 2b), e3 = 2 – (a + 3b)
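The traditional (calculus) way then minimizes the sum of the squared errors over a and b. The function name E below is introduced only for this sketch; since the errors are squared, their sign convention does not matter.

```latex
\[
\begin{aligned}
E(a, b) &= (a + b - 1)^2 + (a + 2b - 2)^2 + (a + 3b - 2)^2, \\
\frac{\partial E}{\partial a} &= 2(a + b - 1) + 2(a + 2b - 2) + 2(a + 3b - 2)
  = 2\,(3a + 6b - 5) = 0, \\
\frac{\partial E}{\partial b} &= 2(a + b - 1) + 4(a + 2b - 2) + 6(a + 3b - 2)
  = 2\,(6a + 14b - 11) = 0.
\end{aligned}
\]
```

Setting both partial derivatives to zero gives the two linear equations 3a + 6b = 5 and 6a + 14b = 11.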

Least squares – linear algebra way • [Figure: b decomposed into p in C(A) and e in $N(A^T)$]

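• And now computation • Task: find p and $\hat{x}$ = [a b] • Let’s solve the equation $A^T A \hat{x} = A^T b$ for $\hat{x}$ • Help me, what is $A^T A$? • $\begin{bmatrix} 3 & 6 \\ 6 & 14 \end{bmatrix}$ • And what is $A^T b$? • $\begin{bmatrix} 5 \\ 11 \end{bmatrix}$ • So I have to solve (by Gauss elimination) the system of linear equations 3a + 6b = 5, 6a + 14b = 11 • a = 2/3, b = 1/2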

points (1,1) (2,2) (3,2) • best line: y = 2/3 + (1/2)x • What is p1? • The value on the line for x = 1: 2/3 + 1/2 = 7/6 • And e1? • 1 - p1 = -1/6 • p2 = 5/3, e2 = +2/6, p3 = 13/6, e3 = -1/6 • So we have the projection vector p = [7/6, 5/3, 13/6] and the error vector e = [-1/6, 2/6, -1/6]. • Check: p + e = [1, 2, 2] = b. Yes, that’s right!

p and e should be perpendicular. Verify that. • However, e is perpendicular not only to p. Give me another vector e is perpendicular to. • Well, e is perpendicular to the column space, so? • It must be perpendicular to the columns of the matrix A, i.e. to [1 1 1] and [1 2 3]. • Just again, fitting by a straight line means solving the key equation $A^T A \hat{x} = A^T b$. • But A must have independent columns; then $A^T A$ is invertible. • If not, oops, sorry, I am out of luck.
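The whole computation for the three points (1,1), (2,2), (3,2) can be replayed in a few lines of plain Python with exact fractions. This is only a sketch: the variable names (`rhs`, `intercept`, `slope`) are hypothetical, and the 2 × 2 system is solved with Cramer's rule instead of Gauss elimination, simply because it is shorter to write.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

points = [(1, 1), (2, 2), (3, 2)]

# Line y = intercept + slope * x: one row [1, x] of A per point.
A = [[1, x] for x, _ in points]
rhs = [y for _, y in points]            # the vector b of measured values

cols = list(zip(*A))                    # columns of A: (1,1,1) and (1,2,3)
AtA = [[dot(ci, cj) for cj in cols] for ci in cols]   # A^T A
Atb = [dot(ci, rhs) for ci in cols]                   # A^T b

# Solve the normal equations A^T A x_hat = A^T b by Cramer's rule.
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
if det == 0:
    raise ValueError("columns of A are dependent, A^T A is singular")
intercept = Fraction(Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1], det)
slope = Fraction(AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0], det)

p = [intercept + slope * x for x, _ in points]   # projection p = A x_hat
e = [bi - pi for bi, pi in zip(rhs, p)]          # error e = b - p

print("A^T A =", AtA, " A^T b =", Atb)
print("best line: y =", intercept, "+", slope, "* x")
print("p =", p)
print("e =", e)
print("e . p =", dot(e, p))
print("e . columns of A =", [dot(e, c) for c in cols])
```

The last two printed lines are the perpendicularity checks: e is perpendicular to p and to both columns of A.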
