Eigen Decomposition and Singular Value Decomposition

Eigen Decomposition and Singular Value Decomposition Based on the slides by Mani Thomas Modified and extended by Longin Jan Latecki

Introduction • Eigenvalue decomposition • Spectral decomposition theorem • Physical interpretation of eigenvalue/eigenvectors • Singular Value Decomposition • Importance of SVD • Matrix inversion • Solution to linear system of equations • Solution to a homogeneous system of equations • SVD application

What are eigenvalues? • Given a matrix, A, x is the eigenvector and  is the corresponding eigenvalue if Ax = x • A must be square and the determinant of A - I must be equal to zero Ax - x = 0 ! (A - I) x = 0 • Trivial solution is if x = 0 • The non trivial solution occurs when det(A - I) = 0 • Are eigenvectors are unique? • If x is an eigenvector, then x is also an eigenvector and  is an eigenvalue A(x) = (Ax) = (x) = (x)

Calculating the Eigenvectors/values • Expand the det(A - I) = 0 for a 2 £ 2 matrix • For a 2 £ 2 matrix, this is a simple quadratic equation with two solutions (maybe complex) • This “characteristic equation” can be used to solve for x

Eigenvalue example • Consider, • The corresponding eigenvectors can be computed as • For  = 0, one possible solution is x = (2, -1) • For  = 5, one possible solution is x = (1, 2) For more information: Demos in Linear algebra by G. Strang, http://web.mit.edu/18.06/www/

Physical interpretation • Consider a covariance matrix, A, i.e., A = 1/n S ST for some S • Error ellipse with the major axis as the larger eigenvalue and the minor axis as the smaller eigenvalue

Original Variable B PC 2 PC 1 Original Variable A Physical interpretation • Orthogonal directions of greatest variance in data • Projections along PC1 (Principal Component) discriminate the data most along any one axis

Physical interpretation • First principal component is the direction of greatest variability (covariance) in the data • Second is the next orthogonal (uncorrelated) direction of greatest variability • So first remove all the variability along the first component, and then find the next direction of greatest variability • And so on … • Thus each eigenvectors provides the directions of data variances in decreasing order of eigenvalues For more information: See Gram-Schmidt Orthogonalization in G. Strang’s lectures

Multivariate Gaussian

Bivariate Gaussian

Spherical, diagonal, full covariance

diagonal Eigen/diagonal Decomposition • Let be a squarematrix with mlinearly independent eigenvectors (a “non-defective” matrix) • Theorem: Exists an eigen decomposition • (cf. matrix diagonalization theorem) • Columns of U are eigenvectors of S • Diagonal elements of are eigenvalues of Unique for distinct eigen-values

Let U have the eigenvectors as columns: Then, SU can be written Diagonal decomposition: why/how Thus SU=U, or U–1SU= And S=UU–1.

Diagonal decomposition - example Recall The eigenvectors and form Recall UU–1 =1. Inverting, we have Then, S=UU–1 =

Example continued Let’s divide U (and multiply U–1)by Then, S=  Q (Q-1= QT ) Why? Stay tuned …

Symmetric Eigen Decomposition • If is a symmetric matrix: • Theorem: Exists a (unique) eigen decomposition • where Q is orthogonal: • Q-1= QT • Columns of Q are normalized eigenvectors • Columns are orthogonal. • (everything is real)

Spectral Decomposition theorem • If A is a symmetric and positive definite k £ k matrix (xTAx > 0) with i (i > 0) and ei, i = 1  k being the k eigenvector and eigenvalue pairs, then • This is also called the eigen decomposition theorem • Any symmetric matrix can be reconstructed using its eigenvalues and eigenvectors

Example for spectral decomposition • Let A be a symmetric, positive definite matrix • The eigenvectors for the corresponding eigenvalues are • Consequently,

Singular Value Decomposition • If A is a rectangular m £ k matrix of real numbers, then there exists an m £ m orthogonal matrix U and a k £ k orthogonal matrix V such that •  is an m £ k matrix where the (i, j)th entry i¸ 0, i = 1  min(m, k) and the other entries are zero • The positive constants i are the singular values of A • If A has rank r, then there exists r positive constants 1, 2,r, r orthogonal m £ 1 unit vectors u1,u2,,ur and r orthogonal k £ 1 unit vectors v1,v2,,vr such that • Similar to the spectral decomposition theorem

Singular Value Decomposition (contd.) • If A is a symmetric and positive definite then • SVD = Eigen decomposition • EIG(i) = SVD(i2) • Here AAT has an eigenvalue-eigenvector pair (i2,ui) • Alternatively, the vi are the eigenvectors of ATA with the same non zero eigenvalue i2

Example for SVD • Let A be a symmetric, positive definite matrix • U can be computed as • V can be computed as

Example for SVD • Taking 21=12 and 22=10, the singular value decomposition of A is • Thus the U, V and  are computed by performing eigen decomposition of AAT and ATA • Any matrix has a singular value decomposition but only symmetric, positive definite matrices have an eigen decomposition

Applications of SVD in Linear Algebra • Inverse of a n £ n square matrix, A • If A is non-singular, then A-1 = (UVT)-1= V-1UT where -1=diag(1/1, 1/1,, 1/n) • If A is singular, then A-1 = (UVT)-1¼V0-1UT where 0-1=diag(1/1, 1/2,, 1/i,0,0,,0) • Least squares solutions of a m£n system • Ax=b (A is m£n, m¸n) =(ATA)x=ATb )x=(ATA)-1ATb=A+b • If ATA is singular, x=A+b¼ (V0-1UT)b where 0-1 = diag(1/1, 1/2,, 1/i,0,0,,0) • Condition of a matrix • Condition number measures the degree of singularity of A • Larger the value of 1/n, closer A is to being singular http://www.cse.unr.edu/~bebis/MathMethods/SVD/lecture.pdf

Applications of SVD in Linear Algebra • Homogeneous equations, Ax = 0 • Minimum-norm solution is x=0 (trivial solution) • Impose a constraint, • “Constrained” optimization problem • Special Case • If rank(A)=n-1 (m ¸ n-1, n=0) then x=vn ( is a constant) • Genera Case • If rank(A)=n-k (m ¸ n-k, n-k+1== n=0) then x=1vn-k+1++kvn with 21++2n=1 • Has appeared before • Homogeneous solution of a linear system of equations • Computation of Homogrpahy using DLT • Estimation of Fundamental matrix For proof: Johnson and Wichern, “Applied Multivariate Statistical Analysis”, pg 79

What is the use of SVD? • SVD can be used to compute optimal low-rank approximations of arbitrary matrices. • Face recognition • Represent the face images as eigenfaces and compute distance between the query face image in the principal component space • Data mining • Latent Semantic Indexing for document extraction • Image compression • Karhunen Loeve (KL) transform performs the best image compression • In MPEG, Discrete Cosine Transform (DCT) has the closest approximation to the KL transform in PSNR

Singular Value Decomposition • Illustration of SVD dimensions and sparseness

Thus m=3, n=2. Its SVD is SVD example Let Typically, the singular values arranged in decreasing order.

Frobenius norm Low-rank Approximation • SVD can be used to compute optimal low-rank approximations. • Approximation problem: Find Akof rank k such that • Ak and X are both mn matrices. Typically, want k << r.

k column notation: sum of rank 1 matrices Low-rank Approximation • Solution via SVD set smallest r-k singular values to zero

Approximation error • How good (bad) is this approximation? • It’s the best possible, measured by the Frobenius norm of the error: where the i are ordered such that i  i+1. Suggests why Frobenius error drops as k increased.

Eigen Decomposition and Singular Value Decomposition