Lecture 21: SVD, Latent Semantic Indexing, and Dimensional Reduction
Shang-Hua Teng
Singular Value Decomposition
• A = σ1 u1 v1ᵀ + σ2 u2 v2ᵀ + … + σr ur vrᵀ, where
• u1 … ur are the r orthonormal vectors that form a basis of C(A), and
• v1 … vr are the r orthonormal vectors that form a basis of C(Aᵀ)
The Singular Value Decomposition
• Full form: A = U S Vᵀ, where A is m×n, U is m×m, S is m×n, and Vᵀ is n×n
• Reduced form: A = U S Vᵀ, where A is m×n, U is m×r, S is r×r, and Vᵀ is r×n
The Singular Value Reduction
• Rank-k approximation: Ak = Uk Sk Vkᵀ, where Ak is m×n, Uk is m×k, Sk is k×k, and Vkᵀ is k×n
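The block diagrams above can be reproduced in a few lines of code. Below is a minimal numpy sketch (not from the lecture; the matrix A is arbitrary illustration data) showing the full SVD, the reduced SVD, and a rank-k truncation:

```python
import numpy as np

# A small m x n matrix (arbitrary illustration data)
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# Full SVD: U is m x m, s holds the singular values, Vt is n x n
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Reduced SVD: keep only the r = rank(A) relevant columns/rows
r = np.linalg.matrix_rank(A)
Ur, Sr, Vtr = U[:, :r], np.diag(s[:r]), Vt[:r, :]
assert np.allclose(A, Ur @ Sr @ Vtr)

# Singular value reduction: rank-k truncation A_k = U_k S_k V_k^T
k = 1
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print("rank-1 approximation:\n", Ak)
```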
Distance between Two Matrices
• Frobenius norm of a matrix A: ‖A‖F = sqrt(Σi Σj aij²)
• Distance between two matrices A and B: d(A, B) = ‖A - B‖F
Approximation Theorem
• [Schmidt 1907; Eckart and Young 1936] Among all m×n matrices B of rank at most k, Ak is the one that minimizes ‖A - B‖F
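As a quick numerical illustration of the theorem (not part of the original slides), the sketch below checks that the Frobenius error of the rank-k truncation Ak equals the root sum of squares of the discarded singular values, and that random rank-k matrices never do better:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Error of the best rank-k approximation: sqrt(s_{k+1}^2 + ... + s_r^2)
best_err = np.linalg.norm(A - Ak, "fro")
assert np.isclose(best_err, np.sqrt(np.sum(s[k:] ** 2)))

# Any other matrix of rank at most k should do no better
for _ in range(100):
    B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 5))  # rank <= k
    assert np.linalg.norm(A - B, "fro") >= best_err

print("best rank-%d error: %.4f" % (k, best_err))
```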
Application: Image Compression
• Uncompressed m by n pixel image: m × n numbers
• Rank k approximation of image:
• k singular values
• The first k columns of U (m-vectors)
• The first k columns of V (n-vectors)
• Total: k × (m + n + 1) numbers
Example: Yogi (Uncompressed)
• Source: [Will]
• Yogi: rock photographed by the Sojourner Mars mission
• 256 × 264 grayscale bitmap, stored as a 256 × 264 matrix M
• Pixel values in [0, 1]
• 256 × 264 = 67,584 numbers
Example: Yogi (Compressed)
• M has 256 singular values
• Rank 81 approximation of M:
• 81 × (256 + 264 + 1) = 42,201 numbers
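A minimal sketch of this kind of compression (not the lecture's code; a random array stands in for the Yogi bitmap):

```python
import numpy as np

def compress_image(img, k):
    """Return the rank-k SVD approximation of a grayscale image."""
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Stand-in for a 256 x 264 grayscale image with pixel values in [0, 1]
img = np.random.default_rng(1).random((256, 264))

k = 81
approx = compress_image(img, k)
stored = k * (img.shape[0] + img.shape[1] + 1)
print("numbers stored:", stored)   # 81 * (256 + 264 + 1) = 42201
print("relative error:", np.linalg.norm(img - approx) / np.linalg.norm(img))
```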
Eigenface
• Patented by MIT
• Utilizes two-dimensional, global grayscale images
• A face is mapped to a vector of numbers
• Creates an image subspace (face space) that best discriminates between faces
• Works only on properly lit, frontal images
The Face Database • Set of normalized face images • Used ORL Face DB
EigenFaces • Eigenface (PCA)
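A hedged sketch of the eigenface computation via PCA (not the lecture's or ORL-specific code; the image size and count below are illustrative assumptions):

```python
import numpy as np

def eigenfaces(faces, k):
    """faces: (num_images, height*width) matrix of flattened grayscale faces.
    Returns the mean face and the top-k eigenfaces (principal components)."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, Vt[:k]

# Stand-in data: 40 random "faces" of ORL size (92 x 112), flattened
rng = np.random.default_rng(2)
faces = rng.random((40, 92 * 112))

mean_face, components = eigenfaces(faces, k=10)
# A face's coordinates in face space: its projection onto the eigenfaces
coords = components @ (faces[0] - mean_face)
print(coords.shape)   # (10,)
```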
Latent Semantic Analysis (LSA) • Latent Semantic Indexing (LSI) • Principal Component Analysis (PCA)
Term-Document Matrix
• Index each document (by human or by computer)
• fij: counts, frequencies, weights, etc.
• Each document can be regarded as a point in m dimensions (one dimension per term)
Document-Term Matrix
• Index each document (by human or by computer)
• fij: counts, frequencies, weights, etc.
• Each document can be regarded as a point in n dimensions (one dimension per term)
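A small sketch (with illustrative documents, not from the lecture) of building such a term-document count matrix:

```python
import numpy as np

docs = ["human machine interface for computer applications",
        "a survey of user opinion of computer system response time",
        "the generation of random binary unordered trees"]

# Vocabulary: one row per term, one column per document
terms = sorted({w for d in docs for w in d.split()})
row = {t: i for i, t in enumerate(terms)}

A = np.zeros((len(terms), len(docs)), dtype=int)
for j, d in enumerate(docs):
    for w in d.split():
        A[row[w], j] += 1    # f_ij: raw count of term i in document j

print(A.shape)   # (num_terms, num_docs); each column is one document
```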
           c1  c2  c3  c4  c5  m1  m2  m3  m4
human       1   0   0   1   0   0   0   0   0
interface   1   0   1   0   0   0   0   0   0
computer    1   1   0   0   0   0   0   0   0
user        0   1   1   0   1   0   0   0   0
system      0   1   1   2   0   0   0   0   0
response    0   1   0   0   1   0   0   0   0
time        0   1   0   0   1   0   0   0   0
EPS         0   0   1   1   0   0   0   0   0
survey      0   1   0   0   0   0   0   0   1
trees       0   0   0   0   0   1   1   1   0
graph       0   0   0   0   0   0   1   1   1
minors      0   0   0   0   0   0   0   1   1
LSI using k = 2: the terms (T) and documents (D) are plotted against LSI Factor 1 and LSI Factor 2, with clusters such as "applications & algorithms" and "differential equations".
• Each term's coordinates are specified by the first k values of its row.
• Each doc's coordinates are specified by the first k values of its column.
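A minimal numpy sketch (not the lecture's code) of LSI with k = 2 applied to the term-document matrix above:

```python
import numpy as np

# Term-document matrix from the table above (rows: terms; columns: c1..c5, m1..m4)
A = np.array([
    [1, 0, 0, 1, 0, 0, 0, 0, 0],   # human
    [1, 0, 1, 0, 0, 0, 0, 0, 0],   # interface
    [1, 1, 0, 0, 0, 0, 0, 0, 0],   # computer
    [0, 1, 1, 0, 1, 0, 0, 0, 0],   # user
    [0, 1, 1, 2, 0, 0, 0, 0, 0],   # system
    [0, 1, 0, 0, 1, 0, 0, 0, 0],   # response
    [0, 1, 0, 0, 1, 0, 0, 0, 0],   # time
    [0, 0, 1, 1, 0, 0, 0, 0, 0],   # EPS
    [0, 1, 0, 0, 0, 0, 0, 0, 1],   # survey
    [0, 0, 0, 0, 0, 1, 1, 1, 0],   # trees
    [0, 0, 0, 0, 0, 0, 1, 1, 1],   # graph
    [0, 0, 0, 0, 0, 0, 0, 1, 1],   # minors
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2

# 2-D LSI coordinates: terms from the first k columns of U*S, docs from S*Vt
term_coords = U[:, :k] * s[:k]                 # one row per term
doc_coords = (np.diag(s[:k]) @ Vt[:k, :]).T    # one row per document

print(term_coords.shape, doc_coords.shape)     # (12, 2) (9, 2)
```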
Positive Definite Matrices and Quadratic Shapes
• For any m × n matrix A, all eigenvalues of AAᵀ and AᵀA are non-negative
• Symmetric matrices that have positive eigenvalues are called positive definite matrices
• Symmetric matrices that have non-negative eigenvalues are called positive semi-definite matrices
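A quick numeric check of this fact (an illustrative matrix, not from the slides):

```python
import numpy as np

A = np.random.default_rng(3).standard_normal((4, 3))

# A A^T and A^T A are symmetric Gram matrices, hence positive semi-definite
for G in (A @ A.T, A.T @ A):
    eigvals = np.linalg.eigvalsh(G)
    assert np.all(eigvals >= -1e-10)   # non-negative up to round-off
    print(np.round(eigvals, 4))
```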