Lecture 21: SVD, Latent Semantic Indexing, and Dimensional Reduction
Shang-Hua Teng
Singular Value Decomposition
• A = σ1 u1 v1ᵀ + σ2 u2 v2ᵀ + … + σr ur vrᵀ, where
• u1 … ur are the r orthonormal vectors that form a basis of C(A), and
• v1 … vr are the r orthonormal vectors that form a basis of C(Aᵀ)
The Singular Value Decomposition
• Full form: A = U S Vᵀ, where A is m×n, U is m×m, S is m×n, and Vᵀ is n×n
• Reduced form: A = U S Vᵀ, where A is m×n, U is m×r, S is r×r, and Vᵀ is r×n
The Singular Value Reduction
• Rank-k approximation: Ak = Uk Sk Vkᵀ, where Ak is m×n, Uk is m×k, Sk is k×k, and Vkᵀ is k×n
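The block diagrams above can be reproduced in a few lines of code. Below is a minimal numpy sketch (not from the lecture; the matrix A is arbitrary illustration data) showing the full SVD, the reduced SVD, and a rank-k truncation:

```python
import numpy as np

# A small m x n matrix (arbitrary illustration data)
A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])

# Full SVD: U is m x m, s holds the singular values, Vt is n x n
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Reduced SVD: keep only the r = rank(A) relevant columns/rows
r = np.linalg.matrix_rank(A)
Ur, Sr, Vtr = U[:, :r], np.diag(s[:r]), Vt[:r, :]
assert np.allclose(A, Ur @ Sr @ Vtr)

# Singular value reduction: rank-k truncation A_k = U_k S_k V_k^T
k = 1
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print("rank-1 approximation:\n", Ak)
```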
Distance between Two Matrices
• Frobenius norm of a matrix A: ‖A‖F = sqrt(Σi Σj aij²)
• Distance between two matrices A and B: d(A, B) = ‖A - B‖F
Approximation Theorem
• [Schmidt 1907; Eckart and Young 1936] Among all m×n matrices B of rank at most k, Ak is the one that minimizes ‖A - B‖F
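As a quick numerical illustration of the theorem (not part of the original slides), the sketch below checks that the Frobenius error of the rank-k truncation Ak equals the root sum of squares of the discarded singular values, and that random rank-k matrices never do better:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Error of the best rank-k approximation: sqrt(s_{k+1}^2 + ... + s_r^2)
best_err = np.linalg.norm(A - Ak, "fro")
assert np.isclose(best_err, np.sqrt(np.sum(s[k:] ** 2)))

# Any other matrix of rank at most k should do no better
for _ in range(100):
    B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 5))  # rank <= k
    assert np.linalg.norm(A - B, "fro") >= best_err

print("best rank-%d error: %.4f" % (k, best_err))
```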
Application: Image Compression
• Uncompressed m by n pixel image: m × n numbers
• Rank k approximation of image:
• k singular values
• The first k columns of U (m-vectors)
• The first k columns of V (n-vectors)
• Total: k × (m + n + 1) numbers
Example: Yogi (Uncompressed)
• Source: [Will]
• Yogi: rock photographed by the Sojourner Mars mission
• 256 × 264 grayscale bitmap, stored as a 256 × 264 matrix M
• Pixel values in [0, 1]
• 256 × 264 = 67,584 numbers
Example: Yogi (Compressed)
• M has 256 singular values
• Rank 81 approximation of M:
• 81 × (256 + 264 + 1) = 42,201 numbers
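A minimal sketch of this kind of compression (not the lecture's code; a random array stands in for the Yogi bitmap):

```python
import numpy as np

def compress_image(img, k):
    """Return the rank-k SVD approximation of a grayscale image."""
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Stand-in for a 256 x 264 grayscale image with pixel values in [0, 1]
img = np.random.default_rng(1).random((256, 264))

k = 81
approx = compress_image(img, k)
stored = k * (img.shape[0] + img.shape[1] + 1)
print("numbers stored:", stored)   # 81 * (256 + 264 + 1) = 42201
print("relative error:", np.linalg.norm(img - approx) / np.linalg.norm(img))
```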
Eigenface
• Patented by MIT
• Utilizes two-dimensional, global grayscale images
• A face is mapped to a vector of numbers
• Creates an image subspace (face space) that best discriminates between faces
• Works only on properly lit, frontal images
The Face Database • Set of normalized face images • Used ORL Face DB
EigenFaces • Eigenface (PCA)
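A hedged sketch of the eigenface computation via PCA (not the lecture's or ORL-specific code; the image size and count below are illustrative assumptions):

```python
import numpy as np

def eigenfaces(faces, k):
    """faces: (num_images, height*width) matrix of flattened grayscale faces.
    Returns the mean face and the top-k eigenfaces (principal components)."""
    mean_face = faces.mean(axis=0)
    centered = faces - mean_face
    # SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean_face, Vt[:k]

# Stand-in data: 40 random "faces" of ORL size (92 x 112), flattened
rng = np.random.default_rng(2)
faces = rng.random((40, 92 * 112))

mean_face, components = eigenfaces(faces, k=10)
# A face's coordinates in face space: its projection onto the eigenfaces
coords = components @ (faces[0] - mean_face)
print(coords.shape)   # (10,)
```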
Latent Semantic Analysis (LSA) • Latent Semantic Indexing (LSI) • Principal Component Analysis (PCA)
Term-Document Matrix
• Index each document (by human or by computer)
• fij: counts, frequencies, weights, etc.
• Each document can be regarded as a point in m dimensions (one dimension per term)
Document-Term Matrix
• Index each document (by human or by computer)
• fij: counts, frequencies, weights, etc.
• Each document can be regarded as a point in n dimensions (one dimension per term)
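A small sketch (with illustrative documents, not from the lecture) of building such a term-document count matrix:

```python
import numpy as np

docs = ["human machine interface for computer applications",
        "a survey of user opinion of computer system response time",
        "the generation of random binary unordered trees"]

# Vocabulary: one row per term, one column per document
terms = sorted({w for d in docs for w in d.split()})
row = {t: i for i, t in enumerate(terms)}

A = np.zeros((len(terms), len(docs)), dtype=int)
for j, d in enumerate(docs):
    for w in d.split():
        A[row[w], j] += 1    # f_ij: raw count of term i in document j

print(A.shape)   # (num_terms, num_docs); each column is one document
```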
           c1  c2  c3  c4  c5  m1  m2  m3  m4
human       1   0   0   1   0   0   0   0   0
interface   1   0   1   0   0   0   0   0   0
computer    1   1   0   0   0   0   0   0   0
user        0   1   1   0   1   0   0   0   0
system      0   1   1   2   0   0   0   0   0
response    0   1   0   0   1   0   0   0   0
time        0   1   0   0   1   0   0   0   0
EPS         0   0   1   1   0   0   0   0   0
survey      0   1   0   0   0   0   0   0   1
trees       0   0   0   0   0   1   1   1   0
graph       0   0   0   0   0   0   1   1   1
minors      0   0   0   0   0   0   0   1   1
LSI using k = 2: the terms (T) and documents (D) are plotted against LSI Factor 1 and LSI Factor 2, with clusters such as "applications & algorithms" and "differential equations".
• Each term's coordinates are specified by the first k values of its row.
• Each doc's coordinates are specified by the first k values of its column.
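A minimal numpy sketch (not the lecture's code) of LSI with k = 2 applied to the term-document matrix above:

```python
import numpy as np

# Term-document matrix from the table above (rows: terms; columns: c1..c5, m1..m4)
A = np.array([
    [1, 0, 0, 1, 0, 0, 0, 0, 0],   # human
    [1, 0, 1, 0, 0, 0, 0, 0, 0],   # interface
    [1, 1, 0, 0, 0, 0, 0, 0, 0],   # computer
    [0, 1, 1, 0, 1, 0, 0, 0, 0],   # user
    [0, 1, 1, 2, 0, 0, 0, 0, 0],   # system
    [0, 1, 0, 0, 1, 0, 0, 0, 0],   # response
    [0, 1, 0, 0, 1, 0, 0, 0, 0],   # time
    [0, 0, 1, 1, 0, 0, 0, 0, 0],   # EPS
    [0, 1, 0, 0, 0, 0, 0, 0, 1],   # survey
    [0, 0, 0, 0, 0, 1, 1, 1, 0],   # trees
    [0, 0, 0, 0, 0, 0, 1, 1, 1],   # graph
    [0, 0, 0, 0, 0, 0, 0, 1, 1],   # minors
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2

# 2-D LSI coordinates: terms from the first k columns of U*S, docs from S*Vt
term_coords = U[:, :k] * s[:k]                 # one row per term
doc_coords = (np.diag(s[:k]) @ Vt[:k, :]).T    # one row per document

print(term_coords.shape, doc_coords.shape)     # (12, 2) (9, 2)
```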
Positive Definite Matrices and Quadratic Shapes
• For any m × n matrix A, all eigenvalues of AAᵀ and AᵀA are non-negative
• Symmetric matrices that have positive eigenvalues are called positive definite matrices
• Symmetric matrices that have non-negative eigenvalues are called positive semi-definite matrices
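A quick numeric check of this fact (an illustrative matrix, not from the slides):

```python
import numpy as np

A = np.random.default_rng(3).standard_normal((4, 3))

# A A^T and A^T A are symmetric Gram matrices, hence positive semi-definite
for G in (A @ A.T, A.T @ A):
    eigvals = np.linalg.eigvalsh(G)
    assert np.all(eigvals >= -1e-10)   # non-negative up to round-off
    print(np.round(eigvals, 4))
```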