Latent Semantic Indexing via a Semi-discrete Matrix Decomposition

1. Latent Semantic Indexing via a Semi-discrete Matrix Decomposition

2. Papers from the same authors with similar topics
Kolda, T.G. & O'Leary, D.P. A semidiscrete matrix decomposition for latent semantic indexing information retrieval. ACM Trans. Inf. Syst., 1998, 16, 322-346.
Kolda, T.G. & O'Leary, D.P. George Cybenko, D.P.O. (ed.) Latent semantic indexing via a semi-discrete matrix decomposition. Springer-Verlag, 1999, 107, 73-80.
Kolda, T.G. & O'Leary, D.P. Algorithm 805: computation and uses of the semidiscrete matrix decomposition. ACM Transactions on Mathematical Software, 2000, 26, 415-435.

3. Vector Space Framework Query:

4. Weight of term in a document

5. Weight of term in a document

6. Motivation for using SDD Singular Value Decomposition (SVD) is used for Latent Semantic Indexing (LSI) to estimate the structure of word usage across documents. Use Semi-discrete Decomposition (SDD) instead of SVD for LSI to save storage space and retrieval time.

7. Why? Claim: SVD has nice theoretical properties, but it carries a lot of information, probably more than this application needs.

8. SVD vs SDD SVD: SDD:
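The two formulas on this slide did not survive extraction. As a hedged reconstruction based on the standard definitions of the two decompositions (the symbols below are assumptions, not taken from the slide), the rank-k approximations of an m x n term-document matrix A are usually written as:

```latex
% SVD: real-valued orthonormal factors, singular values on the diagonal
A \approx A_k^{\mathrm{SVD}} = U_k \Sigma_k V_k^{T},
\qquad U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k),\;
V_k \in \mathbb{R}^{n \times k}

% SDD: factor entries restricted to {-1, 0, 1}, positive diagonal weights
A \approx A_k^{\mathrm{SDD}} = X_k D_k Y_k^{T} = \sum_{i=1}^{k} d_i \, x_i y_i^{T},
\qquad x_i \in \{-1, 0, 1\}^{m},\;
y_i \in \{-1, 0, 1\}^{n},\;
d_i > 0
```

The only structural difference is the restriction on the entries of the outer factors, which is what makes the SDD cheap to store.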

9. SDD is an approximate representation of the matrix. Repackaging the matrix, even without discarding any terms, might not reproduce the original exactly. Theorems exist showing that the approximation converges, slowly, to the original matrix as the number of terms k tends to infinity. The speed of convergence depends on the starting estimate used to initialize the iterative decomposition algorithm.

10. Result: Storage Space
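The storage figures from this slide were lost in extraction. As a rough illustration of where the savings come from, the sketch below counts bytes for a rank-k SVD (k(m+n+1) floating-point numbers) and a k-term SDD (k(m+n) ternary entries plus k weights). The function names, the 8-byte and 4-byte float sizes, and the 2-bits-per-entry packing are assumptions made for this example, not values taken from the slides.

```python
import math


def svd_storage_bytes(m, n, k, float_bytes=8):
    """Bytes for a rank-k SVD: U (m x k), V (n x k) and k singular values."""
    return k * (m + n + 1) * float_bytes


def sdd_storage_bytes(m, n, k, bits_per_entry=2, float_bytes=4):
    """Bytes for a k-term SDD: ternary X (m x k), Y (n x k) and k weights."""
    ternary_bits = k * (m + n) * bits_per_entry   # entries in {-1, 0, 1}
    return math.ceil(ternary_bits / 8) + k * float_bytes


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to make the arithmetic easy to follow.
    m, n, k = 1000, 500, 100
    print("SVD bytes:", svd_storage_bytes(m, n, k))
    print("SDD bytes:", sdd_storage_bytes(m, n, k))
```

Under these assumptions the same k costs far less in the SDD, so a larger k can be used within the same storage budget.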

11. Medline test case

12. Results on Medline test case

13. Method for SDD

14. Metrics in those papers
Kolda, T.G. & O'Leary, D.P. A semidiscrete matrix decomposition for latent semantic indexing information retrieval. ACM Trans. Inf. Syst., 1998, 16, 322-346.
Kolda, T.G. & O'Leary, D.P. George Cybenko, D.P.O. (ed.) Latent semantic indexing via a semi-discrete matrix decomposition. Springer-Verlag, 1999, 107, 73-80.
Kolda, T.G. & O'Leary, D.P. Algorithm 805: computation and uses of the semidiscrete matrix decomposition. ACM Transactions on Mathematical Software, 2000, 26, 415-435.

15. Greedy Algorithm

16. Notes on the algorithm
Starting vector y: every 100th element is 1 and all the others are 0.
A_k → A as k → ∞.
Finding the minimum F-norm (Frobenius norm) can be simplified to finding an optimal x.
The improvement threshold may be 0.01, where improvement = |new - old| / old.
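The notes above can be turned into a minimal NumPy sketch of a greedy SDD. This is an illustration under stated assumptions, not the authors' reference implementation: the names `best_ternary` and `sdd`, the `max_inner` cap, and the fallback starting vector used when the slide's starting y gives a zero product are all choices made for this example.

```python
import numpy as np


def best_ternary(s):
    """Maximize (x . s)^2 / (x . x) over nonzero x with entries in {-1, 0, 1}."""
    order = np.argsort(-np.abs(s))                 # indices by decreasing |s_i|
    partial = np.cumsum(np.abs(s)[order])          # sum of the J largest |s_i|
    scores = partial ** 2 / np.arange(1, len(s) + 1)
    J = int(np.argmax(scores)) + 1                 # best number of nonzeros
    x = np.zeros(len(s))
    x[order[:J]] = np.sign(s[order[:J]])
    return x


def sdd(A, kmax, tol=0.01, max_inner=50):
    """Greedy k-term SDD: A is approximated by (X * D) @ Y.T."""
    R = np.array(A, dtype=float)                   # residual matrix
    m, n = R.shape
    X, D, Y = [], [], []
    for _ in range(kmax):
        y = np.zeros(n)
        y[::100] = 1.0                             # every 100th element is 1
        if not (R @ y).any():                      # assumed fallback start
            y = np.zeros(n)
            y[np.argmax(np.linalg.norm(R, axis=0))] = 1.0
        if not (R @ y).any():
            break                                  # residual is exactly zero
        beta_old = 0.0
        for _ in range(max_inner):
            x = best_ternary(R @ y)                # fix y, solve for x
            y = best_ternary(R.T @ x)              # fix x, solve for y
            beta = (x @ R @ y) ** 2 / ((x @ x) * (y @ y))
            if beta_old > 0 and abs(beta - beta_old) / beta_old <= tol:
                break                              # improvement below threshold
            beta_old = beta
        d = (x @ R @ y) / ((x @ x) * (y @ y))      # optimal weight for x, y
        R -= d * np.outer(x, y)                    # deflate the residual
        X.append(x)
        D.append(d)
        Y.append(y)
    return np.array(X).T, np.array(D), np.array(Y).T
```

Each accepted term lowers the squared Frobenius norm of the residual by `beta`, so the approximation error never increases as k grows. A typical call would be `X, D, Y = sdd(A, kmax=10)`, with the approximation rebuilt as `(X * D) @ Y.T`.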

17. Finding x and d
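The formulas on this slide were lost in extraction. A hedged reconstruction of the standard argument follows, where R denotes the current residual matrix and s = R y (both symbols are assumptions introduced here). It shows why minimizing the Frobenius norm reduces to finding an optimal x once y is fixed.

```latex
% For fixed x and y, the best weight d has a closed form:
\min_{d} \; \lVert R - d\, x y^{T} \rVert_F^{2}
\quad\Longrightarrow\quad
d^{*} = \frac{x^{T} R\, y}{\lVert x \rVert_2^{2} \, \lVert y \rVert_2^{2}}

% Substituting d^* back gives the remaining error:
\lVert R - d^{*} x y^{T} \rVert_F^{2}
= \lVert R \rVert_F^{2}
- \frac{(x^{T} R\, y)^{2}}{\lVert x \rVert_2^{2} \, \lVert y \rVert_2^{2}}

% With y fixed and s = R y, minimizing the error is equivalent to
\max_{x \in \{-1,0,1\}^{m},\; x \neq 0} \; \frac{(x^{T} s)^{2}}{\lVert x \rVert_2^{2}}

% Solution: order the entries so that |s_{i_1}| \ge |s_{i_2}| \ge \dots, choose
J^{*} = \arg\max_{J} \; \frac{1}{J} \Bigl( \sum_{j=1}^{J} \lvert s_{i_j} \rvert \Bigr)^{2},
\qquad
x_{i_j} =
\begin{cases}
\operatorname{sign}(s_{i_j}), & j \le J^{*} \\
0, & j > J^{*}
\end{cases}
```

The same reduction applies with the roles of x and y swapped, using s = R^T x.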
