
Latent Semantic Indexing


Presentation Transcript


  1. Latent Semantic Indexing Jieping Ye Department of Computer Science & Engineering Arizona State University http://www.public.asu.edu/~jye02

  2. Singular Value Decomposition Any m x n matrix A factors as A = U Σ V^T, where U and V are orthogonal and Σ is diagonal, with the singular values σ1 ≥ σ2 ≥ … ≥ 0 on its diagonal.
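A minimal sketch of the factorization with NumPy (the tool choice is mine; the slides name none):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [0.0, 2.0]])  # any m x n matrix

# Compact SVD: A = U @ diag(s) @ Vt, with s in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# U and V have orthonormal columns, and the product reconstructs A.
assert np.allclose(U.T @ U, np.eye(U.shape[1]))
assert np.allclose(Vt @ Vt.T, np.eye(Vt.shape[0]))
assert np.allclose(A, U @ np.diag(s) @ Vt)
```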

  3. Some Properties of SVD Keeping only the k largest singular values gives the truncated SVD Ak = Uk Σk Vk^T = σ1 u1 v1^T + … + σk uk vk^T, a matrix of rank k. By the Eckart-Young theorem, ||A - Ak||_F ≤ ||A - B||_F for every matrix B of rank at most k.

  4. Some Properties of SVD • That is, Ak is the optimal approximation of A, in terms of the approximation error measured by the Frobenius norm, among all matrices of rank k • This forms the basis of LSI (Latent Semantic Indexing) in information retrieval

  5. Low rank approximation by SVD
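A sketch of the rank-k truncation named in the slide title (an illustration under the same NumPy assumption, not the slide's original content):

```python
import numpy as np

def truncated_svd(A, k):
    """Rank-k approximation Ak = Uk Σk Vk^T, optimal in the Frobenius
    norm by the Eckart-Young theorem (slides 3-4)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.random.rand(6, 4)
Ak = truncated_svd(A, 2)
print(np.linalg.matrix_rank(Ak))  # 2

# The error is exactly the energy in the discarded singular values.
s = np.linalg.svd(A, compute_uv=False)
print(np.linalg.norm(A - Ak, "fro"), np.sqrt(np.sum(s[2:] ** 2)))
```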

  6. Applications of SVD • Pseudoinverse • Range, null space and rank • Matrix approximation • Other examples http://en.wikipedia.org/wiki/Singular_value_decomposition
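As one worked case from this list, a sketch of the pseudoinverse via SVD, A+ = V Σ+ U^T, checked against NumPy's built-in:

```python
import numpy as np

def pinv_via_svd(A, tol=1e-12):
    """Moore-Penrose pseudoinverse: invert only the nonzero singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    s_inv = np.where(s > tol, 1.0 / s, 0.0)
    return Vt.T @ np.diag(s_inv) @ U.T

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
assert np.allclose(pinv_via_svd(A), np.linalg.pinv(A))
```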

  7. LSI (Latent Semantic Indexing) • Introduction • Latent Semantic Indexing • LSI • Query • Updating • An example

  8. Problem Introduction • The traditional term-matching method doesn't work well in information retrieval • We want to capture concepts instead of words. Concepts are reflected in the words. However: • One term may have multiple meanings (polysemy) • Different terms may have the same meaning (synonymy)

  9. LSI (Latent Semantic Indexing) • The LSI approach tries to overcome the deficiencies of term-matching retrieval by treating the unreliability of observed term-document association data as a statistical problem • The goal is to find effective models to represent the relationship between terms and documents. Hence a set of terms, which is by itself incomplete and unreliable, is replaced by a set of entities that are more reliable indicants

  10. LSI, the Method • Build the document-term matrix M • Decompose M by SVD: M = U Σ V^T • Approximate M using the truncated SVD, Mk = Uk Σk Vk^T

  11. LSI, the Method (cont.) The SVD maps each row (document) and each column (term) of M into the k-dimensional LSI space: documents are represented by the rows of Uk Σk and terms by the rows of Vk Σk, as in the sketch below.
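A toy sketch of slides 10-11; the 4x4 matrix below is hypothetical (the slides' own example is not reproduced here), and M is taken to be document-by-term:

```python
import numpy as np

# Hypothetical document-term matrix: 4 documents (rows) x 4 terms (columns).
M = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 2.0],
              [0.0, 0.0, 2.0, 1.0]])

k = 2
U, s, Vt = np.linalg.svd(M, full_matrices=False)
Uk, Sk, Vk = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

doc_vectors = Uk @ Sk   # row i: document i in the k-dimensional LSI space
term_vectors = Vk @ Sk  # row j: term j in the same space
```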

  12. Query • A query q, a vector over the same terms, is also mapped into this space by folding it in as a pseudo-document: qk = q Vk Σk^-1 • Compare similarity in the new space, e.g. by cosine similarity between qk and the document vectors • Intuition: dimension reduction through LSI brings together “related” axes in the vector space
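Continuing with the same hypothetical matrix, the folding-in step and a cosine comparison:

```python
import numpy as np

# Same hypothetical 4-document x 4-term matrix as in the previous sketch.
M = np.array([[2.0, 1.0, 0.0, 0.0],
              [1.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 2.0],
              [0.0, 0.0, 2.0, 1.0]])
k = 2
U, s, Vt = np.linalg.svd(M, full_matrices=False)
Uk, Sk, Vk = U[:, :k], np.diag(s[:k]), Vt[:k, :].T
doc_vectors = Uk @ Sk

# A query mentioning terms 1 and 3, over the same term columns as M.
q = np.array([1.0, 0.0, 1.0, 0.0])
q_lsi = q @ Vk @ np.linalg.inv(Sk)  # fold the query into the LSI space

# Rank documents by cosine similarity to the folded-in query.
scores = (doc_vectors @ q_lsi) / (
    np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q_lsi))
print(scores.argsort()[::-1])  # indices of best-matching documents first
```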

  13. Example

  14. Example (cont.)

  15. Example (cont. Mapping)

  16. Example (cont. Query) Query: Application and Theory

  17. Example (cont. Query)

  18. How to set the value of k? • LSI is useful only if k << n • If k is too large, the reduction retains too much noise and fails to expose the underlying latent semantic space; if k is too small, too much information is lost • There is no principled way of determining the best k
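One common heuristic, offered here as an assumption rather than anything the slides prescribe, keeps the smallest k that retains a fixed fraction of the squared singular-value energy:

```python
import numpy as np

def pick_k(s, energy=0.9):
    """Smallest k whose top singular values keep `energy` of sum(s**2)."""
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cum, energy)) + 1

s = np.array([10.0, 6.0, 2.0, 0.5, 0.1])  # e.g., the spectrum of some M
print(pick_k(s, energy=0.9))              # -> 2 for this spectrum
```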

  19. How well does LSI work? • The effectiveness of LSI compared to regular term matching depends on the nature of the documents • Typical improvement: 0 to 30% better precision • The advantage is greater for texts in which synonymy and ambiguity are more prevalent • Works best when high recall is required • The costs of LSI might outweigh the improvement: SVD is computationally expensive, which limits its use for really large document collections, and a standard inverted index is no longer applicable because the LSI vectors are dense

  20. References • Mini tutorial on the Singular Value Decomposition • http://www.cs.brown.edu/research/ai/dynamics/tutorial/Postscript/SingularValueDecomposition.ps • Basics of linear algebra • http://www.stanford.edu/class/cs229/section/section_lin_algebra.pdf
