Paper: Indexing by Latent Semantic Analysis, for course CS630, presented by Haiyan Qiao
Problem Introduction
• Traditional term-matching methods don't work well in information retrieval.
• We want to capture concepts instead of words. Concepts are reflected in words, but the mapping is not one-to-one:
• One term may have multiple meanings (polysemy).
• Different terms may have the same meaning (synonymy).
LSI (Latent Semantic Indexing)
• The LSI approach tries to overcome the deficiencies of term-matching retrieval by treating the unreliability of observed term-document association data as a statistical problem.
• The goal is to find an effective model of the relationship between terms and documents, so that the set of terms, which is by itself incomplete and unreliable, can be replaced by a set of entities that are more reliable indicants of meaning.
SVD (Singular Value Decomposition)
• How do we learn the concepts from the data?
• SVD is applied to derive the latent semantic structure model.
• What is SVD? See, for example:
http://kwon3d.com/theory/jkinem/svd.html
http://mathworld.wolfram.com/SingularValueDecomposition.html
http://www.cs.ut.ee/~toomas_l/linalg/lin2/node13.html#SECTION00013200000000000000
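For reference, a compact statement of the decomposition in the notation used on the following slides (a standard linear-algebra fact, not on the original slide): any t x d matrix X of rank r can be written as

X = T0*S0*D0'

where T0 (t x r) and D0 (d x r) have orthonormal columns holding the left and right singular vectors, and S0 (r x r) is diagonal with the positive singular values, conventionally sorted in decreasing order.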
SVD cont'
• SVD of the term-by-document matrix X:
X = T0*S0*D0'
• If the singular values in S0 are ordered by size, we can keep only the k largest and obtain a reduced model:
X̂ = T*S*D'
where T, S, and D retain only the k columns (and singular values) corresponding to the k largest singular values.
• X̂ doesn't exactly match X, and it gets closer as more singular values are kept.
• This is what we want: we don't want a perfect fit, since we believe some of the 0's in X should be 1's and vice versa.
• The reduced model reflects the major associative patterns in the data and ignores the smaller, less important influences and noise.
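A minimal MATLAB sketch of the reduced model (assuming the term-by-document matrix X is already in the workspace; unlike eig, MATLAB's svd returns the singular values in decreasing order, so the first k columns are the ones to keep):

% rank-k approximation of the term-by-document matrix X
[T0, S0, D0] = svd(X);   % full SVD: X = T0*S0*D0'
k = 2;                   % number of retained dimensions (2 in the example below)
T = T0(:, 1:k);          % first k left singular vectors (term space)
S = S0(1:k, 1:k);        % k largest singular values
D = D0(:, 1:k);          % first k right singular vectors (document space)
Xhat = T*S*D';           % reduced model: best rank-k approximation of X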
Fundamental Comparison Quantities from the SVD Model
• Comparing two terms: the dot product between two row vectors of X̂ reflects the extent to which the two terms have a similar pattern of occurrence across the set of documents.
• Comparing two documents: the dot product between two column vectors of X̂ reflects the extent to which the two documents contain a similar pattern of terms.
• Comparing a term and a document: the value of the individual cell of X̂ itself.
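In MATLAB, all three quantities fall out of the reduced matrix; a hedged sketch, reusing Xhat, T, S, and D from the sketch above:

% term-term comparisons: entry (i,j) compares terms i and j
term_sim = Xhat * Xhat';   % equals (T*S)*(T*S)', so rows of T*S act as term coordinates
% document-document comparisons: entry (i,j) compares documents i and j
doc_sim = Xhat' * Xhat;    % equals (D*S)*(D*S)', so rows of D*S act as document coordinates
% term-document comparison: just an individual cell of Xhat
td = Xhat(1, 2);           % e.g., term 1 vs document 2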
Example - Technical Memo
• Query: human-computer interaction
• Dataset:
c1 Human machine interface for Lab ABC computer application
c2 A survey of user opinion of computer system response time
c3 The EPS user interface management system
c4 System and human system engineering testing of EPS
c5 Relations of user-perceived response time to error measurement
m1 The generation of random, binary, unordered trees
m2 The intersection graph of paths in trees
m3 Graph minors IV: Widths of trees and well-quasi-ordering
m4 Graph minors: A survey
Example cont'
% 12-term by 9-document matrix
% rows (terms): human, interface, computer, user, system, response,
% time, EPS, survey, trees, graph, minors
% columns (documents): c1-c5, m1-m4
>> X=[ 1 0 0 1 0 0 0 0 0;
       1 0 1 0 0 0 0 0 0;
       1 1 0 0 0 0 0 0 0;
       0 1 1 0 1 0 0 0 0;
       0 1 1 2 0 0 0 0 0;
       0 1 0 0 1 0 0 0 0;
       0 1 0 0 1 0 0 0 0;
       0 0 1 1 0 0 0 0 0;
       0 1 0 0 0 0 0 0 1;
       0 0 0 0 0 1 1 1 0;
       0 0 0 0 0 0 1 1 1;
       0 0 0 0 0 0 0 1 1;];
Example cont'
% X = T0*S0*D0', where T0 and D0 have orthonormal columns and S0 is diagonal
% T0 is the matrix of eigenvectors of the square symmetric matrix X*X'
% D0 is the matrix of eigenvectors of X'*X
% S0 is the matrix of eigenvalues in both cases
>> [T0, S0] = eig(X*X');
>> T0
T0 =
  0.1561  -0.2700   0.1250  -0.4067  -0.0605  -0.5227  -0.3410  -0.1063  -0.4148   0.2890  -0.1132   0.2214
  0.1516   0.4921  -0.1586  -0.1089  -0.0099   0.0704   0.4959   0.2818  -0.5522   0.1350  -0.0721   0.1976
 -0.3077  -0.2221   0.0336   0.4924   0.0623   0.3022  -0.2550  -0.1068  -0.5950  -0.1644   0.0432   0.2405
  0.3123  -0.5400   0.2500   0.0123  -0.0004  -0.0029   0.3848   0.3317   0.0991  -0.3378   0.0571   0.4036
  0.3077   0.2221  -0.0336   0.2707   0.0343   0.1658  -0.2065  -0.1590   0.3335   0.3611  -0.1673   0.6445
 -0.2602   0.5134   0.5307  -0.0539  -0.0161  -0.2829  -0.1697   0.0803   0.0738  -0.4260   0.1072   0.2650
 -0.0521   0.0266  -0.7807  -0.0539  -0.0161  -0.2829  -0.1697   0.0803   0.0738  -0.4260   0.1072   0.2650
 -0.7716  -0.1742  -0.0578  -0.1653  -0.0190  -0.0330   0.2722   0.1148   0.1881   0.3303  -0.1413   0.3008
  0.0000   0.0000   0.0000  -0.5794  -0.0363   0.4669   0.0809  -0.5372  -0.0324  -0.1776   0.2736   0.2059
  0.0000   0.0000   0.0000  -0.2254   0.2546   0.2883  -0.3921   0.5942   0.0248   0.2311   0.4902   0.0127
 -0.0000  -0.0000  -0.0000   0.2320  -0.6811  -0.1596   0.1149  -0.0683   0.0007   0.2231   0.6228   0.0361
  0.0000  -0.0000   0.0000   0.1825   0.6784  -0.3395   0.2773  -0.3005  -0.0087   0.1411   0.4505   0.0318
Example cont'
>> [D0, S0] = eig(X'*X);
>> D0
D0 =
  0.0637   0.0144  -0.1773   0.0766  -0.0457  -0.9498   0.1103  -0.0559   0.1974
 -0.2428  -0.0493   0.4330   0.2565   0.2063  -0.0286  -0.4973   0.1656   0.6060
 -0.0241  -0.0088   0.2369  -0.7244  -0.3783   0.0416   0.2076  -0.1273   0.4629
  0.0842   0.0195  -0.2648   0.3689   0.2056   0.2677   0.5699  -0.2318   0.5421
  0.2624   0.0583  -0.6723  -0.0348  -0.3272   0.1500  -0.5054   0.1068   0.2795
  0.6198  -0.4545   0.3408   0.3002  -0.3948   0.0151   0.0982   0.1928   0.0038
 -0.0180   0.7615   0.1522   0.2122  -0.3495   0.0155   0.1930   0.4379   0.0146
 -0.5199  -0.4496  -0.2491  -0.0001  -0.1498   0.0102   0.2529   0.6151   0.0241
  0.4535   0.0696  -0.0380  -0.3622   0.6020  -0.0246   0.0793   0.5299   0.0820
Example cont'
>> S0 = eig(X'*X);
>> S0 = S0.^0.5
S0 =
  0.3637
  0.5601
  0.8459
  1.3064
  1.5048
  1.6445
  2.3539
  2.5417
  3.3409
% eig returns eigenvalues in ascending order, so the two largest singular
% values are 3.3409 and 2.5417; we keep only these two and the
% corresponding (last two) columns of T0 and D0
Example cont'
% the last two columns of T0 and D0, matching the two largest singular values
>> T = [ 0.2214 -0.1132;
         0.1976 -0.0721;
         0.2405  0.0432;
         0.4036  0.0571;
         0.6445 -0.1673;
         0.2650  0.1072;
         0.2650  0.1072;
         0.3008 -0.1413;
         0.2059  0.2736;
         0.0127  0.4902;
         0.0361  0.6228;
         0.0318  0.4505;];
>> S = [ 3.3409 0;
         0 2.5417 ];
% D' cannot be assigned to directly in MATLAB, so store the transpose as Dt
>> Dt = [ 0.1974 0.6060 0.4629 0.5421 0.2795 0.0038 0.0146 0.0241 0.0820;
         -0.0559 0.1656 -0.1273 -0.2318 0.1068 0.1928 0.4379 0.6151 0.5299;];
>> T*S*Dt
ans =
  0.1621   0.4006   0.3790   0.4677   0.1760  -0.0527
  0.1406   0.3697   0.3289   0.4004   0.1649  -0.0328
  0.1525   0.5051   0.3580   0.4101   0.2363   0.0242
  0.2581   0.8412   0.6057   0.6973   0.3924   0.0331
  0.4488   1.2344   1.0509   1.2658   0.5564  -0.0738
  0.1595   0.5816   0.3751   0.4168   0.2766   0.0559
  0.1595   0.5816   0.3751   0.4168   0.2766   0.0559
  0.2185   0.5495   0.5109   0.6280   0.2425  -0.0654
  0.0969   0.5320   0.2299   0.2117   0.2665   0.1367
 -0.0613   0.2320  -0.1390  -0.2658   0.1449   0.2404
 -0.0647   0.3352  -0.1457  -0.3016   0.2028   0.3057
 -0.0430   0.2540  -0.0966  -0.2078   0.1520   0.2212
% (only the columns for c1-c5 and m1 are shown; the m2-m4 columns are omitted)
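As a further hedged sketch (not on the original slides), the example query "human-computer interaction" can be folded into the 2-D space as a pseudo-document using the paper's folding-in formula Dq = Xq'*T*inv(S). Here "human" is term 1 and "computer" is term 3; "interaction" is out of vocabulary and is simply dropped:

>> q = zeros(12,1); q([1 3]) = 1;   % query as a term-frequency column vector
>> qhat = q' * T * inv(S)           % 1x2 query coordinates in the LSI space
>> docs = Dt';                      % 9x2, one row per document
% rank documents by cosine similarity to the query
>> sims = (docs * qhat') ./ (sqrt(sum(docs.^2, 2)) * norm(qhat));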
Summary
• What do PCA and SVD have in common, and how do they differ?
• Both rest on the standard eigenvalue-eigenvector decomposition, and both are used to remove noise or correlation and retain the most important information.
• PCA operates on the covariance matrix, while SVD works on the original data matrix.
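A small illustration of that last point, using hypothetical data (not from the talk): the principal directions from the covariance matrix coincide, up to sign and ordering, with the right singular vectors of the centered data matrix.

>> A = randn(100, 5);       % hypothetical data: 100 observations, 5 variables
>> Ac = A - mean(A);        % center each column
>> [V, E] = eig(cov(Ac));   % PCA: eigenvectors of the covariance matrix
>> [~, Sv, Vsvd] = svd(Ac); % SVD of the centered data matrix
% columns of V and Vsvd span the same directions (up to sign and order);
% the eigenvalues in E equal the squared singular values divided by n-1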