International Workshop on Machine Learning and Text Analytics (MLTA2013)
Linear Algebra for Machine Learning and IR
Manoj Kumar Singh
DST-Centre for Interdisciplinary Mathematical Sciences (DST-CIMS), Banaras Hindu University (BHU), Varanasi-221005, INDIA. E-mail: manoj.dstcims@bhu.ac.in
December 15, 2013, South Asian University (SAU), New Delhi.
Content
• Vector Matrix Model in IR, ML and Other Areas
• Vector Space
  - Formal definition - Linear combination - Independence - Generator and basis
  - Dimension - Inner product, norm, orthogonality - Examples
• Linear Transformation
  - Definition - Matrix and determinant - LT using a matrix - Rank and nullity
  - Column space and row space - Invertibility - Singularity and non-singularity - Eigenvalue and eigenvector - Linear algebra
• Different Types of Matrices and Matrix Algebra
• Matrix Factorization
• Applications
Vector Matrix Model in IR
A collection consisting of the following five documents is queried for latent semantic indexing (q):
d1 = LSI tutorials and fast tracks.
d2 = Books on semantic analysis.
d3 = Learning latent semantic indexing.
d4 = Advances in structures and advances in indexing.
d5 = Analysis of latent structures.
Task: rank the documents in decreasing order of relevance to the query.
Other applications: Classification; Recommendation systems (item-based collaborative filtering).
Blind Source Separation
[Figure: source signals and the measured (mixed) signals]
Vector Space
Definition: An algebraic structure consisting of a set V, a field F, and two binary operations (vector addition and scalar multiplication) is a vector space if the vector-space axioms hold.
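For reference, a standard statement of the axioms (not taken verbatim from the slide): for all u, v, w ∈ V and a, b ∈ F,
\[
\begin{aligned}
&\text{(1) } u + v \in V,\quad a\,u \in V \\
&\text{(2) } u + v = v + u,\quad (u + v) + w = u + (v + w) \\
&\text{(3) } \exists\, 0 \in V:\ u + 0 = u;\qquad \forall u\ \exists\,(-u):\ u + (-u) = 0 \\
&\text{(4) } a(u + v) = a u + a v,\quad (a + b) u = a u + b u \\
&\text{(5) } (a b) u = a (b u),\quad 1\,u = u
\end{aligned}
\]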
Vector Space
Linear Algebra:
Notes:
1. Elements of V are called vectors and elements of F are called scalars.
2. Here a vector does not mean a vector quantity (a directed line segment) as defined in elementary vector algebra.
3. We say "vector space V over the field F" and denote it by V(F).
Vector Space
Subspace (with example):
Generator:
Linear Combination:
Vector Space
Linear Span:
Note:
Linear Dependence (LD):
Linear Independence (LI):
Basis:
Dimension (with example):
Vector Space
Inner Product:
Norm / Length:
Distance:
Note:
Orthogonality:
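A quick NumPy illustration of these notions, using arbitrary example vectors (not taken from the slide):

```python
# Inner product, norm, distance and orthogonality for vectors in R^n.
import numpy as np

u = np.array([1.0, 2.0, 2.0])
v = np.array([2.0, -1.0, 0.0])

inner = u @ v                      # inner (dot) product <u, v>
norm_u = np.linalg.norm(u)         # ||u|| = sqrt(<u, u>) = 3.0 here
dist = np.linalg.norm(u - v)       # distance d(u, v) = ||u - v||
orthogonal = np.isclose(inner, 0)  # u is orthogonal to v iff <u, v> = 0 (True here)
print(inner, norm_u, dist, orthogonal)
```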
Linear Transformation
Definition (LT):
Linear Operator:
Range Space of LT:
Null Space of LT:
Note:
Rank and Nullity of LT:
Note:
Non-Singular Transform:
Singular Transform:
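One standard relation tying these labels together (a well-known fact, stated here for reference): for a linear transformation T : V → W with V finite-dimensional,
\[
\operatorname{rank}(T) + \operatorname{nullity}(T) = \dim V,\qquad
\operatorname{rank}(T) = \dim R(T),\quad \operatorname{nullity}(T) = \dim N(T),
\]
and T is non-singular (one-to-one) precisely when N(T) = {0}.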
Matrices
Definition:
Unit / Identity Matrix:
Diagonal Matrix:
Scalar Matrix:
Matrices
Upper Triangular Matrix:
Lower Triangular Matrix:
Symmetric:
Skew-Symmetric:
Matrices
Transpose:
Trace:
Row / Column Vector Representation of a Matrix:
Matrices
Row Space and Row Rank of a Matrix:
Column Space and Column Rank of a Matrix:
Rank of a Matrix:
Determinant of a Square Matrix:
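A small NumPy illustration of rank and determinant (the matrix is an arbitrary example with dependent rows):

```python
# Row rank equals column rank equals the rank of the matrix;
# the determinant is defined for square matrices.
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],    # a multiple of row 1, so the rank drops
              [1., 0., 1.]])

rank = np.linalg.matrix_rank(A)        # 2: row rank = column rank
rank_T = np.linalg.matrix_rank(A.T)    # also 2: the transpose has the same rank
det = np.linalg.det(A)                 # 0.0: the matrix is singular
print(rank, rank_T, det)
```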
Determinant
Some Properties of the Determinant:
Cofactor Expansion
Minors:
Leading Minors:
Cofactors:
Cofactor Expansion
Evaluation of the Determinant:
Cofactor Matrix:
Inverse of a Matrix:
Singular and Non-Singular Matrix:
Cofactor Expansion
Rank of a Matrix:
Invertibility of a Matrix:
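A minimal sketch of the cofactor (adjugate) route to the inverse, checked against NumPy's built-in inverse; the matrix is an arbitrary non-singular example:

```python
# Inverse via the cofactor matrix: A^{-1} = adj(A) / det(A), adj(A) = C^T,
# where C[i, j] = (-1)^(i+j) * det(A with row i and column j removed).
import numpy as np

def cofactor_matrix(A):
    n = A.shape[0]
    C = np.empty_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])

det = np.linalg.det(A)               # nonzero, so A is non-singular (invertible)
A_inv = cofactor_matrix(A).T / det   # adjugate divided by the determinant
print(np.allclose(A_inv, np.linalg.inv(A)))   # True
```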
LT using a Matrix
Example:
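As an illustrative stand-in (not necessarily the example on the original slide), consider T : R^2 → R^2 with T(x, y) = (x + y, x − y). Since T(e1) = (1, 1) and T(e2) = (1, −1), the representing matrix with respect to the standard basis has these images as its columns:
\[
[T] = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix},\qquad
T\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}
= \begin{pmatrix} x + y \\ x - y \end{pmatrix}.
\]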
Eigenvalue and Eigenvector
Eigenvalue and Eigenvector of an LT:
Eigenvalue and Eigenvector of a Matrix:
Eigenvalue and Eigenvector
Properties:
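A short NumPy illustration (the matrix is an arbitrary symmetric example, not taken from the slide; for a symmetric matrix the eigenvalues are real):

```python
# Eigenvalues and eigenvectors: A v = lambda v for some nonzero v.
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])

eigvals, eigvecs = np.linalg.eig(A)         # columns of eigvecs are eigenvectors
for lam, v in zip(eigvals, eigvecs.T):
    print(lam, np.allclose(A @ v, lam * v))    # verifies A v = lambda v

# Two standard properties: the sum of the eigenvalues equals the trace,
# and their product equals the determinant.
print(np.isclose(eigvals.sum(), np.trace(A)))
print(np.isclose(eigvals.prod(), np.linalg.det(A)))
```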
Similarity of Matrices
Definition:
Diagonalizable Matrix:
Singular Value Decomposition
A singular value and corresponding singular vectors of a rectangular matrix A are, respectively, a scalar σ and a pair of vectors u and v that satisfy
Av = σu,   A^T u = σv.
With the singular values on the diagonal of a diagonal matrix Σ and the corresponding singular vectors forming the columns of two orthogonal matrices U and V, we have
AV = UΣ,   A^T U = VΣ.
Since U and V are orthogonal, this becomes the singular value decomposition
A = UΣV^T.
Definition: A = UΣV^T, with U and V orthogonal and Σ diagonal with nonnegative entries, is called the singular value decomposition of A.
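A minimal NumPy sketch of the relations above, using an arbitrary example matrix:

```python
import numpy as np

A = np.array([[3., 1., 1.],
              [-1., 3., 1.]])          # a rectangular 2x3 matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Sigma = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))  # A = U Sigma V^T
# Defining relations for each singular triple (sigma, u, v):
for sigma, u, v in zip(s, U.T, Vt):
    print(np.allclose(A @ v, sigma * u), np.allclose(A.T @ u, sigma * v))
```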
Matrix Factorization
Cholesky Factorization: The Cholesky factorization expresses a symmetric matrix as the product of a triangular matrix and its transpose, A = R^T R, where R is an upper triangular matrix. Not all symmetric matrices can be factored in this way; those that can are said to be positive definite. The Cholesky factorization allows the linear system Ax = b to be replaced by R^T R x = b, two triangular systems that are solved easily by forward and backward substitution.
LU Factorization: LU factorization, or Gaussian elimination, expresses any square matrix A as the product of a permutation of a lower triangular matrix and an upper triangular matrix, A = LU, where L is a permutation of a lower triangular matrix with ones on its diagonal and U is an upper triangular matrix.
QR Factorization: The orthogonal, or QR, factorization expresses any rectangular matrix as the product of an orthogonal or unitary matrix and an upper triangular matrix, A = QR, where Q is orthogonal or unitary and R is upper triangular.
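A minimal sketch of the three factorizations using NumPy/SciPy (note the conventions: np.linalg.cholesky returns the lower triangular factor L with A = L L^T, the transpose of the R above, and scipy.linalg.lu returns a permutation matrix P with A = P L U):

```python
import numpy as np
from scipy.linalg import lu

A_spd = np.array([[4., 2.],
                  [2., 3.]])           # symmetric positive definite
A_sq = np.array([[2., 1., 1.],
                 [4., 3., 3.],
                 [8., 7., 9.]])        # square
A_rect = np.random.randn(4, 3)         # rectangular

L = np.linalg.cholesky(A_spd)          # A_spd = L @ L.T, L lower triangular
P, Lu, U = lu(A_sq)                    # A_sq = P @ Lu @ U
Q, R = np.linalg.qr(A_rect)            # A_rect = Q @ R, Q has orthonormal columns

print(np.allclose(A_spd, L @ L.T),
      np.allclose(A_sq, P @ Lu @ U),
      np.allclose(A_rect, Q @ R))
```

Solving Ax = b with the Cholesky factor then amounts to two triangular solves (forward and backward substitution), as described above.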
APPLICATION
• Document Ranking
Document Ranking
• Task: rank the documents in decreasing order of relevance to the query, i.e. in decreasing order of cosine similarity.
• A collection consisting of the following five documents
d1 = LSI tutorials and fast tracks.
d2 = Books on semantic analysis.
d3 = Learning latent semantic indexing.
d4 = Advances in structures and advances in indexing.
d5 = Analysis of latent structures.
is queried for latent semantic indexing (q).
• Assume that:
1. Documents are linearized, tokenized, and their stop words removed. Stemming is not used.
2. The surviving terms are used to construct a term-document matrix A, which is populated with term weights.
Document Ranking (Procedure): Term-Document Matrix
• Documents in the collection:
d1 = LSI tutorials and fast tracks.
d2 = Books on semantic analysis.
d3 = Learning latent semantic indexing.
d4 = Advances in structures and advances in indexing.
d5 = Analysis of latent structures.
Document Ranking
Step 1: Weight matrix. Construct the term-document matrix A and the query vector q from the term weights. [matrices A and q not reproduced]
Document Ranking
Step 2: Normalization. Normalize A and q to obtain An and qn. [matrices An and qn not reproduced]
Document Ranking
Step 3: Compute the cosine similarities between qn and the columns of An; the documents then rank in decreasing order of similarity. [ranking not reproduced]
Exercises
1. Repeat the above calculations, this time including all stop words. Explain any difference in the computed results.
2. Repeat the above calculations, this time scoring global weights using IDF probabilistic (IDFP). Explain any difference in the computed results.
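A minimal end-to-end sketch of Steps 1-3, assuming raw term-frequency weights and a small ad-hoc stop-word list (the slide's exact weighting scheme and stop-word list are not reproduced here, so the numbers may differ):

```python
import numpy as np

docs = [
    "LSI tutorials and fast tracks",
    "Books on semantic analysis",
    "Learning latent semantic indexing",
    "Advances in structures and advances in indexing",
    "Analysis of latent structures",
]
query = "latent semantic indexing"
stop_words = {"and", "on", "in", "of", "the", "a"}   # assumed stop-word list

def tokenize(text):
    return [t for t in text.lower().split() if t not in stop_words]

terms = sorted({t for d in docs for t in tokenize(d)})

def tf_vector(text):
    toks = tokenize(text)
    return np.array([toks.count(t) for t in terms], dtype=float)

A = np.column_stack([tf_vector(d) for d in docs])   # Step 1: term-document matrix
q = tf_vector(query)                                # Step 1: query vector

An = A / np.linalg.norm(A, axis=0)                  # Step 2: normalize columns
qn = q / np.linalg.norm(q)                          # Step 2: normalize the query

sims = qn @ An                                      # Step 3: cosine similarities
for i in np.argsort(-sims):                         # decreasing order of similarity
    print(f"d{i+1}: cosine = {sims[i]:.3f}")
```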
APPLICATION • Latent Semantic Indexing (LSI) • Using SVD
Latent Semantic Indexing
• Use LSI to cluster terms, and to find terms that could be used to expand or reformulate the query.
Example: The collection consists of the following documents:
d1 = Shipment of gold damaged in a fire.
d2 = Delivery of silver arrived in a silver truck.
d3 = Shipment of gold arrived in a truck.
Assume that the query is: gold silver truck. The analysis uses the SVD.
Latent Semantic Indexing (Procedure)
Step 1: Score term weights and construct the term-document matrix A and the query matrix q. [matrices not reproduced]
Latent Semantic Indexing (Procedure)
Step 2 (part 1): Decompose matrix A, using the SVD procedure, into the matrices U, S and V. [matrices not reproduced]
Latent Semantic Indexing (Procedure)
Step 2 (part 2): Decomposition of A into U, S and V, continued.
Step 3: Rank-2 approximation: retain the first two columns of U and V to obtain Uk and Vk (together with the two largest singular values of S).
Latent Semantic Indexing (Procedure)
Step 4: Find the new term vector coordinates in this reduced 2-dimensional space. The rows of U hold the eigenvector values; these are the coordinates of the individual term vectors, read from the reduced matrix Uk.
Step 5: Find the new query vector coordinates in the reduced 2-dimensional space, using the fold-in q_k = q^T Uk Sk^-1:
q_k = [-0.2140  -0.1821]
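A minimal NumPy sketch of Steps 1-5, assuming raw term counts as weights (the slide's weighting scheme may differ, so coordinates can differ up to sign and scale):

```python
import numpy as np

docs = [
    "shipment of gold damaged in a fire",
    "delivery of silver arrived in a silver truck",
    "shipment of gold arrived in a truck",
]
query = "gold silver truck"

terms = sorted({t for d in docs for t in d.split()})
A = np.array([[d.split().count(t) for d in docs] for t in terms], dtype=float)
q = np.array([query.split().count(t) for t in terms], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
Uk, Sk = U[:, :k], np.diag(s[:k])

term_coords = Uk                     # Step 4: one row per term, 2-D coordinates
qk = q @ Uk @ np.linalg.inv(Sk)      # Step 5: query fold-in q_k = q^T Uk Sk^-1

# Cosine similarity between the query and each term vector, as used in
# Steps 6-7 to cluster terms and pick query-expansion candidates.
cos = term_coords @ qk / (np.linalg.norm(term_coords, axis=1) * np.linalg.norm(qk))
for t, c in sorted(zip(terms, cos), key=lambda x: -x[1]):
    print(f"{t:10s} {c:+.3f}")
```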
Latent Semantic Indexing (Procedure)
Step 6: Group terms into clusters.
Grouping is done by comparing the cosine of the angle between each pair of term vectors. The following clusters are obtained:
1. a, in, of
2. gold, shipment
3. damaged, fire
4. arrived, truck
5. silver
6. delivery
Some vectors are not shown since they are completely superimposed; this is the case for clusters 1-4. If unit vectors are used and small deviations are ignored, clusters 3 and 4, and clusters 4 and 5, can be merged.
Latent Semantic Indexing (Procedure)
Step 7: Find terms that could be used to expand or reformulate the query.
The query is gold silver truck. Note that, in relation to the query, clusters 1, 2 and 3 are far away; similarity-wise they could be viewed as belonging to a "long tail". If we insist on combining these with the query, possible expanded queries could be:
gold silver truck shipment
gold silver truck damaged
gold silver truck shipment damaged
gold silver truck damaged in a fire
shipment of gold silver truck damaged in a fire
etc.
Looking around the query, the closer clusters are 4, 5, and 6. We could use these clusters to expand or reformulate the query. For example, the following are some of the expanded queries one could test:
gold silver truck arrived
delivery gold silver truck
gold silver truck delivery
gold silver truck delivery arrived
etc.
Documents containing these terms should be more relevant to the initial query.
APPLICATION • Latent Semantic Indexing (LSI) • Exercise
Latent Semantic Indexing (Exercise)
The SVD was the original factorization proposed for Latent Semantic Indexing (LSI): the process of replacing a term-document matrix A with a low-rank approximation Ap, which reveals implicit relationships among documents that do not necessarily share common terms.
Example: [term-document matrix not reproduced]
• A query on clemens will retrieve D1, D2, D3, and D4.
• A query on twain will retrieve D1, D2, and D4.
For p = 2, the SVD gives: [Ap not reproduced]
• Now a query on clemens will retrieve all documents.
• A query on twain will retrieve D1, D2, D4, and possibly D3.
• The negative entry is disturbing to some, and motivates the nonnegative factorizations.
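A minimal sketch of forming a rank-p approximation Ap with the SVD; the small matrix below is a hypothetical stand-in, not the exercise's actual term-document matrix:

```python
import numpy as np

# Hypothetical 4-term x 5-document matrix of raw term counts (illustrative only).
A = np.array([[1., 1., 0., 1., 0.],
              [1., 1., 0., 1., 0.],
              [0., 1., 1., 0., 1.],
              [1., 0., 1., 1., 0.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)
p = 2
Ap = U[:, :p] @ np.diag(s[:p]) @ Vt[:p, :]   # rank-p approximation of A

np.set_printoptions(precision=2, suppress=True)
print(Ap)
# Entries that are exactly zero in A are generally small nonzero values in Ap,
# sometimes negative: this is how a low-rank approximation can let a query term
# "retrieve" documents that never contained it, and why the negative entries
# motivate nonnegative factorizations such as NMF.
```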