Latent Semantic Kernels Alejandro Figueroa
Outline • Introduction. • From bag of words to semantic space. • Representing text (document-term matrix). • Semantic issues. • Vector space kernels. • Designing semantic kernels. • Designing the proximity matrix. • Generalised Vector Space Model. • Latent Semantic Kernels. • Application.
Introduction • Nowadays, classifying digital texts by hand is infeasible. • After multivariate data, natural language text is the most important data format for applications. • Well-known IR techniques can be reinterpreted as kernels. • This approach detects and exploits statistical patterns of words in documents.
From bag of words to semantic space • Vector Space Model (VSM). • Bag of words. • Normally, grammatical information is lost. • Definition 1: [Vector Space Model] We represent a document as a vector in a space in which each dimension is associated with one term from the dictionary.
From bag of words to semantic space • tf(t,d) is the frequency of the term t in the document d. • Definition 2: [Document-term matrix] The document-term matrix D is the matrix whose rows are indexed by the documents of the corpus and whose columns are indexed by the terms. The (i,j)-th entry gives the frequency of term tj in document di.
From bag of words to semantic space • The term-document matrix D′ is the transpose of D, the document-term matrix. • The term-by-term matrix is given by D′D, while the document-by-document matrix, which is the kernel matrix, is given by DD′.
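As a minimal sketch of these matrices (Python with NumPy is assumed here; the three-document toy corpus is purely illustrative and not from the slides), the following builds the document-term matrix D and derives both D′D and DD′:

```python
import numpy as np

# Toy corpus: each string is one document.
corpus = [
    "kernel methods for text",
    "latent semantic kernel",
    "text retrieval with kernel methods",
]

# Dictionary: one column of D per distinct term.
terms = sorted({w for doc in corpus for w in doc.split()})
index = {t: j for j, t in enumerate(terms)}

# D[i, j] = tf(t_j, d_i): frequency of term t_j in document d_i.
D = np.zeros((len(corpus), len(terms)))
for i, doc in enumerate(corpus):
    for w in doc.split():
        D[i, index[w]] += 1

term_by_term = D.T @ D   # D'D: term-by-term matrix
doc_by_doc = D @ D.T     # DD': document-by-document (kernel) matrix
print(doc_by_doc)
```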
From bag of words to semantic space • The document representation provided by the VSM ignores any semantic relation between words. • Synonymous words give two ways of saying the same thing, but are assigned to distinct components. • Homonymy, when a word has two meanings, is even more difficult to handle.
From bag of words to semantic space • The first approach to dealing with this problem is to assign a different weight wi to each coordinate. • Remove words like "and", "of", etc., that is, assign them a 0 in the matrix. These words are known as stop words, and the whole list as the stop list. • Normalise by the length of the document.
Vector space kernels • Definition 3: [Vector space kernel] κ(d1,d2) = <φ(d1), φ(d2)> = Σt tf(t,d1) tf(t,d2), the inner product of the vector representations of the two documents.
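A small illustrative rendering of Definition 3 (Python/NumPy assumed; build_index, phi and vsm_kernel are hypothetical helper names, not taken from the slides):

```python
import numpy as np
from collections import Counter

def build_index(corpus):
    """Assign one dimension of the vector space to each dictionary term."""
    terms = sorted({w for doc in corpus for w in doc.split()})
    return {t: j for j, t in enumerate(terms)}

def phi(doc, index):
    """Embed a document as its term-frequency (bag-of-words) vector."""
    v = np.zeros(len(index))
    for w, c in Counter(doc.split()).items():
        if w in index:
            v[index[w]] = c
    return v

def vsm_kernel(d1, d2, index):
    """kappa(d1, d2) = <phi(d1), phi(d2)> = sum_t tf(t, d1) * tf(t, d2)."""
    return float(phi(d1, index) @ phi(d2, index))

docs = ["latent semantic kernel", "semantic kernel for text"]
idx = build_index(docs)
print(vsm_kernel(docs[0], docs[1], idx))  # 2.0: "semantic" and "kernel" are shared
```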
Vector space kernels • Designing semantic kernels. • First, a linear transformation φ̃(d) = φ(d)S is considered, where S can be a diagonal, a square or, in general, an N×k matrix. S is built from a relevance (term-weighting) matrix and a proximity matrix.
Vector space kernels • Term weighting. • Not all words have the same importance. • The entropy of the frequency of a word across a corpus can be used to quantify the amount of information carried by that word. • Alternatively, a measure of the importance of a word with respect to the given topic, such as mutual information, can be used.
Vector space kernels • w(t) is the inverse document frequency: w(t) = ln(l / df(t)), where l is the number of documents in the corpus and df(t) is the number of documents containing the term t.
Vector space kernels • Our matrix R is the diagonal matrix with Rtt = w(t), so the weighted kernel becomes κ̃(d1,d2) = φ(d1)RR′φ(d2)′.
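A sketch of the resulting weighted kernel (Python/NumPy; idf_kernel_matrix is a hypothetical name, and w(t) = ln(l/df(t)) is assumed as above):

```python
import numpy as np

def idf_kernel_matrix(D):
    """Weight terms by inverse document frequency w(t) = ln(l / df(t)) and
    return the kernel matrix K = D R R' D', with R = diag(w(t))."""
    l = D.shape[0]                    # number of documents
    df = np.count_nonzero(D, axis=0)  # df(t): documents containing term t
    w = np.log(l / df)                # assumes every dictionary term occurs somewhere
    R = np.diag(w)
    Dw = D @ R                        # row i is the weighted embedding phi(d_i) R
    return Dw @ Dw.T

D = np.array([[2., 1., 0.],
              [0., 1., 1.],
              [1., 0., 1.]])
print(idf_kernel_matrix(D))
```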
Vector space kernels • Term proximity matrix. • The idea is to recognize when two terms are semantically related. • Term weighting is not capable of establishing a relation between two documents that share no terms.
Vector space kernels • This corresponds to representing a document by a less sparse vector, one that has non-zero entries for all terms that are semantically similar to those present in the document d. • It is similar to performing a query expansion, where the query is expanded with semantically related terms.
Vector space kernels • PP′ is viewed as the semantic strength between terms. With Q = PP′, the kernel becomes κ̃(d1,d2) = φ(d1)Qφ(d2)′. Remark: [Stemming] Different forms of a word can be treated as equivalent terms in order to perform a reduction of the space.
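An illustrative sketch of this proximity-based kernel (Python/NumPy; semantic_kernel_matrix and the toy proximity matrix P are assumptions made for the example):

```python
import numpy as np

def semantic_kernel_matrix(D, P):
    """kappa~(d1, d2) = phi(d1) P P' phi(d2)' for all pairs of documents.
    Q = P P' holds the semantic strength between terms; with P = I this
    reduces to the plain VSM kernel."""
    Q = P @ P.T
    return D @ Q @ D.T

# Toy example: terms 0 and 1 are treated as near-synonyms through P.
D = np.array([[1., 0., 1.],
              [0., 1., 1.]])
P = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(semantic_kernel_matrix(D, P))
```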
Vector space kernels • Proximity matrix. • Can be computed from external sources of knowledge, such as WordNet, for example using the inverse of the distance between terms in the tree. • Generalized Vector Space Model. • Latent Semantic Indexing.
Vector space kernels • Generalized Vector Space Model (GVSM). • Tries to overcome the problem of semantics by looking at term-term co-occurrence information. • Two terms are considered semantically related if they frequently co-occur in the same documents. This means two documents can be seen as similar even if they do not share any terms, provided the terms they contain co-occur in other documents.
Vector space kernels • GVSM kernel: κ(d1,d2) = φ(d1)D′Dφ(d2)′. Two terms co-occurring in a document are considered related, with the strength of the relationship given by the frequency and number of their co-occurrences.
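A possible rendering of the GVSM kernel (Python/NumPy; gvsm_kernel is a hypothetical name):

```python
import numpy as np

def gvsm_kernel(phi1, phi2, D):
    """GVSM kernel kappa(d1, d2) = phi(d1) D'D phi(d2)': documents are similar
    when the terms they contain co-occur in the corpus behind D."""
    return float(phi1 @ (D.T @ D) @ phi2)

D = np.array([[1., 1., 0.],     # training documents (rows) over 3 terms
              [0., 1., 1.]])
d1 = np.array([1., 0., 0.])     # new document using only term 0
d2 = np.array([0., 1., 0.])     # new document using only term 1
print(gvsm_kernel(d1, d2, D))   # 1.0: non-zero although d1 and d2 share no terms,
                                # because terms 0 and 1 co-occur in the corpus
```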
Vector space kernels • Latent Semantic Indexing. • Mathematical foundations: PCA. • Psycho-linguistic grounds. • Philosophical explanation.
Principal Component Analysis • A low-dimensional representation of the data. • Captures relations between features. • PCA tries to find a low-rank approximation, where the quality of the approximation depends on how close the data is to lying in a subspace of the given dimensionality.
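A short numerical sketch of the low-rank idea (Python/NumPy; the random data is purely illustrative, and centering is omitted for brevity):

```python
import numpy as np

# Rank-k approximation via SVD, the computation underlying PCA/LSI.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 10))                 # toy data matrix

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 3
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]   # best rank-k approximation of X

# The closer the data is to a k-dimensional subspace, the smaller this error.
print("relative error:", np.linalg.norm(X - X_k) / np.linalg.norm(X))
```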
Latent Semantic Analysis • Psycho-linguistic grounds: • Provides a way to simulate human verbal knowledge (Dumais 1997, Laham 1998). • Learns the meaning of words by noting in which contexts these words are uttered. • It acts like children, who acquire word meanings not through explicit definitions but by observing how the words are used (Kintsch 2002).
Latent Semantic Analysis • Psycho-linguistic grounds: • Builds a true semantic representation from the word-use statistics. • Words that are similar in meaning are often expressed in different contexts. • Example: "mountain" and "mountains". • People use not only words, but also images, feelings, actions, etc.
Latent Semantic Analysis • Psycho-linguistic grounds: • According to psycho-linguistics, LSA is a pale reflection, but a reflection nonetheless. • LSA also offers an explanation of how people can agree well enough to share meaning (Landauer 2002).
Latent Semantic Analysis • Philosophical explanation. • Word meanings are not to be defined, but can only be characterized by their "family resemblance". • Some quotations: • (224) "The word 'agreement' and the word 'rule' are related to one another, they are cousins. If I teach anyone the use of the one word, he learns the use of the other with it." • (225) "The use of the word 'same' and the use of the word 'rule' are interwoven. (As are the use of 'proposition' and the use of 'true'.)"
Latent Semantic Analysis • Some quotations: • (264) "Once you know what the word stands for, you understand it, you know its whole use." • (340) "One cannot guess how a word functions. One has to look at its use and learn from that." • (371) "Essence is expressed by grammar."
Latent Semantic Kernels • Semantic information is extracted by means of LSA/LSI, that is, by means of the Singular Value Decomposition (SVD) of the term-document matrix, D′ = UΣV′. LSI uses a reduction to the first k columns of U, giving the kernel κ(d1,d2) = φ(d1)UkUk′φ(d2)′.
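A sketch of how such a kernel could be computed (Python/NumPy; lsk_kernel_matrix and the toy matrix are illustrative assumptions, not code from the paper):

```python
import numpy as np

def lsk_kernel_matrix(D, k):
    """Latent semantic kernel sketch.  D is the document-term matrix (rows are
    documents).  Take the SVD of the term-document matrix D', keep the first k
    left singular vectors Uk, and compute kappa(d1, d2) = phi(d1) Uk Uk' phi(d2)'."""
    U, s, Vt = np.linalg.svd(D.T, full_matrices=False)
    Uk = U[:, :k]          # N x k: the k "concepts" as combinations of terms
    P = D @ Uk             # each document projected onto the concept space
    return P @ P.T

D = np.array([[2., 1., 0., 0.],
              [0., 1., 1., 0.],
              [0., 0., 1., 2.]])
print(lsk_kernel_matrix(D, k=2))
```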
Latent Semantic Kernels • The eigenvectors for a set of documents can be viewed as concepts, described by linear combinations of terms chosen in such a way that the documents are described as accurately as possible using only k such concepts. • Terms that co-occur frequently will tend to align along the same eigenvectors.
Latent Semantic Kernels • Multilinguality: • The semantic space proposed here provides an ideal representation for performing multilingual information retrieval.
Latent Semantic Kernels • SVD is expensive to compute. • Cristianini developed an approximation strategy based on the Gram-Schmidt decomposition.
Gram-Schmidt decomposition • Goal: a set of vectors u1,...,un is converted into a set of orthogonal vectors q1,...,qn. • Remark: orthogonal means <x,y> = 0, that is, the vectors are perpendicular. • Orthogonal vectors are linearly independent. • The q's are unit vectors.
Latent Semantic Kernels • Gram-Schmidt decomposition. • The first vector is q1 = u1. • Then, every vector ui is made orthogonal to q1,...,qi-1 by subtracting the projections of ui in the directions of q1,...,qi-1: qi = ui − Σj<i (<ui,qj> / <qj,qj>) qj.
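A minimal sketch of classical Gram-Schmidt orthogonalisation (Python/NumPy; gram_schmidt is a hypothetical name, and the input columns are assumed to be linearly independent):

```python
import numpy as np

def gram_schmidt(U):
    """Orthogonalise the columns of U: q1 = u1, and each qi is ui minus its
    projections onto the previously computed q1, ..., q(i-1), then normalised."""
    Q = np.zeros_like(U, dtype=float)
    for i in range(U.shape[1]):
        q = U[:, i].astype(float)
        for j in range(i):
            q = q - (Q[:, j] @ U[:, i]) * Q[:, j]  # subtract projection onto q_j
        Q[:, i] = q / np.linalg.norm(q)            # assumes u_i is independent
    return Q

U = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [0., 1., 1.]])
Q = gram_schmidt(U)
print(np.round(Q.T @ Q, 6))  # approximately the identity: columns are orthonormal
```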
Latent Semantic Kernels • Results: • Only in some cases were the results improved by the use of Latent Semantic or Gram-Schmidt kernels, but they never decreased the performance. • The tests were run on two text collections, Reuters and Medline.
Latent Semantic Kernels • The Reuters corpus consists of Reuters newswire stories. • The Medline corpus consists of 1033 medical documents and 30 queries obtained from the National Library of Medicine. • 90% was used to train the classifier and 10% for evaluation.
Latent Semantic Kernels • Punctuation is removed. • Words were stemmed with the Porter stemmer. • The terms were weighted with a term-frequency / inverse-document-frequency scheme. • They selected 2000 out of 3000 documents from the Reuters collection for training.
Latent Semantic Kernels • [Application pipeline diagram; its components are: a query Q in any language, Google, a set S of snippets, a tokenizer, a controller, a histogram H of substrings, a filter producing the set F of filtered substrings, a document-term matrix D, and an LSA/LSK stage yielding Uk and the kernel matrix K.]
Conclusions and Questions • What is one of the main problems of the VSM? • How does the kernel technique deal with those problems? • Which technique is used to measure the semantic relation between terms? • Why is this technique so important? • What do its components mean? • What is its major drawback? • How can we tackle this drawback?
From bag of words to semantic space • Definition: [Successive embeddings] Operations that can be performed in sequence, each adding some additional refinement to the semantics of the representation, for example term weighting and normalization.