Dimensions of Meaning
Author: Hinrich Schütze
Presenter: Marian Olteanu
Introduction
• Represent context as vectors
• Dimensions of the space – words
• Initial vectors – determined by word co-occurrence
• This paper – reduce dimensionality by singular value decomposition (SVD)
• Applications
  • Word sense disambiguation (WSD)
  • Thesaurus induction
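As a rough illustration of the dimensionality-reduction step, the sketch below applies a truncated SVD to a small co-occurrence matrix. The words, the counts, and the choice of `k = 2` are invented for the example, and NumPy is assumed to be available; this is not the paper's setup.

```python
import numpy as np

# Toy word-by-word co-occurrence counts (invented for illustration).
# Rows are the words being represented; columns are the original dimensions.
words = ["bank", "loan", "team", "goal"]
C = np.array([
    [4.0, 3.0, 0.0, 1.0],
    [3.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 4.0, 3.0],
    [1.0, 0.0, 3.0, 5.0],
])

# Thin SVD: C = U @ diag(s) @ Vt, with singular values in descending order.
U, s, Vt = np.linalg.svd(C, full_matrices=False)

k = 2  # number of dimensions kept (hypothetical choice)
reduced = U[:, :k] * s[:k]  # one k-dimensional vector per word
```

Each row of `reduced` is a word vector in the smaller space; keeping more columns gives a closer approximation of the original matrix.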
Introduction
• Classic scheme in IR
  • Documents are represented as vectors of words in term space
  • Extension – represent contexts as vectors of the words within a fixed window
  • Disadvantage – the same content can be expressed with different words that are close in meaning
• This approach
  • Represent words as term vectors that reflect their pattern of usage in a large corpus
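One way to picture term vectors that reflect usage is to count, for every word, the words that appear within a fixed window around it. The sketch below does this for a toy token list; the function name `cooccurrence_vectors`, the window size, and the corpus are assumptions made for the example.

```python
from collections import Counter, defaultdict

def cooccurrence_vectors(tokens, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the word's own position
                vectors[word][tokens[j]] += 1
    return vectors

# Toy corpus (invented for illustration).
tokens = "the bank approved the loan and the bank raised cash".split()
vectors = cooccurrence_vectors(tokens, window=2)
```

Here `vectors["bank"]` holds the counts of the neighbours of both occurrences of "bank", so a word's vector accumulates evidence over the whole corpus.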
Introduction
• Example dimensions in this space
  • Cash
  • Sport
• Similarity measure
  • Cosine of the angle between vectors
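The cosine measure can be sketched in a few lines. The two-dimensional vectors in the test values stand for hypothetical counts along the "cash" and "sport" dimensions and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical (cash, sport) counts for two words.
similarity = cosine((3, 4), (4, 3))
```

Vectors that point in the same direction score 1 regardless of their length, and vectors with no shared dimensions score 0.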
Introduction
• Compute a representation of context that is more robust than bag-of-words
  • Centroid (normalized average) of the vectors of the words in a context
• Practical applications
  • Thousands of dimensions (words)
  • Co-occurrence matrix with only 10% zeros
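A minimal sketch of the context centroid, reading "normalized average" as the component-wise mean scaled to unit length (an assumption about the exact normalization). The word vectors are invented for the example.

```python
import math

def centroid(vectors):
    """Average the vectors component-wise, then scale the result to unit length."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / n for k in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean

# Hypothetical word vectors for the words of one context.
context_word_vectors = [(2.0, 4.0), (4.0, 4.0)]
context_vector = centroid(context_word_vectors)
```

Because the result has unit length, contexts with different numbers of words can be compared directly with the cosine measure.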
Application
• WSD
  • Done by clustering the contexts
    • AutoClass
    • Buckshot
  • Assign a sense to each cluster
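To make the clustering step concrete, the sketch below groups context vectors with plain k-means. This is a stand-in chosen for brevity: it reproduces neither AutoClass nor Buckshot, and the context vectors, starting centers, and sense labels are all hypothetical.

```python
def nearest(point, centers):
    """Index of the center closest to `point` by squared Euclidean distance."""
    return min(
        range(len(centers)),
        key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centers[c])),
    )

def kmeans(points, centers, iterations=10):
    """Plain k-means from the given starting centers; returns (labels, centers)."""
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        labels = [nearest(p, centers) for p in points]
        for c in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # leave a center unchanged if no point was assigned to it
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p, centers) for p in points], centers

# Hypothetical context vectors for four occurrences of an ambiguous word.
contexts = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.2, 0.8)]
labels, centers = kmeans(contexts, centers=[contexts[0], contexts[2]])

# A sense is then attached to each cluster (labels invented for illustration).
senses = {0: "finance", 1: "river"}
assigned = [senses[label] for label in labels]
```

A new occurrence of the word would be disambiguated by computing its context vector and taking the sense of the nearest cluster center.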
Discussion
• Resembles latent semantic indexing (LSI)
  • Uses SVD
• Purpose of space reduction
  • LSI – improve the quality of the representation (because of null values)
  • This paper
    • Reduce the computation
    • Detect term dependencies (similar terms)
• SVD doesn't influence the accuracy of WSD
Discussion
• Small number of parameters (thousands) compared to other statistical approaches (e.g., trigrams)