This study explores the utility of Latent Semantic Analysis (LSA) for discriminating word senses, using context vectors and clustering paradigms. Various experiments are conducted to assess clustering quality and discrimination accuracy. The research compares performance on polysemes and homonyms, concluding on the effectiveness of LSA for sense discrimination.
Evaluation of Utility of LSA for Word Sense Discrimination Esther Levin, Mehrbod Sharifi, Jerry Ball http://www-cs.ccny.cuny.edu/~esther/research/lsa/
Outline • Latent Semantic Analysis (LSA) • Word sense discrimination through Context Group Discrimination Paradigm • Experiments • Sense-based clusters (supervised learning) • K-means clustering (unsupervised learning) • Homonyms vs. Polysemes • Conclusions
Latent Semantic Analysis (LSA) (Deerwester ’90) • Represents words and passages as vectors in the same (low-dimensional) semantic space • Similarity in word meaning is defined by similarity of their contexts.
LSA Steps • Build the document-term co-occurrence matrix, e.g., 1151 documents × 5793 terms • Compute its SVD • Reduce the dimension by keeping the k largest singular values • Compute the new vector representations for documents • [Our Research] Cluster the new context vectors
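The SVD and dimension-reduction steps above can be sketched as follows. This is a minimal illustration, assuming NumPy and a tiny made-up count matrix in place of the real 1151 × 5793 one; the function name `lsa_document_vectors` is hypothetical and not taken from the study.

```python
import numpy as np

def lsa_document_vectors(doc_term, k):
    """Project documents into a k-dimensional space via truncated SVD."""
    # doc_term has shape (n_documents, n_terms).
    U, s, Vt = np.linalg.svd(doc_term, full_matrices=False)
    # NumPy returns singular values in descending order, so the first
    # k columns of U belong to the k largest singular values.
    return U[:, :k] * s[:k]

# Toy counts, invented for illustration only.
X = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 2, 1],
], dtype=float)

docs_2d = lsa_document_vectors(X, k=2)
print(docs_2d.shape)  # (4, 2)
```

Scaling the left singular vectors by the singular values is one common choice for document coordinates; the slides do not say which variant was used.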
Context Group Discrimination Paradigm (Schütze ’98) • Inducing senses of ambiguous words from their contextual similarity • [Figure: context vectors of an ambiguous word]
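Context Group Discrimination Paradigm (Schütze ’98) • 1. Cluster the context vectors • 2. Compute the centroids (sense vectors) • 3. Classify new contexts based on distance to centroids • [Figure: a new context at distance a from one sense centroid and b from the other, among Sense 1 and Sense 2; since a < b, it is assigned to the nearer sense]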
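Steps 2 and 3 of the paradigm can be sketched in a few lines. The example assumes step 1 has already produced the clusters, uses toy 2-D vectors, and picks Euclidean distance only for illustration; all names are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def classify(context, sense_vectors, dist=euclidean):
    """Return the label of the sense vector closest to the context."""
    return min(sense_vectors, key=lambda label: dist(context, sense_vectors[label]))

# Step 1 is assumed done: contexts are already grouped (toy data).
clusters = {
    "sense1": [[0.0, 0.0], [0.0, 2.0], [2.0, 0.0], [2.0, 2.0]],
    "sense2": [[10.0, 10.0], [12.0, 10.0], [10.0, 12.0], [12.0, 12.0]],
}
# Step 2: one centroid (sense vector) per cluster.
sense_vectors = {label: centroid(vs) for label, vs in clusters.items()}
# Step 3: assign a new context to the nearest centroid.
print(classify([3.0, 2.0], sense_vectors))  # sense1
```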
Experimental Setup • Corpus: Leacock ’93 • Line (3 senses, 1151 instances) • Hard (2 senses, 752 instances) • Serve (2 senses, 1292 instances) • Interest (3 senses, 2113 instances) • Context size: full document (small paragraph) • Number of clusters = Number of senses
Research Objective • How well the different senses of ambiguous words are separated in the LSA-based vector space. • Parameters: • Dimensionality of LSA representation • Distance measure • L1: City Block • L2: Squared Euclidean • Cosine
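The three distance measures listed above can be written out directly. This sketch assumes that "Cosine" means one minus the cosine similarity, which is a common convention but is not stated on the slide.

```python
import math

def city_block(u, v):
    """L1: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def squared_euclidean(u, v):
    """L2 as used here: squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def cosine_distance(u, v):
    """Assumed form: 1 - cos(angle between u and v); needs non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

u, v = [3.0, 4.0], [6.0, 8.0]
print(city_block(u, v))         # 7.0
print(squared_euclidean(u, v))  # 25.0
print(cosine_distance(u, v))    # 0.0, since u and v point the same way
```

The example shows why the choice matters: L1 and L2 see these two vectors as far apart, while the cosine measure ignores vector length and treats them as identical.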
Sense-based Clusters • An instance of supervised learning • An upper bound on unsupervised performance of K-means or EM • Not influenced by the choice of clustering algorithm • [Figure: best-case separation vs. worst-case separation]
Sense-based Clusters: Accuracy • Training: finding sense vectors based on 90% of the data • Testing: assigning the remaining 10% of the data to the closest sense vectors and evaluating by comparing this assignment to the sense tags • Random selection, cross validation
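The training and testing procedure above might look like the sketch below. It interprets "random selection, cross validation" as repeated random 90/10 splits, which is an assumption; squared Euclidean distance, the toy data, and all function names are likewise illustrative.

```python
import random

def centroid(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def holdout_accuracy(vectors, tags, test_fraction=0.1, rng=None):
    """One random split: fit sense vectors on the training part, score the rest."""
    rng = rng or random.Random()
    idx = list(range(len(vectors)))
    rng.shuffle(idx)
    n_test = max(1, int(len(idx) * test_fraction))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    # Sense vector = centroid of the training contexts with that sense tag.
    by_sense = {}
    for i in train_idx:
        by_sense.setdefault(tags[i], []).append(vectors[i])
    sense_vectors = {s: centroid(vs) for s, vs in by_sense.items()}
    correct = 0
    for i in test_idx:
        nearest = min(sense_vectors, key=lambda s: sq_dist(vectors[i], sense_vectors[s]))
        correct += nearest == tags[i]
    return correct / n_test

def repeated_accuracy(vectors, tags, repeats=10, seed=0):
    """Average accuracy over several random 90/10 splits."""
    rng = random.Random(seed)
    return sum(holdout_accuracy(vectors, tags, rng=rng) for _ in range(repeats)) / repeats

# Toy data: two well-separated senses, 20 contexts each.
vectors = [[float(i % 5), 0.0] for i in range(20)] + [[100.0 + i % 5, 0.0] for i in range(20)]
tags = ["a"] * 20 + ["b"] * 20
print(repeated_accuracy(vectors, tags))  # 1.0
```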
Evaluating Clustering Quality: Tightness and Separation • Dispersion: intra-cluster tightness (the quantity K-means minimizes) • Silhouette: compares intra-cluster tightness with separation from the closest other cluster • a(i): average distance of point i to all other points in the same cluster • b(i): average distance of point i to the points in the closest cluster
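More on Silhouette Value • s(i) = (b(i) - a(i)) / max(a(i), b(i)) • s(i) close to 1: points are perfectly clustered • s(i) close to 0: points can belong to one cluster or another • s(i) close to -1: points belong to the wrong cluster • [Figure: point i and its closest cluster; a(i) is the average of all blue lines, b(i) is the average of all yellow lines]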
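The silhouette value can be computed directly from the definitions of a(i) and b(i), using the standard formula s(i) = (b(i) - a(i)) / max(a(i), b(i)). The sketch below assumes at least two clusters and a distance function with dist(p, p) = 0; the toy points and the choice of city-block distance are for illustration only.

```python
def silhouette_values(points, labels, dist):
    """Per-point silhouette s(i) = (b(i) - a(i)) / max(a(i), b(i))."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    values = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            values.append(0.0)  # common convention for singleton clusters
            continue
        # a(i): mean distance to the other members of the same cluster
        # (dist(p, p) contributes 0 to the sum).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b(i): smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        values.append(0.0 if denom == 0 else (b - a) / denom)
    return values

def city_block(u, v):
    return sum(abs(x - y) for x, y in zip(u, v))

points = [[0, 0], [0, 1], [10, 0], [10, 1]]
good = silhouette_values(points, [0, 0, 1, 1], city_block)
bad = silhouette_values(points, [0, 1, 0, 1], city_block)
print(sum(good) / len(good))  # close to 1: tight, well-separated clusters
print(sum(bad) / len(bad))    # negative: points sit nearer the other cluster
```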
Evaluating Clustering Quality: Tightness and Separation • Average silhouette value, first plot: Cosine 0.9639, L1 0.7355, L2 0.9271 • Average silhouette value, second plot: Cosine -0.0876, L1 -0.0504, L2 -0.0879
Sense-based Clusters: Discrimination Accuracy • Baseline: percentage of the majority sense
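The majority-sense baseline is the accuracy of always predicting the most frequent sense. A minimal sketch follows; the tag counts are hypothetical and not the per-sense counts of the corpus.

```python
from collections import Counter

def majority_baseline(tags):
    """Accuracy obtained by always predicting the most frequent sense."""
    counts = Counter(tags)
    return counts.most_common(1)[0][1] / len(tags)

# Hypothetical tag distribution, invented for illustration.
tags = ["s1"] * 6 + ["s2"] * 3 + ["s3"] * 1
print(majority_baseline(tags))  # 0.6
```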
Sense-based Clusters: Results • Good discrimination accuracy • Low silhouette value • How is that possible?
Unsupervised Learning with K-means • Cosine measure • [Figure legend: Start with sense vector; Start randomly; Most compact result; Sense-based clustering: Training/Testing]
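The two ways of starting K-means, and the selection of the most compact result, can be sketched as below. This is an illustrative K-means on unit-length vectors with the cosine measure, not the implementation used in the study; the ten random restarts, the fixed iteration count, and the toy data are assumptions, and all input vectors are assumed non-zero.

```python
import math
import random

def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cos_dist(u, v):
    # u and v are unit vectors, so their dot product is the cosine.
    return 1.0 - sum(a * b for a, b in zip(u, v))

def kmeans_cosine(points, init_centroids, iters=20):
    """K-means with the cosine measure; returns (labels, dispersion)."""
    pts = [normalize(p) for p in points]
    cents = [normalize(c) for c in init_centroids]

    def assign():
        return [min(range(len(cents)), key=lambda k: cos_dist(p, cents[k])) for p in pts]

    for _ in range(iters):
        labels = assign()
        for k in range(len(cents)):
            members = [p for p, lab in zip(pts, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                cents[k] = normalize([sum(col) / len(members) for col in zip(*members)])
    labels = assign()
    # Dispersion: total distance of the points to their own centroid.
    dispersion = sum(cos_dist(p, cents[lab]) for p, lab in zip(pts, labels))
    return labels, dispersion

def most_compact(points, k, restarts=10, seed=0):
    """Random starts; keep the run with the lowest dispersion."""
    rng = random.Random(seed)
    runs = [kmeans_cosine(points, rng.sample(points, k)) for _ in range(restarts)]
    return min(runs, key=lambda run: run[1])

points = [[1.0, 0.1], [1.0, 0.2], [0.9, 0.0], [0.1, 1.0], [0.2, 1.0], [0.0, 0.9]]
# Start with sense vectors (here: hand-picked directions standing in for them).
seeded_labels, seeded_disp = kmeans_cosine(points, [[1.0, 0.0], [0.0, 1.0]])
# Start randomly and keep the most compact result.
random_labels, random_disp = most_compact(points, k=2)
print(seeded_labels, random_labels)
```

Cluster numbers from a random start are arbitrary, so two results should be compared as partitions rather than label by label.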
Polysemes vs. Homonyms • Polysemes: words with multiple related meanings • Homonyms: words with the same spelling but completely different meanings
Pseudo Words as Homonyms (Schütze ’98) • Original contexts: • … find it hard to believe … • … exactly how to say a line and … • … about 30 minutes and serve warm … • … set the interest rate on the … • The same contexts with the pseudo word x: • … find it x to believe … • … exactly how to say a x and … • … about 30 minutes and x warm … • … set the x rate on the …
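Building a pseudo word amounts to replacing each target word with one shared token while keeping the original word as the sense tag. The sketch below is an assumed, simplified procedure (first whole-word match per context only); the function name and tagging format are hypothetical.

```python
import re

def make_pseudo_word(contexts, targets, pseudo="x"):
    """Replace the first target word in each context with a shared pseudo word.

    Returns (rewritten_context, original_word) pairs; the original word
    serves as the sense tag of the artificial homonym.
    """
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, targets)) + r")\b")
    tagged = []
    for text in contexts:
        match = pattern.search(text)
        if match:
            tagged.append((pattern.sub(pseudo, text, count=1), match.group(1)))
    return tagged

contexts = [
    "find it hard to believe",
    "exactly how to say a line and",
    "about 30 minutes and serve warm",
    "set the interest rate on the",
]
result = make_pseudo_word(contexts, ["hard", "line", "serve", "interest"])
for text, sense in result:
    print(sense, "->", text)
```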
Polysemes vs. Homonyms: In LSA Space • The correlation between compactness of clusters and discrimination accuracy is higher for homonyms than for polysemes • [Figure labels: Dimensions (Pseudo Words); points on red lines are the most compact cluster out of 10 experiments]
Conclusions • Good unsupervised sense discrimination performance for homonyms • Major deterioration in sense discrimination of polysemes in absence of supervision • Dimensionality reduction benefit is computational only (no peak in performance) • Cosine measure performs better than L1 and L2