Polarity Inducing Latent Semantic Analysis A vector space model that can distinguish Antonyms from Synonyms! Scott Wen-tau Yih Joint work with Geoffrey Zweig & John Platt Microsoft Research
Vector Space Model • Text objects (e.g., words, phrases, sentences or documents) are represented as vectors • High-dimensional sparse term-vectors • Concept vectors from topic models or projection methods • Constructed compositionally from word vectors [Socher et al. 12] • Relations of the text objects are estimated by functions in the vector space • Relatedness is measured by some distance function (e.g., the cosine of the angle between two vectors $v_q$ and $v_d$)
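A minimal sketch of the cosine relatedness function named on the slide, assuming sparse term vectors stored as Python dicts that map each term to its weight. The vectors `v_q` and `v_d` and their weights are made-up illustrations, not data from the talk.

```python
import math

def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse term vectors stored as {term: weight}."""
    # The dot product only involves terms that both vectors contain.
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention: an empty vector is unrelated to everything
    return dot / (norm_u * norm_v)

# Hypothetical query and document term vectors (raw term counts).
v_q = {"vector": 1.0, "space": 1.0, "model": 1.0}
v_d = {"vector": 2.0, "space": 1.0, "retrieval": 1.0}
print(round(cosine(v_q, v_d), 3))  # 0.707
```

Storing only non-zero weights keeps the dot product proportional to the number of shared terms, which matters for the high-dimensional sparse term-vectors mentioned above.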
Applications of Vector Space Models • Document Level • Information Retrieval [Salton & McGill 83] • Document Clustering [Deerwester et al. 90] • Search Relevance Measurement [Baeza-Yates & Ribeiro-Neto 99] • Cross-lingual document retrieval [Platt et al. 10; Yih et al. 11] • Word Level • Language modeling [Bellegarda 00] • Word similarity and relatedness [Deerwester et al. 90; Lin 98; Turney 01; Turney & Littman 05; Agirre et al. 09; Reisinger & Mooney 10; Yih & Qazvinian 12]
Beyond General Similarity • Existing VSMs cannot distinguish finer relations • The “antonym” issue of distributional similarity • The co-occurrence or distributional hypotheses • Apply to near-synonyms, hypernyms and other semantically related words, including antonyms [Mohammad et al. 08] • e.g., “hot” and “cold” occur in similar contexts • LSA does not solve the issue • It might assign a high degree of similarity to opposites as well as to synonyms [Landauer & Laham 98]
Approaches for Detecting Antonyms • Separate antonyms from distributionally similar word pairs [Lin et al. 03] • Patterns: “from X to Y”, “either X or Y” • WordNet graph [Harabagiu et al. 06] • Synsets connected by is-a links and exactly one antonymy link • WordNet + affix rules + heuristics [Mohammad et al. 08] Distinguishing synonyms and antonyms is still perceived as a difficult open problem… [Poon & Domingos 09]
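A rough sketch of the pattern idea attributed to [Lin et al. 03] above: word pairs that fill “from X to Y” or “either X or Y” are collected as antonym candidates. The regular expressions, the single-word restriction on X and Y, and the example sentence are assumptions for illustration; this is not the original system.

```python
import re

# Hypothetical regexes for the two patterns; X and Y are single word tokens here.
PATTERNS = [
    re.compile(r"\bfrom (\w+) to (\w+)\b", re.IGNORECASE),
    re.compile(r"\beither (\w+) or (\w+)\b", re.IGNORECASE),
]

def candidate_antonym_pairs(text: str) -> set[tuple[str, str]]:
    """Collect lowercased (X, Y) pairs matched by any pattern."""
    pairs = set()
    for pattern in PATTERNS:
        for x, y in pattern.findall(text):
            pairs.add((x.lower(), y.lower()))
    return pairs

text = "Prices swung from high to low, and the mood was either hot or cold."
print(sorted(candidate_antonym_pairs(text)))  # [('high', 'low'), ('hot', 'cold')]
```

In practice such pattern matches would serve as evidence to filter a list of distributionally similar pairs rather than as a standalone detector, since the patterns also fire on non-antonyms (e.g., “from Paris to Rome”).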
Our Contributions • Polarity Inducing Latent Semantic Analysis (PILSA) • A vector space model that encodes polarity information • Synonyms cluster together in this space • Antonyms lie at the opposite ends of a unit sphere (figure: unit sphere labeled burning, hot, freezing, cold) • Significantly improved the prediction accuracy on a benchmark GRE dataset
Roadmap • Introduction • Polarity Inducing Latent Semantic Analysis • Basic construction • Extension 1: Improving accuracy • Extension 2: Improving coverage • Experimental evaluation • Task & datasets • Results • Conclusion
The Core Method Input: A thesaurus (with synonyms & antonyms) • Create a “document”-term matrix • Each group of words (synonyms and antonyms) is treated as a “document” • Induce polarity by making antonyms have negative weights • Apply SVD as in regular Latent Semantic Analysis
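Matrix Construction • Acrimony: rancor, conflict, bitterness; goodwill, affection • Affection: goodwill, tenderness, fondness; acrimony, rancor • (synonyms before the semicolon, antonyms after it) • Each group is a “document” (row vector); each term is a column vector; each cell holds a TF-IDF score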
Matrix Construction (continued) • Inducing polarity: in each row, the TF-IDF scores of the antonyms are made negative (goodwill and affection in the Acrimony row; acrimony and rancor in the Affection row) • Word similarity: cosine score of the two words' column vectors
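The construction above can be sketched with NumPy on the two-entry example. Two simplifications are assumptions, not from the slides: every cell gets weight +1 or -1 instead of a TF-IDF score, and each head word is included in its own group with a positive weight. `build_polarity_matrix` is a hypothetical helper name.

```python
import numpy as np

# Toy thesaurus from the slide: head word -> (synonyms, antonyms).
THESAURUS = {
    "acrimony": (["rancor", "conflict", "bitterness"], ["goodwill", "affection"]),
    "affection": (["goodwill", "tenderness", "fondness"], ["acrimony", "rancor"]),
}

def build_polarity_matrix(thesaurus):
    """Return (W, vocab): one row per thesaurus group, one column per word.

    Assumption: unit weights stand in for TF-IDF scores, and the head word
    belongs to its own group.
    """
    vocab = sorted({w for head, (syns, ants) in thesaurus.items()
                    for w in [head, *syns, *ants]})
    index = {w: j for j, w in enumerate(vocab)}
    W = np.zeros((len(thesaurus), len(vocab)))
    for i, (head, (syns, ants)) in enumerate(thesaurus.items()):
        for w in [head, *syns]:
            W[i, index[w]] = 1.0   # same polarity as the head word
        for w in ants:
            W[i, index[w]] = -1.0  # induced polarity: antonyms get negative weights
    return W, vocab

W, vocab = build_polarity_matrix(THESAURUS)
print(vocab)
print(W)
```

Dropping the minus sign in the antonym loop gives the ordinary LSA-style matrix, which makes it easy to compare both variants on the same data.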
Effect of Inducing Polarity • Without polarity: synonyms have cosine similarity = 1, but antonyms also have cosine similarity = 1. Cannot distinguish antonyms from synonyms! • With polarity: synonyms still have cosine similarity = 1, while antonyms have cosine similarity = -1
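A worked version of the toy example, assuming unit weights instead of TF-IDF scores and that each head word belongs to its own group. The two rows are the Acrimony and Affection groups, so each word is a 2-dimensional column vector:

```latex
\begin{aligned}
&\text{Without polarity:} && \mathbf{w}_{\text{acrimony}} = \mathbf{w}_{\text{rancor}} = \mathbf{w}_{\text{affection}} = (1,\ 1)^\top \\
& && \cos(\mathbf{w}_{\text{acrimony}}, \mathbf{w}_{\text{affection}}) = \tfrac{1 \cdot 1 + 1 \cdot 1}{\sqrt{2}\,\sqrt{2}} = 1 \\
&\text{With polarity:} && \mathbf{w}_{\text{acrimony}} = \mathbf{w}_{\text{rancor}} = (1,\ -1)^\top, \quad \mathbf{w}_{\text{affection}} = (-1,\ 1)^\top \\
& && \cos(\mathbf{w}_{\text{acrimony}}, \mathbf{w}_{\text{rancor}}) = \tfrac{1 + 1}{2} = 1, \quad
     \cos(\mathbf{w}_{\text{acrimony}}, \mathbf{w}_{\text{affection}}) = \tfrac{-1 - 1}{2} = -1
\end{aligned}
```

The sign flip leaves the synonym pair untouched while turning the antonym pair into exact opposites.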
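Mapping to Latent Space via SVD • $W \approx U_k \Sigma_k V_k^T$, where the rows of $W$ are thesaurus groups and the columns are words • Word similarity: cosine of two columns in $\Sigma_k V_k^T$ • SVD generalizes and smooths the original data • Uncovers relationships not explicit in the thesaurus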
Mapping to Latent Space via SVD (continued) • As $\Sigma_k V_k^T = U_k^T W$, $U_k^T$ can be viewed as the projection matrix that maps a raw column vector of $W$ to the $k$-dimensional latent space
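A minimal NumPy sketch of the SVD step, assuming a dense matrix `W` (groups by words) and a chosen rank `k`; `pilsa_vectors` and `project` are hypothetical helper names. It checks the identity $\Sigma_k V_k^T = U_k^T W$ numerically on a random signed matrix.

```python
import numpy as np

def pilsa_vectors(W: np.ndarray, k: int):
    """Truncated SVD of the signed group-by-word matrix W.

    Returns (U_k, word_vecs), where word_vecs = Sigma_k V_k^T holds one
    k-dimensional column per word.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
    word_vecs = s_k[:, None] * Vt_k  # scales row i of V_k^T by sigma_i
    return U_k, word_vecs

def project(U_k: np.ndarray, raw_column: np.ndarray) -> np.ndarray:
    """Map a raw column vector of W into the latent space: U_k^T w."""
    return U_k.T @ raw_column

# Hypothetical random signed matrix: 4 groups x 6 words.
rng = np.random.default_rng(0)
W = rng.choice([-1.0, 0.0, 1.0], size=(4, 6))
U_k, word_vecs = pilsa_vectors(W, k=2)
# Projecting a raw column reproduces the matching column of Sigma_k V_k^T.
print(np.allclose(project(U_k, W[:, 0]), word_vecs[:, 0]))
```

The identity holds because the columns of $U$ are orthonormal, so $U_k^T U$ selects the first $k$ rows of $\Sigma V^T$. The same `project` call also works for a column that was not part of the SVD, which is what makes $U_k^T$ usable as a projection matrix.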
Extension 1: Improve Accuracy • Refine the projection matrix by discriminative training • S2Net [Yih et al. 11]: very similar to RankNet [Burges et al. 05] but focuses on learning concept vectors
Applying S2Net • Training data: antonym pairs from the thesaurus • Initialize the model with the PILSA projection matrix • Learning objective: the cosine score of an antonym pair should be lower than that of other word pairs
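The slides do not give the S2Net loss, so the following is only a sketch of a RankNet-style pairwise logistic loss in the spirit described: it is small when an antonym pair's cosine is lower than another pair's and grows when the order is reversed. The scaling factor `gamma`, the helper names, and the toy vectors are assumptions; the gradient-based update that would refine `A` is omitted.

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pairwise_loss(A, antonym_pair, other_pair, gamma=10.0):
    """Logistic loss on one (antonym pair, other pair) comparison (hypothetical)."""
    # Cosine scores in the latent space defined by projection matrix A.
    sim_ant = cosine(A @ antonym_pair[0], A @ antonym_pair[1])
    sim_other = cosine(A @ other_pair[0], A @ other_pair[1])
    # delta > 0 when the antonym pair is already scored below the other pair.
    delta = sim_other - sim_ant
    return float(np.log1p(np.exp(-gamma * delta)))  # log(1 + exp(-gamma * delta))

A = np.eye(2)  # stand-in for the PILSA projection matrix U_k^T
ant = (np.array([1.0, -1.0]), np.array([-1.0, 1.0]))
other = (np.array([1.0, 0.0]), np.array([1.0, 1.0]))
good = pairwise_loss(A, ant, other)  # correct order: near-zero loss
bad = pairwise_loss(A, other, ant)   # reversed order: large loss
print(good < bad)
```

Starting `A` from the PILSA projection means training only has to fix the orderings that the thesaurus-based SVD gets wrong.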
Extension 2: Improve Coverage • What to do with out-of-thesaurus words? • Some lexical variations • Encarta thesaurus contains “corruptible” and “corruption”, but not “corruptibility” • Morphological analysis and stemming to find alternatives of an out-of-thesaurus target word • Rare or offensive words • e.g., “froward” and “moronic” • Embedding out-of-thesaurus words by leveraging a general corpus
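As a crude stand-in for the morphological analysis and stemming step, the sketch below tries a few suffix rewrites and keeps only results found in the thesaurus vocabulary. The rule list and `find_alternatives` are hypothetical, not the rules used in this work.

```python
# Hypothetical suffix-rewrite rules (suffix -> replacement).
SUFFIX_RULES = [
    ("ibility", "ible"),
    ("ability", "able"),
    ("ness", ""),
    ("ity", ""),
    ("ly", ""),
    ("s", ""),
]

def find_alternatives(word, vocabulary):
    """Return in-thesaurus variants of a (possibly out-of-thesaurus) word."""
    if word in vocabulary:
        return [word]
    candidates = []
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)] + replacement
            if stem in vocabulary and stem not in candidates:
                candidates.append(stem)
    return candidates

vocab = {"corruptible", "corruption"}
print(find_alternatives("corruptibility", vocab))  # ['corruptible']
```

Words like “froward” yield no candidates under such rules, which is why the corpus-based embedding is still needed for rare words.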
Embedding Out-of-thesaurus Words • Create a context vector space model using a collection of documents (e.g., Wikipedia) • Context: words within a window of [-10,10] • Embed the target word into the PILSA space by k-NN • Find nearby in-thesaurus words in the context space • Remove words with inconsistent polarity • Use the centroid of the corresponding PILSA vectors to represent the target word
(Figure: a target word is mapped from the Context Vector Space to the PILSA Space; example words: hot, sweltering, burning, cold)
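A sketch of the k-NN embedding step, assuming precomputed context-space vectors and PILSA vectors for in-thesaurus words. How “inconsistent polarity” is decided is not specified on the slides; here it is assumed to mean a neighbor whose PILSA vector has negative cosine with the nearest neighbor's PILSA vector. `embed_oov`, `k=3`, and all vectors are hypothetical.

```python
import numpy as np

def cos(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def embed_oov(target_ctx, ctx_vecs, pilsa_vecs, k=3):
    """Place an out-of-thesaurus word in the PILSA space.

    target_ctx: context-space vector of the target word.
    ctx_vecs:   {word: context-space vector} for in-thesaurus words.
    pilsa_vecs: {word: PILSA vector} for the same words.
    """
    # k nearest in-thesaurus words in the context space.
    neighbors = sorted(ctx_vecs, key=lambda w: cos(target_ctx, ctx_vecs[w]),
                       reverse=True)[:k]
    anchor = pilsa_vecs[neighbors[0]]
    # Assumed polarity check: drop neighbors pointing away from the nearest one.
    kept = [w for w in neighbors if cos(pilsa_vecs[w], anchor) > 0]
    return np.mean([pilsa_vecs[w] for w in kept], axis=0), kept

# Toy vectors: "cold" shares contexts with "hot" but is opposite in PILSA space.
ctx = {"hot": np.array([1.0, 0.9]), "burning": np.array([0.8, 1.0]),
       "cold": np.array([1.0, 0.5])}
pilsa = {"hot": np.array([1.0, 0.2]), "burning": np.array([0.9, 0.3]),
         "cold": np.array([-1.0, -0.1])}
vec, kept = embed_oov(np.array([1.0, 1.0]), ctx, pilsa)
print(kept)  # ['hot', 'burning']
```

The polarity filter matters because antonyms such as “cold” often appear among the context-space neighbors; averaging them in would pull the centroid toward the origin.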
Data for Building PILSA Models • Encarta Thesaurus (for basic PILSA) • 47k word categories (i.e., the “documents”) • Vocabulary of 50k words • 125,724 pairs of antonyms • Wikipedia (for embedding out-of-thesaurus words) • Sentences from a Nov-2010 snapshot • 917M words after preprocessing
Experimental Evaluation • Task: GRE closest-opposite questions • Which is the closest opposite of adulterate? (a) renounce (b) forbid (c) purify (d) criticize (e) correct • Dev / Test: 162 / 950 questions [Mohammad et al. 08] • The dev set is used for tuning the dimensionality of PILSA • Evaluation metric • Accuracy: #correct / #total questions • Questions with unresolved out-of-thesaurus target words are treated as answered incorrectly
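A sketch of answering a closest-opposite question with word vectors: choose the option whose cosine with the target is lowest, and score unresolved targets as incorrect, as the evaluation rule above states. Skipping choices that have no vector is an assumption, and the vectors are made-up toy values, not PILSA output.

```python
import numpy as np

def answer(target, choices, vecs):
    """Pick the choice with the lowest cosine to the target, or None if unresolved."""
    if target not in vecs:
        return None  # unresolved out-of-thesaurus target word
    t = vecs[target]

    def cos(w):
        v = vecs[w]
        return float(t @ v / (np.linalg.norm(t) * np.linalg.norm(v)))

    scored = [c for c in choices if c in vecs]  # assumption: skip choices without vectors
    return min(scored, key=cos) if scored else None

def accuracy(questions, vecs):
    """#correct / #total; unanswered questions count as incorrect."""
    correct = sum(answer(q["target"], q["choices"], vecs) == q["gold"] for q in questions)
    return correct / len(questions)

# Hypothetical 2-d vectors for the slide's example question.
vecs = {w: np.array(v) for w, v in {
    "adulterate": [1.0, 0.0], "renounce": [0.2, 1.0], "forbid": [0.5, 0.5],
    "purify": [-1.0, 0.1], "criticize": [0.3, -1.0], "correct": [-0.2, 1.0]}.items()}
choices = ["renounce", "forbid", "purify", "criticize", "correct"]
questions = [
    {"target": "adulterate", "choices": choices, "gold": "purify"},
    {"target": "froward", "choices": ["obedient", "stubborn"], "gold": "obedient"},
]
print(accuracy(questions, vecs))  # 0.5
```

The second question has no vector for its target, so it counts against accuracy rather than being dropped from the denominator.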
Examples • Target word: admirable • No polarity – LSA • Most Similar: commendable, creditable, despicable • Least Similar: uninviting, dessert, seductive • With polarity – PILSA • Most Similar: commendable, creditable, laudable • Least Similar: despicable, shameful, unworthy • Full results on the GRE test set are available online
Conclusion • Polarity Inducing LSA • Solves the open problem of antonyms/synonyms by making a vector space that can distinguish opposites • Vector space designed so that synonyms/antonyms tend to have positive/negative cosine similarity • Future Work • New methods or representations for other word relations • e.g., Part-Whole, Is-A, Attribute • Applications • e.g., Textual Entailment or Sentence Completion