Polarity Inducing Latent Semantic Analysis A vector space model that can distinguish Antonyms from Synonyms! Scott Wen-tau Yih Joint work with Geoffrey Zweig & John Platt Microsoft Research
Vector Space Model • Text objects (e.g., words, phrases, sentences, or documents) are represented as vectors • High-dimensional sparse term-vectors • Concept vectors from topic models or projection methods • Constructed compositionally from word vectors [Socher et al. 12] • Relations between text objects are estimated by functions in the vector space • Relatedness is measured by a distance function, e.g., the cosine similarity cos(v_q, v_d) between a query vector v_q and a document vector v_d
Applications of Vector Space Models • Document Level • Information Retrieval [Salton & McGill 83] • Document Clustering [Deerwester et al. 90] • Search Relevance Measurement [Baeza-Yates & Ribeiro-Neto 99] • Cross-lingual document retrieval [Platt et al. 10; Yih et al. 11] • Word Level • Language modeling [Bellegarda 00] • Word similarity and relatedness [Deerwester et al. 90; Lin 98; Turney 01; Turney & Littman 05; Agirre et al. 09; Reisinger & Mooney 10; Yih & Qazvinian 12]
Beyond General Similarity • Existing VSMs cannot distinguish finer relations • The "antonym" issue of distributional similarity • The co-occurrence and distributional hypotheses apply to near-synonyms, hypernyms and other semantically related words, including antonyms [Mohammad et al. 08] • e.g., "hot" and "cold" occur in similar contexts • LSA does not solve the issue • Might assign a high degree of similarity to opposites as well as synonyms [Landauer & Laham 98]
Approaches for Detecting Antonyms • Separate antonyms from distributionally similar word pairs [Lin et al. 03] • Patterns: "from X to Y", "either X or Y" • WordNet graph [Harabagiu et al. 06] • Synsets connected by is-a links and exactly one antonymy link • WordNet + affix rules + heuristics [Mohammad et al. 08] Distinguishing synonyms and antonyms is still perceived as a difficult open problem… [Poon & Domingos 09]
Our Contributions • Polarity Inducing Latent Semantic Analysis (PILSA) • A vector space model that encodes polarity information • Synonyms cluster together in this space (e.g., burning/hot, freezing/cold) • Antonyms lie at the opposite ends of a unit sphere • Significantly improved the prediction accuracy on a benchmark GRE dataset
Roadmap • Introduction • Polarity Inducing Latent Semantic Analysis • Basic construction • Extension 1: Improving accuracy • Extension 2: Improving coverage • Experimental evaluation • Task & datasets • Results • Conclusion
The Core Method Input: A thesaurus (with synonyms & antonyms) • Create a “document”-term matrix • Each group of words (synonyms and antonyms) is treated as a “document” • Induce polarity by making antonyms have negative weights • Apply SVD as in regular Latent Semantic Analysis
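The three steps above can be sketched end-to-end on a toy thesaurus. This is a minimal NumPy sketch, not the paper's implementation: the two entries are the illustrative examples from the next slide, and uniform 1/-1 weights stand in for the TFIDF scores.

```python
import numpy as np

# Toy thesaurus: each headword maps to (synonyms, antonyms).
# (Illustrative data; the actual system uses the Encarta thesaurus.)
thesaurus = {
    "acrimony": (["rancor", "conflict", "bitterness"], ["goodwill", "affection"]),
    "affection": (["goodwill", "tenderness", "fondness"], ["acrimony", "rancor"]),
}

vocab = sorted({w for syns, ants in thesaurus.values() for w in syns + ants}
               | set(thesaurus))
idx = {w: i for i, w in enumerate(vocab)}

# "Document"-term matrix: one row per thesaurus entry.  Synonyms get a
# positive weight; polarity is induced by negating the antonym weights.
W = np.zeros((len(thesaurus), len(vocab)))
for r, (head, (syns, ants)) in enumerate(thesaurus.items()):
    for w in [head] + syns:
        W[r, idx[w]] = 1.0
    for w in ants:
        W[r, idx[w]] = -1.0

# Regular LSA step: truncated SVD; word vectors are columns of S @ Vt.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
word_vecs = (np.diag(s[:k]) @ Vt[:k]).T   # one k-dim vector per word

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cos(word_vecs[idx["rancor"]], word_vecs[idx["bitterness"]]))  # positive
print(cos(word_vecs[idx["rancor"]], word_vecs[idx["goodwill"]]))    # negative
```

Synonymous words end up with positive cosine, antonymous words with negative cosine, which is exactly the property the following slides illustrate.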
Matrix Construction • Acrimony: rancor, conflict, bitterness; goodwill, affection • Affection: goodwill, tenderness, fondness; acrimony, rancor • Each thesaurus entry is a "document" (row-vector); each word is a term (column-vector) weighted by its TFIDF score • Inducing polarity: synonyms keep the positive TFIDF weight; antonyms receive the negated weight • Illustrative polarity-induced matrix (signs only, standing in for the signed TFIDF scores):

    Term:              rancor   bitterness   goodwill   tenderness
    Acrimony group:      +          +           -           0
    Affection group:     -          0           +           +
Effect of Inducing Polarity • Without polarity, the row-vectors of the acrimony and affection groups share the same words and have cosine similarity = 1 • Cannot distinguish antonyms from synonyms! • With polarity induced (negated weights on antonyms), the same two row-vectors have cosine similarity = -1
Mapping to Latent Space via SVD • Apply SVD to the polarity-induced matrix: W ≈ U Σ V^T (rank-k approximation) • Word similarity: cosine of the two corresponding columns in Σ V^T • SVD generalizes and smooths the original data • Uncovers relationships not explicit in the thesaurus
Mapping to Latent Space via SVD • As W ≈ U Σ V^T, the matrix U^T can be viewed as the projection that maps a raw column-vector of W to the k-dimensional latent space
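The projection identity can be checked numerically. A minimal sketch, with a small random matrix standing in for the polarity-induced thesaurus matrix W:

```python
import numpy as np

# A small random "document"-term matrix standing in for the
# polarity-induced thesaurus matrix W.
rng = np.random.default_rng(0)
W = rng.standard_normal((5, 8))

U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 3

# Latent word vectors: columns of S_k V_k^T.
latent = np.diag(s[:k]) @ Vt[:k]          # shape (k, n_words)

# Because W = U S V^T and U has orthonormal columns, projecting a raw
# column-vector with U_k^T recovers the same latent vector.
j = 4
projected = U[:, :k].T @ W[:, j]
print(np.allclose(projected, latent[:, j]))   # True
```

This is why, once the SVD is computed, any raw thesaurus column-vector can be mapped into the latent space with a single matrix multiply.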
Extension 1: Improve Accuracy • Refine the projection matrix by discriminative training • S2Net [Yih et al. 11]: very similar to RankNet [Burges et al. 05] but focuses on learning concept vectors
Applying S2Net • Training data: antonym pairs from the thesaurus • Initialize the model with the PILSA projection matrix • Learning objective: the cosine score of antonyms should be lower than that of other word pairs
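A minimal sketch of this learning objective, assuming a RankNet-style logistic loss over cosine differences. All data here is synthetic, and a numerical gradient is used for one illustrative update; the real S2Net trains the projection matrix with analytic gradients.

```python
import numpy as np

rng = np.random.default_rng(1)

# Raw word vectors and a projection matrix A, standing in for the
# high-dimensional thesaurus vectors and the PILSA U_k^T initialization.
d_raw, d_lat, n_words = 20, 5, 30
X = rng.standard_normal((n_words, d_raw))
A = rng.standard_normal((d_lat, d_raw)) * 0.1

antonym_pairs = [(0, 1), (2, 3)]   # pairs whose cosine should be LOW
other_pairs = [(0, 4), (2, 5)]     # pairs whose cosine should be higher

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def loss(A):
    # RankNet-style logistic loss: penalize antonym pairs whose cosine is
    # not below the cosine of the paired "other" word pair.
    Z = X @ A.T
    total = 0.0
    for (a1, a2), (o1, o2) in zip(antonym_pairs, other_pairs):
        delta = cos(Z[a1], Z[a2]) - cos(Z[o1], Z[o2])
        total += np.log1p(np.exp(delta))
    return total

# One numerical-gradient descent step on the projection matrix.
eps, lr = 1e-5, 0.01
base = loss(A)
grad = np.zeros_like(A)
for i in range(A.shape[0]):
    for j in range(A.shape[1]):
        A[i, j] += eps
        grad[i, j] = (loss(A) - base) / eps
        A[i, j] -= eps
A_new = A - lr * grad
print(loss(A_new) < base)   # the update reduces the ranking loss
```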
Extension 2: Improve Coverage • What to do with out-of-thesaurus words? • Some lexical variations • Encarta thesaurus contains “corruptible” and “corruption”, but not “corruptibility” • Morphological analysis and stemming to find alternatives of an out-of-thesaurus target word • Rare or offensive words • e.g., “froward” and “moronic” • Embedding out-of-thesaurus words by leveraging a general corpus
Embedding Out-of-thesaurus Words • Create a context vector space model using a collection of documents (e.g., Wikipedia) • Context: words within a window of [-10,10] • Embed the target word into the PILSA space by k-NN • Find nearby in-thesaurus words in the context space • Remove words with inconsistent polarity • Use the centroid of the corresponding PILSA vectors to represent the target word
Embedding Out-of-thesaurus Words (Figure: an out-of-thesaurus word such as "sweltering" lies near "hot" and "burning" in the context vector space, and is embedded near them, away from "cold", in the PILSA space)
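The k-NN embedding procedure can be sketched as follows. This is a simplified illustration on synthetic vectors: the word lists, vector dimensions, and the sign-based polarity-consistency filter are stand-ins, not the paper's exact recipe.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-ins: context-space vectors for all words, and PILSA
# vectors for in-thesaurus words only.
in_vocab = ["hot", "burning", "cold", "freezing", "warm"]
context_vecs = {w: rng.standard_normal(10) for w in in_vocab + ["sweltering"]}
pilsa_vecs = {w: rng.standard_normal(4) for w in in_vocab}

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def embed_oov(word, k=3):
    # 1. Find the k nearest in-thesaurus neighbors in the CONTEXT space.
    nbrs = sorted(in_vocab,
                  key=lambda w: -cos(context_vecs[word], context_vecs[w]))[:k]
    # 2. Drop neighbors with inconsistent polarity (here: a neighbor is kept
    #    if its PILSA cosine to the closest neighbor is positive).
    ref = pilsa_vecs[nbrs[0]]
    kept = [w for w in nbrs if cos(pilsa_vecs[w], ref) > 0]
    # 3. The centroid of the remaining PILSA vectors represents the word.
    return np.mean([pilsa_vecs[w] for w in kept], axis=0)

vec = embed_oov("sweltering")
print(vec.shape)   # (4,)
```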
Roadmap • Introduction • Polarity Inducing Latent Semantic Analysis • Basic construction • Extension 1: Improving accuracy • Extension 2: Improving coverage • Experimental evaluation • Task & datasets • Results • Conclusion
Data for Building PILSA Models • Encarta Thesaurus (for basic PILSA) • 47k word categories (i.e., the “documents”) • Vocabulary of 50k words • 125,724 pairs of antonyms • Wikipedia (for embedding out-of-thesaurus words) • Sentences from a Nov-2010 snapshot • 917M words after preprocessing
Experimental Evaluation • Task: GRE closest-opposite questions • Which is the closest opposite of adulterate?(a) renounce (b) forbid (c) purify (d) criticize (e) correct • Dev / Test: 162 / 950 questions [Mohammad et al. 08] • Dev set is used for tuning the dimensionality of PILSA • Evaluation metric • Accuracy: #correct / #total questions • Questions with unresolved out-of-thesaurus target words are treated as answered incorrectly
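Answering a closest-opposite question with a PILSA model amounts to picking the candidate with the lowest (most negative) cosine to the target. A toy sketch with synthetic vectors; the vector for "purify" is deliberately forced opposite the target so the example is solvable:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical word vectors; in the real system these come from the
# Encarta-trained PILSA model.
words = ["adulterate", "renounce", "forbid", "purify", "criticize", "correct"]
vecs = {w: rng.standard_normal(4) for w in words}
# Make "purify" point opposite the target for this toy example.
vecs["purify"] = -vecs["adulterate"]

def cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def closest_opposite(target, candidates):
    # The closest opposite is the candidate with the LOWEST cosine to the
    # target in the polarity-induced space.
    return min(candidates, key=lambda c: cos(vecs[target], vecs[c]))

print(closest_opposite("adulterate", words[1:]))   # purify
```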
Examples • Target word: admirable • No polarity – LSA • Most Similar: commendable, creditable, despicable • Least Similar: uninviting, dessert, seductive • With polarity – PILSA • Most Similar: commendable, creditable, laudable • Least Similar: despicable, shameful, unworthy • Full results on GRE test set are available online
Conclusion • Polarity Inducing LSA • Solves the open problem of antonyms/synonyms by making a vector space that can distinguish opposites • Vector space designed so that synonyms/antonyms tend to have positive/negative cosine similarity • Future Work • New methods or representations for other word relations • e.g., Part-Whole, Is-A, Attribute • Applications • e.g., Textual Entailment or Sentence Completion