
Polarity Inducing Latent Semantic Analysis

A vector space model that can distinguish antonyms from synonyms. Scott Wen-tau Yih, joint work with Geoffrey Zweig and John Platt, Microsoft Research.


Presentation Transcript


  1. Polarity Inducing Latent Semantic Analysis • A vector space model that can distinguish Antonyms from Synonyms! • Scott Wen-tau Yih • Joint work with Geoffrey Zweig & John Platt • Microsoft Research

  2. Vector Space Model • Text objects (e.g., words, phrases, sentences or documents) are represented as vectors • High-dimensional sparse term-vectors • Concept vectors from topic models or projection methods • Constructed compositionally from word vectors [Socher et al. 12] • Relations of the text objects are estimated by functions in the vector space • Relatedness is measured by some distance or similarity function (e.g., cosine): $\cos(v_q, v_d)$ for two vectors $v_q$ and $v_d$
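As a minimal sketch of the cosine measure named on this slide, the snippet below scores two vectors. The three-word vocabulary and the counts in `v_q` and `v_d` are made up for illustration and do not come from the talk.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

# Hypothetical sparse term vectors over the vocabulary [hot, cold, weather].
v_q = [1.0, 0.0, 1.0]
v_d = [0.0, 1.0, 1.0]
print(round(cosine(v_q, v_d), 6))
```

The two vectors share one of their two non-zero terms, so the score lands halfway between unrelated (0) and identical (1).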

  3. Applications of Vector Space Models • Document Level • Information Retrieval [Salton & McGill 83] • Document Clustering [Deerwester et al. 90] • Search Relevance Measurement [Baeza-Yates & Ribeiro-Neto 99] • Cross-lingual document retrieval [Platt et al. 10; Yih et al. 11] • Word Level • Language modeling [Bellegarda 00] • Word similarity and relatedness [Deerwester et al. 90; Lin 98; Turney 01; Turney & Littman 05; Agirre et al. 09; Reisinger & Mooney 10; Yih & Qazvinian 12]

  4. Beyond General Similarity • Existing VSMs cannot distinguish finer relations • The “antonym” issue of distributional similarity • The co-occurrence or distributional hypotheses • Apply to near-synonyms, hypernyms and other semantically related words, including antonyms [Mohammad et al. 08] • e.g., “hot” and “cold” occur in similar contexts • LSA does not solve the issue • Might assign a high degree of similarity to opposites as well as synonyms [Landauer & Laham 98]

  5. Approaches for Detecting Antonyms • Separate antonyms from distributionally similar word pairs [Lin et al. 03] • Patterns: “from X to Y”, “either X or Y” • WordNet graph [Harabagiu et al. 06] • Synsets connected by is-a links and exactly one antonymy link • WordNet + affix rules + heuristics [Mohammad et al. 08] • Distinguishing synonyms and antonyms is still perceived as a difficult open problem… [Poon & Domingos 09]

  6. Our Contributions • Polarity Inducing Latent Semantic Analysis (PILSA) • A vector space model that encodes polarity information • Synonyms cluster together in this space • Antonyms lie at the opposite ends of a unit sphere (figure: “burning” and “hot” opposite “freezing” and “cold”)

  7. Our Contributions • Polarity Inducing Latent Semantic Analysis (PILSA) • A vector space model that encodes polarity information • Synonyms cluster together in this space • Antonyms lie at the opposite ends of a unit sphere • Significantly improved the prediction accuracy on a benchmark GRE dataset

  8. Roadmap • Introduction • Polarity Inducing Latent Semantic Analysis • Basic construction • Extension 1: Improving accuracy • Extension 2: Improving coverage • Experimental evaluation • Task & datasets • Results • Conclusion

  9. The Core Method Input: A thesaurus (with synonyms & antonyms) • Create a “document”-term matrix • Each group of words (synonyms and antonyms) is treated as a “document” • Induce polarity by making antonyms have negative weights • Apply SVD as in regular Latent Semantic Analysis

  10. Matrix Construction • Acrimony: rancor, conflict, bitterness; goodwill, affection • Affection: goodwill, tenderness, fondness; acrimony, rancor • Document: row-vector • Term: column-vector • Matrix entries: TFIDF score

  11. Matrix Construction • Acrimony: rancor, conflict, bitterness; goodwill, affection • Affection: goodwill, tenderness, fondness; acrimony, rancor • Inducing polarity: antonyms receive negative weights • Cosine Score:
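The construction on slides 9 to 11 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each entry is read as "synonyms; antonyms", the head word is counted as a synonym of its own group, and plain weights of +1 and -1 stand in for the TFIDF scores the slides mention. The function name `build_signed_matrix` is hypothetical.

```python
# Thesaurus entries from the slide, read as (synonyms, antonyms).
thesaurus = {
    "acrimony": (["rancor", "conflict", "bitterness"], ["goodwill", "affection"]),
    "affection": (["goodwill", "tenderness", "fondness"], ["acrimony", "rancor"]),
}

def build_signed_matrix(thesaurus):
    """Return (vocab, rows): one row per thesaurus group, one column per word."""
    vocab = sorted({w for head, (syn, ant) in thesaurus.items()
                    for w in [head, *syn, *ant]})
    col = {w: j for j, w in enumerate(vocab)}
    rows = []
    for head, (syn, ant) in thesaurus.items():
        row = [0.0] * len(vocab)
        for w in [head, *syn]:   # assumption: head word belongs to its own group
            row[col[w]] = 1.0    # assumption: +1 instead of a TFIDF score
        for w in ant:
            row[col[w]] = -1.0   # polarity: antonyms get negative weight
        rows.append(row)
    return vocab, rows

vocab, rows = build_signed_matrix(thesaurus)
for head, row in zip(thesaurus, rows):
    print(head, dict(zip(vocab, row)))
```

Swapping the unit weights for real TFIDF scores would change only the magnitudes, not the signs, so the polarity pattern shown here would be unaffected.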

  12. Effect of Inducing Polarity

  13. Effect of Inducing Polarity Cosine similarity = 1

  14. Effect of Inducing Polarity Cosine similarity = 1 Cannot distinguish antonyms from synonyms!

  15. Effect of Inducing Polarity Cosine similarity = 1

  16. Effect of Inducing Polarity Cosine similarity = -1
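The contrast on slides 12 to 16 can be reproduced with a toy example. The column vectors below are hypothetical: they assume two thesaurus groups (acrimony, affection) and unit weights, so each word is a two-dimensional column.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Each word is a column: [weight in acrimony group, weight in affection group].
# Without polarity, every word that appears in a group gets a positive weight.
unsigned = {"acrimony": [1.0, 1.0], "rancor": [1.0, 1.0], "affection": [1.0, 1.0]}
# With polarity, a word gets a negative weight where it is listed as an antonym.
signed = {"acrimony": [1.0, -1.0], "rancor": [1.0, -1.0], "affection": [-1.0, 1.0]}

for name, cols in [("unsigned", unsigned), ("signed", signed)]:
    syn = cosine(cols["acrimony"], cols["rancor"])
    ant = cosine(cols["acrimony"], cols["affection"])
    print(name, "synonym:", round(syn, 6), "antonym:", round(ant, 6))
```

In the unsigned matrix both pairs score the same, which is the failure the slides point out; with signed weights the antonym pair flips to the opposite end of the scale.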

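  17. Mapping to Latent Space via SVD • Rank-$k$ decomposition $W \approx U \Sigma V^T$, where the columns of $W$ correspond to words • Word similarity: cosine of two columns in $\Sigma V^T$ • SVD generalizes and smooths the original data • Uncovers relationships not explicit in the thesaurus

  18. Mapping to Latent Space via SVD • With the rank-$k$ decomposition $W \approx U \Sigma V^T$, we have $\Sigma V^T = U^T W$, so $U^T$ can be viewed as the projection matrix that maps the raw column-vector to the $k$-dimensional latent space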
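A small numerical sketch of the SVD step, assuming the conventional LSA notation in which the signed document-term matrix is factored as W ≈ U Σ Vᵀ and truncated to rank k. The matrix, the word list and the choice k = 2 are made up for illustration; `numpy.linalg.svd` is the only library call used.

```python
import numpy as np

# Hypothetical signed matrix: rows are thesaurus groups, columns are words.
words = ["acrimony", "rancor", "goodwill", "affection", "hot", "cold"]
W = np.array([
    [1.0, 1.0, -1.0, -1.0, 0.0, 0.0],   # acrimony group
    [-1.0, -1.0, 1.0, 1.0, 0.0, 0.0],   # affection group
    [0.0, 0.0, 0.0, 0.0, 1.0, -1.0],    # hot group
])

k = 2  # latent dimensionality (tuned on a dev set in the talk)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Latent word representations: the columns of Sigma_k V_k^T.
latent = np.diag(s_k) @ Vt_k

# The same representations are obtained by projecting raw columns with U_k^T.
assert np.allclose(U_k.T @ W, latent)

def word_cosine(a, b):
    x, y = latent[:, words.index(a)], latent[:, words.index(b)]
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

print(round(word_cosine("acrimony", "rancor"), 6))
print(round(word_cosine("acrimony", "affection"), 6))
```

Because the projection is a fixed matrix, any new raw column vector over the same thesaurus groups can be mapped into the latent space without recomputing the decomposition.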

  19. Extension 1: Improve Accuracy • Refine the projection matrix by discriminative training • S2Net [Yih et al. 11]: very similar to RankNet [Burges et al. 05] but focuses on learning concept vectors

  20. Applying S2Net • Training data: Antonym pairs from thesaurus • Initialize model with the PILSA projection matrix • Learning objective: cosine score of antonyms should be lower than other word pairs
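The learning objective above can be illustrated with a RankNet-style pairwise logistic loss. This is a sketch under assumptions, not the S2Net implementation: the exact loss form, the scaling factor `gamma` and the function name `pairwise_loss` are chosen here for illustration, and no gradient step is shown.

```python
import math

def pairwise_loss(antonym_score, other_score, gamma=10.0):
    """Logistic loss that is small when the antonym pair scores lower.

    antonym_score: cosine score of an antonym pair
    other_score:   cosine score of some other word pair
    gamma:         hypothetical scaling factor that sharpens the loss
    """
    delta = other_score - antonym_score   # want delta > 0
    x = -gamma * delta
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

print(pairwise_loss(-0.9, 0.3))   # antonyms already lower: small loss
print(pairwise_loss(0.3, -0.9))   # antonyms scored higher: large loss
```

Training would adjust the projection matrix to reduce this loss summed over sampled pairs, starting from the PILSA projection as the slide describes.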

  21. Extension 2: Improve Coverage • What to do with out-of-thesaurus words? • Some lexical variations • Encarta thesaurus contains “corruptible” and “corruption”, but not “corruptibility” • Morphological analysis and stemming to find alternatives of an out-of-thesaurus target word • Rare or offensive words • e.g., “froward” and “moronic” • Embedding out-of-thesaurus words by leveraging a general corpus

  22. Embedding Out-of-thesaurus Words • Create a context vector space model using a collection of documents (e.g., Wikipedia) • Context: words within a window of [-10,10] • Embed target word into the PILSA space by $k$-NN • Find nearby in-thesaurus words in the context space • Remove words with inconsistent polarity • Use the centroid of the corresponding PILSA vectors to represent the target word

  23. Embedding Out-of-thesaurus Words • Create a context vector space model using a collection of documents (e.g., Wikipedia) • Context: words within a window of [-10,10] • Embed target word into the PILSA space by $k$-NN (figure: “hot”, “sweltering”, “burning” and “cold” shown in the Context Vector Space and in the PILSA Space)
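The three embedding steps on slides 22 and 23 can be sketched as below. All vectors are invented toy values, and the polarity filter is an assumption: here a neighbor is dropped when its PILSA vector has a non-positive cosine with the nearest neighbor's PILSA vector, which may differ from the rule used in the talk. The function name `embed_out_of_thesaurus` is hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def embed_out_of_thesaurus(target_ctx, context_vecs, pilsa_vecs, k=3):
    # 1. Find the k in-thesaurus words closest to the target in context space.
    ranked = sorted(pilsa_vecs,
                    key=lambda w: cosine(target_ctx, context_vecs[w]),
                    reverse=True)
    neighbors = ranked[:k]
    # 2. Remove neighbors with inconsistent polarity (assumed rule, see above).
    anchor = pilsa_vecs[neighbors[0]]
    kept = [w for w in neighbors if cosine(anchor, pilsa_vecs[w]) > 0.0]
    # 3. Represent the target by the centroid of the kept PILSA vectors.
    dim = len(anchor)
    centroid = [sum(pilsa_vecs[w][i] for w in kept) / len(kept)
                for i in range(dim)]
    return kept, centroid

# Toy data: "cold" shares contexts with "hot" but has opposite polarity.
context_vecs = {"hot": [1.0, 0.9], "burning": [0.9, 1.0], "cold": [0.95, 0.85]}
pilsa_vecs = {"hot": [1.0, 0.1], "burning": [0.9, 0.2], "cold": [-1.0, -0.1]}
sweltering_ctx = [1.0, 1.0]   # context vector of the out-of-thesaurus word

kept, centroid = embed_out_of_thesaurus(sweltering_ctx, context_vecs, pilsa_vecs)
print(sorted(kept), [round(x, 3) for x in centroid])
```

The filter matters because the context space has the same antonym problem as ordinary distributional models: without it, "cold" would pull the centroid toward the wrong side of the PILSA space.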

  24. Roadmap • Introduction • Polarity Inducing Latent Semantic Analysis • Basic construction • Extension 1: Improving accuracy • Extension 2: Improving coverage • Experimental evaluation • Task & datasets • Results • Conclusion

  25. Data for Building PILSA Models • Encarta Thesaurus (for basic PILSA) • 47k word categories (i.e., the “documents”) • Vocabulary of 50k words • 125,724 pairs of antonyms • Wikipedia (for embedding out-of-thesaurus words) • Sentences from a Nov-2010 snapshot • 917M words after preprocessing

  26. Experimental Evaluation • Task: GRE closest-opposite questions • Which is the closest opposite of adulterate? (a) renounce (b) forbid (c) purify (d) criticize (e) correct • Dev / Test: 162 / 950 questions [Mohammad et al. 08] • Dev set is used for tuning the dimensionality of PILSA • Evaluation metric • Accuracy: #correct / #total questions • Questions with unresolved out-of-thesaurus target words are treated as answered incorrectly
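A sketch of how such a question could be answered and scored: pick the choice whose vector has the lowest cosine with the target, and count a question as wrong when the target has no vector. The word vectors, the second question and its choices are invented for illustration, and skipping choices that lack a vector is an assumption of this sketch.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def closest_opposite(question, vectors):
    target = vectors.get(question["target"])
    if target is None:
        return None   # unresolved target word: scored as incorrect
    scored = [(cosine(target, vectors[c]), c)
              for c in question["choices"] if c in vectors]
    return min(scored)[1] if scored else None

def accuracy(questions, vectors):
    correct = sum(closest_opposite(q, vectors) == q["answer"] for q in questions)
    return correct / len(questions)

# Hypothetical 2-d word vectors standing in for a trained PILSA space.
vectors = {
    "adulterate": [1.0, 0.2], "renounce": [0.1, 1.0], "forbid": [0.3, 0.8],
    "purify": [-0.9, -0.1], "criticize": [0.5, -0.6], "correct": [-0.2, 0.7],
}
questions = [
    {"target": "adulterate",
     "choices": ["renounce", "forbid", "purify", "criticize", "correct"],
     "answer": "purify"},
    # Made-up question whose target is missing from the vector table.
    {"target": "froward", "choices": ["purify", "correct"], "answer": "correct"},
]
print(closest_opposite(questions[0], vectors), accuracy(questions, vectors))
```

Scoring unresolved targets as wrong keeps the metric comparable across systems with different vocabulary coverage, which is why the coverage extension affects accuracy directly.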

  27. Results on Test Set

  28. Examples • Target word: admirable • No polarity – LSA • Most Similar: commendable, creditable, despicable • Least Similar: uninviting, dessert, seductive • With polarity – PILSA • Most Similar: commendable, creditable, laudable • Least Similar: despicable, shameful, unworthy • Full results on GRE test set are available online

  29. Conclusion • Polarity Inducing LSA • Solves the open problem of antonyms/synonyms by making a vector space that can distinguish opposites • Vector space designed so that synonyms/antonyms tend to have positive/negative cosine similarity • Future Work • New methods or representations for other word relations • e.g., Part-Whole, Is-A, Attribute • Applications • e.g., Textual Entailment or Sentence Completion
