
Chapter 20 Part 3


Presentation Transcript


  1. Chapter 20 Part 3 Computational Lexical Semantics Acknowledgements: these slides include material from Dan Jurafsky, Rada Mihalcea, Ray Mooney, Katrin Erk, and Ani Nenkova

  2. Similarity Metrics • Similarity metrics are useful not just for word sense disambiguation, but also for: • Finding topics of documents • Representing word meanings, not with respect to a fixed sense inventory • We will start with dictionary based methods and then look at vector space models

  3. Thesaurus-based word similarity • We could use anything in the thesaurus • Meronymy • Glosses • Example sentences • In practice • By “thesaurus-based” we just mean • Using the is-a/subsumption/hypernym hierarchy • Can define similarity between words or between senses

  4. Path based similarity • Two senses are similar if nearby in thesaurus hierarchy (i.e. short path between them)

  5. Path-based similarity • pathlen(c1,c2) = number of edges in the shortest path between the sense nodes c1 and c2 • sim(c1,c2) = -log pathlen(c1,c2) • wordsim(w1,w2) = max over c1 ∈ senses(w1), c2 ∈ senses(w2) of sim(c1,c2)
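The path length computation can be sketched over a toy is-a hierarchy. The hierarchy fragment below is illustrative only (the node names are made up, not WordNet's actual structure); the "best path over all sense pairs" step mirrors the wordsim definition above.

```python
from collections import deque

# Toy is-a hierarchy (child -> parent); illustrative names only.
HYPERNYM = {
    "nickel": "coin", "dime": "coin", "coin": "currency",
    "currency": "medium_of_exchange", "money": "medium_of_exchange",
    "medium_of_exchange": "standard", "budget": "money",
}

def pathlen(c1, c2):
    """Number of edges on the shortest path between two sense nodes,
    treating hypernym links as undirected edges (BFS)."""
    adj = {}
    for child, parent in HYPERNYM.items():
        adj.setdefault(child, set()).add(parent)
        adj.setdefault(parent, set()).add(child)
    seen, frontier = {c1}, deque([(c1, 0)])
    while frontier:
        node, d = frontier.popleft()
        if node == c2:
            return d
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return None  # not connected

def wordsim(senses1, senses2):
    """Word similarity: best (i.e. shortest) path over all sense pairs."""
    return min(pathlen(c1, c2) for c1 in senses1 for c2 in senses2)

print(pathlen("nickel", "dime"))    # 2: nickel -> coin -> dime
print(pathlen("nickel", "budget"))  # 5: up through currency, medium_of_exchange, money
```

A real system would use WordNet itself (e.g. NLTK's synset methods) rather than a hand-built dictionary; this sketch only shows the shortest-path idea.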

  6. Problem with basic path-based similarity • Assumes each link represents a uniform distance • But some areas of WordNet are more developed than others • This depends on the people who created each part • Also, links deep in the hierarchy intuitively cover narrower distinctions than links higher up [on slide 4, e.g., nickel to money vs. nickel to standard]

  7. Information content similarity metrics • Let’s define P(c) as: • The probability that a randomly selected word in a corpus is an instance of concept c • A word is an instance of a concept if it appears below the concept in the WordNet hierarchy • We saw this idea when we covered selectional preferences

  8. In particular • If there is a single node that is the ancestor of all nodes, then its probability is 1 • The lower a node in the hierarchy, the lower its probability • An occurrence of the word dime would count towards the frequency of coin, currency, standard, etc.

  9. Information content similarity • Train by counting in a corpus • 1 instance of “dime” could count toward the frequency of coin, currency, standard, etc. • More formally: P(c) = (Σ_{w ∈ words(c)} count(w)) / N, where words(c) is the set of words subsumed by concept c, and N is the total number of words (tokens) in the corpus that are also in the thesaurus
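This counting scheme can be sketched directly: each word occurrence is propagated up to every ancestor concept. The hierarchy fragment and the counts below are made up for illustration.

```python
import math

# Hypothetical is-a fragment: each concept's parent.
PARENT = {"dime": "coin", "nickel": "coin", "coin": "currency",
          "dollar": "currency", "currency": "standard"}

# Raw corpus counts for the words themselves (illustrative numbers).
word_counts = {"dime": 3, "nickel": 5, "coin": 2, "dollar": 10}

# Every occurrence of a word counts toward the word's own concept
# and all of its ancestors, so counts are accumulated bottom-up.
concept_count = {}
for word, n in word_counts.items():
    node = word
    while node is not None:
        concept_count[node] = concept_count.get(node, 0) + n
        node = PARENT.get(node)

N = sum(word_counts.values())  # total tokens that are in the thesaurus
P = {c: cnt / N for c, cnt in concept_count.items()}
IC = {c: -math.log(p) for c, p in P.items()}

print(P["standard"])  # root subsumes every token -> probability 1.0
print(P["coin"])      # (3 + 5 + 2) / 20 = 0.5
```

Note how the root gets probability 1 and therefore information content 0, matching the point on the previous slide.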

  10. Information content similarity • WordNet hierarchy augmented with probabilities P(c)

  11. Information content: definitions • Information content: IC(c) = -log P(c) • Lowest common subsumer LCS(c1,c2): the lowest node in the hierarchy that subsumes (is a hypernym of) both c1 and c2

  12. Resnik method • The similarity between two senses is related to their common information • The more two senses have in common, the more similar they are • Resnik: measure the common information as the info content of the lowest common subsumer of the two senses • sim_resnik(c1,c2) = -log P(LCS(c1,c2))
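The Resnik measure can be sketched over a small hierarchy with precomputed concept probabilities. The fragment and P(c) values below are hypothetical, chosen so the root has P = 1.

```python
import math

# Hypothetical hierarchy fragment and concept probabilities P(c).
PARENT = {"dime": "coin", "nickel": "coin", "coin": "currency",
          "dollar": "currency", "currency": "standard"}
P = {"dime": 0.15, "nickel": 0.25, "coin": 0.5,
     "dollar": 0.5, "currency": 1.0, "standard": 1.0}

def ancestors(c):
    """Chain from c up to the root, inclusive."""
    chain = [c]
    while c in PARENT:
        c = PARENT[c]
        chain.append(c)
    return chain

def lcs(c1, c2):
    """Lowest common subsumer: lowest ancestor of c1 that also subsumes c2."""
    up2 = set(ancestors(c2))
    for a in ancestors(c1):
        if a in up2:
            return a

def sim_resnik(c1, c2):
    return -math.log(P[lcs(c1, c2)])

print(lcs("dime", "nickel"))         # coin
print(sim_resnik("dime", "nickel"))  # -log 0.5 ~ 0.69
print(sim_resnik("dime", "dollar"))  # LCS is currency, P = 1 -> similarity 0
```

The last line illustrates the key intuition: senses that meet only at a very general concept (high P, low IC) come out as dissimilar.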

  13. Example Use: • Yaw Gyamfi, Janyce Wiebe, Rada Mihalcea, and Cem Akkaya (2009). Integrating Knowledge for Subjectivity Sense Labeling. HLT-NAACL 2009.

  14. What is Subjectivity? • The linguistic expression of somebody’s opinions, sentiments, emotions, evaluations, beliefs, speculations (private states) • This particular use of subjectivity was adapted from literary theory (Banfield 1982; Wiebe 1990)

  15. Examples of Subjective Expressions • References to private states • She was enthusiastic about the plan • Descriptions • That would lead to disastrous consequences • What a freak show

  16. Subjectivity Analysis • Automatic extraction of subjectivity (opinions) from text or dialog

  17. Subjectivity Analysis: Applications • Opinion-oriented question answering: How do the Chinese regard the human rights record of the United States? • Product review mining: What features of the ThinkPad T43 do customers like and which do they dislike? • Review classification: Is a review positive or negative toward the movie? • Tracking sentiments toward topics over time: Is anger ratcheting up or cooling down? • Etc.

  18. Subjectivity Lexicons • Most approaches to subjectivity and sentiment analysis exploit subjectivity lexicons • Lists of keywords that have been gathered together because they have subjective uses, e.g.: brilliant, difference, hate, interest, love, …

  19. Automatically Identifying Subjective Words • Much work in this area: Hatzivassiloglou & McKeown ACL97; Wiebe AAAI00; Turney ACL02; Kamps & Marx 2002; Wiebe, Riloff, & Wilson CoNLL03; Yu & Hatzivassiloglou EMNLP03; Kim & Hovy IJCNLP05; Esuli & Sebastiani CIKM05; Andreevskaia & Bergler EACL06; etc. • Subjectivity Lexicon available at: http://www.cs.pitt.edu/mpqa • Entries from several sources

  20. However… • Consider the keyword “interest” • It is in the subjectivity lexicon • But, what about “interest rate,” for example?

  21. WordNet Senses Interest, involvement -- (a sense of concern with and curiosity about someone or something; "an interest in music") Interest -- (a fixed charge for borrowing money; usually a percentage of the amount borrowed; "how much interest do you pay on your mortgage?")

  22. WordNet Senses, labeled • (S) Interest, involvement -- (a sense of concern with and curiosity about someone or something; "an interest in music") • (O) Interest -- (a fixed charge for borrowing money; usually a percentage of the amount borrowed; "how much interest do you pay on your mortgage?")

  23. Senses • Even in subjectivity lexicons, many senses of the keywords are objective • Thus, many appearances of keywords in texts are false hits

  24. WordNet (Miller 1995; Fellbaum 1998)

  25. Examples • “There are many differences between African and Asian elephants.” • “… dividing by the absolute value of the difference from the mean…” • “Their differences only grew as they spent more time together …” • “Her support really made a difference in my life” • “The difference after subtracting X from Y…”

  26. Our Task: Subjectivity Sense Labeling • Automatically classifying senses as subjective or objective • Purpose: exploit labels to improve • Word sense disambiguation Wiebe and Mihalcea ACL06 • Automatic subjectivity and sentiment analysis systems Akkaya, Wiebe, Mihalcea (2009, 2010, 2011, 2012, 2014)

  27.-28. Subjectivity Tagging using Subjectivity WSD [Diagram, shown in two animation steps: the word "difference" has five senses, with sense#1, sense#2, and sense#5 labeled O (objective) and sense#3 and sense#4 labeled S (subjective). The SWSD system assigns "There are many differences between African and Asian elephants." to the objective sense set O {1, 2, 5} and "Their differences only grew as they spent more time together …" to the subjective sense set S {3, 4}; the resulting S/O tags feed a subjectivity or sentiment classifier.]

  29.-30. Using Hierarchical Structure [Diagrams: a hierarchy fragment showing a target sense, a seed sense, and their LCS; the second diagram instantiates the seed as voice#1 (objective).]

  31. If you are interested in the entire approach and experiments, please see the paper (it is on my website)

  32. Dekang Lin method • Dekang Lin. 1998. An Information-Theoretic Definition of Similarity. ICML • Intuition: similarity between A and B depends not just on what they have in common • Commonality: the more A and B have in common, the more similar they are • Difference: the more differences between A and B, the less similar they are • Measured as: Commonality = IC(common(A,B)); Difference = IC(description(A,B)) - IC(common(A,B))

  33. Dekang Lin similarity theorem • Lin (altering Resnik) defines: • The similarity between A and B is measured by the ratio between the amount of information needed to state the commonality of A and B and the information needed to fully describe what A and B are

  34. Lin similarity function • sim_Lin(c1,c2) = 2 log P(LCS(c1,c2)) / (log P(c1) + log P(c2)) • I.e., twice the information content of the LCS, divided by the combined information content of the two senses themselves
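Lin's ratio can be sketched in a few lines. The P(c) values below are hypothetical, and the LCS of dime and nickel is assumed to be coin.

```python
import math

# Hypothetical concept probabilities for a small fragment;
# LCS(dime, nickel) is assumed to be coin.
P = {"dime": 0.15, "nickel": 0.25, "coin": 0.5}

def sim_lin(p1, p2, p_lcs):
    """Lin (1998): shared information over total description length,
    i.e. 2*IC(LCS) / (IC(c1) + IC(c2))."""
    return 2 * math.log(p_lcs) / (math.log(p1) + math.log(p2))

print(sim_lin(P["dime"], P["nickel"], P["coin"]))  # ~0.42
```

Unlike Resnik's measure, the result is normalized by how specific the two senses themselves are, so it stays in [0, 1].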

  35. Summary: thesaurus-based similarity between senses • There are many metrics (you don’t have to memorize these)

  36. Using Thesaurus-Based Similarity for WSD • One specific method (Banerjee & Pedersen 2003): • For sense k of target word t: • SenseScore[k] = 0 • For each word w appearing within –N and +N of t: • For each sense s of w: • SenseScore[k] += similarity(k,s) • The sense with the highest SenseScore is assigned to the target word
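The scoring loop above can be sketched as follows. The sense labels and the similarity table are made up; in practice `similarity` would be one of the thesaurus-based metrics (e.g. Resnik or Lin).

```python
# Sketch of the Banerjee & Pedersen-style sense-scoring loop.
def disambiguate(target_senses, context_words, senses_of, similarity):
    """Pick the sense of the target word whose summed similarity to the
    senses of nearby words (within the -N..+N window) is highest."""
    scores = {k: 0.0 for k in target_senses}
    for k in target_senses:
        for w in context_words:      # words within -N and +N of the target
            for s in senses_of(w):
                scores[k] += similarity(k, s)
    return max(scores, key=scores.get)

# Tiny usage example with made-up senses and a made-up metric:
SENSES = {"deposit": ["bank_fin", "river_sediment"], "loan": ["bank_fin"]}
SIM = {("bank1", "bank_fin"): 1.0, ("bank2", "river_sediment"): 1.0}

best = disambiguate(
    ["bank1", "bank2"],              # two senses of the target "bank"
    ["deposit", "loan"],             # context window around the target
    lambda w: SENSES.get(w, []),
    lambda a, b: SIM.get((a, b), 0.0),
)
print(best)  # bank1: the financial context senses outscore the river sense
```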

  37. Problems with thesaurus-based meaning • We don’t have a thesaurus for every language • Even if we do, they have problems with recall • Many words are missing • Most (if not all) phrases are missing • Some connections between senses are missing • Thesauri work less well for verbs, adjectives • Adjectives and verbs have less structured hyponymy relations

  38. Distributional models of meaning • Also called vector-space models of meaning • Offer much higher recall than hand-built thesauri • Although they tend to have lower precision • Zellig Harris (1954): “oculist and eye-doctor … occur in almost the same environments…. If A and B have almost identical environments we say that they are synonyms.” • Firth (1957): “You shall know a word by the company it keeps!”

  39. Intuition of distributional word similarity • Nida example: A bottle of tesgüino is on the table Everybody likes tesgüino Tesgüino makes you drunk We make tesgüino out of corn. • From context words humans can guess tesgüino means • an alcoholic beverage like beer • Intuition for algorithm: • Two words are similar if they have similar word contexts.

  40. Reminder: Term-document matrix • Each cell: count of term t in a document d: tf_{t,d} • Each document is a count vector: a column below

  41. Reminder: Term-document matrix • Two documents are similar if their vectors are similar

  42. The words in a term-document matrix • Each word is a count vector: a row below

  43. The words in a term-document matrix • Two words are similar if their vectors are similar
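"Similar vectors" is usually made precise with cosine similarity over the word rows. The counts below are an illustrative toy matrix (four words over four documents), not data from the slides.

```python
import math

# Toy term-document matrix: each row is a word's count vector over
# four documents (illustrative counts only).
td = {
    "battle":  [1, 0, 7, 13],
    "soldier": [2, 80, 62, 89],
    "fool":    [36, 58, 1, 4],
    "clown":   [20, 15, 2, 3],
}

def cosine(u, v):
    """Cosine of the angle between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine(td["fool"], td["clown"]))   # high: similar distributions
print(cosine(td["fool"], td["battle"]))  # low: very different distributions
```

Here fool and clown, which co-occur in the same kinds of documents, come out far more similar than fool and battle.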

  44. The Term-Context matrix • Instead of using entire documents, use smaller contexts • Paragraph • Window of 10 words • A word is now defined by a vector over counts of context words
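Collecting these context counts from running text is a simple sliding-window pass; the window size and the example sentence below are illustrative.

```python
from collections import Counter, defaultdict

def term_context_counts(tokens, window=2):
    """For every word, count the words occurring within +/-window of it."""
    counts = defaultdict(Counter)
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the target position itself
                counts[w][tokens[j]] += 1
    return counts

# Toy running text (accents dropped from "tesguino" for simplicity):
toks = "we make tesguino out of corn and everybody likes tesguino".split()
cc = term_context_counts(toks, window=2)
print(cc["tesguino"])  # context counts collected from both occurrences
```

Each row of the resulting table is exactly the "vector over counts of context words" described above.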

  45. Sample contexts: 20 words (Brown corpus) • equal amount of sugar, a sliced lemon, a tablespoonful of apricotpreserve or jam, a pinch each of clove and nutmeg, • on board for their enjoyment. Cautiously she sampled her first pineappleand another fruit whose taste she likened to that of • of a recursive type well suited to programming on the digital computer. In finding the optimal R-stage policy from that of • substantially affect commerce, for the purpose of gathering data and information necessary for the study authorized in the first section of this

  46. Term-context matrix for word similarity • Two words are similar in meaning if their context vectors are similar

  47. Should we use raw counts? • For the term-document matrix • We used tf-idf instead of raw term counts • For the term-context matrix • Positive Pointwise Mutual Information (PPMI) is common

  48. Pointwise Mutual Information • Pointwise mutual information: PMI(x,y) = log2 [ P(x,y) / (P(x) P(y)) ] • Do events x and y co-occur more than if they were independent? • PMI between two words (Church & Hanks 1989): PMI(word1, word2) = log2 [ P(word1, word2) / (P(word1) P(word2)) ] • Do words x and y co-occur more than if they were independent? • Positive PMI between two words (Niwa & Nitta 1994): replace all PMI values less than 0 with zero

  49. Computing PPMI on a term-context matrix • Matrix F with W rows (words) and C columns (contexts) • f_ij is the number of times w_i occurs in context c_j

  50. p(w=information, c=data) = 6/19 = .32 • p(w=information) = 11/19 = .58 • p(c=data) = 7/19 = .37
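The whole PPMI computation fits in a few lines. The toy counts below are chosen to be consistent with the probabilities on this slide (count(information, data) = 6, count(information) = 11, count(data) = 7, N = 19); the exact row/column layout is an assumption.

```python
import math

# Toy term-context counts (total = 19): rows are target words,
# columns are context words.
contexts = ["computer", "data", "pinch", "result", "sugar"]
F = {
    "apricot":     [0, 0, 1, 0, 1],
    "pineapple":   [0, 0, 1, 0, 1],
    "digital":     [2, 1, 0, 1, 0],
    "information": [1, 6, 0, 4, 0],
}
N = sum(sum(row) for row in F.values())  # 19

def ppmi(word, context):
    """Positive PMI: log2 of observed vs. independent co-occurrence,
    clipped at zero."""
    j = contexts.index(context)
    p_wc = F[word][j] / N                      # joint probability
    p_w = sum(F[word]) / N                     # row marginal
    p_c = sum(F[w][j] for w in F) / N          # column marginal
    if p_wc == 0:
        return 0.0
    return max(0.0, math.log2(p_wc / (p_w * p_c)))

print(round(ppmi("information", "data"), 2))  # ~0.57
```

So "information" and "data" co-occur somewhat more often than chance, giving a small positive PPMI; zero-count cells are clipped to 0 rather than negative infinity.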
