1 / 53

Natural Language Processing

Natural Language Processing. Lecture 22—3-31-2015 Susan W. Brown. Semantics. What we’ve been discussing has mainly concerned sentence meanings. These refer to propositions, things that can be said to be true or false with respect to some model (world).

fcardona
Download Presentation

Natural Language Processing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Natural Language Processing Lecture 22—3-31-2015 Susan W. Brown

  2. Semantics What we’ve been discussing has mainly concerned sentence meanings. These refer to propositions, things that can be said to be true or false with respect to some model (world). Today we’ll look at a related notion called lexical semantics. Roughly what we mean when we talk about the meaning of a word.

  3. What is the meaning of a word? • A single word can be used in many different ways • Drew the water from the well • Drew the curtains • Drew the cart • Drew a crowd • Drew a picture • 20 most frequent nouns: an average of 8 senses. • 20 most frequent verbs: an average of 20 senses.

  4. Today • Lexical Semantics • Meaning relations between words • Lexical ambiguity • Word sense disambiguation • Quiz review

  5. WordNet • WordNet is a database of facts about words • Here we’ll only use the English WordNet; there are others • Meanings and the relations among them • www.cogsci.princeton.edu/~wn • Currently about 100,000 nouns, 11,000 verbs, 20,000 adjectives, and 4,000 adverbs • Not definitions like a dictionary

  6. Sin sets {gluttony, sloth, lust …}

  7. Synsets • Words are organized in synonym sets, each representing an underlying lexical concept • Verb {glut, gorge, overeat, binge, gormandize, scarf out …} • Noun {lust, lecherousness, lustfulness} • Adjective {lazy, faineant, indolent, otiose, slothful, work-shy} • Adverb {enviously, covetously, jealously}

  8. WordNet • Separate databases for nouns, verbs, adjectives and adverbs • Semantic relations contained within each word class—no links between word classes (except derivation info) • Many words are members of multiple synsets, which can be thought of as separate “senses” of the word

  9. Semantic RelationsNouns • Lexical hierarchy: hypernyms/hyponyms • Mammal/dog • Part-whole: meronyms/holonyms • Beak/bird • Antonyms • Victory/defeat • Derivationally related forms • Destruction/destroy

  10. WordNet Relations

  11. Semantic RelationsVerbs • Hypernyms • Run/travel rapidly, hurry, zip/move, locomote • Troponyms • Manner: run/jog, sprint, lope • Intensity: sleep/nap, hibernate, doze • Intension, degree of force, etc. • Antonyms • Derivationally related forms

  12. WordNet Hierarchies

  13. Inside Words • WordNet relations are called paradigmatic relations • They connect lexemes together in particular ways but don’t say anything about what the meaning representation of a particular lexeme should consist of • Semantic primes • Thematic roles

  14. Inside Words • In our semantics examples, we used various FOL predicates to capture various aspects of events, including the notion of roles • Havers, takers, givers, servers, etc. • All specific to each verb/predicate. • Thematic roles • Thematic roles are semantic generalizations over the specific roles that occur with specific verbs. • I.e. Takers, givers, eaters, makers, doers, killers, all have something in common • -er • They’re all the agents of the actions • We can generalize across other roles as well to come up with a small finite set of such roles

  15. Thematic Roles

  16. Thematic Roles • Takes some of the work away from the verbs. • It’s not the case that every verb is unique and has to completely specify how all of its arguments behave • Provides a locus for organizing semantic processing • It permits us to distinguish near surface-level semantics from deeper semantics

  17. Selection Restrictions • Consider.... • I want to eat someplace near campus

  18. Selection Restrictions • Consider.... • I want to eat someplace near campus • Using thematic roles we can now say that eat is a predicate that has an AGENT and a THEME • What else? • And that the AGENT must be capable of eating and the THEME must be something typically capable of being eaten

  19. As Logical Statements • For eat… • Eating(e) ^Agent(e,x)^ Theme(e,y)^Food(y) (adding in all the right quantifiers and lambdas)

  20. Back to WordNet • Use WordNet hyponyms (type) to encode the selection restrictions

  21. Specificity of Restrictions • Consider the verbs imagine, lift and diagonalize in the following examples • To diagonalize a matrix is to find its eigenvalues • Atlantis lifted Galileo from the pad • Imagine a tennis game • What can you say about THEME in each with respect to the verb? • Some will be high up in the WordNet hierarchy, others not so high…

  22. Problems • Unfortunately, verbs are polysemous and language is creative… WSJ examples… • … ate glass on an empty stomach accompanied only by water and tea • you can’t eat gold for lunch if you’re hungry • … get it to try to eat Afghanistan

  23. Solutions • Eat glass • Not really a problem. It is actually about an eating event • Eat gold • Also about eating, and the can’t creates a scope that permits the THEME to not be edible • Eat Afghanistan • This is harder, its not really about eating at all

  24. Discovering the Restrictions • Instead of hand-coding the restrictions for each verb, can we discover a verb’s restrictions by using a corpus and WordNet? • Parse sentences and find heads • Label the thematic roles • Collect statistics on the co-occurrence of particular headwords with particular thematic roles • Use the WordNet hypernym structure to find the most meaningful level to use as a restriction

  25. Lexical ambiguity • How well will this work with words like draw? • Possible themes? • Curtain • Picture • Crowd • Gun • Card • Conclusion • Water

  26. WSD and Selection Restrictions • Word sense disambiguation refers to the process of selecting the right sense for a word from among the senses that the word is known to have • Semantic selection restrictions can be used to disambiguate • Ambiguous arguments to unambiguous predicates • Ambiguous predicates with unambiguous arguments • Ambiguity all around

  27. WSD and Selection Restrictions • Ambiguous arguments • Prepare a dish • Wash a dish • Ambiguous predicates • Serve Denver • Serve breakfast • Both • Serves vegetarian dishes

  28. WSD and Selection Restrictions • This approach is complementary to the compositional analysis approach. • You need a parse tree and some form of predicate-argument analysis derived from • The tree and its attachments • All the word senses coming up from the lexemes at the leaves of the tree • Ill-formed analyses are eliminated by noting any selection restriction violations

  29. Problems • As we saw earlier, selection restrictions are violated all the time. • This doesn’t mean that the sentences are ill-formed or preferred less than others. • This approach needs some way of categorizing and dealing with the various ways that restrictions can be violated

  30. Supervised ML Approaches • That’s too hard… try something more data-oriented • In supervised machine learning approaches, a training corpus of words tagged in context with their sense is used to train a classifier that can tag words in new text (that reflects the training text)

  31. NLP Classifiers in a Nutshell Machine learning Corpus (data, and lots of it) Annotated by human beings with correct tags (here, word senses) Features extracted from the text Let the computer learn which features are important and how to weight them Give it new text to label automatically, using those features and weights

  32. WSD Tags • What’s a tag? A dictionary sense? • For example, for WordNet an instance of “run” in a text has 57 possible tags or labels • run (move fast by using one's feet, with one foot off the ground at any given time) "Don't run--you'll be out of breath"; "The children ran to the store" • run (flee; take to one's heels; cut and run) "If you see this man, run!"; "The burglars ran before the police showed up" • run (cover by running; run a certain distance) "She ran 10 miles that day" • run (run with the ball; in such sports as football)

  33. Word Sense Disambiguation • Given an occurrence of a word, decide which sense, or meaning, was intended. • Example, run • run1: move swiftly ( I ran to the store. ) • run2: operate (I run a store. ) • run3: flow (A river runs through the farm.) • run4: length of torn stitches (Her stockings had a run.)

  34. Representations • Most supervised ML approaches require a very simple representation for the input training data. • Vectors of sets of feature/value pairs • I.e. files of comma-separated values • So our first task is to extract training data from a corpus with respect to a particular instance of a target word • This typically consists of a characterization of the window of text surrounding the target

  35. Word Sense Disambiguation • Categories • Use word sense labels (run1, run2, etc.) • Features – describe context of word • near(w) : is the given word near word w? (e.g., in a window +/-2) • left(w): is word immediately preceded by w? • pos: word’s part of speech • etc.

  36. Collocational and co-occurrence information • Collocational • Encode features about the words that appear in specific positions to the right and left of the target word • Often limited to the words themselves as well as their part of speech • Co-occurrence • Features characterizing the words that occur anywhere in the window regardless of position • Typically limited to frequency counts

  37. Word Sense Disambiguation • Categories • Use word sense labels (run1, run2, etc.) • Features – describe context of word • near(w) : is the given word near word [race, river, stocking]? • left(w): is word immediately preceded by w? • pos: word’s part of speech [noun or verb] • etc.

  38. WSD: Sample Training DataFeatures run1: move swiftly ( I ran to the store. ) run2: operate (I run a store. ) run3: flow (A river runs through the farm.) run4: length of torn stitches (Her stockings had a run.)

  39. WSD: More instances Maybe more kinds of features would help?

  40. IOBJ SUBJ We serve men We serve food to men. We serve our community. serve —IndirectObject men DOBJ SUBJ We serve men We serve organic food. We serve coffee to connoiseurs. serve —DirectObject men

  41. Contextual Features • Lexical • Neighbor words (determined automatically) • Syntactic • Parts of speech • Passive/active • Types of phrases and clauses • Heads of phrases • Semantic • Synonyms and hypernyms of arguments (WordNet) • Named entity features (PERSON, LOCATION …)

  42. Classifiers • Once we cast the WSD problem as a classification problem, then all sorts of techniques are possible • Naïve Bayes (the right thing to try first) • Decision lists • Decision trees • MaxEnt • Support vector machines • Nearest neighbor methods…

  43. Classifiers • The choice of technique, in part, depends on the set of features that have been used • Some techniques work better/worse with features with numerical values • Some techniques work better/worse with features that have large numbers of possible values • For example, the feature the word to the left has a fairly large number of possible values

  44. Naïve Bayes • Argmax P(sense|feature vector) • Rewriting with Bayes and assuming independence of the features

  45. Naïve Bayes • P(s) … just the prior of that sense. • Just as with part of speech tagging, not all senses will occur with equal frequency • P(vj|s)… conditional probability of some particular feature/value combination given a particular sense • You can get both of these from a tagged corpus with the features encoded

  46. Bootstrapping • What if you don’t have enough data to train a system… • Bootstrap • Pick a word that you as an analyst think will co-occur with your target word in particular sense • Grep through your corpus for your target word and the hypothesized word • Assume that the target tag is the right one

  47. Bootstrapping • For run • Assume the word stockingoccurs with the unraveling sense and the word company occurs with the operate sense • Grep

  48. Bass Results Assume these results are correct and use them as training data.

  49. Problems • Given these general ML approaches, how many classifiers do I need to perform WSD robustly? • One for each ambiguous word in the language • How do you decide what set of tags/labels/senses to use for a given word? • Depends on the application

  50. Possible Applications • Tagging to improve • Translation • Tag sets are specific to the language pairs • Q/A and Information retrieval (search) • Tag sets are sensitive to frequent query terms • In these cases, you need to be able to disambiguate on both sides effectively • Not always easy with queries • “bat care”

More Related