Chapter 20, Part 2: Computational Lexical Semantics
Acknowledgements: these slides include material from Rada Mihalcea, Ray Mooney, Katrin Erk, and Ani Nenkova.
Knowledge-based WSD
• Task definition
  • Knowledge-based WSD = class of WSD methods relying (mainly) on knowledge drawn from dictionaries and/or raw text
• Resources
  • Yes
    • Machine Readable Dictionaries
    • Raw corpora
  • No
    • Manually annotated corpora
Machine Readable Dictionaries
• In recent years, most dictionaries have been made available in Machine Readable format (MRD)
  • Oxford English Dictionary
  • Collins
  • Longman Dictionary of Contemporary English (LDOCE)
• Thesauruses – add synonymy information
  • Roget Thesaurus
• Semantic networks – add more semantic relations
  • WordNet
  • EuroWordNet
MRD – A Resource for Knowledge-based WSD
• For each word in the language vocabulary, an MRD provides:
  • A list of meanings
  • Definitions (for all word meanings)
  • Typical usage examples (for most word meanings)
WordNet definitions/examples for the noun "plant"
• buildings for carrying on industrial labor; "they built a large plant to manufacture automobiles"
• a living organism lacking the power of locomotion
• something planted secretly for discovery by another; "the police used a plant to trick the thieves"; "he claimed that the evidence against him was a plant"
• an actor situated in the audience whose acting is rehearsed but seems spontaneous to the audience
MRD – A Resource for Knowledge-based WSD
• A thesaurus adds:
  • An explicit synonymy relation between word meanings
• A semantic network adds:
  • Hypernymy/hyponymy (IS-A), meronymy/holonymy (PART-OF), antonymy, etc.
WordNet synsets for the noun "plant"
  1. plant, works, industrial plant
  2. plant, flora, plant life
WordNet related concepts for the meaning "plant life": {plant, flora, plant life}
  hypernym: {organism, being}
  hyponym: {house plant}, {fungus}, …
  meronym: {plant tissue}, {plant part}
  member holonym: {Plantae, kingdom Plantae, plant kingdom}
Lesk Algorithm
• (Michael Lesk 1986): Identify senses of words in context using definition overlap. That is, disambiguate more than one word at a time.
• Algorithm:
  • Retrieve from the MRD all sense definitions of the words to be disambiguated
  • Determine the definition overlap for all possible sense combinations
  • Choose the senses that lead to the highest overlap
Example: disambiguate PINE CONE
• PINE
  1. kinds of evergreen tree with needle-shaped leaves
  2. waste away through sorrow or illness
• CONE
  1. solid body which narrows to a point
  2. something of this shape whether solid or hollow
  3. fruit of certain evergreen trees
Overlap counts:
  Pine#1 ∩ Cone#1 = 0    Pine#2 ∩ Cone#1 = 0
  Pine#1 ∩ Cone#2 = 1    Pine#2 ∩ Cone#2 = 0
  Pine#1 ∩ Cone#3 = 2    Pine#2 ∩ Cone#3 = 0
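The three Lesk steps can be sketched in a few lines of Python. This is a minimal illustration, not Lesk's original implementation: the stopword list, the crude plural stripping, and the names `content_words` and `lesk_pair` are assumptions made for the sketch, so individual overlap counts can differ from the slide's depending on how words are normalized. The winning combination is the same.

```python
import re
from itertools import product

# Assumed stopword list, just large enough for this example.
STOPWORDS = {"a", "of", "or", "this", "to", "which", "with", "whether"}

def content_words(definition):
    """Lowercase, split on non-letters, drop stopwords, strip a plural 's'."""
    words = re.findall(r"[a-z]+", definition.lower())
    return {w[:-1] if w.endswith("s") and len(w) > 3 else w
            for w in words if w not in STOPWORDS}

# Sense definitions copied from the example above.
SENSES = {
    "pine": ["kinds of evergreen tree with needle-shaped leaves",
             "waste away through sorrow or illness"],
    "cone": ["solid body which narrows to a point",
             "something of this shape whether solid or hollow",
             "fruit of certain evergreen trees"],
}

def lesk_pair(word_a, word_b):
    """Score every sense combination; return the best (overlap, i, j), 1-based."""
    scored = []
    for (i, def_a), (j, def_b) in product(enumerate(SENSES[word_a], 1),
                                          enumerate(SENSES[word_b], 1)):
        overlap = content_words(def_a) & content_words(def_b)
        scored.append((len(overlap), i, j))
    return max(scored)

best = lesk_pair("pine", "cone")
print(best)  # (overlap size, pine sense, cone sense)
```

Here the best pair is Pine#1 with Cone#3, sharing "evergreen" and "tree".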
Lesk Algorithm for More than Two Words?
• I saw a man who is 98 years old and can still walk and tell jokes
  • nine open class words: see(26), man(11), year(4), old(8), can(5), still(4), walk(10), tell(8), joke(3)
• 43,929,600 sense combinations! How to find the optimal sense combination?
• Simulated annealing (Cowie, Guthrie, Guthrie 1992)
• Let’s review (from CS1571)
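The size of the search space is simply the product of the per-word sense counts listed on the slide. A quick check, with the dictionary name `sense_counts` chosen only for this sketch:

```python
from math import prod

# Number of senses per open-class word, as given on the slide.
sense_counts = {"see": 26, "man": 11, "year": 4, "old": 8, "can": 5,
                "still": 4, "walk": 10, "tell": 8, "joke": 3}

# One sense is chosen independently for each word, so the counts multiply.
combinations = prod(sense_counts.values())
print(f"{combinations:,}")  # 43,929,600
```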
Search Types • Backtracking state-space search • Local Search and Optimization • Constraint satisfaction search • Adversarial search
Local Search • Use a single current state and move only to neighbors. • Use little space • Can find reasonable solutions in large or infinite (continuous) state spaces for which the other algorithms are not suitable
Optimization • Local search is often suitable for optimization problems. Search for best state by optimizing an objective function.
Visualization • States are laid out in a landscape • Height corresponds to the objective function value • Move around the landscape to find the highest (or lowest) peak • Only keep track of the current state and its immediate neighbors
Simulated Annealing • Based on a metallurgical metaphor • Start with a temperature set very high and slowly reduce it.
Simulated Annealing • Annealing: harden metals and glass by heating them to a high temperature and then gradually cooling them • At the start, make lots of moves and then gradually slow down
Simulated Annealing • More formally… • Generate a random new neighbor from the current state. • If it’s better, take it. • If it’s worse, take it with some probability that depends on the temperature and on the delta between the new and old states: the higher the temperature and the smaller the loss, the more likely the move.
Simulated annealing • Probability of a move decreases with the amount ΔE by which the evaluation is worsened • A second parameter T is also used to determine the probability: high T allows more worse moves, T close to zero results in few or no bad moves • The schedule input determines the value of T as a function of the completed cycles
function Simulated-Annealing(problem, schedule) returns a solution state
  inputs: problem, a problem
          schedule, a mapping from time to “temperature”
  current ← Make-Node(Initial-State[problem])
  for t ← 1 to ∞ do
    T ← schedule[t]
    if T = 0 then return current
    next ← a randomly selected successor of current
    ΔE ← Value[next] – Value[current]
    if ΔE > 0 then current ← next
    else current ← next only with probability e^(ΔE/T)
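The pseudocode translates fairly directly into Python. The version below is a sketch under assumptions: the function signature, the linear cooling schedule, and the toy objective are all invented for illustration and are not part of the slides.

```python
import itertools
import math
import random

def simulated_annealing(initial, neighbors, value, schedule, rng=random):
    """Maximize value(state); schedule(t) maps a step number to a temperature."""
    current = initial
    for t in itertools.count(1):
        temperature = schedule(t)
        if temperature <= 0:
            return current
        candidate = rng.choice(neighbors(current))
        delta_e = value(candidate) - value(current)
        # Always accept improvements; accept other moves with probability e^(dE/T).
        if delta_e > 0 or rng.random() < math.exp(delta_e / temperature):
            current = candidate

# Toy problem (hypothetical): maximize -(x - 3)^2 over the integers.
result = simulated_annealing(
    initial=0,
    neighbors=lambda x: [x - 1, x + 1],
    value=lambda x: -(x - 3) ** 2,
    schedule=lambda t: max(0.0, 2.0 * (1 - t / 2000)),  # assumed linear cooling
    rng=random.Random(0),
)
print(result)
```

Passing a seeded `random.Random` keeps a run repeatable. With this schedule the search should settle at or next to the peak at x = 3, since almost no downhill moves are accepted once T is close to zero.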
Intuitions • the algorithm wanders around during the early parts of the search, hopefully toward a good general region of the state space • Toward the end, the algorithm does a more focused search, making few bad moves
Lesk Algorithm for More than Two Words?
• I saw a man who is 98 years old and can still walk and tell jokes
  • nine open class words: see(26), man(11), year(4), old(8), can(5), still(4), walk(10), tell(8), joke(3)
• 43,929,600 sense combinations! How to find the optimal sense combination?
• Simulated annealing (Cowie, Guthrie, Guthrie 1992)
  • Given: W, the set of words we are disambiguating
  • State: one sense for each word in W
  • Neighbors of a state: the result of changing one word sense
  • Objective function: value(state)
    • Let DWs(state) be the words that appear in the union of the definitions of the senses in state
    • value(state) = sum, over the words w in DWs(state), of the number of times w appears in the definitions of the senses in state
    • The more words appear in multiple definitions, the higher the value
  • Start state: the most frequent sense of each word
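The state, neighbor, and objective definitions above can be sketched on the earlier PINE CONE definitions. Note one deliberate deviation, which is an assumption of this sketch: a word that occurs n times contributes n − 1 rather than n, so that words occurring only once add nothing and the value rewards overlap only. The names `tokens`, `value`, and `neighbors`, the stopword list, and the plural stripping are also illustrative.

```python
import re
from collections import Counter

STOPWORDS = {"a", "of", "or", "this", "to", "which", "with", "whether"}  # assumed

SENSES = {
    "pine": ["kinds of evergreen tree with needle-shaped leaves",
             "waste away through sorrow or illness"],
    "cone": ["solid body which narrows to a point",
             "something of this shape whether solid or hollow",
             "fruit of certain evergreen trees"],
}

def tokens(definition):
    """Content words of a definition, with a crude plural strip."""
    words = re.findall(r"[a-z]+", definition.lower())
    return [w[:-1] if w.endswith("s") and len(w) > 3 else w
            for w in words if w not in STOPWORDS]

def value(state):
    """state maps each word to a 0-based sense index; higher means more overlap."""
    counts = Counter()
    for word, sense in state.items():
        counts.update(tokens(SENSES[word][sense]))
    # Assumption: a word seen n times scores n - 1 (see the note above).
    return sum(n - 1 for n in counts.values())

def neighbors(state):
    """All states that differ from `state` in exactly one word's sense."""
    result = []
    for word, sense in state.items():
        for other in range(len(SENSES[word])):
            if other != sense:
                result.append({**state, word: other})
    return result

start = {"pine": 0, "cone": 0}  # first listed sense of each word
print(value(start), [value(s) for s in neighbors(start)])
```

These two functions are the pieces a simulated-annealing loop needs: it repeatedly picks a random element of `neighbors(state)` and compares `value` before and after.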
Lesk Algorithm: A Simplified Version
• Original Lesk definition: measure overlap between sense definitions for all words in the text
  • Identify simultaneously the correct senses for all words in the text
• Simplified Lesk (Kilgarriff & Rosenzweig 2000): measure overlap between the sense definitions of a word and its context in the text
  • Identify the correct sense for one word at a time
  • Search space significantly reduced (the context in the text is fixed for each word instance)
Lesk Algorithm: A Simplified Version
• Algorithm for simplified Lesk:
  • Retrieve from the MRD all sense definitions of the word to be disambiguated
  • Determine the overlap between each sense definition and the context of the word in the text
  • Choose the sense that leads to the highest overlap
Example: disambiguate PINE in “Pine cones hanging in a tree”
• PINE
  1. kinds of evergreen tree with needle-shaped leaves
  2. waste away through sorrow or illness
Overlap counts:
  Pine#1 ∩ Sentence = 1
  Pine#2 ∩ Sentence = 0
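Simplified Lesk needs only one pass over the senses of the target word. The sketch below reproduces the overlap counts of the PINE example; the stopword list, the plural stripping, and the function names are assumptions made for illustration.

```python
import re

STOPWORDS = {"a", "in", "of", "or", "with"}  # assumed, minimal

def content_words(text):
    """Lowercase, split on non-letters, drop stopwords, strip a plural 's'."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w[:-1] if w.endswith("s") and len(w) > 3 else w
            for w in words if w not in STOPWORDS}

def simplified_lesk(target, definitions, sentence):
    """Return (best 1-based sense number, overlap size per sense)."""
    # The context is the sentence without the target word itself.
    context = content_words(sentence) - content_words(target)
    scores = [len(content_words(d) & context) for d in definitions]
    best = max(range(len(scores)), key=scores.__getitem__) + 1
    return best, scores

PINE = ["kinds of evergreen tree with needle-shaped leaves",
        "waste away through sorrow or illness"]

best, scores = simplified_lesk("pine", PINE, "Pine cones hanging in a tree")
print(best, scores)  # 1 [1, 0]
```

The only shared content word is "tree", which selects Pine#1.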
Selectional Preferences • A way to constrain the possible meanings of words in a given context • E.g. “Wash a dish” vs. “Cook a dish” • WASH-OBJECT vs. COOK-FOOD • Alternative terminology • Selectional Restrictions • Selectional Preferences • Selectional Constraints
Acquiring Selectional Preferences • From raw corpora • Frequency counts • Information theory measures
Preliminaries: Learning Word-to-Word Relations
• An indication of the semantic fit between two words
• 1. Frequency counts (in a parsed corpus)
  • Pairs of words connected by a syntactic relation
• 2. Conditional probabilities
  • Condition on one of the words
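Both measures can be read off a list of (verb, relation, noun) triples. The triples below are made up for illustration (they do not come from any real parsed corpus), and the function name `p_noun_given_verb` is likewise an assumption of the sketch.

```python
from collections import Counter

# Hypothetical (verb, relation, noun) triples, as if extracted from a parsed corpus.
triples = [
    ("wash", "obj", "dish"), ("wash", "obj", "dish"), ("wash", "obj", "car"),
    ("cook", "obj", "dish"), ("cook", "obj", "rice"),
]

# 1. Frequency counts of word pairs connected by a syntactic relation.
pair_counts = Counter(triples)
# Counts of each (verb, relation) slot, used as the conditioning event.
slot_counts = Counter((verb, rel) for verb, rel, _ in triples)

def p_noun_given_verb(noun, verb, rel):
    """2. Relative-frequency estimate of P(noun | verb, rel)."""
    total = slot_counts[(verb, rel)]
    return pair_counts[(verb, rel, noun)] / total if total else 0.0

print(pair_counts[("wash", "obj", "dish")])      # 2
print(p_noun_given_verb("dish", "wash", "obj"))  # 2 of the 3 objects of "wash"
```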
Learning Selectional Preferences • Word-to-class relations (Resnik 1993) • Quantify the contribution of a semantic class using all the senses subsumed by that class (e.g., the class is an ancestor in WordNet)
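For reference, Resnik's word-to-class measure is usually written as follows. The notation here (predicate p, class c, relation R) is chosen to match the A(P, c, R) score used in these slides and may differ in detail from the original presentation. The selectional preference strength of a predicate is the relative entropy between the class distribution in its argument slot and the prior class distribution, and the selectional association of a class is that class's share of the strength:

```latex
S_R(p) = \sum_{c} P(c \mid p, R)\,\log\frac{P(c \mid p, R)}{P(c)}
\qquad
A(p, c, R) = \frac{1}{S_R(p)}\; P(c \mid p, R)\,\log\frac{P(c \mid p, R)}{P(c)}
```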
Using Selectional Preferences for WSD
• Algorithm:
  • Let N be a noun that stands in relationship R to predicate P. Let s1…sk be its possible senses.
  • For i from 1 to k, compute:
    • Ci = {c | c is an ancestor of si}
    • Ai = max over c in Ci of A(P, c, R)
  • Ai is the score for sense i. Select the sense with the highest score.
• For example: "letter" has 3 senses in WordNet (written message; varsity letter; alphabetic character) and belongs to 19 classes in all.
  • Suppose we have the predicate “write”. For each sense, calculate a score by measuring the association of “write” and direct object with each ancestor of that sense.
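The scoring loop can be sketched as below. Everything numeric here is invented: the ancestor sets are not the real WordNet hierarchy for "letter", and the association scores are placeholders rather than values estimated from a corpus. Only the control flow (a maximum over ancestors, then the best-scoring sense) follows the algorithm above.

```python
# Hypothetical ancestor classes per sense of "letter" (NOT the real WordNet hierarchy).
ANCESTORS = {
    "written message": {"writing", "communication", "abstraction"},
    "varsity letter": {"award", "symbol", "abstraction"},
    "alphabetic character": {"character", "symbol", "abstraction"},
}

# Hypothetical association scores A(P, c, R) for P = "write", R = direct object.
ASSOCIATION = {
    ("write", "obj", "writing"): 3.0,
    ("write", "obj", "communication"): 2.0,
    ("write", "obj", "character"): 1.5,
    ("write", "obj", "symbol"): 0.5,
    ("write", "obj", "abstraction"): 0.2,
    ("write", "obj", "award"): 0.1,
}

def best_sense(predicate, relation, ancestors_by_sense, association):
    """Score each sense by its best-associated ancestor class; return the winner."""
    scores = {
        sense: max(association.get((predicate, relation, c), 0.0) for c in classes)
        for sense, classes in ancestors_by_sense.items()
    }
    return max(scores, key=scores.get), scores

sense, scores = best_sense("write", "obj", ANCESTORS, ASSOCIATION)
print(sense, scores)
```

With these placeholder numbers the "written message" sense wins, because one of its ancestors has the strongest association with being the direct object of "write".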