Unsupervised Word Sense Disambiguation
REU, Summer 2009
Word Sense Disambiguation
E.g., “The soldiers drove the tank.” Which sense of “tank” is meant?
• large vessel for holding gases or liquids
• armored combat vehicle (the intended sense here)
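Context Knowledge Base
Example sentences: “Many companies hire computer programmers” + “Computer programmers write software”
Dependency trees (head → dependents):
• hire → company, programmer; company → many; programmer → computer
• write → programmer, software; programmer → computer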
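Context Knowledge Base
Result of merging the dependency trees (weights in parentheses):
• hire → company (1), programmer (1)
• write → programmer (1), software (1)
• company → many (1)
• programmer → computer (2)
Weights are the number of instances of each dependency relation found.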
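The merge step amounts to counting how often each (head, dependent) pair occurs across all parsed sentences. Below is a minimal sketch, assuming each parse has already been converted into a list of (head, dependent) lemma pairs; the `build_knowledge_base` name and that representation are illustrative, not taken from the project, and no Minipar interface is shown.

```python
from collections import Counter

# Assumed representation: a dependency relation is a (head, dependent) pair
# of lemmas. Minipar output would first have to be converted into this form.
Relation = tuple[str, str]


def build_knowledge_base(parsed_sentences: list[list[Relation]]) -> Counter:
    """Merge per-sentence dependency trees. The weight of each relation is
    the number of times it was observed across all sentences."""
    kb: Counter = Counter()
    for relations in parsed_sentences:
        kb.update(relations)  # each (head, dependent) pair counts once per occurrence
    return kb


# Hand-built trees for the two example sentences.
sentence1 = [("hire", "company"), ("hire", "programmer"),
             ("company", "many"), ("programmer", "computer")]
sentence2 = [("write", "programmer"), ("write", "software"),
             ("programmer", "computer")]

kb = build_knowledge_base([sentence1, sentence2])
print(kb[("programmer", "computer")])  # 2
print(kb[("hire", "company")])         # 1
```

Using a `Counter` keyed by the pair keeps lookups of a single relation's weight constant-time, which matters once the knowledge base is built from a large corpus.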
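WSD Algorithm
Parse the original sentence using Minipar to get a weighted dependency tree.
E.g., “A large software company hires computer programmers.” (to-be-disambiguated word: company)
• Tree: hire → company, programmer; company → software, large; programmer → computer
• Weights: hire 1, software 1, large 1, programmer 0.5, computer 0.33
Each weight is the inverse of the word's distance (number of dependency edges) from the to-be-disambiguated word.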
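The example weights (1, 0.5, 0.33) equal 1/d, where d is the number of dependency edges between a word and the to-be-disambiguated word. A minimal sketch of that weighting, assuming the tree is given as (head, dependent) pairs and treated as undirected for distance purposes; `inverse_distance_weights` is a hypothetical helper name.

```python
from collections import deque


def inverse_distance_weights(edges, target):
    """Treat the dependency tree as undirected, run a breadth-first search
    from the target word, and return 1/d for every word at distance d >= 1."""
    neighbors = {}
    for head, dep in edges:
        neighbors.setdefault(head, set()).add(dep)
        neighbors.setdefault(dep, set()).add(head)
    dist = {target: 0}
    queue = deque([target])
    while queue:
        word = queue.popleft()
        for nxt in neighbors.get(word, ()):
            if nxt not in dist:  # first visit is the shortest distance in BFS
                dist[nxt] = dist[word] + 1
                queue.append(nxt)
    return {w: 1 / d for w, d in dist.items() if d > 0}


# "A large software company hires computer programmers."
edges = [("hire", "company"), ("hire", "programmer"),
         ("company", "software"), ("company", "large"),
         ("programmer", "computer")]
weights = inverse_distance_weights(edges, "company")
print({w: round(x, 2) for w, x in sorted(weights.items())})
# {'computer': 0.33, 'hire': 1.0, 'large': 1.0, 'programmer': 0.5, 'software': 1.0}
```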
WSD Algorithm
Parse each gloss of the to-be-disambiguated word to get weighted dependency trees.
• Gloss 1: “an institution created to conduct business” (tree: create → institution, conduct; conduct → business)
• Gloss 2: “a small military unit” (tree: unit → small, military)
WSD Algorithm
For each word in a gloss tree, look up that word's dependent words in the context knowledge base and check whether any of them match words in the original sentence. In other words, we are looking for context clues that disambiguate the word. A score is computed from the weights of the matching dependency relations in the knowledge base and the weights of the to-be-disambiguated word's dependent words in the original sentence. The more matches found, the higher the score. The gloss with the highest score is selected as the correct sense of the word.
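The slides do not give the exact scoring formula, so the sketch below assumes one plausible reading: for every knowledge-base relation (g, d) where g is a gloss word and d also appears in the sentence, add the relation's count multiplied by d's sentence weight. The knowledge-base counts are invented toy values, and `score_gloss` and `disambiguate` are hypothetical names.

```python
def score_gloss(gloss_words, kb, sentence_weights):
    """Hypothetical scoring rule: sum kb_count * sentence_weight over every
    knowledge-base relation (g, d) where g is a gloss word and d also occurs
    in the sentence around the ambiguous word."""
    score = 0.0
    for (head, dep), count in kb.items():
        if head in gloss_words and dep in sentence_weights:
            score += count * sentence_weights[dep]
    return score


def disambiguate(glosses, kb, sentence_weights):
    """Return the index of the highest-scoring gloss (first one on ties)."""
    scores = [score_gloss(g, kb, sentence_weights) for g in glosses]
    return max(range(len(scores)), key=lambda i: scores[i])


# Sentence weights from the "large software company" example (1/distance).
sentence_weights = {"hire": 1.0, "software": 1.0, "large": 1.0,
                    "programmer": 0.5, "computer": 1 / 3}
glosses = [
    {"institution", "create", "conduct", "business"},  # gloss 1
    {"unit", "small", "military"},                     # gloss 2
]
# Invented toy counts, NOT real corpus data.
kb = {("business", "software"): 2, ("institution", "large"): 1,
      ("conduct", "business"): 4, ("unit", "military"): 3}

print(disambiguate(glosses, kb, sentence_weights))  # 0 (gloss 1)
```

Gloss 1 scores 3.0 here (business → software contributes 2, institution → large contributes 1), while gloss 2 finds no knowledge-base dependent that also occurs in the sentence.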
Synonym Matching
If no direct match is found between a gloss word and the dependency relations in the context knowledge base, we can replace the gloss word with one of its synonyms, since synonyms are semantically equivalent words.
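A minimal sketch of the synonym fallback, reusing the same hypothetical per-word scoring rule as above. It assumes a hand-written synonym list (in practice this would come from WordNet), and when several synonyms match it keeps the best-scoring one; that tie-breaking choice is an assumption, not something the slides specify.

```python
def word_score(word, kb, sentence_weights):
    """Score contributed by one gloss word: kb_count * sentence_weight for
    each relation (word, d) whose dependent d also occurs in the sentence."""
    return sum(count * sentence_weights[dep]
               for (head, dep), count in kb.items()
               if head == word and dep in sentence_weights)


def word_score_with_synonyms(word, kb, sentence_weights, synonyms):
    """Use the gloss word itself if it matches; otherwise fall back to the
    best-scoring synonym (0.0 if no synonym matches either)."""
    direct = word_score(word, kb, sentence_weights)
    if direct > 0:
        return direct
    return max((word_score(s, kb, sentence_weights)
                for s in synonyms.get(word, [])), default=0.0)


sentence_weights = {"software": 1.0, "large": 1.0}
kb = {("organization", "software"): 2}                         # invented toy count
synonyms = {"institution": ["establishment", "organization"]}  # toy synonym list

print(word_score_with_synonyms("institution", kb, sentence_weights, synonyms))  # 2.0
```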
Hypernym/Hyponym Matching
• Extract hypernyms and hyponyms of words from the WordNet database.
• Store these in a data structure.
• Strategies:
  • use all “levels”
  • use only levels close to the original word
  • apply the above strategies to synonym matching as well
E.g., animal → mammal → dog → poodle (each word is a hyponym of the one before it)
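The “all levels” versus “only nearby levels” strategies can be sketched as a walk over an is-a taxonomy with an optional depth limit. This assumes a toy single-parent taxonomy built from the slide's example; real WordNet synsets can have several hypernyms (NLTK exposes them through `Synset.hypernyms()` and `Synset.hyponyms()`), so the dictionary here is a simplification.

```python
def hypernyms(word, parent, max_levels=None):
    """Walk up the is-a chain. max_levels=None uses all levels; a small
    number keeps only the levels closest to the original word."""
    result = []
    current = parent.get(word)
    while current is not None and (max_levels is None or len(result) < max_levels):
        result.append(current)
        current = parent.get(current)
    return result


def hyponyms(word, parent, max_levels=None):
    """Collect descendants level by level, up to max_levels levels down."""
    children = {}
    for child, p in parent.items():
        children.setdefault(p, []).append(child)
    result, frontier, depth = [], [word], 0
    while frontier and (max_levels is None or depth < max_levels):
        frontier = [c for w in frontier for c in children.get(w, [])]
        result.extend(frontier)
        depth += 1
    return result


# Toy single-parent taxonomy: poodle is-a dog is-a mammal is-a animal.
parent = {"poodle": "dog", "dog": "mammal", "mammal": "animal"}
print(hypernyms("poodle", parent))                # ['dog', 'mammal', 'animal']
print(hypernyms("poodle", parent, max_levels=1))  # ['dog']
print(hyponyms("animal", parent))                 # ['mammal', 'dog', 'poodle']
```

Applying the same functions to each synonym of a gloss word covers the third strategy.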
Word Similarity
• Use the WordNet::Similarity Perl module to calculate a “similarity score” between a gloss word and the dependent words in the knowledge base.
• The most similar word found is treated as the closest thing to an actual match.
E.g., WordNet::Similarity similarity scores: (dog, animal) 0.780; (dog, desk) 0.162
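The selection step only needs some similarity function. Below is a sketch in Python (not the Perl module itself), with two stand-ins: a lookup of the two scores quoted above, and a toy path-length measure of the form 1 / (1 + edges on the shortest is-a path) over a hypothetical single-parent taxonomy. The toy measure will not reproduce the slide's numbers; it only shows the shape of such a measure. `most_similar` and `path_similarity` are hypothetical names.

```python
def most_similar(gloss_word, kb_words, similarity):
    """Return (word, score) for the knowledge-base word most similar to
    gloss_word, or None if there are no candidates."""
    scored = [(w, similarity(gloss_word, w)) for w in kb_words]
    return max(scored, key=lambda pair: pair[1], default=None)


# Scores quoted on the slide (computed there with WordNet::Similarity).
SLIDE_SCORES = {("dog", "animal"): 0.780, ("dog", "desk"): 0.162}


def slide_similarity(a, b):
    return SLIDE_SCORES.get((a, b), SLIDE_SCORES.get((b, a), 0.0))


def path_similarity(a, b, parent):
    """Toy stand-in: 1 / (1 + edges between a and b via their lowest common
    ancestor) in a single-parent taxonomy; 0.0 if they share no ancestor."""
    def chain(w):
        out = [w]
        while w in parent:
            w = parent[w]
            out.append(w)
        return out
    ca, cb = chain(a), chain(b)
    for i, node in enumerate(ca):
        if node in cb:  # first shared node is the lowest common ancestor
            return 1 / (1 + i + cb.index(node))
    return 0.0


print(most_similar("dog", ["desk", "animal"], slide_similarity))  # ('animal', 0.78)

parent = {"poodle": "dog", "dog": "mammal", "mammal": "animal"}
print(path_similarity("dog", "animal", parent))  # 0.3333333333333333
```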