A Method for WSD on Unrestricted Text • Authors: Rada Mihalcea and Dan Moldovan • Presenter: Marian Olteanu
Introduction • WSD (word sense disambiguation) methods: • Information in machine-readable dictionaries (MRDs) • Supervised training (information from a sense-disambiguated corpus) • Unsupervised training (information from a raw corpus) • Hybrid methods
Approach • Unsupervised learning • Tag all content words (nouns, verbs, adjectives, adverbs) • Use the Web as a corpus (AltaVista search engine) • Use semantic density (computed with WordNet)
Algorithm • Use word pairs (one word in the context of the other) • Verb-noun pairs (syntactically linked) • E.g.: investigate report • WordNet synsets for the noun “report”: {report#1, study}, {report#2, news report, story, account, write up}
Algorithm (cont.) • Search for “investigate report” and “investigate study” – first sense • Search for “investigate report”, “investigate news report”, …, “investigate write up” – second sense • Rank the noun’s senses by hit count
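The ranking step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `rank_noun_senses` and `toy_hit_count` are hypothetical names, the hit counts are invented rather than real search results, and summing the per-phrase counts is one possible way of combining them (the slides do not say how the individual counts are merged).

```python
from typing import Callable, Dict, List

def rank_noun_senses(
    verb: str,
    synsets: Dict[int, List[str]],
    hit_count: Callable[[str], int],
) -> List[int]:
    """Return the noun's sense numbers ordered by total hit count, highest first."""
    totals: Dict[int, int] = {}
    for sense, words in synsets.items():
        # One quoted phrase query per synset member, e.g. "investigate study".
        totals[sense] = sum(hit_count(f'"{verb} {word}"') for word in words)
    # Highest total first; ties go to the lower sense number.
    return sorted(totals, key=lambda s: (-totals[s], s))

# Synsets for the noun "report", as listed on the slide.
SYNSETS = {
    1: ["report", "study"],
    2: ["report", "news report", "story", "account", "write up"],
}

# Invented counts for illustration only (assumption, not real search data).
TOY_HITS = {
    '"investigate report"': 50,
    '"investigate study"': 30,
    '"investigate news report"': 2,
    '"investigate story"': 5,
    '"investigate account"': 4,
    '"investigate write up"': 1,
}

def toy_hit_count(query: str) -> int:
    """Stand-in for a search engine call; unknown queries count as zero hits."""
    return TOY_HITS.get(query, 0)

ranking = rank_noun_senses("investigate", SYNSETS, toy_hit_count)
```

With these toy counts, sense 1 totals 80 hits and sense 2 totals 62, so `ranking` lists sense 1 first. In practice `toy_hit_count` would be replaced by a call to whatever search service is available.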
Algorithm (cont.) • Repeat for verbs • Use both quoted phrases and the NEAR operator – similar results • Keep the top 4 senses for nouns (N) and verbs (V), and the top 2 for adjectives (J) and adverbs (R)
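The two query forms and the per-part-of-speech cutoff can be sketched as follows. The function names are hypothetical, and the exact query syntax (quoted phrases, an infix `NEAR`) is an assumption about the search engine's advanced query language rather than something the slides spell out.

```python
from typing import List

# Number of top-ranked senses kept per part of speech, as given on the slide.
KEEP = {"N": 4, "V": 4, "J": 2, "R": 2}

def phrase_query(verb: str, noun: str) -> str:
    """Exact-phrase form: the two words must appear adjacent."""
    return f'"{verb} {noun}"'

def near_query(verb: str, noun: str) -> str:
    """Proximity form: the two words must appear close to each other."""
    # Multi-word terms such as "write up" are quoted so they stay together.
    term = f'"{noun}"' if " " in noun else noun
    return f"{verb} NEAR {term}"

def keep_top_senses(ranked: List[int], pos: str) -> List[int]:
    """Truncate a ranked sense list to the cutoff for the given part of speech."""
    return ranked[: KEEP[pos]]
```

For example, `keep_top_senses([3, 1, 2, 5, 4], "N")` keeps the first four senses, while the same call with `"J"` keeps only the first two.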
Algorithm – step 2 • Compute conceptual density • Apply only to noun-verb pairs (because WordNet doesn’t have adequate hierarchies for adjectives and adverbs) • Computed only between the senses kept at step 1 • Count matches between the nouns in the sub-glosses of the verb and the noun plus all of its hyponyms
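The matching step can be sketched as a set intersection. This is a minimal illustration, assuming the gloss nouns for each verb sense and the hyponym set for each noun sense have already been extracted from WordNet; the word sets below are invented, the function names are hypothetical, and the sketch stops at raw match counts without applying any normalization.

```python
from typing import Dict, Set, Tuple

def common_noun_count(verb_gloss_nouns: Set[str], noun_hierarchy: Set[str]) -> int:
    """Count the nouns shared by a verb sense's glosses and a noun sense's hierarchy."""
    return len(verb_gloss_nouns & noun_hierarchy)

def score_pairs(
    verb_senses: Dict[int, Set[str]],
    noun_senses: Dict[int, Set[str]],
) -> Dict[Tuple[int, int], int]:
    """Score every (verb sense, noun sense) combination kept after step 1."""
    return {
        (v, n): common_noun_count(v_nouns, n_nouns)
        for v, v_nouns in verb_senses.items()
        for n, n_nouns in noun_senses.items()
    }

# Invented word sets for illustration only (assumption, not WordNet data).
VERB_SENSES = {1: {"text", "document", "law"}, 2: {"opinion"}}
NOUN_SENSES = {1: {"law", "statute", "document"}, 2: {"law", "principle"}}

scores = score_pairs(VERB_SENSES, NOUN_SENSES)
best_pair = max(scores, key=scores.get)
```

With these toy sets, the pair (verb sense 1, noun sense 1) shares two nouns and scores highest.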
Algorithm – step 2 (cont.) • Formula: • Presenter’s comment: I find it flawed (the log part) • Example: revise law:
Evaluation • SemCor • Step 1: • Step 2: