WSD for Applications
Bill Dolan
SenseEval 2004
Where is WSD useful?
• Lots of work in the field, but still no clear answer
• Here, WSD = classical, dictionary-sense resolution
Intuitive Motivations
• Automates something we already do with dictionaries
• Many applications seem to require WSD:
  • Information retrieval/question answering
  • Cross-language information retrieval
  • Information extraction
  • Proofing tools, e.g. synonym replacement
  • Translation
Pragmatic Motivations
• Splitting off WSD yields a pleasing division of the NLP problem space:
  • manageable in size
  • clear success metrics
  • readily available training data, both annotated and unannotated
But where are the applications?
• Why is it so hard to find a convincing app?
• Hopeful answer: the quality bar just hasn’t been met yet
• But even experimentally, there is little or no evidence that WSD helps any application
• Alternatively: maybe we’re trying to automate the wrong task
• Then what is the right task?
An Application-centric view
• What do apps actually need?
  • Information retrieval/question answering
  • Cross-language information retrieval
  • Information extraction
  • Proofing tools, e.g. synonym replacement
  • Translation
• What they need is not a sense, a cluster of related words, etc. Instead:
  • The ability to map one string into another that’s superficially distinct
  • Regardless of length or language
  • Paraphrase
Question Answering
• The genome of the fungal pathogen that causes Sudden Oak Death has been sequenced by US scientists
• Researchers announced Thursday they've completed the genetic blueprint of the blight-causing culprit responsible for sudden oak death
• Scientists have figured out the complete genetic code of a virulent pathogen that has killed tens of thousands of California native oaks
• The East Bay-based Joint Genome Institute said Thursday it has unraveled the genetic blueprint for the diseases that cause the sudden death of oak trees
Information Extraction
• The genome of the fungal pathogen that causes Sudden Oak Death has been sequenced by US scientists
• Researchers announced Thursday they've completed the genetic blueprint of the blight-causing culprit responsible for sudden oak death
• Scientists have figured out the complete genetic code of a virulent pathogen that has killed tens of thousands of California native oaks
• The East Bay-based Joint Genome Institute said Thursday it has unraveled the genetic blueprint for the diseases that cause the sudden death of oak trees
Cross-lingual Information Retrieval
• The genome of the fungal pathogen that causes Sudden Oak Death has been sequenced by US scientists
• Researchers announced Thursday they've completed the genetic blueprint of the blight-causing culprit responsible for sudden oak death
• Scientists have figured out the complete genetic code of a virulent pathogen that has killed tens of thousands of California native oaks
• The East Bay-based Joint Genome Institute said Thursday it has unraveled the genetic blueprint for the diseases that cause the sudden death of oak trees
Proofing: rewriting tool
• The genome of the fungal pathogen that causes Sudden Oak Death has been sequenced by US scientists
• Researchers announced Thursday they've completed the genetic blueprint of the blight-causing culprit responsible for sudden oak death
• Scientists have figured out the complete genetic code of a virulent pathogen that has killed tens of thousands of California native oaks
• The East Bay-based Joint Genome Institute said Thursday it has unraveled the genetic blueprint for the diseases that cause the sudden death of oak trees
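
To make "superficially distinct" concrete, a minimal Python sketch can measure how little vocabulary the four Sudden Oak Death sentences share. It assumes a crude letters-and-apostrophes tokenizer and a small hand-picked stopword list (both choices made for this sketch, not taken from the talk) and scores every pair of sentences by the Jaccard overlap of their content words.

```python
import re
from itertools import combinations

# The four Sudden Oak Death sentences from the slides.
SENTENCES = [
    "The genome of the fungal pathogen that causes Sudden Oak Death has been sequenced by US scientists",
    "Researchers announced Thursday they've completed the genetic blueprint of the blight-causing culprit responsible for sudden oak death",
    "Scientists have figured out the complete genetic code of a virulent pathogen that has killed tens of thousands of California native oaks",
    "The East Bay-based Joint Genome Institute said Thursday it has unraveled the genetic blueprint for the diseases that cause the sudden death of oak trees",
]

# Assumption: a tiny stopword list chosen only for this illustration.
STOPWORDS = {"the", "of", "a", "that", "has", "have", "been", "by", "it",
             "for", "out", "they've"}


def content_words(sentence: str) -> set[str]:
    """Lowercase, keep runs of letters/apostrophes, and drop stopwords."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return {t for t in tokens if t not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """Intersection size over union size; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


for i, j in combinations(range(len(SENTENCES)), 2):
    a, b = content_words(SENTENCES[i]), content_words(SENTENCES[j])
    print(f"{i + 1}-{j + 1}: jaccard={jaccard(a, b):.2f} shared={sorted(a & b)}")
```

Even the closest pair (the second and fourth sentences) shares only a quarter of its content words, and because the sketch does no stemming, "oak"/"oaks" and "cause"/"causes" never match at all. Matching on words or word senses alone would have a hard time recognizing that these sentences report the same event.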
A different take on the problem
• What’s missing is a basic enabling technology:
  • a paraphrase identification/generation capability
• The applications for WSD that have been suggested over the years really need more general paraphrase identification/generation skills
• Resolving lexical associations is just one aspect of this
• The problem begins to look more like an MT problem:
  • map one chunk of text to another, similar or not
• Not clear that explicit WSD is useful
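
Mapping chunks rather than single words can be sketched as greedy, left-to-right longest-match substitution against a phrase table. The table below is hypothetical and hand-written for illustration; in practice such entries would typically be learned from data, and a real system would score competing rewrites instead of taking the first match it finds.

```python
# Hypothetical phrase table, hand-written for this sketch; keys and values are
# token tuples so that multi-word chunks can be rewritten as units.
PHRASE_TABLE = {
    ("genetic", "blueprint"): ("genome",),
    ("figured", "out"): ("determined",),
    ("completed",): ("finished",),
}
MAX_LEN = max(len(src) for src in PHRASE_TABLE)


def paraphrase(sentence: str) -> str:
    """Rewrite a sentence by greedy, left-to-right longest-match substitution."""
    tokens = sentence.lower().split()
    out: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate chunk starting at position i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            chunk = tuple(tokens[i:i + n])
            if chunk in PHRASE_TABLE:
                out.extend(PHRASE_TABLE[chunk])
                i += n
                break
        else:
            # No table entry starts here: copy the word through unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


print(paraphrase("Scientists have figured out the complete genetic blueprint"))
# prints: scientists have determined the complete genome
```

The rewrite "genetic blueprint" to "genome" exists only as a two-word unit. No single-word substitution for "genetic" or "blueprint" produces it, which is why the sketch matches sequences rather than individual words.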
Some Apps
• Machine Translation
  • Data-driven techniques predominate and work pretty well
  • No explicit WSD, just learned associations between bilingual pairings
  • Lexical mappings learned through statistical association: not perfect, but given the right data, pretty good
  • Different language pairs require different sense breakdowns
  • Paraphrase and MT are the same problem
• Cross-language IR
  • What else but MT?
• Proofing tools, e.g. thesaurus-level replacements
  • But often not terribly useful; as any writer knows, there’s usually no good synonym, and a complete rewrite is necessary
• Question Answering/IR
  • Map a query to a semantically similar but potentially formally distinct piece of text
• For all of these apps, the problem is less individual words than whole sequences
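
The "statistical association" behind data-driven lexical mappings can be illustrated with the Dice coefficient over aligned sentence pairs. Dice is just one simple association measure among several, and the talk does not say which one any particular system uses. The five English-French pairs and the stopword lists are invented toy data for this sketch.

```python
from collections import Counter
from itertools import product

# Hypothetical toy English-French sentence pairs, invented for this sketch.
PARALLEL = [
    ("the bank of the river", "la rive du fleuve"),
    ("the bank of the stream", "la rive du ruisseau"),
    ("the bank closed", "la banque a fermé"),
    ("money in the bank", "argent à la banque"),
    ("the river is wide", "le fleuve est large"),
]

# Function words are dropped; otherwise "the"/"la" co-occur with everything.
EN_STOP = {"the", "of", "in", "is"}
FR_STOP = {"la", "le", "du", "a", "à", "est"}


def dice_table(pairs):
    """Dice(e, f) = 2 * C(e, f) / (C(e) + C(f)), counting sentence pairs."""
    en_count, fr_count, joint = Counter(), Counter(), Counter()
    for en, fr in pairs:
        e_words = set(en.split()) - EN_STOP
        f_words = set(fr.split()) - FR_STOP
        en_count.update(e_words)
        fr_count.update(f_words)
        joint.update(product(e_words, f_words))  # each (e, f) once per pair
    return {(e, f): 2 * c / (en_count[e] + fr_count[f])
            for (e, f), c in joint.items()}


def best_translations(table, word, k=2):
    """Return the k French words most strongly associated with an English word."""
    scored = [(f, s) for (e, f), s in table.items() if e == word]
    return sorted(scored, key=lambda fs: (-fs[1], fs[0]))[:k]


table = dice_table(PARALLEL)
for word in ("bank", "river"):
    print(word, [(f, round(s, 2)) for f, s in best_translations(table, word)])
# prints: bank [('banque', 0.67), ('rive', 0.67)]
# prints: river [('fleuve', 1.0), ('large', 0.67)]
```

Both "banque" and "rive" come out on top for "bank", so the financial/riverside split emerges from the bilingual data rather than from a sense inventory. "river" finds "fleuve" but also picks up the spurious "large", a small-scale version of "not perfect, but given the right data, pretty good".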
Direction?
• The applications that have been suggested for WSD are all just aspects of the larger paraphrase problem
• Even MT is a paraphrase problem, though a bit more extreme than the monolingual case
• Focus on the broader paraphrase problem, rather than on individual words