Applications of SuperTagging
Raman Chandrasekar
Industry Day
SuperTagging: Applications
• Information Filtering: SuperTagging used to increase retrieval precision
• Text Simplification: SuperTagging used to induce rules for text simplification
• Word Sense Disambiguation
• Machine Translation
• Information Extraction
• Noun Phrase Chunking
Glean: Document Filtering
• Problem: to access only relevant information
• Current approaches:
  • Information retrieval (IR) systems use keywords, Boolean operators, etc.
  • Problems due to synonymy and polysemy
  • Most Web search engines tend to
    • maximize recall (coverage)
    • emphasize speed of retrieval
    • but sacrifice precision ('accuracy' of result)
• Our approach: use syntactic information to increase precision.
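The recall/precision trade-off above can be made concrete with a small sketch. The function name and the document ids are hypothetical, chosen only for illustration; they do not come from the Glean system.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved items against the truly relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    # Precision: share of retrieved items that are relevant.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant items that were retrieved.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical document ids, for illustration only.
p, r = precision_recall(retrieved={1, 2, 3, 4}, relevant={2, 3, 5})
```

A search engine that returns many documents tends to raise recall while lowering precision; a post-retrieval filter aims to discard the irrelevant hits without losing the relevant ones.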
Glean: The Basics
• Underlying ideas:
  • meaning of a word decided by how it is used
  • much information latent in text
  • good to use post-processing filter model
• Use SuperTagging to get syntactic labeling
• Part-of-Speech tags are not as useful [RIAO '97]
Glean: Query by Example
• Input:
  • Search Engine Query Expression: +work +IRCS +"natural language processing" +learning
  • Concept/word of Interest: work
  • Prototypical usage:
    She has been working on problems related to aspect.
    He works in the area of Information Retrieval.
    She works on statistical mechanics.
    Recently he has been working in the area of quantum computing.
• Interpretation:
  • get all documents satisfying the query expression,
  • check if they contain sentences with a variant of work,
  • check that these are 'relevant', i.e. structurally similar to the context around work in the prototypical sentences.
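The interpretation steps can be sketched as a post-processing filter over documents the search engine has already returned. This is a minimal sketch, not the Glean implementation: `sentences_with_variant`, `glean_filter`, and the `is_relevant` predicate are hypothetical names, the prefix test for "variant of work" is deliberately crude (it would also accept "workshop"), and the regular expression used for relevance is only a stand-in for the structural check on supertags.

```python
import re

def sentences_with_variant(text, stem):
    """Yield sentences containing a word that starts with `stem` (crude variant test)."""
    variant = re.compile(r"\b" + re.escape(stem) + r"\w*", re.IGNORECASE)
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if variant.search(sentence):
            yield sentence

def glean_filter(documents, stem, is_relevant):
    """Keep documents with at least one sentence that uses `stem` in a relevant way."""
    kept = []
    for doc in documents:
        if any(is_relevant(s) for s in sentences_with_variant(doc, stem)):
            kept.append(doc)
    return kept

# Stand-in relevance test (assumption): 'work...' directly followed by 'on' or 'in'.
def is_relevant(sentence):
    return re.search(r"\bwork\w*\s+(on|in)\b", sentence, re.IGNORECASE) is not None

docs = [
    "He works in the area of Information Retrieval.",
    "The clock does not work. It is broken.",
]
kept = glean_filter(docs, "work", is_relevant)
```

Here only the first document survives, since the second uses "work" in a different structural context.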
Glean: Inducing a Pattern
• Prototypical usage:
  She has been working on problems related to aspect.
• Chunked, supertagged version:
  She/A_NXN has/B_Vvx~been/B_Vvx~working/A_nx0V on/B_vxPnx problems/A_NXN related/A_nx1V to/B_vxPnx aspect/A_NXN ./B_sPU
• Context around word of interest:
  She/A_NXN has/B_Vvx~been/B_Vvx~working/A_nx0V on/B_vxPnx …
• Generalized pattern:
  */NP*working/A_nx0V*on/B_vxPnx
• This pattern also matches, for example:
  "We are also working on type systems for data and knowledge bases."
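One way to read the generalized pattern is as a sequence of anchors (any NP, then working/A_nx0V, then on/B_vxPnx) that must occur in order, with arbitrary material in between. The sketch below follows that reading. It is an assumption about the pattern semantics, not the Glean matcher; the function names and the mapping of NP to the A_NXN supertag are likewise assumptions.

```python
def parse_supertagged(line):
    """Split 'word/TAG' tokens; '~' joins tokens inside a chunk in the slide's notation."""
    tokens = []
    for item in line.replace("~", " ").split():
        word, tag = item.rsplit("/", 1)
        tokens.append((word, tag))
    return tokens

# Assumption: the pattern's NP class corresponds to the A_NXN supertag.
TAG_CLASSES = {"NP": {"A_NXN"}}

def matches(pattern, tokens):
    """True if the pattern's anchors occur in order in tokens, with arbitrary gaps."""
    position = 0
    for want_word, want_tag in pattern:
        allowed = TAG_CLASSES.get(want_tag, {want_tag})
        while position < len(tokens):
            word, tag = tokens[position]
            position += 1
            if tag in allowed and want_word in (None, word.lower()):
                break  # anchor found; move on to the next one
        else:
            return False  # ran out of tokens before finding this anchor
    return True

# (word, tag) anchors; None as the word means "any word".
PATTERN = [(None, "NP"), ("working", "A_nx0V"), ("on", "B_vxPnx")]

tagged = ("She/A_NXN has/B_Vvx~been/B_Vvx~working/A_nx0V on/B_vxPnx "
          "problems/A_NXN related/A_nx1V to/B_vxPnx aspect/A_NXN ./B_sPU")
tokens = parse_supertagged(tagged)
found = matches(PATTERN, tokens)
```

Matching on the literal word "working" keeps the sketch close to the slide; a fuller matcher would presumably accept other variants of work as well.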
The Glean system
• Implemented (mainly in Perl) with
  • HTML form interfaces, with a variety of options
  • a SuperTagger server
• Results
  • 97% recall and 88% precision in filtering out irrelevant material in a small test.
  • Large-scale evaluation in progress.
• Demo available
Research collaboration between the National Centre for Software Technology, Bombay, and the Institute for Research in Cognitive Science & Center for Advanced Study of India, University of Pennsylvania.
SuperTagging: Benefits
• Right level of granularity
• Rich tag set, suitable for a variety of applications
• Accurate: over 92% accuracy
• Fast: 31-57 words/sec (interpreted Perl)
• Can be easily retrained, if required
• Many more applications possible
Automatic Text Simplification
• Basic Idea: To process complex text
  • create better tools, or
  • simplify the text to be processed!
• Initial Prototype of Simplification System (Bombay)
  • Based on Finite State Grammars
  • Rules on strings to map complex sentences to simpler ones
• To simplify sentences of the form:
  Talwinder Singh, who masterminded the Air India sabotage, was killed in a shoot-out with police ...
• we use a rule such as:
  Segment1/NP, who Segment2, Segment3 => Segment1 Segment3. Segment1 Segment2.
• to get:
  Talwinder Singh was killed in a shoot-out with police ...
  Talwinder Singh masterminded the Air India sabotage.
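The string rule above can be approximated with a regular expression. This is a minimal sketch under stated assumptions, not the prototype's finite state grammar: the name `simplify` and the regex are hypothetical, and the example sentence is cut at "police", where the slide elides the rest. Working on plain strings, the rule cannot check that Segment1 is really a noun phrase.

```python
import re

# Assumed string form of the slide's rule:
#   Segment1, who Segment2, Segment3  =>  Segment1 Segment3. Segment1 Segment2.
RELATIVE_CLAUSE = re.compile(
    r"^(?P<seg1>[^,]+), who (?P<seg2>[^,]+), (?P<seg3>.+?)\.?$"
)

def simplify(sentence):
    """Split off a 'who' relative clause; return the sentence unchanged if the rule does not apply."""
    match = RELATIVE_CLAUSE.match(sentence.strip())
    if match is None:
        return sentence
    seg1, seg2, seg3 = match.group("seg1", "seg2", "seg3")
    return f"{seg1} {seg3}. {seg1} {seg2}."

simplified = simplify(
    "Talwinder Singh, who masterminded the Air India sabotage, "
    "was killed in a shoot-out with police."
)
```

Sentences without the comma-delimited "who" clause pass through unchanged, so the rule can be applied to every sentence of a text.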
Automatic Text Simplification
• SuperTagging is better [Coling96]
  • Constituent spans easier to identify
  • Simplification rules more expressive
• Rules can now be induced automatically [KBCS96, KBS]
  • Data: Parallel (aligned) corpus of complex and simple text
  • Induction Procedure:
    • Data tagged using SuperTagging and LDA (Lightweight Dependency Analyzer)
    • Aligned labeled trees for complex and simple sentences compared
    • Tree-to-trees transformations identified
    • Reduced to a normal form to get simplification rules.
Noun-Phrase Chunking
• Variety of approaches (Hindle, Marcus & Ramshaw, Voutilainen) for Noun-Phrase Chunking
• Depending on application, we may need
  • maximal noun phrases
  • basal noun phrases
  • all derivable noun phrases
• SuperTagging provides mechanisms for application-specific noun phrase chunking
• Can form part of (or basis for) a variety of tools