IE approaches • Traditional IE (from NLP and CL) • Using syntactic and semantic constraints • Wrappers (independently developed for the WWW) • Using delimiter-based extraction patterns • This paper • Soft patterns + IR (PRF) + summarization (sentence retrieval/ranking, MMR) techniques
Unsupervised Learning of Soft Patterns for Generating Definitions from Online News • IE from a QA perspective • Research question: finding definition sentences for terms or person names • Previous approaches: • hand-crafted rules (previous paper) or • supervised learning • Research method: • unsupervised soft patterns + IR + summarization • External tools needed: commercial POS tagger and syntactic chunker (NP, VP)
Soft Patterns • A virtual vector representation (window size 3) • <Slot-w, ……, Slot-2, Slot-1, SCH_TERM, Slot1, Slot2, ……, Slotw : Pa> • Slot: a vector of tokens with their probabilities of occurrence • <(tokeni1, weighti1), (tokeni2, weighti2), ……, (tokenim, weightim) : Sloti> • Token: word, punctuation or syntactic tag (substituted?)
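The slot representation above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide says each slot holds tokens with their probabilities of occurrence, and the sketch assumes those are plain relative frequencies over the training instances. The function name `build_soft_pattern`, the dictionary layout and the tag names such as `DT$` are hypothetical and not taken from the paper.

```python
from collections import Counter, defaultdict

SCH_TERM = "SCH_TERM"  # placeholder token that replaces the search term


def build_soft_pattern(instances, w=3):
    """Estimate per-slot token probabilities from tokenized instances.

    Each instance is a list of tokens containing SCH_TERM once.
    Returns {offset: {token: probability}} for offsets -w..-1 and 1..w.
    Assumption: probabilities are relative frequencies within each slot.
    """
    counts = defaultdict(Counter)
    for tokens in instances:
        centre = tokens.index(SCH_TERM)
        for offset in range(-w, w + 1):
            pos = centre + offset
            if offset == 0 or not 0 <= pos < len(tokens):
                continue  # skip the search term itself and out-of-range slots
            counts[offset][tokens[pos]] += 1

    pattern = {}
    for offset, counter in counts.items():
        total = sum(counter.values())
        pattern[offset] = {tok: n / total for tok, n in counter.items()}
    return pattern


# Hypothetical instances after tagging, chunking and substitution.
instances = [
    ["SCH_TERM", ",", "DT$", "NN", "of"],
    ["SCH_TERM", "is", "DT$", "NN"],
    ["NNP", ",", "SCH_TERM", ",", "DT$", "NN"],
]
pattern = build_soft_pattern(instances)
print(pattern[1])  # token distribution of the slot right after the search term
```

Keeping one distribution per slot, rather than one fixed token, is what makes the pattern "soft": a test sentence can still match partially when only some slots agree.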
Soft Patterns Matching Process • Sentences → tagging, chunking, substitution → Pa instances → probability estimate → soft patterns Pa • Test sentence → tagging, chunking, substitution → S instance: <token-w, ……, token-2, token-1, SCH_TERM, token1, token2, ……, tokenw : S> • Matching: 1) bag-of-words similarity using Naive Bayes 2) sequence fidelity using a bigram model 3) weighting patterns by their overall weight
Soft Patterns Matching • Bag-of-words similarity using Naive Bayes • Sequence fidelity using a bigram model • Where is Pa? • Manual tuning of alpha?
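The two matching components can be illustrated with a minimal Python sketch. The slides do not show the formulas, so everything here is an assumption: the slot score is taken as a product of per-slot token probabilities with a small floor for unseen tokens, the sequence score as an add-one smoothed bigram probability, and alpha as a linear interpolation weight between the two. The function names and the toy data are hypothetical.

```python
from collections import Counter


def slot_likelihood(pattern, window, floor=1e-3):
    """Naive Bayes style score: product of per-slot token probabilities.

    pattern: {offset: {token: probability}}, window: {offset: token}.
    Assumption: unseen tokens receive a small constant floor probability.
    """
    score = 1.0
    for offset, token in window.items():
        score *= pattern.get(offset, {}).get(token, floor)
    return score


def train_bigrams(sequences):
    """Count left-context tokens and adjacent token pairs."""
    contexts, bigrams = Counter(), Counter()
    for seq in sequences:
        contexts.update(seq[:-1])
        bigrams.update(zip(seq, seq[1:]))
    return contexts, bigrams


def sequence_fidelity(seq, contexts, bigrams, vocab_size):
    """Add-one smoothed bigram probability of a token sequence."""
    score = 1.0
    for prev, cur in zip(seq, seq[1:]):
        score *= (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
    return score


def match_score(slot_score, seq_score, alpha=0.5):
    """Assumed combination: linear interpolation controlled by alpha."""
    return (1 - alpha) * slot_score + alpha * seq_score


# Hypothetical pattern and training sequences.
pattern = {1: {",": 0.75, "is": 0.25}, 2: {"DT$": 1.0}}
train = [["SCH_TERM", ",", "DT$"], ["SCH_TERM", "is", "DT$"]]
contexts, bigrams = train_bigrams(train)

test_seq = ["SCH_TERM", ",", "DT$"]
slot = slot_likelihood(pattern, {1: ",", 2: "DT$"})
seq = sequence_fidelity(test_seq, contexts, bigrams, vocab_size=4)
print(match_score(slot, seq, alpha=0.5))
```

The slot score ignores token order, which is why a separate sequence term is needed: two sentences with the same tokens in different positions relative to each other would otherwise be hard to tell apart.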
System Architecture • Search term → IR, anaphora resolution → input relevant sentences → centroid-based ranking → ranked sentences → top n by PRF → SP generation → reranking by pattern matching → matched candidate sentences as definition → redundancy removal: MMR → final sentence selection • Pseudo-relevance feedback or assumption?
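The redundancy-removal step names MMR (maximal marginal relevance). A minimal greedy version is sketched below in Python. It is a generic illustration, not the system's implementation: the Jaccard token overlap used as the similarity function, the trade-off value `lam`, and the relevance scores in the example are all assumptions chosen for the sketch.

```python
def jaccard(a, b):
    """Token-set overlap between two tokenized sentences."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mmr_select(candidates, relevance, k, lam=0.7, sim=jaccard):
    """Greedy MMR: repeatedly pick the candidate that maximizes
    lam * relevance - (1 - lam) * (max similarity to already selected).

    Returns the indices of the selected candidates in selection order.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def mmr(i):
            redundancy = max(
                (sim(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical ranked sentences with relevance scores from an earlier stage.
sentences = [
    ["x", "is", "a", "company"],
    ["x", "is", "a", "company", "too"],
    ["x", "makes", "software"],
]
scores = [0.9, 0.85, 0.6]
print(mmr_select(sentences, scores, k=2, lam=0.5))
```

In this toy input the second sentence is almost a copy of the first, so the lower-scored but novel third sentence is chosen instead.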
Centroid Word Selection • Which sentences are most likely to contain a definition? • Local centroid words (summarization techniques) • For each word, compute its mutual information with the search term
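One way to picture centroid word selection is to score each word by how strongly it co-occurs with the search term at the sentence level. The Python sketch below uses plain pointwise mutual information (PMI) for that purpose; this is a swapped-in standard measure, and the paper's actual weighting formula may differ. The function name, the tie-breaking rule and the toy sentences are hypothetical.

```python
import math
from collections import Counter


def centroid_words(sentences, term, top_n=3):
    """Rank words by sentence-level PMI with the search term.

    PMI(w, t) = log( P(w, t) / (P(w) * P(t)) ), with probabilities
    estimated as fractions of sentences. Words that never co-occur
    with the term are skipped. Ties are broken alphabetically.
    """
    n = len(sentences)
    word_df, co_df = Counter(), Counter()
    term_df = 0
    for tokens in sentences:
        words = set(tokens)
        has_term = term in words
        term_df += has_term
        for w in words - {term}:
            word_df[w] += 1
            if has_term:
                co_df[w] += 1

    scores = {
        w: math.log((co * n) / (word_df[w] * term_df))
        for w, co in co_df.items()
    }
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Hypothetical retrieved sentences for the search term "acme".
sentences = [
    ["acme", "is", "a", "software", "company"],
    ["acme", "sells", "software"],
    ["the", "weather", "is", "mild"],
    ["a", "company", "reported", "earnings"],
]
print(centroid_words(sentences, "acme", top_n=2))
```

Sentences containing many high-scoring centroid words can then be ranked higher as definition candidates. Note that raw PMI tends to favour rare words, so a real system would likely add frequency thresholds or smoothing.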
Summary of the techniques employed • Core: soft pattern generalization and matching • Others: • Heavy use of summarization techniques • MMR for redundancy removal • Sentence ranking/retrieval • Shallow NLP • POS tagging and syntactic chunking
Evaluation for Definition Extraction • Test data: • TREC QA corpus • Online news (heuristics lean toward news text) • Experiment: • Comparison to hand-crafted rules (HCR) and a centroid-based statistical method (baseline) • F5-measure
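Assuming "F5-measure" denotes the standard F-beta measure with beta = 5, which weights recall far more heavily than precision, it can be computed as in the Python sketch below. The precision and recall values in the example are made up for illustration and are not results from the paper.

```python
def f_beta(precision, recall, beta=5.0):
    """Standard F-beta: (1 + b^2) * P * R / (b^2 * P + R).

    beta > 1 favours recall; beta = 1 gives the usual F1.
    Returns 0.0 when both precision and recall are zero.
    """
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Illustrative values only: low precision, moderate recall.
p, r = 0.2, 0.5
print(round(f_beta(p, r, beta=5), 4))  # dominated by recall
print(round(f_beta(p, r, beta=1), 4))  # balanced F1 for comparison
```

With beta = 5 the score stays close to recall, so a system that returns most of the relevant definition content is rewarded even if it also returns some extra material.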
Questions for this paper • Does performance vary with the chunker (NP, VP)? • Manually tuned parameters (alpha, delta)? • Void PRF? • Question selection: seed for pattern generation • Is it “patterns” or just a single pattern? • Arbitrary window size? • Is it really “unsupervised learning”? • Part of the data is used for rule induction • Can SP + PRF really beat HCR?