10 likes | 116 Views
Alyssa at TREC 2006 Statistically-Inspired Q&A. Andreas Merkel 1 Dan Shen 1 Jochen L. Leidner 1,2 Dietrich Klakow 1. lsv_trec_qa@lsv.uni-saarland.de 1) Spoken Language Systems, Saarland University, 66123 Saarbrücken, Germany. 2) Linguit Ltd. Query Construction
E N D
Alyssa at TREC 2006Statistically-Inspired Q&A Andreas Merkel1 Dan Shen1 Jochen L. Leidner1,2 Dietrich Klakow1 lsv_trec_qa@lsv.uni-saarland.de1) Spoken Language Systems, Saarland University, 66123 Saarbrücken, Germany. 2) Linguit Ltd. • Query Construction • Expands the query with the topic multiple times • Including the topic twice is sufficient • Introduction • Information theoretically well-founded approach to open domain QA • Objective: long-term experimental platform • flexibility • modularity • Uses a cascade of: • LM-based document retrieval • LM-based sentence retrieval • maximum entropy based answer extraction over a dependency relation representation • Document Retrieval • Integrates Lemur IR toolkit • Uses Language Model based approach: • Bayesian smoothing with Dirichlet priors • with stemming and no stop-word removal • Fetches 60 relevant documents to get about 90% of answers • Query Analysis • Topic Resolution • a simple anaphora resolution strategy • Expected Answer Type Extraction • NE taxonomy1 • 6 coarse and 50 fine grained NE types • Question Pattern Matching • Maps high frequent questions to classes • 92 questions are classified to question class • Dependency Relation Path Extraction • Dependency Parsing • Extracts dependency relation paths • <question words, path, all question key chunks > • Sentence Retrieval • Integrates LSVLM Language Modeling toolkit • Uses Bayesian smoothing with Dirichlet priors • with stemming and dynamic stop-word reduction • query and document expansion according to query type • removing of query word • Results: Factoid List QT QP2 QP1 AQUAINT Corpus Wikipedia Definition Query Construction Document Retrieval Wiki Passage Retrieval Corref. Resolution Def QA Answer SR 2 SR 1 Google Definition Sentence Annotation • Question Classification & Typing • Uses question types and data by Li and Roth2 • Employs Bayes Classifier • Estimates probabilities using language model toolkit • Explored different smoothing and backing-off techniques • Results: • Answer Extraction & Fusion • Answer Extraction • Linguistic analysis -- NER, Chunking, Dependency Parsing • Chunk-based Surface Text Pattern Matching • Maximum Entropy-based Ranking Model • Correlation of Dependency Relation Path • Answer Fusion • Converts rank to probability using Mandelbrot distribution • Fuses results from sentence retrieval and Wikipedia Dep. AE Pattern AE List QA Aquaint Candidates Wiki Candidates Web Candidates Answer Validation (NIL Judgment) Answer • Results • Performs better than median of participants: • Flexibility of the system architecture: experiment with different approaches systematically • Data driven approach: improve bottlenecks at any point in time • No increase of unsupported answers, unlike other participants • Future Work • Study the impact of the substitution of pronouns • Develop more fine-grained named entity recognizer • Investigate more sophisticated learning algorithms for evidence fusion • Use answer extraction patterns as precise Web validation patterns • Error analysis References: 1)http://I2r.cs.uiuc.edu/~cogcomp/Data/QA/QC 2) X. Li and D. Roth. Learning Question Classifiers. In Proceedings of COLING (2002). 3)D. Zhang and W. Lee. Question Classification using Support Vector Machines. In Proceedings of SIGIR (2003).