LING 573: Deliverable 4 Group 7 Ryan Cross Justin Kauhl Megan Schneider
Previous Work • Question Classification • MaxEnt Classifier • Vectors: Unigrams, Bigrams, Chunks, Hypernyms • Best Results:
Previous Work • Passage Retrieval • Indri/Lemur • Krovetz Stemmer, stopwords + question words removed • Best results with 150/75 window size
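A minimal sketch of the query preprocessing described above, with assumed stopword and question-word lists (the actual lists are not shown in the slides); Krovetz stemming is handled inside Indri/Lemur, so it is not reproduced here:

```python
# Illustrative only: these sets are assumptions, not the lists used by the system.
STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "was", "were"}
QUESTION_WORDS = {"who", "what", "when", "where", "why", "which", "how"}

def clean_query(question):
    """Drop stopwords and question words before handing the terms to Indri."""
    tokens = question.lower().rstrip("?").split()
    return [t for t in tokens if t not in STOPWORDS and t not in QUESTION_WORDS]

# clean_query("Who invented the telephone?")  ->  ['invented', 'telephone']
```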
D4 Approaches • Improve passage retrieval system • Analyze window size / increments • Query expansion by question tag • Reranking by question type
Improving Passage Retrieval • Retrieved larger result counts (50, 70) • Removed exact duplicates from the returned results, keeping the first occurrence of each • Slight gain for lenient scoring, slight loss for strict
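A minimal sketch of the duplicate-removal step, assuming each returned result carries its passage text as a string (the real structure returned by Indri may differ):

```python
def dedupe_passages(results):
    """Drop exact duplicate passages, keeping the first one found.

    `results` is assumed to be a ranked list of (doc_id, passage_text) pairs.
    """
    seen = set()
    unique = []
    for doc_id, text in results:
        if text not in seen:
            seen.add(text)
            unique.append((doc_id, text))
    return unique
```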
Window Size • For the original window sizes, ran with all increments from size*0.2 to size*0.9 • Results were best at approximately 60% of the window size • Reran with new window sizes to find the maximum sizes closest to the character limits • Resulting window:increment pairs: 178:109, 19:12, 45:27
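An illustrative sweep over increment sizes of the kind described above (the helper name and step size are assumptions, not taken from the actual system):

```python
def increment_candidates(window_size, low=0.2, high=0.9, step=0.1):
    """Yield increment sizes from 20% to 90% of the window size."""
    frac = low
    while frac <= high + 1e-9:
        yield max(1, int(round(window_size * frac)))
        frac += step

# For a window size of 150 this yields increments 30, 45, ..., 135; the best
# scores were seen near 60% of the window size, which matches the final
# 178:109, 19:12, and 45:27 window:increment pairs.
```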
Answer Reranking • Rerank passages using question types • Run passages through the Question Classification module to get their question type • Promote passages whose question type matches the question type of the question
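A rough sketch of the reranking idea; `classify` stands in for the question-classification module, and the additive `boost` is an assumed scoring scheme rather than the one actually used:

```python
def rerank_by_type(scored_passages, question_type, classify, boost=1.0):
    """Promote passages whose predicted coarse type matches the question's type."""
    reranked = []
    for score, passage in scored_passages:
        if classify(passage) == question_type:
            score += boost  # assumed bonus; the real promotion rule may differ
        reranked.append((score, passage))
    return sorted(reranked, key=lambda pair: pair[0], reverse=True)
```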
Answer Reranking • Unfortunately, this reduces MRR.
Answer Reranking • Why did reranking fail to help? • The question classifier is trained on questions, not answer passages • Only a small number of passages is available at reranking time; the correct ones might have been missed by the IR step
Query Expansion • Used the TREC-2004 questions and associated answer file • Determined all possible acceptable answer strings from the answer file • A query was formed using the Indri #band operator • #band( answer string #passage[window size: increment]( query))
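A small helper that builds the query string from the template on this slide (the function name and the handling of multi-word answer strings are assumptions; Indri may require a phrase operator around them):

```python
def band_query(answer_string, query_terms, window, increment):
    """Restrict retrieval to passages that also contain a known answer string,
    following the #band( answer #passage[window:increment]( query ) ) template."""
    passage = "#passage[{}:{}]( {} )".format(window, increment, " ".join(query_terms))
    return "#band( {} {} )".format(answer_string, passage)

# band_query("1969", ["moon", "landing"], 150, 75)
# -> '#band( 1969 #passage[150:75]( moon landing ) )'
```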
Query Expansion (cont.) • Used the TREC-2004 coarse-tag gold standard file to assign tags to each question • Passages returned for queries restricted to the correct answer were tokenized and added to frequency tables keyed by coarse tag type • Frequency tables were cleaned to remove stopwords, punctuation, and other non-informative tokens • The top 5 tokens in each table were added to the 2005 queries corresponding to the coarse tags returned by our question classification system
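A sketch of the frequency-table construction, with an assumed stopword set standing in for the actual cleaning step:

```python
from collections import Counter
from string import punctuation

STOPWORDS = {"the", "a", "an", "of", "in", "to", "and", "is", "was", "for"}  # assumed

def top_expansion_terms(passages_by_tag, k=5):
    """Build one frequency table per coarse tag and keep the top-k tokens
    as expansion terms for that tag."""
    expansions = {}
    for tag, passages in passages_by_tag.items():
        counts = Counter()
        for passage in passages:
            for token in passage.lower().split():
                token = token.strip(punctuation)
                if token and token not in STOPWORDS:
                    counts[token] += 1
        expansions[tag] = [token for token, _ in counts.most_common(k)]
    return expansions
```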
Query Expansion Results • Unable to test query expansion on the 2004 data, since the 2004 data was used for training • An additional test was performed on the 2005 data
Query Expansion Problems • Why did query expansion fail to help? • Too much overlap among the most frequent words across the lists • Even when given the target answer string, the Indri query only scored about 0.75 lenient accuracy • Our coarse tagger only identified the question type correctly ~85% of the time • Huge bias towards certain articles: elements of their meta-tags containing phrases like "New York Times" were over-represented • The AQUAINT corpus is not expansive enough; newspaper articles have a frequency bias towards certain words
Critical Analysis • Runs using query expansion did slightly better than those without; with a bit more refinement, the system could widen that gap.