LING 573: Deliverable 3
Group 7: Ryan Cross, Justin Kauhl, Megan Schneider
The Basics
• Implemented in Python with Indri
• Document retrieval used the standard #combine( query ) operator:
  #combine(x1 x2 … xn) = (score for x1)^(1/n) * (score for x2)^(1/n) * … * (score for xn)^(1/n)
• Passage retrieval used #combine[passageN:M] to score windows of N words with an increment of M (100:50, 150:50, 150:75, also 150:10, 150:15, and longer windows)
• Used regexes to clean up Indri's printPassages output
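The #combine formula is a geometric mean of the children's beliefs. The minimal sketch below assumes beliefs are plain probabilities in (0, 1]. Because Indri reports scores as log values, the same operator can also be written as an average of log beliefs, which the second function shows. Both function names are hypothetical.

```python
import math
from typing import Sequence


def combine_belief(beliefs: Sequence[float]) -> float:
    """#combine as a geometric mean: the product of each child belief raised to 1/n."""
    n = len(beliefs)
    if n == 0:
        raise ValueError("#combine needs at least one child")
    result = 1.0
    for belief in beliefs:
        result *= belief ** (1.0 / n)
    return result


def combine_log(log_beliefs: Sequence[float]) -> float:
    """The same operator in log space: the arithmetic mean of the children's log beliefs."""
    if len(log_beliefs) == 0:
        raise ValueError("#combine needs at least one child")
    return sum(log_beliefs) / len(log_beliefs)


if __name__ == "__main__":
    beliefs = [0.25, 1.0]
    direct = combine_belief(beliefs)
    via_logs = math.exp(combine_log([math.log(b) for b in beliefs]))
    # Both values should be (approximately) 0.5, the geometric mean of 0.25 and 1.0.
    print(direct, via_logs)
```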
Approaches
• Stemming
• Stop word removal
• Question word removal
• Query expansion
Approaches (cont.)
• Stemming
  • Tried stemming in the index and stemming the query
  • Compared the Porter and Krovetz stemmers
  • Krovetz performed better (it is less aggressive)
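One way to switch between the Porter and Krovetz stemmers at indexing time is the `<stemmer>` element of an IndriBuildIndex parameter file. The sketch below generates such a file with Python's standard library. It reflects our understanding of the parameter format (`index`, `corpus/path`, `corpus/class`, `stemmer/name`, `stopper/word`); the paths, the `trectext` corpus class, and the function name are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

SUPPORTED_STEMMERS = ("krovetz", "porter")


def build_index_params(index_path: str, corpus_path: str,
                       stemmer: str = "krovetz",
                       stopwords: Optional[List[str]] = None) -> str:
    """Return an IndriBuildIndex parameter file as an XML string (hypothetical helper)."""
    if stemmer not in SUPPORTED_STEMMERS:
        raise ValueError(f"unsupported stemmer: {stemmer}")
    root = ET.Element("parameters")
    ET.SubElement(root, "index").text = index_path
    corpus = ET.SubElement(root, "corpus")
    ET.SubElement(corpus, "path").text = corpus_path
    # Assumption: the corpus is stored in TREC text format.
    ET.SubElement(corpus, "class").text = "trectext"
    # Stemmer applied to terms as they are indexed.
    stem = ET.SubElement(root, "stemmer")
    ET.SubElement(stem, "name").text = stemmer
    # Optional <stopper> block: these words are left out of the index itself.
    if stopwords:
        stopper = ET.SubElement(root, "stopper")
        for word in stopwords:
            ET.SubElement(stopper, "word").text = word
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical paths; save the output to a file and pass it to IndriBuildIndex.
    print(build_index_params("/path/to/index", "/path/to/corpus", stemmer="krovetz"))
```

Passing `stopwords` builds a stopped index; leaving it out keeps stopwords in the index, which matches the configuration on the Overall slide.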
Approaches (cont.)
• Stop word removal
  • Removing stopwords from the index made runtime faster
  • Removing stopwords from queries improved results in every configuration tried
• Question word removal
  • Applied to the query in almost all runs; gave some improvement
  • Largely intuitive, although some questions scored slightly better with question words left in, because the corpus contains Q&A files
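Query-side stopword and question word removal can be sketched as below, leaving the index untouched. The word lists are tiny hypothetical samples (a real run would use a full stopword list), and the `drop_question_words` flag mirrors the observation that a few questions did better with question words kept. Keeping only alphanumeric tokens also keeps punctuation out of the Indri query string.

```python
import re
from typing import Iterable, List

# Hypothetical, deliberately small word lists for illustration only.
STOPWORDS = frozenset({"the", "a", "an", "of", "in", "is", "was", "to", "do", "does", "did"})
QUESTION_WORDS = frozenset({"who", "what", "when", "where", "which", "why", "how"})


def clean_query(question: str, drop_question_words: bool = True) -> List[str]:
    """Lowercase, tokenize on alphanumerics, and drop stopwords (and optionally question words)."""
    tokens = re.findall(r"[a-z0-9]+", question.lower())
    removed = STOPWORDS | QUESTION_WORDS if drop_question_words else STOPWORDS
    return [t for t in tokens if t not in removed]


def to_indri_query(terms: Iterable[str]) -> str:
    """Wrap the remaining terms in a document-retrieval #combine query."""
    return "#combine( " + " ".join(terms) + " )"


if __name__ == "__main__":
    question = "When was the Berlin Wall built?"
    print(to_indri_query(clean_query(question)))  # #combine( berlin wall built )
```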
Approaches (cont.)
• Query expansion
  • Tried adding synonyms from WordNet
  • Added synonyms only for nouns, verbs, adjectives, and adverbs
  • Restricted the synonyms added to those matching the word's part of speech (as predicted by nltk.pos_tag)
  • Also tried not restricting synonyms by part of speech
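The two expansion variants can be sketched as follows. With NLTK, the tags would come from `nltk.pos_tag` and the synonyms from `wordnet.synsets(word, pos=...)`; to keep the sketch runnable without NLTK data, it takes pre-tagged input and a pluggable lookup backed by a toy in-memory table. The Penn-to-WordNet mapping is a common convention, and our reading that the four-class filter applies in both variants is an assumption. All names and the toy table are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A lookup takes (word, WordNet POS letter or None for "any POS") and returns synonyms.
SynonymLookup = Callable[[str, Optional[str]], List[str]]


def penn_to_wordnet(tag: str) -> Optional[str]:
    """Map a Penn Treebank tag (the tag set nltk.pos_tag emits) to a WordNet POS letter."""
    if tag.startswith("NN"):
        return "n"
    if tag.startswith("VB"):
        return "v"
    if tag.startswith("JJ"):
        return "a"
    if tag.startswith("RB"):
        return "r"
    return None  # other word classes are never expanded


def expand_query(tagged: Sequence[Tuple[str, str]], lookup: SynonymLookup,
                 restrict_pos: bool = True) -> List[str]:
    """Append synonyms after each content word; optionally restrict them to the word's POS."""
    expanded: List[str] = []
    for word, tag in tagged:
        expanded.append(word)
        wn_pos = penn_to_wordnet(tag)
        if wn_pos is None:
            continue
        for syn in lookup(word, wn_pos if restrict_pos else None):
            if syn != word and syn not in expanded:
                expanded.append(syn)
    return expanded


# Toy stand-in for WordNet, keyed by (lemma, POS letter).
TOY_SYNONYMS: Dict[Tuple[str, str], List[str]] = {
    ("city", "n"): ["metropolis"],
    ("build", "v"): ["construct"],
    ("build", "n"): ["physique"],
}


def toy_lookup(word: str, pos: Optional[str]) -> List[str]:
    if pos is not None:
        return TOY_SYNONYMS.get((word, pos), [])
    # Unrestricted: gather synonyms from every POS entry for the word.
    return [s for (w, _), syns in TOY_SYNONYMS.items() if w == word for s in syns]


if __name__ == "__main__":
    tagged = [("build", "VB"), ("city", "NN"), ("in", "IN")]
    print(expand_query(tagged, toy_lookup))
    print(expand_query(tagged, toy_lookup, restrict_pos=False))
```

The unrestricted run pulls in "physique" (a noun sense of "build") for a verb, which illustrates how expansion can add misleading terms.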
Approaches (cont.)
• Query expansion
  • In both cases (with and without the part-of-speech restriction), retrieval results were worse with query expansion
Approaches (cont.)
• Passage retrieval
  • Used the Indri #combine[passageN:M]( query ) operator, where N is the window size and M the increment
  • Originally intended to search only the documents returned by the document retrieval phase
  • Decided instead to run passage retrieval as a standalone system
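Running passage retrieval as a standalone system means each question becomes its own Indri query with the passage extent attached to #combine. A small sketch that generates one query per window setting; the helper name and the example terms are hypothetical, the window values are those listed on The Basics slide, and the unspecified "longer windows" are left out.

```python
from typing import List, Sequence, Tuple

# Window settings (size:increment, in words) from "The Basics" slide.
WINDOWS: List[Tuple[int, int]] = [(100, 50), (150, 50), (150, 75), (150, 10), (150, 15)]


def passage_query(terms: Sequence[str], size: int, increment: int) -> str:
    """Build a standalone Indri passage query of the form #combine[passageN:M]( terms )."""
    if size <= 0 or increment <= 0:
        raise ValueError("window size and increment must be positive")
    if not terms:
        raise ValueError("query has no terms")
    return f"#combine[passage{size}:{increment}]( {' '.join(terms)} )"


if __name__ == "__main__":
    terms = ["berlin", "wall", "built"]  # hypothetical, already-cleaned query terms
    for size, increment in WINDOWS:
        print(passage_query(terms, size, increment))
```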
Approaches (cont.)
• Passage retrieval results
  • Experimented with several settings
  • Krovetz stemming, with stopwords and question words removed
  • Looked for a window size that did not return too many characters, paired with a meaningful increment
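To see how window size and increment trade off, it helps to count how many overlapping passages a single document yields. The sketch assumes fixed word windows that start every `increment` words until one reaches the end of the document. This approximates Indri's passage extents rather than reproducing its exact implementation; the 400-word document length and the function name are arbitrary examples.

```python
from typing import List, Tuple


def passage_spans(doc_len: int, size: int, increment: int) -> List[Tuple[int, int]]:
    """Word offsets [start, end) of overlapping fixed-size windows over a document."""
    if size <= 0 or increment <= 0:
        raise ValueError("size and increment must be positive")
    if doc_len <= 0:
        return []
    spans: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + size, doc_len)
        spans.append((start, end))
        if end >= doc_len:  # the last window reaches the end of the document
            break
        start += increment
    return spans


if __name__ == "__main__":
    for size, increment in [(150, 50), (150, 75), (150, 10)]:
        count = len(passage_spans(400, size, increment))
        print(f"{size}:{increment} -> {count} passages")
    # 150:50 -> 6 passages, 150:75 -> 5 passages, 150:10 -> 26 passages
```

Smaller increments multiply the number of heavily overlapping candidates, while larger windows return more text per passage.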
Overall
• Krovetz stemmer
• Stopwords removed from the query (kept in the index)
Critical Analysis
• Our query expansion attempts did not help
  • They introduced too many misleading terms
• Stopword results were unexpected
  • We had assumed that removing stopwords from the index would help, but the final configuration keeps them in the index and removes them only from queries
• Passage retrieval yielded better results than document retrieval
  • A query term occurring in a short passage is more meaningful than the same term appearing somewhere in a whole document