
LING 573: Deliverable 3






  1. LING 573: Deliverable 3 • Group 7: Ryan Cross, Justin Kauhl, Megan Schneider

  2. The Basics • Implemented in Python with Indri • For document retrieval, used the standard #combine( “query” ) operator • #combine(x1 x2 … xn) = (score for x1)^(1/n) * (score for x2)^(1/n) * … * (score for xn)^(1/n), i.e. the geometric mean of the individual term scores • Used passage size:increment windows for passage retrieval (100:50, 150:50, 150:75, also 150:10, 150:15, and longer windows) • Used regexes to clean up Indri's printPassages output
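For reference, a minimal Python sketch of the geometric mean that #combine computes (the function name is ours, not part of Indri's API; scores are assumed to be positive):

    import math

    def combine_score(term_scores):
        # Geometric mean of per-term belief scores, matching the
        # formula above: product of score_i ** (1/n). Computed in
        # log space for numerical stability; scores must be > 0.
        n = len(term_scores)
        return math.exp(sum(math.log(s) for s in term_scores) / n)

    print(combine_score([0.4, 0.1]))  # sqrt(0.4 * 0.1) = 0.2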

  3. Approaches • Stemming • Stop word removal • Question word removal • Query expansion

  4. Approaches (cont.) • Stemming • Tried stemming both the index and the queries • Porter and Krovetz stemmers • Krovetz performed better (it is less aggressive)
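To illustrate the difference in aggressiveness, a small NLTK sketch; Krovetz is not bundled with NLTK, so the commented-out lines assume the third-party krovetzstemmer package:

    from nltk.stem import PorterStemmer

    porter = PorterStemmer()
    print(porter.stem("organization"))  # -> 'organ' (aggressive)
    print(porter.stem("running"))       # -> 'run'

    # Krovetz is dictionary-based and tends to return real words,
    # leaving forms like 'organization' intact. (Assumes the
    # third-party `krovetzstemmer` package, not part of NLTK.)
    # from krovetzstemmer import Stemmer
    # print(Stemmer().stem("organization"))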

  5. Approaches (cont.) • Stop word removal • Removing stopwords from the index made runtime faster • Removing them from queries improved results in all circumstances • Question word removal • Removed question words from almost all queries, with some improvement • Largely intuitive, but some questions did slightly better with question words left in, because the corpus contains Q&A files
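A minimal sketch of the query-side filtering described above; the question-word list is our assumption, since the original list is not shown:

    from nltk.corpus import stopwords  # requires nltk.download('stopwords')

    STOPWORDS = set(stopwords.words('english'))
    # Hypothetical question-word list for illustration.
    QUESTION_WORDS = {'who', 'what', 'when', 'where', 'why', 'how', 'which', 'whom'}

    def clean_query(question):
        tokens = question.lower().rstrip('?').split()
        return [t for t in tokens if t not in STOPWORDS | QUESTION_WORDS]

    print(clean_query("When was the Hubble telescope launched?"))
    # -> ['hubble', 'telescope', 'launched']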

  6. Approaches (cont.) • Query expansion • Tried adding synonyms from WordNet • Only added synonyms for nouns, verbs, adjectives, and adverbs • Restricted the synonyms added based on a word's POS (as predicted by NLTK's pos_tag) • Also tried not restricting synonyms by POS
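A sketch of the POS-restricted expansion, assuming NLTK's pos_tag and WordNet interfaces (the tag mapping and the filtering details are ours):

    from nltk import pos_tag, word_tokenize
    from nltk.corpus import wordnet as wn
    # Requires the nltk tokenizer, tagger, and 'wordnet' data packages.

    # Map Penn Treebank tag prefixes to WordNet POS constants.
    PENN_TO_WN = {'N': wn.NOUN, 'V': wn.VERB, 'J': wn.ADJ, 'R': wn.ADV}

    def expand_query(query):
        expanded = []
        for word, tag in pos_tag(word_tokenize(query)):
            expanded.append(word)
            wn_pos = PENN_TO_WN.get(tag[0])
            if wn_pos is None:
                continue  # no expansion for non-content words
            for synset in wn.synsets(word, pos=wn_pos):
                for lemma in synset.lemma_names():
                    if lemma.lower() != word.lower() and '_' not in lemma:
                        expanded.append(lemma)
        return expanded

    print(expand_query("launch telescope"))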

  7. Approaches (cont.) • Query expansion • In both cases, retrieval results were worse with query expansion

  8. Approaches (cont.) • Passage retrieval • Used Indri's #combine[passage size:increment]( “query” ) operator • Originally intended to use only the documents returned from the document retrieval phase • Decided instead to run passage retrieval as a standalone system
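For concreteness, a small helper (name ours) that assembles the operator string for one of the window settings listed on slide 2:

    def passage_query(terms, size=150, increment=75):
        # Build an Indri passage-retrieval query of the form
        # #combine[passageSIZE:INCREMENT]( term1 term2 ... )
        return '#combine[passage{}:{}]( {} )'.format(size, increment, ' '.join(terms))

    print(passage_query(['hubble', 'telescope', 'launch']))
    # -> '#combine[passage150:75]( hubble telescope launch )'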

  9. Approaches (cont.) • Passage retrieval results • Experimented with a few different settings • Used Krovetz stemming, with stopwords and question words removed • Tuned for a window size that did not return too many characters, with meaningful increments
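A sketch of the regex cleanup mentioned on slide 2, reused here to check passage length in characters; the markup pattern and the 1,000-character budget are illustrative assumptions, since the exact printPassages output format is not reproduced:

    import re

    def clean_passage(raw):
        # Strip any markup and collapse whitespace; the real
        # printPassages output may differ (assumption).
        text = re.sub(r'<[^>]+>', ' ', raw)
        return re.sub(r'\s+', ' ', text).strip()

    def within_budget(raw, max_chars=1000):
        # Hypothetical cutoff for "too many characters".
        return len(clean_passage(raw)) <= max_chars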

  10. Overall • Krovetz stemmer • Stopwords removed from queries (kept in index)

  11. Critical Analysis • Our query expansion attempts did not help • Too many misleading terms were introduced • The stopword results were surprising • We had assumed that removing stopwords from the index would help • Passage retrieval yielded better results than document retrieval • It is more meaningful to see a query term in a passage than anywhere in a whole document

  12. References • Hitesh Sabnani and Prasenjit Majumder. Question Answering System: Retrieving Relevant Passages. In Proceedings of the Cross-Language Evaluation Forum (CLEF). • Stefanie Tellex, Boris Katz, Jimmy Lin, Aaron Fernandes, and Gregory Marton. 2003. Quantitative Evaluation of Passage Retrieval Algorithms for Question Answering. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2003).

  13. Questions?
