Explores the multiple aspects of an information need and how to construct query features from them, then compares learning approaches for setting feature weights in information retrieval systems.
CMU Y2 Rosetta GnG Distillation
Jonathan Elsas, Jaime Carbonell
Rosetta GnG System Evolution
[Figure: timeline with labels Y1 System, Y1 Eval, Rank Learning, Y2 System, Y2 Eval, Y3 Eval+]
Distillation Challenges
• Multiple aspects to the information need:
  • Query arguments, locations, related words
  • Static expansion terms/phrases
  • Bigrams, trigrams, term windows
  • Named-entity wildcards & constraints
• The occurrence of each of these in a document* is a “feature” indicating the relevance of the document* to the information need.
• Question: how should the weight for each feature be chosen?
* Or sentences, paragraphs, “nuggets”, etc.
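A common way to combine such features, assumed here purely for illustration, is a linear scoring function: each feature contributes its value times a weight, and the document score is the sum. The minimal Python sketch below uses hypothetical feature names and hand-set weights; choosing those weights is exactly the open question on this slide.

```python
from typing import Dict

def score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Linear document score: the sum of weight * value over all features.
    A feature with no weight assigned contributes 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

# Hypothetical feature values for one document, and hand-set weights.
doc_features = {"unigram:abbas": 3, "bigram:mahmoud abbas": 2, "window:abu mazen": 1}
weights = {"unigram:abbas": 0.5, "bigram:mahmoud abbas": 1.0, "window:abu mazen": 0.8}
print(score(doc_features, weights))
```

Documents (or sentences, passages, nuggets) are then ranked by this score.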
Query Feature Construction
DESCRIBE THE ACTIONS OF [Mahmoud Abbas] DURING…
Location: Middle East
Equivalent terms: Mahmoud Abbas; Abu Mazen; President of the Palestinian National Authority
Query Features: Unigram Features
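As an illustration only, unigram features can be read as one feature per distinct word in the query arguments, location, and equivalent terms, with the word's count in the document as the value. The sketch assumes a simple lowercase word tokenizer; the example document is made up.

```python
import re
from collections import Counter
from typing import Dict, List

def tokenize(text: str) -> List[str]:
    """Lowercase word tokenizer (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())

def unigram_features(query_phrases: List[str], doc: str) -> Dict[str, int]:
    """One feature per distinct query word: its count in the document."""
    counts = Counter(tokenize(doc))
    words = {w for phrase in query_phrases for w in tokenize(phrase)}
    return {f"unigram:{w}": counts[w] for w in sorted(words)}

query = ["Mahmoud Abbas", "Abu Mazen", "Middle East"]
doc = "Abu Mazen met reporters. Abbas said talks in the Middle East would resume."
print(unigram_features(query, doc))
```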
Query Feature Construction (same query): Bigram & Term Window Features
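Bigram and term-window features reward words that appear together. The sketch below is a simplification, not the system's definition: a bigram feature counts the exact ordered pair, and the hypothetical `window_count` counts occurrences of one word with the other within `width` tokens on either side, in any order.

```python
import re
from typing import List

def tokenize(text: str) -> List[str]:
    """Lowercase word tokenizer (a simplifying assumption)."""
    return re.findall(r"\w+", text.lower())

def bigram_count(tokens: List[str], a: str, b: str) -> int:
    """Occurrences of the exact, ordered bigram 'a b'."""
    return sum(1 for x, y in zip(tokens, tokens[1:]) if x == a and y == b)

def window_count(tokens: List[str], a: str, b: str, width: int) -> int:
    """Occurrences of `a` with `b` within `width` tokens on either side,
    in either order (a simple unordered term-window feature)."""
    b_positions = [j for j, t in enumerate(tokens) if t == b]
    return sum(
        1
        for i, t in enumerate(tokens)
        if t == a and any(0 < abs(i - j) <= width for j in b_positions)
    )

tokens = tokenize("Mahmoud Abbas spoke; Abbas later met Mahmoud Zahar.")
print(bigram_count(tokens, "mahmoud", "abbas"))     # 1
print(window_count(tokens, "abbas", "mahmoud", 2))  # 1
print(window_count(tokens, "abbas", "mahmoud", 3))  # 2
```

Widening the window trades precision for recall: the second "abbas" only pairs with a "mahmoud" once the window reaches three tokens.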
Query Feature Construction (same query): Entity-Type Constrained Features
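One way to picture an entity-type constraint (named-entity wildcard) is "the query term near any entity of a given type", for example "abbas" near any LOCATION. This sketch assumes tokens already carry entity tags from some upstream tagger; the tags here are written by hand and `entity_constrained_count` is a hypothetical name.

```python
from typing import List, Tuple

Tagged = List[Tuple[str, str]]  # (lowercased token, entity tag or "O")

def entity_constrained_count(tagged: Tagged, term: str, entity_type: str, width: int) -> int:
    """Count occurrences of `term` with any token tagged `entity_type`
    within `width` tokens on either side (an entity-type wildcard)."""
    typed = [j for j, (_, tag) in enumerate(tagged) if tag == entity_type]
    return sum(
        1
        for i, (tok, _) in enumerate(tagged)
        if tok == term and any(0 < abs(i - j) <= width for j in typed)
    )

# Hand-tagged example; in practice tags would come from an NE tagger (assumption).
doc = [("abbas", "PERSON"), ("visited", "O"), ("cairo", "LOCATION"),
       ("before", "O"), ("abbas", "PERSON"), ("returned", "O"), ("home", "O")]
print(entity_constrained_count(doc, "abbas", "LOCATION", 2))  # 2
print(entity_constrained_count(doc, "abbas", "LOCATION", 1))  # 0
```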
Query Feature Construction (same query): Entity Co-reference Features
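A co-reference feature can credit mentions such as "he" or "the Palestinian leader" once they are resolved to the query entity. The sketch below assumes co-reference chains are supplied by an upstream resolver as lists of mention strings; the feature counts every mention in any chain that contains one of the equivalent names. The function name and chains are illustrative.

```python
from typing import List

def coref_mention_count(chains: List[List[str]], equivalents: List[str]) -> int:
    """Total mentions in co-reference chains that contain at least one
    equivalent name, so resolved pronouns and titles also count."""
    names = {e.lower() for e in equivalents}
    return sum(len(chain) for chain in chains
               if any(m.lower() in names for m in chain))

chains = [["Abu Mazen", "he", "the Palestinian leader"], ["Ariel Sharon", "he"]]
equivalents = ["Mahmoud Abbas", "Abu Mazen",
               "President of the Palestinian National Authority"]
print(coref_mention_count(chains, equivalents))  # 3
```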
Query Feature Construction (same query): Static Template-based Expansion (unigram, bigram, term windows)
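Static template-based expansion attaches a fixed set of extra terms to each query template, independent of the specific argument. In this sketch the template key `"DESCRIBE THE ACTIONS OF [X] DURING"` and its expansion phrases are hypothetical placeholders; each phrase is turned into unigram specs and, for two-word phrases, bigram and term-window specs.

```python
from typing import Dict, List, Tuple

# Hypothetical static expansion phrases per query template (illustrative only).
TEMPLATE_EXPANSIONS: Dict[str, List[str]] = {
    "DESCRIBE THE ACTIONS OF [X] DURING": ["said", "met with", "announced"],
}

def expansion_features(template: str, window: int = 8) -> List[Tuple[str, str]]:
    """Turn each static expansion phrase into feature specs: a unigram for
    every word, plus a bigram and an unordered term window for two-word phrases."""
    feats: List[Tuple[str, str]] = []
    for phrase in TEMPLATE_EXPANSIONS.get(template, []):
        words = phrase.split()
        feats += [("unigram", w) for w in words]
        if len(words) == 2:
            feats.append(("bigram", phrase))
            feats.append((f"window{window}", phrase))
    return feats

print(expansion_features("DESCRIBE THE ACTIONS OF [X] DURING"))
```

Because the expansion is static, these specs can be computed once per template and reused for every query that instantiates it.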
Query Feature Construction (same query): + potentially many more: structural features, pseudo-relevance feedback (PRF), & semantic role labeling (SRL) annotations
Learning Approach to Setting Feature Weights
• Goal: use existing relevance judgments to learn the optimal weight setting
• This has recently become an active research area in IR: “learning to rank”
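Relevance judgments can be turned into training data for preference learning by pairing, for each query, every relevant document with every non-relevant one, giving (q, dR, dN) triples. This sketch assumes judgments are stored as `{query: {doc: label}}` with any label above 0 treated as relevant; both choices are assumptions.

```python
from itertools import product
from typing import Dict, List, Tuple

def preference_pairs(judgments: Dict[str, Dict[str, int]]) -> List[Tuple[str, str, str]]:
    """Build (q, d_R, d_N) triples: every relevant document paired with every
    non-relevant document judged for the same query (label > 0 = relevant)."""
    triples: List[Tuple[str, str, str]] = []
    for q, docs in judgments.items():
        rel = sorted(d for d, label in docs.items() if label > 0)
        non = sorted(d for d, label in docs.items() if label <= 0)
        triples += [(q, r, n) for r, n in product(rel, non)]
    return triples

qrels = {"q1": {"d1": 1, "d2": 0, "d3": 1, "d4": 0}}
print(preference_pairs(qrels))
# [('q1', 'd1', 'd2'), ('q1', 'd1', 'd4'), ('q1', 'd3', 'd2'), ('q1', 'd3', 'd4')]
```

The number of pairs grows as (relevant × non-relevant) per query, which is one reason a fast, low-memory online learner is attractive.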
Pair-wise Preference Learning
• Learning a document scoring function
• Treated as a classification problem on pairs of documents: is each (relevant, non-relevant) pair ordered correctly?
• The resulting scoring function is used as the learned document ranker.
[Figure: example document pairs labeled Correct and Incorrect]
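The pair-wise formulation can be illustrated with a standard perceptron on preference pairs (a generic textbook variant, not necessarily the authors' exact learner): a pair counts as correct when the relevant document's feature vector outscores the non-relevant one, and on a mistake the weights move toward their difference.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (x_rel, x_non) feature vectors

def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))

def train_pairwise_perceptron(pairs: List[Pair], dim: int, epochs: int = 10) -> List[float]:
    """Perceptron on preference pairs: on each incorrectly ordered (or tied)
    pair, add (x_rel - x_non) to the weight vector."""
    w = [0.0] * dim
    for _ in range(epochs):
        for x_rel, x_non in pairs:
            if dot(w, x_rel) <= dot(w, x_non):
                w = [wi + r - n for wi, r, n in zip(w, x_rel, x_non)]
    return w

pairs = [([3.0, 0.0], [1.0, 1.0]), ([2.0, 1.0], [0.0, 2.0])]
w = train_pairwise_perceptron(pairs, dim=2)
print(w)                                              # [2.0, -1.0]
print(all(dot(w, r) > dot(w, n) for r, n in pairs))   # True
```

The learned `w` is then used as the linear scoring function for ranking unseen documents.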
Committee Perceptron Algorithm
• Online algorithm (one instance at a time)
  • Fast training, low memory requirements
• Ensemble method
  • Selectively chooses the N best hypotheses encountered during training
  • “N heads are better than 1” approach
  • Significant advantages over previous perceptron variants
• Many ways to combine the outputs of the hypotheses
  • Voting, score averaging, hybrid approaches
  • This is the focus of current research
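Two of the combination strategies can be contrasted on a toy committee of weight vectors. The sketch shows score averaging and, as one possible voting scheme (a Borda count, our assumption rather than the slide's), each member ranking the documents and awarding points by position. The example is constructed so the two disagree: one member strongly prefers document 0, while the other two prefer document 2.

```python
from typing import List, Sequence

def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))

def rank_by_score_average(committee: List[Sequence[float]],
                          docs: List[Sequence[float]]) -> List[int]:
    """Rank document indices by the mean score across committee members."""
    avg = [sum(dot(w, x) for w in committee) / len(committee) for x in docs]
    return sorted(range(len(docs)), key=lambda i: avg[i], reverse=True)

def rank_by_borda_vote(committee: List[Sequence[float]],
                       docs: List[Sequence[float]]) -> List[int]:
    """Each member ranks the documents; the document at position p earns
    (n - 1 - p) points, and documents are ordered by total points."""
    n = len(docs)
    points = [0] * n
    for w in committee:
        order = sorted(range(n), key=lambda i: dot(w, docs[i]), reverse=True)
        for pos, i in enumerate(order):
            points[i] += n - 1 - pos
    return sorted(range(n), key=lambda i: points[i], reverse=True)

committee = [[1, 0], [0, 1], [0, 1]]
docs = [[10, 0], [0, 1], [0, 2]]
print(rank_by_score_average(committee, docs))  # [0, 2, 1]
print(rank_by_borda_vote(committee, docs))     # [2, 1, 0]
```

Averaging lets one confident member dominate; voting follows the majority. Hybrid schemes sit between these extremes.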
Committee Perceptron Training
[Figure, animation frames: Training Data (a sequence of (q, dR, dN) triples, marked R and N), Current Hypothesis, Committee]
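The training loop can be sketched as a pair-wise perceptron that also keeps a committee of its best past hypotheses. The selection criterion here is an assumption borrowed from the voted perceptron: a hypothesis is judged by how many consecutive (q, dR, dN) pairs it ranked correctly before its first mistake, and a min-heap keeps the `committee_size` hypotheses with the longest runs. This approximates the idea on the slides and may differ from the authors' algorithm.

```python
import heapq
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (x_rel, x_non) for one (q, dR, dN)

def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(w, x))

def train_committee_perceptron(
    pairs: List[Pair], dim: int, committee_size: int = 3, epochs: int = 5
) -> List[List[float]]:
    """Pair-wise perceptron that keeps the hypotheses with the longest runs
    of correctly ranked pairs (assumed quality measure)."""
    committee: List[Tuple[int, int, List[float]]] = []  # min-heap of (run, id, w)
    counter = 0  # unique id so the heap never compares weight lists

    def offer(run: int, w: List[float]) -> None:
        nonlocal counter
        if run == 0:
            return  # never ranked a pair correctly; not worth keeping
        item = (run, counter, w)
        counter += 1
        if len(committee) < committee_size:
            heapq.heappush(committee, item)
        elif run > committee[0][0]:
            heapq.heapreplace(committee, item)  # evict the weakest member

    w = [0.0] * dim  # current hypothesis
    run = 0          # consecutive correct pairs for the current hypothesis
    for _ in range(epochs):
        for x_rel, x_non in pairs:
            if dot(w, x_rel) > dot(w, x_non):
                run += 1
            else:
                offer(run, w)  # retire the current hypothesis
                w = [wi + r - n for wi, r, n in zip(w, x_rel, x_non)]
                run = 0
    offer(run, w)  # the final hypothesis is also a candidate
    # Strongest member first.
    return [member for _, _, member in sorted(committee, reverse=True)]

pairs = [([3.0, 0.0], [1.0, 1.0]), ([2.0, 1.0], [0.0, 2.0])]
print(train_committee_perceptron(pairs, dim=2))  # [[2.0, -1.0]]
```

On separable toy data only one hypothesis survives; on noisy data the committee fills up with several distinct weight vectors, whose outputs are then combined.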
Next Steps
• (In progress) Integrate current work with the GALE GnG system
  • Document ranking is the obvious first step
  • Passage ranking poses additional challenges
  • Both will be addressed this year
• Implement a feature-based query generation framework for the Rosetta GnG system
• Extend and improve the performance of our rank-learning algorithm
Future Work
• Investigate applying preference learning in the Utility system, adapting to real-time user preference feedback.