CMU Y2 Rosetta GnG Distillation

Explore multiple query aspects and enhance feature construction for better information relevance. Compare various learning approaches for optimal feature weights in information retrieval systems.

Presentation Transcript


  1. CMU Y2 Rosetta GnG Distillation. Jonathan Elsas, Jaime Carbonell

  2. Rosetta GnG System Evolution [Diagram labels: Y1 System, Y1 Eval, Rank Learning, Y2 System, Y2 Eval, Y3 Eval+]

  3. Distillation Challenges
     • Multiple aspects to the information need:
     • Query arguments, locations, related words
     • Static expansion terms/phrases
     • Bigrams, trigrams, term windows
     • Named-entity wildcards & constraints
     • The occurrence of each of these in a document* is a “feature” indicating the relevance of the document* to the information need.
     • Question: How best to choose the weights for each feature?
     * Or sentences, paragraphs, “nuggets”, etc.
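To make the weighting question concrete, here is a minimal sketch that assumes the document score is a weighted sum of feature values. The slides do not state the functional form, so the linear model, the feature names, and the weights below are all illustrative assumptions.

```python
from typing import Dict


def score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of feature values; a feature with no weight contributes 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical hand-picked weights and feature values (not from the slides).
weights = {"unigram:abbas": 1.0, "bigram:mahmoud abbas": 2.0, "location:middle east": 0.5}
doc_a = {"unigram:abbas": 3.0, "bigram:mahmoud abbas": 2.0}
doc_b = {"unigram:abbas": 1.0, "location:middle east": 1.0}

# Rank documents by descending score.
ranked = sorted(
    [("doc_a", doc_a), ("doc_b", doc_b)],
    key=lambda item: score(item[1], weights),
    reverse=True,
)
```

Choosing `weights` by hand quickly becomes impractical as the number of features grows, which is what motivates learning them from relevance judgments.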

  4. Query Feature Construction. Query: DESCRIBE THE ACTIONS OF [Mahmoud Abbas] DURING… Location: Middle East. Equivalent terms: Mahmoud Abbas; Abu Mazen; President of the Palestinian National Authority. Query Features: Unigram Features

  5. Query Feature Construction. Query: DESCRIBE THE ACTIONS OF [Mahmoud Abbas] DURING… Location: Middle East. Equivalent terms: Mahmoud Abbas; Abu Mazen; President of the Palestinian National Authority. Query Features: Bigram & Term Window Features

  6. Query Feature Construction. Query: DESCRIBE THE ACTIONS OF [Mahmoud Abbas] DURING… Location: Middle East. Equivalent terms: Mahmoud Abbas; Abu Mazen; President of the Palestinian National Authority. Query Features: Entity-Type Constrained Features

  7. Query Feature Construction. Query: DESCRIBE THE ACTIONS OF [Mahmoud Abbas] DURING… Location: Middle East. Equivalent terms: Mahmoud Abbas; Abu Mazen; President of the Palestinian National Authority. Query Features: Entity Co-reference Features

  8. Query Feature Construction. Query: DESCRIBE THE ACTIONS OF [Mahmoud Abbas] DURING… Location: Middle East. Equivalent terms: Mahmoud Abbas; Abu Mazen; President of the Palestinian National Authority. Query Features: Static Template-based Expansion (unigram, bigram, term windows)

  9. Query Feature Construction. Query: DESCRIBE THE ACTIONS OF [Mahmoud Abbas] DURING… Location: Middle East. Equivalent terms: Mahmoud Abbas; Abu Mazen; President of the Palestinian National Authority. Query Features: potentially many more, including structural features, pseudo-relevance feedback (PRF), and semantic role labeling (SRL) annotations
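The unigram, bigram, and term-window features from the slides above can be sketched as simple occurrence counts. This is an illustration only: whitespace tokenization, the window width of 8, and the feature-name format are assumptions, and the entity-type and co-reference features are omitted because they need annotations the slides do not describe.

```python
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lowercase whitespace tokenization (an assumed, deliberately simple choice)."""
    return text.lower().split()


def count_bigram(first: str, second: str, tokens: List[str]) -> int:
    """Count exact adjacent occurrences of `first second`."""
    return sum(1 for a, b in zip(tokens, tokens[1:]) if a == first and b == second)


def count_window(first: str, second: str, tokens: List[str], width: int) -> int:
    """Count occurrences of `first` with `second` within `width` tokens on either side."""
    count = 0
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        neighbours = tokens[max(0, i - width):i] + tokens[i + 1:i + width + 1]
        if second in neighbours:
            count += 1
    return count


def query_features(equivalent_terms: List[str], doc_text: str, width: int = 8) -> Dict[str, float]:
    """Build unigram, bigram, and unordered-window count features for one document."""
    tokens = tokenize(doc_text)
    feats: Dict[str, float] = {}
    for phrase in equivalent_terms:
        words = tokenize(phrase)
        for w in words:
            feats[f"unigram:{w}"] = float(tokens.count(w))
        for a, b in zip(words, words[1:]):
            feats[f"bigram:{a} {b}"] = float(count_bigram(a, b, tokens))
            feats[f"window{width}:{a} {b}"] = float(count_window(a, b, tokens, width))
    return feats


terms = ["Mahmoud Abbas", "Abu Mazen", "President of the Palestinian National Authority"]
doc = "Mahmoud Abbas met officials . Abbas , also known as Abu Mazen , spoke"
features = query_features(terms, doc)
```

Each entry in `features` is one candidate signal of relevance; how much each should count is left to the weight-learning step.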

  10. Learning Approach to Setting Feature Weights
     • Goal: Use existing relevance judgments to learn the optimal feature weights.
     • This has recently become an active research area in IR, known as “Learning to Rank”.

  11. Pair-wise Preference Learning
     • Learning a document scoring function
     • Treated as a classification problem on pairs of documents [figure: document pairs labeled Correct / Incorrect]
     • The resulting scoring function is used as the learned document ranker.
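A minimal sketch of pairwise preference learning follows, assuming a linear scoring function trained with a plain perceptron update on feature differences. The slides do not give the update rule, so the function name, the learning rate, and the toy data are illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_pairwise_perceptron(
    pairs: Sequence[Tuple[Vector, Vector]], epochs: int = 10, lr: float = 1.0
) -> Vector:
    """Learn weights so each relevant document outscores its non-relevant partner.

    Each pair is (relevant_features, nonrelevant_features) for the same query.
    """
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        mistakes = 0
        for rel, non in pairs:
            diff = [r - n for r, n in zip(rel, non)]
            if dot(w, diff) <= 0:  # pair is misordered or tied: update
                w = [wi + lr * di for wi, di in zip(w, diff)]
                mistakes += 1
        if mistakes == 0:  # every pair is ordered correctly
            break
    return w


# Toy data: two (relevant, non-relevant) pairs with two features each.
pairs = [([2.0, 0.0], [1.0, 1.0]), ([1.0, 0.0], [0.0, 2.0])]
w = train_pairwise_perceptron(pairs)
```

Classifying the difference vector `rel - non` as positive is equivalent to requiring `dot(w, rel) > dot(w, non)`, so the learned `w` can be used directly to score and rank single documents.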

  12. Committee Perceptron Algorithm
     • Online algorithm (instance-at-a-time)
     • Fast training, low memory requirements
     • Ensemble method
     • Selectively chooses the N best hypotheses encountered during training
     • “N heads are better than 1” approach
     • Significant advantages over previous perceptron variants
     • Many ways to combine the output of the hypotheses
     • Voting, score averaging, hybrid approaches
     • This is the focus of current research
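The bullets above can be sketched as follows. The slides do not say how the "best" hypotheses are chosen, so this sketch assumes a hypothesis is ranked by how many consecutive training pairs it orders correctly before its next mistake; the class name, method names, and the unweighted combination rules are likewise assumptions rather than the authors' implementation.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Pair = Tuple[Vector, Vector]  # (relevant doc features, non-relevant doc features)


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


class CommitteePerceptron:
    def __init__(self, dim: int, committee_size: int = 3):
        self.committee_size = committee_size
        self.current: Vector = [0.0] * dim   # current hypothesis
        self.survived = 0                    # pairs ordered correctly since last update
        self.committee: List[Tuple[int, Vector]] = []

    def _offer(self) -> None:
        """Add the current hypothesis if it beats the weakest committee member."""
        if self.survived == 0:
            return
        candidate = (self.survived, list(self.current))
        if len(self.committee) < self.committee_size:
            self.committee.append(candidate)
            return
        worst = min(range(len(self.committee)), key=lambda i: self.committee[i][0])
        if self.survived > self.committee[worst][0]:
            self.committee[worst] = candidate

    def fit(self, pairs: Sequence[Pair], epochs: int = 5) -> "CommitteePerceptron":
        for _ in range(epochs):
            for rel, non in pairs:
                diff = [r - n for r, n in zip(rel, non)]
                if dot(self.current, diff) > 0:
                    self.survived += 1
                else:  # mistake: archive the hypothesis, then apply perceptron update
                    self._offer()
                    self.current = [w + d for w, d in zip(self.current, diff)]
                    self.survived = 0
        self._offer()  # the final hypothesis competes too
        return self

    def _members(self) -> List[Vector]:
        return [w for _, w in self.committee] or [self.current]

    def score(self, x: Vector) -> float:
        """Score averaging: unweighted mean of the members' scores."""
        members = self._members()
        return sum(dot(w, x) for w in members) / len(members)

    def prefers(self, a: Vector, b: Vector) -> bool:
        """Voting: True if a strict majority of members rank `a` above `b`."""
        members = self._members()
        votes = sum(1 for w in members if dot(w, a) > dot(w, b))
        return 2 * votes > len(members)


pairs = [([2.0, 0.0], [1.0, 1.0]), ([1.0, 0.0], [0.0, 2.0])]
model = CommitteePerceptron(dim=2, committee_size=2).fit(pairs, epochs=3)
```

Only `committee_size` weight vectors are stored at any time, which is consistent with the low memory requirement noted on the slide; `score` and `prefers` correspond to the score-averaging and voting options.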

  13. Committee Perceptron Training [Diagram: Training Data shown as (q, dR, dN) triples with legend labels R and N; Committee; Current Hypothesis]

  17. Committee Perceptron Performance

  18. Committee Perceptron Learning Curves

  19. Next Steps
     • Integrate current work with the GALE GnG system (in progress)
     • Document ranking is the obvious first step
     • Passage ranking poses additional challenges
     • Both will be addressed this year
     • Implement a feature-based query generation framework for the Rosetta GnG System
     • Extend & improve the performance of our rank learning algorithm

  20. Future Work
     • Investigate applying preference learning in the Utility system, adapting to real-time user preference feedback.
