
Compact Query Term Selection Using Topically Related Text



Presentation Transcript


  1. Compact Query Term Selection Using Topically Related Text Date: 2013/10/09 Source: SIGIR’13 Authors: K. Tamsin Maxwell, W. Bruce Croft Advisor: Dr. Jia-ling Koh Speaker: Shun-Chen Cheng

  2. Outline • Introduction • The PhRank Algorithm • Graph Construction • Edge Weight • Random Walk • Vertex weights • Term ranking • Diversity filter • Experiment • Conclusions

  3. Introduction Query: Locations of volcanic activity which occurred within the present day boundaries of the U.S. and its territories.

  4. Introduction • Long queries contain words that are peripheral or shared across many topics, so expansion is prone to query drift. • Past work: jointly optimize weights and term selection using both global statistics and local syntactic features. • Shortcomings: fails to detect or differentiate informative terms; does not identify all the informative relations; does not reflect local query context.

  5. Introduction • Goal: a novel term ranking algorithm, PhRank, that extends work on Markov chain frameworks for query expansion to select compact and focused terms from within the query itself.

  6. Outline • Introduction • The PhRank Algorithm • Graph Construction • Edge Weight • Random Walk • Vertex weights • Term ranking • Diversity filter • Experiment • Conclusions

  7. Principles for Term Selection • An informative term: • Is informative relative to the query: it accurately represents the meaning of the query. • Is related to other informative words: if one index term is good at discriminating relevant from non-relevant documents, then any closely associated index term is also likely to be good at this. • Contains informative words: all terms must contain informative words. • Is discriminative in the retrieval collection: a term that occurs many times within a small number of documents gives a pronounced relevance signal.

  8. Graph Construction • C: retrieval collection and English Wikipedia. Example: • Q: a b • Top k documents (k = 2): d1, d2 • N (neighborhood set): {d0, d1, d2}, where d0 is the query encoded as a pseudo-document • d1: c b e • d2: a f b a f e b c • Graph G is built over the vertices a, f, e, b, c.
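The construction above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes each stem in N becomes a vertex and each co-occurrence within a small window adds weight to an edge (function and variable names are illustrative).

```python
from collections import defaultdict

def build_graph(docs, window=2):
    """Build an undirected co-occurrence graph over stems.

    docs: list of tokenized documents (the neighborhood set N,
    including d0, the query itself encoded as a pseudo-document).
    Returns the vertex set and {(i, j): count} with i < j.
    """
    edges = defaultdict(int)
    vertices = set()
    for doc in docs:
        vertices.update(doc)
        for pos, w in enumerate(doc):
            # pair each stem with the stems that follow it inside the window
            for other in doc[pos + 1 : pos + window]:
                if w != other:
                    edges[tuple(sorted((w, other)))] += 1
    return vertices, dict(edges)

# Toy example from the slide: Q = "a b", d1 = "c b e", d2 = "a f b a f e b c"
N = [["a", "b"], ["c", "b", "e"], ["a", "f", "b", "a", "f", "e", "b", "c"]]
verts, edges = build_graph(N, window=2)
```

With window = 2 only adjacent stems are linked; PhRank also counts a wider window (size 10) when weighting edges, which the same function supports via the `window` parameter.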

  9. Edge Weight • w2_ij, w10_ij: the counts of co-occurrence of stems i and j within windows of size 2 and 10 in N. • r_ij: the probability of the document in which the stems i and j co-occur, given Q. • The co-occurrence counts are combined with idf weights; the factor r confirms the importance of a connection between i and j in N.
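One plausible reading of this slide is sketched below. The exact combination used in the paper may differ; this sketch only assumes that the two windowed co-occurrence counts are combined, idf-weighted, and scaled by the factor r.

```python
def edge_weight(c2, c10, idf_i, idf_j, r):
    """Illustrative edge weight for stems i and j (not the paper's exact formula).

    c2, c10     : co-occurrence counts of i and j in windows of
                  size 2 and 10 over the neighborhood set N
    idf_i, idf_j: inverse document frequencies of the two stems
    r           : probability of the document in which i and j
                  co-occur, given Q; boosts connections that appear
                  in documents closely related to the query
    """
    return r * (c2 + c10) * idf_i * idf_j

# Stems that co-occur often in query-related documents get heavier edges
w_strong = edge_weight(c2=5, c10=12, idf_i=2.3, idf_j=1.7, r=0.8)
w_weak = edge_weight(c2=1, c10=2, idf_i=0.4, idf_j=0.6, r=0.1)
```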

  10. Random Walk Example: a 3-node graph with row-stochastic transition matrix H, where row i gives the probabilities of stepping from node i: H = [[0.009, 0.900, 0.091], [0.100, 0.800, 0.100], [0.600, 0.005, 0.395]]. If the walk starts from node 1 at time t = 0, the distribution at time t = 1 is [1, 0, 0]·H = [0.009, 0.900, 0.091], so the probability of walking to node 3 at t = 1 is 0.091.
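The one-step computation on this slide is just a vector-matrix product. A small sketch (the third entry of the first row is assumed so that the row sums to 1, since that value is unreadable on the slide):

```python
import numpy as np

# Row-stochastic transition matrix for the slide's 3-node toy example
# (rows = current node, columns = next node)
H = np.array([
    [0.009, 0.900, 0.091],  # from node 1
    [0.100, 0.800, 0.100],  # from node 2
    [0.600, 0.005, 0.395],  # from node 3
])

start = np.array([1.0, 0.0, 0.0])  # the walk begins at node 1 at t = 0
after_one_step = start @ H         # distribution over nodes at t = 1
p_node3 = after_one_step[2]        # probability of being at node 3
```

Iterating `dist = dist @ H` until the distribution stops changing gives the stationary affinity scores that PhRank reads off the graph.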

  11. Vertex weights • Factor s balances exhaustivity with global saliency to identify stems that are poor discriminators between relevant and non-relevant documents. • Exhaustivity: the frequency of a word w_n in N, averaged over the k + 1 documents and normalized by the maximum average frequency of any term in N. • Saliency: based on the number of documents in C containing w_n. • TREC query #840: ‘Give the definition, locations, or characteristics of geysers’. => “definition geysers” is not more informative than “geysers” alone.

  12. Example • |N| = 3, |C| = 35 • w_n = geysers: average frequency of “geysers” in N = 12/3 = 4; max average frequency of any term in N = 4; df_wn = 3 • w_n = definition: average frequency of “definition” in N = 2/3; max average frequency of any term in N = 4; df_wn = 1
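The slide's numbers can be plugged into a sketch of the two ingredients of the vertex weight. This assumes the saliency part is an idf-style log(|C|/df) and the exhaustivity part is the normalized average frequency; the paper's exact combination may differ.

```python
import math

def vertex_weight(avg_tf, max_avg_tf, df, num_docs):
    """Illustrative vertex weight s for a stem (not the paper's exact formula).

    avg_tf     : frequency of the stem in N, averaged over the k+1 docs
    max_avg_tf : maximum average frequency of any stem in N
    df         : number of documents in the collection C containing the stem
    num_docs   : |C|
    """
    exhaustivity = avg_tf / max_avg_tf  # how prominent the stem is in N
    saliency = math.log(num_docs / df)  # how discriminative it is in C
    return exhaustivity * saliency

# Slide example: |N| = 3, |C| = 35
s_geysers = vertex_weight(avg_tf=12 / 3, max_avg_tf=4, df=3, num_docs=35)
s_definition = vertex_weight(avg_tf=2 / 3, max_avg_tf=4, df=1, num_docs=35)
```

Under these assumptions “geysers” scores higher than “definition”, matching the slide's point that “definition geysers” adds nothing over “geysers”.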

  13. Term ranking • Input: all combinations of 1–3 words in a query that are not stopwords. • Output: a ranked list sorted by f(x, Q) score. • To avoid a bias towards longer terms, a term x is scored by averaging the affinity scores of its component words. • A factor z_x represents the degree to which the term is discriminative in the collection, based on the frequency of the term in C.
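The candidate generation and scoring steps can be sketched as follows. This assumes f(x, Q) multiplies the average random-walk affinity of the component words by an idf-style discriminativeness factor z_x; the names and the exact form of z_x are illustrative, not taken from the paper.

```python
import itertools
import math

def candidate_terms(query_words, stopwords, max_len=3):
    """All combinations of 1-3 non-stopword query words."""
    content = [w for w in query_words if w not in stopwords]
    for n in range(1, max_len + 1):
        yield from itertools.combinations(content, n)

def score_term(term, affinity, term_freq_in_C, num_docs_C):
    """f(x, Q): average affinity of component words, scaled by z_x."""
    avg_affinity = sum(affinity[w] for w in term) / len(term)
    z_x = math.log(num_docs_C / (1 + term_freq_in_C))  # discriminativeness
    return avg_affinity * z_x

# Toy usage: affinity scores would come from the converged random walk
affinity = {"volcanic": 0.30, "activity": 0.20, "locations": 0.05}
terms = list(candidate_terms(["locations", "of", "volcanic", "activity"],
                             stopwords={"of"}))
ranked = sorted(terms,
                key=lambda t: score_term(t, affinity, term_freq_in_C=10,
                                         num_docs_C=1000),
                reverse=True)
```

Averaging (rather than summing) the component affinities is what keeps three-word terms from automatically outranking single informative words.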

  14. Query: Locations of volcanic activity which occurred within the present day boundaries of the U.S. and its territories. • Example: term x = volcanic boundaries; term x = volcanic U.S.

  15. Outline • Introduction • The PhRank Algorithm • Diversity filter • Experiment • Conclusions

  16. Diversity filter • PhRank often assigns a high rank to multi-word terms that contain only one highly informative word. • For example, for the query “the destruction of Pan Am Flight 103 over Lockerbie, Scotland”, the term ‘pan flight 103’ is informative, but “pan” is uninformative by itself. • Example with ranked terms ‘declining birth’, ‘birth rate’, ‘declining birth rate’, ‘birth rate china’: • Way 1: discard the longer terms (‘declining birth rate’, ‘birth rate china’) on the assumption that the shorter terms better represent the information need and the longer term is redundant. • Way 2: keep the longer term instead, on the assumption that the longer term better represents the information need.
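Way 1 on this slide can be sketched as a single top-down pass over the ranked list, discarding any term that properly contains an already-kept, higher-ranked term (names are illustrative):

```python
def diversity_filter(ranked_terms):
    """Drop a term when a higher-ranked term is contained within it.

    ranked_terms: list of terms (tuples of words), best first.
    Assumes the shorter, higher-ranked term already covers the
    information need, so the longer term is redundant (Way 1).
    """
    kept = []
    for term in ranked_terms:
        words = set(term)
        # a kept term that is a proper subset makes this term redundant
        if any(set(k) < words for k in kept):
            continue
        kept.append(term)
    return kept

# Slide example: the short terms outrank the longer terms containing them
ranked = [("declining", "birth"),
          ("birth", "rate"),
          ("declining", "birth", "rate"),
          ("birth", "rate", "china")]
filtered = diversity_filter(ranked)
```

Way 2 would invert the test, discarding the shorter term when a longer, lower-ranked term contains it.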

  17. Outline • Introduction • The PhRank Algorithm • Diversity filter • Experiment • Conclusions

  18. Experiment • Dataset • Feature notation: F = excluded from the feature set, T = included in the feature set.

  19. Experiment

  20. Experiment • TREC description topics • TREC title queries

  21. Outline • Introduction • The PhRank Algorithm • Diversity filter • Experiment • Conclusions

  22. Conclusions • We have presented PhRank, a novel term ranking algorithm that extends work on Markov chain frameworks for query expansion to select focused and succinct terms from within a query. • For all collections, around 26% of queries show more than a 5% decrease in MAP compared to SD. • Efficiency concerns surrounding the time to construct an affinity graph may be ameliorated by off-line indexing to precompute a language model for each document in a collection.
