Analyzing and Evaluating Query Reformulation Strategies in Web Search Logs • Reporter: Hsan-Yu Lin
Outline • Introduction • Related Work • Reformulation Strategies • Reformulation Effectiveness Metrics • Discussion And Conclusion
Introduction • Query reformulation (refinement) • Users frequently modify a previous search query in the hope of retrieving better results • Goal: • Examine the types of query reformulation users perform • Evaluate them using effectiveness metrics such as click data
Related Work • Computer-Generated Reformulations
Related Work • Query Session Boundary Detection • Automatic new topic identification using multiple linear regression (Information Processing & Management 2006) • uses time and common words • Identification of User Sessions with Hierarchical Agglomerative Clustering (ASIS&T ‘06) • uses hierarchical clustering to find a better timeout value
Procedure 1. Create a taxonomy of query reformulation strategies defined in a formal language 2. Build an unsupervised rule-based classifier to detect the different query reformulation strategies 3. Analyze correlations between query reformulation strategies and effectiveness metrics
Reformulation Strategies • Definitions: • _ : space character • P = {', -, .} : punctuation • λ : empty string • Σ = {[a-z], [0-9]} ∪ P : alphabet • cᵢ ∈ Σ : character • wᵢ ∈ Σ* : word • zᵢ ∈ (Σ ∪ {_})* : any string
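The definitions above can be encoded directly as character sets. This is one possible reading, assuming queries are already lowercased (Σ only contains [a-z]); the names `SIGMA`, `is_word`, `is_string` and `words` are hypothetical, not taken from the paper.

```python
import string

# Hypothetical encoding of the slide's symbols.
SPACE = " "                       # "_" : the space character
P = {"'", "-", "."}               # punctuation
EMPTY = ""                        # λ : the empty string
SIGMA = set(string.ascii_lowercase) | set(string.digits) | P  # alphabet Σ


def is_word(w: str) -> bool:
    """w ∈ Σ*: only alphabet characters, no spaces (λ counts as a word)."""
    return all(c in SIGMA for c in w)


def is_string(z: str) -> bool:
    """z ∈ (Σ ∪ {_})*: alphabet characters mixed with spaces."""
    return all(c in SIGMA or c == SPACE for c in z)


def words(z: str) -> list[str]:
    """Split a string z into its words w1 ... wn on runs of whitespace."""
    return z.split()
```

For example, `words("seattle pizza palace")` gives `['seattle', 'pizza', 'palace']`, and `is_word("wal-mart")` holds because `-` is in P.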
Reformulation Strategies • REFORM. 1: WORD REORDER • seattle pizza palace → pizza seattle palace • REFORM. 2: WHITESPACE AND PUNCTUATION • wal mart, tomatoprices → walmart tomato prices
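A minimal sketch of rules for these two strategies, assuming a query pair (q1, q2) is compared as plain strings. The function names are hypothetical, and the punctuation test strips every non-alphanumeric character, which also covers the comma in the example even though ',' is not listed in P.

```python
import re


def is_word_reorder(q1: str, q2: str) -> bool:
    """Same words (with multiplicity), different order."""
    w1, w2 = q1.split(), q2.split()
    return w1 != w2 and sorted(w1) == sorted(w2)


def _squash(q: str) -> str:
    """Drop every character that is not a lowercase letter or digit."""
    return re.sub(r"[^a-z0-9]", "", q.lower())


def is_whitespace_punctuation(q1: str, q2: str) -> bool:
    """Queries differ only in where spaces and punctuation appear."""
    return q1 != q2 and _squash(q1) == _squash(q2)
```

Both example pairs from the slide are accepted by their respective rule, and the reorder pair is rejected by the punctuation rule because the squashed strings differ.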
Reformulation Strategies • REFORM. 3: REMOVE WORDS • yahoo stock price → price yahoo • REFORM. 4: ADD WORDS • eastlake home → eastlake home price index • REFORM. 5: URL STRIPPING • http www.yahoo.com → yahoo
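One way these rules could be written, as a sketch. Word order is ignored because the remove-words example also reorders ("yahoo stock price" → "price yahoo"). The URL affixes handled below (http/https, "www.", and a short list of top-level domains) are an assumption; the slide only shows "http", "www." and ".com".

```python
import re


def is_remove_words(q1: str, q2: str) -> bool:
    """q2 keeps a non-empty proper subset of q1's words (order ignored)."""
    s1, s2 = set(q1.split()), set(q2.split())
    return bool(s2) and s2 < s1


def is_add_words(q1: str, q2: str) -> bool:
    """Adding words is the mirror image of removing them."""
    return is_remove_words(q2, q1)


def strip_url(q: str) -> str:
    """Remove a URL scheme, a leading 'www.' and a common top-level domain."""
    q = re.sub(r"\bhttps?\b[:/]*", " ", q)           # "http", "https://", ...
    q = re.sub(r"\bwww\.", "", q)                     # "www."
    q = re.sub(r"\.(com|org|net|edu|gov)\b", "", q)   # assumed TLD list
    return " ".join(q.split())


def is_url_stripping(q1: str, q2: str) -> bool:
    return q1 != q2 and strip_url(q1) == q2
```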
Reformulation Strategies • REFORM. 6: STEMMING • running over bridges → run over bridge • REFORM. 7: FORM ACRONYM • personal computer → pc • REFORM. 8: EXPAND ACRONYM • pda → personal digital assistant
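A sketch of the stemming and acronym rules. The slides do not say which stemmer was used, so `toy_stem` below is a deliberately crude, hypothetical suffix stripper (not the Porter algorithm) that only handles -ing, -ed and -s; a real classifier would use an established stemmer. The acronym rule assumes an acronym is the concatenated first letters of the words.

```python
def toy_stem(word: str) -> str:
    """Crude suffix stripper; a stand-in for a real stemmer, not Porter."""
    for suffix in ("ing", "ed", "s"):
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= 3:
            # Undo consonant doubling after -ing / -ed ("runn" -> "run").
            if suffix != "s" and stem[-1] == stem[-2] and stem[-1] not in "aeiou":
                stem = stem[:-1]
            return stem
    return word


def is_stemming(q1: str, q2: str) -> bool:
    """Same number of words and identical word-by-word stems."""
    w1, w2 = q1.split(), q2.split()
    return (w1 != w2 and len(w1) == len(w2)
            and [toy_stem(w) for w in w1] == [toy_stem(w) for w in w2])


def _initials(q: str) -> str:
    return "".join(w[0] for w in q.split())


def is_form_acronym(q1: str, q2: str) -> bool:
    """q2 is a single token spelled by the initials of q1's words."""
    acronym = q2.strip()
    return len(q1.split()) >= 2 and " " not in acronym and _initials(q1) == acronym


def is_expand_acronym(q1: str, q2: str) -> bool:
    return is_form_acronym(q2, q1)
```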
Reformulation Strategies • REFORM. 9: SUBSTRING • is there spyware on my computer → is there spywa • REFORM. 10: SUPERSTRING • nevada police rec → nevada police records 2008 • REFORM. 11: ABBREVIATION • shortened dict → short dictionary
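A minimal sketch of the three string-containment rules, assuming "substring" means a plain character substring and "abbreviation" means that, word by word, one word is a prefix of the other. These readings are inferred from the examples, not from the paper's formal definitions.

```python
def is_substring(q1: str, q2: str) -> bool:
    """q2 is a proper, non-empty substring of q1 (e.g. a truncated query)."""
    return bool(q2) and q2 != q1 and q2 in q1


def is_superstring(q1: str, q2: str) -> bool:
    return is_substring(q2, q1)


def is_abbreviation(q1: str, q2: str) -> bool:
    """Word by word, one word is a prefix of the other (shortened <-> short)."""
    w1, w2 = q1.split(), q2.split()
    if w1 == w2 or len(w1) != len(w2):
        return False
    return all(a.startswith(b) or b.startswith(a) for a, b in zip(w1, w2))
```

These rules overlap with the earlier ones: "eastlake home" → "eastlake home price index" satisfies both the add-words rule and `is_superstring`. A rule-based classifier therefore needs a fixed precedence order; the slides do not state which order the paper used.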
Reformulation Strategies • REFORM. 12: WORD SUBSTITUTION • Synonym: easter egg search → easter egg hunt • Hyponym: crimson scarf → red scarf • Hypernym: personal computer → laptop • Meronym: finger → hand • Holonym: automobile → wheel • REFORM. 13: SPELLING CORRECTION • reformualtion → reformulation
Undetected Reformulations • Categories of reformulations not included in the taxonomy: • Semantic Rephrasing • how to calculate nutritional values → weight watchers calculator • Multi-Reformulations • lane county gabrage → lane county garbage disposal (add words and spelling correction) • Classifier Rule Limitations • spelling correction used a Levenshtein edit-distance threshold of 2 • WordNet database limitations
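The spelling-correction rule can be sketched with the standard dynamic-programming Levenshtein distance. Treating the slide's distance of 2 as a maximum over the whole query string is an assumption; `MAX_EDITS` and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


MAX_EDITS = 2  # threshold from the slide; applied to the whole query (assumption)


def is_spelling_correction(q1: str, q2: str) -> bool:
    return q1 != q2 and levenshtein(q1, q2) <= MAX_EDITS
```

Under this rule "reformualtion" → "reformulation" (two substitutions) is accepted, while the multi-reformulation "lane county gabrage" → "lane county garbage disposal" exceeds the threshold because the added word alone costs nine edits, so the pair goes undetected.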
Measures For Session Boundary Detection • Test data: • Queries from 100 users in the AOL query logs • Identical repeated queries were removed (40.8% of queries) • 9,091 query pairs • 2,483 reformulations and 6,608 new queries (27.3% reformulations)
Measures For Session Boundary Detection • Aim for high precision, but not necessarily high recall • Interested in inter-reformulation rather than intra-reformulation
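The test-set counts show why precision is the informative number here. In this sketch, a degenerate classifier that labels every pair a reformulation reaches recall 1.0 but precision only equal to the base rate, about 0.273. The function name `precision_recall` is hypothetical; the counts come from the test-data slide.

```python
def precision_recall(gold: list[bool], predicted: list[bool]) -> tuple[float, float]:
    """True means 'this query pair is a reformulation'."""
    tp = sum(g and p for g, p in zip(gold, predicted))
    fp = sum(p and not g for g, p in zip(gold, predicted))
    fn = sum(g and not p for g, p in zip(gold, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Counts from the 100-user AOL test set.
REFORMULATIONS, NEW_QUERIES = 2483, 6608
gold = [True] * REFORMULATIONS + [False] * NEW_QUERIES
base_rate = REFORMULATIONS / len(gold)  # 2,483 / 9,091 ≈ 0.273

# Degenerate classifier: call every pair a reformulation.
p_all, r_all = precision_recall(gold, [True] * len(gold))
```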
Reformulation Effectiveness Metrics • Data: AOL query logs (released on 08/03/2006) • Queries: 36,389,567 • 16,069,421 new queries • 14,861,326 identical repeated queries • 3,411,706 reformulations • Metrics • Click Pattern • Click URL • Rank Change of Clicked Results
Click Pattern • (SkipSkip + ClickSkip) vs. (SkipClick + ClickClick) • SkipSkip vs. SkipClick
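A sketch of how the two comparisons could be computed per strategy. It assumes the first half of each label is the user's action on the original query and the second half the action on the reformulation ("Skip" = no click). The function names and the toy input in the usage example are hypothetical, not data from the AOL logs.

```python
from collections import Counter


def click_pattern(clicked_initial: bool, clicked_reform: bool) -> str:
    """E.g. (False, True) -> 'SkipClick': no click before, a click after."""
    first = "Click" if clicked_initial else "Skip"
    second = "Click" if clicked_reform else "Skip"
    return first + second


def click_summary(pairs: list[tuple[bool, bool]]) -> dict[str, float]:
    c = Counter(click_pattern(a, b) for a, b in pairs)  # missing keys count as 0
    clicked_after = c["SkipClick"] + c["ClickClick"]
    skipped_after = c["SkipSkip"] + c["ClickSkip"]
    total = clicked_after + skipped_after
    no_click_initial = c["SkipClick"] + c["SkipSkip"]
    return {
        # (SkipClick + ClickClick) vs. (SkipSkip + ClickSkip)
        "click_after_rate": clicked_after / total if total else 0.0,
        # SkipClick vs. SkipSkip: did the reformulation rescue a failed query?
        "recovery_rate": c["SkipClick"] / no_click_initial if no_click_initial else 0.0,
    }
```

With the toy input `[(False, True), (False, False), (True, True), (True, False), (False, True)]`, three of five reformulations get a click (rate 0.6), and two of the three initially unclicked queries are rescued.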
Discussion • Different reformulation strategies were effective depending on the user's action on the initial query • Word substitution • SkipSkip • ClickClick • Spelling correction • SkipClick • ClickSkip
Limitations • Lack of Context • Normalized Query Logs • Ambiguous Queries • ‘american airlines’, ‘delta airlines’ • Search Engine Effects
CONCLUSIONS • Describes the human side of query reformulation and contributes to our understanding of users in search interaction • Adding/removing words, word substitution, acronym expansion, and spelling correction seem most effective • Acronym formation and word reordering may be less beneficial to the user