Mining Term Association Patterns from Search Logs for Effective Query Reformulation Xuanhui Wang and ChengXiang Zhai Department of Computer Science University of Illinois at Urbana-Champaign ACM CIKM 2008, Oct. 26-30, Napa Valley
Ineffective Queries
• “reduce space command latex”
Effective Queries
• “squeeze space command latex”
More Examples
• If you want to wash your vehicle
  • “vehicle wash”, “auto wash”
  • “car wash”, “truck wash”
• If you want to buy a car
  • “auto quotes”
  • “auto sale quotes”?
  • “auto insurance quotes”?
What Makes a Query Ineffective?
• Vocabulary mismatch (addressed by term substitution)
  • “reduce space command latex” vs “squeeze space command latex”
  • “auto wash” vs “car wash”
• Lack of discrimination (addressed by term addition)
  • “auto quotes” vs “auto sale quotes”
• …
How can we help improve ineffective queries?
Our Contribution
• We cast query reformulation as term-level pattern mining from search logs
• We define two basic types of patterns at the term level and propose probabilistic methods to mine them
  • Context-sensitive term substitution
    • “auto → car | _wash”, “car → auto | _trade”
  • Context-sensitive term addition
    • “+sale | auto_quotes”
• We evaluate our methods on commercial search engine logs and show their effectiveness
Problem Formulation
[Figure: system overview, split into an offline part and an online part. Components: search logs (query collection); Task 1: Contextual Models; Task 2: Translation Models; Task 3: Pattern Mining. Example input: q = auto wash. Example patterns: “auto → car | _wash”, “auto → truck | _wash”, “+southland | _auto wash”, … Example output: car wash, truck wash, southland auto wash, …]
Task 1: Contextual Models
• Syntagmatic relations
• Capture terms that frequently co-occur with w inside queries
• Sample query collection: “enterprise car rental”, “rental car”, “budget car rental”, “car pricing”, “car pictures”, “car accidents”, …
• G: General context. Model P_G( * | car):
  • rental: 0.375, enterprise: 0.125, budget: 0.125, pricing: 0.125, …
• L1: 1st left context. Model P_L1( * | car):
  • rental: 0.333, enterprise: 0.333, budget: 0.333, …
• R1: 1st right context. Model P_R1( * | w), with w = car:
  • rental: 0.4, pricing: 0.2, pictures: 0.2, accidents: 0.2, …
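The three contextual models can be reproduced from the six sample queries with plain relative-frequency counts. The sketch below is only an illustration under that assumption: it uses unsmoothed maximum-likelihood estimates, ignores the trailing "…" in the sample collection, and the names `SAMPLE_QUERIES`, `normalize`, and `contextual_models` are hypothetical, not taken from the slides.

```python
from collections import Counter

# The six sample queries listed on the slides (the trailing "…" is ignored).
SAMPLE_QUERIES = [
    "enterprise car rental",
    "rental car",
    "budget car rental",
    "car pricing",
    "car pictures",
    "car accidents",
]


def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()} if total else {}


def contextual_models(queries, w):
    """Return the (general, 1st-left, 1st-right) context models of term w."""
    general, left1, right1 = Counter(), Counter(), Counter()
    for query in queries:
        terms = query.split()
        for i, term in enumerate(terms):
            if term != w:
                continue
            # General context: every other term in the same query.
            general.update(terms[:i] + terms[i + 1:])
            # L1: the term immediately to the left of w, if any.
            if i > 0:
                left1[terms[i - 1]] += 1
            # R1: the term immediately to the right of w, if any.
            if i + 1 < len(terms):
                right1[terms[i + 1]] += 1
    return normalize(general), normalize(left1), normalize(right1)


general, left1, right1 = contextual_models(SAMPLE_QUERIES, "car")
print(general["rental"], left1["budget"], right1["rental"])
```

With these six queries, the counts match the slide values: "rental" gets 3/8 in the general context, each left neighbor gets 1/3, and "rental" gets 2/5 in the right context.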
Task 2: Translation Models
• Paradigmatic relations (“car” and “auto”)
• Capture terms that are substitutable with w
• Similar contexts → high translation probability
• Translation models [formula not recovered from the slide]; annotations on the formula:
  • Probability of generating s’s context from w’s contextual model
  • Size of L1 context
  • Size of R1 context
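The formula on this slide did not survive extraction, so the following is only one plausible reading of its annotations, not the authors' definition. It assumes that "similar contexts" is measured by the KL divergence between smoothed context models, that scores are normalized over candidate terms, and that the left and right models are mixed in proportion to the context sizes. The toy distributions, the smoothing constant `eps`, and all function names are hypothetical.

```python
import math


def smooth(model, vocab, eps=0.01):
    """Additive smoothing so every vocabulary term has non-zero probability."""
    total = sum(model.get(t, 0.0) + eps for t in vocab)
    return {t: (model.get(t, 0.0) + eps) / total for t in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p if p[t] > 0)


def translation_probs(w, context_models):
    """Assumed t(s | w): normalized exp(-KL(context of s || context of w))."""
    vocab = sorted({t for m in context_models.values() for t in m})
    smoothed = {term: smooth(m, vocab) for term, m in context_models.items()}
    scores = {
        s: math.exp(-kl_divergence(smoothed[s], smoothed[w]))
        for s in smoothed
        if s != w
    }
    total = sum(scores.values())
    return {s: v / total for s, v in scores.items()}


def combine(t_left, t_right, n_left, n_right):
    """Mix left and right translation models, weighted by context sizes."""
    z = n_left + n_right
    return {
        s: (n_left * t_left.get(s, 0.0) + n_right * t_right.get(s, 0.0)) / z
        for s in set(t_left) | set(t_right)
    }


# Toy right-context models; only the "car" row comes from the slides.
toy_right_models = {
    "car": {"rental": 0.4, "pricing": 0.2, "pictures": 0.2, "accidents": 0.2},
    "auto": {"rental": 0.3, "pricing": 0.3, "insurance": 0.4},
    "pizza": {"delivery": 0.6, "coupons": 0.4},
}
t_right = translation_probs("car", toy_right_models)
print(sorted(t_right, key=t_right.get, reverse=True))
```

In this toy setting "auto" shares right-context terms with "car" and therefore receives a much higher score than "pizza", which is the behavior the slide describes.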
Task 3.1: Pattern Mining – Term Substitution
• Original query: q = [w_1 … w_{i-1} w_i w_{i+1} … w_n]
• Substitute w_i by s: q’ = [w_1 … w_{i-1} s w_{i+1} … w_n]
• Which word s should be chosen?
  • Global factor: translation model
  • Local factor
Estimating Local Factor
• Candidate s placed in the context w_1 … w_{i-1} _ w_{i+1} … w_n
• Independence assumption across the context terms
• Ignore those terms far away
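A minimal sketch of how the global and local factors might be combined to rank substitution candidates. The slide formulas were not recovered, so this assumes the score is the translation probability multiplied by a product of position-specific context probabilities for the `k` nearest terms on each side. The data layout (`left_models[j][s][term]` standing for the probability of `term` at distance `j` to the left of `s`), the `floor` value for unseen pairs, and all numbers are hypothetical.

```python
def local_factor(query, i, s, left_models, right_models, k=1, floor=1e-6):
    """Context likelihood of s at position i, assuming independent context terms.

    Only the k nearest terms on each side are used; farther terms are ignored.
    """
    score = 1.0
    for j in range(1, k + 1):
        if i - j >= 0:
            score *= left_models.get(j, {}).get(s, {}).get(query[i - j], floor)
        if i + j < len(query):
            score *= right_models.get(j, {}).get(s, {}).get(query[i + j], floor)
    return score


def rank_substitutions(query, i, translation, left_models, right_models, k=1):
    """Rank candidates s for query[i] by global factor times local factor."""
    w = query[i]
    scores = {
        s: t * local_factor(query, i, s, left_models, right_models, k)
        for s, t in translation.get(w, {}).items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical numbers for the query "auto wash", substituting "auto".
toy_translation = {"auto": {"car": 0.5, "automobile": 0.3, "truck": 0.2}}
toy_right = {
    1: {
        "car": {"wash": 0.1, "rental": 0.3},
        "truck": {"wash": 0.05},
        "automobile": {"insurance": 0.2},
    }
}
ranked = rank_substitutions(["auto", "wash"], 0, toy_translation, {}, toy_right)
print([s for s, _ in ranked])
```

Here "automobile" has a higher translation probability than "truck" but is never followed by "wash" in the toy model, so the local factor pushes it to the bottom. That is the sense in which the substitution is context sensitive.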
Task 3.2: Pattern Mining – Term Addition
• Original query: q = [w_1 … w_{i-1} w_i … w_n]
• Adding r before w_i: q’ = [w_1 … w_{i-1} r w_i … w_n]
• [Formula not recovered from the slide]; annotations on its factors:
  • Uniform
  • Similar to the local factor in term substitution patterns
Evaluation: Data Preparation
• Search logs from Microsoft Live Labs
• Timeline: history logs followed by future logs (dates marked: 5/1/2006, 5/20/2006, 5/31/2006)
• History collection: 4.4M queries, of which 1.6M are distinct
• 1.3M user sessions
• Used to construct test cases
Examples of Contextual Models
• Left and right contexts are different
• The general context mixes them together
Examples of Translation Models
• Conceptually similar keywords have high translation probabilities
• Provide possibility for exploratory search in an interactive manner
Examples of Term Substitution
• Substitution is context sensitive
• Intuitively, reworded queries are more effective
Effectiveness Comparison of Term Substitution – Experiment Design
[Figure: a session of queries Q1, Q2, …, Qk with result lists R21, R22, R23, … and Rk1, Rk2, Rk3, …, and documents C1, C2, C3. Q1 is reformulated into Q1’, Q2’, Q3’, whose top results are (dx, C3, C1, C2, dx, …), (dx, C1, dx, dx, dx, …), and (dx, C2, dx, C3, dx, …), giving P@5 = 0.6, 0.2, and 0.4. Best P@5 = 0.6.]
• How well can a reformulated query rank C1, C2, and C3 at the top?
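The P@5 values in this design can be checked with a few lines of code. The sketch below assumes that C1, C2, and C3 are the documents to be ranked highly and that each "dx" on the slide stands for some other, distinct document; the identifiers `d1`, `d2`, … and the function name `precision_at_k` are hypothetical.

```python
def precision_at_k(ranked_docs, targets, k=5):
    """Fraction of the top-k ranked documents that are in the target set."""
    return sum(1 for doc in ranked_docs[:k] if doc in targets) / k


targets = {"C1", "C2", "C3"}

# Top-5 results of the three reformulated queries, as laid out on the slide.
top5 = {
    "Q1'": ["d1", "C3", "C1", "C2", "d2"],
    "Q2'": ["d3", "C1", "d4", "d5", "d6"],
    "Q3'": ["d7", "C2", "d8", "C3", "d9"],
}

p_at_5 = {query: precision_at_k(docs, targets) for query, docs in top5.items()}
best_p_at_5 = max(p_at_5.values())
print(p_at_5, best_p_at_5)
```

The three lists contain 3, 1, and 2 target documents in their top five, which gives 0.6, 0.2, and 0.4, and the best reformulation scores 0.6, in agreement with the slide.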
Results
[Figure: comparison of our method with [Jones’06]; axis label: #Recommended Queries]
• Our method reformulates queries more effectively
Term Addition Patterns
• Term addition patterns can refine a broad query
Related Work
• Query suggestions [e.g., Jones’06, Sahami et al.’06]
  • Discover patterns at the query level
  • Rely on external resources or training data
  • Do not consider effectiveness
• Query modifications in IR [Rocchio’71, Anick’03]
  • Expand queries from returned documents
  • Do not rely on search logs; mostly add terms
• Related work in the NLP community [Lin’98, Rapp’02]
  • Finding synonyms or near-synonyms
  • Syntagmatic and paradigmatic relations
  • Not used for query reformulation
Conclusions and Future Work
• We propose a new way to mine search logs for patterns to address ineffective queries
  • Vocabulary mismatch
  • Lack of discrimination
• We define and mine two basic patterns at the term level
  • Context-sensitive term substitution patterns
  • Context-sensitive term addition patterns
• Experiments show the effectiveness of our methods
• In the future:
  • Use relevance judgments instead of clicks
  • Exploit click information for better query reformulation
Thank You!