Predicting Short-Term Interests Using Activity-Based Search Context. CIKM’10 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh
Outline • Introduction • Modeling Search Activity • Study • Conclusions
Introduction • Satisfying searchers’ information needs requires a thorough understanding of their interests, expressed through: - search queries - search engine result page (SERP) clicks - post-SERP browsing behavior • Construct interest models for the current query that include: - previous queries - previous clicks on SERPs • Evaluate the predictive effectiveness of these models using future actions
Modeling Search Activity • Data - The data set contained browser logs with both searching and browsing episodes. - Log entries include a timestamp for each page view and the URL of the Web page visited. - Only data from the English-speaking United States locale were used. - Search sessions on the Bing Web search engine were extracted.
Modeling Search Activity • ODP Labeling - Context is represented as a distribution across categories in the ODP topical hierarchy. - This provides a consistent topical representation of queries and page visits from which to build the models. - ODP category labels can also reflect topical differences in the search results for a query or in a user’s interests. - An automatic classification technique assigns an ODP category label to each page. - The 219 categories at the top two levels of the ODP hierarchy were used (called L).
Modeling Search Activity • ODP Labeling - Strategy for labeling a page: 1. Begin with URLs present in the ODP. 2. For URLs not present, incrementally prune the URL until a match is found or a miss is declared. 3. Combine the exact-match check with a logistic regression classifier.
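The back-off lookup in steps 1 and 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `odp_index` dictionary, the function names, and the choice to prune one path segment at a time are all assumptions made for the example. A miss (`None`) is where a text classifier such as the logistic regression model in step 3 would take over.

```python
from urllib.parse import urlsplit


def backoff_candidates(url):
    """Yield host+path keys, from the full path down to the bare host."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    segments = [s for s in parts.path.split("/") if s]
    # Drop one trailing path segment per iteration (assumed pruning rule).
    for depth in range(len(segments), -1, -1):
        yield "/".join([host] + segments[:depth])


def lookup_label(url, odp_index):
    """Return the ODP label of the longest matching prefix, or None on a miss."""
    for candidate in backoff_candidates(url):
        if candidate in odp_index:
            return odp_index[candidate]
    return None  # miss declared; a classifier would be consulted here


# Hypothetical index mapping URL prefixes to ODP categories.
odp_index = {"example.com/sports": "Sports/Basketball"}
label = lookup_label("http://example.com/sports/nba/scores.html", odp_index)
```

Here `label` resolves through two pruning steps, from `example.com/sports/nba/scores.html` to `example.com/sports`.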
Modeling Search Activity • Sources and Source Combinations - ODP labels are automatically assigned to the following sources: 1. Query: the top 10 search results for the query 2. SERPClick: the search results clicked by the user during the search session 3. NavTrail: Web pages that the user visits following a SERP click
Modeling Search Activity • Model Definitions – Query Model (Q) - For each query, the category labels for the top 10 search results were obtained. - Probabilities are assigned to the categories in L by: 1. obtaining normalized click frequencies for each of the top 10 results from search-engine click log data 2. computing from them the distribution across all ODP category labels - ODP categories in L that are not used to label any of the top results are assigned prior probabilities.
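One way to picture the query model is the sketch below. It assumes each top-10 result carries a single ODP label and a click count, and it uses an arbitrary `prior` value for unlabeled categories; the slides do not give the actual prior or the normalization details, so both are placeholders.

```python
def query_model(results, categories, prior=0.01):
    """Build a category distribution from (odp_label, click_count) pairs.

    `prior` is an assumed placeholder for the probability given to
    categories that label none of the top results.
    """
    total_clicks = sum(clicks for _, clicks in results)
    mass = {c: 0.0 for c in categories}
    for label, clicks in results:
        if label in mass and total_clicks > 0:
            # Normalized click frequency of this result, added to its category.
            mass[label] += clicks / total_clicks
    for c in categories:
        if mass[c] == 0.0:
            mass[c] = prior
    z = sum(mass.values())
    return {c: p / z for c, p in mass.items()}  # renormalize to sum to 1


q = query_model([("Sports", 3), ("News", 1)], ["Sports", "News", "Arts"])
```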
Modeling Search Activity • Model Definitions – Context Model (X) - The context model is built from actions that precede the current query in the session: 1. Queries 2. Web pages visited through a SERP click 3. Web pages visited on the navigational trail following a SERP click
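As a rough sketch, a context model can be formed by pooling the category distributions of the preceding actions. The uniform averaging used here is a simplifying assumption for illustration; the slides do not say how individual actions are weighted, and the function name is hypothetical.

```python
def context_model(action_distributions, categories):
    """Average the category distributions of pre-query actions.

    Uniform weighting is an assumption; other weightings are possible.
    """
    if not action_distributions:
        # No context available: fall back to a uniform distribution.
        return {c: 1.0 / len(categories) for c in categories}
    n = len(action_distributions)
    return {
        c: sum(dist.get(c, 0.0) for dist in action_distributions) / n
        for c in categories
    }


x = context_model([{"Sports": 1.0}, {"News": 1.0}], ["Sports", "News"])
```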
Modeling Search Activity • Model Definitions – Intent Model (I)
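The definition on this slide did not survive extraction. Since the later slides learn a "context weight" w, a plausible reading is that the intent model mixes the query model and the context model linearly. The sketch below shows that mixture under this assumption; the exact formula is not confirmed by the slide text.

```python
def intent_model(query_dist, context_dist, w):
    """Mix query and context distributions: (1 - w) * Q + w * X.

    The linear form is an assumption inferred from the term
    'context weight'; w = 0 ignores context, w = 1 ignores the query.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    categories = set(query_dist) | set(context_dist)
    return {
        c: (1.0 - w) * query_dist.get(c, 0.0) + w * context_dist.get(c, 0.0)
        for c in categories
    }


i = intent_model({"Sports": 1.0}, {"News": 1.0}, w=0.5)
```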
Modeling Search Activity • Relevance Model or Ground Truth (R) - The relevance model contains actions that occur following the current query in the session
Study • Learning Optimal Context Weights - Steps: 1. Identify the optimal context weight (w) for each query on a held-out training set. 2. Create features of the query and the context that could be useful in predicting w.
Study • Learning Optimal Context Weights - To create a training set, the query, context, and relevance models were used to compute the optimal context weight per query by minimizing the regularized cross-entropy for each query independently.
Study • Learning Optimal Context Weights - The objective includes a regularizer that penalizes deviations from w = 0.5.
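The per-query optimization can be sketched with a simple grid search. Several details here are assumptions rather than the authors' settings: the linear mixture of query and context models, the quadratic form of the regularizer, the strength `lam`, and the grid resolution.

```python
import math


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy of `predicted` against `target` (both category dicts)."""
    return -sum(
        p * math.log(max(predicted.get(c, 0.0), eps))
        for c, p in target.items()
        if p > 0.0
    )


def optimal_context_weight(query_dist, context_dist, relevance_dist,
                           lam=0.1, steps=101):
    """Grid-search w in [0, 1] minimizing cross-entropy + lam * (w - 0.5)^2."""
    categories = set(query_dist) | set(context_dist)
    best_w, best_loss = 0.5, float("inf")
    for k in range(steps):
        w = k / (steps - 1)
        mixed = {
            c: (1.0 - w) * query_dist.get(c, 0.0) + w * context_dist.get(c, 0.0)
            for c in categories
        }
        # Assumed quadratic penalty pulling w toward 0.5.
        loss = cross_entropy(relevance_dist, mixed) + lam * (w - 0.5) ** 2
        if loss < best_loss:
            best_w, best_loss = w, loss
    return best_w


Q = {"A": 0.9, "B": 0.1}
X = {"A": 0.1, "B": 0.9}
w_context = optimal_context_weight(Q, X, relevance_dist=X)
w_query = optimal_context_weight(Q, X, relevance_dist=Q)
```

When future actions resemble the context, the search favors w above 0.5; when they resemble the query, it favors w below 0.5. The regularizer keeps the weight from drifting to an extreme when the evidence is weak.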
Study • Generating Features of Query and Context - Features are divided into three classes: 1. Query class: captures characteristics of the current query and the query model. 2. Context class: captures aspects of the pre-query interaction behavior as well as features of the context model itself. 3. QueryContext class: captures how the query model and the context model compare. - These features were generated for each session in the set and used to train a predictive model.
Study • Generating Features of Query and Context - Query class
Study • Generating Features of Query and Context - Context class
Study • Generating Features of Query and Context - QueryContext class
Study • Predicting the Optimal Context Weight - 60% of the queries were used for training, 20% for validation, and 20% for testing. - 10-fold cross-validation was performed to improve the reliability of the results. - The folds were constructed by splitting on sessions, so that all queries in a session are used for exactly one of training, validation, or testing.
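Session-level splitting can be sketched by assigning each session, rather than each query, to a bucket. The hashing scheme and the way the 60/20/20 split rotates across ten folds are assumptions made for the example; the slides do not describe how the folds were actually built.

```python
import hashlib


def assign_split(session_id, fold=0, n_folds=10):
    """Map a session to 'train', 'validation', or 'test' for a given fold.

    Every query inherits the split of its session, so no session is
    shared between training, validation, and testing.
    """
    digest = hashlib.md5(session_id.encode("utf-8")).hexdigest()
    bucket = (int(digest, 16) + fold) % n_folds
    if bucket < 6:   # 6 of 10 buckets: 60% training
        return "train"
    if bucket < 8:   # 2 of 10 buckets: 20% validation
        return "validation"
    return "test"    # 2 of 10 buckets: 20% testing


# Hypothetical (session_id, query) pairs.
queries = [("s1", "nba scores"), ("s1", "lakers schedule"), ("s2", "pasta recipe")]
splits = [(sid, q, assign_split(sid, fold=0)) for sid, q in queries]
```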
Study • Predicting the Optimal Context Weight - The most predictive features relate to the information divergence between the query model and the context model.
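A divergence feature of this kind can be computed as in the sketch below. The slides do not say which divergence measure was used, so Jensen-Shannon divergence (built on KL divergence) is shown purely as one reasonable, symmetric choice.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over category dicts; q is floored at eps to avoid log(0)."""
    return sum(
        pc * math.log(pc / max(q.get(c, 0.0), eps))
        for c, pc in p.items()
        if pc > 0.0
    )


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded by ln 2."""
    categories = set(p) | set(q)
    m = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in categories}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


query_dist = {"Sports": 0.8, "News": 0.2}
context_dist = {"Sports": 0.3, "News": 0.7}
feature = js_divergence(query_dist, context_dist)
```

A value near zero means the current query continues the topic of the preceding activity; a larger value suggests a topic shift, where context may be less helpful.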
Study • Predicting the Optimal Context Weight
Study • Varying Context and Relevance Information
Conclusions • Investigated the effectiveness of activity-based context in predicting users’ search interests. • Explored the value of modeling the current query, its context, and their combination, using different sources. • Intent models developed from many sources perform best overall. • Developed techniques to learn the optimal combinations.