Context-Aware Click Modeling Mao Jiaxin Tsinghua University
User clicks: An Important Source of Implicit Relevance Feedback
• Large volume
• Easy to collect
• Informative
However, User Clicks are Biased
Position bias:
• Higher position ⇒ higher CTR
• But higher CTR ⇏ higher relevance
[Lorigo, et al. J. Am. Soc. Inf. Sci., 2008]
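Click Models
Separate relevance-driven clicks from position-driven clicks
• Examine: E = 1 if the user reads the result, otherwise E = 0
• Click: C = 1 if the user clicks the result, otherwise C = 0
• Examination hypothesis: P(C = 1) = P(E = 1) · P(C = 1 | E = 1), with P(C = 1 | E = 0) = 0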
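A small numeric sketch of this decomposition, using the common form P(C = 1) = P(E = 1) · P(C = 1 | E = 1). The probabilities below are made up for illustration and do not come from the talk; the function names are hypothetical.

```python
def click_probability(p_exam: float, p_click_given_exam: float) -> float:
    """Examination hypothesis: a click requires examination first."""
    return p_exam * p_click_given_exam


def debiased_relevance(ctr: float, p_exam: float) -> float:
    """Recover P(C=1 | E=1) from an observed CTR and an examination probability."""
    return ctr / p_exam


# Illustrative (made-up) numbers: a weak result at the top rank versus
# a strong result far down the page.
ctr_top = click_probability(p_exam=0.9, p_click_given_exam=0.3)
ctr_low = click_probability(p_exam=0.2, p_click_given_exam=0.8)

# The top result gets the higher CTR even though it is less relevant.
print(ctr_top > ctr_low)
# Dividing out the examination probability restores the relevance ordering.
print(debiased_relevance(ctr_top, 0.9) < debiased_relevance(ctr_low, 0.2))
```

Under these assumed numbers, raw CTR ranks the two results in the opposite order from their relevance, which is the position bias a click model is meant to remove.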
Validate the Examination Hypothesis
Lab experiment:
• 31 subjects
• 25 queries
Obtain the ground truth of E by:
• Eye-tracking device
• Subjects' explicit annotations
Finding: P(C|E) is dependent on the context!
Influence from Context
Context when examining a result:
• Current query
• Previously examined results
• Prior clicks
Hypotheses:
• The user may be distracted by similar or redundant results in the context [Xiong et al. WSDM’12]
• The user’s intention to click may be affected by prior clicks [Srikant et al. KDD’10]
Refine P(C|E)
Take context factors into consideration: P(C|E) depends on both the relevance information to extract and the context-related features.
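One way to make this refinement concrete is a logistic model in which a relevance score and a weighted sum of context features are added before squashing. The logistic form, the names `relevance_score`, `weights`, and `features`, and the numbers are all assumptions for illustration; the slides do not state the functional form.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def p_click_given_exam(relevance_score: float,
                       weights: Sequence[float],
                       features: Sequence[float]) -> float:
    """Hypothetical context-aware P(C=1 | E=1).

    ASSUMPTION: logistic link over relevance + w . f; not confirmed by the source.
    """
    z = relevance_score + sum(w * f for w, f in zip(weights, features))
    return sigmoid(z)


# Made-up example: one context feature ("same host already examined")
# with a negative weight lowers the click probability for the same relevance.
base = p_click_given_exam(0.5, weights=[-1.0], features=[0.0])
redundant = p_click_given_exam(0.5, weights=[-1.0], features=[1.0])
print(base > redundant)
```

With all weights at zero the model reduces to a context-free click model in which P(C = 1 | E = 1) depends on relevance alone.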
Data
15-day click log:
• 350K queries
• 7M training sessions and 3.5M test sessions
• For each session, the log contains only the query, the URLs of the results, and the clicks
Crawled SERPs that cover part of the click log
Context Features
• Number of prior clicks
• Whether a result from the same host has been examined
• Whether a result from the same host has been clicked
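The three features can be sketched from the fields the log is said to contain (result URLs and clicks). This is a minimal illustration, not the author's extraction code: it assumes every result ranked above the current one counts as examined, and `context_features` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple
from urllib.parse import urlparse


def context_features(urls: Sequence[str],
                     clicks: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Per result: (number of prior clicks, same host examined, same host clicked).

    ASSUMPTION: all results ranked above the current one were examined.
    """
    seen_hosts = set()      # hosts of results ranked above
    clicked_hosts = set()   # hosts of clicked results ranked above
    prior_clicks = 0
    features = []
    for url, click in zip(urls, clicks):
        host = urlparse(url).netloc
        features.append((prior_clicks,
                         int(host in seen_hosts),
                         int(host in clicked_hosts)))
        # Update the context only after emitting features for this result.
        seen_hosts.add(host)
        if click:
            clicked_hosts.add(host)
            prior_clicks += 1
    return features


# Example URLs are placeholders.
feats = context_features(
    ["http://a.com/1", "http://b.com/1", "http://a.com/2"],
    [1, 0, 0],
)
print(feats)  # [(0, 0, 0), (1, 0, 0), (1, 1, 1)]
```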
Data Analysis on Click Log
According to the linear browsing assumption (cascade model):
• [Srikant et al. KDD’10]: URLs shown from previously displayed hosts are not favored by users
• Users favor the same host that they have clicked on
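An analysis of this kind can be sketched as a conditional CTR grouped by one context feature. The sketch assumes a particular reading of linear browsing: results at or above the last click are treated as examined, and sessions without clicks are skipped. Both choices and the name `ctr_by_feature` are assumptions, and the toy sessions are made up.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Sequence, Tuple


def ctr_by_feature(
    sessions: Iterable[Tuple[Sequence[Hashable], Sequence[int]]]
) -> Dict[Hashable, float]:
    """CTR over (assumed) examined results, grouped by a context feature value."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for feature_values, clicks in sessions:
        if 1 not in clicks:
            continue  # ASSUMPTION: sessions without clicks are skipped in this sketch
        last_click = max(i for i, c in enumerate(clicks) if c)
        # ASSUMPTION: everything at or above the last click was examined.
        for value, click in zip(feature_values[:last_click + 1],
                                clicks[:last_click + 1]):
            shown[value] += 1
            clicked[value] += click
    return {value: clicked[value] / shown[value] for value in shown}


# Toy data: feature = "same host already clicked" (0 or 1).
toy_sessions = [
    ([0, 0, 1], [1, 0, 1]),
    ([0, 1, 1], [0, 1, 0]),
]
rates = ctr_by_feature(toy_sessions)
print(rates)
```

Comparing the rates for feature value 0 and 1 is the comparison the slide summarizes in words.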
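Incorporating with UBM
UBM model [Dupret et al. SIGIR’08]:
• P(E = 1) = γ_{r,d}, where r is the rank of the result and d is its distance to the previous click
• P(C = 1 | E = 1) is the refined, context-aware P(C|E)
EM algorithm:
• E-step: compute the posterior of E given the observed clicks
• M-step: use Stochastic Gradient Descent (SGD) to optimize the expected log-likelihood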
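The EM and SGD steps can be sketched for a single result as follows. This is an illustration under stated assumptions, not the talk's implementation: examination probability `gamma` follows the UBM form, the click probability given examination is assumed logistic in a relevance score `r` plus weighted context features, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def e_step(click: int, gamma: float, a: float) -> float:
    """Posterior P(E=1 | C) for one result, with a = P(C=1 | E=1).

    A click implies examination; without a click, Bayes' rule gives
    gamma * (1 - a) / (1 - gamma * a).
    """
    if click:
        return 1.0
    return gamma * (1.0 - a) / (1.0 - gamma * a)


def sgd_step(click: int, posterior: float, r: float,
             w: Sequence[float], f: Sequence[float],
             lr: float = 0.1) -> Tuple[float, List[float]]:
    """One gradient-ascent step on the expected log-likelihood for one result.

    ASSUMPTION: a = sigmoid(r + w . f). The expected log-likelihood term is
    log(a) for a click and posterior * log(1 - a) otherwise.
    """
    a = sigmoid(r + sum(wi * fi for wi, fi in zip(w, f)))
    grad_z = (1.0 - a) if click else -posterior * a
    new_r = r + lr * grad_z
    new_w = [wi + lr * grad_z * fi for wi, fi in zip(w, f)]
    return new_r, new_w


# Made-up example: an unclicked result with one active context feature.
post = e_step(click=0, gamma=0.5, a=0.5)
r_new, w_new = sgd_step(click=0, posterior=post, r=0.0, w=[0.0], f=[1.0])
print(post, r_new, w_new)
```

In the standard UBM derivation the examination parameters have a closed-form M-step update (the mean posterior per rank and distance pair), so gradient steps are needed only for the parameters inside the click probability.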
Predicting the CTR The perplexity (lower is better) on sampled data:
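For reference, click perplexity is typically defined as 2 raised to the negative mean log2-likelihood of the observed clicks. A minimal sketch follows; the function name, the clipping constant `eps`, and the toy inputs are assumptions for illustration, and no results from the talk are reproduced.

```python
import math
from typing import Sequence


def click_perplexity(clicks: Sequence[int],
                     predicted_ctr: Sequence[float],
                     eps: float = 1e-10) -> float:
    """Perplexity of predicted click probabilities against binary clicks."""
    total = 0.0
    for click, p in zip(clicks, predicted_ctr):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += math.log2(p) if click else math.log2(1.0 - p)
    return 2.0 ** (-total / len(clicks))


# A model that always predicts 0.5 has perplexity 2; better models score lower.
print(click_perplexity([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))  # 2.0
```

The metric is commonly computed separately for each rank position and then averaged; a perfect predictor approaches 1.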
Progress and Plan
Progress:
• Implemented the EM+SGD framework
Plans:
• Tune and speed up the training process
• Incorporate into other click models, such as DBN, CCM…
• Build an evaluation set for relevance estimation performance
• Search for other context-related features
• Assign different weight vectors w to different positions and hosts