Query Chains: Learning to Rank from Implicit Feedback
Paper Authors: Filip Radlinski, Thorsten Joachims
Presented By: Steven Carr
The Problem
• The results returned from web searches can be cluttered with results that the user considers irrelevant
• Search engines don't learn from your document selections or from revisions to your query
Page Ranking
Non-learning methods
• Link-based (Google PageRank)
Learning methods
• Explicit user feedback
  • Ask the user how relevant they found each result
  • Very accurate data, but very time-consuming
• Implicit user feedback
  • Determine relevance by looking at search engine logs
  • Unlimited data at a low cost, but requires interpretation
The Solution
• Automatically detect query chains
• Use query chains to infer the relevance of results within each query and between results from all queries in the chain
• Use a ranking Support Vector Machine (SVM) to learn a retrieval function from the resulting preferences (a sketch follows this list)
• The Osmot search engine was built on this model
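The ranking SVM reduces learning to rank to binary classification: if document d_i is preferred over d_j for query q, the difference of their feature vectors φ(q, d_i) − φ(q, d_j) becomes a positive training example. Below is a minimal sketch using scikit-learn; the feature representation and the use of LinearSVC are assumptions for illustration, not the paper's exact setup (the authors use a dedicated ranking SVM implementation).

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_ranking_svm(preference_pairs):
    """Learn a linear retrieval function from preference pairs.

    Each pair is (phi_pref, phi_other): numpy feature vectors of the
    preferred and less-preferred document for the same query. The
    constraint w . phi_pref > w . phi_other is equivalent to classifying
    the difference vector as positive, so we train on both signed
    differences.
    """
    X, y = [], []
    for phi_pref, phi_other in preference_pairs:
        X.append(phi_pref - phi_other)
        y.append(1)
        X.append(phi_other - phi_pref)
        y.append(-1)
    clf = LinearSVC(C=1.0)
    clf.fit(np.array(X), np.array(y))
    return clf.coef_.ravel()  # weight vector w; rank documents by w . phi(q, d)
```

At query time, documents are scored by w · φ(q, d) and sorted in descending order.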
Query Chains
• People often reword their queries to get more useful results:
  • Correcting a spelling mistake
  • Increasing or decreasing specificity
  • Posing a new but related query
• A query chain is defined as a sequence of reformulated queries
Support Vector Machines
• A learning method used for classification
• Separates two classes of data points by finding a hyperplane that maximizes the margin: the distance between the hyperplane and the nearest points of each class (a toy example follows)
• Uses the hyperplane to assign new data points to one of the two classes
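As an illustration only (not from the paper), a linear SVM on toy 2-D data: scikit-learn's SVC finds the maximum-margin hyperplane w · x + b = 0 and classifies new points by the sign of w · x + b.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable classes in 2-D; toy data for illustration only.
X = np.array([[0.0, 0.0], [0.5, 0.4], [1.0, 0.2],   # class -1
              [2.0, 2.2], [2.5, 2.0], [3.0, 2.8]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# w . x + b = 0 defines the separating hyperplane; new points are
# assigned to a class by the sign of w . x + b.
w, b = clf.coef_[0], clf.intercept_[0]
print(w, b, clf.predict([[0.2, 0.1], [2.8, 2.5]]))
```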
Identifying Query Chains
• Manually labeled query chains from the Cornell University library search engine over a period of five weeks
• Used the data to train SVMs with various parameters, reaching an accuracy of 94.3% and a precision of 96.5%
• A non-learning strategy, assuming all queries from the same IP within a 30-minute period belong to the same chain, gave an accuracy and precision of 91.6%
• The non-learning strategy was sufficiently accurate and less expensive, so they used it instead (sketched below)
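A sketch of the non-learning heuristic: group queries from the same IP within a 30-minute period into one chain. Interpreting the window as the gap between consecutive queries is an assumption here; the paper may window differently.

```python
from collections import defaultdict
from datetime import timedelta

def segment_chains(log, gap=timedelta(minutes=30)):
    """Group (ip, timestamp, query) log entries into query chains.

    Heuristic: queries from the same IP within a 30-minute period belong
    to the same chain. Here the period is read as the maximum gap between
    consecutive queries (an assumption).
    """
    by_ip = defaultdict(list)
    for ip, ts, query in sorted(log, key=lambda e: (e[0], e[1])):
        by_ip[ip].append((ts, query))

    chains = []
    for ip, entries in by_ip.items():
        chain = [entries[0]]
        for prev, cur in zip(entries, entries[1:]):
            if cur[0] - prev[0] <= gap:
                chain.append(cur)        # continue the current chain
            else:
                chains.append((ip, chain))
                chain = [cur]            # gap too large: start a new chain
        chains.append((ip, chain))
    return chains
```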
Inferring Relevance
Developed six strategies for generating feedback from query chains (the first is sketched after this list):
• Click >q Skip Above: a clicked-on document is more relevant than any skipped document ranked above it
• Click First >q No-Click Second: given the first two results, if only the first was clicked, it is the more relevant of the two
• Strategies 3 and 4 are the same as the first two, but with respect to the previous query in the chain
• Click >q' Skip Earlier Query: a clicked-on document is more relevant than any document skipped in an earlier query
• Click >q' Top Two Earlier Query: if nothing was clicked for an earlier query, a document clicked later in the chain is more relevant than that query's top two results
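A sketch of the first strategy, Click >q Skip Above, turning a single query's clicks into pairwise preference judgments (function and variable names are illustrative, not the paper's):

```python
def click_skip_above(results, clicked):
    """Generate (preferred, less_preferred) document pairs from one
    query's ranked results using "Click > Skip Above": a clicked document
    is preferred over every unclicked document ranked above it.

    `results` is a ranked list of document ids, `clicked` a set of ids.
    """
    prefs = []
    for rank, doc in enumerate(results):
        if doc in clicked:
            for above in results[:rank]:
                if above not in clicked:
                    prefs.append((doc, above))
    return prefs

# Example: d1..d4 returned in that order, user clicked only d3.
# Yields (d3, d1) and (d3, d2): d3 is preferred over the skipped d1, d2.
print(click_skip_above(["d1", "d2", "d3", "d4"], {"d3"}))
```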
Experiment
• The Osmot search engine was created as a wrapper, implementing logging, analysis, and ranking
• Users were presented with a combination of results from two different ranking functions (one way to blend them is sketched below)
• Which ranking was better was evaluated based on which documents were clicked
• The evaluation was conducted over two months and collected around 2,400 queries
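One standard way to blend two rankings for a click-based comparison is interleaving; the sketch below is an assumption, not necessarily the paper's exact scheme. Which ranker "wins" a query is then decided by which one contributed more of the clicked documents.

```python
def interleave(ranking_a, ranking_b):
    """Merge two rankings by alternating between them, skipping duplicates.

    A balanced-interleaving-style sketch (an assumption). Clicks on the
    combined list can then be attributed to whichever ranker contributed
    each document.
    """
    combined, seen = [], set()
    queues = [list(ranking_a), list(ranking_b)]
    turn = 0
    while queues[0] or queues[1]:
        q = queues[turn]
        while q and q[0] in seen:  # drop documents already shown
            q.pop(0)
        if q:
            doc = q.pop(0)
            combined.append(doc)
            seen.add(doc)
        turn = 1 - turn            # alternate between the two rankers
    return combined

# Example: two rankers disagree on ordering; the user sees a blend.
print(interleave(["d1", "d2", "d3"], ["d2", "d4", "d1"]))
# -> ['d1', 'd2', 'd3', 'd4']
```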
Experiment Results
• Users preferred results from the query-chain ranking function 53% of the time
• The model trained with query chains outperformed the model trained without query chains with 99% confidence
Conclusion
• Developed an algorithm to determine the relevance of a document from log entries
• Developed a second algorithm that uses the preference judgments to learn an improved ranking function
• The algorithm can learn to include documents that were not in the original search results
Critique
• The learning method trains offline from log files rather than continually updating itself
• The paper refers to other papers rather than explaining concepts needed to understand it
• It does not compare the effectiveness of its learning algorithm against other learning algorithms