On Understanding and Classifying Web Queries Prepared for: Telcordia Contact: Steve Beitzel Applied Research steve@research.telcordia.com April 14, 2008
Overview • Introduction: Understanding Queries • Query Log Analysis • Automatic Query Classification • Conclusions
Problem Statement • A query contains more information than just its terms • Search is not just about finding relevant documents – users have: • Target task (information, navigation, transaction) • Target topic (e.g., news, sports, entertainment) • General information need • User queries are simply an attempt to express all of the above in a couple of terms
Problem Statement (2) • Current search systems focus mainly on the terms in the queries • Systems do not focus on extracting target task & topic information about user queries • We propose two techniques for improving understanding of queries • Large-Scale Query Log Analysis • Automatic Query Classification • This information can be used to improve general search effectiveness and efficiency
Query Log Analysis • Introduction to Query Log Analysis • Our Approach • Key Findings • Conclusions
Introduction • Web query logs are a source of information on users’ behaviors on the web • Analysis of logs’ contents may allow search services to better tailor their products to serve users’ needs • Existing query log analysis focuses on high-level, general measurements such as query length and frequency
Our Approach • Examine several aspects of the query stream over time: • Total query volume • Topical trends by category: • Popularity (Topical Coverage of the Query Stream) • Stability (Pearson Correlation of Frequencies)
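The stability measure can be illustrated with a short sketch: correlate a category's query frequencies across two comparable time periods. This is a minimal illustration with made-up hourly counts, not the study's actual data or log format.

```python
# Sketch: estimating the stability of a topical category by correlating its
# hourly query frequencies in two different weeks (hypothetical numbers).
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hourly query counts for one category in two different weeks (made up).
week1 = [120, 95, 80, 70, 160, 240, 310, 280, 220, 190, 170, 150]
week2 = [110, 90, 85, 75, 150, 250, 300, 270, 230, 200, 165, 140]

print(f"stability (Pearson r) = {pearson(week1, week2):.3f}")
```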
Query Log Characteristics • Analyzed two AOL search service logs: • One full week of queries from December, 2003 • Six full months of queries; Sept. 2004-Feb. 2005 • Some light pre-processing was done: • Case differences, punctuation, & special operators removed; whitespace trimmed • Basic statistics: • Queries average 2.2 terms in length • Only one page of results is viewed 81% of the time • Two pages: 18% • Three or more: 1%
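A minimal sketch of the light preprocessing described above, assuming a simple regex-based cleanup; the exact set of special operators stripped in the study is not specified here.

```python
# Sketch of the query normalization step: lower-case, strip punctuation and
# special operators, and trim/collapse whitespace (approximation only).
import re

def normalize(query: str) -> str:
    q = query.lower()
    q = re.sub(r"[^\w\s]", " ", q)        # replace punctuation / special operators with spaces
    q = re.sub(r"\s+", " ", q).strip()    # collapse runs of whitespace and trim
    return q

print(normalize('  "Harley-Davidson"  +dealers '))  # -> harley davidson dealers
```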
Category Breakdown • Query lists for each category formed by a team of human editors • Query stream classified by exactly matching each query to category lists
Key Findings • Some topical categories vary substantially more in popularity than others over an average day • Some topics are more popular during particular times of the day; others have a more constant level of interest • Most individual categories are substantially less divergent over longer periods • Still some seasonal changes (Sports, Holidays)
Pearson Correlations for Selected Categories Over Six Months
Key Findings • The similarity of the actual query sets received within a topical category over time varies by category • As we move out to very large time scales, new trends become apparent: • Climatic (Seasonal) • Holidays • Sports-related • Several major events fall within the studied six-month period, causing high divergence in some categories • Long-term trends like these can potentially be very useful for query routing & disambiguation
Summary • Query Stream contains trends that are independent of volume fluctuation • Query Stream exhibits different trends depending on the timescale being examined • Future work may be able to leverage these trends for improvement in areas such as • Caching strategies • Query disambiguation • Query routing & classification
Automatic Query Classification • Introduction: Query Classification • Motivations & Prior Work • Our approach • Results & Analysis • Conclusions • Future Work
Introduction • Goal is to devise an approach that can associate a query with relevant topical categories • Automatic classifiers help a search service decide when to use specialized databases • Specialized databases may provide tailored, topic-specific results
Problem Statement • Current search systems focus mainly on the terms in the queries • No focus on extracting topic information • Manual query classification is expensive • Does not take advantage of the large supply of unlabeled data available in query logs
Prior Work • Much early text classification was document-based • Query Classification: • Manual (human assessors) • Automatic • Clustering Techniques – don't help identify topics • Supervised learning via retrieved documents • Still expensive – retrieved documents must be classified
Automatic Query Classification Motivations • Web queries have very few features • Achieving and sustaining classification recall is difficult • Web query logs provide a rich source of unlabeled data; we must harness these data to aid classification
Our Approach • Combine three methods of classification: • Labeled Data Approaches: • Manual (exact-match lookup using labeled queries) • Supervised Learning (Perceptron trained with labeled queries) • Unlabeled Data Approach: • Unsupervised Rule Learning with unlabeled data from a large query log • Disjunctive Combination of the above
Approach #1 - Exact-Match to Manual Classifications • A team of editors manually classified approximately 1M popular queries into 18 topical categories • General topics (sports, health, entertainment) • Mostly popular queries • Pros • Expect high precision from exact-match lookup • Cons • Expensive to maintain • Very low classification recall • Not robust to changes in the query stream
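A minimal sketch of the exact-match lookup, with a tiny illustrative table standing in for the editor-built lists of roughly 1M classified queries:

```python
# Sketch: a normalized query gets the categories of an identical entry in the
# manually built lists, or nothing at all (illustrative entries, not real data).
MANUAL = {
    "yankees schedule": {"Sports"},
    "flu symptoms": {"Health"},
    "britney spears": {"Entertainment"},
}

def exact_match_classify(query: str) -> set:
    return MANUAL.get(query.strip().lower(), set())

print(exact_match_classify("Flu Symptoms"))      # {'Health'}
print(exact_match_classify("flu shot near me"))  # set() -- no recall beyond the list
```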
Approach #2 - Supervised Learning with a Perceptron • Goal: achieve higher levels of recall than human efforts • Supervised Learning • Used heavily in text classification • Bayes, Perceptron, SVM, etc… • Use manually classified queries to train a classifier • Pros: • Leverages available manual classifications for training • Finds features that are good predictors of a class • Cons: • Entirely dependent on the quality and quantity of manual classifications • Does not leverage unlabeled data
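A toy sketch of a one-category perceptron over bag-of-words query features. The real system trains one binary classifier per topical category on the manually labeled queries; the feature set and training loop shown here are assumptions for illustration.

```python
# Sketch: mistake-driven perceptron for a single category ("is this Sports?").
def features(query):
    return set(query.lower().split())          # bag-of-words features

def train_perceptron(examples, epochs=10):
    w, bias = {}, 0.0
    for _ in range(epochs):
        for query, label in examples:          # label: +1 in category, -1 not
            score = bias + sum(w.get(f, 0.0) for f in features(query))
            if label * score <= 0:             # mistake: adjust weights toward label
                for f in features(query):
                    w[f] = w.get(f, 0.0) + label
                bias += label
    return w, bias

def predict(w, bias, query):
    return bias + sum(w.get(f, 0.0) for f in features(query)) > 0

train = [("yankees tickets", +1), ("red sox schedule", +1),
         ("flu symptoms", -1), ("cheap flights", -1)]
w, b = train_perceptron(train)
print(predict(w, b, "yankees schedule"))   # True: unseen query shares 'yankees'
```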
Approach #3 - Unsupervised Rule Learning Using Unlabeled Data • We have query logs with very large numbers of queries • Must take advantage of millions of users showing us how they look for things • Build on manual efforts • Manual efforts tell us some words from each category • Find words associated with each category • Learn how people look for topics, e.g. “what words do users use to find musicians or lawn-mowers”
Unsupervised Rule Learning Using Unlabeled Data (2) • Find good predictors of a class based on how users look for queries related to certain categories • Use those words to predict new members of each category • Apply the notion of selectional preferences to find weighted rules for classifying queries automatically
Selectional Preferences: Step 1 • Obtain a large log of unlabeled web queries • View each query as pairs of lexical units: • <head, tail> • Only applicable to queries of 2+ terms • Queries with n terms form n-1 pairs • Example: “directions to DIMACS” forms two pairs: • <directions, to DIMACS> and <directions to, DIMACS> • Count and record the frequency of each pair
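A short sketch of Step 1, assuming whitespace tokenization and a toy query log:

```python
# Sketch: break each query into <head, tail> pairs at every internal split
# point and count pair frequencies over the unlabeled log.
from collections import Counter

def head_tail_pairs(query):
    terms = query.lower().split()
    for i in range(1, len(terms)):                 # n terms -> n-1 pairs
        yield (" ".join(terms[:i]), " ".join(terms[i:]))

pair_counts = Counter()
for q in ["directions to DIMACS", "directions to newark airport"]:  # toy log
    pair_counts.update(head_tail_pairs(q))

print(list(head_tail_pairs("directions to DIMACS")))
# [('directions', 'to dimacs'), ('directions to', 'dimacs')]
```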
Selectional Preferences: Step 2 • Obtain a set of manually labeled queries • Check the heads and tails of each pair to see if they appear in the manually labeled set • Convert each <head, tail> pair into: • <head, CATEGORY> (forward preference) • <CATEGORY, tail> (backward preference) • Discard <head, tail> pairs for which there is no category information at all • Sum counts for all contributing pairs and normalize by the number of contributing pairs
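A sketch of Step 2 under simplifying assumptions: the labeled half of each pair is replaced by its category, and a pair's count is split evenly across that half's categories as a stand-in for the normalization described above. The labeled entries are illustrative.

```python
# Sketch: turn <head, tail> pair counts into forward (<head, CATEGORY>) and
# backward (<CATEGORY, tail>) preference counts using a labeled query set.
from collections import Counter

labeled = {"dimacs": {"PLACES"}, "newark airport": {"TRAVEL", "PLACES"}}

def preference_counts(pair_counts, labeled):
    fwd, bwd = Counter(), Counter()
    for (head, tail), n in pair_counts.items():
        tail_cats = labeled.get(tail, set())
        head_cats = labeled.get(head, set())
        if not tail_cats and not head_cats:
            continue                                   # no category info: discard pair
        for c in tail_cats:
            fwd[(head, c)] += n / len(tail_cats)       # forward preference
        for c in head_cats:
            bwd[(c, tail)] += n / len(head_cats)       # backward preference
    return fwd, bwd

fwd, bwd = preference_counts({("directions to", "dimacs"): 12}, labeled)
print(dict(fwd))   # {('directions to', 'PLACES'): 12.0}
```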
Selectional Preferences: Step 3 • Score each preference using Resnik's Selectional Preference Strength formula: • S(x) = Σu P(u|x) · log [ P(u|x) / P(u) ] • Where u represents a category, as found in Step 2 • S(x) is the sum of the weighted scores for every category associated with a given lexical unit
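A sketch of Step 3, computing S(x) from a lexical unit's per-category counts and assumed category priors (toy numbers):

```python
# Sketch: Resnik's selectional preference strength for one lexical unit x.
from math import log

def preference_strength(cat_counts, prior):
    """cat_counts: {category: count} for one lexical unit; prior: {category: P(u)}."""
    total = sum(cat_counts.values())
    s = 0.0
    for cat, n in cat_counts.items():
        p_u_given_x = n / total
        s += p_u_given_x * log(p_u_given_x / prior[cat])   # weighted log-ratio term
    return s

prior = {"TRAVEL": 0.10, "PLACES": 0.15, "AUTOS": 0.05}     # assumed category priors
print(preference_strength({"TRAVEL": 8, "PLACES": 2}, prior))
```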
Selectional Preferences: Step 4 • Use the mined preferences and weighted scores from Steps 2 and 3 to assign classifications to unseen queries
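A sketch of Step 4: match an unseen query's head/tail splits against the mined rules and sum the rule weights per category. The rule table and threshold are illustrative, with weights borrowed from the example slide that follows.

```python
# Sketch: apply forward ("<head> X") and backward ("X <tail>") rules to an
# unseen query and keep every category whose summed weight clears a threshold.
from collections import defaultdict

forward_rules = {"harley all stainless": {"AUTOS": 3.448, "SHOPPING": 0.021}}
backward_rules = {"getaway bargain": {"PLACES": 0.877, "SHOPPING": 0.047, "TRAVEL": 0.862}}

def classify(query, threshold=0.5):
    terms = query.lower().split()
    scores = defaultdict(float)
    for i in range(1, len(terms)):
        head, tail = " ".join(terms[:i]), " ".join(terms[i:])
        for cat, w in forward_rules.get(head, {}).items():
            scores[cat] += w          # forward rule votes for cat with weight w
        for cat, w in backward_rules.get(tail, {}).items():
            scores[cat] += w          # backward rule votes for cat with weight w
    return {cat for cat, s in scores.items() if s >= threshold}

print(classify("harley all stainless exhaust"))   # {'AUTOS'}
```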
Selectional Preference Rule Examples • Forward Rules: • harlem club X: ENT->0.722 PLACES->0.378 TRAVEL->1.531 • harley all stainless X: AUTOS->3.448 SHOPPING->0.021 • harley chicks with X: PORN->5.681 • Backward Rules: • X gets hot wont start: AUTOS->2.049 PLACES->0.594 • X getaway bargain: PLACES->0.877 SHOPPING->0.047 TRAVEL->0.862 • X getaway bargain hotel and airfare: PLACES->0.594 TRAVEL->2.057
Combined Approach • Each approach exploits different qualities of our query stream • A natural next step is to combine them • How similar are the approaches?
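A minimal sketch of the disjunctive (OR) combination, with trivial stand-ins for the three classifiers:

```python
# Sketch: a query receives a category if any of the component classifiers
# assigns it (set union = disjunctive combination).
def combined_classify(query, classifiers):
    cats = set()
    for clf in classifiers:          # exact-match, perceptron, SP-rule classifiers
        cats |= clf(query)
    return cats

classifiers = [
    lambda q: {"Sports"} if "yankees" in q else set(),     # stand-in: exact-match lookup
    lambda q: {"Sports"} if "schedule" in q else set(),    # stand-in: perceptron
    lambda q: {"Travel"} if "tickets" in q else set(),     # stand-in: SP rules
]
print(combined_classify("yankees tickets", classifiers))   # {'Sports', 'Travel'}
```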
Evaluation Metrics • Classification Precision: • #true positives / (#true positives + #false positives) • Classification Recall: • #true positives / (#true positives + #false negatives) • F-Measure: • F_beta = (1 + beta^2) · Precision · Recall / (beta^2 · Precision + Recall) • Higher values of beta put more emphasis on recall
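These metrics in code, using the standard definitions and toy counts:

```python
# Sketch: precision, recall, and the beta-weighted F-measure.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f_measure(p, r, beta=1.0):
    return (1 + beta**2) * p * r / (beta**2 * p + r)

p, r = precision(tp=80, fp=20), recall(tp=80, fn=120)        # p = 0.8, r = 0.4
print(f_measure(p, r, beta=1.0), f_measure(p, r, beta=2.0))  # beta > 1 favors recall
```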
Experimental Data Sets • Separate collections for training and testing: • Training: • Nearly 1M web queries manually classified by a team of editors • Grouped non-exclusively into 18 topical categories, and trained each category independently • Query log of several hundred million queries used for forming SP rules • Testing: • 20,000 web queries classified by human assessors • ~30% agreement with classifications in training set • 25% of the testing set was set aside for tuning the perceptron & SP classifiers
KDD Cup 2005 • 2005 KDD Cup task was Query Classification • 800,000 queries and 67 topical categories • 800 queries judged by three assessors • Top performers used information from retrieved documents • Retrieved result snippets for aiding classification decisions • Top terms from snippets and documents used for query expansion • Systems evaluated on precision and F1
KDD Cup Experiments • We mapped our manual classifications onto the KDD Cup category set • Obviously an imperfect mapping • Our categories are general, e.g., "Sports" • KDD Cup categories are specific, e.g., "Sports-Baseball" • Running a retrieval pass is prohibitively expensive • We relied only on our general manual classifications and queries in the log
Conclusions • Our system successfully makes use of large amounts of unlabeled data • The Selectional Preference rules allow us to classify a significantly larger portion of the query stream than manual efforts alone • Excellent potential for further improvements
Future Work • Expand available classification features per query • Mine web query logs for related terms and patterns • More intelligent combination methods • Learned combination functions • Voting algorithms • Utilize external sources of information • Patterns and trends from query log analysis • Topical ontology lookups • Use automatic query classification to improve effectiveness and efficiency in a production search system
Related Bibliography • Journals • S. Beitzel, et al., "Temporal Analysis of a Very Large Topically Categorized Query Log", Journal of the American Society for Information Science and Technology (JASIST), Vol. 58, No. 2, 2007. • S. Beitzel, et al., "Automatic Classification of Web Queries Using Very Large Unlabeled Query Logs", ACM Transactions on Information Systems (TOIS), Vol. 25, No. 2, April 2007. • Conferences • S. Beitzel, et al., "Hourly Analysis of a Very Large Topically Categorized Web Query Log", ACM-SIGIR, July 2004. • S. Beitzel, et al., "Automatic Query Classification", ACM-SIGIR, August 2005. • S. Beitzel, et al., "Improving Automatic Query Classification via Semi-supervised Learning", IEEE-ICDM, November 2005.
Questions? • Thanks!