Clustering Web Queries John S. Whissell, Charles L.A. Clarke, Azin Ashkan CIKM’09 Speaker: Hsin-Lan, Wang Date: 2010/08/31
Outline • Introduction • Experimental Setup • Similarity to Manual Labelings • Classification Quality Metric • Split Discoveries • Clickthrough Analysis Based on Detected Query Categories • General Web Query Clustering • Concluding Discussion
Introduction • Clustering methods suffer from notable problems, including how to evaluate their results. • Evaluation against ground truth labelings • Evaluation by objective functions • Goal: evaluate the quality of clustering results with a method that • does not require comparison to ground truth • does not use a specific clustering algorithm’s objective function
Introduction • Clustering Web Queries: • navigational/informational queries • commercial/non-commercial queries
Experimental Setup • Data Set • Weighting Methods • Clustering Algorithms
Data Set • Microsoft adCenter • Includes a record of queries entered, ads displayed and ads clicked. • Personally identifying information was removed. • Commercially-oriented: 1700 queries were selected for which the ad click frequency of the query was above 10.
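The selection rule on this slide can be sketched as a simple filter. This is a minimal illustration, assuming "ad click frequency" means a raw click count per query; the function name `select_frequently_clicked` and the toy counts are hypothetical and are not taken from the adCenter data.

```python
def select_frequently_clicked(click_counts, min_clicks=10):
    """Keep queries whose ad click count is strictly above min_clicks.

    click_counts: dict mapping a query string to its ad click count
    (assumed representation; the slides do not describe the log format).
    """
    return [query for query, clicks in click_counts.items() if clicks > min_clicks]


# Hypothetical toy counts, for illustration only.
toy_counts = {"q1": 25, "q2": 10, "q3": 3, "q4": 11}
selected = select_frequently_clicked(toy_counts)
```

With the toy counts, a query with exactly 10 clicks is excluded, because the slide says "above 10".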
Data Set • For each query, two types of features are available: • search engine result page (SERP) • query-specific features
Clustering Algorithms • K-means clustering using Lloyd’s method (kmeans) • Normalized-Cut Spectral clustering (spect) • UPGMA clustering (upgma) • Single Link clustering (slink) • Complete Link clustering (clink) • Document clustering algorithms from Zhao and Karypis: e1, i1, i2, g1, g1p, and h1 objective functions
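As an illustration of the first algorithm in the list, the following is a minimal, dependency-free sketch of k-means using Lloyd's method. It is not the setup used in the paper: the initialization (the first k points), the squared Euclidean distance, and the names `lloyd_kmeans`, `squared_distance`, and `mean_vector` are assumptions made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def lloyd_kmeans(points, k, max_iter=100):
    """Lloyd's method: alternate assignment and centroid update.

    Assumption: the first k points serve as initial centroids, which keeps
    the sketch deterministic; practical implementations usually use a
    randomized initialization instead.
    """
    centroids = list(points[:k])
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_vector(members)
    return assignment, centroids


# Hypothetical toy vectors standing in for weighted query features.
toy_points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
              (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centers = lloyd_kmeans(toy_points, k=2)
```

On the toy vectors, the two well-separated groups end up in different clusters after a few iterations.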
Classification Quality Metric • Train a classifier to recognize the clusters in a clustering. • Classification accuracy (accc): estimated using cross-fold validation
Classification Quality Metric • Illustrates a correlation between Na (computed using a linear SVM) and internal similarity.
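The idea of the metric, treating cluster labels as classes and measuring how well a classifier recovers them under cross-fold validation, can be sketched as follows. The slides mention a linear SVM; to stay dependency-free, this sketch swaps in a nearest-centroid classifier, so it only illustrates the procedure and does not reproduce the paper's numbers. The fold assignment (index modulo the number of folds) and all function names are assumptions.

```python
def fit_centroids(vectors, labels):
    """Nearest-centroid 'training': one mean vector per cluster label."""
    groups = {}
    for v, y in zip(vectors, labels):
        groups.setdefault(y, []).append(v)
    return {
        y: tuple(sum(coords) / len(vs) for coords in zip(*vs))
        for y, vs in groups.items()
    }


def predict(model, v):
    """Return the label whose centroid is closest to v."""
    return min(
        model,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(v, model[y])),
    )


def cluster_classification_accuracy(vectors, cluster_labels, folds=3):
    """Cross-validated accuracy of predicting each object's cluster label.

    Assumption: object i is placed in fold i % folds.
    """
    n = len(vectors)
    correct = 0
    for f in range(folds):
        train = [i for i in range(n) if i % folds != f]
        test = [i for i in range(n) if i % folds == f]
        model = fit_centroids([vectors[i] for i in train],
                              [cluster_labels[i] for i in train])
        correct += sum(predict(model, vectors[i]) == cluster_labels[i]
                       for i in test)
    return correct / n


# Hypothetical toy vectors with one coherent and one arbitrary clustering.
toy_vectors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
               (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
coherent = cluster_classification_accuracy(toy_vectors, [0, 0, 0, 1, 1, 1])
arbitrary = cluster_classification_accuracy(toy_vectors, [0, 1, 0, 1, 0, 1])
```

A clustering that follows the structure of the data is easier for the classifier to learn, so the coherent labeling scores higher than the arbitrary one on the toy vectors.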
Clickthrough Analysis Based on Detected Query Categories • Clustering+SVM • Clickthrough rate of a query set: the percentage of queries in that set that had an ad click
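The clickthrough rate defined on this slide can be written as a short helper. This is a sketch under assumed data structures: a list of queries in a set and a collection of queries that had at least one ad click. The name `clickthrough_rate` and the example queries are hypothetical.

```python
def clickthrough_rate(queries, clicked_queries):
    """Percentage of queries in the set that had an ad click.

    queries: list of query strings in one detected category (assumed input).
    clicked_queries: set of query strings with at least one ad click.
    Returns 0.0 for an empty set to avoid division by zero.
    """
    if not queries:
        return 0.0
    clicked = sum(1 for q in queries if q in clicked_queries)
    return 100.0 * clicked / len(queries)


# Hypothetical category with four queries, two of which had an ad click.
rate = clickthrough_rate(["a", "b", "c", "d"], {"a", "c"})
```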
Concluding Discussion • Cluster objects using multiple representations and algorithms. • Classification accuracy is used to measure the quality of a clustering. • Future work: extend metric to select the number of clusters