Query Classification and KDDCUP 2005
Qiang Yang, Dou Shen

Query Classification and Online Advertisement
QC as Machine Learning
Inspired by the KDDCUP’05 competition
• Classify a query into a ranked list of categories
• Queries are collected from real search engines
• Target categories are organized in a tree, with each node being a category
Solutions: Query Enrichment + Staged Classification
• Solution 1: Query/Category Enrichment
• Solution 2: Bridging Classifier
Query enrichment
• Textual information: title, snippet, full text
• Category information: category
Classifiers
• Map by Word Matching
  • Direct and Extended Matching
  • High precision, low recall
• SVM: Apply synonym-based classifiers to map Web pages from ODP to the target taxonomy
  • Obtain <pages, target category> pairs as the training data
  • Train SVM classifiers for the target categories
  • Higher recall
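The word-matching step above can be illustrated with a minimal sketch. It assumes that direct matching means a target category name shares a word with a source category name, and that extended matching first expands the target words with synonyms. The function names, the example category names, and the synonym table are all hypothetical; the slides do not specify the actual matching rules or synonym source.

```python
def tokenize(name):
    """Split a category path such as 'Living/Car & Garage' into lowercase words."""
    cleaned = name.replace("/", " ").replace("&", " ")
    return {word.lower() for word in cleaned.split()}


def match_categories(source_name, target_names, synonyms=None):
    """Return the target categories whose keywords occur in source_name.

    With synonyms=None this is direct matching. Passing a synonym table
    (hypothetical here) gives extended matching, which trades some
    precision for recall.
    """
    source_words = tokenize(source_name)
    matches = []
    for target in target_names:
        keywords = tokenize(target)
        if synonyms:
            for word in list(keywords):
                keywords |= synonyms.get(word, set())
        if keywords & source_words:
            matches.append(target)
    return matches


targets = ["Computers/Hardware", "Living/Car & Garage"]
toy_synonyms = {"car": {"autos", "automobile"}}  # illustrative only

direct = match_categories("Recreation/Autos/BMW", targets)
extended = match_categories("Recreation/Autos/BMW", targets, toy_synonyms)
```

In this toy case direct matching finds nothing, while extended matching maps the source category to "Living/Car & Garage", which mirrors the precision/recall trade-off noted on the slide.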
Bridging Classifier
• Problem with Solution 1: when the target taxonomy changes, training must be repeated!
• Solution: connect the target taxonomy and the queries by taking an intermediate taxonomy as a bridge
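Bridging Classifier (Cont.)
• How to connect? (symbols lost in extraction; the labels below are inferred from the bridging idea)
  • The relation between the query and the target category
  • The relation between the target category and the intermediate category
  • The relation between the query and the intermediate category
  • Prior prob. of the intermediate category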
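One plausible reading of the bridge, since the slide's formula did not survive extraction, is a sum over intermediate categories: score(target | q) ∝ Σ_j P(target | c_j) · P(q | c_j) · P(c_j). The sketch below implements that reading under the assumption that the target-to-intermediate relation does not depend on the query. All names and probability values are hypothetical.

```python
def bridge_scores(p_target_given_inter, p_query_given_inter, prior_inter):
    """Rank target categories for one query via an intermediate taxonomy.

    p_target_given_inter: {target: {intermediate: P(target | intermediate)}}
    p_query_given_inter:  {intermediate: P(query | intermediate)}
    prior_inter:          {intermediate: P(intermediate)}
    Scores are unnormalized, which is enough for ranking.
    """
    scores = {}
    for target, row in p_target_given_inter.items():
        scores[target] = sum(
            row.get(c, 0.0) * p_query_given_inter.get(c, 0.0) * prior
            for c, prior in prior_inter.items()
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy numbers, for illustration only.
prior = {"A": 0.5, "B": 0.5}
p_query = {"A": 0.2, "B": 0.1}
p_target = {
    "T1": {"A": 0.9, "B": 0.1},
    "T2": {"A": 0.1, "B": 0.9},
}
ranked = bridge_scores(p_target, p_query, prior)
```

Only `p_target_given_inter` depends on the target taxonomy, so under this reading a new target taxonomy requires recomputing that table but not the query-side model.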
Category Selection for Intermediate Taxonomy
• Category selection for reducing complexity
• Total Probability (TP)
• Mutual Information
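The slide names the Total Probability criterion without defining it. A minimal sketch follows, assuming TP scores each intermediate category by the sum of its (unnormalized) relatedness to every target category and keeps the top k. The scoring rule, the function name, and the numbers are assumptions for illustration, not the authors' stated definition.

```python
def select_by_total_probability(p_target_given_inter, k):
    """Keep the k intermediate categories with the largest summed relatedness.

    p_target_given_inter: {target: {intermediate: score}}, where each score
    is assumed to be an unnormalized estimate of P(target | intermediate).
    """
    totals = {}
    for row in p_target_given_inter.values():
        for inter, score in row.items():
            totals[inter] = totals.get(inter, 0.0) + score
    # Sort by descending total; break ties by name for determinism.
    ranked = sorted(totals, key=lambda c: (-totals[c], c))
    return ranked[:k]


toy_relatedness = {
    "T1": {"A": 0.9, "B": 0.1, "C": 0.0},
    "T2": {"A": 0.4, "B": 0.05, "C": 0.0},
}
kept = select_by_total_probability(toy_relatedness, k=2)
```

Dropping intermediate categories that relate to no target category shrinks the sum the bridging classifier has to evaluate per query.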
Experiment: Data Sets & Evaluation
• KDDCUP
  • Started in 1997, KDD Cup is the leading data mining and knowledge discovery competition in the world, organized by ACM SIGKDD
• KDDCUP 2005
  • Task: categorize 800K search queries into 67 categories
  • Three awards: (1) Performance Award; (2) Precision Award; (3) Creativity Award
  • Participation: 142 registered groups; 37 solutions submitted by 32 teams
• Evaluation data
  • 800 queries randomly selected from the 800K query set
  • 3 human labelers labeled the entire evaluation query set
• Evaluation measurements: Precision and Performance (F1)
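The evaluation measures above can be sketched as follows. The slide only names Precision and F1, so the details here are assumptions: counts are pooled over all queries and categories, each labeler is treated as a separate gold standard, and the final figures are averaged over labelers. Function names and the toy labels are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Pooled precision, recall, and F1 against one labeler.

    predicted, gold: {query: set of category labels}
    """
    correct = sum(len(predicted.get(q, set()) & labels) for q, labels in gold.items())
    n_predicted = sum(len(predicted.get(q, set())) for q in gold)
    n_gold = sum(len(labels) for labels in gold.values())
    precision = correct / n_predicted if n_predicted else 0.0
    recall = correct / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def average_over_labelers(predicted, labelers):
    """Average (precision, recall, F1) over several human labelers."""
    results = [precision_recall_f1(predicted, gold) for gold in labelers]
    return tuple(sum(r[i] for r in results) / len(results) for i in range(3))


predicted = {"q1": {"A", "B"}, "q2": {"C"}}
labeler_1 = {"q1": {"A"}, "q2": {"C", "D"}}
labeler_2 = {"q1": {"A", "B"}, "q2": {"C"}}
avg_precision, avg_recall, avg_f1 = average_over_labelers(predicted, [labeler_1, labeler_2])
```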
Experiment Results: Comparing Different Methods
• Comparison among our own methods
• Comparison with other teams in KDDCUP 2005 (results from different groups)
Result of Bridging Classifiers
• Using a bridging classifier allows the target classes to change freely without retraining the classifier!
• Performance of the bridging classifier with different granularities of the intermediate taxonomy
Target-transfer Learning
• A conventional classifier, once trained, stays constant
  • When the target classes change, the classifier needs to be retrained with new data
  • Too costly
  • Not online
• Bridging Classifier:
  • Allows the target to change
  • Application: advertisements come and go, but our query-to-target mapping need not be retrained!
• We call this the target-transfer learning problem
Data: Web Search Queries
• Consider the following search queries
  • “AAAI”
  • “Machine Learning”
  • “Constraint Reasoning”
AAAI 07, joint work with D. Shen, J. Sun, M. Qin, Z. Chen et al.
• Queries have different granularity
  • Car vs. BMW; BMW vs. AUDI
• Can we organize the queries into hierarchies?
• Benefits of building query hierarchies
  • Provide online query suggestion
  • Query classification
  • Query clustering
• Difficulties of building query hierarchies
  • Queries are short
  • The hierarchical structure cannot be pre-defined
Clickthrough Data
[Figure: clickthrough data from search engines]
Intuitive Ideas
• Our goal: mine the query hierarchies from clickthrough data
• If two queries are related to each other, they should share some of the same or similar clicked Web pages
• For two queries qi and qj, qi is more general if most of the clicked pages of qj are similar to some clicked pages of qi, but not the other way around
• If a query is specific, the contents of its clicked pages are relatively consistent,
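The generality test described above can be sketched with a simple asymmetric coverage check. This is a simplification: two clicked pages count as "similar" only if they are the same URL, whereas the slides refer to page similarity more broadly. The thresholds, function names, and the toy click data are assumptions for illustration.

```python
def coverage(pages_a, pages_b):
    """Fraction of pages_a that also appear in pages_b."""
    if not pages_a:
        return 0.0
    return len(pages_a & pages_b) / len(pages_a)


def is_more_general(clicks, qi, qj, high=0.6, low=0.4):
    """Return True if qi looks more general than qj.

    qi is treated as more general when most of qj's clicked pages are
    covered by qi's clicked pages, but not the other way around.
    The high/low thresholds are arbitrary illustrative values.
    """
    qj_covered_by_qi = coverage(clicks[qj], clicks[qi])
    qi_covered_by_qj = coverage(clicks[qi], clicks[qj])
    return qj_covered_by_qi >= high and qi_covered_by_qj < low


# Toy clickthrough data: query -> set of clicked page identifiers.
clicks = {
    "car": {"p1", "p2", "p3", "p4", "p5", "p6"},
    "bmw": {"p1", "p2", "p7"},
}
car_over_bmw = is_more_general(clicks, "car", "bmw")
bmw_over_car = is_more_general(clicks, "bmw", "car")
```

Applying the test to every related query pair yields directed "more general than" edges, from which a hierarchy could be assembled.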