
Personalized Query Classification


Presentation Transcript


  1. Personalized Query Classification Bin Cao, Qiang Yang, Derek Hao Hu, et al. Computer Science and Engineering Hong Kong UST

  2. Query Classification and Online Advertisement

  3. QC as Machine Learning Inspired by the KDDCUP’05 competition • Classify a query into a ranked list of categories • Queries are collected from real search engines • Target categories are organized in a tree with each node being a category

  4. Our QC Demo • http://q2c.cs.ust.hk/q2c/

  5. Personalization • The aim of Personalized Query Classification is to classify a user query Q into a ranked list of predefined categories, where the ranking differs across users

  6. PQC: Personalized Query Classification • Classify a user query Q into a ranked list of categories, personalized for each user

  7. Question: Can we personalize search without user registration info? • Profile based PQC • Context based PQC • Conclusion

  8. Difficulties • Web queries are: • Short and sparse: "adi", "cs", "ps" • Noisy: "contnt", "gogle" • New words emerge all the time: "windows7" • Training data are hard for humans to label • Experts may have different understandings of the same ambiguous query • E.g. "Apple", "Office", etc.

  9. Method 1: Profile Based • Profile(U) = { <Q, Search-Result, Clicked-URL> } collected from the user's past interactions • Profile based Personalized Query Classification • [Figure: a mock result page for the query "Michael Jordan", with √ marking the URLs the user clicked]
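
To make the profile representation concrete, here is a minimal sketch of the <Q, Search-Result, Clicked-URL> triple as a Python data structure; the class and field names are illustrative, not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProfileEntry:
    """One past interaction: the query, the results shown, and the clicked URLs."""
    query: str                 # Q, e.g. "Michael Jordan"
    search_results: List[str]  # URLs the search engine returned
    clicked_urls: List[str]    # the subset the user actually clicked

@dataclass
class UserProfile:
    """Profile(U) = { <Q, Search-Result, Clicked-URL> } accumulated over time."""
    user_id: str
    entries: List[ProfileEntry] = field(default_factory=list)

# Example: record one interaction for user u1.
profile = UserProfile("u1")
profile.entries.append(ProfileEntry("Michael Jordan",
                                    ["nba.com", "cs.berkeley.edu/~jordan"],
                                    ["nba.com"]))
```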

  10. Method 2: Context Based • Context = a session of user-submitted queries • Example session: "Graphical Model" → "Machine Learning" → "UCB" → "Michael Jordan"

  11. Outline • Introduction • Profile based PQC • Context based PQC • Conclusion

  12. How to construct a user profile? • To achieve personalized query classification, under an independence assumption: p(c|q,u) ∝ p(q|c) p(u|c) p(c) • ACM KDDCUP 2005 solution: estimating p(q|c) • Focus: estimating p(u|c) for personalization • Difficulty: sparseness • Too many possible categories • Limited information for each user
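
The factorization above splits the problem into three separately estimated terms. Below is a minimal sketch of the resulting ranking step, assuming p(q|c), p(u|c), and p(c) have already been estimated elsewhere; the nested dictionaries and the smoothing constant are illustrative.

```python
import math

def rank_categories(query, user, p_q_given_c, p_u_given_c, p_c, eps=1e-9):
    """Rank categories c by p(c|q,u), using p(c|q,u) ∝ p(q|c) p(u|c) p(c).

    p_q_given_c[c][q] and p_u_given_c[c][u] are assumed to be estimated
    elsewhere (p(q|c) as in the KDDCUP'05 solution; p(u|c) is the
    personalization term this talk focuses on).
    """
    def log_score(c):
        # Log space avoids underflow; eps smooths unseen (query, user) evidence.
        return (math.log(p_q_given_c[c].get(query, eps))
                + math.log(p_u_given_c[c].get(user, eps))
                + math.log(p_c[c]))
    return sorted(p_c, key=log_score, reverse=True)

# Toy example with two categories:
p_c = {"Sports": 0.5, "Shopping": 0.5}
p_q_given_c = {"Sports": {"michael jordan": 0.8}, "Shopping": {"michael jordan": 0.2}}
p_u_given_c = {"Sports": {"u1": 0.9}, "Shopping": {"u1": 0.1}}
print(rank_categories("michael jordan", "u1", p_q_given_c, p_u_given_c, p_c))
# -> ['Sports', 'Shopping']
```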

  13. Categorized Clickthrough Data: Too Few! • [Figure: search engines produce abundant clickthrough data, but only a tiny fraction of it carries category labels]

  14. Collaborative Classification • Leverage information from similar users via a user-class matrix • √ = interested, × = not interested • Entries can also be real values indicating the degree of interest
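
One simple way to realize collaborative classification is a memory-based scheme that fills a missing user-class entry from similar users' entries. The sketch below uses cosine similarity over the classes both users have rated; it illustrates the slide's intuition and is not necessarily the authors' exact method.

```python
import numpy as np

def predict_interest(M, u, c):
    """Estimate user u's interest in class c from similar users.

    M is a user x class matrix with np.nan for unknown entries; known
    entries are interest scores (e.g. 1 = interested, 0 = not interested).
    """
    candidates = ~np.isnan(M[:, c])
    candidates[u] = False                       # never use u's own missing entry
    weights, values = [], []
    for v in np.where(candidates)[0]:
        shared = ~np.isnan(M[u]) & ~np.isnan(M[v])   # classes known to both
        if not shared.any():
            continue
        a, b = M[u][shared], M[v][shared]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        if denom > 0:
            weights.append(float(a @ b) / denom)     # cosine similarity
            values.append(M[v, c])
    if not weights or sum(weights) == 0:
        return float("nan")
    w = np.array(weights)
    return float(w @ np.array(values) / w.sum())

# Toy 3-user x 3-class matrix:
M = np.array([[1.0, np.nan, 0.0],
              [1.0, 1.0,    0.0],
              [0.0, 0.0,    1.0]])
print(predict_interest(M, u=0, c=1))  # -> 1.0, driven by the similar user 1
```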

  15. Extending the Collaborative Filtering (CF) Model to Ranking (Liu and Yang, SIGIR 2008) • Previous methods for CF: • Memory-based approach: find users with similar interests to help predict missing values • Model-based approach: estimate probabilities based on the new user's known values • We propose a collaborative ranking model to improve the model-based approach • It uses preferences (rankings) instead of values • It is better at estimating the preferences of users
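
The key representational shift is from absolute rating values to pairwise preferences. A minimal sketch of that conversion follows; EigenRank itself then aggregates such preferences with a rank-aggregation model rather than predicting values directly.

```python
def ratings_to_preferences(ratings):
    """Turn a user's item ratings into pairwise preferences (i preferred over j).

    Only the relative order of the ratings matters to a ranking model,
    not their absolute values.
    """
    items = list(ratings)
    return {(i, j) for i in items for j in items if ratings[i] > ratings[j]}

# Example: three rated items induce three preference pairs.
prefs = ratings_to_preferences({"y1": 2, "y2": 5, "y3": 4})
print(sorted(prefs))  # [('y2', 'y1'), ('y2', 'y3'), ('y3', 'y1')]
```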

  16. Collaborative Ranking Framework • [Figure: the rating database and the active user's ratings feed a rating-prediction step; the predicted ratings are sorted into a ranked item list, e.g. 1. Item y2, 2. Item y3] • Nathan Liu and Qiang Yang. EigenRank: Collaborative Filtering via Rank Aggregation. In ACM SIGIR Conference (ACM SIGIR '08), Singapore, 2008.

  17. Collaborative Ranking for Intention Mining • Input: a |User| × |Preference| matrix of observed preferences, where Preference = {(URL1 < URL2)} per query • Output: a |user, or user group| × |Intention category| interest score matrix P(U|C) • Our objective is to uncover the interest probability P(U|C) consistent with the given observed preferences for each query

  18. Solution: Automatically Generate Labeled Data (to assist human labelers) • Clickthrough data connects queries and URLs, and contains each user's personal interpretation of a query • [Figure: User A issues a query and clicks URL a (category C1), while User B issues the same query and clicks URL b (category C2)] • We still need the category information for the URLs
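
A straightforward realization of the auto-labeling idea, assuming a URL-to-category lookup (e.g. derived from ODP) is available; the url_category table, the triples, and the majority vote below are all illustrative.

```python
from collections import Counter

def label_queries(clickthrough, url_category):
    """Auto-generate labels: map (user, query) to the majority category of clicked URLs.

    clickthrough: iterable of (user, query, clicked_url) triples.
    url_category: dict from URL to category; this lookup is exactly the
    missing piece the slide points out.
    """
    votes = {}
    for user, query, url in clickthrough:
        if url in url_category:
            votes.setdefault((user, query), Counter())[url_category[url]] += 1
    return {key: counts.most_common(1)[0][0] for key, counts in votes.items()}

# The same query gets different labels for different users:
log = [("A", "apple", "apple.com"), ("B", "apple", "en.wikipedia.org/wiki/Apple")]
cats = {"apple.com": "Computers", "en.wikipedia.org/wiki/Apple": "Food"}
print(label_queries(log, cats))
# {('A', 'apple'): 'Computers', ('B', 'apple'): 'Food'}
```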

  19. Experimental Results: F1 metric

  20. How to enlarge the training set? • A few human-labeled data • A HUGE number of clickthrough logs without labels • Online knowledge bases, such as ODP and Wikipedia

  21. Online Knowledge Bases such as Wikipedia • Meaningful ontology • Plentiful documents • Links between knowledge-base entries

  22. "Label" Retrieval from an Online KB • Taking Online Commercial Intention as an example • Labels on result pages: Shopping: commercial; Sports: non-commercial; Video Games: commercial; Research: non-commercial • Use the labeled result pages as "seeds" to retrieve the most relevant documents from the Wikipedia concept graph as training data

  23. Obtain "Pseudo-Relevance" Data • We learn a classifier using the retrieved "labeled" documents • We apply the classifier to "label" the HUGE clickthrough log • Together with the few human-labeled data, the HUGE "labeled" clickthrough log can then be used for evaluation
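
A compressed sketch of this bootstrapping pipeline using scikit-learn; the seed documents, their labels, and the log queries below are toy stand-ins for the Wikipedia-retrieved training data and the real clickthrough log.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Step 1: learn a classifier from KB documents retrieved with the labeled "seeds".
seed_docs = ["buy cheap running shoes online store",    # commercial seed
             "nba playoff results and team history"]    # non-commercial seed
seed_labels = ["commercial", "non-commercial"]
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(seed_docs, seed_labels)

# Step 2: apply it to pseudo-label the huge unlabeled clickthrough log.
clickthrough_log = ["discount jordan sneakers",
                    "michael jordan berkeley machine learning"]
print(list(zip(clickthrough_log, clf.predict(clickthrough_log))))
```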

  24. Preliminary results on F(URL)C • We evaluated the performance of the classifier trained with the relevant documents retrieved from Wikipedia • AOL query data set, 10,000 held out for test

  25. Outline • Introduction • Profile based PQC • Context based PQC: Hao Hu, Huanhuan Cao, et al. @ SIGIR 2009, ACML 2009. • Conclusion

  26. Context based PQC for Online Commercial Intention • Example session: "Allan Iverson" → "shoes" → "T-shirt" → "Michael Jordan" • Commercial! Offer ads! • The commercial intention of the same query can be identified given its context information

  27. Context based PQC for Online Commercial Intention [Cao et al., SIGIR '09] • Example session: "Graphical Model" → "Machine Learning" → "UCB" → "Michael Jordan" • Non-commercial! Redirect to scholar search! • The commercial intention of the same query can be identified given its context information

  28. Two questions: • How do we model query context? → Graphical models • How do we detect whether two queries are semantically similar? → Feature generation/enrichment

  29. Conditional Random Field • Motivation: model the query log as a conditional random field, so that the relationships between consecutive and even skip (non-consecutive) queries can be modeled • Question: how do we decide whether two skip queries are related and should be linked?
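
A minimal sketch of the graph-construction step: chain edges between consecutive queries are always added, while skip edges are added only when a similarity score clears a threshold (the same threshold slide 36 reports tuning). The query_similarity function is passed in here and sketched after the next slide; both it and the default threshold are assumptions for illustration.

```python
def build_session_edges(session, query_similarity, threshold=0.5):
    """Return the edges of a CRF over one session of queries.

    session: list of query strings in time order.
    Consecutive queries are always linked; non-consecutive ("skip")
    queries are linked only when semantically similar enough.
    """
    edges = [(i, i + 1) for i in range(len(session) - 1)]  # chain edges
    for i in range(len(session)):
        for j in range(i + 2, len(session)):               # candidate skip edges
            if query_similarity(session[i], session[j]) > threshold:
                edges.append((i, j))
    return edges
```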

  30. Semantic Relationship between Queries • Given query A and query B, how do we determine their degree of relevance at the semantic level? • Send both queries to a search engine • Obtain the search results • Determine the distance between the two sets of search results
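
One plausible implementation of this three-step recipe, with fetch_snippets as a hypothetical helper that returns a query's result snippets as strings; cosine similarity over term counts is an assumed choice of distance, not necessarily the authors' exact one.

```python
import math
from collections import Counter

def query_similarity(query_a, query_b, fetch_snippets):
    """Cosine similarity between the bags of words of two queries' search results."""
    def term_vector(query):
        # Steps 1 and 2: send the query to a search engine, collect result text.
        words = " ".join(fetch_snippets(query)).lower().split()
        return Counter(words)

    # Step 3: compare the two result sets via cosine over term counts.
    va, vb = term_vector(query_a), term_vector(query_b)
    dot = sum(va[w] * vb[w] for w in va)
    norm = (math.sqrt(sum(n * n for n in va.values()))
            * math.sqrt(sum(n * n for n in vb.values())))
    return dot / norm if norm else 0.0
```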

  31. Context based PQC for Online Commercial Intention • The commercial intention of the same query can be identified given its context information • Example session: "Allan Iverson" → "shoes" → "T-shirt" → "Michael Jordan" • Commercial! Offer ads!

  32. Context based PQC for Online Commercial Intention • The commercial intention of the same query can be identified given its context information • Example session: "Graphical Model" → "Machine Learning" → "UCB" → "Michael Jordan" • Non-commercial! Redirect to scholar search!

  33. Evaluation • Using context information vs. not using context information

  34. Preliminary Experimental Results of PQC for Online Commercial Intention • Dataset: AOL query log • Around 20M web queries • Around 650K web users • Data is sorted by anonymized UserID and arranged sequentially • Each clickthrough log entry contains {AnonID, Query, QueryTime, ItemRank, ClickURL}
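
The released AOL logs are plain tab-separated text in exactly the field order listed above. A minimal parsing sketch follows; treat the header handling and the filename in the usage comment as assumptions if your copy of the data differs.

```python
import csv
from collections import namedtuple

LogEntry = namedtuple("LogEntry", "anon_id query query_time item_rank click_url")

def read_aol_log(path):
    """Yield clickthrough entries from an AOL-style tab-separated log file.

    Field order: AnonID, Query, QueryTime, ItemRank, ClickURL.
    Query-only rows (no click) leave ItemRank and ClickURL empty.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader)                          # skip the header row
        for row in reader:
            row += [""] * (5 - len(row))      # pad rows without click fields
            yield LogEntry(*row[:5])

# Usage (hypothetical filename from the released collection):
# for entry in read_aol_log("user-ct-test-collection-01.txt"):
#     ...  # group by anon_id and query_time into sessions downstream
```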

  35. Preliminary Results • In our preliminary experimental studies, we annotated four users' clickthrough logs with OCI (commercial / non-commercial) status; larger-scale experimental studies will follow • Evaluation metric: standard F1 measure • Baseline classifier: the classifier from Dai's WWW 2006 work (http://adlab.msn.com/OCI/OCI.aspx)

  36. Preliminary Results • The parameter we tune is the threshold used to decide whether to add "skip edges" to the CRF model

  37. Ongoing work: Personalized Query Classification • Efficiency • More ground truth data for evaluation

  38. PQC and Personalized Search • Similar input: query log, clickthrough data, IP address, etc. • Different output: • Personalized search: ranked results • PQC: discrete intention categories • Application: advertisements, etc.

  39. Conclusions: PQC • Have user profile information? • Profile = <User, Query, URLs> • Output = class • Method = collaborative ranking • Have query stream information? • Context = <User, Query-Stream, URLs> • Output = class • Method = CRF-based classification

  40. Q & A
