Query Suggestion Using Hitting Time
Qiaozhu Mei†, Dengyong Zhou‡, Kenneth Church‡
† University of Illinois at Urbana-Champaign
‡ Microsoft Research, Redmond

Motivating Examples
MSG: sports center? food additive?
1. Difficult for a user to express an information need
2. Difficult for a search engine to infer the information need
Query suggestions: accurately express the information need; make the information need easy to infer

Motivating Examples (Cont.)
Welcome to the hotel california

Motivating Examples: Personalization
MSR
• Metropolis Street Racer
• Magnetic Stripe Reader
• Molten salt reactor
• Mars Sample Return
• …
• Mountain safety research
Actually looking for Microsoft Research…

Research Questions
• How can we generate query suggestions in a principled way?
• Can we generate personalized query suggestions using the same method?
• Can this method be generalized to other search-related tasks?

Rest of This Talk
• Random Walk, Hitting Time, and Bipartite Graph
• Generating Query Suggestion
• Personalized Query Suggestion
• Experiments
• Discussion and Summary

Random Walk and Hitting Time
• Hitting time $T^A$: the first time that the random walk is at a vertex in $A$
• Mean hitting time $h_i^A$: the expectation of $T^A$ given that the walk starts from vertex $i$
[Figure: a random walk starting at vertex i, moving to k with P = 0.3 and to j with P = 0.7; A is the target set.]

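The two definitions on this slide can be written formally as follows. The symbol $X_t$ for the position of the walk at step $t$ is notation assumed here; it does not appear on the slide.

```latex
T^A = \min\{\, t \ge 0 : X_t \in A \,\}, \qquad
h_i^A = \mathbb{E}\left[\, T^A \mid X_0 = i \,\right]
```
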
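Computing Hitting Time
• $T^A$: the first time that the random walk is at a vertex in $A$
• $h_i^A$: the expectation of $T^A$ given that the walk starts from vertex $i$
• For a walk that moves from $i$ to $j$ with probability 0.7 and to $k$ with probability 0.3: $h_i^A = 0.7\,h_j^A + 0.3\,h_k^A + 1$
• By definition, $h_i^A = 0$ for every vertex $i$ in $A$
• The remaining values are obtained by iterative computation
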
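The iterative computation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `hitting_times` and the toy graph are assumptions. Only the two transitions out of `i` (0.7 to `j`, 0.3 to `k`) come from the slide; the transitions out of `j` and `k` are made up to complete the example.

```python
def hitting_times(transitions, targets, iterations=100):
    """Iteratively estimate mean hitting times to the target set A.

    transitions: dict mapping vertex -> {neighbor: transition probability}
    targets: set of vertices forming A
    """
    h = {v: 0.0 for v in transitions}
    for _ in range(iterations):
        new_h = {}
        for i, neighbors in transitions.items():
            if i in targets:
                new_h[i] = 0.0  # h is zero on the target set
            else:
                # One step is always taken, then continue from the neighbor.
                new_h[i] = 1.0 + sum(p * h[j] for j, p in neighbors.items())
        h = new_h
    return h


# Toy graph: only the edges out of "i" come from the slide; the rest are hypothetical.
graph = {
    "i": {"j": 0.7, "k": 0.3},
    "j": {"A": 1.0},
    "k": {"A": 0.5, "i": 0.5},
    "A": {"A": 1.0},
}
h = hitting_times(graph, {"A"})
print(h["i"], h["j"], h["k"])
```

Starting from all zeros, the value after t rounds equals the expected hitting time truncated at t steps, which is why stopping after a few iterations still gives a usable ranking signal.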
Bipartite Graph and Hitting Time
• Bipartite graph:
  • Edges between V1 and V2
  • No edges inside V1 or V2
  • Edges are weighted
  • e.g., V1 = queries; V2 = URLs
• Expected proximity of query $i$ to query $A$: the hitting time of $i \to A$, $h_i^A$
• Convert to a directed graph, or even collapse one group
[Figure: weighted bipartite graph between V1 and V2, with vertices i, j, k and target A, and example edge weight w(i, j) = 3.]

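One natural way to "collapse one group" is to treat two steps of the walk (query to URL, then URL back to query) as a single query-to-query transition. The sketch below assumes that reading; the function name `collapse_to_queries`, the vertex names `q1`, `q2`, `u1`, `u2`, and the click counts are all hypothetical.

```python
from collections import defaultdict


def collapse_to_queries(weights):
    """Collapse the URL side of a weighted bipartite graph.

    weights: dict mapping (query, url) -> edge weight w(query, url)
    Returns: dict mapping query -> {query: probability} for the two-step
    walk query -> url -> query, with each step proportional to edge weight.
    """
    query_degree = defaultdict(float)
    url_degree = defaultdict(float)
    queries_of_url = defaultdict(list)
    for (q, u), w in weights.items():
        query_degree[q] += w
        url_degree[u] += w
        queries_of_url[u].append((q, w))

    transitions = defaultdict(lambda: defaultdict(float))
    for (q, u), w in weights.items():
        step_to_url = w / query_degree[q]
        for q2, w2 in queries_of_url[u]:
            transitions[q][q2] += step_to_url * (w2 / url_degree[u])
    return {q: dict(row) for q, row in transitions.items()}


# Hypothetical toy weights.
weights = {("q1", "u1"): 3, ("q2", "u1"): 1, ("q1", "u2"): 1}
P = collapse_to_queries(weights)
print(P)
```

Each row of the result sums to 1, so it can be fed directly into a hitting-time iteration over queries only.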
Generate Query Suggestion
• Construct a (kNN) subgraph from the query log data (with a predefined number of queries/URLs)
• Compute transition probabilities $p(i \to j)$
• Compute hitting times $h_i^A$
• Rank candidate queries using $h_i^A$
[Figure: query-URL click graph. Queries: aa (target, T), mexiana, american airline. URLs: www.aa.com, www.theaa.com/travelwatch/planner_main.jsp, en.wikipedia.org/wiki/Mexicana. Example edge weights: 300, 15.]

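The steps above can be strung together in a short end-to-end sketch. It is an illustration under stated assumptions rather than the authors' code: the subgraph is taken as given (no kNN expansion), transitions are the two-step query-URL-query walk, candidates are ranked by ascending hitting time (smaller meaning closer), and the click counts are invented. The query and URL strings are borrowed from the slide's figure.

```python
from collections import defaultdict


def suggest(clicks, target, iterations=50, top_k=3):
    """Rank queries by hitting time to `target` on a query-click graph.

    clicks: dict mapping (query, url) -> click count (the subgraph, step 1)
    """
    # Step 2: query -> url -> query transition probabilities.
    query_degree, url_degree = defaultdict(float), defaultdict(float)
    queries_of_url = defaultdict(list)
    for (q, u), w in clicks.items():
        query_degree[q] += w
        url_degree[u] += w
        queries_of_url[u].append((q, w))
    trans = defaultdict(lambda: defaultdict(float))
    for (q, u), w in clicks.items():
        for q2, w2 in queries_of_url[u]:
            trans[q][q2] += (w / query_degree[q]) * (w2 / url_degree[u])

    # Step 3: iterate the hitting-time recurrence, with h = 0 at the target.
    h = {q: 0.0 for q in trans}
    for _ in range(iterations):
        h = {
            q: 0.0 if q == target
            else 1.0 + sum(p * h[q2] for q2, p in trans[q].items())
            for q in trans
        }

    # Step 4: smaller hitting time means closer to the target query.
    ranked = sorted((q for q in h if q != target), key=lambda q: h[q])
    return ranked[:top_k]


# Hypothetical click counts; only the strings come from the slide.
clicks = {
    ("aa", "www.aa.com"): 30,
    ("american airline", "www.aa.com"): 10,
    ("american airline", "en.wikipedia.org/wiki/Mexicana"): 1,
    ("mexiana", "en.wikipedia.org/wiki/Mexicana"): 5,
}
suggestions = suggest(clicks, "aa")
print(suggestions)
```

With these toy counts, "american airline" shares a heavily clicked URL with the target and therefore ranks ahead of "mexiana", which reaches the target only through an intermediate query.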
Intuition
• Why does it work?
• A url is close to a query if freq(q, url) accounts for most of the clicks on this url (most people use q to reach the url)
• A query is close to the target query if it is close to many urls that are close to the target query

Personalized Query Suggestion
• Queries are ambiguous
• Different users → different information needs → different query suggestions
• Simple approach: build the graph and compute hitting times based solely on the user's history
  • Data sparseness
  • E.g., a query can never be suggested to a user who has never issued it
• Alternative: modify the bipartite graph instead of rebuilding it from scratch

Personalize the Bipartite Graph
• Key: how to compute the personalized probabilities used to reweight edges
  • From w(url, user, query): sparse data!
  • Compute a smoothed p(Url | User, Query)
• Introduce a pseudo (personalized) query P: "aa" + user
• Reweight the edges of the pseudo query using the personalized probabilities
[Figure: query-URL graph. Queries: aa (target, T), pseudo query P, alcoholics anonymous, american airline. URLs: www.aa.com, www.theaa.com/travelwatch/planner_main.jsp, en.wikipedia.org/wiki/Alcoholics_Anonymous, www.alcoholics-anonymous.org.]

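The slide does not say how p(Url | User, Query) is smoothed. As one possible illustration, the sketch below uses plain linear interpolation between a user's own click distribution and the global one. The interpolation weight `lam`, the function name `smoothed_prob`, and the click counts are all assumptions, not the authors' method.

```python
def smoothed_prob(user_counts, global_counts, lam=0.5):
    """Interpolate user-specific and global click distributions for one query.

    user_counts:   dict url -> clicks by this user for the query (may be empty)
    global_counts: dict url -> clicks by all users for the query
    lam:           weight on the user's own distribution (assumed value)
    Assumes every url in user_counts also appears in global_counts.
    """
    user_total = sum(user_counts.values())
    global_total = sum(global_counts.values())
    # With no user history, fall back entirely to the global distribution.
    weight = lam if user_total else 0.0
    probs = {}
    for url, g in global_counts.items():
        p_global = g / global_total
        p_user = user_counts.get(url, 0) / user_total if user_total else 0.0
        probs[url] = weight * p_user + (1.0 - weight) * p_global
    return probs


# Hypothetical counts for the query "aa".
global_counts = {"www.aa.com": 80, "www.alcoholics-anonymous.org": 20}
user_counts = {"www.alcoholics-anonymous.org": 2}
probs = smoothed_prob(user_counts, global_counts)
print(probs)
```

The resulting probabilities could then serve as the edge weights of the pseudo query, so that the walk from "aa" + user leans toward the URLs this user tends to click.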
Personalization with Backoff (Mei and Church 08)
• Full personalization (sparse data!): 156.111.188.243
• Personalization with backoff: 156.111.188.*, 156.111.*.*, 156.*.*.*
• No personalization (lose the opportunity): *.*.*.*
• We don't have enough data for everyone!
  • Back off to classes of users (e.g., by IP)

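The backoff chain on this slide can be generated mechanically from an IP address. The sketch below does that and then picks the most specific level with enough observations. The hard `min_count` cutoff, the function names, and the counts are assumptions for illustration; the slide does not specify how the levels are combined.

```python
def backoff_levels(ip):
    """Return the backoff chain for an IPv4 address, most specific first."""
    parts = ip.split(".")
    levels = []
    for keep in range(len(parts), -1, -1):
        masked = parts[:keep] + ["*"] * (len(parts) - keep)
        levels.append(".".join(masked))
    return levels


def pick_level(ip, counts, min_count=10):
    """Pick the most specific level with at least `min_count` observations.

    counts: dict mapping a (possibly masked) IP pattern -> number of observations
    """
    for level in backoff_levels(ip):
        if counts.get(level, 0) >= min_count:
            return level
    return "*.*.*.*"  # no personalization


levels = backoff_levels("156.111.188.243")
print(levels)

# Hypothetical observation counts per user class.
counts = {"156.111.188.243": 2, "156.111.188.*": 7, "156.111.*.*": 40}
print(pick_level("156.111.188.243", counts))
```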
Experiments
• Query suggestion using query logs
  • Commercial search engine log (1.5 years)
  • 637 million queries; 585 million urls
  • Query-click bipartite graph
• Author/keyword suggestion using DBLP
  • Titles and authors from DBLP
  • 110k papers, 580k authors
  • Coauthor graph, keyword graph, author-keyword bipartite graph
• Baselines: nearest neighbor; personalized PageRank

Result: Query Suggestion
Query = friends

Result: Query Suggestion (II)
Query = aa
Query = ranknet

Results: Personalized Query Suggestion
Query = msr

Result: Author Suggestion
Query = Jon Kleinberg
Favor students, especially current students
(personalized PageRank is similar)
Famous researchers + former students

Result: Keyword Suggestion for Author
Query = Michael I. Jordan
Query = Jiawei Han

Discussions
• Hitting time effectively boosts infrequent queries
• Nearest neighbor and personalized PageRank favor frequent queries
• Fast convergence: a few iterations and a subgraph capture most of the value
• No parameters to tune
• Can be generalized to many other tasks (on different graphs)

Ranking on Query Log Graph and Search Tasks
• Query → Query: query suggestion
• Url → Url: finding related pages (e.g., www.cs.jhu.edu/~brill → research.microsoft.com/users/brill)
• IP → IP: finding similar users
• Url → Query: annotation, summarization, ad terms
• Query → Url: search
• IP, Query → Url: personalized search
• IP, Query → Query: personalized query suggestion
• Many other opportunities!

Summary
• Generate query suggestions using hitting time on the query-click graph
• Personalized query suggestion
• Generalizable to other search tasks
• Future work:
  • Different types of graphs, e.g., query sessions
  • Combine with other features
  • Large-scale evaluation