Query Recommendation Xiaofei Zhu (zhu@l3s.de) L3S Research Center, Leibniz Universität Hannover
Introduction • Queries are typically short (1-2 words) • Queries are often ambiguous (e.g., Java) • Users may lack domain knowledge
Query Recommendation • Query recommendation aims to provide users with alternative queries that represent their information needs more clearly, so that the search engine returns better results. [Figure: original query → recommendation]
Query Recommendation • How can query recommendation be done? • Find alternative queries with similar search intent. • How does this differ from finding similar documents or images?
Query log • A query log records the search actions of a search engine's users. • A typical query log is a set of records <q_i, u_i, t_i, V_i, C_i> • q_i: the submitted query • u_i: an anonymized identifier of the user who submitted the query • t_i: the timestamp at which the query was submitted • V_i: the set of results returned for the query • C_i: the set of documents clicked by the user
Example of query log (AOL, 2006)
AnonID   Query                        QueryTime            ItemRank  ClickURL
7051923  motorola text messages       2006-03-24 19:35:31  1         http://www.telusmobility.com
7051923  motorola text messages       2006-03-24 19:35:31  4         http://support.t-mobile.com
7051923  motorola t730 text messages  2006-03-24 19:38:40  2         http://www.phonescoop.com
7051923  motorola t730 text messages  2006-03-24 19:38:40  3         http://www.1800mobiles.com
7051923  motorola t730 text messages  2006-03-24 19:38:40  5         http://cgi.ebay.com
7051923  motorola t730 text messages  2006-03-24 19:38:40  7         http://phonearena.com
7051923  spike muscle car             2006-03-25 12:57:43  2         http://www.classicauto-sales.com
7051923  spike muscle car             2006-03-25 12:57:43  5         http://sev.prnewswire.com
7051923  spike muscle car             2006-03-25 13:00:22
7051923  usps                         2006-03-25 14:23:21  1         http://www.usps.com
7051923  vc2 auctions                 2006-03-25 14:31:41
7051923  auctions for 1               2006-03-25 14:33:47
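As a rough illustration of how such rows map onto the record structure <q_i, u_i, t_i, V_i, C_i>, here is a minimal Python sketch. It assumes tab-separated lines in the column order shown above; the names `LogRecord` and `parse_line` are hypothetical, and the returned result set V_i is not available in this log, so only clicks are kept.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class LogRecord:
    user_id: str           # AnonID, the anonymized user identifier (u_i)
    query: str             # the submitted query (q_i)
    time: datetime         # the submission timestamp (t_i)
    rank: Optional[int]    # ItemRank of the clicked result, None if no click
    url: Optional[str]     # ClickURL, None if no click


def parse_line(line: str) -> LogRecord:
    # Assumption: fields are tab-separated; rows without a click have 3 fields.
    fields = line.rstrip("\n").split("\t")
    user_id, query, time_str = fields[:3]
    rank = int(fields[3]) if len(fields) > 3 and fields[3] else None
    url = fields[4] if len(fields) > 4 and fields[4] else None
    time = datetime.strptime(time_str, "%Y-%m-%d %H:%M:%S")
    return LogRecord(user_id, query, time, rank, url)


sample = [
    "7051923\tmotorola text messages\t2006-03-24 19:35:31\t1\thttp://www.telusmobility.com",
    "7051923\tmotorola text messages\t2006-03-24 19:35:31\t4\thttp://support.t-mobile.com",
    "7051923\tspike muscle car\t2006-03-25 13:00:22",
]
records = [parse_line(s) for s in sample]

# Collect the clicked documents per query (an approximation of C_i).
clicks = {}
for r in records:
    if r.url is not None:
        clicks.setdefault(r.query, set()).add(r.url)
```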
Microsoft 2006 RFP dataset
Time                 Query                       QueryID           SessionID         ResultCount
2006-05-01 00:00:01  defination Gravitational    46c13f0705f6436b  19ab975e898d46d1  11
2006-05-01 00:00:01  kimclement                  a3d2cae45e2b4c5b  1b748d1afa9b4828  10
2006-05-01 00:00:01  scientology crazy beliefs   418324ef33d14ed2  10f477402db84c9a  10
2006-05-01 00:00:01  www.joj.sk                  489238bdf8834d68  16271eb6bf174c5c  9
2006-05-01 00:00:04  www.selectcareers.com       f92efd8044904ac4  193f9f8442d44c48  0
2006-05-01 00:00:08  What is May Day?            37afe7af832649d2  21f6a0dfea4348ac  14
2006-05-01 00:00:10  vikings draft choices suck  b0519e4528d84b44  196b0bb2f1d643f2  10
2006-05-01 00:00:10  wwwcrownawards.com          9eda4716dfb045e2  04e3a26067a84748  0
2006-05-01 00:00:15  Australian miners           ba6d190cc4cd4fd3  136fd5e571d24886  10

QueryID            Query                        Time                 URL                                               Position
0000003a718649f2   schwab                       2006-05-11 08:07:35  http://www.schwab.com/                            1
0000006d43b549c1   us geography                 2006-05-04 14:23:00  http://www.enchantedlearning.com/usa/             3
0000006d43b549c1   us geography                 2006-05-04 14:23:03  http://www.sheppardsoftware.comState15s_500.html  4
0000016aa52e4fbc   wwf                          2006-05-21 09:25:34  http://www.panda.org/                             2
000002aa6e27443f   biggercity                   2006-05-07 13:30:45  http://www.biggercity.com/chat/                   1
1000005aac1f6423f  studios                      2006-05-09 14:21:29  http://www.shawneestudios.com/contact_us.php      1
1000008d8afaa459a  www.nfl.com                  2006-05-28 18:22:39  http://www.nfl.com/teams/NYJ.html                 7
7000009c2848e4a68  north hills school district  2006-05-04 12:29:12  http://www.nhsd.net/                              1
How to use query log for query recommendation? Click-through data • Click-through data records the documents a user clicks after submitting a query to the search engine. • Basic assumption: if a user clicks a document after issuing a query, the clicked document is more or less relevant to that query, so the query can be represented by its clicked documents. • If two queries share many co-clicked documents, they have similar search intent. • Approaches: query feature representation, query-URL graph [Beeferman, KDD’00] [Mei, CIKM’08]
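The co-click assumption can be sketched in a few lines of Python. This is only an illustration, not the method of either cited paper: each query is represented by a vector of click counts over URLs, and cosine similarity is an assumed choice of similarity measure. The log entries and URL names (`u1`, `u2`, `u3`) are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical (query, clicked URL) pairs.
click_log = [
    ("aa", "u1"),
    ("aa", "u1"),
    ("american airline", "u1"),
    ("american airline", "u2"),
    ("apple tree", "u3"),
]


def click_vectors(log):
    """Represent each query by the counts of the URLs clicked for it."""
    vectors = defaultdict(Counter)
    for query, url in log:
        vectors[query][url] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse click-count vectors."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


vectors = click_vectors(click_log)
sim_related = cosine(vectors["aa"], vectors["american airline"])
sim_unrelated = cosine(vectors["aa"], vectors["apple tree"])
```

Queries that share clicked URLs get a positive score, while queries with disjoint click sets score zero.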
How to use query log for query recommendation? Query Session • Query session: a sequence of related queries submitted by a single user within a time interval for a specific search task. • Basic assumption: queries submitted consecutively by the same user within a short time interval share similar search intent. • If two queries frequently co-occur in the same sessions, they are relevant to each other. • Approaches: association rules, query graph [Fonseca, LA-WEB’03] [Zhang, WWW’06] [Boldi, CIKM’08, WSCD’09]
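A minimal sketch of the session idea follows. It assumes a fixed 30-minute inactivity gap as the session boundary, which is a common heuristic and not something the slides specify; `split_sessions` and `cooccurrence` are hypothetical helper names. The sample events reuse the AOL rows shown earlier.

```python
from collections import Counter
from datetime import datetime, timedelta
from itertools import combinations


def split_sessions(events, gap=timedelta(minutes=30)):
    """Split (user, time, query) events into per-user sessions.

    Assumption: a new session starts when the same user is idle longer than `gap`.
    """
    sessions = []
    last_seen = {}  # user -> (time of last query, current session list)
    for user, time, query in sorted(events):
        prev = last_seen.get(user)
        if prev is None or time - prev[0] > gap:
            session = []
            sessions.append(session)
        else:
            session = prev[1]
        session.append(query)
        last_seen[user] = (time, session)
    return sessions


def cooccurrence(sessions):
    """Count how many sessions contain each unordered pair of distinct queries."""
    counts = Counter()
    for s in sessions:
        for a, b in combinations(sorted(set(s)), 2):
            counts[(a, b)] += 1
    return counts


events = [
    ("7051923", datetime(2006, 3, 24, 19, 35, 31), "motorola text messages"),
    ("7051923", datetime(2006, 3, 24, 19, 38, 40), "motorola t730 text messages"),
    ("7051923", datetime(2006, 3, 25, 12, 57, 43), "spike muscle car"),
    ("7051923", datetime(2006, 3, 25, 13, 0, 22), "spike muscle car"),
    ("7051923", datetime(2006, 3, 25, 14, 23, 21), "usps"),
    ("7051923", datetime(2006, 3, 25, 14, 31, 41), "vc2 auctions"),
]
sessions = split_sessions(events)
pairs = cooccurrence(sessions)
```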
Highly Relevant Query Recommendation • Query Suggestion Using Hitting Time (CIKM’08) • Click-through Data • Query-URL Bipartite Graph • Query Suggestions Using Query-Flow Graphs (WSCD’09) • Session Data • Query-Flow Graph
Query Suggestion Using Hitting Time (CIKM’08) • Query-URL Bipartite Graph • Edges between V1 and V2 • No edge inside V1 or V2 • Edges are weighted • e.g., V1 = queries; V2 = URLs • Transition Probabilities [Figure: weighted bipartite graph between V1 and V2, with a target node A and example nodes i and j]
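One simple way to obtain transition probabilities on such a weighted bipartite graph is to divide each edge weight by the total weight leaving its source node. The Python sketch below does exactly that; it is an assumption about the normalization rather than a transcription of the paper's formula, and the click counts and node names are hypothetical.

```python
from collections import defaultdict


def transition_probabilities(weights):
    """weights: {(query, url): click count}. Returns p[node][neighbor].

    Nodes are tagged ("q", name) or ("u", name) so that queries (V1) and
    URLs (V2) never collide; edges only connect the two sides.
    """
    adj = defaultdict(dict)
    for (query, url), w in weights.items():
        adj[("q", query)][("u", url)] = w
        adj[("u", url)][("q", query)] = w
    p = {}
    for node, neighbors in adj.items():
        total = sum(neighbors.values())
        # Assumed normalization: weight of the edge over the node's total weight.
        p[node] = {n: w / total for n, w in neighbors.items()}
    return p


weights = {("a", "u1"): 3, ("a", "u2"): 1, ("b", "u1"): 2}
p = transition_probabilities(weights)
```

With these counts, a walker at query `a` moves to `u1` with probability 0.75, and a walker at `u1` moves to query `b` with probability 0.4.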
Query Suggestion Using Hitting Time (CIKM’08) • Random Walk and Hitting Time • Hitting time: how long does it take a random walk starting at node b to hit node a? • Start at node 1 • Pick a neighbor i at random according to the transition probabilities (uniformly at random when the edges are unweighted) • Move to i (t=1) • Continue (t=2, ...) • If the random walk hits a node quickly, that node is close to the start node; the hitting time measures this closeness. [Figure: example graph with nodes 1-5]
Hitting time from i to A [Figure: graph G with start node i, its neighbors j and k, and the target A]
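The expected hitting time satisfies a standard recurrence: it is 0 for nodes inside A, and for any other node i it is 1 plus the average of the neighbors' hitting times, weighted by p(i → j). The sketch below iterates that recurrence a fixed number of times, which amounts to a truncated hitting time. The function name, the iteration count, and the three-node path graph are assumptions made for illustration.

```python
def hitting_times(p, targets, iterations=1000):
    """Approximate expected hitting times to `targets`.

    p: {node: {neighbor: transition probability}}
    Iterates h_i = 0 for i in targets, else h_i = 1 + sum_j p(i -> j) * h_j.
    """
    h = {node: 0.0 for node in p}
    for _ in range(iterations):
        # The comprehension reads the previous h, so updates are synchronous.
        h = {
            i: 0.0 if i in targets
            else 1.0 + sum(prob * h[j] for j, prob in p[i].items())
            for i in p
        }
    return h


# Hypothetical path graph 1 - 2 - 3 with uniform transitions; the target is node 3.
p = {
    1: {2: 1.0},
    2: {1: 0.5, 3: 0.5},
    3: {2: 1.0},
}
h = hitting_times(p, targets={3})
```

On this path graph the exact solution is h_2 = 3 and h_1 = 4, so node 2 counts as closer to the target than node 1.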
Generate Query Suggestion • Construct a (kNN) subgraph from the query log data (with a predefined number of queries/URLs) • Compute transition probabilities p(i → j) • Compute hitting time h_i^A • Rank candidate queries using h_i^A [Figure: query-URL example for the query "aa" with edge weights (e.g., 300, 15), URLs www.aa.com, www.theaa.com/travelwatch/planner_main.jsp and en.wikipedia.org/wiki/Mexicana, and queries "american airline" and "mexiana"]
Result: Query Suggestion Query = ‘aa’
Highly Relevant Query Recommendation • Query Suggestion Using Hitting Time (CIKM’08) • Click-through Data • Query-URL Bipartite Graph • Query Suggestions Using Query-Flow Graphs (WSCD’09) • Session Data • Query-Flow Graph
Query Suggestions Using Query-Flow Graphs (WSCD’09) • Session Data • Definition: the sequence of queries issued by one particular user within a specific time limit.
Query Graph • Similarity is defined both for two consecutive queries and for queries that are not neighbors in the same session. • The model accumulates many query sessions and adds up the similarity values over all occurrences of the same query pair. Z. Zhang and O. Nasraoui. Mining search engine query logs for query recommendation. In WWW, pages 1039–1040, 2006.
Query-Flow Graph P. Boldi, F. Bonchi, C. Castillo, D. Donato, A. Gionis, S. Vigna: “The query-flow graph: model and applications”. CIKM 2008.
Build Query-flow Graph • The key aspect of constructing the query-flow graph is defining the weighting function w. • Here, the weight of an edge represents the number of times the corresponding transition between two queries was observed within the same search session.
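A count-based weighting of this kind could be sketched as follows. This is a simplified illustration, not the construction from the cited paper: it counts transitions between consecutive distinct queries in each session and also returns a row-normalized version of the counts. The function name and the toy sessions are hypothetical.

```python
from collections import Counter, defaultdict


def query_flow_graph(sessions):
    """Build edge counts and row-normalized weights from query sessions.

    An edge (a, b) is counted each time query b directly follows query a
    within one session (immediate repeats of the same query are skipped).
    """
    counts = Counter()
    for session in sessions:
        for a, b in zip(session, session[1:]):
            if a != b:
                counts[(a, b)] += 1
    out_total = defaultdict(int)
    for (a, _), c in counts.items():
        out_total[a] += c
    weights = {(a, b): c / out_total[a] for (a, b), c in counts.items()}
    return counts, weights


sessions = [["a", "b", "c"], ["a", "b"], ["a", "c"]]
counts, weights = query_flow_graph(sessions)
```

The normalized weights can serve directly as transition probabilities for a random walk over the graph.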
Query Recommendation • The query recommendation methods are based on the probability of being at a certain node after performing a random walk over the query graph. • Random walk with restart • A random surfer starts at the initial query q • At each step • with probability α, the surfer follows one of the outlinks from the current node • with probability 1 - α, the surfer jumps back to q
Query Recommendation • Random walk with restart: notation • M: the transition matrix of the Markov chain • P: the row-normalized weight matrix of the query-flow graph • e_j: the vector whose j-th entry is 1 and whose other entries are 0
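A random walk with restart can be approximated by repeatedly applying the update x ← α·x·P + (1 - α)·e_q, where e_q puts all mass on the initial query q. The sketch below assumes this standard form, a hypothetical 3-node matrix P whose rows each sum to 1, and α = 0.85, which is an illustrative value and not one taken from the slides.

```python
def random_walk_with_restart(P, start, alpha=0.85, iterations=100):
    """Score nodes by a random walk over P that restarts at `start`.

    P: row-stochastic matrix as a list of lists (every row must sum to 1).
    alpha: probability of following an outlink; 1 - alpha is the restart probability.
    """
    n = len(P)
    x = [0.0] * n
    x[start] = 1.0
    for _ in range(iterations):
        nxt = [0.0] * n
        for i in range(n):
            for j in range(n):
                nxt[j] += alpha * x[i] * P[i][j]
        nxt[start] += 1.0 - alpha  # jump back to the initial query
        x = nxt
    return x


# Hypothetical query graph: 0 -> {1, 2}, 1 -> 2, 2 -> 0.
P = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
scores = random_walk_with_restart(P, start=0)
ranking = sorted((i for i in range(len(P)) if i != 0), key=lambda i: -scores[i])
```

Candidate queries are then ranked by their score, excluding the initial query itself.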
Random walks • Random walks on graphs correspond to Markov chains • The set of states S is the set of nodes of the graph G • The transition probability matrix gives the probability of following an edge from one node to another
Definitions • Adjacency matrix A • Transition matrix P [Figure: example graph with edge weights 1 and transition probabilities 1 and 1/2]
Random walk [Figure: the walk on the example graph at steps t=0, t=1, t=2, t=3]
Probability Distributions • $x_t(i)$ = probability that the surfer is on node $i$ at time $t$ • $x_{t+1}(i) = \sum_j x_t(j)\,P(j,i)$, i.e., the sum over $j$ of (probability of being at node $j$) × Pr($j \to i$) • $x_{t+1} = x_t P = x_{t-1} P^2 = x_{t-2} P^3 = \dots = x_0 P^{t+1}$ • What happens when the surfer keeps walking for a long time?
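The update rule can be traced numerically with a short Python sketch. The 3-node adjacency matrix below is a hypothetical example, not necessarily the graph from the slides; `row_normalize` and `step` are illustrative helper names.

```python
def row_normalize(A):
    """Turn an adjacency matrix A into a transition matrix P (rows sum to 1)."""
    P = []
    for row in A:
        total = sum(row)
        P.append([a / total if total else 0.0 for a in row])
    return P


def step(x, P):
    """One step of the walk: x_{t+1}(i) = sum_j x_t(j) * P(j, i)."""
    n = len(P)
    return [sum(x[j] * P[j][i] for j in range(n)) for i in range(n)]


# Hypothetical graph: node 0 links to 1 and 2, node 1 links to 2, node 2 links to 0.
A = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
P = row_normalize(A)

x = [1.0, 0.0, 0.0]  # the surfer starts at node 0
history = [x]
for t in range(3):
    x = step(x, P)
    history.append(x)
```

Here `history[t]` holds the distribution after t steps, that is, the starting distribution multiplied by P exactly t times.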
What happens when the surfer keeps walking for a long time? • Stationary distribution • Intuitively, the stationary probability of a node is related to the amount of time a random walker spends visiting that node. • Mathematically, recall that the probability distribution evolves as $x_{t+1} = x_t P$. • For the stationary distribution $v_0$ we have $v_0 = v_0 P$, so $v_0$ is a left eigenvector of the transition matrix P with eigenvalue 1.
Interesting questions • Does a stationary distribution always exist? Is it unique? • Yes, if the graph is "well-behaved", i.e., P is ergodic • P is ergodic if it is irreducible and aperiodic • Irreducible: there is a path from every node to every other node. • Periodic: state i is periodic with period k > 1 if every path from i back to i has a length that is a multiple of k; otherwise it is aperiodic. [Figure: example graphs labeled irreducible, not irreducible, aperiodic, and periodic with period 3]
If a Markov chain with transition matrix P is irreducible and aperiodic, then the largest eigenvalue of P equals 1 and all other eigenvalues are strictly smaller than 1 in absolute value. • Let the eigenvalues of P be $\{\sigma_i \mid i = 0, \dots, n-1\}$ in non-increasing order of $|\sigma_i|$. • $\sigma_0 = 1 > |\sigma_1| \ge |\sigma_2| \ge \dots \ge |\sigma_{n-1}|$
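Because every eigenvalue other than 1 is smaller than 1 in absolute value, repeatedly multiplying any starting distribution by P converges to the stationary distribution. The sketch below shows this power iteration on a hypothetical 3-node chain that is irreducible and aperiodic (it contains cycles of length 2 and 3); the function name and iteration count are assumptions.

```python
def stationary(P, iterations=200):
    """Approximate the stationary distribution v = vP by power iteration."""
    n = len(P)
    x = [1.0 / n] * n  # any starting distribution works for an ergodic chain
    for _ in range(iterations):
        x = [sum(x[j] * P[j][i] for j in range(n)) for i in range(n)]
    return x


# Hypothetical ergodic chain: 0 -> {1, 2}, 1 -> 2, 2 -> 0.
P = [
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]
v = stationary(P)

# Check the fixed-point property v = vP.
vP = [sum(v[j] * P[j][i] for j in range(3)) for i in range(3)]
```

For this chain the fixed point can be verified by hand: v = (0.4, 0.2, 0.4) satisfies v = vP and sums to 1.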
Why Diversity in Query Recommendation? • Relevance alone is not enough: providing only "relevant" recommendations falls far short of satisfying users' information needs. • Example: original query "apple"; candidate recommendations: apple ipad 3, apple iphone 4s, apple computer, apple tree, apple seed, ... • The recommended queries should cover multiple potential search intents of users and minimize the risk that users will not be satisfied.
High-Diversity Query Recommendation • Diversifying Query Suggestion Results [Hao Ma, AAAI’10] • Query-URL graph • Hitting time • A Unified Framework for Recommending Diverse and Relevant Queries [Xiaofei Zhu, WWW’11] • Manifold • Manifold Ranking with Stop Points
Graph Construction Figure 1: Example for Bipartite Graph (extracted from the clickthrough data)
Determining the First Suggested Query • Initial Transition Probability • Click frequency: the number of clicks between node i and node j • Normalization term: the total number of times that query node i has been issued in the dataset • The initial transition probability from node i to node j is the click frequency divided by the normalization term
Determining the First Suggested Query • Random Jump • In addition to the transition probabilities, there are random relations among different queries. • The model therefore adds a uniform random relation among different queries. • A jump parameter gives the probability of taking a "random jump", i.e., of transiting among different queries. • Without any prior knowledge, the random jump follows d, a uniform stochastic distribution vector.
Determining the First Suggested Query • Random Walk on the Query-URL graph • With the transition probability matrix P defined, a random walk can be performed on the query-URL graph. • The probability of transiting from node i to node j after a t-step random walk is $P_t(i,j) = [P^t]_{ij}$, the $(i,j)$ entry of the t-th power of P. • Explanation: 1) The random walk sums the probabilities of all paths of length t between the two nodes; if there are many paths, the transition probability is high. 2) The larger the transition probability $P_t(i,j)$ is, the more similar node j is to node i.
Determining the First Suggested Query • After performing a t-step random walk, the query with the largest transition probability from node q is recommended as the first suggested query. • Parameter t determines the resolution of the Markov random walk • Large t: the random walk depends more on the graph structure • Small t: the walk preserves more information about the starting node
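The selection of the first suggestion can be sketched as below. Several simplifications are assumed and are not taken from the paper: rows are normalized by total click weight instead of query frequency, the random jump is uniform over all nodes with probability `alpha`, and an even `t` is used because a query-URL graph is bipartite, so only even-length click paths return to query nodes. The click counts are hypothetical.

```python
def first_suggestion(clicks, query, t=2, alpha=0.1):
    """Pick the query with the largest t-step transition probability from `query`.

    clicks: {(query, url): click count}
    alpha: assumed probability of a uniform random jump to any node.
    """
    nodes = sorted({("q", q) for q, _ in clicks} | {("u", u) for _, u in clicks})
    index = {node: k for k, node in enumerate(nodes)}
    n = len(nodes)

    # Symmetric click-weight matrix of the bipartite graph.
    W = [[0.0] * n for _ in range(n)]
    for (q, u), c in clicks.items():
        i, j = index[("q", q)], index[("u", u)]
        W[i][j] += c
        W[j][i] += c

    # Row-normalize and mix in the uniform random jump.
    P = [[(1 - alpha) * w / sum(row) + alpha / n for w in row] for row in W]

    # Propagate a point mass at the query for t steps.
    x = [0.0] * n
    x[index[("q", query)]] = 1.0
    for _ in range(t):
        x = [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]

    candidates = [
        (x[k], node[1])
        for k, node in enumerate(nodes)
        if node[0] == "q" and node[1] != query
    ]
    return max(candidates)[1]


clicks = {
    ("apple", "u1"): 4,
    ("apple", "u2"): 1,
    ("apple iphone", "u1"): 3,
    ("apple tree", "u2"): 1,
}
suggestion = first_suggestion(clicks, "apple")
```

With these counts most of the click mass from "apple" flows through `u1`, so "apple iphone" is selected.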
Ranking the Remaining Queries • Employ the hitting time to rank and diversify the rest of the queries. • Hitting time • Let S be a subset of the vertex set V. The expected hitting time h(i|S) is the expected number of steps before a random walk starting at node i first visits the set S. • $h(i|S) = 0$ for $i \in S$, and $h(i|S) = 1 + \sum_{j \in N(i)} P(i,j)\,h(j|S)$ for $i \notin S$, where N(i) denotes the neighbors of node i.
Ranking the Remaining Queries • Property • Nodes strongly connected to s1 receive far fewer visits from the random walk • Nodes far away from s1 still allow the random walk to move among them and thus receive more visits • The second suggestion node • Select as the second suggestion the node s2 ∈ Q with the largest expected hitting time to the subset S = {q, s1}.
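The greedy selection can be illustrated with the following sketch. It assumes a query-to-query transition matrix instead of the full query-URL graph, and a hitting time truncated after a fixed number of iterations; both are simplifications. The 5-node weight matrix is hypothetical: node 0 is the original query q, node 1 is the first suggestion s1, node 2 is close to s1, and nodes 3 and 4 form a more distant group.

```python
def hitting_time_to_set(P, targets, iterations=200):
    """Approximate expected hitting time from every node to the set `targets`."""
    n = len(P)
    h = [0.0] * n
    for _ in range(iterations):
        h = [
            0.0 if i in targets
            else 1.0 + sum(P[i][j] * h[j] for j in range(n))
            for i in range(n)
        ]
    return h


def diversify(P, query, first, k):
    """Greedily pick k suggestions: start with `first`, then repeatedly add the
    node with the largest hitting time to the already selected set."""
    selected = [query, first]
    while len(selected) < k + 1:
        h = hitting_time_to_set(P, set(selected))
        remaining = [i for i in range(len(P)) if i not in selected]
        selected.append(max(remaining, key=lambda i: h[i]))
    return selected[1:]


# Hypothetical symmetric weights between queries, row-normalized into P.
W = [
    [0, 5, 0, 1, 0],
    [5, 0, 5, 0, 0],
    [0, 5, 0, 0, 0],
    [1, 0, 0, 0, 5],
    [0, 0, 0, 5, 0],
]
P = [[w / sum(row) for w in row] for row in W]
suggestions = diversify(P, query=0, first=1, k=2)
```

Node 2 reaches s1 in a single step, while nodes 3 and 4 need about 11 and 12 steps in expectation to reach {q, s1}, so the second suggestion comes from the distant group.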