This paper presents a probabilistic approach to query recommendation in systems with smaller user bases and without large query logs. It proposes a document-centric mechanism that suggests phrases present in documents and index phrases from the document corpus to complete the partial user query.
Web Science presentation on the topic: Query Recommendation
Papers:
• Query Suggestions in the Absence of Query Logs
• DQR: A Probabilistic Approach to Diversified Query Recommendation
Muhammad Nuruddin, ITIS M.Sc. student, Leibniz Universität Hannover
Winter Semester 2012/13, Matrikelnummer: 2961230
Query Suggestion/Recommendation: Assisting users by providing a list of suggested queries has proven to be effective.
1. Query Suggestions in the Absence of Query Logs
Background:
• Most existing query suggestion work is based on query logs.
• Log-based suggestion suits systems with a large user base, many interactions, and recorded past usage.
• It does not suit systems with a small user base or without a large log.
• It does not suit newly deployed systems.
• Examples: desktop search, personal email search.
How can queries be suggested when users and query logs are insufficient?
• The authors propose a document-centric probabilistic mechanism.
• Query phrases present in the documents are suggested.
• Phrases indexed from the document corpus are suggested to complete the partial user query.
Steps:
1. Phrase extraction
• N-gram phrases of order 1, 2 and 3 are extracted from the document corpus.
• Example: “president of Germany”, “president of”, “of Germany”, “president”, “Germany”.
2. Query suggestion
• Candidate phrases are ranked with a probabilistic model.
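The phrase extraction step can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization; the function names `extract_ngrams` and `build_phrase_index` are hypothetical and not taken from the paper.

```python
from collections import Counter
from typing import Iterable, List


def extract_ngrams(tokens: List[str], max_order: int = 3) -> List[str]:
    """Return every contiguous n-gram of order 1..max_order as a string."""
    phrases = []
    for n in range(1, max_order + 1):
        for start in range(len(tokens) - n + 1):
            phrases.append(" ".join(tokens[start:start + n]))
    return phrases


def build_phrase_index(documents: Iterable[str], max_order: int = 3) -> Counter:
    """Count how often each n-gram occurs across the corpus."""
    index = Counter()
    for text in documents:
        # Assumption: plain whitespace tokenization, no stop-word filtering.
        index.update(extract_ngrams(text.split(), max_order))
    return index


phrases = extract_ngrams("president of Germany".split())
index = build_phrase_index(["president of Germany", "president of France"])
```

The slide's example list omits the unigram “of”, which suggests some stop-word filtering; this sketch does not attempt it.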
2. Query suggestion: Probabilistic Model for Query Suggestion
Suppose a user has typed an incomplete query Q. The query can be decomposed as Q = Qc + Qt, where
• Qc denotes the completed portion of the query,
• Qt denotes the last word of Q, which the user is still typing.
Example: Einsteins Rel…
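The decomposition of a partial query into its completed part and the word still being typed might look like this. It is a sketch under the assumption that words are separated by spaces and that a trailing space means the last word is finished; `decompose_query` is a hypothetical helper name.

```python
from typing import Tuple


def decompose_query(partial_query: str) -> Tuple[str, str]:
    """Split a partial query into (Qc, Qt)."""
    # Assumption: a trailing space means the last word is complete,
    # so there is no word in progress and Qt is empty.
    if partial_query.endswith(" "):
        return partial_query.strip(), ""
    words = partial_query.split()
    if not words:
        return "", ""
    # Everything before the last word is Qc; the last word is Qt.
    return " ".join(words[:-1]), words[-1]


qc, qt = decompose_query("Bill Gate")
```

For the input `"Bill Gate"` this yields `qc == "Bill"` and `qt == "Gate"`.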
Let pi denote phrase i (an N-gram) extracted from the document corpus in step 1 (phrase extraction).
Using Bayes’ theorem, we calculate P(pi|Q), the probability (suitability) of pi as a suggested completion of the query Q.
We can then recommend the m phrases of P that have the highest values of P(pi|Q).
The authors derive the ranking equation:
P(pi|Q) ∝ P(pi|Qt) × P(Qc|pi)
• P(pi|Qt) = probability that the user is typing phrase pi, given that Qt has already been typed (phrase selection probability).
• P(Qc|pi) = correlation between phrase pi and the already completed part Qc of the query.
Example: Albert Einsteins Rel… = Qc + Qt
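Once the two quantities P(pi|Qt) and P(Qc|pi) are available for each candidate phrase, picking the top m suggestions amounts to sorting by their product. The sketch below assumes both are given as dictionaries; the numeric values are made up for illustration and `rank_phrases` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def rank_phrases(
    selection_prob: Dict[str, float],   # P(pi | Qt) per phrase
    correlation: Dict[str, float],      # P(Qc | pi) per phrase
    m: int,
) -> List[Tuple[str, float]]:
    """Return the m phrases with the highest product of the two terms."""
    scores = {
        phrase: selection_prob[phrase] * correlation.get(phrase, 0.0)
        for phrase in selection_prob
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:m]


# Illustrative numbers only, not taken from the paper.
selection_prob = {"Bill Gates": 0.4, "India Gate Rice": 0.3, "Gateway": 0.3}
correlation = {"Bill Gates": 0.9, "India Gate Rice": 0.05, "Gateway": 0.1}
top = rank_phrases(selection_prob, correlation, m=2)
```

With these toy values, “Bill Gates” ranks first because it both matches the partial word and co-occurs strongly with the completed part “Bill”.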
Example: P(pi|Qt) = probability that the user is typing phrase pi, given that Qt has already been typed (phrase selection probability).
Bill Gate… = Qc + Qt
Qt = Gate
P = { “Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice”, … }
Example: P(Qc|pi) = correlation between phrase pi and the already completed part Qc of the query.
Bill Gate… = Qc + Qt
Qc = Bill
P = { “Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice”, … }
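The slide does not preserve how the correlation P(Qc|pi) is estimated. One plausible estimate, used here purely as an assumption, is document co-occurrence: the fraction of documents containing pi that also contain Qc. The helper name `correlation` and the toy corpus are hypothetical.

```python
from typing import List


def correlation(qc: str, phrase: str, documents: List[str]) -> float:
    """Estimate P(Qc | pi) as the share of documents containing the phrase
    that also contain the completed query part (assumed estimator)."""
    # Assumption: crude case-insensitive substring matching stands in
    # for a real phrase index.
    with_phrase = [d for d in documents if phrase.lower() in d.lower()]
    if not with_phrase:
        return 0.0
    with_both = [d for d in with_phrase if qc.lower() in d.lower()]
    return len(with_both) / len(with_phrase)


docs = [
    "Bill Gates founded Microsoft",
    "The Bill Gates Foundation funds health work",
    "India Gate Rice is sold in many shops",
    "Gateway to India",
]
score_gates = correlation("Bill", "Bill Gates", docs)
score_rice = correlation("Bill", "India Gate Rice", docs)
```

Under this estimate, phrases that never appear alongside “Bill” receive a correlation of zero and drop to the bottom of the ranking.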
P(pi|Qt) = probability that the user is typing phrase pi, given that Qt has already been typed (phrase selection probability).
C = {c1, c2, …, cm} is the set of m possible word completions of Qt.
Bill Gate… = Qc + Qt
Qt = Gate
C = { “Gates”, “Gate”, “Gateway”, … }
P = { “Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice”, … }
Bill Gate… = Qc + Qt
Qt = Gate
C = { “Gates”, “Gate”, “Gateway”, …, cm }
P = { “Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice”, …, pn }
P(ci|Qt) ~ freq(ci): words used more often in the corpus are more likely to be useful for query recommendation.
Without IDF weighting, some rare but relevant words would be suppressed.
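The idea of weighting candidate completions by corpus frequency, tempered by IDF, can be sketched as below. The exact formula from the slide is not recoverable, so the weight `freq * log(1 + N/df)` and the normalization are assumptions, as is the function name `completion_probs`.

```python
import math
from collections import Counter
from typing import Dict, List


def completion_probs(prefix: str, documents: List[str]) -> Dict[str, float]:
    """Score each corpus word starting with `prefix` by frequency times IDF,
    normalized so that the scores sum to 1 (assumed weighting)."""
    tokenized = [d.lower().split() for d in documents]
    freq = Counter(w for doc in tokenized for w in doc)
    doc_freq = Counter(w for doc in tokenized for w in set(doc))
    n_docs = len(tokenized)

    weights = {}
    for word, count in freq.items():
        if word.startswith(prefix.lower()):
            # Assumption: smoothed IDF, always positive.
            idf = math.log(1 + n_docs / doc_freq[word])
            weights[word] = count * idf

    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()} if total else {}


docs = ["bill gates", "gates foundation", "india gate", "gateway"]
probs = completion_probs("Gate", docs)
```

Here the candidate set C for Qt = “Gate” is every corpus word with that prefix; the IDF factor keeps rare words from being swamped by frequent ones.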
Document corpus
Step 1. Phrase extraction produces P = { “Bill Gates”, “Indian Gate”, “Gateway”, “Bill Gates life”, “Bill Gates Foundation”, “India Gate Rice”, …, pn }
Partial query: Bill Gate… := Qc = Bill, Qt = Gate
Step 2. Probabilistic ranking scores each pi in P against Qc and Qt.
End of Discussion on 1. Query Suggestions in the Absence of Query Logs
2. DQR: A Probabilistic Approach to Diversified Query Recommendation
• This paper proposes a query recommendation method for log-based systems.
• The proposed system has two components:
1. Query concept building (concept mining)
- Clustering the search logs.
2. Recommending queries from the concepts
- A probabilistic model selects the top m query concepts, and a representative query is selected for each concept.
A good query recommender system should have five properties:
1. Relevancy: Recommended queries should be semantically relevant to the user's search query.
2. Redundancy-free: The recommendation should not contain redundant queries that repeat similar search intents.
3. Diversity: The recommendation should cover search intents of different interpretations of the keywords given in the input query.
4. Ranking: Highly relevant queries should be ranked ahead of less relevant ones in the recommendation list.
5. Efficiency: Query recommendation provides online help, so recommendation algorithms should achieve fast response times.
The authors claim that DQR is the first system to address all five requirements.
Figure: A click-through bipartite graph
• The number of queries in Q is huge.
• The AOL dataset contains 10 million queries.
• Even picking, say, m = 10 recommended queries from Q involves a huge search space.
1. Query concept building (Concept Mining)
1.1) Concept mining:
- Similar queries are grouped to form a query concept.
- For this grouping, each query is represented by a |D|-dimensional vector.
- The entry of query qi for dimension dj is its user-frequency-inverse-query-frequency (UF-IQF) score, built from a UF term and an IQF term, where
Nu(qi,dj) = number of unique users who issued qi and clicked URL dj
Nq(dj) = number of queries that led to a click on URL dj
- The weights are normalized, and the similarity of queries qi and qj is computed from the normalized vectors.
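The UF-IQF formulas themselves did not survive on the slide. By analogy with TF-IDF, the sketch below assumes UF = Nu(qi,dj), IQF = log(|Q| / Nq(dj)), L2 normalization of each query vector, and Euclidean distance between normalized vectors. All four choices are assumptions, and the names `uf_iqf_vectors` and `distance` are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Click = Tuple[str, str, str]  # (user, query, clicked_url)


def uf_iqf_vectors(clicks: List[Click]) -> Dict[str, Dict[str, float]]:
    """Build a sparse, L2-normalized UF-IQF vector per query (assumed form)."""
    users = defaultdict(set)            # (query, url) -> distinct users
    queries_per_url = defaultdict(set)  # url -> distinct queries
    all_queries = set()
    for user, query, url in clicks:
        users[(query, url)].add(user)
        queries_per_url[url].add(query)
        all_queries.add(query)

    vectors: Dict[str, Dict[str, float]] = defaultdict(dict)
    for (query, url), who in users.items():
        # Assumed IQF: log of total queries over queries leading to this URL.
        iqf = math.log(len(all_queries) / len(queries_per_url[url]))
        vectors[query][url] = len(who) * iqf

    for vec in vectors.values():
        norm = math.sqrt(sum(w * w for w in vec.values()))
        if norm > 0:
            for url in vec:
                vec[url] /= norm
    return dict(vectors)


def distance(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Euclidean distance between two sparse vectors."""
    urls = set(a) | set(b)
    return math.sqrt(sum((a.get(u, 0.0) - b.get(u, 0.0)) ** 2 for u in urls))


clicks = [
    ("u1", "yahoo mail", "mail.yahoo.com"),
    ("u2", "yahoo mail", "mail.yahoo.com"),
    ("u1", "ymail", "mail.yahoo.com"),
    ("u3", "yahoo finance", "finance.yahoo.com"),
]
vectors = uf_iqf_vectors(clicks)
```

In this toy log, “yahoo mail” and “ymail” lead to the same URL and end up at distance 0, while “yahoo finance” shares no clicked URL with them.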
1.1) Concept mining:
- K-means clustering is not suitable: the algorithm had not terminated after two days.
- Instead, a one-pass algorithm is proposed. It is very efficient but highly sensitive to the order in which queries are processed.
Example, with the compactness constraint that the average pairwise distance in a cluster is < 0.5:
- order q1, q2, q3: C1 = {{q1,q2},{q3}}
- order q2, q3, q1: C1 = {{q1},{q2,q3}}
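The order sensitivity of one-pass clustering can be reproduced with a small sketch. It assumes that each query joins the first existing cluster whose average pairwise distance stays below the threshold after adding it; the slide does not state this joining rule, and the one-dimensional positions are invented so that the example works out.

```python
from itertools import combinations
from typing import Callable, List


def avg_pairwise_distance(items: List[str], dist: Callable[[str, str], float]) -> float:
    """Average distance over all unordered pairs (0.0 for fewer than 2 items)."""
    pairs = list(combinations(items, 2))
    if not pairs:
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)


def one_pass_cluster(
    items: List[str], dist: Callable[[str, str], float], threshold: float = 0.5
) -> List[List[str]]:
    """Assign each item, in input order, to the first cluster that stays compact."""
    clusters: List[List[str]] = []
    for item in items:
        for cluster in clusters:
            if avg_pairwise_distance(cluster + [item], dist) < threshold:
                cluster.append(item)
                break
        else:
            # No existing cluster stays compact, so start a new one.
            clusters.append([item])
    return clusters


# Hypothetical 1-D positions: q1 and q3 are each close to q2 but far apart.
position = {"q1": 0.0, "q2": 0.4, "q3": 0.8}


def dist(a: str, b: str) -> float:
    return abs(position[a] - position[b])


first_order = one_pass_cluster(["q1", "q2", "q3"], dist)
second_order = one_pass_cluster(["q2", "q3", "q1"], dist)
```

Processing q1, q2, q3 groups q1 with q2, while processing q2, q3, q1 groups q2 with q3, which matches the two outcomes on the slide.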
1.1) Concept mining: diameter measure L(C) of a cluster C
2. Recommending queries from the concepts
• A probabilistic model selects the top m query concepts, and a representative query is selected for each concept.
• A heuristic algorithm finds a set Yc of m query concepts that maximizes the model's probability.
• Yc is constructed incrementally with a greedy strategy:
- One concept is added at a time until Yc contains m concepts.
- At each step, the algorithm picks the concept that maximizes the probability increment. The increment is expressed in terms of the input query, the query concept that the input query belongs to, and the set of m query concepts.
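The greedy construction can be illustrated generically: start from an empty set and repeatedly add the candidate with the largest marginal gain in some objective. The objective below (`coverage`, the best relevance per search intent) is a stand-in chosen for illustration and is not DQR's probability model; the concept names, intents and relevance values are made up.

```python
from typing import Callable, Dict, List, Tuple


def greedy_select(
    candidates: List[str], objective: Callable[[List[str]], float], m: int
) -> List[str]:
    """Add one candidate at a time, always taking the largest objective gain."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < m:
        base = objective(selected)
        best = max(remaining, key=lambda c: objective(selected + [c]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical concepts: name -> (search intent, relevance to the input query).
CONCEPTS: Dict[str, Tuple[str, float]] = {
    "yahoo mail": ("mail", 0.9),
    "ymail login": ("mail", 0.8),
    "yahoo finance": ("finance", 0.6),
    "yahoo games": ("games", 0.5),
}


def coverage(selected: List[str]) -> float:
    """Stand-in objective: sum over intents of the best relevance selected."""
    best: Dict[str, float] = {}
    for name in selected:
        intent, relevance = CONCEPTS[name]
        best[intent] = max(best.get(intent, 0.0), relevance)
    return sum(best.values())


chosen = greedy_select(list(CONCEPTS), coverage, m=3)
```

Although “ymail login” is the second most relevant concept, it adds nothing once “yahoo mail” covers the same intent, so the greedy step skips it in favour of concepts for other intents.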
Selecting the representative query of each concept:
• By popularity vote from the log.
• For a concept C, the representative query is the one issued by the largest number of distinct users among all queries in C.
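The popularity vote can be sketched as counting distinct users per query within a concept. The log format (pairs of user and query) and the name `representative_query` are assumptions made for this illustration.

```python
from collections import defaultdict
from typing import List, Optional, Set, Tuple


def representative_query(
    concept: Set[str], log: List[Tuple[str, str]]
) -> Optional[str]:
    """Return the query in `concept` issued by the most distinct users."""
    users_by_query = defaultdict(set)
    for user, query in log:
        if query in concept:
            users_by_query[query].add(user)
    if not users_by_query:
        return None
    return max(users_by_query, key=lambda q: len(users_by_query[q]))


# Hypothetical log entries: (user, query).
log = [
    ("u1", "yahoo mail"),
    ("u2", "yahoo mail"),
    ("u2", "yahoo mail"),
    ("u3", "ymail"),
    ("u3", "ymail"),
    ("u3", "ymail"),
    ("u1", "yahoo email login"),
]
concept = {"yahoo mail", "ymail", "yahoo email login"}
best = representative_query(concept, log)
```

Counting distinct users rather than raw submissions means that “ymail”, issued three times by a single user, loses to “yahoo mail”, issued by two different users.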
Result comparison of different approaches: top 10 queries recommended by the 6 methods for the input query “yahoo”.
• SR = Similarity-based Ranking. Finds similar past queries in the log; ignores redundancy.
• MMR = Maximal Marginal Relevance. Considers relevancy and diversity*; ignores redundancy.
• CACB = Context-Aware Concept-Based method. Builds query concepts based on search sessions; ignores diversity*.
• DQR-ND = DQR with No Diversity. Same as DQR, but ignores diversity.
• DQR-OPC = DQR with One-Pass Clustering. Same as DQR, but uses only one pass for clustering.
• DQR = Diversified Query Recommendation.
*Diversity: The recommendation should cover search intents of different interpretations of the keywords given in the input query.
End of the Presentation Thank you very much for your attention!