Cross-Lingual Query Suggestion Using Query Logs of Different Languages (SIGIR 07)
Abstract
• Query suggestion
  • To suggest relevant queries for a given query
  • To help users better specify their information needs
• Cross-Lingual Query Suggestion (CLQS)
  • For a query in one language, suggest similar or relevant queries in other languages
  • Applications: cross-lingual keyword bidding (search engines) and cross-language information retrieval (CLIR)
Introduction
• CLQS vs. cross-lingual query expansion
  • CLQS suggests full queries that have actually been formulated by users in another language
• The users of search engines
  • share similar interests in the same period of time
  • issue queries on similar topics in different languages
• Key point: how to learn a similarity measure between two queries in different languages
• Monolingual query similarity (MLQS): term co-occurrence based measures such as MI and χ²
Estimating Cross-Lingual Query Similarity
• Discriminative Model for Estimating Cross-Lingual Query Similarity
• Monolingual Query Similarity Measure Based on Click-through Information
• Features Used for Learning Cross-Lingual Query Similarity Measure
  1. Bilingual Dictionary
  2. Parallel Corpora
  3. Online Mining for Related Queries
  4. Monolingual Query Suggestion
• Estimating Cross-Lingual Query Similarity
Discriminative Model for Estimating Cross-Lingual Query Similarity – 1/2
• qf: a source-language query
• qe: a target-language query
• simML: monolingual query similarity
• simCL: cross-lingual query similarity
• Tqf: the translation of qf into the target language
Discriminative Model for Estimating Cross-Lingual Query Similarity – 2/2
• Learning: LIBSVM regression algorithm
  • f: feature functions
  • φ: mapping from feature space onto kernel space
  • w: weight vector in the kernel space
• Why regression rather than classification?
  • Classification only separates relevant vs. irrelevant pairs
  • Regression yields graded scores, distinguishing strongly relevant, weakly relevant, and irrelevant suggestions
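The regression setup above can be sketched with scikit-learn's `SVR`, which wraps LIBSVM. The feature vectors and target similarities below are invented for illustration, and the paper's actual features, kernel, and hyperparameters may differ; this is a minimal sketch of the training scheme, not the authors' implementation.

```python
# Sketch of learning sim_CL by SVM regression (LIBSVM via scikit-learn's SVR).
# All numbers below are made-up illustrative values.
from sklearn.svm import SVR

# Each row: feature vector f(qf, qe) for one (source, target) query pair,
# e.g. [dictionary score, parallel-corpus score, web co-occurrence score, ...]
X_train = [
    [0.9, 0.8, 0.7, 0.6],
    [0.2, 0.1, 0.0, 0.3],
    [0.6, 0.5, 0.4, 0.5],
    [0.1, 0.0, 0.2, 0.1],
]
# Supervision target: the monolingual similarity sim_ML(Tqf, qe),
# so the learned sim_CL is a real-valued score, not a binary label.
y_train = [0.85, 0.10, 0.55, 0.05]

model = SVR(kernel="rbf")  # the kernel plays the role of the mapping φ
model.fit(X_train, y_train)

# Predict a cross-lingual similarity score for a new query pair.
score = model.predict([[0.7, 0.6, 0.5, 0.5]])[0]
print(round(float(score), 2))
```

Regression (rather than classification) is what lets the suggestions be ranked by the predicted score.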
Monolingual Query Similarity Measure Based on Click-through Information
• Uses click-through information in query logs [26]
• KN(x): number of keywords in a query x
• RD(x): number of clicked URLs for a query x
• Weights: α = 0.4, β = 0.6
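A minimal sketch of how the combined similarity of [26] might be computed, assuming it is a weighted sum of keyword overlap and clicked-URL overlap with the weights α = 0.4 and β = 0.6 above; the exact normalization in the cited work may differ.

```python
# Monolingual query similarity from keywords and click-through data.
# The weighted-sum form and the max-normalization are assumptions
# based on the definitions of KN(x), RD(x) and the weights above.
ALPHA, BETA = 0.4, 0.6

def sim_ml(keywords_p, keywords_q, urls_p, urls_q):
    """Combine keyword overlap and clicked-URL overlap of two queries."""
    kw_overlap = len(keywords_p & keywords_q) / max(len(keywords_p), len(keywords_q))
    url_overlap = len(urls_p & urls_q) / max(len(urls_p), len(urls_q))
    return ALPHA * kw_overlap + BETA * url_overlap

s = sim_ml({"harry", "potter", "movie"}, {"harry", "potter", "book"},
           {"url1", "url2"}, {"url1", "url3"})
print(round(s, 3))  # → 0.567
```

Two queries with identical keywords and identical clicked URLs score 1.0; disjoint queries score 0.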
1. Bilingual Dictionary – 1/2
• 120,000 unique entries (built in-house)
• Given an input query qf = {wf1, wf2, …, wfn} in the source language
• The bilingual dictionary D gives D(wfi) = {ti1, ti2, …, tim}
• C(x, y): the number of queries in the log containing both x and y
• C(x): the number of queries in the log containing x
• N: the total number of queries in the log
1. Bilingual Dictionary – 2/2
• The set of top-4 query translations is denoted S(Tqf)
• For each T ∈ S(Tqf):
  • Retrieve all target-language queries containing T and assign Sdict(T) as their feature value
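One way to use the query-log counts C(x, y), C(x) and N above is a mutual-information score between candidate translation terms, so that translations whose terms actually co-occur in target-language queries are preferred. Scoring a candidate translation as the sum of pairwise MI values is an assumption here, and the counts are hypothetical.

```python
# Sketch: rank candidate dictionary translations of a query by the
# co-occurrence (MI) of their terms in the target-language query log.
import math
from itertools import combinations

def mi(x, y, C_pair, C_single, N):
    """MI(x, y) = P(x, y) * log( P(x, y) / (P(x) * P(y)) ), counts from the log."""
    p_xy = C_pair.get((x, y), C_pair.get((y, x), 0)) / N
    if p_xy == 0:
        return 0.0
    p_x, p_y = C_single[x] / N, C_single[y] / N
    return p_xy * math.log(p_xy / (p_x * p_y))

def translation_score(terms, C_pair, C_single, N):
    # Assumed aggregation: sum of MI over all term pairs of the translation.
    return sum(mi(x, y, C_pair, C_single, N) for x, y in combinations(terms, 2))

# Toy query-log counts (hypothetical):
C_single = {"harry": 50, "potter": 40, "pottery": 30}
C_pair = {("harry", "potter"): 35, ("harry", "pottery"): 1}
N = 10000

good = translation_score(["harry", "potter"], C_pair, C_single, N)
bad = translation_score(["harry", "pottery"], C_pair, C_single, N)
print(good > bad)  # the cohesive translation scores higher
```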
2. Parallel Corpora
• Given a pair of queries:
  • qf: in the source language
  • qe: in the target language
• Bi-directional translation score
  • IBM Model 1, trained with the GIZA++ tool
  • P(yj | xi): the word-to-word translation probability
• The top 10 target queries {qe} scoring highest against qf are retrieved from the query log
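The bi-directional score can be sketched as follows, given word-to-word translation tables P(yj | xi) of the kind GIZA++ produces. The geometric-mean combination of the two directions, the smoothing constant, and the toy probability tables are assumptions, not the paper's exact formulation.

```python
# Sketch of an IBM Model 1 style bi-directional translation score
# between a source query qf and a target query qe.
import math

def ibm1_prob(target_words, source_words, t):
    """P(target | source) under IBM Model 1 (uniform alignment, constants
    dropped): product over target words of the average word translation prob."""
    prob = 1.0
    for y in target_words:
        prob *= sum(t.get((y, x), 1e-9) for x in source_words) / len(source_words)
    return prob

def bidirectional_score(qf, qe, t_fe, t_ef):
    # Assumed symmetric combination: geometric mean of both directions.
    return math.sqrt(ibm1_prob(qe, qf, t_fe) * ibm1_prob(qf, qe, t_ef))

# Toy lexicons (hypothetical): t[(target_word, source_word)] = P(target | source)
t_fe = {("cheese", "fromage"): 0.8, ("french", "fromage"): 0.05}
t_ef = {("fromage", "cheese"): 0.7}

score = bidirectional_score(["fromage"], ["cheese"], t_fe, t_ef)
print(score > 0)
```

Scoring both directions penalizes pairs where translation is plausible one way but not the other.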
3. Online Mining for Related Queries – 1/3
• OOV (out-of-vocabulary) terms are a major knowledge bottleneck for query translation and CLIR
• Assumption:
  • If a target-language query co-occurs with the source query in many web pages,
  • the two are probably semantically related
  • but with a certain amount of noise
3. Online Mining for Related Queries – 2/3
• Frequency in the snippets
• For example:
  • Given a query q = a b c in the source language
  • By dictionary: a = {a1, a2, a3}, b = {b1, b2}, c = {c1}
  • Web query (in the target language): q ∧ (a1 ∨ a2 ∨ a3) ∧ (b1 ∨ b2) ∧ (c1)
• Collect 700 snippets; keep the 10 most frequent target-language queries
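Constructing the web query above can be sketched as below: the source query is ANDed with the OR of the dictionary translations of each of its terms, so that retrieved snippets in the target language are likely to contain related target-language queries. The function name and dictionary entries are hypothetical.

```python
# Sketch: build the boolean web query "q AND (a1 OR a2 OR a3) AND (b1 OR b2) AND (c1)"
# from the source query terms and their dictionary translations.
def build_web_query(source_terms, dictionary):
    clauses = [" OR ".join(dictionary[w]) for w in source_terms]
    return " ".join(source_terms) + " " + " ".join(f"({c})" for c in clauses)

dictionary = {"a": ["a1", "a2", "a3"], "b": ["b1", "b2"], "c": ["c1"]}
print(build_web_query(["a", "b", "c"], dictionary))
# → a b c (a1 OR a2 OR a3) (b1 OR b2) (c1)
```

The returned snippets would then be scanned for frequent target-language queries, as the slide describes.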
3. Online Mining for Related Queries – 3/3
• Any query qe mined from the web is associated with a CODC (co-occurrence double-check) measure feature SCODC(qf, qe)
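The double-check idea is that qe must appear in snippets retrieved for qf and qf must appear in snippets retrieved for qe, otherwise the pair scores 0. The sketch below uses a simplified exponentiated form of the ratio product; the exact formula in the CODC paper differs slightly, and the counts and α value here are illustrative.

```python
# Sketch of the co-occurrence double-check (CODC) association measure.
# f_x_at_y: frequency of x in snippets retrieved by querying y (and vice versa).
def codc(f_x, f_y, f_x_at_y, f_y_at_x, alpha=0.15):
    if f_x_at_y == 0 or f_y_at_x == 0:
        return 0.0  # double check failed: the pair is not associated
    # Simplified smoothed form (an assumption): product of the two
    # directional ratios, raised to a damping exponent alpha.
    return ((f_x_at_y / f_x) * (f_y_at_x / f_y)) ** alpha

print(codc(f_x=100, f_y=80, f_x_at_y=20, f_y_at_x=10))  # both directions co-occur
print(codc(f_x=100, f_y=80, f_x_at_y=20, f_y_at_x=0))   # one direction fails → 0.0
```

Requiring both directions is what filters out much of the noise mentioned on the previous slide.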
4. Monolingual Query Suggestion
• Q0: the set of candidate target-language queries produced by the previous features
• For each target query qe suggested from Q0:
  • SQML(qe): its monolingual similarity to the closest candidate in Q0
Estimating Cross-Lingual Query Similarity
• The four categories of features above are used to learn the cross-lingual query similarity score
• Learning: LIBSVM regression algorithm
  • f: feature functions
  • φ: mapping from feature space onto kernel space
  • w: weight vector in the kernel space
Performance Evaluation – Log Data
• Data resources: MSN search engine query logs
  • French (source language) vs. English (target language)
• A one-month English query log
  • 7 million unique English queries with occurrence frequency greater than 5
• 5,000 French queries
  • 4,171 of them have their translations among the English queries
• Split: 70% for training with LIBSVM, 10% development data, 20% testing
Performance Evaluation – CLIR
[Pipeline figure: source-language query qf → CLQS → target-language suggestions {qe} → BM25 retrieval]
• Data resources:
  • TREC6 CLIR data (AP88-90 newswire, 750 MB)
  • 25 short French-English query pairs (CL1–CL25), 3.3 words long on average
  • The queries match those in the web query logs used for training CLQS
Conclusion
• Cross-lingual query suggestion using query logs
  • French to English
• Evaluated on the TREC6 French-to-English CLIR task
• CLQS demonstrates high suggestion quality