CLEF 2005: Multilingual Retrieval by Combining Multiple Multilingual Ranked Lists
Luo Si and Jamie Callan
Language Technology Institute, School of Computer Science
Carnegie Mellon University
CLEF 2005
Task Definition
• Multi-8 Two Years On: multilingual information retrieval
• Multi-8 Merging Only: participants merge the provided bilingual ranked lists into a single multilingual list
Task 1: Multilingual Retrieval System
• Method Overview
Task 1: Multilingual Retrieval System
• Text Preprocessing (a minimal sketch follows this list)
  • Stop-word removal
  • Stemming
  • Decompounding
  • Word translation
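The preprocessing steps above can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the authors' pipeline: it uses NLTK stop-word lists and the Snowball stemmer, and leaves decompounding and word translation as caller-supplied hooks (both names, `decompound` and `translate`, are hypothetical).

```python
# Minimal preprocessing sketch (assumption: NLTK resources are available;
# decompounding and translation are placeholder hooks, not the authors' tools).
from nltk.corpus import stopwords
from nltk.stem.snowball import SnowballStemmer

def preprocess(tokens, lang="german", decompound=None, translate=None):
    """Lowercase, drop stop words, stem; optionally decompound and translate."""
    stops = set(stopwords.words(lang))
    stemmer = SnowballStemmer(lang)
    out = []
    for tok in tokens:
        tok = tok.lower()
        if tok in stops:
            continue
        parts = decompound(tok) if decompound else [tok]   # e.g. a German compound splitter
        for part in parts:
            stem = stemmer.stem(part)
            out.append(translate(stem) if translate else stem)  # e.g. bilingual dictionary lookup
    return out
```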
Task 1: Multilingual Retrieval System
• Method 1: multilingual retrieval via query translation, no query feedback, raw-score merging, Okapi retrieval system
• Method 2: multilingual retrieval via query translation, with query feedback, raw-score merging, Okapi retrieval system
• Method 3: multilingual retrieval via document translation, no query feedback, raw-score merging, Okapi retrieval system
• Method 4: multilingual retrieval via document translation, with query feedback, raw-score merging, Okapi retrieval system
• Method 5: UniNE system
Task 1: Multilingual Retrieval System
• Normalization: let drs_{k,mj} denote the raw score of the jth document retrieved from the mth ranked list for the kth query; the scores of each ranked list are normalized to a common range before combination (a sketch follows)
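The slide does not reproduce the normalization formula itself; a common choice, and a plausible reading of this step, is per-list min-max rescaling of the raw scores into [0, 1]. The sketch below assumes exactly that.

```python
# Per-list min-max score normalization -- a sketch of the normalization step
# described above (assumption: standard [0, 1] rescaling is used).
def normalize(raw_scores):
    """Map the raw scores of one ranked list into [0, 1]."""
    s_min, s_max = min(raw_scores), max(raw_scores)
    if s_max == s_min:                      # degenerate list: all scores equal
        return [1.0 for _ in raw_scores]
    return [(s - s_min) / (s_max - s_min) for s in raw_scores]
```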
Task 1: Multilingual Retrieval System
• Combine Multilingual Ranked Lists: each ranked list m is assigned a pair (w_m, r_m), where w_m is the weight of its vote and r_m is its exponential normalization factor; normalized scores are combined under these parameters (a sketch follows)
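The exact combination formula is not given on the slide. One plausible reading of the "weight of the vote" w_m and the "exponential normalization factor" r_m is a weighted sum of exponentially rescaled normalized scores; the sketch below assumes combined(d) = Σ_m w_m · exp(r_m · s_m(d)), summing only over the lists that actually retrieved d.

```python
# Weighted combination of normalized ranked lists -- a sketch only.  The exact
# functional form of the exponential normalization is an assumption here:
# documents absent from a list contribute nothing from that list.
import math
from collections import defaultdict

def combine(lists, weights, r_factors):
    """lists: per-list dicts {doc_id: normalized_score}; returns a merged ranking."""
    combined = defaultdict(float)
    for scores, w_m, r_m in zip(lists, weights, r_factors):
        for doc_id, s in scores.items():
            combined[doc_id] += w_m * math.exp(r_m * s)
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
```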
Task 1: Experimental Results: Multilingual Retrieval
• Qry/Doc: whether queries or documents were translated
• fb/nofb: with/without pseudo-relevance feedback
• UniNE: the UniNE system
Task 1: Experimental Results: Multilingual Retrieval
• MX: combined models
• W1/Trn: equal weights vs. learned (trained) weights
Task 2: Results Merging for Multilingual Retrieval
• Merge the ranked lists of eight different languages (i.e., bilingual or monolingual runs) into a single final list
• Logistic models of (rank, document score), in two variants: query-independent & language-specific, and query-specific & language-specific
Task 2: Results Merging for Multilingual Retrieval
• Learn a Query-Independent, Language-Specific Merging Model
  • The model gives the estimated probability of relevance of document d_{k,ij} (the jth document from the ith language list for the kth query)
  • Model parameters are estimated either by maximizing the log-likelihood (MLE) or by maximizing mean average precision (MAP)
(A sketch of this model follows.)
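The sketch below assumes the probability of relevance is a logistic function of a document's rank and normalized score in its source list, with one set of parameters per language. It is fitted here by ordinary MLE via scikit-learn; fitting to maximize mean average precision directly, as the slides also mention, would instead require a search over the parameters.

```python
# Query-independent, language-specific logistic merging model -- a sketch.
# Assumption: P(rel | rank, score) = sigmoid(a + b*rank + c*score),
# with one (a, b, c) per language, learned from past relevance judgments.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_language_model(ranks, scores, relevance_labels):
    """Fit P(relevant) from (rank, score) pairs of past queries for one language."""
    X = np.column_stack([ranks, scores])
    model = LogisticRegression()
    model.fit(X, relevance_labels)
    return model

def comparable_scores(model, ranks, scores):
    """Probability-of-relevance scores that are comparable across languages."""
    X = np.column_stack([ranks, scores])
    return model.predict_proba(X)[:, 1]
```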
Task 2: Results Merging for Multilingual Retrieval
• Learn a Query-Specific, Language-Specific Merging Model
  • Calculate comparable scores for the top-ranked documents in each language:
    (1) combine the scores of the query-translation-based and document-translation-based retrieval methods;
    (2) build language-specific, query-specific logistic models that transform language-specific scores into comparable scores
Task 2: Results Merging for Multilingual Retrieval
• (2) continued: the logistic model parameters are estimated by minimizing the mean squared error between the exact normalized comparable scores and the estimated comparable scores (a sketch follows this list)
• Estimate comparable scores for all retrieved documents in each language
• Use the comparable scores to create the merged multilingual result list
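A sketch of the query-specific mapping step, assuming a two-parameter logistic curve from a language-specific score to a comparable score fitted by least squares (scipy's curve_fit), is shown below. The function and variable names are illustrative, not from the original system.

```python
# Query-specific, language-specific score mapping -- a sketch.  Assumption: the
# top-ranked documents already have "exact" comparable scores (from combining
# the query- and document-translation runs); a logistic curve is fitted to
# those pairs and then applied to every retrieved document of that language.
import numpy as np
from scipy.optimize import curve_fit

def logistic(s, a, b):
    return 1.0 / (1.0 + np.exp(-(a + b * s)))

def fit_score_mapping(lang_scores_top, exact_comparable_top):
    """Fit (a, b) minimizing squared error on the downloaded top documents."""
    (a, b), _ = curve_fit(logistic, np.asarray(lang_scores_top),
                          np.asarray(exact_comparable_top), p0=[0.0, 1.0])
    return a, b

def map_all_scores(lang_scores_all, a, b):
    """Estimate comparable scores for all retrieved documents of one language."""
    return logistic(np.asarray(lang_scores_all), a, b)
```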
Task 2: Experimental Results: Results Merging
• Query-independent, language-specific models
  • Mean average precision of the merged multilingual lists for different methods on the UniNE result lists
  • Mean average precision of the merged multilingual lists for different methods on the HummingBird result lists
  • Training the model to maximize MAP is more accurate than MLE training
Task 2: Experimental Results: Results Merging
• Query-specific, language-specific models
  • Mean average precision of the merged multilingual lists for different methods on the UniNE result lists
  • C_X: top X documents from each list, merged by exact comparable scores
  • Top_X_0.5: top X documents from each list downloaded; a logistic model estimates comparable scores, which are combined with the exact scores with equal weight
  • The combination of estimated and exact comparable scores can therefore be more accurate than exact comparable scores alone in some cases
Task 2: Experimental Results: Results Merging
• Query-specific, language-specific models
  • Mean average precision of the merged multilingual lists for different methods on the HummingBird result lists
  • The query-specific, language-specific algorithm outperforms the query-independent, language-specific algorithm