Cheshire at GeoCLEF 2008: Text and Fusion Approaches for GIR
Ray R Larson, School of Information, University of California, Berkeley
Motivation
• In previous GeoCLEF evaluations we found very mixed results using various methods of query expansion, attempts at explicit geographic constraints, etc.
• Last year we decided to try just our “basic” retrieval method, i.e., logistic regression with blind feedback
• The goal was to establish baseline data that we can use to test selective additions in later experiments
GeoCLEF 2008 -- Aarhus
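The “basic” method above pairs a probabilistic ranking model with blind (pseudo-relevance) feedback. As a rough illustration of the feedback step only, here is a minimal sketch: expand the query with terms that are frequent in the top-ranked documents of an initial run. The function name, parameters, and term-selection rule are illustrative assumptions, not the Cheshire system’s actual implementation.

```python
# Sketch of blind (pseudo-relevance) feedback query expansion.
# All names and parameter defaults here are illustrative assumptions.
from collections import Counter

def blind_feedback(query_terms, ranked_docs, top_k=10, n_terms=5):
    """Expand the query with common terms from the top-ranked documents.

    ranked_docs: list of documents (each a list of terms), best first.
    """
    counts = Counter()
    for doc in ranked_docs[:top_k]:
        counts.update(set(doc))      # count document frequency in the top k
    for term in query_terms:         # don't re-add terms already in the query
        counts.pop(term, None)
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return list(query_terms) + expansion
```

The expanded query would then be rerun against the collection to produce the final ranking.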
Motivation
• Because the “baselines” worked well last year, we decided to continue with them and begin testing “fusion” approaches for combining the results of different retrieval algorithms
• This was due in part to Neuchatel’s good results with fusion approaches, and to our own use of fusion in earlier CLEF tasks
Experiments
• TD, TDN, and TDN Fusion for Monolingual English, German, and Portuguese (9 runs)
• TD, TDN, and TDN Fusion for Bilingual X to English, German, and Portuguese (18 runs)
Monolingual
(results tables)
Bilingual
(results table)
TDN Fusion
A: TD logistic regression with blind feedback result
B: TDN Okapi BM-25 result
A and B normalized to [0, 1] using MinMax
NewWt = (B * piv) + (A * (1 - piv)), with piv = 0.29
→ Final Result
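The fusion step on this slide can be sketched directly from the formula: min-max normalize each run’s scores to [0, 1], then take the pivot-weighted sum per document. The function names and the handling of documents that appear in only one run are illustrative assumptions; the formula and piv = 0.29 come from the slide.

```python
# Sketch of the TDN fusion scheme: NewWt = B*piv + A*(1-piv) after
# MinMax normalization. Function names are illustrative.
def minmax(scores):
    """Normalize a {doc_id: score} dict to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0          # avoid division by zero on flat scores
    return {d: (s - lo) / span for d, s in scores.items()}

def fuse(a_scores, b_scores, piv=0.29):
    """Combine run A (LR + blind feedback) and run B (Okapi BM-25).

    Documents missing from one run are assumed to score 0.0 in that run.
    """
    a, b = minmax(a_scores), minmax(b_scores)
    docs = set(a) | set(b)
    return {d: b.get(d, 0.0) * piv + a.get(d, 0.0) * (1 - piv) for d in docs}
```

Sorting the returned dict by value descending yields the final fused ranking.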
Results
• Fusion of logistic regression with blind feedback and Okapi BM-25 produced most of our best performing runs
• The improvement was not always dramatic
• With a single algorithm, use of the Narrative is counter-productive; Title and Description alone provide better results with these algorithms
• Does blind feedback accomplish some of the geographic expansion made explicit in the narrative?
Comparison of Berkeley Results, 2006–2008
(results table; * = using fusion)
What happened in 2007 German?
We speculated last year that the cause was:
• No decompounding? 2006 used Aitao Chen’s decompounding (no)
• Worse translation? Possibly, since different MT systems were used; but the system was the same for 2007 and 2008, so no
• An incomplete stoplist? Was it really the same? (yes)
• Was stemming the same? (yes)
Why did German work better for us in 2008?
• That was all speculation, but…
• It REALLY helps if you include the entire database
• Our 2007 German runs did not include any documents from the SDA collection!
What Next?
• Finally start adding back true geographic processing, and test where, why, and whether results are improved
• Get decompounding working with German
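For the decompounding goal above, a common baseline is dictionary-based splitting: find a segmentation of a German compound into known vocabulary words. The sketch below uses dynamic programming with a stand-in vocabulary; the word list, length limits, and tie-breaking are illustrative assumptions, not Aitao Chen’s method.

```python
# Sketch of dictionary-based German decompounding via dynamic programming.
# VOCAB is a tiny stand-in lexicon; a real system would use a full word list.
VOCAB = {"welt", "meister", "schaft", "meisterschaft", "fussball"}

def decompound(word, vocab=VOCAB, min_len=3, max_len=20):
    """Return a list of vocabulary parts covering `word`, else [word]."""
    n = len(word)
    best = [None] * (n + 1)          # best[i] = split of word[:i], or None
    best[0] = []
    for i in range(1, n + 1):
        # Prefer longer parts: try the smallest j (longest suffix) first.
        for j in range(max(0, i - max_len), i - min_len + 1):
            part = word[j:i]
            if best[j] is not None and part in vocab:
                best[i] = best[j] + [part]
                break
    return best[n] if best[n] is not None else [word]
```

Each part would then be indexed (or searched) alongside the original compound, which is the usual way decompounding feeds into retrieval.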