
Cheshire at GeoCLEF 2008: Text and Fusion Approaches for GIR


Presentation Transcript


1. Cheshire at GeoCLEF 2008: Text and Fusion Approaches for GIR
Ray R. Larson, School of Information, University of California, Berkeley

2. Motivation
• In previous GeoCLEF evaluations we found very mixed results from various methods of query expansion, attempts at explicit geographic constraints, etc.
• Last year we decided to try just our "basic" retrieval method, i.e., logistic regression with blind feedback (sketched below)
• The goal was to establish baseline data that we can use to test selective additions in later experiments
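The slides do not spell out the blind feedback step, so here is a minimal sketch of the general idea (pseudo-relevance feedback: run the query, treat the top-ranked documents as relevant, and expand the query with their most frequent terms before a second pass). The function names, the search_fn interface, and the cutoffs top_docs and new_terms are illustrative assumptions, not Cheshire's actual API.

    from collections import Counter

    def blind_feedback(query_terms, search_fn, top_docs=10, new_terms=10):
        # Illustrative pseudo-relevance ("blind") feedback, not Cheshire's code.
        # search_fn(terms) must return a list of (doc_id, score, tokens),
        # best-scoring documents first.
        first_pass = search_fn(query_terms)

        # Pool term frequencies from the presumed-relevant top documents.
        counts = Counter()
        for _doc_id, _score, tokens in first_pass[:top_docs]:
            counts.update(tokens)

        # Expand the query with the most frequent pooled terms
        # that are not already in it.
        expansion = [t for t, _ in counts.most_common()
                     if t not in query_terms][:new_terms]

        # Second pass: retrieval with the expanded query.
        return search_fn(list(query_terms) + expansion)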

3. Motivation
• Because the "baselines" worked well last year, we decided to continue with them and begin testing "fusion" approaches for combining the results of different retrieval algorithms
• This was due in part to Neuchatel's good results with fusion approaches, and to our own use of fusion approaches in earlier CLEF tasks

4. Experiments
• TD, TDN, and TDN Fusion for Monolingual English, German, and Portuguese (9 runs)
• TD, TDN, and TDN Fusion for Bilingual X to English, German, and Portuguese (18 runs)

5. Monolingual (slide content omitted from the transcript)

6. Monolingual (slide content omitted from the transcript)

7. Bilingual (slide content omitted from the transcript)

8. TDN Fusion
A: TD logistic regression with blind feedback result
B: TDN Okapi BM-25 result
A and B are normalized to [0, 1] using MinMax
Final result: NewWt = (B * piv) + (A * (1 - piv)), with piv = 0.29
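As a concrete illustration of this slide's combination rule, here is a small Python sketch: MinMax-normalize each run's scores to [0, 1], then combine them per document with piv = 0.29. The pivot value and the formula come from the slide; the dict-of-scores data structures and the handling of documents missing from one run are assumptions for illustration.

    def minmax(scores):
        # Normalize a {doc_id: score} mapping to the range [0, 1].
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # guard against all-equal scores
        return {doc: (s - lo) / span for doc, s in scores.items()}

    def fuse(a_scores, b_scores, piv=0.29):
        # NewWt = (B * piv) + (A * (1 - piv)), per the slide, where
        # A = the TD logistic regression with blind feedback run and
        # B = the TDN Okapi BM-25 run, both MinMax-normalized.
        a, b = minmax(a_scores), minmax(b_scores)
        # A document missing from one run contributes 0 from that run
        # (an assumption; the actual system may handle this differently).
        return {doc: b.get(doc, 0.0) * piv + a.get(doc, 0.0) * (1 - piv)
                for doc in set(a) | set(b)}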

9. Results
• Fusion of logistic regression with blind feedback and Okapi BM-25 produced most of our best performing runs
• The improvement was not always dramatic
• With a single algorithm, use of the Narrative is counter-productive; using only Title and Description provides better results with these algorithms
• Does blind feedback accomplish some of the geographic expansion made explicit in the narrative?

10. Comparison of Berkeley Results 2006, 2007–2008 (*using fusion; the comparison table is omitted from the transcript)

11. What happened in 2007 German?
We speculated last year about possible causes:
• No decompounding? 2006 used Aitao Chen's decompounding (no)
• Worse translation? Possibly; different MT systems were used, but they were the same for 2007 and 2008, so no
• Incomplete stoplist? Was it really the same? (yes)
• Was stemming the same? (yes)

12. Why did German work better for us in 2008?
• That was all speculation, but…
• It REALLY helps if you include the entire database
• Our 2007 German runs did not include any documents from the SDA collection!

13. What Next?
• Finally start adding back true geographic processing, and test where and why (and if) results are improved
• Get decompounding working with German
