Disambiguating Queries for Geographic Information Retrieval
Carolyn Hafernik
Thesis Proposal, May 10, 2006
Computer Science
Advisor: Lisa Ballesteros
Information Retrieval (IR)
• What are the goals of an IR system?
• What is a relevant document?
• How does one determine which documents are relevant?
• How are IR systems evaluated?
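As one possible illustration of the evaluation question, the sketch below computes precision, recall, and average precision for a single ranked result list. These are standard IR measures; the slides do not say which measures the thesis will use, and the document IDs and relevance judgments here are made up for the example.

```python
def precision_recall(ranked, relevant):
    """Return (precision, recall) for a ranked list of document IDs."""
    hits = sum(1 for doc in ranked if doc in relevant)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranked, relevant):
    """Average the precision values measured at each relevant document's rank."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


# Hypothetical system output and relevance judgments (assumed data).
ranked = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}

print(precision_recall(ranked, relevant))
print(average_precision(ranked, relevant))
```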
Geographic Information Retrieval (GIR)
• GIR is an extension of IR
• It aims to use geospatial information to help improve retrieval effectiveness
• What makes GIR challenging?
  • Poor query specification
  • Ambiguity of language
  • No central repository for geospatial information
Geospatial Information
• How can geospatial information be used to increase retrieval effectiveness given a query?
• Example query: “Hiking near the Bay Area”
• Kinds of geospatial information:
  • Locations
  • Population statistics
  • Name variations
  • Nearby landmarks
[Map from www.lib.utexas.edu/maps/usmet.html]
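The kinds of information listed on this slide could be held in a simple gazetteer record. The sketch below is only an assumption about what such a record might look like: the class name, field names, the lookup function, and the sample entry (including its approximate coordinates) are hypothetical and not taken from the proposal.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GazetteerEntry:
    """Hypothetical record covering the four kinds of information on the slide."""
    name: str
    latitude: float                    # location
    longitude: float
    population: Optional[int] = None   # population statistics, if known
    variants: list = field(default_factory=list)   # name variations
    landmarks: list = field(default_factory=list)  # nearby landmarks


def find_places(query, entries):
    """Return entries whose name or any name variation occurs in the query text."""
    text = query.lower()
    matches = []
    for entry in entries:
        names = [entry.name, *entry.variants]
        if any(n.lower() in text for n in names):
            matches.append(entry)
    return matches


# Illustrative entry only; the values are rough and not from the source.
entries = [
    GazetteerEntry(
        name="San Francisco Bay Area",
        latitude=37.8,
        longitude=-122.3,
        variants=["Bay Area", "SF Bay Area"],
        landmarks=["Golden Gate Bridge", "Mount Tamalpais"],
    )
]

for place in find_places("Hiking near the Bay Area", entries):
    print(place.name, place.landmarks)
```

Plain substring matching like this is exactly where ambiguity shows up: a short variant such as "Bay Area" can match several places once the gazetteer grows, which is the disambiguation problem the talk is about.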
Sample GeoCLEF 2005 Topics

<top>
  <num> GC001 </num>
  <orignum> C084 </orignum>
  <EN-title> Shark Attacks off Australia and California </EN-title>
  <EN-desc> Documents will report any information relating to shark attacks on humans. </EN-desc>
  <EN-narr> Identify instances where a human was attacked by a shark, including where the attack took place and the circumstances surrounding the attack. Only documents concerning specific attacks are relevant; unconfirmed shark attacks or suspected bites are not relevant. </EN-narr>
  <!-- NOTE: This topic has added tags for GeoCLEF -->
  <EN-concept> Shark Attacks </EN-concept>
  <EN-spatialrelation> near </EN-spatialrelation>
  <EN-location> Australia </EN-location>
  <EN-location> California </EN-location>
</top>

<top>
  <num> GC004 </num>
  <orignum> C126 </orignum>
  <EN-title> Actions against the fur industry in Europe and the U.S.A. </EN-title>
  <EN-desc> Find information on protests or violent acts against the fur industry. </EN-desc>
  <EN-narr> Relevant documents describe measures taken by animal right activists against fur farming and/or fur commerce, e.g. shops selling items in fur. Articles reporting actions taken against people wearing furs are also of importance. </EN-narr>
  <!-- NOTE: This topic has added tags for GeoCLEF -->
  <EN-concept> Animal Rights Actions against the fur industry </EN-concept>
  <EN-spatialrelation> in </EN-spatialrelation>
  <EN-location> Europe </EN-location>
  <EN-location> United States </EN-location>
</top>
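To show how the added GeoCLEF tags separate the concept from the geographic part of a topic, here is a minimal parsing sketch using Python's standard `xml.etree.ElementTree`. The sample string is an abridged copy of topic GC001 above; the function name, the wrapping `<topics>` root element, and the output dictionary keys are assumptions made for this example, and a real topic file may need extra cleanup before it parses.

```python
import xml.etree.ElementTree as ET

# Abridged copy of topic GC001 from the slide (description and narrative omitted).
SAMPLE = """
<top>
  <num> GC001 </num>
  <EN-title> Shark Attacks off Australia and California </EN-title>
  <!-- NOTE: This topic has added tags for GeoCLEF -->
  <EN-concept> Shark Attacks </EN-concept>
  <EN-spatialrelation> near </EN-spatialrelation>
  <EN-location> Australia </EN-location>
  <EN-location> California </EN-location>
</top>
"""


def parse_topics(text):
    """Parse a sequence of <top> elements into plain dictionaries."""
    # The topics have no single root element, so wrap them in one (assumed name).
    root = ET.fromstring(f"<topics>{text}</topics>")
    topics = []
    for top in root.findall("top"):
        topics.append({
            "num": top.findtext("num", default="").strip(),
            "title": top.findtext("EN-title", default="").strip(),
            "concept": top.findtext("EN-concept", default="").strip(),
            "relation": top.findtext("EN-spatialrelation", default="").strip(),
            "locations": [loc.text.strip() for loc in top.findall("EN-location")],
        })
    return topics


for topic in parse_topics(SAMPLE):
    print(topic["num"], topic["concept"], topic["relation"], topic["locations"])
```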
Previous Work
• GeoCLEF 2005
• Common approaches
  • Places to store information
  • Named Entity Recognition
  • Query Expansion
  • Traditional IR approaches
Hypothesis
• My hypothesis is that using geospatial information to expand each query and to re-weight its geospatial components will improve retrieval effectiveness.
• Improvement will occur because the expanded query will provide the system with more specific information than the original query contains.
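A minimal sketch of what expansion plus re-weighting could look like is given below. It is not the proposal's method: the gazetteer contents, the function name, and both weight values are assumptions chosen only to make the idea concrete, and how the weights would actually be set is left open by the slides.

```python
# Hypothetical gazetteer: maps a location to related place names (assumed data).
GAZETTEER = {
    "California": ["Los Angeles", "San Diego", "San Francisco"],
}


def expand_query(concept_terms, location, gazetteer,
                 geo_weight=2.0, expansion_weight=1.0):
    """Build a weighted term map from concept terms and one location.

    Concept terms keep weight 1.0, the original location term is re-weighted
    to geo_weight, and related places from the gazetteer are added with
    expansion_weight. The default weights are arbitrary placeholders.
    """
    weighted = {term.lower(): 1.0 for term in concept_terms}
    weighted[location.lower()] = geo_weight
    for related in gazetteer.get(location, []):
        # Do not overwrite a term that is already in the query.
        weighted.setdefault(related.lower(), expansion_weight)
    return weighted


query = expand_query(["shark", "attacks"], "California", GAZETTEER)
for term, weight in query.items():
    print(f"{term}: {weight}")
```

The expanded query now names specific places inside the region, which is the "more specific information" the hypothesis relies on; whether that helps retrieval is what the planned experiments would have to show.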
Timeline
• Fall Semester
  • Build the Gazetteer
  • Modify Query Analyzer
  • Design Experiments
  • Do More Background Reading
  • Start writing thesis
• January Term
  • Run experiments
  • Continue writing thesis
• Spring Semester
  • Analyze results
  • Run more experiments (if necessary)
  • Finish thesis
Thank you! Questions? Comments?