INAOE at GeoCLEF 2008: A Ranking Approach based on Sample Documents
Esaú Villatoro-Tello, Manuel Montes-y-Gómez, Luis Villaseñor-Pineda
Language Technologies Laboratory
National Institute of Astrophysics, Optics and Electronics
Tonantzintla, México
mmontesg@inaoep.mx · http://ccc.inaoep.mx/~mmontesg
General ideas
• Our system focuses on the ranking process.
• It is based on the following hypotheses:
• Current IR machines are able to retrieve relevant documents for geographic queries.
• Complete documents provide more and better elements for the ranking than isolated query terms.
• We aimed to show that, using some query-related sample texts, it is possible to improve the final ranking of the retrieved documents.
General architecture of our system
[Architecture diagram: First stage (retrieval stage): given the query, the IR machine retrieves a small set of documents from the document collection; a feedback process selects sample texts from them, and query expansion produces an expanded query that retrieves a large document set. Second stage (ranking stage): the re-ranking process orders the large set using the sample texts, yielding the re-ranked documents.]
Re-ranking process
[Diagram: the sample texts (s₁ … s|S|) and the retrieved documents (r₁ … r|R|) pass through a geo-expansion process backed by the Geonames DB; a similarity calculation between each sample text and the retrieved documents produces different ranking proposals, which an information-fusion step merges into the re-ranked list of documents.]
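As a toy sketch of the geo-expansion step, assuming a small in-memory ancestor table standing in for the real Geonames DB (the table contents here are hypothetical): each recognized geo-term is expanded with its two nearest ancestors, as detailed on the configuration slide below.

```python
# Hypothetical ancestor table standing in for the Geonames DB.
ANCESTORS = {
    "Paris": ["France", "Europe"],          # city -> [country, continent]
    "Tonantzintla": ["México", "North America"],
}

def geo_expand(geo_terms, max_ancestors=2):
    """Expand each geo-term with up to its two nearest ancestors."""
    expanded = []
    for term in geo_terms:
        expanded.append(term)
        expanded.extend(ANCESTORS.get(term, [])[:max_ancestors])
    return expanded

print(geo_expand(["Paris"]))  # ['Paris', 'France', 'Europe']
```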
System configuration: Traditional modules
• IR Machine: based on LEMUR; retrieves 1000 documents (original/expanded queries).
• Feedback module: based on blind relevance feedback; selects the top 5 retrieved documents as sample texts.
• Query Expansion: adds to the original query the five most frequent terms from the sample texts (sketched below).
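A minimal sketch of the feedback and query-expansion modules, assuming simple whitespace tokenization and a hypothetical stoplist; LEMUR itself is not shown.

```python
from collections import Counter

# Hypothetical stoplist; the real system's preprocessing is not described.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "as"}

def expand_query(query, retrieved_docs, n_samples=5, n_terms=5):
    """Blind relevance feedback: treat the top-ranked retrieved documents
    as sample texts and add their most frequent terms to the query."""
    samples = retrieved_docs[:n_samples]
    query_toks = set(query.lower().split())
    counts = Counter(
        tok
        for doc in samples
        for tok in doc.lower().split()
        if tok not in STOPWORDS and tok not in query_toks
    )
    new_terms = [term for term, _ in counts.most_common(n_terms)]
    return query + " " + " ".join(new_terms)

docs = [
    "floods hit bavaria rivers overflowed",
    "germany flood damage bavaria reached record levels",
    "heavy rain caused floods along the danube",
]
print(expand_query("floods germany", docs))
```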
System configuration: Re-ranking module
• Geo-Expansion: geo-terms are identified using the LingPipe NER; geo-terms of the sample texts are expanded by adding their two nearest ancestors (Paris → France, Europe).
• Similarity Calculation: considers thematic and geographic similarities; it is based on the cosine formula.
• Information Fusion: merges all the different ranking proposals into one single list using the Round-Robin technique (both steps sketched below).
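A minimal sketch of the similarity and fusion steps. The raw term-frequency weighting, the split of each vector into thematic vs. geographic terms, and the equal weighting of the two cosines are assumptions, not details taken from the slides.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    num = sum(a[t] * b[t] for t in a if t in b)
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def similarity(doc, sample, geo_terms):
    """Combine thematic and geographic cosine similarities (equal
    weights are an assumption)."""
    d, s = Counter(doc.split()), Counter(sample.split())
    d_geo = Counter({t: c for t, c in d.items() if t in geo_terms})
    s_geo = Counter({t: c for t, c in s.items() if t in geo_terms})
    d_thm = Counter({t: c for t, c in d.items() if t not in geo_terms})
    s_thm = Counter({t: c for t, c in s.items() if t not in geo_terms})
    return 0.5 * cosine(d_thm, s_thm) + 0.5 * cosine(d_geo, s_geo)

def round_robin(rankings):
    """Merge several ranking proposals by taking one document from each
    list in turn, skipping documents that were already emitted."""
    merged, seen = [], set()
    for pos in range(max(len(r) for r in rankings)):
        for ranking in rankings:
            if pos < len(ranking) and ranking[pos] not in seen:
                merged.append(ranking[pos])
                seen.add(ranking[pos])
    return merged

print(similarity("floods in bavaria", "bavaria flood report",
                 {"germany", "bavaria"}))          # 0.5
print(round_robin([["d1", "d2", "d3"], ["d2", "d4", "d1"]]))
# ['d1', 'd2', 'd4', 'd3']
```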
Evaluation points
[Diagram: the system architecture annotated with three evaluation points: 1st EP after the initial retrieval (small document set), 2nd EP after retrieval with the expanded query (large document set), and 3rd EP after the re-ranking process.]
Experimental results: Submitted runs
[Results table: relative improvements of +4.87%, +3.33%, +0%, and +3.24% across the submitted runs.]
Experimental results: Additional runs
[Results table: relative improvements of +26.4%, +15.8%, +28.3%, and +3.24%.]
• Sample texts were manually selected (from Inaoe-BASELINE1).
• Two documents were selected on average for each topic.
Final remarks
• Results showed that query-related sample texts improve the original ranking of the retrieved documents.
• Our experiments also showed that the proposed method is very sensitive to the presence of incorrect sample texts.
• Since our geo-expansion process is still very simple, we believe it is hurting the performance of the method.
Ongoing work
• A new method for selecting sample texts.
• A new geographic-expansion strategy with more precise disambiguation.
Thank you! Manuel Montes y Gómez Language Technologies Laboratory National Institute of Astrophysics, Optics and Electronics Tonantzintla, México mmontesg@inaoep.mx http://ccc.inaoep.mx/~mmontesg