
Microsoft Research India’s Participation in FIRE2008

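Mine transliterations of out-of-vocabulary (OOV) query terms to improve cross-language retrieval effectiveness. Empirically validate the hypothesis that these transliterations can be found in documents relevant to the query. Evaluate date-based document restriction, which limits candidate documents to the time period of any date mentioned in the query; at FIRE 2008 this restriction hurt results and needs deeper investigation.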



Presentation Transcript

  1. Microsoft Research India’s Participation in FIRE2008 Raghavendra Udupa raghavu@microsoft.com

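  2. CLIR System. Example: CLEF’07 Query #10.2452/447-AH. Hindi query: “ऐसे दस्‍तावेज खोजिए जिनमें पिम फोरत्‍यून के राजनैतिक विचारों पर चर्चा की गई हो।” (Find documents that discuss the political views of Pim Fortuyn.) “पिम फोरत्‍यून की राजनीति” (The politics of Pim Fortuyn). Translated query: Pim Fortuyn politics. Components: Dictionary, Query Translator, Inverted Index, Document Ranker. Collection: LA Times 2002 articles.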

  3. Domain Adaptation; Mining transliterations of OOV words; Mining Translation Lexicon from Comparable Corpora; Dictionary; Query Translator; Mining NETE Transliterations from Comparable Corpora; Inverted Index; Document Ranker; Cross-Language Ranking Model; Document Collection

  4. Mining transliterations of OOV terms (ECIR 2009); Domain Adaptation; Mining Translation Lexicon from Comparable Corpora (MT Summit 2007); Dictionary; Query Translator; Mining NETE Transliterations from Comparable Corpora (CIKM’08); Inverted Index; Document Ranker; Cross-Language Ranking Models; Document Collection

  5. Baseline Retrieval System. Language model-based retrieval. Probabilistic translation lexicon: ~100K parallel sentences, IBM Model 3 alignment, GIZA++. Reference: J. Jagarlamudi and A. Kumaran, Cross-Lingual Information Retrieval System for Indian Languages. Working Notes for the CLEF 2007 Workshop.
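
One way to read "language model-based retrieval" with a "probabilistic translation lexicon" is a query-likelihood score in which each source-language query term is translated probabilistically into the document language. The sketch below is only an illustration under that assumption: the function name `clir_score`, the Jelinek-Mercer smoothing weight `lam`, and the toy lexicon and documents are all hypothetical and are not taken from the slides or from the cited system.

```python
import math
from collections import Counter


def clir_score(query_terms, doc_tokens, coll_counts, coll_len, lexicon, lam=0.5):
    """Log query likelihood of a source-language query under a target-language document.

    lexicon maps a source term to {target word: p(target | source)} (assumed format).
    lam is an assumed Jelinek-Mercer weight on the collection model.
    """
    doc_counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for s in query_terms:
        p = 0.0
        for t, p_ts in lexicon.get(s, {}).items():
            p_doc = doc_counts[t] / doc_len if doc_len else 0.0
            p_coll = coll_counts.get(t, 0) / coll_len
            p += p_ts * ((1 - lam) * p_doc + lam * p_coll)
        if p > 0:
            score += math.log(p)
        # A term with no usable translation (an OOV term) adds nothing to the
        # score, so it cannot help rank documents.
    return score


# Toy data, invented for illustration only.
lexicon = {"राजनीति": {"politics": 0.8, "policy": 0.2}}
d1 = ["dutch", "politics", "election"]
d2 = ["football", "match", "report"]
coll_counts = Counter(d1 + d2)
coll_len = len(d1) + len(d2)

query = ["राजनीति"]
score_d1 = clir_score(query, d1, coll_counts, coll_len, lexicon)
score_d2 = clir_score(query, d2, coll_counts, coll_len, lexicon)
oov_score = clir_score(["अज्ञात"], d1, coll_counts, coll_len, lexicon)
```

In this sketch the document that contains a likely translation of the query term scores higher, while a query term missing from the lexicon is silently ignored, which is the gap that mining OOV transliterations aims to close.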

  6. FIRE Fighting: (1) Mining Transliterations of Out-Of-Vocabulary Query Terms. (2) Date-Based Document Restriction.

  7. Mining Transliterations of Out-Of-Vocabulary Query Terms Raghavendra Udupa

  8. OOV Query Terms. Many OOV query terms are NEs. NEs are often the focus of a query. NEs form an open class of terms in all languages. Getting their transliterations right is extremely important. Many OOV query terms are not NEs but transliterations of English words, e.g. सेमिनार (seminar), कार्पोंरेशन (corporation), चैम्पियन (champion), फिल्म (film).

  9. A Hypothesis: The transliterations of most of the transliteratable OOV terms of a query can be found in documents relevant to the query.

  10. Empirical Validation

  11. A Practical Hypothesis: The transliterations of many of the transliteratable OOV terms of a query can be found in the top results of the CLIR system for the query.

  12. Mining OOV Transliteration Equivalents. Basic idea: pair the query with each of the top N results; treat each pair as a comparable document pair; mine transliteration equivalents from the comparable document pairs. Reference: “They are out there, if you know where to look”: Mining Transliterations of OOV Query Terms for Cross-Language Information Retrieval, ECIR 2009, Toulouse.
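
The basic idea above can be sketched roughly as follows. This is not the mining model from the ECIR 2009 paper: as a stand-in, it compares a romanized form of each OOV term with every token in the top results using a plain string-similarity ratio from Python's `difflib`. The function name `mine_transliterations`, the `romanize` callback, the threshold, and the toy romanization table are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def mine_transliterations(oov_terms, top_docs, romanize, threshold=0.8):
    """Return {oov_term: best-matching token} mined from the top-N result texts.

    romanize is a caller-supplied function (assumed) that maps a source-script
    term to a rough Latin-script form.
    """
    # Candidate tokens come only from the top results paired with the query.
    candidates = {
        tok.lower() for doc in top_docs for tok in doc.split() if tok.isalpha()
    }
    mined = {}
    for term in oov_terms:
        latin = romanize(term)
        best, best_sim = None, threshold
        for tok in sorted(candidates):
            sim = SequenceMatcher(None, latin, tok).ratio()
            # Keep a token only if its similarity is strictly above the threshold
            # and better than anything seen so far.
            if sim > best_sim:
                best, best_sim = tok, sim
        if best is not None:
            mined[term] = best
    return mined


# Toy example, invented for illustration only.
PIM = "पिम"
FORTUYN = "फोरत्यून"
toy_romanization = {PIM: "pim", FORTUYN: "fortyun"}  # hypothetical romanizations
top_docs = [
    "Pim Fortuyn was a Dutch politician",
    "Dutch voters discussed immigration politics",
]
mined = mine_transliterations([PIM, FORTUYN], top_docs, toy_romanization.get)
```

The mined pairs could then be added to the query translation and the query re-run; a real system would replace the string-similarity stand-in with a trained transliteration similarity model.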

  13. Long Queries: MAP

  14. Short Queries: MAP

  15. FIRE 2008: MAP

  16. FIRE2008: MAP Difference (Long, official)

  17. FIRE 2008: Num_Rel_Ret

  18. FIRE 2008: P@10

  19. Mining Transliterations @ FIRE2008 Worked.

  20. Date-Based Document Restriction Raghavendra Udupa

  21. Dates. Some queries contain dates. CLEF 2007, Topic 407: Who was the Australian Prime Minister in 2002? CLEF 2007, Topic 411: …terrorist car bomb in Bali, Indonesia, in 2002. CLEF 2006, Topic 326: …winners in any category of the 1995 Emmy Awards. CLEF 2006, Topic 327: …earthquakes in Mexico City in 1995.

  22. Hypothesis: If a query contains a date, then the relevant documents for the query are likely to be from the same time period.

  23. Empirical Validation. CLEF’07: LATimes 2002. CLEF’06: GH 95, LATimes 1994.

  24. CLEF’06: C327 Title: Earthquakes in Mexico City Description: Find documents that provide details on the impact of or the damage caused by earthquakes in Mexico City in 1995. Narrative: Relevant document should contain some information on earthquakes in Mexico City in 1995, such as their magnitude, damages caused, panic of the inhabitants, etc. Documents on earthquakes in other places in Mexico are not relevant unless the seismic impact was also felt in Mexico City.

  25. Relevant Document <DOCNO> LA121194-0313 </DOCNO> <DOCID> 107228 </DOCID> December 11, 1994, Sunday, Home Edition A magnitude 6.3 earthquake rocked Mexico City, causing people to flee their homes in fear. There were no immediate reports of injuries or severe damage. The U.S. Geological Survey's National Earthquake Information Center in Golden, Colo., said the quake's epicenter was in Petatlan in the southwestern state of Guerrero.

  26. Date-Based Document Restriction. Identify dates (if any) in the query. Restrict candidate documents to the set of documents coming from the same time period.
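
A minimal sketch of these two steps, assuming that a "date" is a four-digit year and that "same time period" means the same calendar year, optionally widened by a `window` of years. The slides do not specify either choice; the function names, the `window` parameter, and the document records are hypothetical.

```python
import re
from datetime import date

# Assumption: dates in queries appear as four-digit years such as 1995 or 2002.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def query_years(query):
    """Return the set of years mentioned in the query text."""
    return {int(m.group(0)) for m in YEAR_RE.finditer(query)}


def restrict_by_date(query, docs, window=0):
    """Keep documents dated within `window` years of any year in the query.

    Queries without a year leave the candidate set unchanged.
    """
    years = query_years(query)
    if not years:
        return list(docs)
    return [
        d for d in docs
        if any(abs(d["date"].year - y) <= window for y in years)
    ]


# Toy records with hypothetical identifiers.
docs = [
    {"id": "doc-a", "date": date(1994, 12, 11)},
    {"id": "doc-b", "date": date(1995, 9, 14)},
]
query = "earthquakes in Mexico City in 1995"
same_year = restrict_by_date(query, docs)
one_year_window = restrict_by_date(query, docs, window=1)
no_date = restrict_by_date("earthquakes in Mexico City", docs)
```

With an exact-year match, a document dated December 11, 1994, like the one on slide 25, would be dropped for a 1995 query, so the choice of time window matters for recall.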

  27. FIRE 2008: Relevant Docs

  28. FIRE 2008: HindiEnglish MAP

  29. Date-Based Document Restriction @ FIRE2008: Hurt us. Deeper investigation needed.
