
Mono & Cross Language Experiments on Persian Text

University of Tehran Database Research Group. Mono & Cross Language Experiments on Persian Text. Persian@CLEF 2008. Abolfazl AleAhmad, Hadi Amiri, Farhad Oroumchian. Database Research Group, School of Electrical and Computer Engineering, University of Tehran. 18 Sep 2008.


Presentation Transcript


  1. University of Tehran Database Research Group Mono & Cross Language Experiments on Persian Text Persian@CLEF 2008 Abolfazl AleAhmad, Hadi Amiri, Farhad Oroumchian Database Research Group School of Electrical and Computer Engineering University of Tehran 18 Sep 2008

  2. Outline • Persian Language • Persian Test Collections • Hamshahri in CLEF 2008 • UT Participants • Using Part of Speech Tagging in Persian Information Retrieval • Fusion of Retrieval Models at CLEF 2008 Ad-Hoc Persian Track • Local Cluster Analysis Using Part of Speech Tagging • Investigation on Application of Local Cluster Analysis and Part of Speech Tagging on Persian Text • Cross Language Experiments at Persian@CLEF 2008 • Next Year

  3. The Persian Language • A branch of the Indo-European languages • Official language of Iran, Afghanistan and Tajikistan • Its morphological analysis is comparatively difficult • The word “خبر” (“news”) has two plural forms: • Persian rules: “خبرها” • Arabic rules: “اخبار”

  4. Some Processing Issues • Writing Style Issues: • e.g. “می شود” and “میشود” are the same • e.g. “کتابها” and “کتاب ها” are the same • KASRE (a short vowel that is usually not written): • e.g. “چراغ علی خانه را سوزاند” has two different meanings: • CheraghAli burned the house • Ali’s lantern burned the house
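The writing-style issue on this slide can be illustrated with a small normalization step. The following is only a minimal sketch, not the authors' preprocessing: the function name `normalize_spacing` and the two rules it applies (joining the prefix “می” to the next word, and the plural suffix “ها” to the previous word, whether they are separated by a space or a zero-width non-joiner) are assumptions chosen to cover the two examples above.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner, often typed instead of a space

def normalize_spacing(text: str) -> str:
    """Map spacing variants of the same word to one joined form (illustrative rules only)."""
    # Join the verbal prefix "می" to the word that follows it.
    text = re.sub(rf"(?<!\S)می[ {ZWNJ}]+(?=\S)", "می", text)
    # Join the plural suffix "ها" to the word that precedes it.
    text = re.sub(rf"(?<=\S)[ {ZWNJ}]+ها(?!\S)", "ها", text)
    return text

# Both spellings of each example now compare equal.
print(normalize_spacing("می شود") == normalize_spacing("میشود"))
print(normalize_spacing("کتاب ها") == normalize_spacing("کتابها"))
```

A rule this blunt will also join cases where “می” or “ها” is a word in its own right, so a real system would need a lexicon or part-of-speech information to decide.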

  5. Some Processing Issues • Encoding 

  6. Persian in the Middle East User Population Growth on the Web (2000-2009) December 31, 2007 Source: Internet World Stats, http://internetworldstats.com/

  7. Persian Test Collections • IR Domain • Ghavanin (domain specific) • Hamshahri (news) WEB: http://ece.ut.ac.ir/dbrg/hamshahri • NLP Domain • Bijankhan (2 Million Word) WEB: http://ece.ut.ac.ir/dbrg/bijankhan

  8. Hamshahri in CLEF 2008 • News articles of the Hamshahri newspaper from 1996 to 2002 • Size of the documents varies from short news (under 1 KB) to rather long articles (e.g. 140 KB) • 22 assessors • Evaluation based on the DIRECT system

  9. Hamshahri in CLEF 2008

  10. Implementation of our methods We submitted top 100 for each run

  11. Using Part of Speech Tagging in Persian Information Retrieval Reza Karimpour, Amineh Ghorbani, Azadeh Pishdad, Mitra Mohtarami, Abolfazl AleAhmad, Hadi Amiri, Farhad Oroumchian

  12. Using Part of Speech Tagging in Persian Information Retrieval

  13. Using Part of Speech Tagging in Persian Information Retrieval

  14. Using Part of Speech Tagging in Persian Information Retrieval

  15. Fusion of Retrieval Models at CLEF 2008 Ad-Hoc Persian Track Zahra Aghazade, Nazanin Dehghani, Leili Farzinvash, Razieh Rahimi, Abolfazl AleAhmad, Hadi Amiri, Farhad Oroumchian Terrier Open Source Retrieval Engine: http://ir.dcs.gla.ac.uk/terrier/

  16. Fusion of Retrieval Models at CLEF 2008 Ad-Hoc Persian Track

  17. Fusion of Retrieval Models at CLEF 2008 Ad-Hoc Persian Track • And two other variations of this operator: IOWA and NOWA
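The operator itself is not preserved in this transcript. The names IOWA and NOWA suggest it is Ordered Weighted Averaging (OWA); under that assumption, a minimal sketch of OWA score fusion looks like the following. The function name `owa`, the example scores, and the weight vectors are hypothetical, and the IOWA and NOWA variants are not covered here.

```python
from typing import Sequence

def owa(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered Weighted Averaging: weights apply to rank positions, not to sources."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Sort the scores in descending order, then take the weighted sum by position.
    ordered = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ordered))

# Hypothetical normalized scores that three retrieval models gave one document.
doc_scores = [0.2, 0.9, 0.5]
print(owa(doc_scores, [1.0, 0.0, 0.0]))  # behaves like max
print(owa(doc_scores, [0.0, 0.0, 1.0]))  # behaves like min
print(owa(doc_scores, [0.5, 0.3, 0.2]))  # favors the highest-scoring models
```

Because the weights attach to sorted positions, a single weight vector can range between max, min, and plain averaging, which is what makes the operator a convenient way to fuse runs from several retrieval models.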

  18. Fusion of Retrieval Models at CLEF 2008 Ad-Hoc Persian Track

  19. Fusion of Retrieval Models at CLEF 2008 Ad-Hoc Persian Track Post hoc Results

  20. Investigation on Application of Local Cluster Analysis and Part of Speech Tagging on Persian Text Amir Hossein Jadidinejad, Mitra Mohtarami, Hadi Amiri

  21. Investigation on Application of Local Cluster Analysis and Part of Speech Tagging on Persian Text But the result was not good on the test set

  22. Cross Language Experiments at Persian@CLEF 2008 Abolfazl AleAhmad, Ehsan Kamalloo, Arash Zareh, Masoud Rahgozar, Farhad Oroumchian

  23. Cross Language Experiments at Persian@CLEF 2008 Query Translation • Probabilistic Structured Queries (PSQ) • Combinatorial Translation Probability (CTP)
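As a rough illustration of the PSQ idea named on this slide: in the usual formulation, the frequency of a source-language query term in a target-language document is estimated as the probability-weighted sum of the frequencies of its candidate translations. The sketch below assumes that formulation; the function name `psq_term_frequency`, the transliterated tokens, and the probabilities are made up for illustration and are not taken from the authors' system. CTP is not sketched because the slide gives no detail on it.

```python
from collections import Counter
from typing import Dict, Iterable

def psq_term_frequency(
    query_term: str,
    doc_tokens: Iterable[str],
    translation_probs: Dict[str, Dict[str, float]],
) -> float:
    """Estimate TF of a query term as sum over translations f of p(f | term) * TF(f, doc)."""
    counts = Counter(doc_tokens)
    candidates = translation_probs.get(query_term, {})
    return sum(prob * counts[target] for target, prob in candidates.items())

# Hypothetical translation table: English query term -> transliterated Persian candidates.
table = {"news": {"khabar": 0.7, "akhbar": 0.3}}
doc = ["khabar", "akhbar", "khabar", "ruz"]
print(psq_term_frequency("news", doc, table))
```

The estimated frequency can then be passed to any ordinary term-weighting function in place of a raw count, so no single translation has to be chosen up front.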

  24. Cross Language Experiments at Persian@CLEF 2008 Query Translation Results

  25. Cross Language Experiments at Persian@CLEF 2008 Document Translation • Using the Shiraz machine translation system from the Computing Research Laboratory (CRL) of New Mexico State University (NMSU) • Took 10 days to translate 130,000+ docs from Persian to English

  26. Cross Language Experiments at Persian@CLEF 2008 Document Translation & Hybrid Results

  27. Next Year • Ham2 for the Next Year • Extended Version of Hamshahri Collection • 2 times larger (~1.5 GB)

  28. Questions? Thanks for your attention. Database Research Group http://ece.ut.ac.ir/dbrg
