
Arabic NLP: Challenges & Opportunities

Dr. Samir Tartir, Scientific Day, Faculty of Information, Philadelphia University, May 15th 2013.


Presentation Transcript


  1. Arabic NLP: Challenges & Opportunities. Dr. Samir Tartir, Scientific Day, Faculty of Information, Philadelphia University, May 15th 2013

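  2. ثمن (without diacritics, readable as thaman 'price', thumn 'one eighth', or thammana 'he valued')

  3. علم (without diacritics, readable as ʿilm 'knowledge', ʿalam 'flag', ʿalima 'he knew', or ʿallama 'he taught')

  4. قِ ('protect!', the imperative of waqā: a complete word written with a single letter)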

  5. General Information
     • History: (Classical) Arabic has remained unchanged, intelligible and functional for more than fifteen centuries.
     • Strategically important
     • 330 million speakers living in an important region
     • Huge oil reserves and sacred sites
     • Used by 1.4 billion Muslims in their prayers
     • Cultural and literary heritage
     • Closely associated with Islam

  6. Distribution

  7. Versions • Classical Arabic • Modern Standard Arabic (MSA) • Regional dialects

  8. Arabic Language Characteristics • Highly structured • Highly derivational • Rich morphology • Relatively free word order • Modern written Arabic usually omits diacritics (short vowels)
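The root-and-pattern idea behind "highly derivational" and the effect of missing diacritics can be seen together in a small sketch. It assumes a hypothetical table of five simplified templates (`PATTERNS`) in which the digits 1, 2 and 3 stand for the three root consonants; real Arabic morphology has many more patterns, plus affixes and clitics. Deriving forms from the root ك ت ب (k-t-b, "writing") and then stripping diacritics shows distinct words collapsing onto one written string.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Unicode Arabic diacritics and letters used in the templates below.
FATHA, DAMMA, KASRA, SUKUN = "\u064E", "\u064F", "\u0650", "\u0652"
ALEF, WAW, MEEM = "\u0627", "\u0648", "\u0645"

# Tanween (U+064B-U+064D), fatha, damma, kasra, shadda, sukun (U+064E-U+0652).
DIACRITICS = re.compile("[\u064B-\u0652]")

# Hypothetical, simplified templates: "1", "2", "3" are the root-consonant
# slots; every other character is copied into the derived word as-is.
PATTERNS = {
    "1" + FATHA + "2" + FATHA + "3" + FATHA: "past active verb (e.g. kataba, he wrote)",
    "1" + DAMMA + "2" + KASRA + "3" + FATHA: "past passive verb (e.g. kutiba, it was written)",
    "1" + DAMMA + "2" + DAMMA + "3": "plural noun (e.g. kutub, books)",
    "1" + FATHA + ALEF + "2" + KASRA + "3": "active participle (e.g. katib, writer)",
    MEEM + FATHA + "1" + SUKUN + "2" + DAMMA + WAW + "3": "passive participle (e.g. maktub, written)",
}


def apply_pattern(root: str, pattern: str) -> str:
    """Insert the three consonants of a triliteral root into a template."""
    c1, c2, c3 = root
    return pattern.replace("1", c1).replace("2", c2).replace("3", c3)


def strip_diacritics(word: str) -> str:
    """Drop short vowels and related marks, as most modern text does."""
    return DIACRITICS.sub("", word)


def derive(root: str) -> dict[str, list[tuple[str, str]]]:
    """Group (diacritized form, gloss) pairs by their undiacritized spelling."""
    by_surface: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for pattern, gloss in PATTERNS.items():
        form = apply_pattern(root, pattern)
        by_surface[strip_diacritics(form)].append((form, gloss))
    return dict(by_surface)
```

Calling `derive("كتب")` files the first three forms (he wrote, it was written, books) under the single spelling كتب, while كاتب and مكتوب keep distinct spellings. A reader, or a tagger, has to recover the intended reading from context.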

  9. Example (source: Microsoft Arabic NLP Toolkit (ATK) for Academia in the Arab World presentation, 11/2012)

  10. Arabic Language Characteristics • Synonymy and confusion of non-standardized terms • Thermometer: محر، محرار، مقياس حرارة، ميزان حرارة، ترمومتر • Technical translation • Hydrometer: جهاز قياس كثافة السوائل (literally "device for measuring the density of liquids") • Uncle (paternal عم vs. maternal خال), parent…
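One common way to cope with non-standardized terminology in search is query expansion over synonym sets. The sketch below assumes a hypothetical, hand-built synonym table holding the thermometer variants from this slide; a real system might draw such sets from a lexical resource instead. Plain substring matching is used only to keep the example short.

```python
from __future__ import annotations

# Hypothetical synonym table: each set lists interchangeable terms.
# The single entry holds the thermometer variants listed on the slide.
SYNONYM_SETS: list[frozenset[str]] = [
    frozenset({"محر", "محرار", "مقياس حرارة", "ميزان حرارة", "ترمومتر"}),
]


def expand_query(term: str) -> set[str]:
    """Return every known variant of `term`, or just the term itself."""
    for group in SYNONYM_SETS:
        if term in group:
            return set(group)
    return {term}


def search(documents: list[str], term: str) -> list[str]:
    """Return documents mentioning any variant of `term` (naive substring match)."""
    variants = expand_query(term)
    return [doc for doc in documents if any(v in doc for v in variants)]
```

Searching for محرار then also finds a document that only says ترمومتر. The naive substring test has a cost: the short variant محر also matches inside unrelated words such as محرك ("engine"), so a practical system would match whole tokens or phrases after normalization.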

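  11. Letters • One letter, one sound • Letters change shape depending on their position in the word • Hamza (ء) has several written forms: أ، إ، ؤ، ئ، ء • No capital letters • Normalization can be used to unify variant spellings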
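Normalization here usually means mapping variant letter forms onto one canonical form before indexing or matching. The rules below are a common choice in Arabic information retrieval, assumed for this sketch rather than taken from the presentation; systems differ in which mappings they apply.

```python
import re

# Tanween, short vowels, shadda, sukun (U+064B-U+0652) and superscript alef (U+0670).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"  # kashida: a purely decorative stretching character

# Assumed character mappings (one common convention; not universal).
CHAR_MAP = str.maketrans({
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda       -> bare alef
    "\u0671": "\u0627",  # alef wasla            -> bare alef
    "\u0649": "\u064A",  # alef maksura          -> yeh
    "\u0629": "\u0647",  # teh marbuta           -> heh
})


def normalize(text: str) -> str:
    """Remove diacritics and tatweel, then unify variant letter forms."""
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    return text.translate(CHAR_MAP)
```

With these rules أحمد and احمد become the same string, which helps matching when writers drop the hamza. The trade-off is lost precision: mapping ى to ي, for example, merges على ("on") with the name علي ("Ali").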

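  12. Ambiguity
     • Homographs: قدم (qadam "foot", qadima "he arrived", qaddama "he presented")
     • Internal word structure ambiguity: بعقوبة (the Iraqi city of Baqubah, or bi + ʿuquba "with a punishment")
     • Syntactic ambiguity: قابلت مدير البنك الجديد ("I met the new bank manager" or "I met the manager of the new bank")
     • Semantic ambiguity: يحب علي احمد اكثر من ابراهيم ("Ali loves Ahmad more than Ibrahim": more than he loves Ibrahim, or more than Ibrahim does)
     • Anaphoric ambiguity: قابل الصحفي الوزير الذي انتقده ("The journalist met the minister who criticized him": it is unclear who criticized whom)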
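The internal word structure ambiguity of بعقوبة comes from clitics: a preposition such as ب ("with, by") is written attached to the following word. The sketch below enumerates every way to split a word into single-letter proclitics followed by a lexicon entry. The two-word lexicon and the proclitic list are hypothetical and deliberately tiny, and the function ignores real ordering constraints between clitics, so it overgenerates; practical systems rely on a full morphological analyzer.

```python
from __future__ import annotations

# Hypothetical mini-lexicon: the city name Baqubah and the noun "punishment".
LEXICON = {"بعقوبة", "عقوبة"}

# Assumed subset of single-letter proclitics: wa- "and", fa- "so",
# bi- "with/by", li- "for", ka- "like".
PROCLITICS = ["و", "ف", "ب", "ل", "ك"]


def segmentations(word: str) -> list[list[str]]:
    """List all splits of `word` into proclitics followed by one lexicon entry."""
    results: list[list[str]] = []
    if word in LEXICON:
        results.append([word])
    for clitic in PROCLITICS:
        rest = word[len(clitic):]
        if word.startswith(clitic) and rest:
            # Recurse so that stacked clitics (e.g. و + ب) are also found.
            for tail in segmentations(rest):
                results.append([clitic] + tail)
    return results
```

`segmentations("بعقوبة")` returns two analyses: the city name as one token, and ب + عقوبة. Prefixing و ("and") carries both readings along, giving و + بعقوبة and و + ب + عقوبة. Choosing between them requires context.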

  13. NLP • Automatic summarization • Machine translation • Named entity recognition (NER) • Natural language generation • Natural language understanding • Optical character recognition (OCR) • Question answering • Sentiment analysis • Speech recognition • Word sense disambiguation • Information retrieval (IR) • Speech processing • Text-to-speech • Natural language search • Automated essay scoring • etc.

  14. Question Answering (source: Hammo et al., "QARAB: A Question Answering System to Support the Arabic Language," Workshop on Computational Approaches to Semitic Languages, ACL 2002)

  15. Arabic NLP Issues • Lack of tools • Lack of linguistic references • Lack of training data

  16. Available Tools • Arabic Treebank • Arabic WordNet • MySQL database • SUMO Ontology • Java • Microsoft Arabic Toolkit (ATK)

  17. Summary • Arabic is difficult to process computationally • Progress has been made • More work is being done on different areas • Any progress is valuable for business, personal, and governmental use

  18. Thank you
