Having an Arabic corpus: problems and challenges
Arabic corpora: design, construction and annotation
The use of corpora in Arabic language research
Areas of research are:
1- Lexis
2- Lexicography
3- Syntax
4- Collocation
5- NLP systems
6- Analysis tools
7- Stylistics, and
8- Discourse analysis
• Availability
• Forms and providers
• Ability to be tailored to a target use
• Most famous providers (Linguistic Data Consortium, Arabic Treebank, Latifa Al-Sulaiti, European Language Resources Association)
Nafs Corpus (under construction)
• 1- Selection of texts
• 2- Putting the texts in the right format for processing
• 3- Cleaning of the texts
• 4- Transliteration and its problems
• 5- MADA
• 6- Noun lists – Dictionary
• 7- Proposed algorithm based on Mitkov's knowledge-poor approach
• 8- Problems due to the nature of the language itself
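Steps 3 and 4 above (cleaning and transliteration) can be illustrated with a minimal Python sketch. The slides do not say which cleaning rules or which transliteration scheme the Nafs Corpus uses, so everything here is an assumption: cleaning is taken to mean removing diacritics and tatweel and collapsing whitespace, and the scheme is assumed to be Buckwalter's, restricted to the base letters. The function names `clean` and `transliterate` are hypothetical.

```python
import re
import unicodedata

# Assumption: "cleaning" means stripping short-vowel diacritics (U+064B..U+0652),
# the superscript alef (U+0670) and the tatweel/kashida (U+0640).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")
TATWEEL = "\u0640"

# Assumption: Buckwalter transliteration, base letters only.
# U+0621..U+063A (hamza .. ghain) and U+0641..U+064A (feh .. yeh).
ARABIC = [chr(c) for c in range(0x0621, 0x063B)] + [chr(c) for c in range(0x0641, 0x064B)]
LATIN = "'|>&<}AbptvjHxd*rzs$SDTZEgfqklmnhwYy"
assert len(ARABIC) == len(LATIN)
BUCKWALTER = dict(zip(ARABIC, LATIN))


def clean(text: str) -> str:
    """Normalize to NFC, drop diacritics and tatweel, collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = DIACRITICS.sub("", text)
    text = text.replace(TATWEEL, "")
    return re.sub(r"\s+", " ", text).strip()


def transliterate(text: str) -> str:
    """Map each Arabic letter to its Buckwalter symbol; leave other characters as-is."""
    return "".join(BUCKWALTER.get(ch, ch) for ch in text)


if __name__ == "__main__":
    word = "\u0646\u064E\u0641\u0652\u0633"  # "nafs" written with diacritics
    print(transliterate(clean(word)))  # → nfs
```

One of the transliteration problems is visible in the table itself: Buckwalter symbols such as `*`, `$`, `<`, `>` and `&` have special meaning in regular expressions and XML, so transliterated text may need escaping before it is passed to other tools.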