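An overview of three tools for processing Arabic at FLM: the ArabCode encoding converter, the ArabSpell spelling system, and the acolor.sty coloring package. Topics include ArabTeX notation, the Unicode approach, Buckwalter transliteration, and the grammar concepts behind ArabSpell.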
External Tools Not Only for ArabTeX Documents
Karel Mokry, Otakar Smrz
Faculty of Mathematics and Physics, Charles University in Prague
Processing of Arabic at FLM
… which include
• ArabCode – nontrivial conversion of encoding standards of Arabic script
• ArabSpell – rule-driven spelling system suited especially for vocalized Arabic encoded in ArabTeX notation
• acolor.sty – package for control over coloring in ArabTeX and LaTeX typesetting systems
ArabTeX encoding concept
• Lower ASCII, human-readable, rather phonetic
• Algorithmic determination of several phenomena of Arabic script
• Evaluation of context, parametric interpretation
• Contemporary and historical orthography
  <iqra’ h_a_dAan-na.s.sa bi-intibAhiN> versus Aiqora>o h`*aA{ln~aS~a bi{notibaAhK
Ordinary graphemic approach
• Unicode / Unicode Transformation Format (UTF) with great descriptive scope
  U+0639 / 0xD8 0xB9 (Arabic `ayn): 0000 0110 0011 1001 / 1101 1000 1011 1001
  U+004C / 0x4C (Latin L): 0000 0000 0100 1100 / 0100 1100
• Windows CP 1256, ISO 8859-6, ASMO 449 etc.
• Buckwalter Transliteration using lower ASCII
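The code point and UTF-8 values quoted on this slide can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `describe` is made up here, and Python is used for convenience rather than because the tools themselves use it. The last loop also encodes the same letter in two of the legacy code pages named above, using codecs that ship with Python (ASMO 449 is left out because no standard Python codec is assumed for it).

```python
def describe(ch):
    """Return the code point and UTF-8 form of one character as hex and bits."""
    code_point = ord(ch)
    utf8 = ch.encode("utf-8")
    return {
        "code_point": f"U+{code_point:04X}",
        "code_point_bits": f"{code_point:016b}",
        "utf8_hex": " ".join(f"0x{b:02X}" for b in utf8),
        "utf8_bits": " ".join(f"{b:08b}" for b in utf8),
    }


if __name__ == "__main__":
    for ch in ("\u0639", "L"):  # Arabic `ayn and Latin L, as on the slide
        print(describe(ch))

    # The same letter in two legacy 8-bit encodings (one byte each).
    for codec in ("cp1256", "iso8859_6"):
        print(codec, "\u0639".encode(codec).hex())
```

The two-byte UTF-8 form of `ayn against the one-byte form of Latin L is the point of the comparison: the graphemic encodings differ in width but carry no linguistic interpretation.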
ArabCode solution
• Set of subroutines and scripts in Perl
• Complex: ArabTeX and UTF / Unicode
• Documented: Unicode and UTF
• Quite easy: UTF / Unicode and Windows, ISO, ASMO, Buckwalter etc.
• Currently: ArabTeX and Windows; Windows and UTF, ISO, ASMO, Buckwalter
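To illustrate why conversion between Unicode and a graphemic scheme such as Buckwalter counts as quite easy, here is a minimal Python sketch of a character-for-character mapping. It is not ArabCode itself (which is written in Perl); the names `BW_TO_UNI`, `buckwalter_to_unicode`, and `unicode_to_buckwalter` are hypothetical, and the table is the commonly published Buckwalter scheme for U+0621 to U+063A, U+0640 to U+0652, U+0670, and U+0671.

```python
# Buckwalter symbols listed in Unicode code point order for each range.
_RANGES = (
    (0x0621, "'|>&<}AbptvjHxd*rzs$SDTZEg"),  # hamza .. ghayn
    (0x0640, "_fqklmnhwYyFNKaui~o"),         # tatweel .. sukun
    (0x0670, "`{"),                          # dagger alif, alif wasla
)

BW_TO_UNI = {
    symbol: chr(start + offset)
    for start, symbols in _RANGES
    for offset, symbol in enumerate(symbols)
}
UNI_TO_BW = {uni: bw for bw, uni in BW_TO_UNI.items()}

_BW_TABLE = str.maketrans(BW_TO_UNI)
_UNI_TABLE = str.maketrans(UNI_TO_BW)


def buckwalter_to_unicode(text):
    """Map each Buckwalter symbol to its Arabic character; other characters pass through."""
    return text.translate(_BW_TABLE)


def unicode_to_buckwalter(text):
    """Inverse mapping; characters outside the table pass through unchanged."""
    return text.translate(_UNI_TABLE)


if __name__ == "__main__":
    sample = "Aiqora>o h`*aA{ln~aS~a bi{notibaAhK"  # the Buckwalter example from the deck
    print(buckwalter_to_unicode(sample))
```

Because the mapping is one-to-one and context-free, a lookup table suffices. The ArabTeX direction is harder precisely because a single ASCII token may stand for different graphemes depending on context.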
ArabCode method
• Considering the problem of ArabTeX and UTF / Unicode conversion
• Present:
  • Regular expressions – system tool, fast and safe
  • Rules hard-wired in the code – hard to maintain, inflexible …
• Future:
  • Finite-state transducer – most adequate, though a custom implementation may slow computation down
  • External grammar – clear and extensible rules
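The trade-off between rules wired into the code and rules kept as external data can be sketched as follows. This is a deliberately reduced Python illustration, not the ArabCode implementation: it handles only a subset of ArabTeX consonant tokens, ignores vowels, hamza, and every contextual phenomenon, and the names `RULES` and `convert_consonants` are hypothetical. The rule table is plain data, and the regular expression is generated from it, so extending the rules does not require touching the matching code.

```python
import re

# Assumed rule table: ArabTeX consonant token -> Unicode character.
RULES = {
    "b": "\u0628", "t": "\u062A", "_t": "\u062B", "^g": "\u062C",
    ".h": "\u062D", "_h": "\u062E", "d": "\u062F", "_d": "\u0630",
    "r": "\u0631", "z": "\u0632", "s": "\u0633", "^s": "\u0634",
    ".s": "\u0635", ".d": "\u0636", ".t": "\u0637", ".z": "\u0638",
    "`": "\u0639", ".g": "\u063A", "f": "\u0641", "q": "\u0642",
    "k": "\u0643", "l": "\u0644", "m": "\u0645", "n": "\u0646",
    "h": "\u0647", "w": "\u0648", "y": "\u064A",
}

# Longest tokens first, so that "_t" is tried before "t".
PATTERN = re.compile(
    "|".join(re.escape(token) for token in sorted(RULES, key=len, reverse=True))
)


def convert_consonants(text):
    """Replace every known consonant token; anything else is left as it is."""
    return PATTERN.sub(lambda match: RULES[match.group(0)], text)


if __name__ == "__main__":
    print(convert_consonants("ktb"))
```

A finite-state transducer or an external grammar would go further by letting the rules express context, which a flat table like this cannot.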
ArabSpell motivation
• Spell-checking of entries of human-edited lexical database
• Supervision over misuse of notation, document consistency requirement
• Trial and error way of teaching it
• One version already applied to educational purpose documents and a book of Arabic proverbs
ArabSpell novel concept
• Separation of the definition of the language and the response from the spell-checking engine
• Right Linear Grammar and convenient syntax
  source :<code>: <text>target <text>
• Nondeterministic Finite Automaton and its construction from the grammar
  [Figure: automaton from source to target with transitions labeled t, e, x, t, an empty transition "", and :<code>:]
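The construction of a nondeterministic automaton from a right linear grammar can be sketched in a few lines. The Python below is an assumption-laden illustration, not the ArabSpell engine: the grammar is a plain dictionary mapping each source nonterminal to alternatives of the form (text, target), with `None` standing for a rule that ends in text only, and the `:<code>:` part of the rules is left out entirely. The names `build_nfa`, `epsilon_closure`, and `accepts` are hypothetical.

```python
from collections import defaultdict

ACCEPT = "<accept>"  # single final state, assumed not to clash with a nonterminal name


def build_nfa(grammar):
    """Turn rules A -> text B | text into transitions (state, symbol) -> {states}."""
    transitions = defaultdict(set)
    counter = 0
    for source, alternatives in grammar.items():
        for text, target in alternatives:
            destination = ACCEPT if target is None else target
            if text == "":
                transitions[(source, "")].add(destination)  # empty transition
                continue
            state = source
            for index, symbol in enumerate(text):
                if index == len(text) - 1:
                    following = destination
                else:
                    counter += 1
                    following = counter  # fresh integer state inside the text
                transitions[(state, symbol)].add(following)
                state = following
    return transitions


def epsilon_closure(states, transitions):
    """All states reachable from `states` through empty transitions."""
    closure, stack = set(states), list(states)
    while stack:
        state = stack.pop()
        for following in transitions.get((state, ""), ()):
            if following not in closure:
                closure.add(following)
                stack.append(following)
    return closure


def accepts(transitions, start, word):
    """Simulate the NFA on `word` by tracking the set of active states."""
    current = epsilon_closure({start}, transitions)
    for symbol in word:
        moved = set()
        for state in current:
            moved |= transitions.get((state, symbol), set())
        current = epsilon_closure(moved, transitions)
    return ACCEPT in current


if __name__ == "__main__":
    toy = {"S": [("ab", "S"), ("ab", None)]}  # generates ab, abab, ababab, ...
    nfa = build_nfa(toy)
    print(accepts(nfa, "S", "abab"), accepts(nfa, "S", "aba"))
```

Each character of a rule's text becomes one transition, which matches the t, e, x, t chain in the diagram. Attaching the rule's code to the transitions is where an engine like ArabSpell would produce its responses.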
Grammar of Arabic syllable
• Nonterminal generative rules
  syllable :< "Unruly input!" >: [C][V][C+empty]syllable [C][V][C+empty] [C][ending]
• Cluster definition rules …
  [C] :<>: <'><b><t><_t><^g><.h><_h><d><_d><r><z><s><^s><.s><.d><.t><.z><`><.g><f><q><k><l><m><n><h><w><y>
  [V] :<>: <a><i><u><A><I><U> :<>:
… continuation
  <_a> :< "Dagger 'alif occurred." >:
  <aa> :< "Use <A> instead!" >:
  <iy> :< "Use <I> instead!" >:
  <uw> :< "Use <U> instead!" >:
  [ending] :< "Invalid ending?" >: <uN> <iN> <aN> <aNY> <Y> :<>: <aNA> <UA> <aW> <aWA> :< "Silent 'alif enforced." >:
  [empty] :<>: <> # see [C+empty] above
• Multi-functionality of the :<>: operator
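A rough feel for what the syllable grammar checks can be had from the following Python approximation. It is not ArabSpell and does not interpret the grammar files: it takes the [C] and [V] inventories from the slides, treats a word as one or more syllables of the shape consonant, vowel, optional consonant, and reports the messages quoted above. The [ending] cluster is left out, and so are the <iy> and <uw> warnings, because y and w are also consonants and resolving that ambiguity needs the automaton. The names `CONSONANTS`, `VOWELS`, `WARNINGS`, and `check` are hypothetical.

```python
import re

CONSONANTS = ["'", "b", "t", "_t", "^g", ".h", "_h", "d", "_d", "r", "z", "s",
              "^s", ".s", ".d", ".t", ".z", "`", ".g", "f", "q", "k", "l", "m",
              "n", "h", "w", "y"]
VOWELS = ["a", "i", "u", "A", "I", "U", "_a", "aa"]
WARNINGS = {"_a": "Dagger 'alif occurred.", "aa": "Use <A> instead!"}


def alternation(tokens):
    """Regex group matching any token, longest first so "_t" wins over "t"."""
    ordered = sorted(tokens, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(token) for token in ordered) + ")"


C = alternation(CONSONANTS)
V = alternation(VOWELS)
SYLLABLES = re.compile(f"(?:{C}{V}{C}?)+")  # simplified [C][V][C+empty] repetition


def check(word):
    """Return the list of reports for one word; an empty list means no complaint."""
    reports = []
    if not SYLLABLES.fullmatch(word):
        reports.append("Unruly input!")
    for token, message in WARNINGS.items():
        if token in word:
            reports.append(message)
    return reports


if __name__ == "__main__":
    for word in ("kataba", "maktab", "ktb", "kaatib", "h_a_dA"):
        print(word, check(word))
```

The separation visible even in this toy version, with inventories and messages as data and the matching logic generic, is the design idea the slides attribute to ArabSpell.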
ArabSpell features
• Clusters enable eminent network optimization
• Spelling :< Perl subroutines >: extend the class of languages beyond regular ones
  • Bracket matching, word repetition
  • Control over long-distance dependencies
  • Easy counting, e.g. word and sentence length
• Reports in different language versions
• Detailed yet flexible grammar for Arabic, models of other formalizable languages
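Bracket matching and word repetition are classic checks that a finite automaton alone cannot express, which is why attaching code to rules matters. The Python sketch below shows the kind of bookkeeping such attached subroutines might perform. It is an assumption about the general technique, not a reproduction of the Perl subroutines used in ArabSpell, and the names `check_brackets` and `repeated_words` are hypothetical.

```python
def check_brackets(text, pairs=(("(", ")"), ("[", "]"), ("{", "}"))):
    """Report unmatched closing and unclosed opening brackets using a stack."""
    opener_for = {closing: opening for opening, closing in pairs}
    openers = set(opener_for.values())
    stack, reports = [], []
    for position, char in enumerate(text):
        if char in openers:
            stack.append((char, position))
        elif char in opener_for:
            if stack and stack[-1][0] == opener_for[char]:
                stack.pop()
            else:
                reports.append(f"Unmatched {char!r} at {position}")
    reports.extend(f"Unclosed {char!r} at {position}" for char, position in stack)
    return reports


def repeated_words(text):
    """Return every word that immediately repeats the word before it."""
    words = text.split()
    return [word for previous, word in zip(words, words[1:]) if previous == word]


if __name__ == "__main__":
    print(check_brackets("(a [b])"))
    print(check_brackets("(a]"))
    print(repeated_words("fI fI al-bayt"))
```

The stack is the extra memory that takes the checker beyond regular languages. Counting word or sentence length would use the same hook with a simple counter.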
Using acolor.sty
• Typesetting Arabic script in color with ArabTeX
• Text marking, hide-and-check of diacritics
• Primers, textbooks, educational purposes
• Coloring commands combined with original ArabTeX vocalization control
• No modification of the input data themselves
… for any diacritics
  \coldia{red}\fullvocalize\accentshigh
  \nocolshadda\colother{blue}\vocalize
  \nocolall\colhamza{green}\vocalize
… for other marking
  \nocolall\colbeginning{blue}\novocalize
  \nocolall\colshadda{white}\novocalize
  \colisolated{red}\vocalize\accentslow
Acknowledgement
• Arabic script displays in this presentation were typeset using the ArabTeX package for TeX and LaTeX by Prof. Dr. Klaus Lagally of the University of Stuttgart. The existence of this system has been the principal inspiration for our work.