Finding Entries in an On-line Arabic Dictionary. 27 May 2010, 27th Annual HCIL Symposium. Sarah C. Wayland, C. Anton Rytting, David Zajic, Timothy Buckwalter, Jason White, Corey Miller, Jeffrey Carnes, Nathanael Lynn, Paul Rodrigues, Michael Maxwell, Evelyn Browne.
Arabic is not English
• Different sounds (e.g., voiceless uvular /q/, retroflex /l/, voiced velar fricative /gh/, glottal stop / ‘ /)
• Different letters (مباريات)
• Different morphology (templatic vs. affixative)
• Written form doesn’t reflect spoken dialect
• Keyboard has different layout/letters
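The contrast between templatic and affixative morphology can be illustrated with a minimal sketch. It assumes a simplified Latin transliteration and a hypothetical helper, `apply_template`, in which each `C` slot of a pattern is filled by the next root consonant; the function name and the slot notation are illustrative and do not come from the talk.

```python
def apply_template(root, template):
    """Fill each 'C' slot in the template with the next consonant of the root.

    Assumes the template has exactly as many 'C' slots as the root has consonants.
    """
    consonants = iter(root)
    return "".join(next(consonants) if ch == "C" else ch for ch in template)


# Templatic: the root k-t-b is interleaved with different vowel patterns.
root = "ktb"
print(apply_template(root, "CaCaCa"))   # kataba  ("he wrote")
print(apply_template(root, "CaaCiC"))   # kaatib  ("writer")
print(apply_template(root, "maCCuuC"))  # maktuub ("written")

# Affixative (English-style): the stem stays intact and material is appended.
print("walk" + "ed")                    # walked
```

Because the root consonants are scattered through the word, a learner cannot find the dictionary headword simply by stripping a prefix or suffix.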
Many informal texts diverge from Modern Standard Arabic Texts differ from classroom Arabic in orthography, morphology, and lexical content. Orthographic differences are based on dialect pronunciations, typographical errors, and ... “style.”
Orthographic differences: some dialects use non-standard characters
Phonetic differences: consonants sometimes vary across dialects
Morphologically complex forms (* marks the only forms listed in the dictionary)
The Arabic keyboard makes difficult-to-detect typos likely Adjacent letters are often visually similar
Adjacent letters also often sound similar:
• with contrasts not found in English
• with contrasts subject to place assimilation
• particularly so in some dialect pronunciations
Putting DYM…? together
• A query is checked by composing a single-string finite state automaton (FSA) with:
  • weighted keyboard, visual, and sound-based FSTs
  • a dictionary FSA (with weights for dialect variants)
• The n-best paths yielding unique strings are calculated
• The corresponding strings are displayed to the user
[Figure labels: H ح, A ا, R ر, B ب; visual, keyboard, sound-based; HARB, ?ARB, OARB, ....]
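The pipeline above can be approximated in a few lines. The sketch below is not an FST implementation: it is a simplified stand-in that enumerates per-letter substitutions, sums their costs, keeps only strings found in a dictionary, and returns the n cheapest unique results. The confusion tables, their weights, the dictionary entries, and the names `substitutions` and `suggest` are all hypothetical, and the letters are Latin placeholders in the spirit of the HARB example.

```python
from itertools import product

# Hypothetical confusion costs, keyed by (typed letter, intended letter).
# Lower cost = more plausible confusion. All values are illustrative.
KEYBOARD = {("H", "J"): 1.0}
VISUAL = {("H", "X"): 0.5}
SOUND = {("B", "P"): 1.2}

# Hypothetical dictionary: entry -> extra cost (e.g. a dialect-variant penalty).
DICTIONARY = {"HARB": 0.0, "XARB": 0.0, "JARB": 0.5, "HARP": 0.0}


def substitutions(ch):
    """Return {candidate: cost} for one typed letter; the letter itself costs 0."""
    best = {ch: 0.0}
    for table in (KEYBOARD, VISUAL, SOUND):
        for (typed, intended), cost in table.items():
            if typed == ch and cost < best.get(intended, float("inf")):
                best[intended] = cost
    return best


def suggest(query, dictionary, n=3):
    """Return the n cheapest (word, cost) pairs among reachable dictionary entries."""
    options = [substitutions(ch).items() for ch in query]
    scored = {}
    for combo in product(*options):
        word = "".join(letter for letter, _ in combo)
        if word in dictionary:
            cost = sum(c for _, c in combo) + dictionary[word]
            # Keep only the cheapest path for each unique string.
            scored[word] = min(cost, scored.get(word, float("inf")))
    return sorted(scored.items(), key=lambda item: (item[1], item[0]))[:n]


print(suggest("HARB", DICTIONARY))
```

Exhaustive enumeration grows exponentially with query length, which is one reason to compose weighted automata and extract shortest paths instead. The sketch only shows how the three error models and the dictionary weights combine into a single ranking.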
[Screenshot labels: "Show non-verbs", "Show verbs"]
Arabic is not English!
• One user interface for all languages will not work
• We must customize the user interface to take into account the unique structure of each language
Sarah C. Wayland swayland@casl.umd.edu 301-226-8938