
Finding Entries in an On-line Arabic Dictionary

Finding Entries in an On-line Arabic Dictionary. 27 May 2010, 27th Annual HCIL Symposium. Sarah C. Wayland, C. Anton Rytting, David Zajic, Timothy Buckwalter, Jason White, Corey Miller, Jeffrey Carnes, Nathanael Lynn, Paul Rodrigues, Michael Maxwell, Evelyn Browne.

Presentation Transcript


  1. Finding Entries in an On-line Arabic Dictionary 27 May 2010 27th Annual HCIL Symposium Sarah C. Wayland, C. Anton Rytting, David Zajic, Timothy Buckwalter, Jason White, Corey Miller, Jeffrey Carnes, Nathanael Lynn, Paul Rodrigues, Michael Maxwell, Evelyn Browne

  2. Arabic is not English
     • Different sounds (e.g., voiceless uvular /q/, retroflex /l/, voiced velar fricative /gh/, glottal stop /‘/)
     • Different letters (مباريات)
     • Different morphology (templatic vs. affixative)
     • Written form doesn’t reflect spoken dialect
     • Keyboard has different layout/letters
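Slide 2 contrasts templatic and affixative morphology. English mostly builds words by attaching affixes to a stem (write, writer, written); Arabic also interleaves a consonantal root with a vowel pattern. The minimal sketch below illustrates root-and-pattern interdigitation in a simplified Latin transliteration (long vowels doubled). The digit notation for root slots is a hypothetical convention chosen for this example, and the code is not part of the system the slides describe.

```python
def interdigitate(root: str, pattern: str) -> str:
    """Fill slots 1, 2, 3 of a pattern with the consonants of a triliteral root."""
    if len(root) != 3:
        raise ValueError("this sketch handles triliteral roots only")
    # Digits mark root-consonant slots; every other character is copied as-is.
    return "".join(root[int(ch) - 1] if ch in "123" else ch for ch in pattern)


# Root k-t-b (associated with writing), combined with several patterns.
for pattern in ["1a2a3a", "1aa2i3", "ma12uu3", "ma12a3"]:
    print(pattern, "->", interdigitate("ktb", pattern))
```

This prints kataba ("he wrote"), kaatib ("writer"), maktuub ("written"), and maktab ("office"): four words that share a root but no contiguous stem. Because Arabic script usually omits short vowels, several such forms can also collapse to the same written string (كتب can be read kataba, kutiba, or kutub), which is part of what makes dictionary lookup harder than in English.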

  3. Many informal texts diverge from Modern Standard Arabic. Texts differ from classroom Arabic in orthography, morphology, and lexical content.

  4. Many informal texts diverge from Modern Standard Arabic. Texts differ from classroom Arabic in orthography, morphology, and lexical content. Orthographic differences are based on dialect pronunciations, typographical errors, and ... “style.”
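Slides 3 and 4 attribute orthographic divergence partly to typographical errors and "style." One common way to reduce such variation before lookup (an assumption here, not something the slides describe) is to fold variant spellings into a single normalized key. The sketch strips diacritics and the tatweel stretching character, folds hamza-carrying alef forms into bare alef, folds alef maqsura into yaa and taa marbuta into haa, and collapses a letter typed three or more times in a row. Which foldings to apply is a design choice; this particular set is hypothetical.

```python
import re

# Hypothetical set of foldings; which variants to merge is a design choice.
ALEF_FOLD = str.maketrans({"\u0623": "\u0627", "\u0625": "\u0627", "\u0622": "\u0627"})  # أ إ آ -> ا
TATWEEL = "\u0640"                          # ـ, stretches a word for visual effect only
DIACRITICS = re.compile("[\u064b-\u0652]")  # tanween, short vowels, shadda, sukun
ELONGATION = re.compile(r"(.)\1{2,}")       # a letter typed three or more times in a row


def normalize(word: str) -> str:
    """Map common orthographic and stylistic variants onto one lookup key."""
    word = DIACRITICS.sub("", word)
    word = word.replace(TATWEEL, "")
    word = word.translate(ALEF_FOLD)
    word = word.replace("\u0649", "\u064a")  # alef maqsura ى -> yaa ي
    word = word.replace("\u0629", "\u0647")  # taa marbuta ة -> haa ه
    return ELONGATION.sub(r"\1", word)       # collapse the run to a single letter
```

Because the folding is lossy (ة and ه, for instance, become indistinguishable), the same `normalize` function would have to be applied to the dictionary headwords, with the original spellings kept for display.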

  5. Some dialects use non-standard characters (orthographic differences)
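Slide 5 notes that some dialects use characters outside the standard Arabic alphabet, such as ڤ, پ, چ, and گ. A lookup can expand each such character into the standard letters it might correspond to and try every combination against the dictionary. The mapping below is illustrative and assumed: the right correspondences depend on the dialect and on whether the word is a loanword. For example, گ (used for /g/) can stand for ق where qaf is pronounced /g/, or for ج where jiim is pronounced /g/.

```python
from itertools import product

# Illustrative, assumed mapping from non-standard letters to the standard
# letters they may correspond to; real correspondences depend on dialect.
NONSTANDARD = {
    "\u06a4": ["\u0641"],            # ڤ (veh, /v/) -> ف
    "\u067e": ["\u0628"],            # پ (peh, /p/) -> ب
    "\u0686": ["\u0643"],            # چ (/tʃ/) -> ك, where kaaf is pronounced /tʃ/
    "\u06af": ["\u0642", "\u062c"],  # گ (/g/) -> ق or ج, where either is pronounced /g/
}


def standard_candidates(word: str) -> list[str]:
    """Expand every non-standard letter into its possible standard letters."""
    options = [NONSTANDARD.get(ch, [ch]) for ch in word]
    return ["".join(letters) for letters in product(*options)]
```

Standard letters pass through unchanged, so a word without non-standard characters yields exactly one candidate: itself.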

  9. Consonants sometimes vary across dialects (phonetic differences)
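Slide 9 notes that consonants vary across dialects, so a writer may spell a word as it is pronounced locally: for example with ت where Modern Standard Arabic (MSA) has ث, or with a glottal stop where MSA has ق. A lookup can generate weighted MSA candidates by reversing such correspondences. The table and costs below are hypothetical illustrations; a real system would need per-dialect tables and tuned weights.

```python
from itertools import product

# Hypothetical correspondences: letter as written in a dialect spelling ->
# MSA letter(s) it may stand for, with an assumed substitution cost.
DIALECT_TO_MSA = {
    "\u062a": [("\u062b", 1.0)],                   # ت may stand for ث (/θ/ -> /t/)
    "\u0633": [("\u062b", 1.5)],                   # س may stand for ث (/θ/ -> /s/)
    "\u062f": [("\u0630", 1.0)],                   # د may stand for ذ (/ð/ -> /d/)
    "\u0632": [("\u0630", 1.5), ("\u0638", 1.5)],  # ز may stand for ذ or ظ
    "\u0621": [("\u0642", 1.0)],                   # ء may stand for ق (/q/ -> glottal stop)
    "\u0623": [("\u0642", 1.0)],                   # أ may stand for ق
}


def msa_candidates(word: str, max_cost: float = 2.0) -> list[tuple[str, float]]:
    """Return (candidate, cost) pairs within max_cost, cheapest first."""
    # Each position keeps its own letter at cost 0 or swaps in an MSA counterpart.
    options = [[(ch, 0.0)] + DIALECT_TO_MSA.get(ch, []) for ch in word]
    results = []
    for combo in product(*options):
        cost = sum(c for _, c in combo)
        if cost <= max_cost:
            results.append(("".join(ch for ch, _ in combo), cost))
    results.sort(key=lambda item: (item[1], item[0]))
    return results
```

For the Egyptian-style spelling تلاتة ("three"), the candidates are the literal spelling at cost 0.0, two single-substitution variants at 1.0, and the MSA spelling ثلاثة at 2.0. The brute-force product grows exponentially with the number of variable letters, which is tolerable only for short words.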

  10. Morphologically Complex [Figure: * marks the only forms listed in the dictionary]
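Slide 10 points out that the dictionary lists only some forms, while running text attaches clitics and affixes to them. A very simplified way to reach a listed form is to strip candidate prefixes and suffixes and keep whatever remainders appear in the dictionary. The affix inventories below are hypothetical and far from complete: the sketch strips at most one prefix and one suffix, and ignores stem changes such as ة becoming ت before a suffix, broken plurals, and templatic alternations, all of which a full morphological analyzer would handle.

```python
# Hypothetical affix inventories; a real analyzer handles far more cases.
PREFIXES = ["وال", "بال", "فال", "كال", "لل", "ال", "و", "ف", "ب", "ل", "ك"]
SUFFIXES = ["هما", "كم", "هم", "ها", "ات", "ون", "ين", "ة", "ه", "ي"]


def lookup_candidates(word: str, dictionary: set[str], min_stem: int = 2) -> list[str]:
    """Strip at most one prefix and one suffix; return stems found in the dictionary."""
    found: list[str] = []
    for prefix in [""] + PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in [""] + SUFFIXES:
            if not rest.endswith(suffix):
                continue
            stem = rest[: len(rest) - len(suffix)]
            if len(stem) >= min_stem and stem in dictionary and stem not in found:
                found.append(stem)
    return found
```

For example, بالكتاب ("with the book") and كتابهم ("their book") both reduce to the listed form كتاب.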

  11. The Arabic keyboard makes difficult-to-detect typos likely

  12. The Arabic keyboard makes difficult-to-detect typos likely. Adjacent letters are often visually similar.

  15. The Arabic keyboard makes difficult-to-detect typos likely. Adjacent letters also often sound similar (with contrasts not found in English).

  16. The Arabic keyboard makes difficult-to-detect typos likely. Adjacent letters also often sound similar (with contrasts subject to place assimilation).

  17. The Arabic keyboard makes difficult-to-detect typos likely. Adjacent letters also often sound similar (particularly in some dialect pronunciations).

  18. Putting “Did You Mean…?” (DYM) together
     • A query is checked by composing a single-string finite-state automaton (FSA) with:
        • weighted keyboard, visual, and sound-based finite-state transducers (FSTs)
        • a dictionary FSA (with weights for dialect variants)
     • The n-best paths yielding unique strings are calculated
     • The corresponding strings are displayed to the user
     [Figure: the letters H ح, A ا, R ر, B ب passed through the visual, keyboard, and sound-based transducers, yielding candidates such as HARB, ?ARB, OARB, ...]
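The pipeline on slide 18 can be approximated without an FST library. The sketch below is a hypothetical simplification, not the authors' implementation: a single substitution-cost function stands in for the keyboard, visual, and sound-based transducers (substitutions only, no insertions or deletions), the dictionary FSA is a trie whose final states carry weights (where dialect-variant penalties would go), and a uniform-cost search over (query position, trie state) pairs explores their composition, returning the n cheapest distinct strings. The confusion costs and the four-word dictionary are toy data in Buckwalter-style transliteration.

```python
import heapq
from typing import Callable, Optional

SubCost = Callable[[str, str], Optional[float]]


class TrieNode:
    """A state of the dictionary automaton (a trie is an acyclic FSA)."""

    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.final_weight: Optional[float] = None  # None: not the end of an entry


def build_trie(entries: dict[str, float]) -> TrieNode:
    root = TrieNode()
    for word, weight in entries.items():
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.final_weight = weight
    return root


def did_you_mean(query: str, trie: TrieNode, sub_cost: SubCost, n: int = 3) -> list[tuple[str, float]]:
    """Return up to n distinct dictionary strings for query, cheapest first."""
    results: list[tuple[str, float]] = []
    seen: set[str] = set()
    tie = 0  # unique tie-breaker so heapq never compares TrieNode objects
    heap = [(0.0, tie, 0, trie, "", False)]
    while heap and len(results) < n:
        cost, _, pos, node, out, complete = heapq.heappop(heap)
        if complete:
            if out not in seen:  # keep only paths that yield a new string
                seen.add(out)
                results.append((out, cost))
            continue
        if pos == len(query):
            if node.final_weight is not None:  # add the entry's weight
                tie += 1
                heapq.heappush(heap, (cost + node.final_weight, tie, pos, node, out, True))
            continue
        typed = query[pos]
        for ch, child in node.children.items():
            step = sub_cost(ch, typed)
            if step is not None:
                tie += 1
                heapq.heappush(heap, (cost + step, tie, pos + 1, child, out + ch, False))
    return results


# Toy confusion costs (H = ح, x = خ, j = ج, h = ه, r = ر, z = ز, b = ب, t = ت).
CONFUSIONS = {
    frozenset("xH"): 1.0,
    frozenset("jH"): 1.0,
    frozenset("hH"): 1.5,
    frozenset("rz"): 1.0,
    frozenset("bt"): 1.0,
}


def toy_cost(intended: str, typed: str) -> Optional[float]:
    if intended == typed:
        return 0.0
    return CONFUSIONS.get(frozenset((intended, typed)))


# Toy dictionary; a positive weight could penalize a dialectal variant.
DICTIONARY = {"Hrb": 0.0, "Hzb": 0.0, "Erb": 0.0, "grb": 0.0}

print(did_you_mean("xrb", build_trie(DICTIONARY), toy_cost))  # [('Hrb', 1.0), ('Hzb', 2.0)]
```

Because all costs are non-negative, the search pops complete strings in order of total cost, so the first n distinct strings are the n best. A production system would more likely build real weighted transducers and use a library's composition and n-shortest-path operations.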

  19. Show non-verbs Show verbs

  20. Download Results

  21. Arabic is not English!
     • One user interface for all languages will not work
     • We must customize the user interface to take into account the unique structure of each language

  22. Sarah C. Wayland swayland@casl.umd.edu 301-226-8938
