Creating a dual-use pandialectal Pashto grammar
AF-PAK LEARN, Omaha, May 17, 2010
Corey Miller (cmiller@casl.umd.edu), Anne David, Michael Maxwell, Alina Twist, Claudia Brugman, Evelyn Browne, Melissa Fox, Michael Marlo, Paul Rodrigues and Tristan Purvis
Motivation
• Pashto is an indispensable Afghan language critical to our nation’s security
• Pashto is difficult for English speakers
• Updated, comprehensive, learner-oriented Pashto materials are needed
  • Grammar
  • Easy-access dictionary
What makes Pashto difficult?
• Ergativity
• Up to four cases: direct, oblique, ablative, and vocative
• Multiple noun and adjective declension classes
• Variety of adpositions: prepositions, postpositions, and circumpositions
• Retroflex consonants
• Variety of verbal structures
Project components
• Formal Grammar
• Descriptive Grammar
• Fieldwork
• Dictionary
• Parser
Parser enables easy access to dictionary
Fieldwork
• Identified native speakers of Pashto from Afghanistan and Pakistan living in the US
  • Peshawar, Quetta, Pakistan
  • Kabul, Kandahar, Afghanistan
• Create and run elicitation guides highlighting range of grammatical features
• Review all paradigms and example sentences, note dialect variation
• Digitally record all sessions
Motivation for descriptive grammar
• Existing materials suffer from liabilities
  • dated
  • cover a single dialect
    • Tegey and Robson 1996: Kabul
    • Penzl 1955: Kandahar
    • Shafeev 1964: Kandahar
  • lack Pashto script (Tegey and Robson 1996 has it)
Goals for descriptive grammar
• Contemporary data and presentation
• Use of Pashto script and transcription throughout
• Cover dialect variation wherever it applies
Descriptive grammar
• Pashto language, orthography, phonology
• Adpositions
• Pronouns
• Nouns
• Adjectives
• Verbs
• Dialectology
• Miscellaneous
Morphological parsing
• Inputs
  • Formal grammar
  • Dictionary (Lexicon)
• Output capability
  • Analysis: given an inflected form, produce possible headwords
  • Generation: given a headword, produce possible inflected forms
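The two output capabilities can be illustrated with a toy sketch. This is not the project's parser: `LEXICON`, `PRESENT_ENDINGS`, `generate`, and `analyze` are hypothetical names. Only the pair ولم / ويشتل comes from these slides; the stem split and the third-person ending are illustrative assumptions standing in for a real formal grammar.

```python
# Toy lexicon: headword -> present stem.
# Assumption: the stem is inferred from the slide example (ولم, headword ويشتل).
LEXICON = {"ويشتل": "ول"}

# Toy stand-in for the formal grammar: suffix -> feature label.
# Assumption: illustrative and far from a complete paradigm.
PRESENT_ENDINGS = {
    "م": "1sg present imperfective",
    "ي": "3rd person present imperfective",
}


def generate(headword):
    """Generation: given a headword, produce possible inflected forms."""
    stem = LEXICON[headword]
    return {features: stem + suffix
            for suffix, features in PRESENT_ENDINGS.items()}


def analyze(form):
    """Analysis: given an inflected form, produce possible (headword, features) pairs."""
    results = []
    for headword, stem in LEXICON.items():
        for suffix, features in PRESENT_ENDINGS.items():
            if form == stem + suffix:
                results.append((headword, features))
    return results


if __name__ == "__main__":
    print(analyze("ولم"))
    print(generate("ويشتل"))
```

Both directions read the same two tables, which mirrors the slide's point that one formal grammar and one lexicon feed both analysis and generation.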
Uses of morphological parser
• Analysis capability enables dictionary lookup of inflected forms
• Generation has pedagogical uses including self-testing
How morphological analysis aids lookup
• Inflected forms may differ substantially from citation forms
• Experts can work around this problem, but non-experts often can’t
The parser maps inflected forms to citation forms (headwords)
What does this Pashto word mean? ولم
Grammatical info: first person singular present imperfective
Citation form: ويشتل
ويشتل [wishtə́l] (verb) to shoot
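The lookup flow on this slide can be sketched as a dictionary query that falls back to the parser when the word is not itself a headword. The names `DICTIONARY`, `PARSER_OUTPUT`, and `look_up` are hypothetical. `PARSER_OUTPUT` is a hard-coded stand-in for real parser output, and the only data used is the example shown on the slide.

```python
# Dictionary entries keyed by citation form (data taken from the slide).
DICTIONARY = {
    "ويشتل": {"translit": "wishtə́l", "pos": "verb", "gloss": "to shoot"},
}

# Assumption: a hard-coded stand-in for the morphological parser's analyses,
# mapping an inflected form to (citation form, grammatical info).
PARSER_OUTPUT = {
    "ولم": ("ويشتل", "first person singular present imperfective"),
}


def look_up(word):
    """Return a display string for `word`, or None if it cannot be resolved."""
    if word in DICTIONARY:
        # The word is already a headword; no analysis is needed.
        headword, info = word, "citation form"
    elif word in PARSER_OUTPUT:
        # Fall back to the parser to recover the headword.
        headword, info = PARSER_OUTPUT[word]
    else:
        return None
    entry = DICTIONARY[headword]
    return (f"{headword} [{entry['translit']}] ({entry['pos']}) "
            f"{entry['gloss']}; {info}")


if __name__ == "__main__":
    print(look_up("ولم"))
```

A learner who types the inflected form gets the same entry an expert would find by first working out the citation form by hand.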
Conclusion
• Updated descriptive grammar based on fieldwork
• Formal grammar and lexicon feed parser
• Parser enables simplified dictionary lookup
• Faster, more informed processing of Pashto
References
• David, Anne and Michael Maxwell. 2008. Joint grammar development by linguists and computer scientists. Workshop on NLP for Less Privileged Languages, Third International Joint Conference on Natural Language Processing, Hyderabad, India.
• Maxwell, Michael and Anne David. 2008. Interoperable Grammars. First International Conference on Global Interoperability for Language Resources, Hong Kong.
• Maxwell, Michael. 2010. Standardization as a means to Sustainability. LREC (to appear).
• Penzl, Herbert. 1955. A Grammar of Pashto. Washington, DC: American Council of Learned Societies.
• Tegey, Habibullah and Barbara Robson. 1996. A Reference Grammar of Pashto. Washington, DC: Center for Applied Linguistics.
• Shafeev, D. A. 1964. A Short Grammatical Outline of Pashto. International Journal of American Linguistics 30.