APPLYING VENDA TEXT TOWARDS THE DEVELOPMENT OF AN INTELLIGENT PARTS-OF-SPEECH TAGGER
BY TSHISHONGA AW (2859268)
11/04/08
INTRODUCTION
Part-of-speech (POS) tagging is the process of assigning each word its part-of-speech tag. A part-of-speech tag is a label such as Noun, Verb or Adjective. POS tagging is done by examining a word's relationship with its adjacent words. A simplified form of it is taught to school children. The Venda language has unique diacritics.
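Because the same Venda letter can be stored either as one precomposed character or as a base letter followed by a combining mark, a tagger that compares words needs to normalize text first. The sketch below assumes Python's standard `unicodedata` module and the five diacritic letters usually listed for Tshivenḓa (ḓ, ḽ, ṋ, ṱ with a circumflex below, and ṅ with a dot above). The names `VENDA_DIACRITIC_LETTERS` and `normalize_venda` are hypothetical, not taken from the project.

```python
import unicodedata

# Assumption: the Tshivenda diacritic letters are these five (lower case only).
# Each maps to its decomposed spelling: (base letter, combining mark).
VENDA_DIACRITIC_LETTERS = {
    "\u1e13": ("d", "\u032d"),  # ḓ = d + COMBINING CIRCUMFLEX ACCENT BELOW
    "\u1e3d": ("l", "\u032d"),  # ḽ
    "\u1e4b": ("n", "\u032d"),  # ṋ
    "\u1e71": ("t", "\u032d"),  # ṱ
    "\u1e45": ("n", "\u0307"),  # ṅ = n + COMBINING DOT ABOVE
}


def normalize_venda(text: str) -> str:
    """Return text in Unicode NFC form, so a base letter plus a combining
    mark becomes the single precomposed letter and equal words compare equal."""
    return unicodedata.normalize("NFC", text)


# Every two-code-point spelling collapses to its precomposed letter.
for letter, (base, mark) in VENDA_DIACRITIC_LETTERS.items():
    assert normalize_venda(base + mark) == letter

print("d\u032d" == "\u1e13")                   # False: different code points
print(normalize_venda("d\u032d") == "\u1e13")  # True after NFC
```

Normalizing on input would also let an "insert a diacritic" feature store text in one canonical form regardless of how the user typed the letter.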
THE DEVELOPMENT PROCESS
• A Venda translator: very ambitious.
• A generic tagger: still ambitious.
• Best solution:
  • A parts-of-speech tagger that allows the user to change tags in order to resolve ambiguous tags.
  • Compute initial Hidden Markov Models (HMMs).
  • Compute test data.
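The slides do not say how the initial HMMs are computed. The following is a minimal sketch assuming a first-order (bigram) HMM whose start, transition and emission probabilities are estimated by counting tagged sentences with add-one smoothing, then decoded with the Viterbi algorithm. The toy English corpus and the names `TRAIN`, `train_hmm` and `viterbi` are hypothetical stand-ins for hand-tagged Venda text.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: sentences of (word, tag) pairs.
TRAIN = [
    [("dogs", "NOUN"), ("fish", "VERB")],
    [("the", "DET"), ("fish", "NOUN"), ("swim", "VERB")],
    [("the", "DET"), ("dogs", "NOUN"), ("swim", "VERB")],
]


def train_hmm(sentences):
    """Count start, transition and emission events from tagged sentences."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sent in sentences:
        prev = None
        for word, tag in sent:
            if prev is None:
                start[tag] += 1
            else:
                trans[prev][tag] += 1
            emit[tag][word] += 1
            prev = tag
    tags = sorted(emit)
    vocab = {w for t in emit for w in emit[t]}
    return start, trans, emit, tags, vocab, len(sentences)


def viterbi(words, model):
    """Return the most probable tag sequence under add-one smoothed estimates."""
    if not words:
        return []
    start, trans, emit, tags, vocab, n_sent = model
    T, V = len(tags), len(vocab) + 1  # +1 reserves probability for unseen words

    def log_start(t):
        return math.log((start[t] + 1) / (n_sent + T))

    def log_trans(a, b):
        return math.log((trans[a][b] + 1) / (sum(trans[a].values()) + T))

    def log_emit(t, w):
        return math.log((emit[t][w] + 1) / (sum(emit[t].values()) + V))

    # best[i][t]: log-probability of the best path that ends in tag t at word i
    best = [{t: log_start(t) + log_emit(t, words[0]) for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + log_trans(p, t))
            best[i][t] = best[i - 1][prev] + log_trans(prev, t) + log_emit(t, words[i])
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = back[i][tag]
        path.append(tag)
    return path[::-1]


model = train_hmm(TRAIN)
print(viterbi(["the", "fish", "swim"], model))  # ['DET', 'NOUN', 'VERB']
print(viterbi(["dogs", "fish"], model))         # ['NOUN', 'VERB']
```

The ambiguous word "fish" receives NOUN after "the" and VERB after "dogs", which is the adjacent-word effect the introduction describes. Tags corrected by the user could, in principle, be appended to the training sentences and the counts recomputed.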
REQUIREMENT ANALYSIS
User Requirements
• Abney, S. Part-of-speech tagging and partial parsing.
• Brill, E. A simple rule-based part-of-speech tagger.
• Pérez, L. C. IEEE Information Theory Society Newsletter.
• Samuelsson, C., and Voutilainen, A. Comparing a linguistic and a stochastic tagger. pp. 246–253.
• Shannon, C. E. A mathematical theory of communication.
Research Prototype
USABILITY TESTING
User      Occupation
User 1    MSc Statistics
User 2    BSc Honors
User 3    BSc Microbiology
User 4    UCT MSc Sociology
USER’S GUIDE
First Screen and Second Screen
• File Menu
  • Open a file
  • Save a file
  • Exit
• View Menu
  • Word frequency
  • Count words
• Edit
  • Clear
• Word model
• Help
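The guide names the View menu's "Word frequency" and "Count words" functions without describing them. A minimal sketch, assuming a word is a maximal run of Unicode letters and that text is NFC-normalized and lower-cased before counting, could look like this. The function name `word_counts` and the sample string are hypothetical.

```python
import re
import unicodedata
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count words after NFC normalization and lower-casing.

    NFC matters here: a combining mark such as U+032D is not matched by
    Python's \\w, so a decomposed ḓ would split a word in two, whereas the
    precomposed letter U+1E13 is a single letter."""
    text = unicodedata.normalize("NFC", text).lower()
    return Counter(re.findall(r"[^\W\d_]+", text))  # runs of letters only


# Hypothetical sample: one precomposed ḓ, one decomposed ḓ, one capital Ṱ.
counts = word_counts("\u1e13a d\u032da \u1e70a")
print(counts.most_common())   # [('ḓa', 2), ('ṱa', 1)]
print(sum(counts.values()))   # 3 words in total
```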
REFERENCES
[1] Abney, S. Part-of-speech tagging and partial parsing. In Corpus-Based Methods in Language and Speech (Dordrecht, 1996), K. Church, S. Young, and G. Bloothooft, Eds., Kluwer Academic Publishers.
[2] Brill, E. A simple rule-based part-of-speech tagger. In Proceedings of ANLP-92, 3rd Conference on Applied Natural Language Processing (Trento, IT, 1992), pp. 152–155.
[3] Pérez, L. C. IEEE Information Theory Society Newsletter. ISSN 105 53, 04 (2003), pp. 1–10.
[4] Samuelsson, C., and Voutilainen, A. Comparing a linguistic and a stochastic tagger. In Proceedings of the Eighth Conference of the European Chapter of the Association for Computational Linguistics (Morristown, NJ, USA, 1997), Association for Computational Linguistics, pp. 246–253.
[5] Shannon, C. E. A mathematical theory of communication. The Bell System Technical Journal (1948), pp. 1–12.
THE DEMO
• Open a file
• View the user manual
• Tag a file
• Search for multiple occurrences of a word
• Insert a diacritic
• Copy and paste
• Save a file
• Exit the system
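The demo's "search for multiple occurrences of a word" step is not specified further. One way to do it, sketched under the assumption that matches are whole-word and case-insensitive and that both the text and the query are NFC-normalized first, is shown below. `find_occurrences` is a hypothetical name, and the returned offsets refer to the normalized text.

```python
import re
import unicodedata


def find_occurrences(text: str, word: str) -> list[int]:
    """Return start offsets of whole-word, case-insensitive matches of `word`
    in the NFC-normalized text."""
    text = unicodedata.normalize("NFC", text)
    word = unicodedata.normalize("NFC", word)
    # Lookarounds reject matches that sit inside a longer word.
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)", re.IGNORECASE)
    return [m.start() for m in pattern.finditer(text)]


# Hypothetical sample: "ḓa", "ṱa", "Ḓa", "ḓana"; only the 1st and 3rd match.
print(find_occurrences("\u1e13a \u1e71a \u1e12a \u1e13ana", "\u1e13a"))  # [0, 6]
```

Normalizing the query as well as the text means a word typed with a decomposed diacritic still finds its precomposed occurrences.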