Yup’ik language dictionary and processing software
Eric Somerville, Researcher
Dr. Frank Moore, Mentor
Office of Undergraduate Research and Scholarship, University of Alaska Anchorage

Introduction
• This dictionary project is the first step in a larger project to develop tools that indigenous Alaskans can use to help revitalize their languages.

Purpose
• To develop software to encourage Yup’ik writing using modern technologies.

Goals
• This project will develop data structures to define and store a digital dictionary for the Central Yup’ik language.
• This project will develop basic word-checking software to show the functionality of this dictionary.

Process
• Define Yup’ik word-forming grammar rules with clear algorithms
• Create data structures to store Yup’ik morphemes
• Develop word-checking software to test data structures and algorithms

Yup’ik Spell Checking
The algorithm I currently plan to use for this spell-checking software searches a series of trie data structures for matching morphemes and combines them using proper Yup’ik grammar. The process begins by checking the input against a list of bases, which returns a list of possible bases to be checked. Each base then goes through a process of adding appropriate postbases and endings until the input word is either completely formed or found to be outside the dictionary. Below is a figure outlining this process.

[Figure: flowchart of the word-checking process. Labels: Input String; Base Trie; no base found, return false; Base Guess; Ending Trie; ending found, return true; Postbase Trie; no postbase found, return false; Expanded Base = Base Guess + Postbase Guess.]

Modeling Yup’ik Grammar
The Yup’ik language is a polysynthetic language. This means that most Yup’ik words are formed using a base, zero or more postbases, and an appropriate ending. Each time a postbase is added to the base, the result is treated as an expanded base and can receive additional postbases to add meaning to the word.
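
The word structure just described (a base, zero or more postbases, then an ending), together with the trie search outlined under Yup’ik Spell Checking, can be sketched in Python. This is only an illustration under stated assumptions, not the project's implementation: the names Trie, prefix_ends, and check_word are hypothetical, the morphemes are made-up placeholder strings rather than Yup’ik, one postbase trie and one ending trie stand in for separate noun and verb tries, and morphemes are joined by plain concatenation with no morpho-phonological rules applied.

```python
class Trie:
    """Minimal trie: each node maps a letter to a child node."""

    def __init__(self, words=()):
        self.children = {}
        self.is_end = False  # True if a stored entry ends at this node
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self
        for letter in word:
            node = node.children.setdefault(letter, Trie())
        node.is_end = True

    def contains(self, word):
        node = self
        for letter in word:
            node = node.children.get(letter)
            if node is None:
                return False
        return node.is_end

    def prefix_ends(self, text, start):
        """Yield each index i where text[start:i] is a stored entry."""
        node = self
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                return
            if node.is_end:
                yield i + 1


def check_word(word, bases, postbases, endings):
    """Return True if word = base + zero or more postbases + ending."""

    def expand(pos):
        # word[:pos] is the current (expanded) base guess.
        if endings.contains(word[pos:]):
            return True  # ending found
        # Otherwise try every postbase that matches at this position.
        return any(expand(nxt) for nxt in postbases.prefix_ends(word, pos))

    # Every stored base that is a prefix of the word is a base guess.
    return any(expand(pos) for pos in bases.prefix_ends(word, 0))


# Placeholder morphemes (assumption: invented strings, not real Yup’ik).
bases = Trie(["ba", "bab"])
postbases = Trie(["po", "pi"])
endings = Trie(["en", "et"])

print(check_word("baen", bases, postbases, endings))      # True: base + ending
print(check_word("bapopien", bases, postbases, endings))  # True: two postbases
print(check_word("bapo", bases, postbases, endings))      # False: no ending
print(check_word("xyen", bases, postbases, endings))      # False: no base
```

Because prefix_ends yields every stored base that begins the input, both a shorter and a longer base guess are tried, so an input such as "baben" is still accepted through the longer base "bab" after the guess "ba" fails.
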
The bases can be defined as either nouns or verbs, and each can be classified into one of six morpho-phonological classes that help define how postbases and endings will be added.

What is a trie?
A trie is a type of tree data structure. A tree is made up of a series of linked nodes.

The lengthy explanation…
A tree structure begins with a head node, which provides the starting point for the tree. This head node contains the addresses of each of the nodes that branch from it, each of which is referred to as a child node. Each of these child nodes, in turn, has links to child nodes of its own, until all the data that needs to be found in the tree has been stored. The trie data structure is a tree that can easily be used to store and look up words in a dictionary. The head node points to each letter that can begin a word. Each of these letters points to the letters that may follow it. This process is repeated until the longest word in the dictionary has been defined. However, a figure would be the best way to describe how this data structure works. Here is a trie storing the English words DO, DOG, DOT, and COT.

The short explanation…
[Figure: trie with a Head Node storing the words DO, DOG, DOT, and COT.]

Revitalizing Central Yup’ik Language
The Central Yup’ik language, like Alaska Native languages all over the state, has been in decline for a generation or more. The younger generations of speakers are not learning and using the language of their ancestors. We may be able to reconnect with the next generation of speakers using modern technologies.

This is a brief table of nouns that Professor Marie Meade uses in her Yup’ik language classes here at the university. Included are two endings to show the notation used for postbases and endings.
• Notice that in these examples the vowel i was doubled when the class VI noun base panig- was combined with the unpossessed plural ending %:(e)t.
• Notice also how the first-person singular possessive (1s-s) ending -ka reverted to -qa after combining with the base-final r of qetunrar-.

Defining the symbols
The process of adding %:(e)t to the class VI base panig- is as follows:
• % - the final consonant of the class VI base is retained. The base remains panig-.
• (e) - the letter e is inserted for some bases, but not others; in this case, for class VI only. This gives us panig:et.
• : - when a velar marked with this character is surrounded by single vowels, the vowel-velar-vowel series is replaced with a vowel-vowel pair. This gives us paniit.

In the flowchart of the word-checking process, notice how postbases are added onto bases, forming extended bases, and how each extended base is then passed back to the relevant postbase and ending tries.

Five tries to get it right…
I currently plan to divide Yup’ik morphemes among the following tries:
• A base trie will store both noun and verb bases in a single structure.
• A verbal-adding postbase trie will contain all postbases that can be added to a verb base. Some of these postbases expand the verb and leave it verbal; others expand the verb and change it into a noun.
• Complementing this trie, a nominal-adding postbase trie will store all postbases that can be added to noun bases.
• The final two tries will store verb endings and noun endings.

Acknowledgments
Thank you, Theo Sery and Jeane Breinig of the UAA English department: Theo for helping me write about these projects, and Jeane for being supportive with credit. Thank you, Marie Meade and Nancy Furlow of Alaska Native Studies, for providing excellent education and support for Alaska Native peoples. Thank you, Frank Moore and Kenrick Mock in the Computer Science department, for providing excellent instruction and guidance. But thank you most of all to Herb Schroeder and everybody on the ANSEP team. It’s their community environment and financial support that enable me to continue school.
References
Jacobson, S. A. (1984). Yup’ik Eskimo Dictionary. Fairbanks, AK: Alaska Native Language Center.
Krauss, M. E. (1980). Alaska native languages: Past, present, and future. Fairbanks, AK: Alaska Native Language Center.
Opsahl, A. (Ed.). (2010). Alaska company wins $25.3 broadband stimulus grant. Retrieved from http://www.govtech.com/gt/742350
Reed, I., Miyaoka, O., Jacobson, S., Afcan, P., & Krauss, M. (1977). Yup’ik Eskimo Grammar. Fairbanks, AK: Alaska Native Language Center.

For Further Information
Please contact esomervi@uaa.alaska.edu. www.camai-ellamyui.com should become available over the summer.