Morphological Analysis of Hungarian in NooJ
Peter Vajda, Hungarian Academy of Sciences, Research Institute for Linguistics
Summary
• Hungarian morphology
• Linguistic resources
• Some experiments with INTEX/NooJ
• The solution
• Examples
• Derivation
Hungarian morphology
• Agglutinative (and sometimes inflectional)
• The suffixes:
  • can have many forms (vowel harmony)
  • can change the form of the stem (stems fall into groups of variants): bokor (sg.) → bokr-ok (pl.); alma (sg.) → almá-k (pl.)
  • sometimes begin with a linking vowel, e.g. the plural: -k / -ak / -ek / -ok / -ök
• A noun (or an adjective or numeral) can have about 700 to 800 forms
• A verb can have about 80 forms
• Orthography: doubled digraphs are difficult, because only their first letter is doubled: cs doubled is written ccs (not cscs), gy doubled is ggy (not gygy)
Nominal inflections
• 18 cases (nominative, accusative, dative, plus grammatical relations that are expressed by prepositions in French or English)
• Possession is expressed by suffixes, which mark the person and number of the possessor and the number of the possessed:
  • ház-a-m, ház-a-d, ház-a (my/your/his house)
  • ház-a-i-m, ház-a-i-d, ház-a-i (my/your/his houses)
• Anaphoric possessive:
  • A ház Péteré. (The house is Péter's.)
  • A házak Péteréi. (The houses are Péter's.)
• A word form can carry up to five inflectional suffixes:
  • barát-ai-tok-é-i-t: (I can see) those (things) of your friends
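The possessive forms above are built by concatenating morphemes in a fixed order. A minimal sketch, assuming a hypothetical person-suffix table that covers only the singular possessors shown on the slide; the glosses for the five-suffix example are our reading of the segmentation, not the author's tag set.

```python
# Hypothetical table: possessor person (singular) -> personal suffix
PERSON_SUFFIX = {"1sg": "m", "2sg": "d", "3sg": ""}


def possessed(stem: str, link: str, person: str, plural_possessed: bool = False) -> str:
    """Build a segmented possessive form such as ház-a-i-m."""
    parts = [stem, link]           # stem + possessive/linking vowel
    if plural_possessed:
        parts.append("i")          # -i- marks several possessed things
    if PERSON_SUFFIX[person]:
        parts.append(PERSON_SUFFIX[person])
    return "-".join(parts)


# The five-suffix example, segmented as on the slide
CHAIN = [
    ("barát", "friend (stem)"),
    ("ai", "possessed plural"),
    ("tok", "2nd person plural possessor"),
    ("é", "anaphoric possessive"),
    ("i", "plural of the possessed things"),
    ("t", "accusative"),
]

print(possessed("ház", "a", "1sg"), possessed("ház", "a", "1sg", plural_possessed=True))
print("-".join(m for m, _ in CHAIN), "suffixes:", len(CHAIN) - 1)
```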
Verbal inflections
• Two tenses: present and past
• Three moods: indicative, conditional, imperative
• Definite and indefinite conjugations:
  • Néz-ek egy asztalt. (I watch a table.)
  • Néz-em az asztalt. (I watch the table.)
• One special form, used when the subject is in the 1st person and the object in the 2nd:
  • néz-lek (I watch you)
• Infinitive and "conjugated infinitive" (sometimes rendered by a subjunctive in French)
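A table-driven sketch of the definite/indefinite contrast in the present indicative. It assumes a front-vowel stem ending in z, like néz, and hard-codes its endings; the names `ENDINGS` and `conjugate` are hypothetical, and other tenses, moods and stem types are not covered.

```python
# Present indicative endings for a front-vowel, z-final stem such as néz.
# Keyed by definiteness, then by subject person/number.
ENDINGS = {
    False: {"1sg": "ek", "2sg": "el", "3sg": "", "1pl": "ünk", "2pl": "tek", "3pl": "nek"},
    # 1pl definite: the j of -jük assimilates to the stem-final z (néz-zük)
    True: {"1sg": "em", "2sg": "ed", "3sg": "i", "1pl": "zük", "2pl": "itek", "3pl": "ik"},
}


def conjugate(stem: str, person: str, definite: bool = False, object_2nd: bool = False) -> str:
    """Return a segmented present-tense form of a néz-type verb."""
    if object_2nd:
        # Special form: 1sg subject acting on a 2nd-person object
        if person != "1sg":
            raise ValueError("the -lek form requires a 1sg subject")
        return stem + "-lek"
    ending = ENDINGS[definite][person]
    return f"{stem}-{ending}" if ending else stem


print(conjugate("néz", "1sg"), conjugate("néz", "1sg", definite=True),
      conjugate("néz", "1sg", object_2nd=True))
```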
The resources
• Dictionary of Hungarian inflections (Elekfi, '92)
  • A traditional description, profound and exhaustive
  • Two-dimensional classification:
    • vowel harmony (3 classes) and
    • complex features of the stems (stem types, linking vowel, etc.; 55 classes)
  • Altogether 1,700 different sub-classes (paradigms)
  • Systematic differences and similarities are hidden
  • Not convenient to use in finite-state transducers
• We have converted it into a database from which all the forms can be retrieved
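A sketch of what retrieving forms from such a database might look like, using Python's built-in `sqlite3`. The table layout (`forms` with harmony class, stem class, features and surface form) is a hypothetical schema; the sample rows reuse the C2A class and the analyses from the suffix dictionary on the next slide.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    """Create an in-memory table of inflected forms (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE forms (lemma TEXT, harmony TEXT, stem_class TEXT, features TEXT, form TEXT)"
    )
    rows = [
        ("ház", "back", "C2A", "PL", "házak"),
        ("ház", "back", "C2A", "ACC", "házat"),
        ("ház", "back", "C2A", "PSe1", "házam"),
        ("ház", "back", "C2A", "PSe1+ACC", "házamat"),
    ]
    conn.executemany("INSERT INTO forms VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def all_forms(conn: sqlite3.Connection, lemma: str) -> list[tuple[str, str]]:
    """All (features, form) pairs stored for one lemma."""
    cur = conn.execute(
        "SELECT features, form FROM forms WHERE lemma = ? ORDER BY features", (lemma,)
    )
    return cur.fetchall()


def paradigms(conn: sqlite3.Connection) -> list[tuple[str, str]]:
    """Distinct (harmony, stem class) combinations, i.e. the paradigms in use."""
    return conn.execute("SELECT DISTINCT harmony, stem_class FROM forms").fetchall()


conn = build_db()
print(all_forms(conn, "ház"))
print(paradigms(conn))
```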
The experiments with INTEX/NooJ
• "Brute-force" method
  • We created one graph per sub-class to test INTEX
  • 1,700 sub-graphs
  • 45,000 paths in the graphs…
• Using only dictionaries (.nod)
  • Dictionary of stems (70,000 words):
    ház,ház,N+C2A+stem=1+NW
  • Dictionary of suffixes (one million entries):
    (*)ak,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=PL}
    (*)am,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=PSe1}
    (*)at,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=ACC}
    (*)at,<$1=N+C2A1+stem=1>{$0,$1L,N$1S+ana=ACC}
    (*)amat,<$1=N+C2A+stem=1>{$0,$1L,N$1S+ana=PSe1+ACC}
  • Dictionary of lexical forms (forms whose suffix is a zero morpheme):
    ház,ház,N+ana=NOM
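The logic behind these dictionaries can be mimicked in plain Python: split the word into every stem + suffix pair, and accept a suffix only if the stem carries all the features its entry requires. This is an illustrative sketch, not how NooJ's engine actually evaluates the `<$1=...>` constraints; the data structures are hypothetical.

```python
# Stem dictionary: stem -> its features (from ház,ház,N+C2A+stem=1+NW)
STEMS = {"ház": {"N", "C2A", "stem=1", "NW"}}

# Suffix dictionary: (suffix, features the stem must carry, analysis)
SUFFIXES = [
    ("ak", {"N", "C2A", "stem=1"}, "PL"),
    ("am", {"N", "C2A", "stem=1"}, "PSe1"),
    ("at", {"N", "C2A", "stem=1"}, "ACC"),
    ("at", {"N", "C2A1", "stem=1"}, "ACC"),
    ("amat", {"N", "C2A", "stem=1"}, "PSe1+ACC"),
]


def analyze(word: str) -> list[tuple[str, str]]:
    """Return (stem, analysis) pairs licensed by the two dictionaries."""
    results = []
    if word in STEMS:
        results.append((word, "NOM"))  # lexical form with a zero suffix
    for i in range(1, len(word)):
        stem, suffix = word[:i], word[i:]
        stem_features = STEMS.get(stem)
        if stem_features is None:
            continue
        for suf, required, ana in SUFFIXES:
            # The suffix entry applies only if the stem has every required feature
            if suf == suffix and required <= stem_features:
                results.append((stem, ana))
    return results


for w in ["ház", "házak", "házamat", "házok"]:
    print(w, analyze(w))
```

The cost of this approach is visible even here: each suffix must be listed once per compatible stem class, which is how the suffix dictionary grows to a million entries.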
The linguistic solution
• Transform the database into a grammar based on morpho-phonological features
  • The grammatical features of stems and morphemes are stored in the dictionary
  • The features of the stems and of the suffixes can be unified
• Grammar:
  • describe the order of the morphemes
  • introduce features that select among the allomorphs
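Feature unification, in its simplest attribute-value form, can be sketched as follows: two feature sets combine if they never assign different values to the same attribute. The attribute names (`cat`, `acc`, `case`) are hypothetical; NooJ's own feature syntax differs.

```python
from typing import Optional

Features = dict[str, str]


def unify(a: Features, b: Features) -> Optional[Features]:
    """Merge two flat feature structures; return None on a value clash."""
    result = dict(a)
    for key, value in b.items():
        if key in result and result[key] != value:
            return None  # conflicting values: unification fails
        result[key] = value
    return result


stem_hz = {"cat": "N", "acc": "link"}           # ház: accusative with linking vowel
suffix_at = {"cat": "N", "acc": "link", "case": "ACC"}   # -at requires acc=link
suffix_t = {"cat": "N", "acc": "nolink", "case": "ACC"}  # -t requires acc=nolink

print(unify(stem_hz, suffix_at))  # succeeds
print(unify(stem_hz, suffix_t))   # None: -t cannot attach to ház
```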
The order of morphemes for nominals
• barát-a-i-tok-é-i-t
• barát,N +PS +PL +ps_2 +ps_pl +ANAP+i +ACC
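One way to make the morpheme order explicit is to assign each tag a slot and require slots to increase from left to right. A sketch, assuming our reading of the tag string above (possessive, possessed plural, possessor person, possessor number, anaphoric possessive, its plural, case); the slot numbers and the case subset are hypothetical.

```python
import re

FIXED_SLOTS = {"PS": 0, "PL": 1, "ANAP": 4, "i": 5}
PERSON = {"ps_1", "ps_2", "ps_3"}
NUMBER = {"ps_sg", "ps_pl"}
CASES = {"NOM", "ACC", "DAT"}  # hypothetical subset of the 18 cases


def slot(tag: str) -> int:
    """Position class of a tag in the nominal template."""
    if tag in FIXED_SLOTS:
        return FIXED_SLOTS[tag]
    if tag in PERSON:
        return 2
    if tag in NUMBER:
        return 3
    if tag in CASES:
        return 6
    raise ValueError(f"unknown tag: {tag}")


def parse_tags(analysis: str) -> list[str]:
    """Extract the +TAG sequence from a line like 'barát,N +PS +PL ...'."""
    _, info = analysis.split(",", 1)
    return re.findall(r"\+(\w+)", info)


def well_ordered(analysis: str) -> bool:
    """True if the tags follow the template order strictly."""
    slots = [slot(t) for t in parse_tags(analysis)]
    return all(a < b for a, b in zip(slots, slots[1:]))


print(well_ordered("barát,N +PS +PL +ps_2 +ps_pl +ANAP+i +ACC"))
print(well_ordered("barát,N +PL +PS +ACC"))
```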
Morpho-phonological features
To introduce features, we examine the allomorphs:
• HÁZ (house) vs. HAJÓ (ship)
• Possessive (3rd person sg.): HÁZ-A vs. HAJÓ-JA
  ház,,N+nonj / hajó,,N+j
• Accusative: HÁZ-AT vs. HAJÓ-T
  ház,,N+nonj+acclink / hajó,,N+j+accnolink
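The features in these entries can drive allomorph selection directly. A minimal sketch that parses the two dictionary lines and picks -a/-ja and -at/-t from the `j`/`nonj` and `acclink`/`accnolink` features; it assumes back-vowel stems whose linking vowel is a, as in ház, and the function names are hypothetical.

```python
def parse_entry(line: str) -> tuple[str, str, set[str]]:
    """Split 'ház,,N+nonj+acclink' into lemma, category and feature set."""
    lemma, _lemma_field, info = line.split(",")
    cat, *features = info.split("+")
    return lemma, cat, set(features)


def possessive_3sg(line: str) -> str:
    # +j selects the -ja allomorph; otherwise plain -a (back-vowel stems only)
    lemma, _, feats = parse_entry(line)
    return lemma + ("-ja" if "j" in feats else "-a")


def accusative(line: str) -> str:
    # +acclink selects the linking-vowel allomorph -at; otherwise bare -t
    lemma, _, feats = parse_entry(line)
    return lemma + ("-at" if "acclink" in feats else "-t")


for entry in ["ház,,N+nonj+acclink", "hajó,,N+j+accnolink"]:
    print(possessive_3sg(entry), accusative(entry))
```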
The plural and the accusative
• kalap-ot (hat, SG+ACC)
• kalap-ok-at (hats, PL+ACC)
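The example shows that the accusative allomorph depends on the preceding morpheme, not only on the stem: -ot directly after kalap, but -at after the plural -ok. A minimal sketch for this one stem type (consonant-final, back-vowel, non-lowering); the function name is hypothetical and other stem classes are out of scope.

```python
def inflect_kalap_type(stem: str, plural: bool = False, accusative: bool = False) -> str:
    """Segmented form for a kalap-type stem (back vowels, non-lowering)."""
    parts = [stem]
    if plural:
        parts.append("ok")
    if accusative:
        # After the plural -k the accusative always takes the lowered link -at
        parts.append("at" if plural else "ot")
    return "-".join(parts)


print(inflect_kalap_type("kalap", accusative=True))
print(inflect_kalap_type("kalap", plural=True, accusative=True))
```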
Derivation
• Can change the category (POS) or leave it unchanged
• Introduces new features:
  • kosár (basket), pl. kosar-ak
  • kosar-as (basketball player), pl. kosar-as-ok
• Simple cases are handled by graphs
• Others are listed as lemmas in the dictionary
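A sketch of derivation as an operation that builds a new dictionary entry: the suffix attaches to the stem variant (kosar, not kosár), the output category is a parameter, and the derived entry carries its own features, here just its plural allomorph. The `Entry` record and its fields are hypothetical, not NooJ's representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    lemma: str
    cat: str
    segments: tuple[str, ...]   # stem form(s) used before inflectional suffixes
    plural_link: str            # plural allomorph selected by the entry's features
    features: frozenset = frozenset()


def derive(base: Entry, suffix: str, out_cat: str, plural_link: str,
           new_features: frozenset = frozenset()) -> Entry:
    """Create a derived entry; its features are set anew, not inherited."""
    segments = base.segments + (suffix,)
    return Entry(
        lemma="".join(segments),
        cat=out_cat,
        segments=segments,
        plural_link=plural_link,
        features=frozenset(new_features),
    )


def plural(entry: Entry) -> str:
    return "-".join(entry.segments + (entry.plural_link,))


basket = Entry(lemma="kosár", cat="N", segments=("kosar",), plural_link="ak")
player = derive(basket, "as", "N", "ok", frozenset({"DER=as"}))
print(plural(basket), player.lemma, plural(player))
```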
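Assimilation and digraphs
• Some suffixes (e.g. -val/-vel) enforce total assimilation: their initial v becomes a copy of the stem-final consonant:
  • LÉC + VEL → LÉCCEL
  • PÉCS + VEL → PÉCCSEL (the doubled digraph cs is written ccs)
  • PLÉD + VEL → PLÉDDEL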
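The assimilation rule, including the digraph spelling, can be sketched as a small string function. Assumptions: lowercase input, harmony reduced to "any back vowel selects -val", and no handling of stems already ending in a long consonant or in the trigraph dzs; the function name is hypothetical.

```python
BACK = set("aáoóuú")
VOWELS = set("aáeéiíoóöőuúüű")
DIGRAPHS = ("cs", "dz", "gy", "ly", "ny", "sz", "ty", "zs")


def with_instrumental(stem: str) -> str:
    """Attach -val/-vel, applying total assimilation after consonants."""
    tail = "al" if any(c in BACK for c in stem) else "el"  # suffix minus its v
    last = stem[-1]
    if last in VOWELS:
        # No assimilation after a vowel; final a/e lengthen: alma -> almával
        return stem[:-1] + {"a": "á", "e": "é"}.get(last, last) + "v" + tail
    for dg in DIGRAPHS:
        if stem.endswith(dg):
            # A doubled digraph doubles only its first letter: cs + cs -> ccs
            return stem[:-2] + dg[0] + dg + tail
    # v copies a single final consonant: léc -> léccel
    return stem + last + tail


for s in ["léc", "pécs", "pléd", "ház", "hajó"]:
    print(s, "->", with_instrumental(s))
```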
Conclusion
• We have adapted the traditional description
• We have described the inflectional morphology of Hungarian in NooJ grammars and dictionaries
• We have handled some of the derivational morphology
• Objectives:
  • find a simpler method for derivation
  • disambiguation
  • automatic methods to expand the dictionary
  • automatic delegation of features