Morphology For Marathi POS-Tagger. Veena Dixit, 11/10/2005.
Contents • Word • Morphology • Marathi Morphology - definition of the task and difficulties thereto. • Marathi Morphology - solutions to the challenges • Different word classes • Postpositions • Particles • Interjections • Conjunctions • Pronouns • Adjectives • Adverbs • Verbs • Nouns
Words are orthographic strings separated by spaces and some punctuation marks. • For syntax, words make up sentences; for morphology, a word has internal structure and different inflectional forms. • The inflectional forms of a root word form a paradigm based on a principle. • The root word is the form that is stored in lexicons / dictionaries.
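The definition of a word as a space- and punctuation-delimited string can be sketched as a small tokenizer. This is only an illustrative sketch, not the tagger's actual tokenizer: the function name `tokenize` and the punctuation set `PUNCTUATION` are assumptions, since the slides do not say which punctuation marks count as separators.

```python
# Punctuation assumed to delimit words; the slides do not list the actual set.
# The danda (।) is included because it can end sentences in Devanagari text.
PUNCTUATION = ",.;:!?\"'()।"


def tokenize(text):
    """Split text on whitespace, then trim punctuation from both ends of each piece."""
    tokens = []
    for piece in text.split():
        word = piece.strip(PUNCTUATION)
        if word:  # skip pieces that consisted of punctuation only
            tokens.append(word)
    return tokens


# Example sentence taken from the Conjunctions slide.
sentence = "ज्यापासून काल सुरुवात केली, त्यापासून आज नक्कीच सुरुवात करायला नको."
print(tokenize(sentence))
```

Splitting on whitespace and trimming a fixed character set avoids regular-expression word classes such as `\w`, which in Python's `re` module do not match Devanagari vowel signs and would cut words apart.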
What is Morphology? • Morphology is the study of the forms of words in a language, especially the different forms used in declensions, conjugations, and word building. It deals with morphemes. • A morpheme is the smallest component of a word that (a) contributes some sort of meaning or a grammatical function to the word to which it belongs, and (b) cannot be decomposed into smaller morphemes.
Marathi Morphology - definition of the task and difficulties thereto • Morphological analysis of Marathi plays a significant role in natural language processing because Marathi, a pan-Indian language, is rich in morphology. • Marathi, being the language of a centrally situated area, is influenced by almost all language groups of India. • This makes Marathi morphology more complicated.
Marathi Morphology - solutions to the challenges • Morphological analysis is done category-wise. • Parameters for changes in the root word are identified. • Rules are constructed in tabular form to facilitate computation.
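One way to picture rules in tabular form is a list of rows, each naming a word class, the ending a root must have, and what replaces it. The slides do not show the actual rule format, so the `Rule` fields, the `RULE_TABLE` rows, and `apply_rules` below are all hypothetical; the two rows are built from forms given on the Adverbs and Adjectives slides.

```python
from typing import List, NamedTuple


class Rule(NamedTuple):
    word_class: str   # category the rule applies to (analysis is category-wise)
    ending: str       # ending the root word must have
    replacement: str  # string that replaces that ending


# Hypothetical rows; each is derived from a single example in the slides.
RULE_TABLE = [
    Rule("adverb", "ी", "पासून"),  # खाली -> खालपासून (khaali -> khaalapaasuna)
    Rule("adjective", "ा", "ी"),   # मोकळा -> मोकळी (mokaLaa -> mokaLi)
]


def apply_rules(root: str, word_class: str) -> List[str]:
    """Return every form produced by the rows that match this root and class."""
    forms = []
    for rule in RULE_TABLE:
        if rule.word_class == word_class and root.endswith(rule.ending):
            stem = root[: len(root) - len(rule.ending)]
            forms.append(stem + rule.replacement)
    return forms


print(apply_rules("खाली", "adverb"))
print(apply_rules("मोकळा", "adjective"))
```

Keeping the rules as data rather than code means new rows can be added per category without touching the lookup logic, which is one plausible reading of "to facilitate computation".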
Marathi Word Classes • Nouns • Pronouns • Adjectives • Verbs • Adverbs • Postpositions • Conjunctions • Interjections • Particles • Punctuation Marks
Postpositions • A postposition is a morpheme that follows a word and shows the relation between that word and other words in the sentence. • Case markers and shabdayogi avyaya are classified as postpositions in Marathi because they show the same behavior. (ref. ‘Classification of Words’, Veena Dixit, proceedings of 26th AICL, Shillong, 2004)
Postpositions (continued) • In Marathi, postpositions are attached to all classes of words except interjections. • When a postposition is attached to a stem, it mainly produces an adverb, but it can also produce an adjective or a conjunction. • Postpositions are handled along with other word classes. • Five subgroups of postpositions are identified on the basis of the possible order of their attachment and the group of words to which they can be attached.
Particles • Strings like ही – hi_also, च – cha_only, सुद्धा – suddhaa_also are • sometimes attached to other words (e.g. खाली – khaali_under – खालीसुद्धा – khaalisuddhaa_under also / झाड – jhaaDa_tree – झाडसुद्धा – jhaaDasuddhaa_tree also) • or sometimes written separately (e.g. झाडाखाली – jhaaDaakhaali_under the tree – झाडाखालीसुद्धा – jhaaDaakhaalisuddhaa_under the tree also). • When such a particle is attached to another word, the word to which it is attached is not inflected.
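Because the host word is not inflected when a particle attaches, an attached particle can in principle be separated by checking whether the remainder is a known word. The sketch below illustrates that idea only; `split_particle` and the two-entry `LEXICON` are hypothetical, and a real system would consult the full lexicon.

```python
# Particles named in the slide: सुद्धा (suddhaa), ही (hi), च (cha).
PARTICLES = ("सुद्धा", "ही", "च")

# Hypothetical mini-lexicon of words a particle may attach to.
LEXICON = {"झाड", "खाली"}


def split_particle(token):
    """Return (base, particle); particle is None when no split is licensed."""
    for particle in PARTICLES:
        base = token[: len(token) - len(particle)]
        # Split only if the remainder is a known word, so that ordinary
        # words that merely end in च or ही are left intact.
        if token.endswith(particle) and base in LEXICON:
            return base, particle
    return token, None


print(split_particle("झाडसुद्धा"))
print(split_particle("खालीसुद्धा"))
print(split_particle("झाड"))
```

The lexicon check matters most for the short particles: a bare suffix test on च or ही would split many words that simply happen to end in those letters.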
Interjections • Interjections are identified from the lexicon and stored to produce the tag.
Conjunctions • Conjunctions are identified from the lexicon and stored to produce the tag. • Morphology also plays a role in the case of conjunctions.
Conjunctions (continued) • When some Marathi postpositions are attached to a pair of demonstrative pronouns, they produce a pair of conjunctions in some instances. जो – ज्यापासून (jo – jyaapaasuna --- which – from which) तो – त्यापासून (to – tyaapaasuna --- that – from that) ज्यापासून काल सुरुवात केली, त्यापासून आज नक्कीच सुरुवात करायला नको. – jyaapaasuna kaala suruvaata keli, tyaapaasuna aaja nakkicha suruvaata karaayalaa nako_One should not start today from the (same point) from which it was started yesterday.
Pronouns • The number of inflected forms of a pronoun and the number of rules describing such inflection are almost equal. • The pronouns and their inflected forms are finite in number and few compared with those of verbs and nouns. • All inflected forms of the pronouns will be stored to produce the tag for pronouns. • Derivational morphology of pronouns is handled with rules.
Pronouns (continued) • Inflectional forms of pronouns act either as adjectives (माझा – maajhaa_my), as adverbs (मला – malaa_to me), or as conjunctions (जो – ज्यापासून (jo – jyaapaasuna --- which – from which), तो – त्यापासून (to – tyaapaasuna --- that – from that)).
Pronouns (continued) • Altogether, 29 pronouns have 526 inflectional forms, which are either words or stems. • 21 paradigms are identified, generating several rules.
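Since the inflected pronoun forms are few enough to be stored outright, tagging a pronoun can be pictured as a table lookup. The sketch below is illustrative only: the tag name `PRON`, the function `tag_pronoun`, and the table layout are assumptions, and the table holds just the four forms quoted in these slides rather than the full set of 526. The root मी (I) for माझा and मला is supplied here for illustration; it is not stated in the slides.

```python
# Stored inflected form -> (root pronoun, word class the form acts as).
PRONOUN_FORMS = {
    "माझा": ("मी", "adjective"),         # maajhaa, "my"
    "मला": ("मी", "adverb"),             # malaa, "to me"
    "ज्यापासून": ("जो", "conjunction"),  # jyaapaasuna, "from which"
    "त्यापासून": ("तो", "conjunction"),  # tyaapaasuna, "from that"
}


def tag_pronoun(token):
    """Look the token up among the stored forms; return None if it is absent."""
    entry = PRONOUN_FORMS.get(token)
    if entry is None:
        return None
    root, acts_as = entry
    # "PRON" is a placeholder tag name, not the tagger's actual tagset.
    return {"tag": "PRON", "root": root, "acts_as": acts_as}


print(tag_pronoun("मला"))
print(tag_pronoun("ज्यापासून"))
```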
Adjectives • Adjectives are mainly of two kinds: inflectional and non-inflectional. • Adjectives inflect for gender, number, and the attachment of a postposition to the noun they modify. • Adjectives in Marathi agree in gender and number with the nouns they modify.
Adjectives (continued) • All inflectional adjectives belong to one paradigm, which corresponds to several rules for generating inflectional and derivational forms from an adjective. • Most ‘aa’-ending adjectives agree with masculine nouns in their base form and are further inflected according to the gender and number of the noun they modify (मोकळा / मोकळी / मोकळे / मोकळ्या – mokaLaa / mokaLi / mokaLe / mokaLyaa_empty). • There are some exceptions to this rule, such as जादा – jaadaa_extra, नाना – naanaa_different, वाया – vaayaa_wasted.
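The single adjective paradigm and its exceptions can be sketched as follows. This is a minimal illustration that covers only the four forms listed in the slide; the names `adjective_forms`, `ENDINGS`, and `INVARIANT` are hypothetical, and treating every adjective that does not end in 'aa' as non-inflectional is a simplifying assumption.

```python
# Exceptions named in the slide: 'aa'-ending adjectives that do not inflect.
INVARIANT = {"जादा", "नाना", "वाया"}

AA = "ा"  # Devanagari vowel sign for 'aa'

# Endings seen in मोकळा / मोकळी / मोकळे / मोकळ्या: -aa, -i, -e, -yaa.
ENDINGS = ("ा", "ी", "े", "्या")


def adjective_forms(adjective):
    """Return the listed paradigm forms, or the word alone if it does not inflect."""
    if adjective in INVARIANT or not adjective.endswith(AA):
        return [adjective]
    stem = adjective[: -len(AA)]  # drop the final 'aa' vowel sign
    return [stem + ending for ending in ENDINGS]


print(adjective_forms("मोकळा"))
print(adjective_forms("जादा"))
```

The exception set is checked before the ending test because जादा, नाना, and वाया all end in 'aa' and would otherwise be inflected by mistake.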
Adverbs • Adverbs are mainly of two kinds: inflectional and non-inflectional. • Adverbs inflect for the attachment of postpositions (खाली – khaali_under --- खालपासून – khaalapaasuna_from underneath).
Verbs and nouns will be discussed in the next sessions. Thank you. Veena Dixit, 11/10/2005