Mental Representation and Access of Polymorphemic Words in Bangla: Evidence from Cross-modal Priming Experiments. Tirthankar Dasgupta, IIT Kharagpur; Monojit Choudhury, Microsoft Research India; Kalika Bali, Microsoft Research India; Anupam Basu, IIT Kharagpur
Mental Representation and Access of Polymorphemic Words in Bangla: Evidence from Cross-modal Priming Experiments • Tirthankar Dasgupta, IIT Kharagpur • Monojit Choudhury, Microsoft Research India • Kalika Bali, Microsoft Research India • Anupam Basu, IIT Kharagpur • iamtirthankar@gmail.com • International Conference on Natural Language Processing (ICON 2010)
Introduction: The Mental Lexicon (ML) • Refers to the representation, organization and access of words in the human mind, and to the various associations among them that support fast retrieval and comprehension
Facts about Adult Vocabulary • Native speakers can recognize a word of their language in 200 ms or less • They can reject a non-word in about 500 ms • Depending on the definition of a word, an adult knows and uses 40,000 to 150,000 words
The Nature of Organization • What words come to mind when I say “Bat”? • Cat, Bath, Ball, Batsman, Sachin, …
Levels of Association • Phonology • Semantics • Orthography • Morphology • The precise nature of this activation is unknown
Representation of Polymorphemic Words • Mental representation and access mechanisms of polymorphemic words • Unanalyzed full form or, • Decomposition into constituent morphemes
Current Models of Morphological Decomposition • Full Listing Model (Bradley, 1980; Butterworth, 1983) • Morphemic Model (Taft and Forster, 1975; Taft, 1981; MacKay, 1978) • Partial Decomposition Model (Stanners, 1979; Caramazza, 1988)
Full Listing Model • Polymorphemic words are represented as whole words • Lexicon entries: word: word, words: words, …, computerisation: computerisation • computerisation → search → retrieve
Morphemic Model • Morphologically complex words are decomposed and represented in terms of smaller morphemic units • computerisation → decompose → compute + er + is + tion → understand • compute + er + is + tion → synthesize → computerisation
Full Listing vs Morphemic Model • Morphemic Model: • Easier to recognize a word when preceded by a morphologically related word • Manly → Man vs. Manner → Man: response time is faster for Manly → Man • Full Listing: • Recognition does not depend on previous stimuli • Manly → Man vs. Manner → Man: response time is the same in both cases
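The contrast between the two models can be sketched in a few lines of Python. This is a toy illustration, not the authors' model: the lexicon entries, the single suffix "ly", and the function names are all assumptions made for the example. Priming is predicted only when prime and target reach the same stored entry.

```python
# Toy lexicons; entries and the suffix list are illustrative assumptions.
FULL_LISTING = {"man", "manly", "manner"}   # every form is stored whole
ROOTS = {"man", "manner"}                   # morphemic model stores roots...
SUFFIXES = ("ly",)                          # ...and affixes separately


def access_full_listing(word):
    """Full listing: each word is its own entry; no shared root is accessed."""
    return word if word in FULL_LISTING else None


def access_morphemic(word):
    """Morphemic model: strip a known suffix and access the stored root."""
    if word in ROOTS:
        return word
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in ROOTS:
            return stem
    return None


def predicts_priming(prime, target, access):
    """Priming is predicted when prime and target reach the same entry."""
    entry = access(prime)
    return entry is not None and entry == access(target)
```

Under these assumptions, `predicts_priming("manly", "man", access_morphemic)` is true, while the same call with "manner" as the prime, or with `access_full_listing` for either prime, is false.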
Partial Decomposition Model • Different types of morphological forms processed differently. • Derived morphological forms are represented as a whole, • Representation of the inflected forms follows the morphemic model
Priming • An increase in the speed or accuracy of the response to a target that results from prior exposure to a prime • Thus, “Mother” (the Target) can be recognized faster when the subject has recently been exposed to “Motherhood” (the Prime) than to an unrelated word such as “House”
Hypothesis for testing the Morphemic Model • When a given target word is preceded by a morphologically related prime, then the time required to recognize the word (aka reaction time or RT) is significantly lower than if the prime is morphologically unrelated
Evidence from English (Marslen-Wilson et al., 1994) • Clear evidence of morphological decomposition • Stronger Priming is observed for morphologically related pairs • Weak priming effects (due to phonological or other associations) observed for morphologically unrelated pairs
Motivation • Existing studies of the organization and access of polymorphemic words have focused primarily on English • No such investigation exists for Indian languages, which are morphologically richer than many Indo-European languages
Morphological Processes in Bangla • Rich and productive morphology • Suffix stacking • chheleTAkei → chhele + TA + ke + i • Rich derivational morphology • Inherited from Sanskrit, Persian and English • Abundance of compounding
Objective • We want to investigate whether the full/partial decomposition model holds for the Bangla mental lexicon • Therefore we test the following hypothesis: “When a given target word is preceded by a morphologically related prime, then the reaction time (RT) is significantly lower than if the prime is morphologically unrelated”
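The suffix-stacking example can be made concrete with a small right-to-left suffix stripper. This is a minimal sketch under strong assumptions: the suffix inventory and root list contain only the romanized forms from the slide, and the greedy strategy would need a real lexicon and ordering constraints to handle Bangla in general.

```python
# Hypothetical inventory, limited to the slide's romanized example.
SUFFIXES = ("TA", "ke", "i")
ROOTS = {"chhele"}


def segment(word):
    """Peel stacked suffixes off the right edge until a known root remains."""
    stripped = []
    stem = word
    while stem not in ROOTS:
        for suffix in SUFFIXES:
            if stem.endswith(suffix) and len(stem) > len(suffix):
                stem = stem[: -len(suffix)]
                stripped.append(suffix)
                break
        else:
            return None  # no known root reached and nothing left to strip
    # Suffixes were collected from the outside in, so reverse them.
    return [stem] + stripped[::-1]
```

With this toy inventory, `segment("chheleTAkei")` yields the root followed by the three suffixes in surface order, and a word with no recognizable root returns `None`.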
Experiment Design • Cross modal priming • Subject hears a word (the prime) • Immediately at the offset of this word sees a visual probe (the target) • Morphologically and/or Phonologically related • Subject has to make a lexical decision: whether it is a valid Bangla word or not • Response time is recorded
Example • Prime condition: audio input nibAsi (resident), then visual probe (target) nibAsa (residence) • Control condition: audio input sAtAru (swimmer), then the same visual probe nibAsa (residence) • The figure marks a 200 ms interval • Lexical decision on each probe: valid or invalid • Response time (RT) is recorded in both conditions and the two RTs are compared
Our Experiment: Materials • 84 prime-target pairs • Each class contains 28 prime-target pairs
Controls • For each prime word, we select a control word that matches the prime in frequency and number of syllables • None of the controls are morphologically related to the prime • The priming effect is measured by comparing the RT to the target word after the prime with the RT to the same target after the control
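The comparison of RTs can be written as a simple difference of means. The sketch below is only an illustration of that arithmetic: the function name is hypothetical and the reaction times are invented, not data from the experiment.

```python
from statistics import mean


def priming_effect(rt_after_control_ms, rt_after_prime_ms):
    """Mean RT after control minus mean RT after prime.

    A positive value means the target was recognized faster after the prime.
    """
    return mean(rt_after_control_ms) - mean(rt_after_prime_ms)


# Invented reaction times (ms) for one target, for illustration only.
control_rts = [640, 650, 630]
prime_rts = [600, 610, 590]
effect = priming_effect(control_rts, prime_rts)
```

A positive `effect` corresponds to facilitation by the prime; a value near zero corresponds to no priming.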
Fillers • Non words • No patterns • Deter subjects from developing strategies • Expectation about relation between prime and target • Dilute the proportion of related items • Obscure regularities in test items
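How fillers dilute the proportion of related items can be illustrated with a short sketch. The trial labels, the fixed seed, and the helper names are assumptions for the example; the slides do not state how many fillers were used or how trials were ordered.

```python
import random


def build_trial_list(related, control, fillers, seed=0):
    """Mix test and filler trials and shuffle them to hide any pattern."""
    trials = (
        [("related", item) for item in related]
        + [("control", item) for item in control]
        + [("filler", item) for item in fillers]
    )
    random.Random(seed).shuffle(trials)  # seeded so the order is repeatable
    return trials


def related_proportion(trials):
    """Share of trials in which the prime is related to the target."""
    return sum(kind == "related" for kind, _ in trials) / len(trials)
```

Adding more fillers lowers `related_proportion`, which makes it harder for subjects to form expectations about the prime-target relation.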
Subjects • 14 highly educated (graduate or post-graduate) native speakers • Ages range from 22 to 35 • 4 males, 10 females
Average Reaction Time (RT) • [Table: Average RT for the word classes and the p-values]
Sign Test Analysis of Word Pairs • Subjects were unable to recognize the morphological connection between certain derivationally suffixed pairs in the [M+P-] class: • suhRRida (friend) – souhArdya (friendship) • uchit (appropriate) – auchitya (appropriateness) • hatyA (murder) – hi.nsA (violence) • [Table: Classification of the sign test values according to classes and sign score ranges]
Error Rates • An error is an incorrect judgment about the validity of a word, or a wrong selection made despite a correct judgment • Error rates and RT for non-words are higher than for valid words
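For reference, a standard two-sided exact sign test on paired RTs can be sketched as follows. This is the textbook procedure, offered as an assumption about what such an analysis looks like; the slides do not specify how the sign scores or their ranges were computed.

```python
from math import comb


def sign_test_p_value(control_rts, prime_rts):
    """Two-sided exact sign test on paired RTs; tied pairs are dropped."""
    diffs = [c - p for c, p in zip(control_rts, prime_rts) if c != p]
    n = len(diffs)
    if n == 0:
        return 1.0
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # P(X <= k) under Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

When every subject responds faster after the prime than after the control, the p-value shrinks quickly with the number of pairs; when the signs are evenly split, it is capped at 1.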
Errors Across Classes • Maximum errors are made for fillers. • Among the valid words, the highest error rates are observed for the class [M- P+]
Targets with High Reaction Time • RT for some targets was consistently high • Due to: • Low frequency • Length in number of characters (> 5) • Presence of certain opaque or infrequent conjuncts such as ষ্ট (Sh+T), ল্প (l+p), ঙ্গ (~N+g), গু (g+u), হৃ (h+Rri)
Conclusion • Initial experimental results clearly indicate full/partial decomposition of polymorphemic Bangla words • Decomposition is not observed for certain suffixes • May be due to etymology and distribution • Orthography plays an important role in word recognition, which needs to be examined separately
Applications • Lexicon design and morphological analysis • Which words should be listed in the lexicon and which processes should be modeled in the morphological analyzer? • Readability analysis of text • Infrequent words might be highly readable if the underlying morphemes are frequent and the morphological process is internalized by the speakers • E.g. Professorgiri, lAbanyatama • Pedagogical applications • How to teach a (second) language • People with learning disabilities