Mohd Yunus Sharum, Muhammad Taufik Abdullah, Md Nasir Sulaiman, Masrah Azrifah Azmi Murad & Zaitul Azma Zainon Hamzah
MALIM – A New Computational Approach of Malay Morphology
Ainun Najwa Bt Aziz P61811
Fatimah Zawani Bt Abdullah P61028
Mohd Rashidie B. Ramli P62451
INTRODUCTION
• A major problem in Malay morphological processing lies in analysis.
• Existing models: finite-state, two-level formalism.
• Hypothesis: higher accuracy in morphological analysis can be achieved by widening the decision-selection domain.
• MALIM implements this idea using the S-A-P-I approach.
MALAY MORPHOLOGY
• The basic target of S-A-P-I is to analyze affixation, especially multiple affixation.
• Affixation can involve one or several of these processes: prefixation, suffixation, circumfixation and infixation.
• Three basic categories of Malay reduplication:
  • Full reduplication
  • Partial reduplication
  • Rhythmic reduplication
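To make the three reduplication categories concrete, here is a minimal Python sketch, not the authors' method. The function name, the toy root list, and the "shared ending of at least two letters" test for rhythmic forms are all assumptions made for illustration. Affixed reduplication (for example a prefixed first half) is deliberately not handled and would be misclassified.

```python
# Hypothetical mini lexicon, used only to confirm partial reduplication.
TOY_ROOTS = {"buku", "laki", "tamu", "sayur"}


def _shared_suffix_len(a: str, b: str) -> int:
    """Count how many trailing characters two strings have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return n


def classify_reduplication(word: str, roots=TOY_ROOTS) -> str:
    """Return 'full', 'rhythmic', 'partial', 'unknown' or 'none' (rough heuristic)."""
    word = word.lower()
    if "-" in word:
        left, right = word.split("-", 1)
        if left == right:
            return "full"      # e.g. buku-buku
        if _shared_suffix_len(left, right) >= 2:
            return "rhythmic"  # e.g. sayur-mayur (assumed heuristic)
        return "unknown"
    # Partial: first consonant + 'e' prefixed to a known root, e.g. lelaki <- laki.
    if len(word) > 3 and word[1] == "e" and word[0] == word[2] and word[2:] in roots:
        return "partial"
    return "none"


for w in ["buku-buku", "lelaki", "sayur-mayur", "buku"]:
    print(w, "->", classify_reduplication(w))
```

The lexicon check in the partial case matters: without it, any word that happens to start with a consonant, "e", and the same consonant would be flagged.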
THE S-A-P-I APPROACH
• Uses the divide-and-conquer technique to handle Malay morphological analysis.
• S-A-P-I (‘search-all-pick-if…’) algorithm.
• Advantage: the most appropriate result can be searched for, because all possible options have already been gathered from the decision-selection domain.
• Side effect: multiple outputs due to ambiguity.
• Two techniques improve the analysis results: separating and filtering.
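The "search all, then pick" idea can be sketched as follows. This is an illustrative reading of the slide, not MALIM's actual algorithm: the affix lists, the toy root set, and the function names `search_all`, `pick_if` and `analyze` are assumptions. The example word shows the ambiguity side effect, since it can be read as root + suffix or as prefix + root.

```python
# Illustrative affix inventories and root lexicon (assumed, far from complete).
PREFIXES = ["", "ber", "di", "me", "ter"]
SUFFIXES = ["", "kan", "an", "i"]
ROOTS = {"beri", "ikan", "makan"}


def search_all(word: str):
    """Gather every (prefix, root, suffix) split allowed by the affix lists."""
    candidates = []
    for prefix in PREFIXES:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in SUFFIXES:
            if not rest.endswith(suffix):
                continue
            root = rest[: len(rest) - len(suffix)]
            if root:
                candidates.append((prefix, root, suffix))
    return candidates


def pick_if(candidates, accept):
    """Keep only the candidates that satisfy the acceptance condition."""
    return [c for c in candidates if accept(c)]


def analyze(word: str):
    """Search all splits first, then pick those whose root is in the lexicon."""
    return pick_if(search_all(word), lambda c: c[1] in ROOTS)


print(analyze("berikan"))  # two analyses survive: beri + -kan and ber- + ikan
print(analyze("dimakan"))  # one analysis survives: di- + makan
```

Because every split is collected before any decision is made, later filtering steps (such as rule-based filtering) can be added as further `accept` conditions without changing the search.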
MALIM – MORPHOLOGICAL ANALYZER FOR LINGUISTIC INDECISION OF MALAY
• A morphological analyzer that implements the S-A-P-I approach.
• Developed in Perl.
• Characteristics of Perl:
  • Supports regular expressions, a notation for describing regular languages.
  • Capable of supporting lexical processing.
• MALIM contains a basic but comprehensive root lexicon as reference (5710 root words).
MALIM – MORPHOLOGICAL ANALYZER FOR LINGUISTIC INDECISION OF MALAY
• MALIM contains a set of 80 morphosyntactic rules.
• Limitations of the implementation:
  • Does not include infixation analysis.
  • Does not include analysis of complex affixation/reduplication.
  • Does not analyze rhythmic and free reduplication.
  • Limited in analyzing affixation/reduplication of compound words and phrases.
• To overcome these limitations: use a strategy resembling the direct mapping approach.
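A direct-mapping strategy of this kind can be pictured as a lookup table consulted before the rule-based analysis. The sketch below is an assumption about how such a table might be wired in, not MALIM's code; the table contents, the tuple layout, and the names `CHEAT_LIST` and `analyze_with_cheat_list` are hypothetical. The two entries are infixed words, a case the slide says the analyzer itself does not handle.

```python
# Hypothetical direct-mapping table: surface form -> (root, note).
CHEAT_LIST = {
    "telunjuk": ("tunjuk", "infix -el-"),
    "gemuruh": ("guruh", "infix -em-"),
}


def analyze_with_cheat_list(word: str, fallback):
    """Return the stored analysis if the word is listed, else defer to `fallback`."""
    key = word.lower()
    if key in CHEAT_LIST:
        return [CHEAT_LIST[key]]
    return fallback(key)


def no_analysis(word: str):
    """Stand-in for the rule-based analyzer; it knows nothing."""
    return []


print(analyze_with_cheat_list("telunjuk", no_analysis))
print(analyze_with_cheat_list("makan", no_analysis))
```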
EXPERIMENT METHOD
• Types of experiment:
  • Testing the processing model (S-A-P-I)
  • Splitting the lexicon (into mono-syllabic and multi-syllabic roots)
  • Morphosyntactic rule filtering
  • First-syllable reduplication analysis
  • Clitics/particles extraction
  • The effects of the ‘cheat list’ (direct mapping)
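As one illustration of the steps listed above, clitics/particles extraction can be sketched with a regular expression plus a lexicon check. This is a simplified assumption, not the procedure used in MALIM: the clitic list is a small illustrative subset, and `TOY_ROOTS` and `extract_clitic` are hypothetical names.

```python
import re

# Illustrative subset of Malay particles and pronominal clitics (assumed list).
CLITIC_RE = re.compile(r"(?P<stem>.+)(?P<clitic>lah|kah|pun|nya|ku|mu)")

# Hypothetical mini lexicon used to confirm that the remaining stem is a real root.
TOY_ROOTS = {"buku", "makan"}


def extract_clitic(word: str, roots=TOY_ROOTS):
    """Split off a trailing clitic only when the remaining stem is a known root."""
    match = CLITIC_RE.fullmatch(word)
    if match and match.group("stem") in roots:
        return match.group("stem"), match.group("clitic")
    return word, None


print(extract_clitic("bukunya"))   # stem 'buku' is known, so 'nya' is split off
print(extract_clitic("makanlah"))  # stem 'makan' is known, so 'lah' is split off
print(extract_clitic("buku"))      # ends in 'ku', but 'bu' is not a root: left whole
```

The last call shows why a pattern match alone is not enough: a root can end in letters that look like a clitic.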
EXPERIMENT METHOD
• Experiment settings:
  • Set 1: MALIM (complete)
  • Set 2: MALIM without lexicon splitting
  • Set 3: MALIM without morphosyntactic rule filtering
  • Set 4: MALIM without first-syllable reduplication analysis
  • Set 5: MALIM without clitics/particles extraction
  • Set 6: MALIM without the ‘cheat list’
  • Set X: MALIM with basic capabilities only (satisfies all the conditions of Set 2 to Set 6); used as the control set
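The seven settings amount to an ablation design: each of Set 2 to Set 6 switches off one capability, and Set X switches off all five. A compact way to write that down is a set of feature flags, sketched below. The class name `MalimConfig` and the flag names are hypothetical labels chosen for this illustration and do not come from MALIM itself.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class MalimConfig:
    """One boolean per capability that the experiments switch off (names assumed)."""
    lexicon_split: bool = True
    rule_filtering: bool = True
    first_syllable_reduplication: bool = True
    clitic_extraction: bool = True
    cheat_list: bool = True


FULL = MalimConfig()
FLAG_NAMES = [f.name for f in fields(MalimConfig)]

# Set 1 is the complete system; Sets 2 to 6 each disable exactly one flag, in order.
EXPERIMENT_SETS = {"Set 1": FULL}
for number, flag in enumerate(FLAG_NAMES, start=2):
    EXPERIMENT_SETS[f"Set {number}"] = replace(FULL, **{flag: False})

# Set X (control) disables everything that Sets 2 to 6 disable individually.
EXPERIMENT_SETS["Set X"] = MalimConfig(*[False] * len(FLAG_NAMES))

for name, config in EXPERIMENT_SETS.items():
    disabled = [f for f in FLAG_NAMES if not getattr(config, f)]
    print(name, "disabled:", disabled)
```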
CONTRIBUTION
• Introduces a new and more accurate approach to morphological analysis using S-A-P-I.
• Solves most problems in Malay morphology, except those involving multi-word units (compound words) and certain reduplicated words.
CONCLUSION
• MALIM was tested only on controlled sample data, not on data from everyday usage.
• The test therefore may not pose the same challenge as real-world problems.
• In future, a test run on real-life data, such as a corpus, could be performed to verify the performance.