Applying Embodied Construction Grammar: a description of some Afrikaans morphological constructions Gerhard B van Huyssteen Potchefstroom University for CHE South Africa Acknowledgement: Sulené Pilon ICLC 2003
Overview • HLT and CL in South Africa • Project: Automatic Morphological Analysis of Afrikaans • Requirements of a Formalism • Two Afrikaans Constructions • Plural Construction • Nominalising Construction • Concluding remarks
HLT in South Africa • CL and NLP: • well-established research fields in USA, Europe, and other parts of the world • unexplored territory in South Africa • no catholic HLT projects for many years • Since 2000: • awareness of importance of HLT • governmental level – advisory committee of DACST (2002) • academic level – new projects & programmes
CL at the PUCHE • Since 2001: prioritised CL as strategically important • establish research focus area “Language and Technology” • establish first complete graduate study programme in CL in South Africa • set up dedicated HLT laboratory • acquire text and speech corpora for: • Afrikaans • South African English • Setswana • Two related Afrikaans projects: • Spelling Checker project (funded by University) • Automatic Morphological Analysis of Afrikaans project (funded by NRF)
AMAA project • Aim: to develop efficient, reusable modules for the automatic morphological analysis of Afrikaans • tokeniser • word segmenter • compound analyser • hyphenator • POS tagger • stemmer • Project team includes 4 linguists, 1 computational linguist (from University of Tilburg, Netherlands), 2 computer scientists • Problem: communication between: • different disciplines • different languages
In Search of a Formalism • A formalism is a set of features used to precisely and rigorously interpret linguistic analysis (i.e. rules, principles, conditions, etc.) in logical or mathematical terms, in order to develop a calculus (cf. Crystal, 1997: 156) • Looking for: • a formal rule system (i.e. formal grammar or formalism) • for declarative purposes • not for more procedural purposes (like parsing and generation) • to represent Afrikaans morphological structure • not particularly interested in syntax, semantics, pragmatics
Requirements: Formalisms • Accessibility • Transparent • Supported by literature • Efficiency • Linguistically efficient • Must be able to capture all linguistic phenomena accurately • Computationally efficient • To be implemented in a computer environment • Flexibility • Describe language structure with ease • Represent the underlying linguistic theory • Reusability • apply in different environments and applications
Some specific requirements • Must represent regexps • developing a rule-based stemmer, using Perl • Must rank the rules • exceptions (i.e. low-level instantiations) are ranked higher than rules (i.e. schemas) • “longer” rules are ranked higher than “shorter” rules • DIM construction: -tjie is removed before -jie (e.g. paaltjie vs. hondjie) • Must be compatible with CG
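The "longer rules first" requirement can be illustrated with a minimal sketch. It is written in Python for illustration only (the project's stemmer used Perl). The function name `strip_diminutive` and the two-item rule list are hypothetical. Only the suffixes -tjie and -jie and the two example words come from the slide.

```python
import re

# Hypothetical rule list: only the suffixes -tjie and -jie come from the slide.
DIM_SUFFIXES = ["jie", "tjie"]

def strip_diminutive(word):
    """Strip a diminutive suffix, trying longer suffixes before shorter ones."""
    # Sorting by length (longest first) mimics ranking "longer" rules higher.
    for suffix in sorted(DIM_SUFFIXES, key=len, reverse=True):
        pattern = re.escape(suffix) + r"$"
        if re.search(pattern, word):
            return re.sub(pattern, "", word)
    return word  # no rule applies

print(strip_diminutive("paaltjie"))  # paal
print(strip_diminutive("hondjie"))   # hond
```

If -jie were tried first, paaltjie would be reduced to the non-root "paalt", which is why the longer suffix has to outrank the shorter one.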
Procedure • Identify main morphological processes • Inflection • Derivation • Compounding • Identify constructions • PLURAL construction • PAST construction • NOMINALISING construction • REDUPLICATION construction • Draw categorisation networks • Translate into ECG • Implement in stemmer
Afrikaans Plural Construction • Inflectional process, realised by means of suffixation • 2 prototypical constructions: • -e: hond – honde [dogs]; bal – balle [balls] • -s: venster – vensters [windows]; tafel – tafels [tables] • Elaborations of the general schema • ’e: 3 – 3’e [3’s] • ’s: ma – ma’s [mothers] • Extensions of the general schema • -a: datum – data
Categorisation Network [figure not preserved]
PLURAL construction I
construction SUFFIXATION
  subclass of AFFIXATION
  constructional
    constituents
      root
      suffix
    constraints
      constituency : [rootm/rootf] [[suffixm/suffixf]]
  form
    constraints
      rootf meets suffixf
      suffixf.dependency dependent
      rootf.dependency autonomous | dependent
  meaning
    constraints
      profile-det suffix
PLURAL construction II
construction PLURAL
  subclass of SUFFIXATION
  constructional
    evokes INFLECTION
    constituents
      root : NOUN-SG; LET; NUM; ABBR
      suffix : PLURAL-SUF
    constraints
      rootm.scope-of-pred BOUNDED-REGION
      suffixm.scope-of-pred UNBOUNDED-REGION
  form
  meaning
    constraints
      scope-of-pred UNBOUNDED-REGION
PLURAL construction III
construction PLURAL-s
  subclass of PLURAL
  constructional
    constituents
      root : NOUN-SG-CN
      suffix : s
    constraints
      rootf: /^($C)?$V($C)$V[a-z]*$/
      suffixf: /s/
      rootm.profile THING
      ranking : 16
  form
    constraints
      s/^($C)?$V($C)$V[a-z]*$/^($C)?$V($C)$V[a-z]*s$/
  meaning
    constraints
      profile THING
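The root-shape constraint of PLURAL-s can be tried out with a small Python sketch (Python is used for illustration; the slides refer to a Perl stemmer). The slides do not define `$V` and `$C`, so the single-letter classes below are assumptions, and the function names `pluralise_s` and `stem_s` are hypothetical.

```python
import re

# Assumed definitions: the slides do not spell out $V and $C.
V = "[aeiou]"                  # one vowel letter (assumption)
C = "[bcdfghjklmnpqrstvwxyz]"  # one consonant letter (assumption)

# Root shape from the slide: /^($C)?$V($C)$V[a-z]*$/
ROOT_SHAPE = f"({C})?{V}({C}){V}[a-z]*"

def pluralise_s(root):
    """Form direction: add -s to a root that fits the PLURAL-s root shape."""
    return root + "s" if re.fullmatch(ROOT_SHAPE, root) else None

def stem_s(word):
    """Stemming direction: strip -s if what remains fits the root shape."""
    match = re.fullmatch(f"({ROOT_SHAPE})s", word)
    return match.group(1) if match else None

print(pluralise_s("tafel"))  # tafels
print(stem_s("tafels"))      # tafel
print(pluralise_s("hond"))   # None
```

Under this single-letter reading, venster does not fit the root shape even though the deck lists venster – vensters as a prototypical -s plural, so the project's own definitions of `$C` and `$V` presumably differ from the ones assumed here.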
PLURAL construction IV
construction PLURAL-’s
  subclass of PLURAL-s
  constructional
    constituents
      root : NOUN-SG-PROPER; NOUN-SG-CN; LETT; NUM; ABBR
      suffix : ’s
    constraints
      rootf : /%PROPN($V)$/
              /%CN([iouá])$/
              /^([a-z][^lmnrsxz])$/
              /^([1-9]+[^123456])$/
              /^%ABBR($V)$/
      rootm.profile THING | SAR
      suffixf : /’s/
      ranking : 13
  form
    constraints
      s/%PROPN($V)$/%PROPN($V)’s$/
      s/%CN($V)$/%CN($V)’s$/
      s/^(/[a-z][^lmnrsxz]/)$/^([a-z][^lmnrsxz]’s)$/
      s/^([1-9]+[^123456])$/^([1-9]+[^123456])’s$/
      s/%ABBR($V)$/%ABBR($V)’s$/
  meaning
    constraints
      profile THING
PLURAL construction V
construction PLURAL-specified
  subclass of PLURAL
  constructional
    constituents
      root: pad sambreel hemp seun bod Aardklop (l|spr)eeu man (m)?eeu vrou voël kasteel bal oom
      suffix: PLURAL-SUF
    constraints
      ranking : 1
  form
    constraints
      s/pad/paaie/
      s/sambreel/sambrele/
      s/hemp/hemde/
      s/seun/seuns/
      s/bod/botte/
      s/Aardklop/(Aardkloppe|Aardklops)/
      s/(l|spr)eeu/(l|spr)eeus/
      s/man/(manne|mans)/
      s/(m)?eeu/(m)?eeue/
      s/vrou/(vroue|vrouens)/
      s/voël/(voëls|voële)/
      s/kasteel/kastele/
      s/bal/(balle|ballas)/
      s/oom/ooms/
  meaning
    constraints
      profile THING
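The interaction of the three ranking values (1 for the listed exceptions, 13 for PLURAL-’s, 16 for PLURAL-s) can be sketched as a ranked rule list in which a lower number is tried first. This is a Python illustration, not the project's Perl stemmer. The rule table, the simplified patterns and the name `stem_plural` are assumptions, and only a few of the listed exceptions are included.

```python
import re

# Lower ranking number = tried first (1 before 13 before 16), as on the slides.
# Patterns are simplified assumptions; they work in the stemming direction
# (plural form -> root), the reverse of the s/root/plural/ rules on the slides.
RULES = [
    (16, r"^(\w+)s$", r"\1"),            # PLURAL-s (general schema)
    (13, r"^(\w+)[\u2019']s$", r"\1"),   # PLURAL-'s (typographic or plain apostrophe)
    (1, r"^paaie$", "pad"),              # PLURAL-specified (listed exceptions)
    (1, r"^hemde$", "hemp"),
    (1, r"^vrou(e|ens)$", "vrou"),
]

def stem_plural(word):
    """Apply the first matching rule, in order of ascending ranking value."""
    for _rank, pattern, replacement in sorted(RULES, key=lambda rule: rule[0]):
        if re.search(pattern, word):
            return re.sub(pattern, replacement, word)
    return word  # no plural rule applies

print(stem_plural("vrouens"))  # vrou
print(stem_plural("tafels"))   # tafel
```

Without the ranking, the general -s rule would turn vrouens into "vrouen". Trying the exception first avoids this.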
Categorisation Network [figure not preserved]
NOMINALISING construction I
construction NOMINALISING
  subclass of AFFIXATION
  constructional
    evokes DERIVATION
    constituents
      root : VERB|ADJ|ADV
      affix : NOM-PREFIX|NOM-SUFFIX|NOM-CIRCUMFIX
    constraints
      rootm.profile PROCESS|SAR|CAR
      affixm.profile THING
  form
  meaning
    constraints
      profile THING
NOMINALISING construction II
construction NOMINALISING-ge()[+$C]ery
  subclass of NOMINALISING-ge()ery
  constructional
    constituents
      root : VERB
      circumfix : ge()ery
    constraints
      rootf: /%VERB([áéíóú]$C$/
      rootm.profile PROCESS
      circumfixf: /ge()[+$C]ery/
      ranking : 1
  form
    constraints
      s/%VERB([áéíóú]$C$/ge(%VERB)([áéíóú]$C$Cery$/
  meaning
    constraints
NOMINALISING construction III
construction NOMINALISING-[-$V]$Cing
  subclass of NOMINALISING-ing
  constructional
    constituents
      root : VERB
      suffix : ing
    constraints
      rootf : /%VERB($V$V$C)/
      rootm.profile PROCESS
      suffixf : /[-$V]$Cing/
      ranking : 10
  form
    constraints
      s/%VERB($V$V$C)/%VERB($V$C)ing/
  meaning
    constraints
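The form constraint of this construction (a verb ending in two vowels plus a consonant loses one vowel before -ing) can be sketched as follows. This is a Python illustration rather than the project's Perl code. The character classes, the reading of `$V$V` as one vowel letter written twice, the name `nominalise_ing` and the example verbs betaal and verklaar are all assumptions not taken from the slides.

```python
import re

V = "[aeiou]"   # assumed vowel class
C = "[^aeiou]"  # assumed consonant class (any non-vowel character)

def nominalise_ing(verb):
    """Verb ending in a doubled vowel + consonant: drop one vowel, add -ing."""
    # Assumption: $V$V is read as the same vowel letter written twice.
    pattern = rf"({V})\1({C})$"
    if re.search(pattern, verb):
        return re.sub(pattern, r"\1\2ing", verb)
    return None  # this construction does not apply

print(nominalise_ing("betaal"))    # betaling
print(nominalise_ing("verklaar"))  # verklaring
print(nominalise_ing("teken"))     # None
```

A verb such as teken falls outside this construction and would be handled by the more general NOMINALISING-ing schema.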
NOMINALISING construction IV
construction NOMINALISING-er
  subclass of NOMINALISING-SUF
  constructional
    constituents
      root : VERB
      suffix : er
    constraints
      rootf: /^(%VERB)$/
      rootm.profile PROCESS
      suffixf: /er/
      ranking : 12
  form
    constraints
      s/^(%VERB)$/^(%VERB)er$/
  meaning
    constraints
      attr +HUMAN
Summary of adaptations • Our adaptations provided for our needs • added regexps as form constraints • added ranking as constructional constraints • added attributes as meaning constraints • added more CG concepts/constructs: • profile • valence factors: • profile determinacy • conceptual and phonological autonomy and dependency • constituency • correspondence? • These adaptations make ECG more accessible for us
Evaluation: ECG as a Declarative Formalism • Accessible? • very little ECG material (specifically on morphology) available • isolated – “do whatever we want to do…” • Efficient • Linguistically efficient? • handled our data beautifully • Computationally efficient? • not our primary concern • improved communication with computational linguist and computer scientists • Flexibility • represents essence of Cognitive Linguistics beautifully • easy to add features/adaptations • Reusable? • not our primary concern • Main Advantage: • compatibility with Cognitive Grammar
Conclusion • Your conclusion: • What are we doing wrong? • What are we missing? • Are we “abusing” ECG?