Grammar Engineering: Coordination and Macros; METARULEMACRO; Interfacing finite-state morphology
Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions)
Colombo 2014
Coordination
• Every attribute can only have one value
• So what do we do with coordinated constituents?
• Their f-structures become members of a set: ! $ ^ reads "! is an element of ^"
Example: gorillas sleep and eat
VP --> { … | VP: ! $ ^; CONJ VP: ! $ ^ }.
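Why sets are needed can be shown with a toy model. The sketch below is a hypothetical simplification, not XLE's internal representation: an f-structure is a Python dict, functional uniqueness is a clash check, and a Python list stands in for the set (dicts cannot be members of a Python set).

```python
from typing import Dict, List

# Hypothetical, simplified model: an f-structure maps attribute names to values.
FStructure = Dict[str, str]


def add_attribute(f: FStructure, attr: str, value: str) -> None:
    """Functional uniqueness: an attribute may only have one value."""
    if attr in f and f[attr] != value:
        raise ValueError(f"clash: {attr} is already {f[attr]!r}")
    f[attr] = value


def coordinate(*conjuncts: FStructure) -> List[FStructure]:
    """Mimic the annotation ! $ ^: every conjunct becomes a set member."""
    return list(conjuncts)


# Putting both PREDs into a single f-structure fails ...
vp: FStructure = {}
add_attribute(vp, "PRED", "sleep<SUBJ>")
try:
    add_attribute(vp, "PRED", "eat<SUBJ>")
except ValueError as err:
    print(err)  # clash: PRED is already 'sleep<SUBJ>'

# ... whereas a set keeps one f-structure per conjunct.
coord_vp = coordinate({"PRED": "sleep<SUBJ>"}, {"PRED": "eat<SUBJ>"})
print(coord_vp)
```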
Coordination (cont’d)
• Coordination can occur at virtually any level of the c-structure
Example: the gorillas peel and eat the bananas
V --> { … | V: ! $ ^; CONJ V: ! $ ^ }.
Coordination (cont’d)
• Almost any category can be coordinated
Example: the gorillas eat the bananas in the cage and in the garden
PP --> { … | PP: ! $ ^; CONJ PP: ! $ ^ }.
Coordination (cont’d)
• How can we capture these generalizations? Via regular-expression macros!
SC-COORD(CAT) = CAT: ! $ ^; CONJ CAT: ! $ ^.
PP --> { ... | @(SC-COORD PP) }.
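The effect of a parameterized macro call can be approximated as substitution: the call @(SC-COORD PP) is replaced by the macro body with CAT instantiated as PP. The sketch below does this textually on rule strings; XLE itself works on parsed regular expressions, so treat this purely as an illustration. The MACROS table and the expand function are hypothetical names.

```python
import re

# Hypothetical macro table: name -> (parameter names, body as plain text).
MACROS = {
    "SC-COORD": (["CAT"], "CAT: ! $ ^; CONJ CAT: ! $ ^"),
}

# Matches calls such as @(SC-COORD PP): a macro name followed by arguments.
CALL = re.compile(r"@\(([\w-]+)((?:\s+[\w-]+)*)\)")


def expand(rule: str) -> str:
    """Replace every @(NAME ARG ...) call with the macro body, substituting
    each parameter by its argument (whole-word matches only)."""
    def substitute(match):
        name, args = match.group(1), match.group(2).split()
        params, body = MACROS[name]
        if len(args) != len(params):
            raise ValueError(f"{name} expects {len(params)} argument(s)")
        for param, arg in zip(params, args):
            body = re.sub(rf"\b{re.escape(param)}\b", arg, body)
        return body
    return CALL.sub(substitute, rule)


print(expand("PP --> { ... | @(SC-COORD PP) }."))
# PP --> { ... | PP: ! $ ^; CONJ PP: ! $ ^ }.
```

The same call with VP or V as argument yields the hand-written rules from the previous slides, which is the generalization the macro captures.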
Nominal coordination
• Coordination of NP, N, and similar nominal categories is special: the NUM attribute of the coordinated set should typically have the value pl, even when the individual conjuncts are sg
Examples: Mary and the gorilla like bananas. The boys and girls like bananas.
Nominal coordination (cont’d)
NP-COORD(CAT) = CAT: ! $ ^;
                CONJ: ^ = ! (^ NUM) = pl;
                CAT: ! $ ^.
NP --> { ... | @(NP-COORD NP) }.
N --> { ... | @(NP-COORD N) }.
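The point of (^ NUM) = pl on CONJ is that the value lands on the set as a whole while each conjunct keeps its own NUM. The following is a hypothetical sketch of that outcome; the CoordSet class and nominal_coordination function are illustrative names, not XLE structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CoordSet:
    """Hypothetical model of a coordinated f-structure: the set members plus
    attributes that belong to the set as a whole."""
    members: List[Dict[str, str]]
    attrs: Dict[str, str] = field(default_factory=dict)


def nominal_coordination(*conjuncts: Dict[str, str]) -> CoordSet:
    """Mimic NP-COORD: conjuncts become set members (! $ ^), and the
    annotation on CONJ sets NUM on the set itself ((^ NUM) = pl)."""
    coord = CoordSet(members=list(conjuncts))
    coord.attrs["NUM"] = "pl"
    return coord


mary = {"PRED": "Mary", "NUM": "sg"}
gorilla = {"PRED": "gorilla", "NUM": "sg"}
subj = nominal_coordination(mary, gorilla)

print(subj.attrs["NUM"])                 # pl: the set agrees with "like"
print([m["NUM"] for m in subj.members])  # ['sg', 'sg']: members unchanged
```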
METARULEMACRO
• Macros are nice
• But can’t we do better?
• After all, it is tedious to go into almost every rule and invoke either the SC-COORD or the NP-COORD macro
• XLE has a special macro called METARULEMACRO
• Every rule is passed through the METARULEMACRO unless specified otherwise
METARULEMACRO (cont’d)
• Takes three arguments: _CAT, _BASECAT, and _RHS
• _CAT is the category on the left-hand side of the rule
• _BASECAT is the same as _CAT unless the rule is a complex-category rule
• _RHS is the right-hand side of the rule
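METARULEMACRO (cont’d)
METARULEMACRO(_CAT _BASECAT _RHS) =
  { _RHS
  | e: _CAT $ { N NP };
    @(NP-COORD _CAT)
  | e: _CAT ~$ { N NP };
    @(SC-COORD _CAT) }.
• Every rule keeps its original right-hand side (_RHS); in addition, N and NP rules get nominal coordination and all other categories get standard coordination. The empty category e carries the membership test on _CAT.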
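The three-way disjunction can be mimicked by a function that takes the same three arguments and returns the expanded right-hand side. This is a sketch under simplifying assumptions: XLE keeps both coordination branches and lets the e: constraints rule one out, whereas here Python decides up front; the right-hand sides passed in ("P NP", "D N") are made-up examples, and basecat is accepted only to mirror the signature.

```python
NOMINAL_CATS = {"N", "NP"}


def sc_coord(cat: str) -> str:
    # SC-COORD(CAT) from the macro slide, instantiated for cat
    return f"{cat}: ! $ ^; CONJ {cat}: ! $ ^"


def np_coord(cat: str) -> str:
    # NP-COORD(CAT) from the nominal-coordination slide, instantiated for cat
    return f"{cat}: ! $ ^; CONJ: ^ = ! (^ NUM) = pl; {cat}: ! $ ^"


def metarulemacro(cat: str, basecat: str, rhs: str) -> str:
    """Return the original RHS plus the one coordination disjunct whose
    membership test on cat succeeds. basecat is unused in this sketch."""
    coord = np_coord(cat) if cat in NOMINAL_CATS else sc_coord(cat)
    return f"{{ {rhs} | {coord} }}"


print(metarulemacro("PP", "PP", "P NP"))
# { P NP | PP: ! $ ^; CONJ PP: ! $ ^ }
print(metarulemacro("NP", "NP", "D N"))
# { D N | NP: ! $ ^; CONJ: ^ = ! (^ NUM) = pl; NP: ! $ ^ }
```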
Interfacing finite-state transducers
• Maintaining a full-form lexicon is tedious
• Many lexicon entries look alike
• Can we obtain the category of a word from elsewhere, ideally together with morphosyntactic information such as tense, mood, case, number, and person?
• Finite-state morphologies!
Interfacing finite-state transducers (cont’d)
• The cascade of finite-state transducers to be used is specified in the MORPHOLOGY section
• It has at least two subsections:
  • TOKENIZE
  • ANALYZE
• By default, the listed transducers are used both for parsing and for generation
• This behavior can be altered by prefixing the transducer file names with P! (parsing only) or G! (generation only)
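The prefix convention amounts to a simple filter over each subsection's file list. The sketch below assumes that P! restricts a transducer to parsing and G! to generation, as the slide implies; the file names are hypothetical and transducers_for is an illustrative helper, not part of XLE.

```python
from typing import List


def transducers_for(mode: str, files: List[str]) -> List[str]:
    """Select the transducer files used in 'parse' or 'generate' mode.
    Unprefixed files are used in both; P! marks parse-only, G! generate-only."""
    if mode not in ("parse", "generate"):
        raise ValueError("mode must be 'parse' or 'generate'")
    selected = []
    for name in files:
        if name.startswith("P!"):
            if mode == "parse":
                selected.append(name[2:])
        elif name.startswith("G!"):
            if mode == "generate":
                selected.append(name[2:])
        else:
            selected.append(name)
    return selected


# Hypothetical file lists for the two subsections
tokenize = ["P!tokenizer.fst", "G!detokenizer.fst"]
analyze = ["morph.fst"]

print(transducers_for("parse", tokenize))     # ['tokenizer.fst']
print(transducers_for("generate", tokenize))  # ['detokenizer.fst']
print(transducers_for("parse", analyze))      # ['morph.fst']
```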
Tokenization
• So far, only whitespace has been treated as a token boundary
• Real-world text, however, requires more than that:
  • Punctuation has to be split off the preceding token
  • Some whitespace should not be treated as a token boundary, e.g. in “Sri Lanka”
  • Upper-case letters at the beginning of a sentence should optionally be lower-cased
• A finite-state tokenizer takes care of all of this
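The three requirements can be illustrated with a regular-expression stand-in for the tokenizer. This is not a finite-state transducer and not XLE's tokenizer; it is a sketch assuming a hypothetical MULTIWORDS list. Because lower-casing is optional, the function returns alternative token sequences rather than a single one.

```python
import re
from typing import List

# Hypothetical list of multiword tokens; a real finite-state tokenizer would
# compile such a list into the transducer.
MULTIWORDS = ["Sri Lanka"]

TOKEN = re.compile(
    r"(?:" + "|".join(re.escape(mw) for mw in MULTIWORDS) + r")\b"  # multiwords
    r"|\w+"                                                         # words
    r"|[^\w\s]"                                                     # punctuation
)


def tokenize(sentence: str) -> List[List[str]]:
    """Return all token sequences for the sentence: the original one and,
    if it starts with a capitalized word, a variant with that word lower-cased."""
    tokens = TOKEN.findall(sentence)
    readings = [tokens]
    if tokens and tokens[0][:1].isupper() and tokens[0] not in MULTIWORDS:
        readings.append([tokens[0].lower()] + tokens[1:])
    return readings


for reading in tokenize("Gorillas live in Sri Lanka."):
    print(reading)
# ['Gorillas', 'live', 'in', 'Sri Lanka', '.']
# ['gorillas', 'live', 'in', 'Sri Lanka', '.']
```

Multiwords are never lower-cased here, since a sentence-initial "Sri Lanka" is a proper noun; that is a design choice of this sketch.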
Finite-state morphologies
• Map surface forms to a canonical form (lemma) and a series of morphological tags
Examples:
rode      ride +Verb +PastTense +123P
rides     ride +Verb +Pres +3sg
          ride +Noun +Pl
children  child +Noun +Pl
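From the grammar's point of view, the morphology is a function from a surface form to zero or more analyses, each a lemma followed by tags. The toy table below just stores the examples above; a real finite-state morphology computes such analyses with a transducer. ANALYSES and analyze are hypothetical names.

```python
from typing import Dict, List, Tuple

# Toy lookup table holding the example analyses above.
ANALYSES: Dict[str, List[str]] = {
    "rode": ["ride +Verb +PastTense +123P"],
    "rides": ["ride +Verb +Pres +3sg", "ride +Noun +Pl"],
    "children": ["child +Noun +Pl"],
}


def analyze(surface: str) -> List[Tuple[str, List[str]]]:
    """Return (lemma, tags) for every analysis of a surface form."""
    results = []
    for analysis in ANALYSES.get(surface, []):
        lemma, *tags = analysis.split()
        results.append((lemma, tags))
    return results


print(analyze("rides"))
# [('ride', ['+Verb', '+Pres', '+3sg']), ('ride', ['+Noun', '+Pl'])]
```

Note that "rides" is ambiguous, so the parser has to consider both analyses.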
Interfacing Finite-state Morphology
• Morphological tags need to be listed in the lexicon
• Sublexical lexicon entries look like regular lexicon entries
• Difference: morph code XLE instead of *
• Lemmas with unpredictable subcategorization frames must be listed in the lexicon
• Other lemmas can be handled by the -unknown entry
Lexicon entries for morphology output
+Verb     V-POS XLE .
+Pres     TNS XLE @VPRES.
+3sg      PERS XLE @S-AGR.
wait      V-S XLE (^ PRED)='wait<(^ SUBJ)(^ OBL)>'.
-unknown  A-S XLE @(PRED %stem);
          N-S XLE @(PRED %stem).
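The division of labor between explicit entries and -unknown can be sketched as a dictionary lookup with a fallback that fills in %stem. The dict format is purely illustrative, and it is an assumption of this sketch that unlisted tags (anything starting with +) get no entry instead of falling back to -unknown.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (category, annotation), both as plain strings

# A few of the entries above, transcribed into a dict (illustrative format).
LEXICON: Dict[str, List[Entry]] = {
    "+Verb": [("V-POS", "")],
    "+3sg": [("PERS", "@S-AGR")],
    "wait": [("V-S", "(^ PRED)='wait<(^ SUBJ)(^ OBL)>'")],
    "-unknown": [("A-S", "@(PRED %stem)"), ("N-S", "@(PRED %stem)")],
}


def lookup(form: str) -> List[Entry]:
    """Return the entries for a lemma or tag. Unlisted lemmas fall back to
    -unknown with %stem replaced by the lemma; unlisted tags get nothing."""
    if form in LEXICON:
        return LEXICON[form]
    if form.startswith("+"):
        return []
    return [(cat, ann.replace("%stem", form)) for cat, ann in LEXICON["-unknown"]]


print(lookup("banana"))
# [('A-S', '@(PRED banana)'), ('N-S', '@(PRED banana)')]
```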
Interfacing Finite-state Morphology (cont’d)
• Morphology output needs to be parsed by sublexical rules
• They look like regular rules
• They have f-annotations like regular rules
• Difference: sublexical categories are marked with the suffix _BASE
Interfacing Finite-state Morphology (cont’d)
V --> V-S_BASE V-POS_BASE { TNS_BASE PERS_BASE | ASP_BASE }.
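To see how a morphological analysis meets this rule, each morpheme can be mapped to its lexicon category plus the _BASE suffix, and the resulting sequence checked against the rule's right-hand side. The CATEGORY mapping is an assumption of this sketch (in particular +Pres mapped to TNS), and the rule is rewritten as a Python regular expression rather than XLE syntax; f-annotations are ignored.

```python
import re
from typing import List

# Assumed stem/tag-to-category mapping for this illustration.
CATEGORY = {"wait": "V-S", "+Verb": "V-POS", "+Pres": "TNS", "+3sg": "PERS"}

# RHS of the sublexical V rule as a regex over space-separated categories.
V_RULE = re.compile(r"V-S_BASE V-POS_BASE (?:TNS_BASE PERS_BASE|ASP_BASE)")


def sublexical_categories(analysis: str) -> List[str]:
    """Map each morpheme of an analysis to its category plus the _BASE suffix."""
    return [CATEGORY[m] + "_BASE" for m in analysis.split()]


def parses_as_v(analysis: str) -> bool:
    """True if the category sequence matches the whole V rule."""
    return V_RULE.fullmatch(" ".join(sublexical_categories(analysis))) is not None


print(sublexical_categories("wait +Verb +Pres +3sg"))
# ['V-S_BASE', 'V-POS_BASE', 'TNS_BASE', 'PERS_BASE']
print(parses_as_v("wait +Verb +Pres +3sg"))  # True
print(parses_as_v("wait +Verb +Pres"))       # False: PERS_BASE is missing
```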
XLE Lookup Model
• Only one entry per headword per lexicon section
• The same headword may be covered both by an explicit entry and by the -unknown entry
• To allow this, the explicit entry must end with ; ETC
sleep     V-S XLE @(INTRANS sleep); ETC.
-unknown  N-S XLE @(PRED %stem).
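The effect of ETC can be modeled as a flag on the explicit entry: without it, the explicit entry blocks -unknown; with it, both apply, so sleep is found as a verb and as a noun. This is a sketch of the lookup behavior described above, not XLE's implementation; the devour entry and the TRANS template are hypothetical, added only for contrast.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (category, annotation)

# Explicit entries: headword -> (entries, ends_with_etc). "sleep" mirrors the
# slide; "devour" is a hypothetical entry without ETC.
EXPLICIT: Dict[str, Tuple[List[Entry], bool]] = {
    "sleep": ([("V-S", "@(INTRANS sleep)")], True),
    "devour": ([("V-S", "@(TRANS devour)")], False),
}
UNKNOWN: List[Entry] = [("N-S", "@(PRED %stem)")]


def entries_for(stem: str) -> List[Entry]:
    """An explicit entry normally blocks -unknown; ETC lets both apply."""
    from_unknown = [(cat, ann.replace("%stem", stem)) for cat, ann in UNKNOWN]
    if stem not in EXPLICIT:
        return from_unknown
    entries, has_etc = EXPLICIT[stem]
    return entries + from_unknown if has_etc else list(entries)


print(entries_for("sleep"))   # verb entry plus noun entry from -unknown
print(entries_for("devour"))  # verb entry only
print(entries_for("banana"))  # -unknown only
```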