
Grammar Engineering: Coordination and Macros, METARULEMACRO, Interfacing finite-state morphology

Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions). Colombo 2014.

Presentation Transcript


  1. Grammar Engineering: Coordination and Macros, METARULEMACRO, Interfacing finite-state morphology • Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions) • Colombo 2014

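  2. Coordination • Every attribute can have only one value • So what do we do with coordinated constituents? • The mother's f-structure is a set, and each conjunct's f-structure is an element of it (! $ ^) • Example: gorillas sleep and eat • VP --> { … | VP: ! $ ^; CONJ VP: ! $ ^ }.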
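To make the set-based analysis concrete, here is a minimal Python sketch. It uses a toy representation of our own (f-structures as dicts, the coordinate set as a Python list, since dicts cannot be set elements), not XLE's internal data structures: functional uniqueness makes a second PRED value clash, while set membership keeps each conjunct's f-structure separate.

```python
from typing import Any

FStructure = dict[str, Any]  # toy representation, not XLE's


def assign(f: FStructure, attr: str, value: Any) -> None:
    """Set attr to value, failing if attr already has a different value (uniqueness)."""
    if attr in f and f[attr] != value:
        raise ValueError(f"clash on {attr}: {f[attr]!r} vs {value!r}")
    f[attr] = value


def coordinate(*conjuncts: FStructure) -> list[FStructure]:
    """Model '! $ ^': the mother is a set whose elements are the conjuncts' f-structures."""
    return list(conjuncts)


# gorillas sleep and eat: forcing both verbs into one f-structure fails ...
vp: FStructure = {}
assign(vp, "PRED", "sleep<SUBJ>")
try:
    assign(vp, "PRED", "eat<SUBJ>")
except ValueError as err:
    print(err)  # → clash on PRED: 'sleep<SUBJ>' vs 'eat<SUBJ>'

# ... whereas the set-based analysis keeps them apart.
coord = coordinate({"PRED": "sleep<SUBJ>"}, {"PRED": "eat<SUBJ>"})
print([member["PRED"] for member in coord])  # → ['sleep<SUBJ>', 'eat<SUBJ>']
```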

  3. Coordination (cont’d) • Coordination can occur at almost any level of the c-structure, including between lexical categories • Example: the gorillas peel and eat the bananas • V --> { … | V: ! $ ^; CONJ V: ! $ ^ }.

  4. Coordination (cont’d) • Almost any category can be coordinated • Example: the gorillas eat the bananas in the cage and in the garden • PP --> { … | PP: ! $ ^; CONJ PP: ! $ ^ }.

  5. Coordination (cont’d) • How can we capture these generalizations? Via regular-expression macros, defined once with the category as a parameter and called in each rule: SC-COORD(CAT) = CAT: ! $ ^; CONJ CAT: ! $ ^. PP --> { ... | @(SC-COORD PP) }.
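A rough Python sketch of what a parameterized macro call amounts to: the call @(SC-COORD PP) is replaced by the macro body with CAT instantiated as PP. The macro table, the call pattern and the expand function are hypothetical helpers, and this is plain string substitution rather than XLE's actual macro processing.

```python
import re

# Hypothetical macro table: name -> (parameter names, body template).
MACROS: dict[str, tuple[list[str], str]] = {
    "SC-COORD": (["CAT"], "CAT: ! $ ^; CONJ CAT: ! $ ^"),
}

# A macro call such as @(SC-COORD PP): a name followed by zero or more arguments.
CALL = re.compile(r"@\(([^\s)]+)((?:\s+[^\s)]+)*)\)")


def expand(rhs: str) -> str:
    """Replace every @(NAME ARG ...) call in rhs with the instantiated macro body."""

    def substitute(match: re.Match) -> str:
        name, args = match.group(1), match.group(2).split()
        params, body = MACROS[name]
        for param, arg in zip(params, args):
            # Whole-word replacement, so a parameter name inside a longer
            # identifier is left alone; the lambda inserts arg literally.
            body = re.sub(rf"\b{re.escape(param)}\b", lambda _m, a=arg: a, body)
        return body

    return CALL.sub(substitute, rhs)


print("PP --> { ... | " + expand("@(SC-COORD PP)") + " }.")
# → PP --> { ... | PP: ! $ ^; CONJ PP: ! $ ^ }.
```

The same definition serves VP, V, PP and any other category, which is exactly the generalization the macro is meant to capture.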

  6. Nominal coordination • Coordination of NP, N, etc. is special because the NUM attribute of the coordinated structure should typically have the value pl even when the individual set members are singular • Examples: Mary and the gorilla like bananas. The boys and girls like bananas.

  7. Nominal coordination (cont’d) NP-COORD(CAT) = CAT: ! $ ^; CONJ: ^ = ! (^ NUM) = pl; CAT: ! $ ^. NP --> { ... | @(NP-COORD NP) }. N --> { ... | @(NP-COORD N) }.
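The effect of NP-COORD on agreement can be sketched in Python as follows. The assumption (ours) is that the coordinated f-structure is a dict with a hypothetical MEMBERS list plus set-level attributes; NUM = pl is stated on the whole, while each member keeps its own NUM.

```python
from typing import Any

FStructure = dict[str, Any]  # toy representation, not XLE's


def np_coord(*conjuncts: FStructure) -> FStructure:
    """Sketch of NP-COORD: each conjunct keeps its own NUM, but the
    coordinated structure as a whole is assigned NUM = pl."""
    # "MEMBERS" is a hypothetical key standing in for the set's elements.
    return {"MEMBERS": list(conjuncts), "NUM": "pl"}


def subject_agrees(subj: FStructure, verb_num: str) -> bool:
    """Check a verb's agreement requirement (^ SUBJ NUM) = verb_num."""
    return subj.get("NUM") == verb_num


# Mary and the gorilla like bananas.
subj = np_coord({"PRED": "Mary", "NUM": "sg"}, {"PRED": "gorilla", "NUM": "sg"})
print(subj["NUM"], [m["NUM"] for m in subj["MEMBERS"]])  # → pl ['sg', 'sg']
print(subject_agrees(subj, "pl"), subject_agrees(subj, "sg"))  # → True False
```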

  8. METARULEMACRO • Macros are nice • But can’t we do better? • After all, it’s tedious to edit almost every rule just to invoke either the SC-COORD or the NP-COORD macro • XLE has a special macro called the METARULEMACRO • Every rule goes through the METARULEMACRO unless specified otherwise

  9. METARULEMACRO (cont’d) • Takes three arguments: _CAT, _BASECAT, and _RHS • _CAT is the category on the left-hand side of the rule • _BASECAT is the same as _CAT unless you are dealing with a complex-category rule • _RHS is the right-hand side of the rule

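  10. METARULEMACRO (cont’d) METARULEMACRO(_CAT _BASECAT _RHS)= { _RHS | e: _CAT $ { N NP }; @(NP-COORD _CAT) | e: _CAT ~$ { N NP }; @(SC-COORD _CAT) }. • Every rule thus keeps its original right-hand side and gains a coordination alternative: NP-COORD if _CAT is N or NP, SC-COORD otherwise (the empty category e carries the test on _CAT)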
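The dispatch performed by the METARULEMACRO can be imitated in a few lines of Python. This is a hypothetical simulation: the category test e: _CAT $ { N NP } becomes a set-membership check, the rule is produced as a plain string, the example right-hand sides are invented, and _BASECAT is accepted but unused, just as in the definition above.

```python
NOMINAL_CATEGORIES = {"N", "NP"}


def metarulemacro(cat: str, basecat: str, rhs: str) -> list[str]:
    """Return the alternatives of a rule after passing it through the METARULEMACRO.

    basecat is accepted for fidelity to the three-argument signature but,
    as in the macro definition, it plays no role here.
    """
    # e: _CAT $ { N NP }  vs.  e: _CAT ~$ { N NP }
    coord_macro = "NP-COORD" if cat in NOMINAL_CATEGORIES else "SC-COORD"
    return [rhs, f"@({coord_macro} {cat})"]


def as_rule(cat: str, basecat: str, rhs: str) -> str:
    """Render the expanded rule in XLE-like notation (a string only, not parsed)."""
    return f"{cat} --> {{ " + " | ".join(metarulemacro(cat, basecat, rhs)) + " }."


print(as_rule("PP", "PP", "P NP"))  # → PP --> { P NP | @(SC-COORD PP) }.
print(as_rule("NP", "NP", "D N"))   # → NP --> { D N | @(NP-COORD NP) }.
```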

  11. Interfacing finite-state transducers • Maintaining a full-form lexicon is tedious • Many lexicon entries look alike • Is there a way to obtain a word's category from elsewhere, ideally together with its morphosyntactic features such as tense, mood, case, number, person, etc.? • Finite-state morphologies!

  12. Interfacing finite-state transducers • The cascade of finite-state transducers to be used is specified in the MORPHOLOGY section • It has at least two subsections: • TOKENIZE • ANALYZE • By default, the listed transducers are used both for parsing and for generation • This can be changed by prefixing a transducer file name with P! (parsing only) or G! (generation only)
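A small Python sketch of how the P! and G! prefixes could be interpreted when selecting transducers, assuming P! marks a parse-only and G! a generation-only transducer. The file names are invented for illustration, and list order is preserved because the transducers form a cascade.

```python
def transducers_for(mode: str, entries: list[str]) -> list[str]:
    """Select, in cascade order, the transducer files used for 'parse' or 'generate'.

    Assumed convention: unprefixed files serve both directions,
    'P!' marks parse-only files and 'G!' generation-only files.
    """
    if mode not in ("parse", "generate"):
        raise ValueError(f"unknown mode: {mode!r}")
    selected = []
    for entry in entries:
        if entry.startswith("P!"):
            if mode == "parse":
                selected.append(entry[2:])
        elif entry.startswith("G!"):
            if mode == "generate":
                selected.append(entry[2:])
        else:
            selected.append(entry)
    return selected


# Hypothetical ANALYZE subsection.
analyze_section = ["P!guesser.fst", "morph.fst", "G!normalize.fst"]
print(transducers_for("parse", analyze_section))     # → ['guesser.fst', 'morph.fst']
print(transducers_for("generate", analyze_section))  # → ['morph.fst', 'normalize.fst']
```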

  13. Tokenization • So far, only whitespace has been treated as a token boundary • However, real-world text contains other kinds of token boundaries • Punctuation has to be split off the preceding token • Some whitespace should not be treated as a token boundary, e.g. in “Sri Lanka” • Upper-case letters at the beginning of a sentence should optionally be lower-cased • A finite-state tokenizer takes care of these things
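The tokenizer's three jobs can be approximated with a Python regular expression. This is a stand-in, not an XLE tokenizer, and the multiword list is hypothetical. Lower-casing is offered as an extra alternative rather than forced, since a sentence-initial proper noun must keep its capital.

```python
import re

# Hypothetical multiword list; longer entries are tried first.
MULTIWORDS = ["Sri Lanka"]

_MW = "|".join(re.escape(mw) for mw in sorted(MULTIWORDS, key=len, reverse=True))
# Multiword tokens first, then runs of non-punctuation, then single punctuation marks.
_TOKEN = re.compile(rf"(?:{_MW})\b|[^\s.,;:!?]+|[.,;:!?]")


def tokenize(sentence: str) -> list[list[str]]:
    """Return the alternative tokenizations of one sentence."""
    tokens = _TOKEN.findall(sentence)
    alternatives = [tokens]
    # Optionally lower-case a capitalized sentence-initial token.
    if tokens and tokens[0][:1].isupper():
        alternatives.append([tokens[0].lower()] + tokens[1:])
    return alternatives


for tokens in tokenize("Colombo is in Sri Lanka."):
    print(tokens)
# → ['Colombo', 'is', 'in', 'Sri Lanka', '.']
# → ['colombo', 'is', 'in', 'Sri Lanka', '.']
```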

  14. Finite-state morphologies • Map surface forms to a canonical form (lemma) plus a series of morphological tags • Examples:
      rode      ride +Verb +PastTense +123P
      rides     ride +Verb +Pres +3sg
                ride +Noun +Pl
      children  child +Noun +Pl
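Because the examples above are finite, a Python dict can stand in for the transducer. This is only a toy: a real finite-state morphology covers productive patterns and is not a lookup table. Inverting the table mimics running the transducer in the generation direction.

```python
# Toy stand-in for a finite-state morphology: surface form -> analyses,
# each analysis being a lemma followed by tags (entries copied from the slide).
ANALYSES: dict[str, list[list[str]]] = {
    "rode": [["ride", "+Verb", "+PastTense", "+123P"]],
    "rides": [["ride", "+Verb", "+Pres", "+3sg"], ["ride", "+Noun", "+Pl"]],
    "children": [["child", "+Noun", "+Pl"]],
}


def analyze(surface: str) -> list[list[str]]:
    """Parsing direction: all analyses of a surface form (empty if unknown)."""
    return ANALYSES.get(surface, [])


def generate(analysis: list[str]) -> list[str]:
    """Generation direction: all surface forms with the given analysis."""
    return [surface for surface, analyses in ANALYSES.items() if analysis in analyses]


for analysis in analyze("rides"):
    print("rides", " ".join(analysis))
# → rides ride +Verb +Pres +3sg
# → rides ride +Noun +Pl
print(generate(["child", "+Noun", "+Pl"]))  # → ['children']
```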

  15. Interfacing Finite-state Morphology • Morphological tags need to be listed in the lexicon • Sublexical lexicon entries look like regular lexicon entries • Difference: the morph code is XLE instead of * • Lemmas with non-predictable subcategorization frames must be listed in the lexicon • Other lemmas can be handled by the -unknown entry

  16. Lexicon entries for morphology output
      +Verb     V-POS XLE .
      +PresTNS  TNS   XLE @VPRES.
      +3sg      PERS  XLE @S-AGR.
      wait      V-S   XLE (^ PRED)='wait<(^ SUBJ)(^ OBL)>'.
      -unknown  A-S   XLE @(PRED %stem);
                N-S   XLE @(PRED %stem).

  17. Interfacing Finite-state Morphology • Morphology output needs to be parsed by sublexical rules • Look like regular rules • Have f-annotations like regular rules • Difference: Sublexical categories are marked with the suffix _BASE

  18. Interfacing Finite-state Morphology V --> V-S_BASE V-POS_BASE { TNS_BASE PERS_BASE | ASP_BASE }.
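The division of labor between lexicon and sublexical rule can be sketched in Python. Each stem and tag is mapped to its sublexical category with _BASE appended, and the resulting sequence is matched against a regular expression transcribed from the V rule. The mapping assumes +PresTNS has category TNS, as the TNS_BASE in the rule suggests; the helper names are hypothetical.

```python
import re

# Hypothetical category assignments mirroring the lexicon entries for
# morphology output (assumes +PresTNS has category TNS).
STEM_CATEGORY = {"wait": "V-S"}
TAG_CATEGORY = {"+Verb": "V-POS", "+PresTNS": "TNS", "+3sg": "PERS"}

# V --> V-S_BASE V-POS_BASE { TNS_BASE PERS_BASE | ASP_BASE }.
V_RULE = re.compile(r"V-S_BASE V-POS_BASE (?:TNS_BASE PERS_BASE|ASP_BASE)")


def base_categories(analysis: list[str]) -> str:
    """Turn 'lemma +Tag ...' into a space-separated string of _BASE categories.

    Raises KeyError for stems or tags missing from the toy tables.
    """
    stem, *tags = analysis
    cats = [STEM_CATEGORY[stem]] + [TAG_CATEGORY[tag] for tag in tags]
    return " ".join(cat + "_BASE" for cat in cats)


def parses_as_v(analysis: list[str]) -> bool:
    """True if the morphology output is licensed by the sublexical V rule."""
    return V_RULE.fullmatch(base_categories(analysis)) is not None


print(base_categories(["wait", "+Verb", "+PresTNS", "+3sg"]))
# → V-S_BASE V-POS_BASE TNS_BASE PERS_BASE
print(parses_as_v(["wait", "+Verb", "+PresTNS", "+3sg"]))  # → True
print(parses_as_v(["wait", "+Verb", "+3sg"]))  # → False
```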

  19. XLE Lookup Model • Only one entry per headword per lexicon section • The same headword may be covered both by an explicit entry and by the -unknown entry • To allow this, the explicit entry has to be marked with ; ETC, e.g.:
      sleep     V-S @(INTRANS sleep); ETC.
      -unknown  N-S @(PRED %stem).
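The lookup model as described on this slide can be sketched in Python: an explicit entry normally blocks -unknown, but when it is marked with ETC the -unknown entries are added as well, with %stem instantiated by the headword. The lexicon contents and function names are hypothetical, and this is a simplified illustration rather than XLE's implementation.

```python
# Hypothetical lexicon: headword -> (explicit entries, marked with ETC?).
# Each entry is a (category, schemata) pair written as plain strings.
LEXICON: dict[str, tuple[list[tuple[str, str]], bool]] = {
    "sleep": ([("V-S", "@(INTRANS sleep)")], True),
    "wait": ([("V-S", "(^ PRED)='wait<(^ SUBJ)(^ OBL)>'")], False),
}
UNKNOWN: list[tuple[str, str]] = [("N-S", "@(PRED %stem)")]


def instantiate_unknown(stem: str) -> list[tuple[str, str]]:
    """Copy the -unknown entries, replacing %stem with the actual stem."""
    return [(cat, schemata.replace("%stem", stem)) for cat, schemata in UNKNOWN]


def lookup(stem: str) -> list[tuple[str, str]]:
    """An explicit entry blocks -unknown unless it is marked with ETC."""
    if stem in LEXICON:
        entries, etc = LEXICON[stem]
        return entries + (instantiate_unknown(stem) if etc else [])
    return instantiate_unknown(stem)


print(lookup("sleep"))   # explicit verb entry plus noun reading from -unknown
print(lookup("wait"))    # explicit entry only
print(lookup("banana"))  # -unknown only
```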
