Grammar Engineering: OT Marks for Parse Ranking; Generation
Miriam Butt (University of Konstanz) and Martin Forst (NetBase Solutions)
Colombo 2014
OT Marks
• OT = Optimality Theory
• Classic OT only knows constraints, i.e. dispreferences.
• OT as implemented in XLE uses both dispreference marks (the default) and preference marks (prefixed with +).
• Classic OT assumes a simple hierarchy of constraints.
• OT as implemented in XLE uses a “structured hierarchy”.
OT Marks (cont’d)
• OT marks can be introduced in lexicon entries and in rules.
• OT marks are projected to a separate projection, the o-structure.
• The o-structure (unlike the c- and the f-structure) is not really structured; just view it as a bag of OT marks.
OTMarkName $ o::*
OPTIMALITYORDER
• Part of the grammar header
• Can be modified for grammar customization
• OPTIMALITYORDER is for parsing
• GENOPTIMALITYORDER is for generation
• OT marks can be organized into groups of equal rank
OPTIMALITYORDER DisprefMark1 +PrefMark1 DisprefMark2 (DisprefMark3 DisprefMark4)
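To make the grouping notation concrete, here is a minimal Python sketch that reads the mark list from the example above into ranked groups. It is not XLE code: the function name `parse_optimality_order` is hypothetical, and the sketch assumes only the syntax visible on the slide (whitespace-separated marks, one level of parentheses for an equal-rank group).

```python
import re

def parse_optimality_order(spec):
    """Split a mark list into ranked groups (illustrative, not XLE's parser).

    Each top-level mark becomes a one-element group; marks inside
    parentheses share one group, i.e. they have equal rank.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", spec)
    groups = []
    current = None  # list being filled while inside parentheses
    for tok in tokens:
        if tok == "(":
            if current is not None:
                raise ValueError("nested groups are not handled in this sketch")
            current = []
        elif tok == ")":
            if current is None:
                raise ValueError("unbalanced ')'")
            groups.append(current)
            current = None
        elif current is not None:
            current.append(tok)
        else:
            groups.append([tok])
    if current is not None:
        raise ValueError("unclosed '('")
    return groups

order = parse_optimality_order(
    "DisprefMark1 +PrefMark1 DisprefMark2 (DisprefMark3 DisprefMark4)"
)
for rank, group in enumerate(order, start=1):
    print(rank, group)
```

The last group holds both DisprefMark3 and DisprefMark4, reflecting that they are of equal rank.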
Ranking Parses with OT Marks
• Start on the left of OPTIMALITYORDER
• Keep parses with fewest instances of DisprefMark1; consider all others suboptimal
• Among remaining parses, keep those with most instances of PrefMark1; consider all others suboptimal
• Among remaining parses, keep those with fewest instances of DisprefMark2; consider all others suboptimal
• Etc.
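The left-to-right filtering described above can be sketched in a few lines of Python. This is an illustration of the procedure on the slide, not XLE's implementation. It assumes that each parse's o-structure is available as a plain list (bag) of mark names, that preference marks carry the + prefix only in the order and not in the bag, and it leaves out equal-rank groups for brevity. The function name `rank_parses` and the parse labels A to D are hypothetical.

```python
def rank_parses(parses, order):
    """Return the names of the optimal parses.

    parses: dict mapping a parse name to its bag (list) of OT mark names.
    order:  list of marks, most important first; a leading '+' marks a
            preference mark, anything else is a dispreference mark.
    """
    survivors = list(parses)
    for mark in order:
        if not survivors:
            break
        is_preference = mark.startswith("+")
        name = mark.lstrip("+")
        counts = {p: parses[p].count(name) for p in survivors}
        # Preference marks: keep the parses with the most instances.
        # Dispreference marks: keep the parses with the fewest instances.
        best = max(counts.values()) if is_preference else min(counts.values())
        survivors = [p for p in survivors if counts[p] == best]
    return survivors

order = ["DisprefMark1", "+PrefMark1", "DisprefMark2"]
parses = {
    "A": ["DisprefMark1", "PrefMark1"],
    "B": ["PrefMark1", "DisprefMark2"],
    "C": ["PrefMark1", "PrefMark1", "DisprefMark2"],
    "D": [],
}
print(rank_parses(parses, order))  # ['C']
```

Parse A drops out at DisprefMark1, B and D drop out at PrefMark1 because C has more instances, and C survives the remaining step alone.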
Special Marks in OPTIMALITYORDER
• Without special marks in OPTIMALITYORDER, all OT marks are used for ranking the parses after parsing proper has finished
• Special marks can be introduced to make OT marks interact with the parsing process:
• NOGOOD
• CSTRUCTURE
• STOPPOINT
NOGOOD OT Marks
• If (part of) a lexicon entry or a rule projects an OT mark that is listed to the left of NOGOOD in OPTIMALITYORDER, that part of the grammar is deactivated.
• Might be used for expensive constructions, or for particular readings of ambiguous lexical items that are known to be of little or no importance in the application domain.
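As a rough illustration of the deactivation rule, the Python sketch below treats the grammar as a list of named parts, each with the OT marks it projects, and drops every part that projects a mark listed to the left of NOGOOD. The part names, the mark names `RareReading` and `ExpensiveRule`, and both function names are hypothetical; XLE itself does this internally.

```python
def nogood_marks(order):
    """Marks listed to the left of NOGOOD in the order (empty if absent)."""
    if "NOGOOD" not in order:
        return set()
    return set(order[:order.index("NOGOOD")])

def active_parts(parts, order):
    """Keep only grammar parts that project no NOGOOD-ranked mark.

    parts: list of (part_name, [mark, ...]) pairs.
    """
    banned = nogood_marks(order)
    return [name for name, marks in parts if not banned & set(marks)]

order = ["RareReading", "ExpensiveRule", "NOGOOD", "DisprefMark1", "+PrefMark1"]
parts = [
    ("common-reading", []),
    ("rare-reading", ["RareReading"]),
    ("costly-rule", ["ExpensiveRule", "DisprefMark1"]),
    ("dispreferred-rule", ["DisprefMark1"]),
]
print(active_parts(parts, order))  # ['common-reading', 'dispreferred-rule']
```

Marks to the right of NOGOOD, such as DisprefMark1 here, leave the part active and only affect ranking.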
CSTRUCTURE OT Marks
• Intended for better performance
• Resolving f-annotations is far more expensive computationally than determining possible c-structures
• If we can discard certain c-structures early on, we do not even need to start resolving the associated f-annotations
• Example: Guessed +MWE CSTRUCTURE
STOPPOINT OT Marks
• Also intended for better performance
• Only beneficial when used cautiously
• (Parts of) lexical entries and rules marked with STOPPOINT OT marks are not used for the first parsing attempt
• If the first attempt is unsuccessful, the parser activates those lexicon or rule parts and makes a second attempt
• Example: Mark1 Mark2 STOPPOINT
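The two-attempt strategy can be sketched as a small Python wrapper. It assumes, following the slide's example, that the held-back marks are the ones listed to the left of STOPPOINT. The callback `try_parse` stands in for the actual parser and is purely hypothetical, as is the toy parser used to exercise it.

```python
def parse_with_stoppoint(sentence, order, try_parse):
    """Parse with a restricted grammar first, then retry with the full one.

    try_parse(sentence, disabled_marks) is a stand-in for the real parser;
    it should return a (possibly empty) list of parses, ignoring every
    grammar part that projects one of disabled_marks.
    """
    if "STOPPOINT" in order:
        held_back = set(order[:order.index("STOPPOINT")])
    else:
        held_back = set()
    # First attempt: parts carrying held-back marks are switched off.
    result = try_parse(sentence, held_back)
    if result or not held_back:
        return result
    # Second attempt: nothing is held back any more.
    return try_parse(sentence, set())

def toy_parser(sentence, disabled_marks):
    # Toy behaviour: this sentence only parses once Mark2 parts are active.
    return [] if "Mark2" in disabled_marks else ["parse-using-Mark2"]

print(parse_with_stoppoint("some input", ["Mark1", "Mark2", "STOPPOINT"], toy_parser))
```

The cost of the second attempt is only paid for inputs that the restricted grammar cannot handle, which is why the slide recommends using the mechanism cautiously.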
Examples of Potential OT Marks
• Prefer OBL interpretations of PPs over ADJUNCT interpretations
  The zookeeper waited for the gorilla.
• Prefer ditransitive subcategorization frames over transitive ones
  The girl gave her brother money.
Generation
• XLE can generate strings from well-formed f-structures.
• GENOPTIMALITYORDER can differ from OPTIMALITYORDER, both in the OT marks used and in their ranking
• Transducers can also be different; typically, the generation tokenizer is more restrictive than the parsing tokenizer
Generation
• For our purposes, we will parse the sentences from our exercises and regenerate them.
• Go to the “Commands” menu of your f-structure window (bottom left) and select “Generate from this FS”