Linguistics 187 Week 4 Ambiguity and Robustness
Language has pervasive ambiguity at every level: tokenization, morphology, syntax, semantics, discourse, entailment • Bill fell. John kicked him. Because or after? • John didn’t wait to go. Now or never? • Every man loves a woman. The same woman or each their own? • John told Tom he had to go. Who had to go? • The duck is ready to eat. Cooked or hungry? • walk: noun or verb? • untieable knot: (untie)able or un(tieable)? • bank: river or financial? • I like Jan.: |Jan|.| or |Jan.|.| (sentence end or abbreviation?)
Ambiguity • Syntactically legitimate ambiguity (vs. spurious ambiguity: “boys and girls” & pushup) • Sources: • Alternative c-structure rules • Disjunctions in f-structure description • Lexical categories • XLE’s display/computation of ambiguity • Dealing with ambiguity • Recognize legitimate ambiguity • OT marks for preferences (later in the course) • Stochastic disambiguation
Syntactic Ambiguity • Lexical • part of speech • subcategorization frames • Syntactic • attachments • coordination • Implemented system highlights interactions
Lexical Ambiguity: POS • verb-noun I saw her duck. I saw [NP her duck]. I saw [NP her] [VP duck]. • noun-adjective the [N/A mean] rule that child is [A mean]. he calculated the [N mean].
Morphology and POS ambiguity • English has impoverished morphology and hence extreme POS ambiguity • leaves: leave +Verb +Pres +3sg leaf +Noun +Pl leave +Noun +Pl • will: +Noun +Sg; +Aux; +Verb +base • Even languages with extensive morphology have ambiguities
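The morphological ambiguity of leaves and will can be pictured as a lookup table from surface forms to analyses. The sketch below is a hypothetical Python toy (MORPHOLOGY, analysis_count and pos_options are illustrative names, not part of XLE or any real morphology); it only shows that one surface form can yield several analyses and more than one part of speech, each of which the parser has to try.

```python
# Hypothetical toy morphology: surface form -> list of (stem, tags) analyses.
# The entries are copied from the slide; this is not a real XLE morphology.
MORPHOLOGY: dict[str, list[tuple[str, tuple[str, ...]]]] = {
    "leaves": [
        ("leave", ("+Verb", "+Pres", "+3sg")),
        ("leaf", ("+Noun", "+Pl")),
        ("leave", ("+Noun", "+Pl")),
    ],
    "will": [
        ("will", ("+Noun", "+Sg")),
        ("will", ("+Aux",)),
        ("will", ("+Verb", "+base")),
    ],
}


def analysis_count(word: str) -> int:
    """Number of morphological analyses the parser must consider for a word."""
    return len(MORPHOLOGY.get(word, []))


def pos_options(word: str) -> set[str]:
    """Distinct part-of-speech tags (the first tag of each analysis)."""
    return {tags[0] for _stem, tags in MORPHOLOGY.get(word, [])}


if __name__ == "__main__":
    for w in ("leaves", "will"):
        print(w, analysis_count(w), sorted(pos_options(w)))
```

Note that leaves has three analyses but only two parts of speech: the two +Noun +Pl analyses differ in stem (leaf vs. leave), so they remain distinct inputs to the grammar.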
Lexical ambiguity: Subcat frames • Words often have more than one subcategorization frame • transitive/intransitive I broke it./It broke. • intransitive/oblique He went./He went to London. • transitive/transitive with infinitive I want it./I want it to leave.
Subcat-Rule interactions • OBL vs. ADJUNCT with intransitive/oblique • He went to London. [ PRED ‘go<(^ SUBJ)(^ OBL)>’ SUBJ [PRED ‘he’] OBL [PRED ‘to<(^ OBJ)>’ OBJ [ PRED ‘London’]]] [ PRED ‘go<(^ SUBJ)>’ SUBJ [PRED ‘he’] ADJUNCT { [PRED ‘to<(^ OBJ)>’ OBJ [ PRED ‘London’]]}]
OBL-ADJUNCT cont. • Passive by phrase • It was eaten by the boys. [ PRED ‘eat<(^ OBL-AG)(^ SUBJ)>’ SUBJ [PRED ‘it’] OBL-AG [PRED ‘by<(^ OBJ)>’ OBJ [PRED ‘boy’]]] • It was eaten by the window. [ PRED ‘eat<NULL(^ SUBJ)>’ SUBJ [PRED ‘it’] ADJUNCT { [PRED ‘by<(^ OBJ)>’ OBJ [PRED ‘window’]]}]
XCOMP-ADJUNCT • to infinitives can be arguments or adjuncts (purpose clauses) • I want her to leave. [ PRED ‘want<(^ SUBJ)(^ XCOMP)>(^ OBJ)’ SUBJ [ PRED ‘I’ ] OBJ [ PRED ‘her’ ]1 XCOMP [ PRED ‘leave<(^ SUBJ)>’ SUBJ [ 1 ] ] ]
XCOMP-ADJUNCT cont. • I want money to buy that. [ PRED ‘want<(^ SUBJ)(^ OBJ)>’ SUBJ [ PRED ‘I’ ] OBJ [ PRED ‘money’ ] ADJUNCT { [ PRED ‘buy<(^ SUBJ)(^ OBJ)>’ SUBJ [ PRED ‘pro’ ] OBJ [ PRED ‘that’ ] ] } ] • But both sentences get both analyses • The syntax does not have world knowledge
OBJ-TH and Noun-Noun compounds • Many OBJ-TH verbs are also transitive • I took the cake. I took Mary the cake. • The grammar needs a rule for noun-noun compounds • the tractor trailer, a grammar rule • These can interact • I took the grammar rules • I took [NP the grammar rules] • I took [NP the grammar] [NP rules]
Syntactic Ambiguities • Even without lexical ambiguity, there is legitimate syntactic ambiguity • PP attachment • Coordination • Want to: • constrain these to legitimate cases • make sure they are processed efficiently
PP Attachment • PP adjuncts can attach to VPs and NPs • Strings of PPs in the VP are ambiguous • I see the girl with the telescope. I see [the girl with the telescope]. I see [the girl] [with the telescope]. • This ambiguity is reflected in: • the c-structure (constituency) • the f-structure (ADJUNCT attachment)
PP attachment cont. • This ambiguity multiplies with more PPs • I saw the girl with the telescope • I saw the girl with the telescope in the garden • I saw the girl with the telescope in the garden on the lawn • The syntax has no way to determine the attachment, even if humans can.
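A minimal sketch of how fast the attachment count grows, under a simplified model that is an assumption here and not XLE's grammar: each PP attaches to the verb, to the object noun, or to the noun inside an earlier PP, and attachment arcs may not cross. Counting the non-crossing configurations by brute force reproduces the Catalan numbers, the growth rate usually cited for PP attachment. The function names are hypothetical.

```python
from itertools import product
from math import comb


def crosses(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two attachment arcs (head, dependent) cross."""
    (l1, r1), (l2, r2) = a, b
    return l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1


def count_attachments(n_pps: int) -> int:
    """Count non-crossing attachments for V NP PP^n (simplified model).

    Position 0 is the verb, 1 the object noun, 2..n+1 the PPs.
    The PP at position k may attach to any earlier position.
    """
    choices = [range(k) for k in range(2, n_pps + 2)]
    total = 0
    for heads in product(*choices):
        arcs = [(h, k) for k, h in enumerate(heads, start=2)]
        if not any(crosses(a, b) for i, a in enumerate(arcs) for b in arcs[i + 1:]):
            total += 1
    return total


def catalan(m: int) -> int:
    """The m-th Catalan number."""
    return comb(2 * m, m) // (m + 1)


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, count_attachments(n), catalan(n + 1))
```

Under this model the three example sentences, with one, two and three PPs, have 2, 5 and 14 attachment configurations, matching the Catalan numbers C(2), C(3) and C(4).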
Ambiguity in coordination • Vacuous ambiguity of non-branching trees • this can be avoided (pushup) • Legitimate ambiguity • old men and women old [N men and women] [NP old men ] and [NP women ] • I turned and pushed the cart I [V turned and pushed ] the cart I [VP turned ] and [VP pushed the cart ]
Grammar Engineering and ambiguity • Large-scale grammars will have lexical and syntactic ambiguities • With real data they will interact, resulting in many parses • these parses are (syntactically) legitimate • they are not intuitive to humans (but more plausible words can make them better) • XLE provides tools to manage ambiguity • grammar writer interfaces • computation
XLE display • Four windows • c-structure (top left) • f-structure (bottom left) • packed f-structure (top right) • choice space (bottom right) • C-structure and f-structure “next” buttons • Other two windows are packed representations of all the parses • clicking on a choice will display that choice in the left windows
Example • I see the girl in the garden • PP attachment ambiguity • both ADJUNCTS • difference in ADJUNCT-TYPE
Sorting through the analyses • “Next” button on c-structure and then f-structure windows • impractical with many choices • independent vs. interacting ambiguities • hard to detect spurious ambiguity • The packed representations show all the analyses at once • (in)dependence more visible • click on choice to view • spurious ambiguities appear as blank choices • but legitimate ambiguities may also do so
Ambiguity Demo • eng-week4-demo.lfg • eng-week4-demo-test.lfg • Attachment • the girl ate the banana with the monkey • Subcategorization • the girl thought about the banana • Feature • the sheep laughed • All three (2 c-structures; 8 analyses) • the girl thought about the banana with the monkey
XLE Ambiguity Management • The sheep liked the fish. How many sheep? How many fish? • Options multiplied out: The sheep-sg liked the fish-sg. / The sheep-pl liked the fish-sg. / The sheep-sg liked the fish-pl. / The sheep-pl liked the fish-pl. • Options packed: The sheep{sg|pl} liked the fish{sg|pl} • Packed representation is a “free choice” system • Encodes all dependencies without loss of information • Common items represented, computed once • Key to practical efficiency
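The free-choice idea can be sketched in a few lines of Python. This is an illustrative toy, not XLE's internal data structure: each ambiguous word carries its own independent set of alternatives, the number of readings is the product of the set sizes, and the multiplied-out readings are only generated on demand. PACKED, reading_count and multiply_out are hypothetical names.

```python
from itertools import product
from math import prod

# Packed "free choice" toy: each word lists its own independent alternatives.
# An empty tuple means the word is unambiguous.
PACKED: list[tuple[str, tuple[str, ...]]] = [
    ("The", ()),
    ("sheep", ("sg", "pl")),
    ("liked", ()),
    ("the", ()),
    ("fish", ("sg", "pl")),
]


def reading_count(packed) -> int:
    """Readings = product of the choice-set sizes; nothing is expanded."""
    return prod(max(1, len(alts)) for _word, alts in packed)


def multiply_out(packed) -> list[str]:
    """Expand the packed form into every full reading (what packing avoids)."""
    options = [[w] if not alts else [f"{w}-{a}" for a in alts] for w, alts in packed]
    return [" ".join(choice) + "." for choice in product(*options)]
```

The packed form stores two alternatives per ambiguous word, so k independent two-way ambiguities cost 2k stored alternatives, while the multiplied-out form has 2^k full readings.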
Dependent choices • Das Mädchen sah die Katze: “The girl saw the cat” or “The cat saw the girl” • Options packed: Das Mädchen{nom|acc} sah die Katze{nom|acc} • Again, packing avoids duplication … but it’s wrong: it doesn’t encode all dependencies, and the choices are not free • Options multiplied out: Das Mädchen-nom sah die Katze-nom (bad) / Das Mädchen-nom sah die Katze-acc (The girl saw the cat) / Das Mädchen-acc sah die Katze-nom (The cat saw the girl) / Das Mädchen-acc sah die Katze-acc (bad) • Another example: Who do you want to succeed? • I want to succeed John (want intrans, succeed trans) • I want John to succeed (want trans, succeed intrans)
Solution: Label dependent choices • Das Mädchen{p:nom | ¬p:acc} sah die Katze{q:nom | ¬q:acc} • Label each choice with distinct Boolean variables p, q, etc. • Record acceptable combinations as a Boolean expression φ = (p ∨ q) ∧ (¬p ∨ ¬q), which excludes the two bad readings (nom-nom and acc-acc) and keeps “The girl saw the cat” (p, ¬q) and “The cat saw the girl” (¬p, q) • Each analysis corresponds to a satisfying truth-value assignment • (free choice from the true lines of φ’s truth table)
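The labelled-choice solution can be checked by brute force over the truth table. In this hypothetical sketch, p is true when Das Mädchen is nominative and q is true when die Katze is nominative; phi encodes the acceptable combinations as (p ∨ q) ∧ (¬p ∨ ¬q), i.e. exactly one NP is nominative, and each satisfying assignment yields one analysis. A real system would not enumerate the truth table; this only confirms that the expression keeps exactly the two good readings.

```python
from itertools import product


def phi(p: bool, q: bool) -> bool:
    """Acceptable combinations: (p or q) and (not p or not q).

    p is true when "Das Mädchen" is nominative (false: accusative);
    q is true when "die Katze" is nominative (false: accusative).
    """
    return (p or q) and (not p or not q)


def case(value: bool) -> str:
    return "nom" if value else "acc"


def readings() -> list[str]:
    """One analysis per satisfying truth-value assignment of phi."""
    return [
        f"Das Mädchen-{case(p)} sah die Katze-{case(q)}"
        for p, q in product([True, False], repeat=2)
        if phi(p, q)
    ]
```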
Ambiguity and Robustness • Large-scale grammars are massively ambiguous • Grammars parsing real text need to be robust • "loosening" rules to allow robustness increases ambiguity even more • Need a way to control the ambiguity • version of Optimality Theory (OT)
Theoretical OT • Grammar has a set of violable constraints • Constraints are ranked by each language • This gives cross-linguistic variation • Candidates (analyses) compete • John waited for Mary. vs. John waited for 3 hours. • Constraint ranking determines winning candidate • Issues for XLE • Candidates can be very ungrammatical • we have a grammar to produce grammatical analyses • even with robust, ungrammatical analyses, these are controlled • Generation, not parsing direction • we know what the string is already • for generation we have a very specified analysis
XLE OT • Incorporate idea of ranking and (dis)preference • Filter syntactic and lexical ambiguity • Reconcile robustness and accuracy • Allow parsing grammar to be used for generation
XLE OT Implementation • OT marks in • grammar rules • templates • lexical entries • CONFIG states • preference vs. dispreference • ranking • parsing vs. generation orders
The o:: projection • OT marks are not f-structure features • OT marks are in their own projection: the c-structure projects both an f-structure and an o-structure (a set of OT marks)
The o:: projection • The o-structure is just a set of marks, e.g. { PPadj GuessedN } • Instead of ^ and !, use o::* (NB: ! = f::*) PP: ! $ (^ ADJUNCT) PPadj $ o::* ; • the f-structure is exactly the same • there is now an additional o-structure
Ranking analyses • Specify the relative importance of OT marks in the CONFIG; importance decreases from left to right: OPTIMALITYORDER Mark3 Mark2 +Mark1. • Comparing analyses • Find the most important mark on which the analyses differ • For that mark, prefer the analysis with the • fewest dispreference marks (no +) • most preference marks (+)
Ranking analyses (continued) • OPTIMALITYORDER Mark3 Mark2 +Mark1. • an analysis with Mark2 is preferred over an analysis with Mark3 • an analysis with no mark is preferred over an analysis with Mark2 or Mark3 • an analysis with one Mark2 is preferred over one with two Mark2 • an analysis with Mark1 is preferred over an analysis with no mark • an analysis with two Mark1 is preferred over an analysis with one Mark1
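The comparison procedure described above can be sketched as a lexicographic key. This is a hypothetical Python illustration of the ranking as these slides describe it, not XLE's implementation; it ignores the special marks (NOGOOD, STOPPOINT, CSTRUCTURE) and treats marks missing from the order as neutral. Marks are kept as a list so that repeated occurrences (two Mark2) can be counted.

```python
from collections import Counter


def parse_order(order: str) -> list[tuple[str, bool]]:
    """'OPTIMALITYORDER Mark3 Mark2 +Mark1.' -> [(name, is_preference), ...], most important first."""
    tokens = order.strip().rstrip(".").split()
    if tokens and tokens[0] == "OPTIMALITYORDER":
        tokens = tokens[1:]
    return [(tok.lstrip("+"), tok.startswith("+")) for tok in tokens]


def rank_key(marks: list[str], order: list[tuple[str, bool]]) -> tuple[int, ...]:
    """Lexicographic key, smaller is better.

    Dispreference marks add their count (fewer is better); preference marks
    subtract it (more is better). Marks not in the order are neutral.
    """
    counts = Counter(marks)
    return tuple(-counts[name] if pref else counts[name] for name, pref in order)


def optimal(analyses: dict[str, list[str]], order_str: str) -> list[str]:
    """Names of all analyses whose key ties for the best (smallest) value."""
    order = parse_order(order_str)
    keys = {name: rank_key(marks, order) for name, marks in analyses.items()}
    best = min(keys.values())
    return sorted(name for name, key in keys.items() if key == best)
```

Python compares tuples element by element, so the first position where two keys differ corresponds to "the most important mark where the analyses differ". The same function covers the PP example that follows: with OPTIMALITYORDER PPadj. or OPTIMALITYORDER +PPobl., the OBL analysis of John waited for Mary wins over the ADJUNCT analysis.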
Difference with Theoretical OT • Theoretical OT: only dispreference marks • XLE OT: • dispreference marks: Mark1 • preference marks: +Mark1 • NOTE: the + appears only in the CONFIG; the grammar uses only the mark name (Mark1) • Deciding which to use can be difficult
Example: PP ambiguities • John waited for Mary. • John waited for 3 hours. • Rule with OT marks, using the template OT(_mark) = _mark $ o::*. VP --> V (NP: (^ OBJ)=!) PP*: { (^ OBL)=! @(OT PPobl) | ! $ (^ ADJUNCT) @(OT PPadj) }.
Basic Structures • John waited for Mary (ADJUNCT analysis) f-str: [ PRED 'wait<SUBJ>' SUBJ [ PRED 'John'] ADJ {[ PRED 'for<OBJ>' OBJ [ PRED 'Mary' ]]}] o-str: { PPadj } • John waited for Mary (OBL analysis) f-str: [ PRED 'wait<SUBJ OBL>' SUBJ [ PRED 'John'] OBL [ PRED 'for<OBJ>' OBJ [ PRED 'Mary' ]]] o-str: { PPobl }
Ranking for Example • Disprefer ADJUNCTs • OPTIMALITYORDER PPadj. • Problem: will disprefer adjuncts even when no OBL analysis is possible • Prefer OBLs • OPTIMALITYORDER +PPobl. • Problem: will prefer OBL even when the other analysis was not an ADJUNCT • Still probably better than dispreferring ADJUNCTs • Solution: local OT marks (not discussed here)
Special OT marks in XLE • Special marks separate the other OT marks into fields • Marks preceding each special mark are treated as follows: • NOGOOD: remove parts of the grammar for debugging or specializing • STOPPOINT: apply on a second pass, extending the grammar on failure • CSTRUCTURE: filter when the c-structure is built, for speed • The XLE documentation discusses these marks at length; the reading on the web is a bit out of date for them
The NOGOOD Mark • OT marks can be used to remove parts of the grammar • rules or rule parts • templates or template parts • lexical items or parts of them • Use for • grammar adaptation/sharing • grammar development • Example • OPTIMALITYORDER FrontMatter NOGOOD.
NOGOOD Example • ROOT rule allows for front matter for special corpus ROOT --> (FR-MAT: (^ ID)=! @(OT FrontMatter)) S. FR-MAT --> NUMBER (PERIOD). • 1. The light flashes.
FR-MAT • Grammars for corpora with front matter will not rank the OT mark FrontMatter (unranked marks are neutral) • Grammars for corpora without front matter will make the OT mark a NOGOOD OPTIMALITYORDER FrontMatter NOGOOD. Effective ROOT rule: ROOT --> S. • Allows rule sharing across grammars • Can also be used for debugging
Robustness • What to do if the grammar doesn't provide an analysis? • Graceful failure • FRAGMENTs • Specific relaxations • Ungrammatical analysis only if no grammatical one • Avoid ungrammatical analyses in generation
Robustness: STOPPOINT • On the first pass, STOPPOINT is treated as NOGOOD: a small, fast grammar for standard constructions • If the first pass fails, ignore STOPPOINT and extend the grammar • Relaxation possibilities precede STOPPOINT • OPTIMALITYORDER BadDetNAgr STOPPOINT.
STOPPOINT Mark example • Example: NP: this boy NP: this boys • Template call with OT mark: DEMON(_P _N) = (^ SPEC PRED)='_P' { (^ NUM)=c _N |(^ NUM)~= _N @(OT BadDetNAgr)}. • Lexical entry: this DET XLE @(DEMON %stem sg). • Ranking OPTIMALITYORDER BadDetNAgr STOPPOINT.
Structures for STOPPOINT example • NP: this boys f-str [ PRED 'boy' NUM pl SPEC [ PRED 'this' ]] o-str { BadDetNAgr } • NP: this boy f-str [ PRED 'boy' NUM sg SPEC [ PRED 'this' ]] o-str { } • Parsing this boys will be slow: the grammar has to parse a second time • But the ungrammatical input gets a parse • Only put OT marks behind the STOPPOINT if they will be rarely triggered
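The two-pass behaviour for this example can be mimicked as follows. This is a hypothetical sketch: instead of reparsing with an extended grammar, as XLE does, it simply reruns a DEMON-style agreement check with the STOPPOINT marks allowed. np_analyses, parse_np and the analysis labels "agree" and "mismatch" are illustrative names.

```python
# Marks that precede STOPPOINT in: OPTIMALITYORDER BadDetNAgr STOPPOINT.
STOPPOINT_MARKS = frozenset({"BadDetNAgr"})


def np_analyses(det_num: str, noun_num: str, allowed: frozenset) -> list[tuple[str, list[str]]]:
    """Analyses of DET N following the DEMON template's two disjuncts.

    `allowed` holds the STOPPOINT marks usable on this pass.
    """
    if det_num == noun_num:  # (^ NUM)=c _N : agreement, no OT mark
        return [("agree", [])]
    if "BadDetNAgr" in allowed:  # (^ NUM)~= _N @(OT BadDetNAgr)
        return [("mismatch", ["BadDetNAgr"])]
    return []


def parse_np(det_num: str, noun_num: str) -> tuple[int, list[tuple[str, list[str]]]]:
    """Return (pass number, analyses): pass 1 treats STOPPOINT marks as NOGOOD."""
    first = np_analyses(det_num, noun_num, allowed=frozenset())
    if first:
        return 1, first
    # First pass failed: ignore STOPPOINT and try again with the relaxation.
    return 2, np_analyses(det_num, noun_num, allowed=STOPPOINT_MARKS)


if __name__ == "__main__":
    print(parse_np("sg", "sg"))  # this boy: found on pass 1
    print(parse_np("sg", "pl"))  # this boys: needs pass 2
```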
Preference marks and STOPPOINT • Preference marks behind the STOPPOINT are tried first (counter to intuition) • OPTIMALITYORDER +MWE STOPPOINT. • Use MWE readings if at all possible • If that fails, do a second pass with the analytic (non-MWE) structure (inefficient when the first pass fails) • Example: print` quality N * @(NOUN %stem) @(OT MWE). The [N print quality] is excellent. I want to [V print] [NP quality documents].
CSTRUCTURE Marks • Apply marks before f-structure constraints are processed • OPTIMALITYORDER NoCloseQuote Guessed CSTRUCTURE. • Improve performance by filtering early • May lose some analyses • coverage/efficiency tradeoff
CSTRUCTURE example: Guessed • Only use a guessed form if no other form is found in the morphology/lexicon • OPTIMALITYORDER Guessed CSTRUCTURE. • Trade-off: lose some parses, but much faster • The foobar is good. no entry for foobar ==> parse with guessed N • The audio is good. audio: only A in morphology ==> no parse
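A toy version of the Guessed filter, assuming (hypothetically) that the guesser offers a noun reading marked Guessed for every word and that the frame The X is good. needs X to be a noun. The filter on CSTRUCTURE marks runs before any later check, so foobar parses and audio does not; switching the filter off recovers the audio parse at the cost of keeping more candidates. MORPHOLOGY, lookup, cstructure_filter and subject_nouns are illustrative names, not XLE functions.

```python
# Hypothetical toy morphology: "audio" has only an adjective entry.
MORPHOLOGY = {"audio": ["A"]}


def lookup(word: str) -> list[tuple[str, list[str]]]:
    """Morphology entries plus a guessed noun reading marked Guessed (assumed)."""
    options = [(cat, []) for cat in MORPHOLOGY.get(word, [])]
    options.append(("N", ["Guessed"]))
    return options


def cstructure_filter(options, cmarks):
    """Early filter: keep only options with the fewest CSTRUCTURE-field marks."""
    def cost(option):
        return sum(1 for mark in option[1] if mark in cmarks)

    best = min(cost(o) for o in options)
    return [o for o in options if cost(o) == best]


def subject_nouns(word: str, cmarks=frozenset({"Guessed"})):
    """Toy frame 'The X is good.': it parses only if X survives as a noun."""
    return [o for o in cstructure_filter(lookup(word), cmarks) if o[0] == "N"]


if __name__ == "__main__":
    print(subject_nouns("foobar"))  # guessed N survives: parse
    print(subject_nouns("audio"))  # guessed N filtered out: no parse
    print(subject_nouns("audio", cmarks=frozenset()))  # no early filtering
```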
CSTRUCTURE example: Quote • Only allow an unbalanced quote mark if there is no other quote mark: Then I left." vs. He said, "they appeared." • METARULEMACRO: … _CAT QT: @(OT NoCloseQt); … • XLE only tries the balanced version, not the version with two unbalanced quotes • so parsing fails when the input really needs two unbalanced quotes
Combining the OT marks • All the types of OT marks can be used in one grammar • the ordering of NOGOOD, CSTRUCTURE, and STOPPOINT is important • Example OPTIMALITYORDER Verbmobil NOGOOD Guessed CSTRUCTURE +MWE Fragment STOPPOINT RareForm StrandedP +Obl.
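One way to read the field layout of such an order is sketched below. This is a hypothetical helper, not an XLE function, and it assumes that the marks since the previous special mark belong to the next special mark, while marks after the last special mark are ordinarily ranked; the XLE documentation is the authority on exactly how the fields interact.

```python
SPECIAL = ("NOGOOD", "CSTRUCTURE", "STOPPOINT")


def split_fields(order: str) -> dict[str, list[str]]:
    """Split an OPTIMALITYORDER line into fields at the special marks.

    Assumption: marks since the previous special mark belong to the next
    special mark; marks after the last special mark are ordinary ranked marks.
    """
    tokens = order.strip().rstrip(".").split()
    if tokens and tokens[0] == "OPTIMALITYORDER":
        tokens = tokens[1:]
    fields: dict[str, list[str]] = {name: [] for name in SPECIAL}
    pending: list[str] = []
    for tok in tokens:
        if tok in SPECIAL:
            fields[tok].extend(pending)
            pending = []
        else:
            pending.append(tok)
    fields["ranked"] = pending
    return fields
```

Under this reading, the example order puts Verbmobil in the NOGOOD field, Guessed in the CSTRUCTURE field, +MWE and Fragment in the STOPPOINT field, and leaves RareForm, StrandedP and +Obl as ordinary ranked marks.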