Prepositional Phrase Attachment Chris Brew Ohio State University
Prepositional Phrase Attachment • Hindle and Rooth: partial parser to get statistics • Collins and Brooks: back-off estimation from treebank data + attachment decision • Merlo, Crocker and Berthouzoz: multiple PPs disambiguated • Ratnaparkhi: entirely unsupervised
Hindle and Rooth • Whittemore, Ferrara and Brunner: structural heuristics (Kimball’s Right Association, Frazier’s Minimal Attachment) account for only 55% of behaviour; lexical preferences do much better • H and R note that the preferences in that experiment were provided by human judgement, and ask how to obtain a good list of lexical preferences automatically
Discovering Lexical Association in text • Church’s part-of-speech analyser • Hindle’s Fidditch partial parser • 13 million words of AP newswire
Fidditch • (figure) An example Fidditch partial parse tree for the AP sentence about radical changes in export and customs regulations being aimed at remedying an extreme shortage of consumer goods in the Soviet Union; the tuples extracted from this parse appear in the table on the next slide
Extract information about words

ID   Verb      Noun         Prep   Syntax
a              change       in     -V
b              regulation
c    aim       PRO-+        at
d    remedy    shortage     of
e              good         in
f              DART-PNP
g    assuage   citizen
h              scarcity     of
i              item         as
j              wiper
k              VING
l              VING
What the table means • Noun column has the head noun of the noun phrase (or one of various special-case tokens) • Verb column has the head verb if the noun phrase was its object • Prep column has the following preposition • Syntax column has -V if there is no preceding verb
Counting attachments • The parser isn’t reliable, so a decision procedure is used to assign nouns and verbs to noun-attach (na) and verb-attach (va); a sketch of the whole procedure follows the rules below
No preposition • add a count for <noun,NULL> or <verb,NULL> (rows with an empty Prep column in the table above)
Sure Verb Attach 1 • if the noun phrase head is a pronoun, add a count for <verb,prep>
Sure Verb Attach 2 • if the verb is passivized, count as a verb attachment unless the preposition is “by”
Sure Noun Attach • if no verb is available, count as a noun attachment
Ambiguous Attach 1 • if the LA score is > 2.0, count as a verb attachment; if < -2.0, a noun attachment; use the statistics gathered so far to calculate the score, and repeat until stable (row d in the table above, remedy / shortage / of, is one such “maybe” case)
Ambiguous Attach 2 • for the cases that remain ambiguous (such as row d), share the count between the noun and the verb
Unsure Attach • otherwise, attach to the noun by default
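Putting the rules together, here is a minimal Python sketch of the counting procedure (not Hindle and Rooth's code): the (verb, noun, prep, syntax) tuple format, the PRO and PASSIVE markers, and the la_score callback are illustrative assumptions, and the final 0.5/0.5 split stands in for the ambiguous/unsure handling of the last two slides. As the slides say, the ambiguous step is repeated until the assignments stabilise.

```python
from collections import defaultdict

NULL = "NULL"
counts = defaultdict(float)   # keys like ('n', noun, prep) and ('v', verb, prep)

def count_attachments(tuples, la_score):
    """One pass of the assignment rules above over extracted tuples.

    Each tuple is (verb, noun, prep, syntax); verb or prep may be None.
    la_score(verb, noun, prep) returns the lexical association score computed
    from the counts gathered so far (see the sketch after the smoothing slide).
    The 'PRO' noun token and 'PASSIVE' syntax flag are assumed labels.
    In the full procedure this pass is repeated until the ambiguous
    assignments stop changing.
    """
    for verb, noun, prep, syntax in tuples:
        if prep is None:                              # no preposition
            counts[('n', noun, NULL)] += 1
            if verb is not None:
                counts[('v', verb, NULL)] += 1
        elif noun == 'PRO':                           # sure verb attach 1
            counts[('v', verb, prep)] += 1
        elif syntax == 'PASSIVE' and prep != 'by':    # sure verb attach 2
            counts[('v', verb, prep)] += 1
        elif verb is None:                            # sure noun attach
            counts[('n', noun, prep)] += 1
        else:                                         # ambiguous attach 1
            score = la_score(verb, noun, prep)
            if score > 2.0:
                counts[('v', verb, prep)] += 1
            elif score < -2.0:
                counts[('n', noun, prep)] += 1
            else:                                     # ambiguous attach 2 / unsure
                counts[('n', noun, prep)] += 0.5
                counts[('v', verb, prep)] += 0.5
```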
LA scores

va: (send (soldier NULL) (into Afghanistan))
na: (send (soldier (into Afghanistan)))

LA(v, n, p) = log2( P(va p | v, n) / P(na p | v, n) )
            = log2( P(va into | send, soldier) / P(na into | send, soldier) )

• and we approximate this using collected counts:
  P(va into | send, soldier) ≈ P(into | send) * P(NULL | soldier)
  P(na into | send, soldier) ≈ P(into | soldier)
Estimating the counts

P(into | send)    = |send, into| / |send|        = .049
P(NULL | soldier) = |soldier, NULL| / |soldier|  = .800
P(into | soldier) = |soldier, into| / |soldier|  = .0007

LA = log2(.049 * .800 / .0007) = 5.81

• which is enough to be very sure that verb attachment is right
Smooth the estimates • using typical association rates of prepositions with the whole classes of nouns and verbs: P(p|n) = (|n,p| + P(p|N)) / (|n| + 1), where P(p|N) = |any noun, p| / |any noun| is the overall rate of p across nouns (and similarly for verbs) • Laplace’s m-estimate again
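As a rough illustration of the smoothed estimates and the LA score (again a sketch, not H&R's code), the raw counts below are invented so that the ratios come out roughly like the worked example above.

```python
import math
from collections import Counter

# Hypothetical counts chosen to give ratios of the same order as the slide's
# .049 / .800 / .0007 example; they are not real corpus figures.
noun_prep = Counter({('soldier', 'NULL'): 800, ('soldier', 'into'): 1})
verb_prep = Counter({('send', 'into'): 49})
noun_total = Counter({'soldier': 1000})
verb_total = Counter({'send': 1000})
prep_rate_noun = {'into': 0.001, 'NULL': 0.7}   # assumed |any noun, p| / |any noun|
prep_rate_verb = {'into': 0.01,  'NULL': 0.5}   # assumed |any verb, p| / |any verb|

def p_prep_given_noun(p, n):
    # Smoothed estimate: (|n,p| + P(p|any noun)) / (|n| + 1)
    return (noun_prep[(n, p)] + prep_rate_noun.get(p, 0.0)) / (noun_total[n] + 1)

def p_prep_given_verb(p, v):
    return (verb_prep[(v, p)] + prep_rate_verb.get(p, 0.0)) / (verb_total[v] + 1)

def la_score(v, n, p):
    """Lexical association: log2 of verb-attach vs noun-attach probability."""
    p_va = p_prep_given_verb(p, v) * p_prep_given_noun('NULL', n)
    p_na = p_prep_given_noun(p, n)
    return math.log2(p_va / p_na)

print(la_score('send', 'soldier', 'into'))   # large positive, so verb attach
```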
Performance • ~ 80% correct • can get better precision by accepting lower recall (useful for exploratory text analysis) • “good enough to be added to a parser like Fidditch”
Backed-off estimation • Collins and Brooks • use n2 (the object of the preposition) as well as n1 • (figure: the two attachment configurations, verb attachment and noun attachment, over the quadruple v, n1, p, n2)
Use treebank data • similar approaches • Ratnaparkhi, Reynar and Roukos • Brill and Resnik • difficult to compare results with Hindle and Rooth, because the corpora used are different (but raw scores around 80% in both cases)
The data • 20,801 training and 3,097 test examples • about 95% of the quadruples in the test data had not been seen in the training set • compare H&R’s 200,000 triples
The backed-off method

• Katz’s approach to n-grams
• If there are enough trigrams:
  p(wn | wn-1, wn-2) = |wn-2, wn-1, wn| / |wn-2, wn-1|
• otherwise back off to bigrams:
  p(wn | wn-1, wn-2) = λ1 * |wn-1, wn| / |wn-1|
• otherwise back off to unigrams:
  p(wn | wn-1, wn-2) = λ1 * λ2 * |wn| / N, where N is the total number of tokens
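A compact sketch of this thresholded back-off, assuming the count tables have already been built; real Katz back-off uses discounted counts and properly normalised weights, so the lam1 and lam2 values here are just placeholders for the slide's constants.

```python
def backoff_prob(w, prev, prevprev, tri, bi, uni, total, k=1, lam1=0.4, lam2=0.4):
    """p(w | prevprev, prev): trigram estimate, else bigram, else unigram.

    tri, bi, uni map tuples / words to counts; total is the corpus size in
    tokens; k is the "enough data" threshold; lam1 and lam2 are placeholder
    back-off weights (their actual values are not given on the slide).
    """
    if tri.get((prevprev, prev, w), 0) >= k:
        return tri[(prevprev, prev, w)] / bi[(prevprev, prev)]
    if bi.get((prev, w), 0) >= k:
        return lam1 * bi[(prev, w)] / uni[prev]
    return lam1 * lam2 * uni.get(w, 0) / total
```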
Take this method and apply it to PP data • Start with the full quadruples • Four possible triples to back off to • Six possible pairs to back off to • Restrict attention to those containing the preposition p
How to combine counts from triples and pairs

p_triple(1 | v, n1, p, n2) ≈ ( f(1, v, n1, p) + f(1, v, p, n2) + f(1, n1, p, n2) )
                             / ( f(v, n1, p) + f(v, p, n2) + f(n1, p, n2) )

p_pair(1 | v, n1, p, n2) ≈ ( f(1, v, p) + f(1, p, n2) + f(1, n1, p) )
                           / ( f(v, p) + f(p, n2) + f(n1, p) )

where 1 marks noun attachment and f(...) is the count of the corresponding tuple in the training data

• other combinations were tried; this formula works better than simple averaging for this task
What was “enough data”? • In this task it turns out that a threshold of 0 on the denominator is best: if there is even one instance of the quadruple, trust it • For n-grams, it was better to ignore low counts • The reason for this is not obvious, but in such situations trying things out is essential (a sketch of the resulting decision procedure follows)
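A minimal sketch of the back-off decision just described, assuming training examples are labelled tuples (attachment, v, n1, p, n2) with attachment = 1 for noun attachment; the count-table layout and the final noun-attachment default are illustrative choices rather than Collins and Brooks' exact implementation.

```python
from collections import Counter

f = Counter()        # counts of head tuples, ignoring the attachment label
f_noun = Counter()   # counts of head tuples seen with noun attachment (label 1)

def backoff_levels(v, n1, p, n2):
    """Head tuples at each back-off level, tagged by role to keep them distinct."""
    return [
        [('v n1 p n2', v, n1, p, n2)],
        [('v n1 p', v, n1, p), ('v p n2', v, p, n2), ('n1 p n2', n1, p, n2)],
        [('v p', v, p), ('p n2', p, n2), ('n1 p', n1, p)],
        [('p', p)],
    ]

def train(examples):
    """examples: iterable of (attachment, v, n1, p, n2), attachment = 1 for noun."""
    for attach, v, n1, p, n2 in examples:
        for level in backoff_levels(v, n1, p, n2):
            for t in level:
                f[t] += 1
                if attach == 1:
                    f_noun[t] += 1

def predict(v, n1, p, n2):
    """Back off from the quadruple to triples, pairs, then the preposition alone.
    Threshold 0: any non-zero denominator is trusted, as on the slide above."""
    for level in backoff_levels(v, n1, p, n2):
        denom = sum(f[t] for t in level)
        if denom > 0:
            return 1 if sum(f_noun[t] for t in level) / denom >= 0.5 else 0
    return 1   # default when even the preposition is unseen: noun attachment
```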
Results • 84.1% correct without morphological analysis, 84.5% with • Quadruples are more accurate than triples, which in turn are more accurate than doubles, etc. • But only 148 quadruples in the test data, vs 764 triples, 1965 doubles, 216 singles
Comparison with Hindle and Rooth • There are 1,924 test cases where H&R would have made a decision • The back-off method using just the |v,p| and |n1,p| counts (86.5%) outscores the H&R-style method (82.1%)
Extra experiments • Setting the threshold to 5 reduces performance to 81.6% • Tuples that contain the preposition are the most effective
Attaching Multiple PPs • Merlo, Crocker and Berthouzoz • For a single PP there are two possible structures, for 2 PPs there are 5, and for 3 PPs 14 • so the problem is harder, and a naive algorithm will do poorly • Generalization of Collins and Brooks
Five structures for V NP PP PP
• Structure 1 (535): The agency said it will [keep]v [the debt]np [under review]pp [for possible downgrade]pp
• Structure 2 (1160): Penney will [extend]v [[its involvement]np [with the service]pp]np [for at least five years]pp
• Structure 3 (1394): [address]v [[budget limits]np [on [credit allocations [for the Federal Housing agency]pp]np]pp]np
• Structure 4 (1055): [abandon] [the everyday pricing approach] [in the face of [the poor results]]
• Structure 5 (539): [answering] [questions [from members of Parliament]] [after his announcement]
Algorithm • Model for PP1 is as in Collins and Brooks, but excluding p2 • Model for two PPs backs off over sextuples (i, v, n1, p1, n2, p2) until we reach tuples that don’t have p1, or that don’t have p2 • then competitive back-off
Competitive Back-off • Do standard back-off for PP1 using v, n1, p1 • Do standard back-off for PP2 using v, n2, p2 • Do back-off for PP2 using n1 instead of n2 (i.e., v, n1, p2) • Combine these results using a simple procedure, with a tie-break where they conflict (sketched below)
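The combination step is only described as "a simple procedure" here, so the sketch below fills it in with one plausible reading: run the three back-offs, prefer the PP2 answer supported by the more specific (less backed-off) evidence, and tie-break toward the nearer noun n2. The backoff_decision interface and the tie-break rule are assumptions, not Merlo, Crocker and Berthouzoz's actual algorithm.

```python
def attach_two_pps(v, n1, p1, n2, p2, backoff_decision):
    """Sketch of competitive back-off for the second PP.

    backoff_decision(head_tuple) is assumed to run a Collins/Brooks-style
    back-off for the given (v, noun, prep) tuple and return a pair
    (attachment, specificity), where specificity records how specific the
    tuple that decided was (higher means less backing off).
    """
    pp1 = backoff_decision((v, n1, p1))        # PP1 as in Collins and Brooks
    pp2_near = backoff_decision((v, n2, p2))   # PP2 against n2
    pp2_far = backoff_decision((v, n1, p2))    # PP2 against n1 instead of n2

    # Competitive step: prefer the PP2 decision backed by more specific
    # evidence; on a tie, prefer the nearer noun n2 (an assumed tie-break).
    pp2 = pp2_far[0] if pp2_far[1] > pp2_near[1] else pp2_near[0]
    return pp1[0], pp2
```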
Results • PP1 (2 structures): 84.3%, baseline 61.2% (choose the most frequent) • PP2 (5 structures): 69.6%, baseline 29.8% (choose the most frequent) • PP3 (14 structures): 43.6%, baseline 18.5% (choose the most frequent)
Results • Take-home messages • Devise a baseline • Measure performance • Pick tasks where beating the baseline is both impressive and useful
Ratnaparkhi (Coling 98) • 970K unannotated sentences of WSJ • tagger, simple chunker • heuristic extraction of unambiguous cases
Heuristic extraction • Extract (v, p, n2) if: • p is a real preposition (not “of”) • v is the first verb that occurs within K words to the left of p • v is not a form of the verb “to be” • no noun occurs between v and p • n2 is the first noun within K words to the right of p • no verb occurs between p and n2
Heuristic extraction 2 • Extract (n, p, n2) if: • p is a real preposition (not “of”) • n is the first noun that occurs within K words to the left of p • no verb occurs between n and p • n2 is the first noun within K words to the right of p • no verb occurs between p and n2 • (a sketch covering both patterns follows)
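A Python sketch covering both extraction patterns, assuming a tagged token stream of objects with .word and .tag attributes (Penn-style tags); the tag tests, the value of K, and the labelled output tuples are illustrative assumptions, not Ratnaparkhi's implementation.

```python
K = 10  # window size; the slides do not give the actual value of K

def is_prep(tok): return tok.tag == 'IN' and tok.word.lower() != 'of'
def is_verb(tok): return tok.tag.startswith('VB')
def is_noun(tok): return tok.tag.startswith('NN')
def is_be(tok):   return tok.word.lower() in {'be', 'am', 'is', 'are', 'was', 'were', 'been', 'being'}

def extract_unambiguous(tokens):
    """Yield unambiguous (label, head, p, n2) tuples from one tagged sentence.

    label is 'V' for the (v, p, n2) pattern and 'N' for the (n, p, n2) pattern.
    """
    for i, tok in enumerate(tokens):
        if not is_prep(tok):
            continue
        # n2: first noun within K words to the right of p, no verb in between
        n2 = None
        for right in tokens[i + 1:i + 1 + K]:
            if is_verb(right):
                break
            if is_noun(right):
                n2 = right.word
                break
        if n2 is None:
            continue
        # Scan left: the first head found decides the pattern, which gives
        # "no noun between v and p" / "no verb between n and p" automatically.
        for left in reversed(tokens[max(0, i - K):i]):
            if is_noun(left):
                yield ('N', left.word, tok.word, n2)
                break
            if is_verb(left):
                if not is_be(left):
                    yield ('V', left.word, tok.word, n2)
                break
```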
Accuracy of extraction • Noisy data (c. 69% correct) • But abundant
Evaluation • 81.91% with a back-off technique • 81.85% with interpolation as in H&R • Baseline for this data: 70.39%
Portability • Moved to Spanish and got similar performance • H&R would have had to port Fidditch to Spanish
Where to get more information • Charniak, ch. 8 • Hindle and Rooth, Computational Linguistics 19(1), pp. 103-120, 1993 • Manning and Schütze, section 8.3 • The original papers