Preposition Phrase Attachment • To what previous verb or noun phrase does a prepositional phrase (PP) attach? The woman with a poodle saw a man in the park with a telescope on Tuesday on his bicycle
A Simplified Version • Assume ambiguity only between preceding base NP and preceding base VP: The woman had seen the man with the telescope. Q: Does the PP attach to the NP or the VP? • Assumption: Consider only NP/VP head and the preposition
Simple Formulation • Determine attachment based on the log-likelihood ratio: LLR(v, n, p) = log P(p | v) - log P(p | n) • If LLR > 0, attach to the verb; if LLR < 0, attach to the noun
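A minimal sketch of this decision rule in Python, assuming hypothetical count dictionaries c_vp, c_v, c_np, c_n gathered from a corpus; add-alpha smoothing avoids log(0) for unseen pairs:

import math

def llr(v, n, p, c_vp, c_v, c_np, c_n, alpha=0.5):
    # LLR(v, n, p) = log P(p | v) - log P(p | n), with add-alpha smoothing
    p_given_v = (c_vp.get((v, p), 0) + alpha) / (c_v.get(v, 0) + 1.0)
    p_given_n = (c_np.get((n, p), 0) + alpha) / (c_n.get(n, 0) + 1.0)
    return math.log(p_given_v) - math.log(p_given_n)

def attach(v, n, p, *counts):
    return "V" if llr(v, n, p, *counts) > 0 else "N"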
Issues • Multiple attachment: • Attachment lines cannot cross • Proximity: • Preference for attaching to closer structures, all else being equal Chrysler will end its troubled venture with Maserati. P(with | end) = 0.118 vs. P(with | venture) = 0.107: the simple LLR narrowly prefers verb attachment even though the PP clearly modifies venture!
Hindle & Rooth (1993) • Consider just sentences with a transitive verb and PP, i.e., of the form: ... bVP bNP PP ... Q: Where does the first PP attach (NP or VP)? Indicator variables (0 or 1): VAp: Is there a PP headed by p after v attached to v? NAp: Is there a PP headed by p after n attached to n? NB: Both variables can be 1 in a sentence
Attachment Probabilities • P(attach(p) = n | v, n) = P(NAp=1 | n) • Verb attachment is irrelevant here: if the noun takes a PP headed by p, that PP is the closer one, so the first PP attaches to the noun whatever VAp is • P(attach(p) = v | v, n) = P(VAp=1, NAp=0 | v, n) = P(VAp=1 | v) P(NAp=0 | n) • Noun attachment is relevant, since the noun ‘shadows’ the verb (by the proximity principle): the first PP attaches to the verb only if the noun does not take it
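As a small illustration (function names are my own), the two attachment probabilities combine as:

def attachment_probs(p_va, p_na):
    # p_va = P(VAp = 1 | v), p_na = P(NAp = 1 | n)
    p_noun = p_na                     # noun wins whenever it takes a PP headed by p
    p_verb = p_va * (1.0 - p_na)      # verb wins only if it takes the PP and the noun does not
    return p_verb, p_noun

# e.g. attachment_probs(0.3, 0.4) -> (0.18, 0.4): noun attachment is preferred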
Estimating Parameters • MLE: P(VAp= 1 | v) = C(v,p) / C(v) P(NAp= 1 | n) = C(n,p) / C(n) • Using an unlabeled corpus: • Bootstrap from unambiguous cases: The road from Chicago to New York is long. She went from Albany towards Buffalo.
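A sketch of the MLE step, assuming a hypothetical list of (head, prep) pairs harvested from unambiguous attachments:

from collections import defaultdict

def mle_table(pairs):
    # pairs: (head_word, preposition) observations for one attachment site type
    c_pair, c_head = defaultdict(int), defaultdict(int)
    for head, prep in pairs:
        c_pair[(head, prep)] += 1
        c_head[head] += 1
    # P(prep attaches | head) = C(head, prep) / C(head)
    return lambda head, prep: c_pair[(head, prep)] / c_head[head] if c_head[head] else 0.0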
Unsupervised Training • Build initial model using only unambiguous attachments • Apply initial model and assign attachments if LLR is above a threshold • Divide remaining ambiguous cases as 0.5 counts for each possibility • Use of EM as a principled method?
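One possible rendering of this bootstrapping loop (data formats and the threshold are assumptions, not Hindle & Rooth's exact procedure):

import math
from collections import defaultdict

def bootstrap_train(unambiguous_v, unambiguous_n, ambiguous, threshold=2.0, alpha=0.5):
    # unambiguous_v / unambiguous_n: (head, prep) pairs with known verb / noun attachment
    # ambiguous: (v, n, p) triples; counts are floats so split cases can add 0.5 each
    c_vp, c_v = defaultdict(float), defaultdict(float)
    c_np, c_n = defaultdict(float), defaultdict(float)

    def add(c_pair, c_head, head, prep, w=1.0):
        c_pair[(head, prep)] += w
        c_head[head] += w

    for v, p in unambiguous_v:        # step 1: initial model from unambiguous cases
        add(c_vp, c_v, v, p)
    for n, p in unambiguous_n:
        add(c_np, c_n, n, p)

    def llr(v, n, p):
        pv = (c_vp[(v, p)] + alpha) / (c_v[v] + 1.0)
        pn = (c_np[(n, p)] + alpha) / (c_n[n] + 1.0)
        return math.log(pv / pn)

    for v, n, p in ambiguous:         # step 2: confident assignments; step 3: 0.5/0.5 splits
        score = llr(v, n, p)
        if score > threshold:
            add(c_vp, c_v, v, p)
        elif score < -threshold:
            add(c_np, c_n, n, p)
        else:
            add(c_vp, c_v, v, p, 0.5)
            add(c_np, c_n, n, p, 0.5)
    return c_vp, c_v, c_np, c_n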
Limitations • Semantic issues: I examined the man with a stethoscope. I examined the man with a broken leg. • Other contextual features: Superlative adjectives (biggest) indicate NP • More complex sentences: The board approved its acquisition by BigCo of Milwaukee for $32 a share at its meeting on Tuesday.
Memory-Based Formulation • Each example has four components: V N1 P N2 = examine man with stethoscope, Class = V • Similarity based on information gain weighting for matching components • Need ‘semantic’ similarity measure for words: • stethoscope ~ thermometer kidney ~ leg
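A minimal sketch of such a memory-based classifier, assuming training 4-tuples and class labels; the overlap distance weights each mismatching position by its information gain (my own implementation, not the original memory-based learner):

import math
from collections import Counter

def info_gain_weights(examples, labels):
    def entropy(counts):
        total = sum(counts.values())
        return -sum(c / total * math.log2(c / total) for c in counts.values())
    h_class = entropy(Counter(labels))
    weights = []
    for i in range(len(examples[0])):
        by_value = {}
        for x, y in zip(examples, labels):
            by_value.setdefault(x[i], []).append(y)
        h_cond = sum(len(ys) / len(labels) * entropy(Counter(ys)) for ys in by_value.values())
        weights.append(h_class - h_cond)           # IG of position i (V, N1, P, or N2)
    return weights

def classify(query, examples, labels, weights, k=1):
    dist = lambda x: sum(w for w, a, b in zip(weights, x, query) if a != b)
    nearest = sorted(zip(examples, labels), key=lambda ex: dist(ex[0]))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

# e.g. classify(("examine", "man", "with", "thermometer"), train_x, train_y, weights)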
MVDM Word Similarity • Idea: Words are similar to the extent that they predict similar class distributions • Data sparseness is a serious problem, though! • Extend idea to a task-independent similarity metric...
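In code, an MVDM-style distance between two words could look like this (class_dist is an assumed table of P(class | word) estimated from training data):

def mvdm(word1, word2, class_dist):
    # distance = sum over classes of |P(class | word1) - P(class | word2)|
    d1, d2 = class_dist.get(word1, {}), class_dist.get(word2, {})
    return sum(abs(d1.get(c, 0.0) - d2.get(c, 0.0)) for c in set(d1) | set(d2))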
Lexical Space • Represent ‘semantics’ of a word by frequencies of words which coöccur with it, instead of relative frequencies of classes • Each word has 4 vectors of frequencies for words 2 before, 1 before, 1 after, and 2 after
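A sketch of building these positional co-occurrence vectors from tokenized sentences (the exact representation is an assumption):

from collections import defaultdict

def lexical_space(sentences):
    # vectors[word][offset][context_word] = frequency, for offsets -2, -1, +1, +2
    vectors = defaultdict(lambda: {off: defaultdict(int) for off in (-2, -1, 1, 2)})
    for sent in sentences:
        for i, w in enumerate(sent):
            for off in (-2, -1, 1, 2):
                j = i + off
                if 0 <= j < len(sent):
                    vectors[w][off][sent[j]] += 1
    return vectors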
Results • Baseline comparisons: • Humans (4-tuple): 88.2% • Humans (full sentence): 93.2% • Noun always: 59.0% • Most likely for prep: 72.2% • Without Info Gain: 83.7% • With Info Gain: 84.1%
Using Many Features • Use many features of an example together • Consider interaction between features during learning • Each example represented as a feature vector: x = (f1,f2,...,fn)
kNN Geometric Interpretation (figure) Linear Separator Learning (figure)
Linear Separators • Linear separator model is a vector of weights: w = (w1,w2,...,wn) • Binary classification: Is wTx > 0 ? • ‘Positive’ and ‘Negative’ classes • A threshold other than 0 is possible by adding a dummy element of “1” to all vectors – the threshold is then absorbed into the weight for that element
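A tiny illustration of the dummy-element trick (the values are made up):

def predict(w, x):
    # the last component of x is fixed at 1, so its weight plays the role of the threshold
    return sum(wi * xi for wi, xi in zip(w, x)) > 0

x = [1.0, 0.0, 1.0, 1.0]        # three features plus the dummy "1"
w = [0.3, -0.2, 0.5, -0.6]
print(predict(w, x))            # 0.3 + 0.5 - 0.6 = 0.2 > 0 -> positive class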
Error-Based Learning • Initialize w to be all 1’s • Cycle x through examples repeatedly (random order): • If wTx > 0 but x is really negative, then decrease w’s elements • If wTx < 0 but x is really positive, then increase w’s elements
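A perceptron-style sketch of this additive rule (the learning rate and epoch count are my own choices):

import random

def train_additive(examples, labels, epochs=10, lr=0.1):
    # examples: feature vectors with the dummy bias element appended; labels: +1 / -1
    w = [1.0] * len(examples[0])
    for _ in range(epochs):
        order = list(range(len(examples)))
        random.shuffle(order)
        for i in order:
            x, y = examples[i], labels[i]
            score = sum(wi * xi for wi, xi in zip(w, x))
            if score > 0 and y < 0:       # false positive: decrease weights of active features
                w = [wi - lr * xi for wi, xi in zip(w, x)]
            elif score <= 0 and y > 0:    # false negative: increase weights of active features
                w = [wi + lr * xi for wi, xi in zip(w, x)]
    return w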
Winnow • Initialize w to be all 1’s • Cycle x through examples repeatedly (random order): a) If wTx < 0 but x is really positive, then multiply the weights of the active features by α > 1 (promotion) b) If wTx > 0 but x is really negative, then divide the weights of the active features by α (demotion)
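A sketch of Winnow's multiplicative updates for binary features; note that because the weights stay positive, a positive threshold (here n/2, an assumed choice) takes the place of the 0 threshold used above:

import random

def train_winnow(examples, labels, epochs=10, alpha=2.0):
    # examples: 0/1 feature vectors; labels: +1 / -1; weights stay positive
    n = len(examples[0])
    w = [1.0] * n
    theta = n / 2.0
    for _ in range(epochs):
        order = list(range(len(examples)))
        random.shuffle(order)
        for i in order:
            x, y = examples[i], labels[i]
            score = sum(wi * xi for wi, xi in zip(w, x))
            if score <= theta and y > 0:      # a) promotion: multiply active weights by alpha
                w = [wi * alpha if xi else wi for wi, xi in zip(w, x)]
            elif score > theta and y < 0:     # b) demotion: divide active weights by alpha
                w = [wi / alpha if xi else wi for wi, xi in zip(w, x)]
    return w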
Issues • No negative weights possible! • Balanced Winnow: Formulate weights as the difference of 2 weight vectors: w = w+ - w- • Learn each vector separately, w+ regularly, and w- with polarity reversed • Multiple classes: • Learn one weight vector for each class (learning X vs. not-X) • Choose the class with the highest score for the example
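A balanced-Winnow sketch along these lines (the initial weights and threshold are assumptions):

import random

def train_balanced_winnow(examples, labels, epochs=10, alpha=1.5, theta=1.0):
    # two positive weight vectors; the effective weight vector is w_plus - w_minus
    n = len(examples[0])
    w_plus, w_minus = [2.0] * n, [1.0] * n
    for _ in range(epochs):
        order = list(range(len(examples)))
        random.shuffle(order)
        for i in order:
            x, y = examples[i], labels[i]
            score = sum((wp - wm) * xi for wp, wm, xi in zip(w_plus, w_minus, x))
            if score <= theta and y > 0:      # promote the positive part, demote the negative part
                w_plus = [wp * alpha if xi else wp for wp, xi in zip(w_plus, x)]
                w_minus = [wm / alpha if xi else wm for wm, xi in zip(w_minus, x)]
            elif score > theta and y < 0:     # demote the positive part, promote the negative part
                w_plus = [wp / alpha if xi else wp for wp, xi in zip(w_plus, x)]
                w_minus = [wm * alpha if xi else wm for wm, xi in zip(w_minus, x)]
    return w_plus, w_minus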
PP Attachment Features • Words in each position • Subsets of the above, e.g.: <v=run, p=with> • Word classes at various levels of generality: stethoscope → medical instrument → instrument → device → instrumentation → artifact → object → physical thing • Derived from WordNet – handmade lexicon • 15 basic features plus word-class features
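A sketch of turning one 4-tuple into such features (the hypernym table is a hypothetical stand-in for the WordNet-derived lexicon; the exact 15 basic features are not reproduced here):

def make_features(v, n1, p, n2, hypernyms):
    # single words, some word pairs, and class features for the two nouns
    feats = {f"v={v}", f"n1={n1}", f"p={p}", f"n2={n2}",
             f"v={v},p={p}", f"n1={n1},p={p}", f"p={p},n2={n2}"}
    for word, slot in ((n1, "n1"), (n2, "n2")):
        for cls in hypernyms.get(word, []):    # e.g. stethoscope -> medical instrument -> ...
            feats.add(f"{slot}_class={cls}")
    return feats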
Results • Results including preposition of: Transform: 81.9 Backoff: 84.5 MBL: 84.4 Winnow: 84.8 • Results without preposition of: