
Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution

This study explores the potential of using the web as a training set for resolving structural ambiguity in natural language processing tasks. The approach combines n-gram association models, web-derived surface features, and paraphrases, applied to unsupervised prepositional phrase attachment and noun phrase (NP) coordination.


Presentation Transcript


  1. Using the Web as an Implicit Training Set: Application to Structural Ambiguity Resolution. Preslav Nakov and Marti Hearst, Computer Science Division and SIMS, University of California, Berkeley. Supported by NSF DBI-0317510 and a gift from Genentech.

  2. Motivation • Huge datasets trump sophisticated algorithms. • “Scaling to Very Very Large Corpora for Natural Language Disambiguation” (Banko & Brill, ACL 2001) • Task: spelling correction • Raw text as “training data” • Log-linear improvement even up to a billion words • Getting more data is better than fine-tuning algorithms. • How can this be generalized to other problems?

  3. Web as a Baseline • “Web as a baseline” (Lapata & Keller, 04; 05): applied simple n-gram models to: • machine translation candidate selection • article generation • noun compound interpretation • noun compound bracketing • adjective ordering • spelling correction • countability detection • prepositional phrase attachment • All unsupervised • Findings: the web models sometimes rival the best supervised approaches; on some tasks they are significantly better than the best supervised algorithm, on others not significantly different from it. • => Web n-grams should be used as a baseline.

  4. Our Contribution • Potential of these ideas is not yet fully realized • We introduce new features: • paraphrases • surface features • Applied to structural ambiguity problems • Data sparseness: need statistics for every possible word and word combination • Problems (all unsupervised): • noun compound bracketing: state-of-the-art results (Nakov & Hearst, 2005) • PP attachment and NP coordination: this work

  5. Task 1: Prepositional Phrase Attachment

  6. PP attachment • (a) Peter spent millions of dollars. (noun attachment: the PP combines with the NP to form another NP) • (b) Peter spent time with his family. (verb attachment: the PP is an indirect object of the verb) • quadruple: (v, n1, p, n2) • (a) (spent, millions, of, dollars) • (b) (spent, time, with, family) • Human performance: • quadruple only: 88% • whole sentence: 93%
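
A minimal sketch (not from the slides) of how the (v, n1, p, n2) quadruples and their attachment labels could be represented; the class and field names are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PPQuadruple:
    v: str                        # verb, e.g. "spent"
    n1: str                       # head noun of the object NP, e.g. "millions"
    p: str                        # preposition, e.g. "of"
    n2: str                       # head noun inside the PP, e.g. "dollars"
    label: Optional[str] = None   # gold attachment: "noun" or "verb"

examples = [
    PPQuadruple("spent", "millions", "of", "dollars", "noun"),
    PPQuadruple("spent", "time", "with", "family", "verb"),
]
```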

  7. Related Work • Supervised (results on the Ratnaparkhi dataset): • (Brill & Resnik, 94): transformation-based learning, WordNet classes, P=82% • (Ratnaparkhi et al., 94): maximum entropy, word classes (MI), P=81.6% • (Collins & Brooks, 95): back-off, P=84.5% • (Stetina & Nagao, 97): decision trees, WordNet, P=88.1% • (Toutanova et al., 04): morphology, syntax, WordNet, P=87.5% • (Olteanu & Moldovan, 05): in context, parser, FrameNet, Web, SVM, P=92.85% • Unsupervised: • (Hindle & Rooth, 93): partially parsed corpus, lexical associations over subsets of (v,n1,p), P=80%, R=80% • (Ratnaparkhi, 98): POS-tagged corpus, unambiguous cases for (v,n1,p), (n1,p,n2), classifier: P=81.9% • (Pantel & Lin, 00): collocation database, dependency parser, large corpus (125M words), P=84.3% (unsupervised state-of-the-art)

  8. Related Work: Web • Unsupervised: • (Volk, 00): Altavista, NEAR operator, German, compare Pr(p|n1) vs. Pr(p|v), P=75%, R=58% • (Volk, 01): Altavista, NEAR operator, German, inflected queries, Pr(p,n2|n1) vs. Pr(p,n2|v), P=75%, R=85% • (Calvo & Gelbukh, 03): exact phrases, Spanish, P=91.97%, R=89.5% • (Lapata & Keller, 05): Web n-grams, English, Ratnaparkhi dataset, P in the low 70s • (Olteanu & Moldovan, 05): supervised, English, in context, parser, FrameNet, Web counts, SVM, P=92.85%

  9. PP-attachment: Our Approach • Unsupervised • (v,n1,p,n2) quadruples, Ratnaparkhi test set • Google and MSN Search • Exact phrase queries • Inflections: WordNet 2.0 • Adding determiners where appropriate • Models: • n-gram association models • Web-derived surface features • paraphrases

  10. Probabilities: Estimation • Using page hits as a proxy for n-gram counts • Pr(w1|w2) = #(w1,w2) / #(w2) • #(w2) word frequency; query for “w2” • #(w1,w2) bigram frequency; query for “w1 w2” • Pr(w1,w2|w3) = #(w1,w2,w3) / #(w3)
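
A sketch of the estimation described above, assuming a hypothetical web_hits() helper that returns the page-hit count for an exact-phrase query (the slides used Google and MSN Search); it is illustrative, not the authors' code.

```python
def web_hits(phrase: str) -> int:
    """Hypothetical helper: issue an exact-phrase query to a search engine
    and return the reported hit count."""
    raise NotImplementedError("plug in a search-engine API here")

def pr_w1_given_w2(w1: str, w2: str) -> float:
    """Pr(w1 | w2) ~ #(w1, w2) / #(w2), with page hits as proxies for counts."""
    bigram = web_hits(f'"{w1} {w2}"')
    unigram = web_hits(f'"{w2}"')
    return bigram / unigram if unigram else 0.0

def pr_w1_w2_given_w3(w1: str, w2: str, w3: str) -> float:
    """Pr(w1, w2 | w3) ~ #(w1, w2, w3) / #(w3)."""
    trigram = web_hits(f'"{w1} {w2} {w3}"')
    unigram = web_hits(f'"{w3}"')
    return trigram / unigram if unigram else 0.0
```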

  11. N-gram models • (i) Pr(p|n1) vs. Pr(p|v) • (ii) Pr(p,n2|n1) vs. Pr(p,n2|v) • I eat/v spaghetti/n1 with/p a fork/n2. • I eat/v spaghetti/n1 with/p sauce/n2. • Pr or # (frequency) • smoothing as in (Hindle & Rooth, 93) • back-off from (ii) to (i) • N-grams are unreliable if n1 or n2 is a pronoun. • MSN Search: no rounding of n-gram estimates
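
A sketch of how models (i) and (ii) could be combined with the back-off described above, reusing the hypothetical estimators from the previous sketch; the decision rule shown (larger probability wins, ties abstain) is the obvious reading, not necessarily the exact implementation.

```python
from typing import Optional

def pp_attach_ngram(v: str, n1: str, p: str, n2: str) -> Optional[str]:
    """Return "noun", "verb", or None (undecided) for a (v, n1, p, n2) quadruple."""
    # Model (ii): Pr(p, n2 | n1) vs. Pr(p, n2 | v)
    noun_score = pr_w1_w2_given_w3(p, n2, n1)
    verb_score = pr_w1_w2_given_w3(p, n2, v)
    if noun_score == 0.0 and verb_score == 0.0:
        # Back off to model (i): Pr(p | n1) vs. Pr(p | v)
        noun_score = pr_w1_given_w2(p, n1)
        verb_score = pr_w1_given_w2(p, v)
    if noun_score > verb_score:
        return "noun"
    if verb_score > noun_score:
        return "verb"
    return None
```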

  12. Web-derived Surface Features • Example features, with the attachment they predict and their (precision, recall): • open the door / with a key → verb (100.00%, 0.13%) • open the door (with a key) → verb (73.58%, 2.44%) • open the door –with a key → verb (68.18%, 2.03%) • open the door , with a key → verb (58.44%, 7.09%) • eat spaghetti with sauce → noun (100.00%, 0.14%) • eat ? spaghetti with sauce → noun (83.33%, 0.55%) • eat , spaghetti with sauce → noun (65.77%, 5.11%) • eat : spaghetti with sauce → noun (64.71%, 1.57%) • Feature counts are summed per class and the two sums are compared. • Summing achieves high precision, low recall.
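
A sketch of the sum-and-compare scheme for the surface features: each matching pattern contributes its web count to the class it predicts, the per-class sums are compared, and the model abstains on ties or when nothing matches. The data format here is illustrative; the patterns themselves would be instantiated per example.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

def surface_vote(matches: Iterable[Tuple[str, int]]) -> Optional[str]:
    """matches: (predicted_class, web_count) pairs, one per surface feature
    that fired for the example, e.g. ("verb", 12), ("noun", 3), ..."""
    totals = Counter()
    for predicted_class, count in matches:
        totals[predicted_class] += count
    noun, verb = totals["noun"], totals["verb"]
    if noun > verb:
        return "noun"
    if verb > noun:
        return "verb"
    return None   # nothing matched or a tie: abstain (high precision, low recall)
```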

  13. Paraphrases: v n1 p n2 • v n2 n1 (noun) • v p n2 n1 (verb) • p n2 * v n1 (verb) • n1 p n2 v (noun) • v PRONOUN p n2 (verb) • BE n1 p n2 (noun)

  14. Paraphrases: pattern (1) • v n1 p n2 → v n2 n1 (noun) • Can we turn “n1 p n2” into a noun compound “n2 n1”? • meet/v demands/n1 from/p customers/n2 → • meet/v the customer/n2 demands/n1 • Problem: ditransitive verbs like give • gave/v an apple/n1 to/p him/n2 → • gave/v him/n2 an apple/n1 • Solution: • no determiner before n1 • determiner before n2 is required • the preposition cannot be to
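
A sketch of pattern (1) as a web test, assuming the hypothetical web_hits() helper from above: the constraints from the slide (no determiner before n1, a determiner before n2, preposition not to) are applied before querying; the determiner used and the "any hit counts as a vote" threshold are illustrative choices.

```python
def pattern1_votes_noun(v: str, n1: str, p: str, n2: str, det: str = "the") -> bool:
    """True if the noun-compound paraphrase "v det n2 n1" is attested on the web,
    which is taken as a vote for noun attachment."""
    if p.lower() == "to":                 # skip ditransitives like "gave an apple to him"
        return False
    query = f'"{v} {det} {n2} {n1}"'      # e.g. "meet the customer demands"
    return web_hits(query) > 0
```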

  15. Paraphrases: pattern (2) • v n1 p n2 → v p n2 n1 (verb) • If “p n2” is an indirect object of v, then it could be switched with the direct object n1. • had/v a program/n1 in/p place/n2 → • had/v in/p place/n2 a program/n1 • A determiner before n1 is required to prevent “n2 n1” from forming a noun compound.

  16. Paraphrases: pattern (3) • v n1 p n2 → p n2 * v n1 (verb) • “*” indicates a wildcard position (up to three intervening words are allowed) • Looks for appositions, where the PP has moved in front of the verb, e.g. • I gave/v an apple/n1 to/p him/n2 → • to/p him/n2 I gave/v an apple/n1

  17. Paraphrases: pattern (4) • v n1 p n2 → n1 p n2 v (noun) • Looks for appositions, where “n1 p n2” has moved in front of v • shaken/v confidence/n1 in/p markets/n2 → • confidence/n1 in/p markets/n2 shaken/v

  18. Paraphrases: pattern (5) • v n1 p n2 → v PRONOUN p n2 (verb) • n1 is a pronoun → verb attachment (Hindle & Rooth, 93) • Pattern (5) substitutes n1 with a dative pronoun (him or her), e.g. • put/v a client/n1 at/p odds/n2 → • put/v him at/p odds/n2

  19. Paraphrases: pattern (6) • v n1 p n2 → BE n1 p n2 (noun) • BE is typically used with a noun attachment • Pattern (6) substitutes v with a form of to be (is or are), e.g. • eat/v spaghetti/n1 with/p sauce/n2 → • is spaghetti/n1 with/p sauce/n2

  20. Evaluation: Ratnaparkhi dataset • 3097 test examples, e.g.: • prepare dinner for family → V • shipped crabs from province → V • n1 or n2 is a bare determiner: 149 examples (a problem for unsupervised methods) • left chairmanship of the → N • is the of kind → N • acquire securities for an → N • special symbols (%, /, & etc.): 230 examples (a problem for Web queries) • buy % for 10 → V • beat S&P-down from % → V • is 43%-owned by firm → N

  21. Results • For prepositions other than of (of → noun attachment). • Smoothing is not needed on the Web. • Models in bold are combined in a majority vote (see the sketch below). • Simpler than, but not significantly different from, the 84.3% of (Pantel & Lin, 00). • Checking directly for...
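
A sketch of a majority vote over the individual models (n-gram association, surface features, paraphrase patterns), each returning "noun", "verb", or None; the exact combination behind the reported results is not spelled out on the slide, so this is only the straightforward reading of "majority vote".

```python
from collections import Counter
from typing import Callable, List, Optional

Model = Callable[[str, str, str, str], Optional[str]]

def majority_vote(models: List[Model], v: str, n1: str, p: str, n2: str) -> Optional[str]:
    votes = Counter(m(v, n1, p, n2) for m in models)
    votes.pop(None, None)                       # abstentions do not count
    ranked = votes.most_common()
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None                             # tie between the top classes
    return ranked[0][0]
```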

  22. Task 2: Coordination

  23. Coordination & Problems • (Modified) real sentence: • The Department of Chronic Diseases and Health Promotion leads and strengthens global efforts to prevent and control chronic diseases or disabilities and to promote health and quality of life. • Problems: • boundaries: words, constituents, clauses etc. • interactions with PPs: [health and [quality of life]] vs. [[health and quality] of life] • or meaning and: chronic diseases or disabilities • ellipsis

  24. NC coordination: ellipsis • Ellipsis • car and truck production • means car production and truck production • No ellipsis • president and chief executive • All-way coordination • Securities and Exchange Commission

  25. NC Coordination: ellipsis • Quadruple (n1, c, n2, h) • Penn Treebank annotations: • ellipsis: (NP car/NN and/CC truck/NN production/NN) • no ellipsis: (NP (NP president/NN) and/CC (NP chief/NN executive/NN)) • all-way: can be annotated either way • This is a problem a parser must deal with. Collins’ parser always predicts ellipsis, but other parsers (e.g. Charniak’s) try to solve it.

  26. Related Work • (Resnik, 99): similarity of form and meaning, conceptual association, decision tree, P=80%, R=100% • (Rus et al., 02): deterministic, rule-based bracketing in context, P=87.42%, R=71.05% • (Chantree et al., 05): distributional similarities from the BNC, Sketch Engine (frequencies, object/modifier relations etc.), P=80.3%, R=53.8% • (Goldberg, 99): different problem (n1,p,n2,c,n3), adapts the Ratnaparkhi (99) algorithm, P=72%, R=100%

  27. N-gram models (n1,c,n2,h) • (i) #(n1,h) vs. #(n2,h) • (ii) #(n1,h) vs. #(n1,c,n2)
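
A sketch of the two association models for (n1, c, n2, h), reusing the hypothetical web_hits() helper; the decision directions (a relatively high #(n1, h), as in "car production", taken as evidence for ellipsis) are my reading of the models, not quoted from the slides.

```python
from typing import Optional

def coord_model_i(n1: str, n2: str, h: str) -> Optional[str]:
    """Model (i): compare #(n1, h) with #(n2, h)."""
    n1_h = web_hits(f'"{n1} {h}"')    # e.g. "car production"
    n2_h = web_hits(f'"{n2} {h}"')    # e.g. "truck production"
    if n1_h > n2_h:
        return "ellipsis"
    if n2_h > n1_h:
        return "no ellipsis"
    return None

def coord_model_ii(n1: str, c: str, n2: str, h: str) -> Optional[str]:
    """Model (ii): compare #(n1, h) with #(n1, c, n2); a bigram vs. a trigram,
    which the results slide flags as problematic."""
    n1_h = web_hits(f'"{n1} {h}"')          # e.g. "car production"
    n1_c_n2 = web_hits(f'"{n1} {c} {n2}"')  # e.g. "car and truck"
    if n1_h > n1_c_n2:
        return "ellipsis"
    if n1_c_n2 > n1_h:
        return "no ellipsis"
    return None
```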

  28. Surface Features • (Feature table not reproduced here.) • As for PP attachment, the counts for each feature are summed per outcome (ellipsis vs. no ellipsis) and the two sums are compared.

  29. Paraphrases: n1 c n2 h • n2 c n1 h (ellipsis) • n2 h c n1 (NO ellipsis) • n1 h c n2 h (ellipsis) • n2 h c n1 h (ellipsis)

  30. Paraphrases: Pattern (1) • n1 c n2 h → n2 c n1 h (ellipsis) • Switch places of n1 and n2 • bar/n1 and/c pie/n2 graph/h → • pie/n2 and/c bar/n1 graph/h

  31. Paraphrases: Pattern (2) • n1 c n2 h → n2 h c n1 (NO ellipsis) • Switch places of n1 and “n2 h” • president/n1 and/c chief/n2 executive/h → • chief/n2 executive/h and/c president/n1

  32. Paraphrases: Pattern (3) • n1 c n2 h → n1 h c n2 h (ellipsis) • Insert the elided head h • bar/n1 and/c pie/n2 graph/h → • bar/n1 graph/h and/c pie/n2 graph/h

  33. Paraphrases: Pattern (4) • n1 c n2 h → n2 h c n1 h (ellipsis) • Insert the elided head h, but also switch n1 and n2 • bar/n1 and/c pie/n2 graph/h → • pie/n2 graph/h and/c bar/n1 graph/h
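
A sketch that instantiates the four paraphrase patterns above as exact-phrase web queries (using the hypothetical web_hits() helper); each attested paraphrase contributes a vote for the outcome shown on its slide.

```python
from typing import List, Tuple

def coord_paraphrase_votes(n1: str, c: str, n2: str, h: str) -> List[str]:
    patterns: List[Tuple[str, str]] = [
        (f'"{n2} {c} {n1} {h}"',     "ellipsis"),     # (1) pie and bar graph
        (f'"{n2} {h} {c} {n1}"',     "no ellipsis"),  # (2) chief executive and president
        (f'"{n1} {h} {c} {n2} {h}"', "ellipsis"),     # (3) bar graph and pie graph
        (f'"{n2} {h} {c} {n1} {h}"', "ellipsis"),     # (4) pie graph and bar graph
    ]
    return [label for query, label in patterns if web_hits(query) > 0]
```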

  34. (Rus et al., 02) Heuristics • Heuristic 1: no ellipsis • n1 = n2 • milk/n1 and/c milk/n2 products/h • Heuristic 4: no ellipsis • both n1 and n2 are modified by an adjective • Heuristic 5: ellipsis • only n1 is modified by an adjective • Heuristic 6: no ellipsis • only n2 is modified by an adjective • (In our adaptation, we use a determiner.)

  35. Number Agreement • Introduced by Resnik (93) • (a) n1 & n2 agree, but n1 & h do not → ellipsis; • (b) n1 & n2 don’t agree, but n1 & h do → no ellipsis; • (c) otherwise → leave undecided.
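
A sketch of the number-agreement test, assuming a hypothetical is_plural() check; a real system would read the NN/NNS tags rather than use the crude suffix test shown.

```python
from typing import Optional

def is_plural(noun: str) -> bool:
    """Crude placeholder: a real system would use the POS tag (NN vs. NNS)."""
    return noun.endswith("s")

def number_agreement(n1: str, n2: str, h: str) -> Optional[str]:
    n1_n2_agree = is_plural(n1) == is_plural(n2)
    n1_h_agree = is_plural(n1) == is_plural(h)
    if n1_n2_agree and not n1_h_agree:
        return "ellipsis"        # (a)
    if not n1_n2_agree and n1_h_agree:
        return "no ellipsis"     # (b)
    return None                  # (c) leave undecided
```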

  36. Results • 428 examples from the Penn Treebank • Model (ii) is weak: it compares a bigram to a trigram. • Models in bold are combined in a majority vote. • Results are comparable to those of other researchers (but there is no standard dataset).

  37. Conclusions & Future Work • Tapping the potential of very large corpora for unsupervised algorithms • Go beyond n-grams: • surface features • paraphrases • Results are competitive with the best unsupervised approaches. • Results can rival those of supervised algorithms. • Future work: • other NLP tasks • better evidence combination • There should be even more exciting features on the Web!

  38. The End Thank you!
