
Acquiring Reliable Predicate-argument Structures from Raw Corpora for Case Frame Compilation

Daisuke Kawahara (1) and Sadao Kurohashi (1, 2). (1) National Institute of Information and Communications Technology; (2) Kyoto University. LREC 2010, 2010/05/20.


Presentation Transcript


  1. Acquiring Reliable Predicate-argument Structures from Raw Corpora for Case Frame Compilation • Daisuke Kawahara (1) and Sadao Kurohashi (1, 2) • (1) National Institute of Information and Communications Technology, (2) Kyoto University • LREC 2010, 2010/05/20

  2. Background • NLP analyzers so far • (Mainly) supervised, (relatively) knowledge-poor • e.g., PP-attachment or parsing: “Mary ate the salad with a fork” vs. “Mary ate the salad with mushrooms” • Only 1.5% of bilexical dependencies were learned [Bikel, 04] • Toward knowledge-oriented NLP • Automatically compile case frames and integrate them into NLP analyzers/applications

  3. Related work • Subcategorization frames • [Brent, 93] [Ushioda et al., 93] [Manning, 93] [Briscoe and Carroll, 97] [Korhonen, 02] … e.g., She greeted me. • NP(sbj) greet NP(obj) e.g., She gave him a book. • NP(sbj) give NP(obj) NP(obj)

  4. Related work • Subcategorization frames • [Brent, 93] [Ushioda et al., 93] [Manning, 93] [Briscoe and Carroll, 97] [Korhonen, 02] … • (Manually compiled) semantic frames • FrameNet [Baker et al., 98], PropBank [Palmer et al., 05] • Japanese semantic case frames • Semantic marker-based: [Haruno, 95] [Utsuro et al., 96] • Example-based: [Kawahara and Kurohashi, 06]

  5. Compilation of Japanese semantic case frames [Kawahara and Kurohashi, 06] • ga: nominative, wo: accusative, ni: dative, de: instrument

  6. Compilation of English case frames [Kawahara and Uchimoto, 08] • [Pipeline diagram; labels: Sentence extraction, Sentences, Parsing and filtering, Predicate-argument structures, Clustering, WordNet, Case frames] • Dependency parser: 89.9% → 91.5% (short sentences)

  7. Examples of obtained case frames [Kawahara and Uchimoto, 08] • surface cases and prepositions • sbj, obj, obj2, sbar, pp:for, pp:in, …

  8. Compilation of English case frames • [Pipeline diagram; labels: Sentence extraction, Sentences, Parsing and filtering, Predicate-argument structures, Clustering, WordNet, Case frames] • Dependency parser: 89.9% → 91.5% (short sentences)

  9. Procedure • Apply POS tagging and chunking to a raw corpus • Filter out unreliable and inappropriate sentences and chunks • Extract predicate-argument structures and apply PP-attachment disambiguation if a PP exists • Example: I borrowed the kits with a $25.00 deposit, and … → NP:[I] VP:[borrowed] NP:[the kits] PP:[with] NP:[a $25.00 deposit] O:, O:and … → NP:[I] VP:[borrowed] NP:[the kits] PP:[with] NP:[a $25.00 deposit] → sbj:[I] pred:[borrow] obj:[the kits] pp:with:[a $25.00 deposit]

  10. 1. POS tagging and chunking • POS tagging • Tsuruoka’s tagger [Tsuruoka and Tsujii, 05], accuracy: 97.1% • Chunking • YamCha chunker [Kudo and Matsumoto, 01], precision: 93.89%, recall: 93.06%, F: 93.47

  11. 2. Filtering of unreliable sentences and chunks • Sentences to be discarded • a sentence that begins with a VP or a PP • a sentence that ends with a question mark • a sentence that has a comma adjacent to a VP • a sentence that contains a sign (-, ;, …) • a sentence that does not have an NP before a VP • a sentence in which the first VP is a participle or an infinitive • Chunks to be discarded • chunks following the first comma outside an NP • chunks following wh-clauses • chunks following the second VP, except participles and infinitives • Coverage: 17.9%
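
The filtering rules on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Chunk` representation, the function names `keep_sentence` and `trim_chunks`, and the `SIGNS` set (only the two signs the slide spells out) are all assumptions, and the rules that need POS information (participles, infinitives, wh-clauses) are left out.

```python
from typing import List, Tuple

# Hypothetical representation: (chunk label, surface text), e.g. ("NP", "the kits").
Chunk = Tuple[str, str]

# Assumption: only the two signs named on the slide; the slide's list is open-ended.
SIGNS = {"-", ";"}


def keep_sentence(chunks: List[Chunk]) -> bool:
    """Return False if the sentence matches one of the discard rules sketched here."""
    if not chunks:
        return False
    labels = [label for label, _ in chunks]
    texts = [text for _, text in chunks]

    # Rule: the sentence begins with a VP or a PP.
    if labels[0] in ("VP", "PP"):
        return False
    # Rule: the sentence ends with a question mark.
    if texts[-1] == "?":
        return False
    # Rule: the sentence contains a sign.
    if any(text in SIGNS for text in texts):
        return False
    # Rule: a comma is adjacent to a VP.
    for i, text in enumerate(texts):
        if text == ",":
            before_is_vp = i > 0 and labels[i - 1] == "VP"
            after_is_vp = i + 1 < len(labels) and labels[i + 1] == "VP"
            if before_is_vp or after_is_vp:
                return False
    # Rule: there is no NP before the first VP.
    if "VP" in labels and "NP" not in labels[: labels.index("VP")]:
        return False
    return True


def trim_chunks(chunks: List[Chunk]) -> List[Chunk]:
    """Drop every chunk from the first comma that sits outside an NP (label "O")."""
    for i, (label, text) in enumerate(chunks):
        if label == "O" and text == ",":
            return chunks[:i]
    return chunks


example = [("NP", "I"), ("VP", "borrowed"), ("NP", "the kits"),
           ("PP", "with"), ("NP", "a $25.00 deposit"), ("O", ","), ("O", "and")]
if keep_sentence(example):
    print(trim_chunks(example))
```

On the deck's own example sentence, the trimmed result keeps the five chunks up to “a $25.00 deposit”, which matches the chunk sequence shown on the Procedure slide.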

  12. Evaluation of filtering results • VP • precision: 96.46% (517/536) • 12 of the 19 errors are not harmful, e.g., “successfully contended” • precision counting those as correct: 98.69% (529/536) • NP • precision: 96.18% (1559/1621) • 38 of the 62 errors are not harmful, e.g., “about 10,000 diamond miners” • precision counting those as correct: 98.52% (1597/1621) • Example sentence: His firm favors selected computer, drug and pollution-control stocks.
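
The second precision figure in each group appears to count the harmless errors as correct (517 + 12 = 529 for VP, 1559 + 38 = 1597 for NP). A short check of that arithmetic, with the helper name `precision_pct` chosen here purely for illustration:

```python
def precision_pct(correct: int, total: int) -> float:
    """Precision as a percentage, rounded to two decimals."""
    return round(100.0 * correct / total, 2)


# VP chunks: strict count, then with the 12 harmless errors counted as correct.
vp_strict = precision_pct(517, 536)
vp_lenient = precision_pct(517 + 12, 536)

# NP chunks: strict count, then with the 38 harmless errors counted as correct.
np_strict = precision_pct(1559, 1621)
np_lenient = precision_pct(1559 + 38, 1621)

print(vp_strict, vp_lenient, np_strict, np_lenient)
```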

  13. 3. Extract predicate-argument structures from chunks • Use straightforward rules • VP → pred • NP preceding the predicate → sbj • NP following the predicate → obj • NP following “obj” → obj2 • SBAR → sbar • a pair of adjoining PP and NP → pp
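
A minimal sketch of how these rules might be applied to a filtered chunk sequence. The `Chunk` tuple and the function name `extract_pas` are assumptions made for illustration. The sketch takes only the first VP as the predicate, does not lemmatize it (the deck's example shows “borrow”, not “borrowed”), and does not attempt PP-attachment disambiguation.

```python
from typing import List, Tuple

# Hypothetical representation: (chunk label, surface text).
Chunk = Tuple[str, str]


def extract_pas(chunks: List[Chunk]) -> List[Tuple[str, str]]:
    """Map a chunk sequence to (role, text) pairs using the slide's rules."""
    pas: List[Tuple[str, str]] = []
    seen_pred = False
    last_role = None
    i = 0
    while i < len(chunks):
        label, text = chunks[i]
        role = None
        if label == "VP" and not seen_pred:
            role = "pred"                      # VP -> pred
            seen_pred = True
        elif label == "NP":
            if not seen_pred:
                role = "sbj"                   # NP preceding the predicate
            elif last_role == "pred":
                role = "obj"                   # NP following the predicate
            elif last_role == "obj":
                role = "obj2"                  # NP following "obj"
        elif label == "SBAR":
            role = "sbar"                      # SBAR -> sbar
        elif label == "PP" and i + 1 < len(chunks) and chunks[i + 1][0] == "NP":
            # An adjoining PP and NP form one pp argument keyed by the preposition.
            role = "pp:" + text
            text = chunks[i + 1][1]
            i += 1                             # consume the NP as well
        if role is not None:
            pas.append((role, text))
            last_role = role
        i += 1
    return pas


chunks = [("NP", "I"), ("VP", "borrowed"), ("NP", "the kits"),
          ("PP", "with"), ("NP", "a $25.00 deposit")]
print(extract_pas(chunks))
```

Tracking only the most recently assigned role keeps the sketch close to the wording of the rules: an NP becomes obj only directly after the predicate, and obj2 only directly after obj.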

  14. Experiments • From 2G English sentences, we acquired 2.4G predicate-argument structures • Manual evaluation of 200 predicate-argument structures: 97% are correct • incorrect objects of say, know and so on • incorrect detection of “sbar” • errors of PP-attachment disambiguation • Example sentence: He said the assets to be sold would be ... • Examples of acquired structures: sbj:[the super-user] pred:[raise] obj:[the hard limits]; sbj:[it] pred:[strengthen] obj:[the action]; sbj:[he] pred:[raise] obj:[a hand]; sbj:[this web page] pred:[be linked] pp:to:[any other web sites]; sbj:[a user] pred:[view] obj:[items] pp:from:[your catalog]; sbj:[you] pred:[read] obj:[this]

  15. Conclusion and future work • Acquired high-quality predicate-argument structures for case frame compilation • Real use of English predicates • Future work • Apply clustering to compile case frames [Kawahara and Uchimoto, 08] • Integrate case frames into parsing (and other applications) • cf. [Zeman, 02] for subcategorization frames; [Kawahara and Kurohashi, 06] for case frames
