Acquiring Reliable Predicate-argument Structures from Raw Corpora for Case Frame Compilation
Daisuke Kawahara (1) and Sadao Kurohashi (1, 2)
(1) National Institute of Information and Communications Technology
(2) Kyoto University
LREC2010, 2010/05/20

Background
• NLP analyzers so far
  • (Mainly) supervised, (relatively) knowledge-poor
  • e.g., PP-attachment or parsing
      Mary ate the salad with a fork
      Mary ate the salad with mushrooms
  • Only 1.5% of bilexical dependencies were learned [Bikel, 04]
• Toward knowledge-oriented NLP
  • Automatically compile case frames and integrate them into NLP analyzers/applications

Related work
• Subcategorization frames
  • [Brent, 93] [Ushioda et al., 93] [Manning, 93] [Briscoe and Carroll, 97] [Korhonen, 02] …
  • e.g., She greeted me.
      NP(sbj) greet NP(obj)
  • e.g., She gave him a book.
      NP(sbj) give NP(obj) NP(obj)
• (Manually compiled) semantic frames
  • FrameNet [Baker et al., 98], PropBank [Palmer et al., 05]
• Japanese semantic case frames
  • Semantic marker-based: [Haruno, 95] [Utsuro et al., 96]
  • Example-based: [Kawahara and Kurohashi, 06]

Compilation of Japanese semantic case frames [Kawahara and Kurohashi, 06]
• Case markers: ga: nominative, wo: accusative, ni: dative, de: instrument

Compilation of English case frames [Kawahara and Uchimoto, 08]
[Figure: pipeline diagram with the labels Sentences, Sentence extraction, Parsing and filtering, Predicate-argument structures, Clustering, WordNet, Case frames, and Dependency parser 89.9% → 91.5% (short sentences)]

Examples of obtained case frames [Kawahara and Uchimoto, 08]
• surface cases and prepositions
  • sbj, obj, obj2, sbar, pp:for, pp:in, …

Procedure
• Apply POS tagging and chunking to a raw corpus
• Filter out unreliable and inappropriate sentences and chunks
• Extract predicate-argument structures and apply PP-attachment disambiguation if a PP exists
Example:
  I borrowed the kits with a $25.00 deposit, and …
  NP:[I] VP:[borrowed] NP:[the kits] PP:[with] NP:[a $ 25.00 deposit] O:, O:and …
  NP:[I] VP:[borrowed] NP:[the kits] PP:[with] NP:[a $ 25.00 deposit]
  sbj:[I] pred:[borrow] obj:[the kits] pp:with:[a $ 25.00 deposit]

1. POS tagging and chunking
• POS tagging
  • Tsuruoka’s tagger [Tsuruoka and Tsujii, 05]
  • accuracy: 97.1%
• Chunking
  • YamCha chunker [Kudo and Matsumoto, 01]
  • precision: 93.89%, recall: 93.06%, F: 93.47

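
The F value reported for the chunker is consistent with the balanced F-measure, the harmonic mean of precision and recall. The check below is a minimal sketch under that assumption; the function name `f_measure` is illustrative and does not come from the slides.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Chunker figures from the slide, in percent.
print(round(f_measure(93.89, 93.06), 2))  # 93.47
```
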
2. Filtering of unreliable sentences and chunks
• sentences to be discarded
  • a sentence that begins with a VP or a PP
  • a sentence that ends with a question mark
  • a sentence in which a comma is adjacent to a VP
  • a sentence that contains a sign (-, ;, …)
  • a sentence that does not have an NP before a VP
  • a sentence in which the first VP is a participle or an infinitive
• chunks to be discarded
  • chunks following the first comma outside an NP
  • chunks following wh-clauses
  • chunks following the second VP except participles and infinitives
• Coverage: 17.9%

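
The sentence-level rules above, plus the first chunk-level rule, can be sketched over a chunked sentence as follows. This is an illustration under stated assumptions, not the authors' implementation. The `Chunk` class, the use of the first token's Penn Treebank POS tag (`VBG`, `VBN`, `TO`) to detect participles and infinitives, and the treatment of a sentence with no VP as discarded are all assumptions. The slide elides its list of signs, so only the two it names are included. The wh-clause and second-VP rules are omitted.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    tag: str             # "NP", "VP", "PP", "SBAR", or "O" (outside any chunk)
    text: str
    first_pos: str = ""  # POS tag of the chunk's first token (assumed available)


SIGNS = {"-", ";"}                    # the slide's list is elided; only named signs
NONFINITE_POS = {"VBG", "VBN", "TO"}  # assumption: participle / infinitive markers


def keep_sentence(chunks: List[Chunk]) -> bool:
    """Return False if any sentence-level discard rule fires."""
    if not chunks or chunks[0].tag in ("VP", "PP"):
        return False                  # begins with a VP or a PP
    if chunks[-1].text == "?":
        return False                  # ends with a question mark
    for i, c in enumerate(chunks):
        if c.tag == "O" and c.text in SIGNS:
            return False              # contains a sign
        if c.tag == "O" and c.text == ",":
            neighbours = chunks[max(i - 1, 0):i] + chunks[i + 1:i + 2]
            if any(n.tag == "VP" for n in neighbours):
                return False          # comma adjacent to a VP
    vp_positions = [i for i, c in enumerate(chunks) if c.tag == "VP"]
    if not vp_positions:
        return False                  # assumption: nothing to extract without a VP
    first_vp = vp_positions[0]
    if not any(c.tag == "NP" for c in chunks[:first_vp]):
        return False                  # no NP before a VP
    if chunks[first_vp].first_pos in NONFINITE_POS:
        return False                  # first VP is a participle or an infinitive
    return True


def truncate_at_first_comma(chunks: List[Chunk]) -> List[Chunk]:
    """Drop the first comma outside an NP and every chunk after it."""
    for i, c in enumerate(chunks):
        if c.tag == "O" and c.text == ",":
            return chunks[:i]
    return chunks


sentence = [
    Chunk("NP", "I", "PRP"), Chunk("VP", "borrowed", "VBD"),
    Chunk("NP", "the kits", "DT"), Chunk("PP", "with", "IN"),
    Chunk("NP", "a $ 25.00 deposit", "DT"), Chunk("O", ","), Chunk("O", "and"),
]
if keep_sentence(sentence):
    print([f"{c.tag}:[{c.text}]" for c in truncate_at_first_comma(sentence)])
```

A comma that belongs to an NP stays inside that chunk's text, so a comma tagged `O` is by construction outside any NP.
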
Evaluation of filtering results
• VP
  • precision: 96.46% (517/536)
  • 12 of the 19 errors are not harmful, e.g., “successfully contended”
  • precision when these are counted as correct: 98.69% (529/536)
• NP
  • precision: 96.18% (1559/1621)
  • 38 of the 62 errors are not harmful, e.g., “about 10,000 diamond miners”
  • precision when these are counted as correct: 98.52% (1597/1621)
His firm favors selected computer, drug and pollution-control stocks.

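
The two precision figures given for each chunk type differ only in whether the harmless errors count as correct. The arithmetic can be reproduced with the short sketch below; the function names are illustrative.

```python
def precision(correct: int, total: int) -> float:
    """Strict precision in percent."""
    return 100.0 * correct / total


def relaxed_precision(correct: int, harmless: int, total: int) -> float:
    """Precision in percent when harmless errors are counted as correct."""
    return 100.0 * (correct + harmless) / total


# (correct, harmless errors, total) as reported on the slide.
for name, correct, harmless, total in [("VP", 517, 12, 536), ("NP", 1559, 38, 1621)]:
    print(name,
          round(precision(correct, total), 2),
          round(relaxed_precision(correct, harmless, total), 2))
```
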
3. Extract predicate-argument structures from chunks
• Use straightforward rules
  • VP → pred
  • NP preceding the predicate → sbj
  • NP following the predicate → obj
  • NP following “obj” → obj2
  • SBAR → sbar
  • a pair of adjoining PP and NP → pp

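
The rules above map almost directly onto a single left-to-right pass over the filtered chunks. The following is a minimal sketch, assuming chunks arrive as `(tag, text)` pairs and that "following" means immediately adjacent. The function name `extract_pas` is hypothetical. Lemmatizing the predicate (borrowed → borrow) and PP-attachment disambiguation are not modelled here.

```python
from typing import Dict, List, Tuple

Chunk = Tuple[str, str]  # (chunk tag, chunk text)


def extract_pas(chunks: List[Chunk]) -> Dict[str, str]:
    """Assign surface-case roles to chunks with the slide's rules."""
    pas: Dict[str, str] = {}
    prev_role = None
    i = 0
    while i < len(chunks):
        tag, text = chunks[i]
        role = None
        if tag == "VP" and "pred" not in pas:
            role = "pred"                       # VP -> pred
        elif tag == "NP" and "pred" not in pas:
            role = "sbj"                        # NP preceding the predicate
        elif tag == "NP" and prev_role == "pred":
            role = "obj"                        # NP following the predicate
        elif tag == "NP" and prev_role == "obj":
            role = "obj2"                       # NP following "obj"
        elif tag == "SBAR":
            role = "sbar"                       # SBAR -> sbar
        elif tag == "PP" and i + 1 < len(chunks) and chunks[i + 1][0] == "NP":
            role = f"pp:{text}"                 # adjoining PP and NP -> pp
            text = chunks[i + 1][1]
            i += 1                              # the NP is consumed by the PP
        if role is not None:
            pas[role] = text
        prev_role = role
        i += 1
    return pas


chunks = [("NP", "I"), ("VP", "borrowed"), ("NP", "the kits"),
          ("PP", "with"), ("NP", "a $ 25.00 deposit")]
print(extract_pas(chunks))
```

If several NPs precede the predicate, this sketch keeps the last one as `sbj`; the slides do not say how that case is handled.
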
Experiments
• From 2G English sentences, we acquired 2.4G predicate-argument structures
    sbj:[the super-user] pred:[raise] obj:[the hard limits]
    sbj:[it] pred:[strengthen] obj:[the action]
    sbj:[he] pred:[raise] obj:[a hand]
    sbj:[this web page] pred:[be linked] pp:to:[any other web sites]
    sbj:[a user] pred:[view] obj:[items] pp:from:[your catalog]
    sbj:[you] pred:[read] obj:[this]
• Manual evaluation of 200 predicate-argument structures: 97% are correct
  • incorrect objects of say, know and so on
  • incorrect detection of “sbar”
• Errors of PP-attachment disambiguation
He said the assets to be sold would be ...

Conclusion and future work
• Acquired high-quality predicate-argument structures for case frame compilation
  • Real use of English predicates
• Future work
  • Apply clustering to compile case frames [Kawahara and Uchimoto, 08]
  • Integrate case frames into parsing (and other applications)
    • cf. [Zeman, 02] for subcategorization frames and [Kawahara and Kurohashi, 06] for case frames