
Progress update





  1. Progress update Lin Ziheng

  2. System overview

  3. Components – Connective classifier • Features from Pitler and Nenkova (2009): • Connective: because • Self category: IN • Parent category: SBAR • Left sibling category: none • Right sibling category: S • Right sibling contains a VP: yes
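
These parse-tree features can be read directly off a constituent parse. A minimal sketch using nltk's ParentedTree, reproducing the slide's example; the parse string and function name are illustrative, not from the original system:

```python
from nltk.tree import ParentedTree

def connective_features(tree, conn_leaf_index):
    """Self/parent/sibling categories for the connective's node."""
    # Lowest node dominating the connective token (its POS preterminal).
    node = tree[tree.leaf_treeposition(conn_leaf_index)[:-1]]
    parent = node.parent()
    left, right = node.left_sibling(), node.right_sibling()
    return {
        "self_cat": node.label(),
        "parent_cat": parent.label() if parent is not None else "none",
        "left_sib_cat": left.label() if left is not None else "none",
        "right_sib_cat": right.label() if right is not None else "none",
        # Does the right sibling dominate a VP?
        "right_sib_has_vp": (right is not None and
                             any(st.label() == "VP" for st in right.subtrees())),
    }

tree = ParentedTree.fromstring(
    "(S (NP (PRP He)) (VP (VBD left) (SBAR (IN because)"
    " (S (NP (PRP it)) (VP (VBD rained))))))"
)
print(connective_features(tree, 2))  # leaf 2 is "because"
```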

  4. Components – Connective classifier • New features • Conn POS • Prev word + conn: even though, particularly since • Prev word POS • Prev word POS + conn POS • Conn + Next word • Next word POS • Conn POS + Next word POS • All lemmatized verbs in the sentence containing conn
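
A sketch of how these lexical features might be assembled; `tokens`, `tags`, and `lemmas` are assumed parallel lists for the sentence, the connective sits at index `i`, and multi-word connectives are glossed over:

```python
def lexical_features(tokens, tags, lemmas, i):
    prev_w = tokens[i - 1] if i > 0 else "<S>"
    prev_t = tags[i - 1] if i > 0 else "<S>"
    next_w = tokens[i + 1] if i + 1 < len(tokens) else "</S>"
    next_t = tags[i + 1] if i + 1 < len(tokens) else "</S>"
    return {
        "conn_pos": tags[i],
        "prev+conn": prev_w + "_" + tokens[i],        # e.g. "even_though"
        "prev_pos": prev_t,
        "prev_pos+conn_pos": prev_t + "_" + tags[i],
        "conn+next": tokens[i] + "_" + next_w,
        "next_pos": next_t,
        "conn_pos+next_pos": tags[i] + "_" + next_t,
        # all lemmatized verbs in the sentence containing the connective
        "sent_verbs": sorted({l for l, t in zip(lemmas, tags)
                              if t.startswith("VB")}),
    }
```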

  5. Components – Argument labeler

  6. Argument labeler – Argument position classifier • Relative positions of Arg1: • Arg1 and Arg2 in the same sentence: SS (60.9%) • Arg1 in the immediately previous sentence: IPS (30.1%) • Arg1 in some non-adjacent previous sentence: NAPS (9.0%) • Arg1 in some following sentence: FS (~0%, only 8 instances) • FS cases are ignored

  7. Argument labeler – Argument position classifier • Features: • Connective string • Conn POS • Conn position in the sentence: first, second, third, third last, second last, or last • Prev word • Prev word POS • Prev word + conn • Prev word POS + conn POS • Second prev word • Second prev word POS • Second prev word + conn • Second prev word POS + conn POS
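
The position-in-sentence feature buckets the connective's token index into the six values listed above. A hedged sketch; the slide does not say how tokens in the middle of long sentences are labeled, so the "middle" fallback is an assumption, and ties in very short sentences resolve toward the start:

```python
def conn_position(i, sent_len):
    """Bucket token index i into the slide's six position values."""
    from_start = {0: "first", 1: "second", 2: "third"}
    from_end = {1: "last", 2: "second last", 3: "third last"}
    if i in from_start:
        return from_start[i]
    return from_end.get(sent_len - i, "middle")  # "middle" is an assumption
```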

  8. Argument labeler – Argument extractor • SS cases: handcrafted a set of syntactically motivated rules to extract Arg1 and Arg2
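
The actual rule set is handcrafted and not spelled out on the slide; the sketch below shows one plausible rule of that kind: take the lowest S/SBAR node covering the connective as Arg2 and the rest of the sentence as Arg1 (whether the connective itself belongs inside Arg2 is glossed over here):

```python
from nltk.tree import ParentedTree

def covered(tree, node):
    """Indices (into tree.leaves()) of the leaves dominated by `node`."""
    npos = node.treeposition()
    return [i for i in range(len(tree.leaves()))
            if tree.leaf_treeposition(i)[:len(npos)] == npos]

def ss_extract(tree, conn_leaf_index):
    """Arg2 = lowest S/SBAR covering the connective; Arg1 = the rest."""
    node = tree[tree.leaf_treeposition(conn_leaf_index)[:-1]]
    while node.parent() is not None and node.label() not in ("S", "SBAR"):
        node = node.parent()
    arg2_idx = set(covered(tree, node))
    tokens = tree.leaves()
    arg1 = [w for i, w in enumerate(tokens) if i not in arg2_idx]
    arg2 = [tokens[i] for i in sorted(arg2_idx)]
    return arg1, arg2
```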

  9. Argument labeler – Argument extractor • An example:

  10. Argument labeler – Argument extractor • IPS cases: label the sentence containing the connective as Arg2 and the immediately previous sentence as Arg1 • NAPS cases: • Arg1 is located in the second previous sentence in 45.8% of the NAPS cases • Use the majority decision and assume Arg1 is always in the second previous sentence
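
Both heuristics reduce to picking a previous sentence by offset. A direct sketch; `sentences` is the list of tokenized sentences and `k` indexes the sentence holding the connective:

```python
def locate_args(sentences, k, position_label):
    arg2 = sentences[k]
    if position_label == "IPS":
        arg1 = sentences[k - 1]   # immediately previous sentence
    elif position_label == "NAPS":
        arg1 = sentences[k - 2]   # majority decision: second previous
    else:
        raise ValueError("unexpected label: " + position_label)
    return arg1, arg2
```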

  11. Components – Explicit classifier • Prasad et al. (2008) reported human agreement of 94% on Level 1 classes and 84% on Level 2 types • A baseline using only connectives as features gives 95.7% and 86% on Sec. 23 • Difficult to improve accuracy on the test section • 3 types of features: • Connective string • Conn POS • Conn + prev word

  12. Components – Non-explicit classifier • Non-explicit: Implicit, AltLex, EntRel, NoRel • 11 Level 2 types for Implicit/AltLex, plus EntRel and NoRel: 13 types in total • 4 feature sets from Lin et al. (2009): • Contextual features • Constituent parse features • Dependency parse features • Word-pair features • 3 features to capture AltLex: Arg2_word1, Arg2_word2, Arg2_word3
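
A sketch of two of these feature sets, the word pairs over (Arg1, Arg2) and the three AltLex word features; the feature-name scheme is illustrative:

```python
from itertools import product

def nonexplicit_features(arg1_tokens, arg2_tokens):
    feats = {}
    # Word-pair features: cross product of Arg1 and Arg2 tokens.
    for w1, w2 in product(arg1_tokens, arg2_tokens):
        feats["wp=" + w1.lower() + "|" + w2.lower()] = 1
    # AltLex cues tend to open Arg2: record its first three words.
    for j in range(3):
        word = arg2_tokens[j] if j < len(arg2_tokens) else "<none>"
        feats["arg2_word" + str(j + 1) + "=" + word] = 1
    return feats
```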

  13. Components – Attribution span labeler • Two steps: split the text into clauses, and decide which clauses are attribution spans • Rule-based clause splitter: • first split a sentence into clauses by punctuation • for each clause, we further split it if one of the following productions is found: VP → SBAR, S → SINV, S → S, SINV → S, S → SBAR, VP → S
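
A sketch of the two-step splitter; the punctuation set and the way the production test is applied to a clause's parse are assumptions:

```python
import re
from nltk.tree import Tree

# Parent -> child pairs from the slide that trigger a further split.
SPLIT_PRODUCTIONS = {("VP", "SBAR"), ("S", "SINV"), ("S", "S"),
                     ("SINV", "S"), ("S", "SBAR"), ("VP", "S")}

def split_by_punctuation(sentence):
    """Step 1: split a sentence into candidate clauses at punctuation."""
    return [c.strip() for c in re.split(r"[,;:]", sentence) if c.strip()]

def needs_further_split(parse):
    """Step 2: does the clause's parse contain a listed production?"""
    for sub in parse.subtrees():
        for child in sub:
            if isinstance(child, Tree) and \
                    (sub.label(), child.label()) in SPLIT_PRODUCTIONS:
                return True
    return False
```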

  14. Components – Attribution span labeler • Attr span classifier features (over curr, prev and next clauses): • Unigrams of curr • Lowercased and lemmatized verbs in curr • The first and last terms of curr • The last term of prev • The first term of next • The last term of prev + the first term of curr • The last term of curr + the first term of next • The position of curr in the sentence • Punctuation rules extracted from curr
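
A sketch assembling these clause features; clauses are token lists, `curr` is assumed non-empty, and the punctuation-rule feature is omitted:

```python
def attr_features(prev, curr, nxt, clause_index, verbs_lemmatized):
    feats = {"uni=" + w.lower(): 1 for w in curr}   # unigrams of curr
    for v in verbs_lemmatized:                      # lowercased, lemmatized verbs in curr
        feats["verb=" + v] = 1
    first, last = curr[0], curr[-1]                 # first and last terms of curr
    prev_last = prev[-1] if prev else "<none>"
    next_first = nxt[0] if nxt else "<none>"
    feats.update({
        "first=" + first: 1,
        "last=" + last: 1,
        "prev_last=" + prev_last: 1,
        "next_first=" + next_first: 1,
        "prev_last+first=" + prev_last + "|" + first: 1,
        "last+next_first=" + last + "|" + next_first: 1,
        "position=" + str(clause_index): 1,
    })
    return feats
```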

  15. Evaluation • Train: Sec. 02-21, dev: Sec. 22, test: Sec. 23 • Each component is tested: • without and with error propagation (EP) from the previous component • with gold standard (GS) parse trees and sentence boundaries, and with an automatic (Auto) parser and sentence splitter

  16. Evaluation – Connective classifier • GS: increased accuracy and F1 by 2.05% and 3.05% • Auto: increased accuracy and F1 by 1.71% and 2.54% • Contextual info is helpful

  17. Evaluation – Argument position classifier • Able to accurately label SS • But performs badly on the NAPS class • Due to the similarity between IPS and NAPS classes

  18. Evaluation – Argument extractor • Human agreements on partial and exact matches: 94.5% and 90.2% • Exact F1 much lower than partial F1 • Due to small portions of text being deleted

  19. Evaluation – Explicit classifier • Baseline using only connective strings: 86% • With GS and no EP, F1 increased by 0.44%

  20. Evaluation – Non-explicit classifier • Majority baseline: all classified as EntRel • Adding EP degrades F1 by ~13%, but still outperforms baseline by ~6%

  21. Evaluation – Attribution span labeler • When EP is added: the decrease in F1 is largely due to the drop in precision • When Auto is added: the decrease in F1 is largely due to the drop in recall

  22. Evaluation – The whole pipeline • Definition: a relation is correct if its relation type is classified correctly, and both Arg1 and Arg2 are partially or exactly matched • GS + EP • Partial: 46.38% F1 • Exact: 31.72% F1
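
That correctness definition translates to a small scoring predicate. A sketch; spans are token-index sets here, and using "any overlap" as the partial-match test is an assumption:

```python
def relation_correct(pred, gold, exact=False):
    """True iff relation type matches and both args match under the metric."""
    if pred["type"] != gold["type"]:
        return False
    for arg in ("arg1", "arg2"):
        p, g = set(pred[arg]), set(gold[arg])
        if exact and p != g:
            return False
        if not exact and not (p & g):   # partial: any token overlap
            return False
    return True
```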

  23. On-going changes • Joint learning • Change rule-based argument extractor to a machine learning approach
