Progress update Lin Ziheng
Components – Connective classifier
• Features from Pitler and Nenkova (2009):
  • Connective: because
  • Self category: IN
  • Parent category: SBAR
  • Left sibling category: none
  • Right sibling category: S
  • Right sibling contains a VP: yes
Components – Connective classifier
• New features:
  • Conn POS
  • Prev word + conn: even though, particularly since
  • Prev word POS
  • Prev word POS + conn POS
  • Conn + next word
  • Next word POS
  • Conn POS + next word POS
  • All lemmatized verbs in the sentence containing the conn
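A minimal sketch of how the syntactic and lexical context features above could be collected from an NLTK-style constituency parse; the single-token connective assumption, the ParentedTree input, and all helper names are illustrative, not the actual implementation:

```python
# Sketch: connective features from an NLTK ParentedTree plus the token/POS lists
# of the sentence. Assumes the connective covers a single leaf at index ci.
from nltk.tree import ParentedTree

def connective_features(ptree, tokens, pos_tags, ci):
    leaf_pos = ptree.leaf_treeposition(ci)
    self_node = ptree[leaf_pos[:-1]]            # preterminal covering the connective
    parent = self_node.parent()
    left, right = self_node.left_sibling(), self_node.right_sibling()

    def has_vp(node):
        return node is not None and any(st.label() == 'VP' for st in node.subtrees())

    def tok(j): return tokens[j].lower() if 0 <= j < len(tokens) else 'NONE'
    def pos(j): return pos_tags[j] if 0 <= j < len(pos_tags) else 'NONE'

    conn = tok(ci)
    return {
        'conn': conn,
        'self_cat': self_node.label(),
        'parent_cat': parent.label() if parent else 'NONE',
        'left_sib_cat': left.label() if left else 'NONE',
        'right_sib_cat': right.label() if right else 'NONE',
        'right_sib_has_vp': has_vp(right),
        'conn_pos': pos(ci),
        'prev+conn': tok(ci - 1) + '_' + conn,
        'prev_pos': pos(ci - 1),
        'prev_pos+conn_pos': pos(ci - 1) + '_' + pos(ci),
        'conn+next': conn + '_' + tok(ci + 1),
        'next_pos': pos(ci + 1),
        'conn_pos+next_pos': pos(ci) + '_' + pos(ci + 1),
    }
```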
Argument labeler – Argument position classifier
• Relative positions of Arg1:
  • Arg1 and Arg2 in the same sentence: SS (60.9%)
  • Arg1 in the immediately previous sentence: IPS (30.1%)
  • Arg1 in some non-adjacent previous sentence: NAPS (9.0%)
  • Arg1 in some following sentence: FS (0%, only 8 instances)
• FS is ignored
Argument labeler – Argument position classifier
• Features:
  • Connective string
  • Conn POS
  • Conn position in the sentence: first, second, third, third last, second last, or last
  • Prev word
  • Prev word POS
  • Prev word + conn
  • Prev word POS + conn POS
  • Second prev word
  • Second prev word POS
  • Second prev word + conn
  • Second prev word POS + conn POS
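A possible shape for this feature vector, sketched in Python; the 'middle' fallback bucket for positions not named on the slide is an assumption:

```python
# Sketch of the argument-position feature set; ci is the connective's token
# index within its sentence, tokens/pos_tags are parallel lists.
def position_bucket(i, n):
    buckets = {0: 'first', 1: 'second', 2: 'third',
               n - 1: 'last', n - 2: 'second_last', n - 3: 'third_last'}
    return buckets.get(i, 'middle')   # fallback bucket: an assumption

def argpos_features(tokens, pos_tags, ci):
    def tok(j): return tokens[j].lower() if 0 <= j < len(tokens) else 'NONE'
    def pos(j): return pos_tags[j] if 0 <= j < len(pos_tags) else 'NONE'
    conn, cpos = tok(ci), pos(ci)
    return {
        'conn': conn,
        'conn_pos': cpos,
        'position': position_bucket(ci, len(tokens)),
        'prev': tok(ci - 1), 'prev_pos': pos(ci - 1),
        'prev+conn': tok(ci - 1) + '_' + conn,
        'prev_pos+conn_pos': pos(ci - 1) + '_' + cpos,
        'prev2': tok(ci - 2), 'prev2_pos': pos(ci - 2),
        'prev2+conn': tok(ci - 2) + '_' + conn,
        'prev2_pos+conn_pos': pos(ci - 2) + '_' + cpos,
    }
```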
Argument labeler – Argument extractor
• SS cases: handcrafted a set of syntactically motivated rules to extract Arg1 and Arg2
Argument labeler – Argument extractor
• An example (parse tree figure omitted)
Argument labeler – Argument extractor
• IPS cases: label the sentence containing the connective as Arg2 and the immediately previous sentence as Arg1
• NAPS cases:
  • Arg1 is located in the second previous sentence in 45.8% of the NAPS cases
  • Use the majority decision and assume Arg1 is always in the second previous sentence
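A small sketch of the sentence-level assignment for the IPS and NAPS cases just described; `sentences`, `i`, and the label strings are illustrative names:

```python
# Sketch: choose Arg1/Arg2 sentences from the position classifier's output.
# sentences is a list of token lists; i indexes the connective's sentence.
def extract_args_by_position(sentences, i, position_label):
    arg2 = sentences[i]                    # Arg2 is the sentence holding the connective
    if position_label == 'IPS':            # immediately previous sentence
        arg1 = sentences[i - 1]
    elif position_label == 'NAPS':         # majority heuristic: second previous sentence
        arg1 = sentences[max(i - 2, 0)]
    else:                                  # SS cases go through the rule-based extractor
        raise ValueError('SS cases are handled by the syntactic rules')
    return arg1, arg2
```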
Components – Explicit classifier
• Prasad et al. (2008) reported human agreement of 94% on Level 1 classes and 84% on Level 2 types
• A baseline using only connectives as features gives 95.7% and 86% on Sec. 23
• Difficult to improve accuracy on the test section
• 3 types of features:
  • Connective string
  • Conn POS
  • Conn + prev word
Components – Non-explicit classifier
• Non-explicit: Implicit, AltLex, EntRel, NoRel
• 11 Level 2 types for Implicit/AltLex, plus EntRel and NoRel, giving 13 types
• 4 feature sets from Lin et al. (2009):
  • Contextual features
  • Constituent parse features
  • Dependency parse features
  • Word-pair features
• 3 features to capture AltLex: Arg2_word1, Arg2_word2, Arg2_word3
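Two of the feature groups named above lend themselves to a short sketch: the Arg1 x Arg2 word-pair features of Lin et al. (2009) and the three AltLex word features taken from the start of Arg2. The tokenisation and lowercasing choices here are assumptions:

```python
# Sketch: word-pair features (all cross-product pairs of Arg1 and Arg2 tokens)
# and the three AltLex features built from the first words of Arg2.
from itertools import product

def word_pair_features(arg1_tokens, arg2_tokens):
    return {'wp=%s_%s' % (w1.lower(), w2.lower()): 1
            for w1, w2 in product(arg1_tokens, arg2_tokens)}

def altlex_features(arg2_tokens):
    padded = [t.lower() for t in arg2_tokens[:3]] + ['NONE'] * 3
    return {'arg2_word1': padded[0],
            'arg2_word2': padded[1],
            'arg2_word3': padded[2]}
```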
Components – Attribution span labeler
• Two steps: split the text into clauses, then decide which clauses are attribution spans
• Rule-based clause splitter:
  • first split a sentence into clauses by punctuation
  • for each clause, further split it if one of the following productions is found: VP → SBAR, S → SINV, S → S, SINV → S, S → SBAR, VP → S
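A simplified reading of the second splitting step, assuming each clause comes with an nltk.Tree constituency parse; the exact split position used in the real system may differ:

```python
# Sketch: find token offsets inside a clause where one of the listed
# parent -> child productions occurs, and split the clause there.
from nltk.tree import Tree

SPLIT_PRODUCTIONS = {('VP', 'SBAR'), ('S', 'SINV'), ('S', 'S'),
                     ('SINV', 'S'), ('S', 'SBAR'), ('VP', 'S')}

def split_points(clause_tree):
    """Token offsets (within the clause) at which to cut it further."""
    points = set()

    def walk(node, start):
        child_start = start
        for child in node:
            if isinstance(child, Tree):
                if ((node.label(), child.label()) in SPLIT_PRODUCTIONS
                        and child_start > 0):
                    points.add(child_start)     # split at the child's left edge
                walk(child, child_start)
                child_start += len(child.leaves())
            else:
                child_start += 1

    walk(clause_tree, 0)
    return sorted(points)
```

The returned offsets would then be used to cut the clause's token sequence into smaller candidate clauses.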
Components – Attribution span labeler
• Attr span classifier features (from the curr, prev and next clauses):
  • Unigrams of curr
  • Lowercased and lemmatized verbs in curr
  • The first and last terms of curr
  • The last term of prev
  • The first term of next
  • The last term of prev + the first term of curr
  • The last term of curr + the first term of next
  • The position of curr in the sentence
  • Punctuation features extracted from curr
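A rough sketch of assembling these clause-context features; the WordNet lemmatizer, the 'NONE' padding, the verb-POS approximation, and the quote-based punctuation cue are assumptions standing in for details not given on the slide:

```python
# Sketch: features for one candidate clause, given as token lists for the
# current, previous and next clauses.
from nltk.stem import WordNetLemmatizer

_lemmatizer = WordNetLemmatizer()   # requires the WordNet data to be installed

def attribution_features(curr, prev=None, nxt=None, position_in_sentence=0):
    prev = prev or ['NONE']
    nxt = nxt or ['NONE']
    feats = {'unigram=%s' % w.lower(): 1 for w in curr}
    # Approximation: lemmatize every token with a verb POS; a real
    # implementation would first filter tokens by their POS tags.
    feats.update({'verb=%s' % _lemmatizer.lemmatize(w.lower(), 'v'): 1 for w in curr})
    feats.update({
        'curr_first': curr[0].lower(), 'curr_last': curr[-1].lower(),
        'prev_last': prev[-1].lower(), 'next_first': nxt[0].lower(),
        'prev_last+curr_first': prev[-1].lower() + '_' + curr[0].lower(),
        'curr_last+next_first': curr[-1].lower() + '_' + nxt[0].lower(),
        'position': position_in_sentence,
        'has_quote': any(w in {'"', "''", '``'} for w in curr),  # assumed punctuation cue
    })
    return feats
```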
Evaluation
• Train: 02-21, dev: 22, test: 23
• Each component is tested:
  • without and with error propagation (EP) from the previous components
  • with gold standard (GS) parse trees and sentence boundaries, and with an automatic (Auto) parser and sentence splitter
Evaluation – Connective classifier
• GS: the new features increase accuracy and F1 by 2.05% and 3.05%
• Auto: the new features increase accuracy and F1 by 1.71% and 2.54%
• Contextual info is helpful
Evaluation – Argument position classifier
• Able to accurately label SS
• But performs badly on the NAPS class
• Due to the similarity between the IPS and NAPS classes
Evaluation – Argument extractor
• Human agreement on partial and exact matches: 94.5% and 90.2%
• Exact F1 is much lower than partial F1
• Due to small portions of text being deleted
Evaluation – Explicit classifier
• Baseline using only connective strings: 86%
• GS + no EP: F1 increased by 0.44%
Evaluation – Non-explicit classifier
• Majority baseline: all classified as EntRel
• Adding EP degrades F1 by ~13%, but still outperforms the baseline by ~6%
Evaluation – Attribution span labeler
• When EP is added: the decrease in F1 is largely due to the drop in precision
• When Auto is added: the decrease in F1 is largely due to the drop in recall
Evaluation – The whole pipeline
• Definition: a relation is correct if its relation type is classified correctly, and both Arg1 and Arg2 are partially or exactly matched
• GS + EP:
  • Partial match: 46.38% F1
  • Exact match: 31.72% F1
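A sketch of this relation-level scoring; representing a relation as a (type, Arg1 tokens, Arg2 tokens) tuple and treating any token overlap as a partial match are simplifying assumptions:

```python
# Sketch: relation-level F1 where a predicted relation is correct if its type
# matches gold and both arguments match (exactly, or by overlap when partial).
def relation_f1(gold, predicted, partial=True):
    def arg_match(a, b):
        return bool(set(a) & set(b)) if partial else a == b

    def rel_match(g, p):
        return g[0] == p[0] and arg_match(g[1], p[1]) and arg_match(g[2], p[2])

    precision = (sum(any(rel_match(g, p) for g in gold) for p in predicted)
                 / len(predicted)) if predicted else 0.0
    recall = (sum(any(rel_match(g, p) for p in predicted) for g in gold)
              / len(gold)) if gold else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)
```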
On-going changes
• Joint learning
• Change the rule-based argument extractor to a machine learning approach