End-to-End Discourse Parser Evaluation
Sucheta Ghosh, Sara Tonelli, Giuseppe Riccardi, Richard Johansson
Department of Information Engineering and Computer Science, University of Trento, Italy
Content
• Introduction
• Discourse Parser: what + why + how
• Discourse Parser & Penn Discourse TreeBank (PDTB)
• Our contribution
• Architecture
• Feature
• Result
• Conclusion
End2End Disc Pars Eval
Introduction
• What: we refer to a coherent, structured group of sentences or expressions as a discourse
• Why: discourse structure represents the meaning of the document
• How (process flow): data (discourse) segmentation -> discourse parsing -> discourse structure
• Discourse structure includes relations (a connective and its arguments) lexically anchored in the document text
• Common data sources: the Rhetorical Structure Theory (RST) Discourse Treebank and the Penn Discourse TreeBank (PDTB); we used the PDTB
Examples from PDTB (1)
Arg1 -> I never gamble too far.
Explicit Connective -> In particular
Arg2 -> I quit after one try, whether I win or lose.
[EXPANSION]
• Each annotated relation includes a connective, two arguments, and a sense label for the connective
• The connective occurs between the two arguments, at the beginning of a sentence, or inside an argument
• The top-level senses of the three-layered sense hierarchy: TEMPORAL, CONTINGENCY, COMPARISON, EXPANSION
Examples from PDTB (2)
When Mr. Green won a $240,000 verdict in a land condemnation case against the State in June 1983, he says, Judge O’Kicki unexpectedly awarded him an additional $100,000. [TEMPORAL]
As an indicator of the tight grain supply situation in the U.S., market analysts said that late Tuesday the Chinese government, which often buys U.S. grains in quantity, turned instead to Britain to buy 500,000 metric tons of wheat. [COMPARISON]
Since McDonald’s menu prices rose this year, the actual deadline may have been more. [CONTINGENCY]
(Arg1 italicized, connectives underlined, Arg2 boldfaced)
PDTB Corpus Statistics
• Arg2 is always in the same sentence as the connective
• 60.9% of the annotated Arg1 spans are in the same sentence as the connective; 39.1% are in a previous sentence (30.1% adjacent, 9.0% non-adjacent)
• We used these statistics to establish the baseline
Our Contribution
• Developed an end-to-end discourse parser that retrieves discourse structure (an explicit connective and its two argument spans), starting from a text paragraph
• Evaluation: established the system with gold-standard data (PTB + PDTB); evaluated it against the baseline; implemented the same method in an automated system
• Improved the automated system in terms of applicability
• Applied an overlapping discourse segmentation technique (+2/-2 window) to the complete text
• Followed a chunking strategy for classification
• The discourse model is a cascaded CRF
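The overlapping segmentation with a +2/-2 window can be pictured with a small sketch. It assumes the window is counted in sentences (two before and two after each sentence), which the slide does not state explicitly; the function name `sentence_windows` and the placeholder sentences are hypothetical.

```python
from typing import List


def sentence_windows(sentences: List[str], before: int = 2, after: int = 2) -> List[List[str]]:
    """Return one overlapping window per sentence.

    Assumption: the "+2/-2 window" covers two sentences before and two
    sentences after the current one, clipped at the text boundaries.
    """
    windows = []
    for i in range(len(sentences)):
        lo = max(0, i - before)                    # clip at the start of the text
        hi = min(len(sentences), i + after + 1)    # clip at the end of the text
        windows.append(sentences[lo:hi])
    return windows


# Placeholder sentences, for illustration only.
sentences = ["s0", "s1", "s2", "s3", "s4", "s5"]
windows = sentence_windows(sentences)
```

Consecutive windows share most of their sentences, so an argument that lies in a neighbouring sentence of the connective still falls inside at least one segment.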
End-to-End Architecture
Features
Features used for Arg1 and Arg2 segmentation and labeling:
F1. Token (T)
F2. Sense of Connective (CONN)
F3. IOB chain (IOB)
F4. PoS tag
F5. Lemma (L)
F6. Inflection (INFL)
F7. Main verb of main clause (MV)
F8. Boolean feature for MV (BMV)
Additional feature used only for Arg1:
F9. Arg2 labels
For more details: Ghosh et al., IJCNLP 2011
Evaluation & Baseline
• Metrics: precision, recall, and F1 measure
• Scoring scheme: Exact Match, where a classified span counts as correct only if it exactly coincides with the gold-standard span
• Baseline (based on the statistics given in the annotation manual):
  Arg2: label all tokens in the text span between the connective and the beginning of the next sentence
  Arg1: label all tokens in the text span from the end of the previous sentence to the connective position; if the connective occurs at the beginning of a sentence, label the previous sentence
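The baseline heuristic and the exact-match scoring can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the token-index representation of spans, and the hand tokenization of the PDTB example are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Span = List[Tuple[int, int]]  # (sentence index, token index) pairs


def baseline_spans(sentences: List[List[str]], sent_idx: int,
                   conn_start: int, conn_end: int) -> Tuple[Span, Span]:
    """Baseline Arg1/Arg2 for a connective at tokens [conn_start, conn_end)."""
    sent = sentences[sent_idx]
    # Arg2: everything from the connective to the beginning of the next sentence.
    arg2 = [(sent_idx, i) for i in range(conn_end, len(sent))]
    if conn_start > 0:
        # Arg1: from the end of the previous sentence up to the connective.
        arg1 = [(sent_idx, i) for i in range(conn_start)]
    elif sent_idx > 0:
        # Connective opens the sentence: take the whole previous sentence.
        prev = sent_idx - 1
        arg1 = [(prev, i) for i in range(len(sentences[prev]))]
    else:
        arg1 = []  # no previous sentence available
    return arg1, arg2


def exact_match_prf(gold: Dict[int, Optional[Span]],
                    pred: Dict[int, Optional[Span]]) -> Tuple[float, float, float]:
    """Precision, recall, F1 where a span is correct only if identical to gold."""
    correct = sum(1 for k, s in pred.items() if s is not None and gold.get(k) == s)
    n_pred = sum(1 for s in pred.values() if s is not None)
    n_gold = sum(1 for s in gold.values() if s is not None)
    p = correct / n_pred if n_pred else 0.0
    r = correct / n_gold if n_gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


# Hand-tokenized PDTB example; the connective "In particular" opens sentence 1.
sentences = [
    ["I", "never", "gamble", "too", "far", "."],
    ["In", "particular", "I", "quit", "after", "one", "try", ",",
     "whether", "I", "win", "or", "lose", "."],
]
arg1, arg2 = baseline_spans(sentences, sent_idx=1, conn_start=0, conn_end=2)

# Toy scoring example: two of three predicted spans match, four gold spans exist.
gold = {1: [(0, 0)], 2: [(0, 1)], 3: [(0, 2)], 4: [(0, 3)]}
pred = {1: [(0, 0)], 2: [(0, 9)], 3: None, 4: [(0, 3)]}
p, r, f1 = exact_match_prf(gold, pred)
```

Because a single misplaced token makes the whole span wrong, exact match is a strict criterion, which helps explain why the baseline scores are low.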
Exact Arg2 Results: Comparison Viewgraph
                     P     R     F1
Baseline             0.53  0.46  0.49
Gold-Standard        0.84  0.74  0.79
Automatic            0.80  0.74  0.77
AutoConn+GoldSPT     0.82  0.70  0.76
GoldConn+AutoSPT     0.76  0.61  0.68
Lightweight (Auto)   0.72  0.56  0.63
Exact Arg1 Results: Comparison Viewgraph
                     P     R     F1
Baseline             0.19  0.19  0.19
Gold-Standard        0.68  0.39  0.49
Automatic            0.63  0.28  0.39
AutoConn+GoldSPT     0.67  0.31  0.43
GoldConn+AutoSPT     0.62  0.31  0.41
Lightweight (Auto)   0.60  0.27  0.37
Features
The IOB (Inside-Outside-Begin) chain corresponds to the syntactic categories of all the constituents on the path between the root node and the current leaf node of the tree. For example, the IOB chain feature for "flashed" is I-S/E-VP/E-SBAR/E-S/C-VP, where B-, I-, E- and C- indicate whether the given token is, respectively, at the beginning, inside, or at the end of the constituent, or is a single-token chunk.
Conclusion
• The automatic end-to-end system achieves results close to those of the gold-standard system
• We are moving towards a "lightweight" version of the pipeline: shallow, with less dependence on SPTs
• We wish to explore more features
• We improved our result by 5 points for Arg1 classification using a previous-sentence feature (Ghosh et al., IJCNLP 2011)
Thank you
Sucheta Ghosh, Sara Tonelli, Giuseppe Riccardi, Richard Johansson
Department of Information Engineering and Computer Science, University of Trento, Italy
{ghosh, riccardi}@disi.unitn.it
Previous Work
• Task limited to retrieving the argument heads (Wellner et al 2007, Elwell et al 2008)
• Dinesh et al. (2005) extracted complete arguments with boundaries, but only for a restricted class of connectives
• The identification of Arg1 has been only partially addressed in previous work (Prasad 2010)
• Automatic surface-sense classification (at class level) has already reached the upper bound of inter-annotator agreement (Pitler and Nenkova, 2009)
Data & Tools
• Corpus used: Penn Discourse TreeBank (PDTB)
• For the gold-standard system: the Penn TreeBank (PTB) corpus is used
• Third-party software/scripts used:
  Stanford Syntactic Tree Parser (Klein & Manning 2003)
  AddDiscourse, explicit connective classification (Pitler and Nenkova 2008)
  ChunkLink.pl to extract IOB chains (by Sabine Buchholz: CoNLL Shared Task 2000)
  RootExtractor: Syntactic Parse Tree (SPT) processors (by Richard Johansson)
  Morpha (Minnen et al 2001)
  Conditional Random Field: CRF++ by Taku Kudo
Overall Architecture
• A syntactic tree parser is used for the automatic systems
• A connective detection and classification tool is used for the automatic systems
• PDTB & PTB are not used during the end-to-end automatic testing phase
End2End Testing Phase
Conditional Random Field
• We use the CRF++ tool (http://crfpp.sourceforge.net/) for sequence labeling classification (Lafferty et al., 2001), with second-order Markov dependency between tags.
• Besides the individual specification of each feature in the feature description template, the features are also represented in various combinations.
• We used this tool because the output of CRF++ is compatible with the CoNLL 2000 chunking shared task format, and we view our task as a discourse chunking task.
• In addition, linear-chain CRFs for sequence labeling offer advantages over both generative models such as HMMs and classifiers applied at each sequence position. Sha and Pereira (2003) also claim that, as a single model, CRFs outperform other models for shallow parsing.
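Viewing the task as discourse chunking means each token becomes one row of feature columns followed by its label, in the same column layout as the CoNLL 2000 chunking data that CRF++ reads: one token per line, whitespace-separated columns, the label in the last column, and a blank line after each sequence. The sketch below writes such rows; the column order, the feature values, and the `B-Arg2`/`I-Arg2` label names are assumptions made for illustration.

```python
from typing import Dict, List

# Abbreviations follow the feature list; the column order is an assumption.
COLUMNS = ["T", "CONN", "IOB", "POS", "L"]


def to_crfpp(rows: List[Dict[str, str]], labels: List[str]) -> str:
    """Format one token sequence as CoNLL-style text for a sequence labeler."""
    if len(rows) != len(labels):
        raise ValueError("each token needs exactly one label")
    lines = []
    for row, label in zip(rows, labels):
        fields = [row[c] for c in COLUMNS] + [label]  # label goes in the last column
        if any(" " in f or "\t" in f for f in fields):
            raise ValueError("column values must not contain whitespace")
        lines.append("\t".join(fields))
    # A blank line marks the end of the sequence.
    return "\n".join(lines) + "\n\n"


# Hypothetical feature values for two tokens of an Arg2 span.
rows = [
    {"T": "I", "CONN": "Expansion", "IOB": "B-S/C-NP", "POS": "PRP", "L": "i"},
    {"T": "quit", "CONN": "Expansion", "IOB": "I-S/B-VP", "POS": "VBP", "L": "quit"},
]
block = to_crfpp(rows, ["B-Arg2", "I-Arg2"])
```

Keeping every feature in its own column lets the CRF++ feature template refer to columns individually or in combination.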
Hill Climbing Algorithm
function HILL-CLIMBING(problem) returns a state that is a local maximum
  current ← MAKE-NODE(problem.INITIAL-STATE)
  loop do
    neighbor ← a highest-valued successor of current
    if neighbor.VALUE ≤ current.VALUE then return current.STATE
    current ← neighbor
[Artificial Intelligence: A Modern Approach, Stuart J. Russell and Peter Norvig]
• Hill climbing is the most basic local search technique. At each step the current node is replaced by its best neighbor.
• Here the best neighbor is the one with the highest VALUE; if a heuristic cost estimate h were used instead, we would pick the neighbor with the lowest h.
• Hill climbing is a greedy, fast local search
• We optimized the selected feature set with a feature ablation technique, leaving one feature out each time
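Applied to feature selection, the search and the ablation step might look like the sketch below. It assumes that a state is a set of features, that a successor adds exactly one unused feature, and that `score` stands in for training and evaluating the model on a development set; the integer scores are toy values, not results from the system.

```python
from typing import Callable, Dict, FrozenSet, Iterable

Score = Callable[[FrozenSet[str]], int]


def hill_climb(initial: Iterable[str], all_features: Iterable[str], score: Score) -> FrozenSet[str]:
    """Greedy local search: move to the best neighbor until none improves."""
    current = frozenset(initial)
    current_value = score(current)
    candidates = list(all_features)
    while True:
        # Assumption: a neighbor adds exactly one unused feature.
        neighbors = [current | {f} for f in candidates if f not in current]
        if not neighbors:
            return current
        best = max(neighbors, key=score)
        best_value = score(best)
        if best_value <= current_value:
            return current  # local maximum reached
        current, current_value = best, best_value


def ablation(selected: FrozenSet[str], score: Score) -> Dict[str, int]:
    """Score drop observed when each feature is left out in turn."""
    full = score(selected)
    return {f: full - score(selected - {f}) for f in selected}


# Toy scoring function: each feature contributes a fixed amount (hypothetical).
GAIN = {"T": 30, "IOB": 20, "PoS": 10, "L": -5}


def toy_score(features: FrozenSet[str]) -> int:
    return sum(GAIN[f] for f in features)


selected = hill_climb([], GAIN, toy_score)
drops = ablation(selected, toy_score)
```

In this toy run the search stops before adding `L` because that feature lowers the score, and the ablation table ranks the remaining features by how much each one contributes.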
Features
The IOB (Inside-Outside-Begin) chain corresponds to the syntactic categories of all the constituents on the path between the root node and the current leaf node of the tree. The corresponding feature would be I-S/E-VP/E-SBAR/E-S/C-VP, where B-, I-, E- and C- indicate whether the given token is, respectively, at the beginning, inside, or at the end of the constituent, or is a single-token chunk. In this case, "flashed" is inside the root S, at the end of the following three constituents in the chain, and the only leaf of the last VP, which dominates one single token.
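A minimal sketch of how such a chain can be read off a constituency tree follows. It is not the ChunkLink.pl implementation: the tuple-based tree encoding, the function name `iob_chains`, and the example sentence are assumptions, chosen so that "flashed" receives the chain quoted above. Part-of-speech nodes are left out of the chain.

```python
from typing import List, Tuple


def iob_chains(tree) -> List[Tuple[str, str]]:
    """Return (token, chain) pairs; a node is (label, children), a leaf is a str."""
    results = []

    def count_leaves(node) -> int:
        if isinstance(node, str):
            return 1
        return sum(count_leaves(child) for child in node[1])

    def is_preterminal(node) -> bool:
        return len(node[1]) == 1 and isinstance(node[1][0], str)

    def walk(node, start: int, path) -> int:
        if is_preterminal(node):
            tags = []
            for label, first, last in path:
                if first == last:
                    prefix = "C"   # constituent covers a single token
                elif start == first:
                    prefix = "B"   # token begins the constituent
                elif start == last:
                    prefix = "E"   # token ends the constituent
                else:
                    prefix = "I"   # token is strictly inside
                tags.append(f"{prefix}-{label}")
            results.append((node[1][0], "/".join(tags)))
            return start + 1
        label, children = node
        span = (label, start, start + count_leaves(node) - 1)
        position = start
        for child in children:
            position = walk(child, position, path + [span])
        return position

    walk(tree, 0, [])
    return results


# Hypothetical parse of "He said that lights flashed ."
tree = ("S", [
    ("NP", [("PRP", ["He"])]),
    ("VP", [
        ("VBD", ["said"]),
        ("SBAR", [
            ("IN", ["that"]),
            ("S", [
                ("NP", [("NNS", ["lights"])]),
                ("VP", [("VBD", ["flashed"])]),
            ]),
        ]),
    ]),
    (".", ["."]),
])
chains = dict(iob_chains(tree))
```

Each tag in the chain pairs a position prefix with a constituent label, so the feature encodes both the depth of a token in the tree and where it sits within every enclosing phrase.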
Result: Gold-labeled & Automatic
Panels: automatic system output; gold-labeled system output (baseline result in blue)
Combo Result
Panels: Auto Conn + Gold SPT; Gold Conn + Auto SPT
Result: replc. IOB chain