ACL4 Project NCLT Seminar Presentation, 7th June 2006 Conor Cafferkey. Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures. Project Overview. Open research problem: Integrating syntactic parsing and semantic role labeling (SRL) Approach
ACL4 Project NCLT Seminar Presentation, 7th June 2006 Conor Cafferkey Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures
Project Overview Open research problem: • Integrating syntactic parsing and semantic role labeling (SRL) Approach • Retraining a history-based generative lexicalized parser (Bikel, 2002) • Semantically-enriched training corpus (Penn Treebank + PropBank-derived semantic role annotations)
Semantic Roles • Relationship that a syntactic constituent has with a predicate • Predicate-argument relations • PropBank (Palmer et al., 2005)
PropBank Predicate-Argument Relations Frameset: hate.01 ARG0: experiencer ARG1: target
PropBank Argument Types • ARG0 - ARG5: arguments associated with a verb predicate, defined in the PropBank Frames scheme. • ARGM-XXX: adjunct-like arguments of various sorts, where XXX is the type of the adjunct. Types include locative (LOC), temporal (TMP), manner (MNR), etc. • ARGA: causative agents. • rel: the verb of the proposition.
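The four label families above can be told apart mechanically. The following is a minimal sketch, not part of the project's scripts; the function name and the category strings are illustrative assumptions, and labels carrying extra features are simply reported as "unknown".

```python
import re


def classify_label(label: str) -> str:
    """Map a PropBank label to one of the four families listed on the slide.

    The returned category strings are illustrative names, not PropBank terms.
    """
    if label == "rel":
        return "predicate"
    if label == "ARGA":
        return "causative agent"
    if re.fullmatch(r"ARG[0-5]", label):
        return "numbered argument"
    if re.fullmatch(r"ARGM-[A-Z]+", label):
        return "adjunct"
    return "unknown"


for label in ["ARG0", "ARGM-TMP", "ARGA", "rel"]:
    print(label, "->", classify_label(label))
```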
Current Approaches • Semantic role labeling (SRL) task: • Identify, given a verb: • which nodes of the syntactic tree are arguments of that verb, and • what semantic role each such argument plays with regard to the verb.
Current Approaches • “Pipelined” approach • Parsing → Pruning → ML-techniques → post-processing • CoNLL-2005 (Carreras and Màrquez, 2005) • SVM, Random Fields, Random Forests, … • Various lexical parameters
An Integrated Approach to Semantic Parsing • Integrate syntactic and semantic parsing • Retrain parser using semantically-enriched corpus (Treebank + PropBank-derived semantic roles) • Parser itself performs semantic role labeling (SRL)
Project Components • “Off-the-shelf”: • Parser (Bikel, 2002) emulating Collins’ (1999) model 2 • Penn Treebank Release 2 (Marcus et al., 1993) • PropBank 1.0 (Palmer et al., 2005) • Written for project (mainly in Python): • Scripts to annotate Treebank with PropBank data • Script to generate new head-finding rules for Bikel’s parser • SRL evaluation scripts • Utility scripts (pre-processing, etc.)
Appending Semantic Roles to Treebank Syntactic Category Labels wsj/15/wsj_1568.mrg 16 2 gold hate.01 vn--a 0:1-ARG0 2:0-rel 3:1-ARG1
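The annotation line on this slide can be split into its fields as sketched below. This is a hedged illustration rather than the project's actual annotation script: the class and field names are assumptions, and only simple `terminal:height-LABEL` pointers like those in the example are handled.

```python
from typing import List, NamedTuple


class Argument(NamedTuple):
    terminal: int   # index of the first terminal under the node
    height: int     # number of levels to climb from that terminal
    label: str      # e.g. ARG0, ARGM-TMP, rel


class Proposition(NamedTuple):
    filename: str
    sentence: int
    predicate_terminal: int
    tagger: str
    roleset: str
    inflection: str
    arguments: List[Argument]


def parse_prop_line(line: str) -> Proposition:
    """Parse one whitespace-separated PropBank annotation line."""
    fields = line.split()
    filename, sentence, terminal, tagger, roleset, inflection = fields[:6]
    arguments = []
    for item in fields[6:]:
        # Split on the first hyphen only, so ARGM-TMP stays in one piece.
        pointer, label = item.split("-", 1)
        term, height = pointer.split(":")
        arguments.append(Argument(int(term), int(height), label))
    return Proposition(filename, int(sentence), int(terminal),
                       tagger, roleset, inflection, arguments)


prop = parse_prop_line(
    "wsj/15/wsj_1568.mrg 16 2 gold hate.01 vn--a 0:1-ARG0 2:0-rel 3:1-ARG1"
)
print(prop.roleset, [(a.terminal, a.height, a.label) for a in prop.arguments])
```

Each pointer names a tree node by terminal number and height, so the label can then be appended to the syntactic category of that node in the corresponding Treebank tree.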
Syntactic Bracketing Evaluation • Parseval measures (Black et al., 1992)
Syntactic Bracketing Evaluation • Harmonic mean of precision and recall:
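The formulas from the original slides did not survive extraction. The standard definitions of labeled bracketing precision, recall and their harmonic mean are given here as a reconstruction, with the symbols P, R and F chosen for illustration:

```latex
P = \frac{\text{number of correct constituents in the parser output}}
         {\text{total number of constituents in the parser output}}
\qquad
R = \frac{\text{number of correct constituents in the parser output}}
         {\text{total number of constituents in the gold-standard parse}}

F = \frac{2 \, P \, R}{P + R}
```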
Baseline Syntactic Bracketing Performance Parse Time: 114:41 Parsing Section 00, trained with sections 02-21 of Penn Treebank (1918 sentences)
Semantically-Augmented Treebanks • N: augment node labels with ARGNs only • N-C: augment node labels with conflated ARGNs only • M: augment node labels with ARGMs only • M-C: augment node labels with conflated ARGMs only • NMR: augment node labels with ARGNs, ARGMs and rels
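The five schemes might be sketched as a single labeling function. Several details here are assumptions not stated in the slides: that "conflated" means collapsing ARG0 to ARG5 into one `ARGN` tag and all ARGM-XXX types into one `ARGM` tag, that the role is appended with a hyphen, and that ARGA is left unaugmented.

```python
import re


def augment(category: str, role: str, scheme: str) -> str:
    """Return the node label for `category` under one augmentation scheme.

    Assumed output format: "<category>-<role>". Roles outside the scheme's
    scope leave the syntactic category unchanged.
    """
    numbered = re.fullmatch(r"ARG[0-5]", role) is not None
    adjunct = role.startswith("ARGM-")
    if scheme == "N" and numbered:
        return f"{category}-{role}"
    if scheme == "N-C" and numbered:
        return f"{category}-ARGN"      # assumed conflated tag
    if scheme == "M" and adjunct:
        return f"{category}-{role}"
    if scheme == "M-C" and adjunct:
        return f"{category}-ARGM"      # assumed conflated tag
    if scheme == "NMR" and (numbered or adjunct or role == "rel"):
        return f"{category}-{role}"
    return category


for scheme in ["N", "N-C", "M", "M-C", "NMR"]:
    print(scheme, augment("NP", "ARG0", scheme), augment("PP", "ARGM-TMP", scheme))
```

The conflated variants keep the label inventory small, which matters because every new label splits the counts the parser is trained on.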
Syntactic Bracketing Evaluation Parsing Section 00, trained with sections 02-21 of Penn Treebank (1918 sentences)
Semantic Evaluation • Evaluating by terminal number and height • Evaluating by terminal span • How strictly to evaluate?
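Evaluation by terminal span can be sketched as a set comparison between gold and predicted arguments. This is an illustrative sketch, not the project's evaluation script: the `(start, end, label)` tuple representation, the function name and the `labeled` switch (one possible answer to the strictness question) are assumptions, and the spans in the example are made up.

```python
def srl_scores(gold, predicted, labeled=True):
    """Precision, recall and F-score over (start, end, label) argument spans.

    With labeled=False only the span boundaries have to match.
    """
    gold, predicted = set(gold), set(predicted)
    if not labeled:
        gold = {(start, end) for start, end, _ in gold}
        predicted = {(start, end) for start, end, _ in predicted}
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical spans: the second predicted argument ends one terminal early.
gold = [(0, 1, "ARG0"), (3, 5, "ARG1")]
predicted = [(0, 1, "ARG0"), (3, 4, "ARG1")]
print(srl_scores(gold, predicted))
```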
Semantic Role Labeling Evaluation Parsing Section 00, trained with sections 02-21 of Penn Treebank (1918 sentences)
Adding More Information • Co-index the semantic role labels with governing predicate (verb) • i.e. include the appropriate roleset name in each semantic label augmentation
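A minimal sketch of co-indexing, assuming the roleset name is appended to the augmented label with a hyphen (the actual label format used in the project is not given). The second roleset, `love.01`, is included purely for illustration.

```python
def coindex(category: str, role: str, roleset: str) -> str:
    """Append role and roleset name to a category (assumed label format)."""
    return f"{category}-{role}-{roleset}"


# Two illustrative propositions that share the same role inventory.
propositions = [
    ("hate.01", [("NP", "ARG0"), ("NP", "ARG1")]),
    ("love.01", [("NP", "ARG0"), ("NP", "ARG1")]),
]

plain = {f"{cat}-{role}" for _, args in propositions for cat, role in args}
coindexed = {coindex(cat, role, roleset)
             for roleset, args in propositions for cat, role in args}

print(sorted(plain))
print(sorted(coindexed))
```

In this toy example the label inventory doubles once the roleset is included, and it would keep growing with every further verb sense.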
Adding More Information • Data sparseness • Time efficiency • Need to make some sort of generalizations • “Syntacto-semantic” verb classes • VerbNet (Kipper et al., 2000)
Future Ideas • Integrate the (non-co-indexed) output from the re-trained parser into a pipelined SRL system • Syntactic parsing informed by semantic roles? • Recoding the parser to take better advantage of the semantic roles • Reranking n-best parser outputs based on semantic roles
Summary • Retrained a history-based generative lexicalized parser with semantically-enriched corpus • Corpus annotation • Generating head-finding rules • Evaluated parser’s performance • Syntactic parsing (evalb) • Semantic parsing (SRL)
References • Bikel, Daniel M. 2002. Design of a Multi-lingual, Parallel-processing Statistical Parsing Engine. In Proceedings of HLT2002, San Diego, California. • Black, Ezra, Frederick Jelinek, John D. Lafferty, David M. Magerman, Robert L. Mercer and Salim Roukos. 1992. Towards History-based Grammars: Using Richer Models for Probabilistic Parsing. In Proceedings of the DARPA Speech and Natural Language Workshop, Harriman, New York, pages 134-139. Morgan Kaufmann. • Carreras, Xavier and Lluís Màrquez. 2005. Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling. In Proceedings of CoNLL-2005, pages 152-164. • Collins, Michael John. 1999. Head-driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania, Philadelphia.
References • Kipper, Karin, Hoa Trang Dang and Martha Palmer. 2000. Class-Based Construction of a Verb Lexicon. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, Austin, Texas. • Marcus, Mitchell P., Beatrice Santorini and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313-330. • Palmer, Martha, Daniel Gildea and Paul Kingsbury. 2005. The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics, 31(1):71-106. • Yi, Szu-ting and Martha Palmer. 2005. The integration of syntactic parsing and semantic role labeling. In Proceedings of CoNLL-2005, pages 237-240.