
Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

ACL4 Project NCLT Seminar Presentation, 7th June 2006. Conor Cafferkey.


Presentation Transcript


  1. ACL4 Project NCLT Seminar Presentation, 7th June 2006 Conor Cafferkey Towards Parsing Unrestricted Text into PropBank Predicate-Argument Structures

  2. Project Overview Open research problem: • Integrating syntactic parsing and semantic role labeling (SRL) Approach • Retraining a history-based generative lexicalized parser (Bikel, 2002) • Semantically-enriched training corpus (Penn Treebank + PropBank-derived semantic role annotations)

  3. Treebank Syntactic Bracketing Style

  4. Treebank Syntactic Bracketing Style

  5. Semantic Roles • Relationship that a syntactic constituent has with a predicate • Predicate-argument relations • PropBank (Palmer et al., 2005)

  6. PropBank Predicate-Argument Relations Frameset: hate.01 ARG0: experiencer ARG1: target

  7. PropBank Argument Types • ARG0 - ARG5: arguments associated with a verb predicate, defined in the PropBank Frames scheme. • ARGM-XXX: adjunct-like arguments of various sorts, where XXX is the type of the adjunct. Types include locative (LOC), temporal (TMP) , manner (MNR), etc. • ARGA: causative agents. • rel: the verb of the proposition.

  8. Current Approaches • Semantic role labeling (SRL) task: • Identify, given a verb: • which nodes of the syntactic tree are arguments of that verb, and • what semantic role each such argument plays with regard to the verb.

  9. Current Approaches • “Pipelined” approach • Parsing → Pruning → ML-techniques → post-processing • CoNLL-2005 (Carreras and Márquez, 2005) • SVM, Random Fields, Random Forests, … • Various lexical parameters

  10. An Integrated Approach to Semantic Parsing • Integrate syntactic and semantic parsing • Retrain parser using semantically-enriched corpus (Treebank + PropBank-derived semantic roles) • Parser itself performs semantic role labeling (SRL)

  11. Project Components • “Off-the-shelf”: • Parser (Bikel, 2002) emulating Collins’ (1999) Model 2 • Penn Treebank Release 2 (Marcus et al., 1993) • PropBank 1.0 (Palmer et al., 2005) • Written for project (mainly in Python): • Scripts to annotate Treebank with PropBank data • Script to generate new head-finding rules for Bikel’s parser • SRL evaluation scripts • Utility scripts (pre-processing, etc.)
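One reason new head-finding rules are needed is that a rule written for `NP` will not match a relabelled node such as `NP-ARG0`. The sketch below shows one way such rules could be widened. It is an assumption-laden illustration, not the project's script: the `rules` dictionary layout, the `-ARG` label separator, and the function name `expand_head_rules` are all hypothetical, and it does not reproduce the rule file format that Bikel's parser actually reads.

```python
from collections import defaultdict


def expand_head_rules(rules, augmented_labels):
    """Add semantically augmented variants of each label to a head-rule table.

    rules: {parent: [(direction, [child labels in priority order]), ...]}
           (hypothetical in-memory layout, not the parser's file format)
    augmented_labels: labels observed in the enriched corpus, e.g. "NP-ARG0"
    """
    # Group every augmented label under its bare syntactic category.
    variants = defaultdict(list)
    for label in sorted(augmented_labels):
        base = label.split("-ARG")[0]  # assumes roles are appended as "-ARG..."
        variants[base].append(label)

    expanded = {}
    for parent, instructions in rules.items():
        widened_instructions = []
        for direction, children in instructions:
            widened = []
            for child in children:
                widened.append(child)                     # original label first
                widened.extend(variants.get(child, []))   # then its variants
            widened_instructions.append((direction, widened))
        # The augmented versions of the parent reuse the parent's rules.
        for name in [parent] + variants.get(parent, []):
            expanded[name] = widened_instructions
    return expanded


rules = {
    "VP": [("left", ["VBP", "VP"])],
    "NP": [("right", ["NN", "NP"])],
}
augmented = {"NP-ARG0", "NP-ARG1", "VP-ARGM-MNR"}
expanded = expand_head_rules(rules, augmented)
```

Placing each augmented variant directly after its bare category keeps the original priority order between categories, which is one plausible design choice among several.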

  12. Appending Semantic Roles to Treebank Syntactic Category Labels • PropBank annotation: wsj/15/wsj_1568.mrg 16 2 gold hate.01 vn--a 0:1-ARG0 2:0-rel 3:1-ARG1
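The annotation line on this slide can be split into its fields as in the minimal sketch below. The field order follows the PropBank 1.0 layout (file, sentence, predicate terminal, tagger, roleset, inflection, then `terminal:height-LABEL` items). The names `Proposition` and `parse_prop_line` are hypothetical, and the sketch assumes simple pointers only: it does not handle trace chains (`*`) or split arguments (`,`).

```python
from typing import NamedTuple


class Proposition(NamedTuple):
    treebank_file: str
    sentence: int
    predicate_terminal: int
    tagger: str
    roleset: str
    inflection: str
    arguments: list  # (terminal, height, label) triples


def parse_prop_line(line):
    """Parse one whitespace-separated PropBank annotation line."""
    fields = line.split()
    arguments = []
    for item in fields[6:]:
        # Split on the first hyphen only, so "ARGM-TMP" stays in one piece.
        pointer, label = item.split("-", 1)
        terminal, height = pointer.split(":")
        arguments.append((int(terminal), int(height), label))
    return Proposition(fields[0], int(fields[1]), int(fields[2]),
                       fields[3], fields[4], fields[5], arguments)


prop = parse_prop_line(
    "wsj/15/wsj_1568.mrg 16 2 gold hate.01 vn--a 0:1-ARG0 2:0-rel 3:1-ARG1"
)
```

Here `prop.arguments` holds one triple per labelled node, which is the information needed to locate the node in the Treebank tree and append the role to its category label.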

  13. Syntactic Bracketing Evaluation • Parseval measures (Black et al., 1992)

  14. Syntactic Bracketing Evaluation • Harmonic mean of precision and recall:
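The formulas from these two slides did not survive in the transcript. As a reminder of the standard definitions, labelled precision is the fraction of predicted brackets that appear in the gold tree, labelled recall is the fraction of gold brackets that were predicted, and the F-score is 2PR / (P + R). The sketch below computes them for one sentence. The function name `parseval` and the example brackets are made up for illustration, and treating brackets as a set is a simplification (evalb also handles duplicate brackets and ignores some categories).

```python
def parseval(gold, predicted):
    """Labelled precision, recall and F-score over (label, start, end) brackets."""
    gold, predicted = set(gold), set(predicted)
    matched = len(gold & predicted)
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    total = precision + recall
    # Harmonic mean of precision and recall.
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical brackets for a four-word sentence.
gold = {("S", 0, 4), ("NP", 0, 1), ("VP", 2, 4), ("NP", 3, 4)}
predicted = {("S", 0, 4), ("NP", 0, 1), ("ADVP", 1, 2), ("VP", 1, 4), ("NP", 3, 4)}
precision, recall, f_score = parseval(gold, predicted)
```

With three of five predicted brackets correct and three of four gold brackets found, precision is 0.6, recall is 0.75 and the F-score is 2/3.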

  15. Baseline Syntactic Bracketing Performance Parse Time: 114:41 Parsing Section 00, trained with sections 02-21 of Penn Treebank (1918 sentences)

  16. Semantically-Augmented Treebanks • N: augment node labels with ARGNs only • N-C: augment node labels with conflated ARGNs only • M: augment node labels with ARGMs only • M-C: augment node labels with conflated ARGMs only • NMR: augment node labels with ARGNs, ARGMs and rels
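The five augmentation schemes can be read as a single relabelling function with a scheme switch, as in the sketch below. Several details are assumptions not stated on the slide: that "conflated" means collapsing ARG0 to ARG5 into one `ARGN` tag and all ARGM-XXX types into one `ARGM` tag, that roles are joined to the category with a hyphen, and that ARGA is left out of the numbered group. The function name `augment` is hypothetical.

```python
import re


def augment(category, role, scheme):
    """Return the node label for a given role under one augmentation scheme."""
    numbered = re.fullmatch(r"ARG[0-5]", role) is not None  # ARG0 .. ARG5
    adjunct = role.startswith("ARGM")                       # ARGM-LOC, ARGM-TMP, ...

    if scheme == "N" and numbered:
        return f"{category}-{role}"
    if scheme == "N-C" and numbered:
        return f"{category}-ARGN"   # assumed meaning of "conflated"
    if scheme == "M" and adjunct:
        return f"{category}-{role}"
    if scheme == "M-C" and adjunct:
        return f"{category}-ARGM"   # assumed meaning of "conflated"
    if scheme == "NMR" and (numbered or adjunct or role == "rel"):
        return f"{category}-{role}"
    return category                 # role not covered by this scheme


labels = {scheme: augment("NP", "ARG1", scheme)
          for scheme in ("N", "N-C", "M", "M-C", "NMR")}
```

Under these assumptions the conflated schemes trade role detail for a much smaller label set, which matters because every new label splits the parser's training counts.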

  17. Syntactic Bracketing Evaluation Parsing Section 00, trained with sections 02-21 of Penn Treebank (1918 sentences)

  18. Semantic Evaluation

  19. Semantic Evaluation • Evaluating by terminal number and height • Evaluating by terminal span • How strictly to evaluate?
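The two ways of identifying an argument node are related: a `terminal:height` pointer names a node by its first terminal and the number of levels above that terminal's part-of-speech node, and the same node also covers a span of terminals. The sketch below builds a lookup from pointers to spans for a small tree. The tuple-based tree encoding, the function name `spans_by_pointer`, and the example sentence are all hypothetical; real Treebank trees also contain traces, which count as terminals and are not modelled here.

```python
def spans_by_pointer(tree):
    """Map (first_terminal, height) -> (label, start, end) for every node.

    A tree is a nested tuple: (label, child, ...) for phrases and
    (pos_tag, word) for part-of-speech nodes. Spans are half-open.
    """
    table = {}

    def visit(node, start):
        label, *children = node
        if len(children) == 1 and isinstance(children[0], str):
            height, end = 0, start + 1          # part-of-speech node
        else:
            height, end = None, start
            for child in children:
                child_height, end = visit(child, end)
                if height is None:              # height follows the leftmost child
                    height = child_height + 1
        table[(start, height)] = (label, start, end)
        return height, end

    visit(tree, 0)
    return table


# Hypothetical sentence, chosen so the pointers 0:1, 2:0 and 3:1 resolve.
tree = ("S",
        ("NP", ("PRP", "They")),
        ("ADVP", ("RB", "really")),
        ("VP", ("VBP", "hate"), ("NP", ("NNS", "delays"))))
table = spans_by_pointer(tree)
```

Evaluating by span is more lenient than evaluating by pointer: in a unary chain several nodes share one span but have different heights, so a role attached one level too high still matches by span.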

  20. Semantic Role Labeling Evaluation Parsing Section 00, trained with sections 02-21 of Penn Treebank (1918 sentences)

  21. Semantic Role Labeling Evaluation Parsing Section 00, trained with sections 02-21 of Penn Treebank (1918 sentences)

  22. Syntactic Nodes that Play Multiple Semantic Roles

  23. Adding More Information • Co-index the semantic role labels with governing predicate (verb) • i.e. include the appropriate roleset name in each semantic label augmentation

  24. Co-indexing the Semantic Augmentations

  25. Adding More Information • Data sparseness • Time efficiency • Need to make some sort of generalizations • “Syntacto-semantic” verb classes • VerbNet (Kipper et al., 2000)

  26. Co-indexing with VerbNet classes
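The two co-indexing variants can be sketched as one function: with no class mapping it appends the roleset name, and with a mapping it appends the verb class instead. The label format, the function name `coindex`, and the class names `CLASS_A` are placeholders invented for illustration. Real class membership would come from VerbNet, and the slides do not show the exact label format used.

```python
def coindex(category, role, roleset, verb_classes=None):
    """Append a role and its governing predicate (or verb class) to a label."""
    if verb_classes is None:
        key = roleset                        # co-index with the roleset itself
    else:
        key = verb_classes.get(roleset)      # co-index with the verb's class
        if key is None:
            return f"{category}-{role}"      # verb not in any class: no co-index
    return f"{category}-{role}-{key}"


# Placeholder mapping: two rolesets sharing one class.
toy_classes = {"hate.01": "CLASS_A", "love.01": "CLASS_A"}

by_roleset = coindex("NP", "ARG0", "hate.01")
by_class_hate = coindex("NP", "ARG0", "hate.01", toy_classes)
by_class_love = coindex("NP", "ARG0", "love.01", toy_classes)
```

Co-indexing by roleset creates a separate label for every verb sense, which is where the data sparseness and parsing time concerns arise. Mapping several rolesets to one class lets them share a label, as `by_class_hate` and `by_class_love` do here.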

  27. Future Ideas • Integrate the (un co-indexed) output from the re-trained parser into a pipelined SRL system • Syntactic parsing informed by semantic roles? • Recoding the parser to take better advantage of the semantic roles • Reranking n-best parser outputs based on semantic roles

  28. Summary • Retrained a history-based generative lexicalized parser with semantically-enriched corpus • Corpus annotation • Generating head-finding rules • Evaluated parser’s performance • Syntactic parsing (evalb) • Semantic parsing (SRL)

  29. References • Bikel, Daniel M. 2002. Design of a Multi-lingual, Parallel-processing Statistical Parsing Engine. In Proceedings of HLT2002, San Diego, California. • Black, Ezra, Frederick Jelinek, John D. Lafferty, David M. Magerman, Robert L. Mercer and Salim Roukos. 1992. Towards History-based Grammars: Using Richer Models for Probabilistic Parsing. In Proceedings of the DARPA Speech and Natural Language Workshop, Harriman, New York, pages 134-139. Morgan Kaufmann. • Carreras, Xavier and Lluís Màrquez. 2005. Introduction to the CoNLL-2005 Shared Task: Semantic Role Labeling. In Proceedings of CoNLL-2005, pages 152-164. • Collins, Michael John. 1999. Head-driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania, Philadelphia.

  30. References • Kipper, Karin, Hoa Trang Dang and Martha Palmer. 2000. Class-Based Construction of a Verb Lexicon. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, Austin, Texas. • Marcus, Mitchell P., Beatrice Santorini and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313-330. • Palmer, Martha, Daniel Gildea and Paul Kingsbury. 2005. The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics, 31(1):71-106. • Yi, Szu-ting and Martha Palmer. 2005. The integration of syntactic parsing and semantic role labeling. In Proceedings of CoNLL-2005, pages 237-240.

  31. http://student.dcu.ie/~cafferc2/