Exploiting Constituent Dependencies for Tree Kernel-based Semantic Relation Extraction

Exploiting Constituent Dependencies for Tree Kernel-based Semantic Relation Extraction Longhua Qian School of Computer Science and Technology Soochow University, Suzhou, China 19 Aug. 2008 COLING 2008, Manchester, UK

Outline • 1. Introduction • 2. Related Work • 3. Dynamic Syntactic Parse Tree • 4. Entity-related Semantic Tree • 5. Experimental results • 6. Conclusion and Future Work

1. Introduction • Information extraction is an important research topic in NLP. • It attempts to find relevant information from a large amount of text documents available in digital archives and the WWW. • Information extraction by NIST ACE • Entity Detection and Tracking (EDT) • Relation Detection and Characterization (RDC) • Event Detection and Characterization (EDC)

RDC • Function • RDC detects and classifies semantic relationships (usually of predefined types) between pairs of entities. Relation extraction is very useful for a wide range of advanced NLP applications, such as question answering and text summarization. • E.g. • The sentence “Microsoft Corp. is based in Redmond, WA” conveys the relation “GPE-AFF.Based” between “Microsoft Corp.” (ORG) and “Redmond” (GPE).

2. Related work • Feature-based methods • have dominated the research in relation extraction over the past years. However, relevant research shows that it’s difficult to extract new effective features and further improve the performance. • Kernel-based methods • compute the similarity of two objects (e.g. parse trees) directly. The key problem is how to represent and capture structured information in complex structures, such as the syntactic information in the parse tree for relation extraction.

Kernel-based related work • Zelenko et al. (2003), Culotta and Sorensen (2004), Bunescu and Mooney (2005) described several kernels between shallow parse trees or dependency trees to extract semantic relations. • Zhang et al. (2006), Zhou et al. (2007) proposed composite kernels consisting of a linear kernel and a convolution parse tree kernel, with the latter effectively capturing the structured syntactic information inherent in parse trees.
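To make the convolution parse tree kernel concrete, here is a minimal sketch of the standard Collins and Duffy (2001) recursion, which counts common sub-tree fragments with a decay factor λ. The nested-tuple tree representation and the function names are assumptions made for illustration; this is not the Tree Kernel Toolkit code used in the experiments.

```python
from functools import lru_cache

# A tree is a nested tuple: (label, child, child, ...).
# A pre-terminal is (POS, word); words are plain strings.

def is_preterminal(node):
    return len(node) == 2 and isinstance(node[1], str)

def production(node):
    """The CFG rule expanded at this node, e.g. ('NP', 'DT', 'NN')."""
    return (node[0],) + tuple(c if isinstance(c, str) else c[0] for c in node[1:])

def internal_nodes(tree):
    """Yield every non-leaf node of the tree."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from internal_nodes(child)

def tree_kernel(t1, t2, lam=0.4):
    """Sum of delta(n1, n2) over all node pairs (Collins and Duffy, 2001)."""
    @lru_cache(maxsize=None)
    def delta(n1, n2):
        if production(n1) != production(n2):
            return 0.0
        if is_preterminal(n1):
            return lam
        score = lam
        for c1, c2 in zip(n1[1:], n2[1:]):
            score *= 1.0 + delta(c1, c2)
        return score

    return sum(delta(a, b) for a in internal_nodes(t1) for b in internal_nodes(t2))

the_dog = ("NP", ("DT", "the"), ("NN", "dog"))
a_dog = ("NP", ("DT", "a"), ("NN", "dog"))
print(tree_kernel(the_dog, the_dog, lam=1.0))  # 6 shared fragments
print(tree_kernel(the_dog, a_dog, lam=1.0))    # 3 shared fragments
```

With λ = 1 the kernel simply counts shared fragments; smaller values such as the 0.4 used later in the experiments down-weight large fragments.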

Structured syntactic information • A tree span for relation instance • part of a parse tree used to represent the structured syntactic information including two involved entities. • Two currently used tree spans • SPT(Shortest Path-enclosed Tree): the sub-tree enclosed by the shortest path linking the two entities in the parse tree (Zhang et al., 2006) • CS-SPT(Context-Sensitive Shortest Path-enclosed Tree): Dynamically determined by further extending the necessary predicate-linked path information outside SPT. (Zhou et al., 2007)

Current problems • Noisy information • Both SPT and CS-SPT may still contain noisy information. In other words, more noise could be pruned away from these tree spans. • Useful information • CS-SPT captures only the part of the context-sensitive information that relates to the predicate-linked path. That is to say, more information outside SPT/CS-SPT may be recovered to help discern the relationship between the two entities.

Our solution • Dynamic Syntactic Parse Tree (DSPT) • Based on MCT (Minimum Complete Tree), we exploit constituent dependencies to dynamically prune out noisy information from a syntactic parse tree and include necessary contextual information. • Unified Parse and Semantic Tree (UPST) • Instead of constructing composite kernels, various kinds of entity-related semantic information are unified into a Unified Parse and Semantic Tree.

3. Dynamic Syntactic Parse Tree • Motivation of DSPT • Dependency plays a key role in relation extraction, e.g. the dependency tree (Culotta and Sorensen, 2004) or the shortest dependency path (Bunescu and Mooney, 2005). • Constituent dependencies • In a parse tree, each CFG rule has the following form: • P → Ln … L1 H R1 … Rm • where the parent node P depends on the head child H; this is what we call a constituent dependency. • Our hypothesis stipulates that the contribution of the parse tree to establishing a relationship is almost exclusively concentrated in the path connecting the two entities, as well as the head children of constituent nodes along this path.

Generation of DSPT • Starting from the Minimum Complete Tree, along the path connecting two entities, the head child of every node is found according to various constituent dependencies. • Then the path nodes and their head children are kept while any other nodes are removed from the parse tree. • Eventually we arrive at a tree span called Dynamic Syntactic Parse Tree (DSPT)
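The generation procedure above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Node` class, the pre-annotated `head` index, and the choice to reduce an off-path head child to its own chain of heads are all hypothetical, and the sketch does not implement the five dependency-specific rules described on the following slides.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    head: int = 0                 # index of the head child (assumed to be given)
    word: Optional[str] = None    # set on leaves only

def path_to(root, target):
    """Nodes from root down to target (inclusive), or [] if target is absent."""
    if root is target:
        return [root]
    for child in root.children:
        sub = path_to(child, target)
        if sub:
            return [root] + sub
    return []

def head_chain(node):
    """Copy a node, keeping only head children all the way down (an assumption)."""
    if not node.children:
        return Node(node.label, word=node.word)
    return Node(node.label, [head_chain(node.children[node.head])])

def prune(root, e1, e2):
    """Keep path nodes and their head children; root is assumed to be the MCT root."""
    keep = {id(n) for n in path_to(root, e1) + path_to(root, e2)}

    def rebuild(node):
        if node is e1 or node is e2:
            return copy.deepcopy(node)      # entity sub-trees are kept whole
        kept, new_head = [], 0
        for i, child in enumerate(node.children):
            if id(child) in keep:
                new_child = rebuild(child)
            elif i == node.head:
                new_child = head_chain(child)
            else:
                continue                    # neither on the path nor a head child
            if i == node.head:
                new_head = len(kept)
            kept.append(new_child)
        return Node(node.label, kept, new_head)

    return rebuild(root)

def to_brackets(node):
    if not node.children:
        return f"({node.label} {node.word})"
    return "(" + node.label + " " + " ".join(to_brackets(c) for c in node.children) + ")"

# Toy example: "the new manager of the company"
e1 = Node("NN", word="manager")
e2 = Node("NN", word="company")
tree = Node("NP", [
    Node("NP", [Node("DT", word="the"), Node("JJ", word="new"), e1], head=2),
    Node("PP", [Node("IN", word="of"),
                Node("NP", [Node("DT", word="the"), e2], head=1)], head=0),
], head=0)
pruned = prune(tree, e1, e2)
print(to_brackets(pruned))  # (NP (NP (NN manager)) (PP (IN of) (NP (NN company))))
```

In this toy tree the determiners and the adjective are dropped because they are neither on the path between the two entities nor head children, while the preposition survives as the head of its PP.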

Constituent dependencies (1) • Modification within base-NPs • Base-NPs do not directly dominate another NP. • Hence, all the constituents before the headword may be removed from the parse tree, while the headword and the constituents right after the headword remain unchanged. • Modification to NPs • Contrary to the first type, these NPs are recursive, meaning that they contain another NP as their child. They usually appear as follows: • NP → NP SBAR [relative clause] • NP → NP VP [reduced relative] • NP → NP PP [PP attachment] • In this case, the right-hand side (e.g. “NP VP”) can be reduced to the left-hand side, which is exactly a single NP.

Constituent dependencies (2) • Arguments/adjuncts to verbs: • This type includes the CFG rules in which the left side contains S, SBAR or VP. Both arguments and adjuncts depend on the verb and could be removed if they are not included in the path connecting the two entities. • Coordination conjunctions: • In coordination constructions, several peer conjuncts may be reduced into a single constituent, for we think all the conjuncts play an equal role in relation extraction. • Modification to other constituents: • Except for the above four types, other CFG rules fall into this type, such as modification to PP, ADVP and PRN etc. These cases occur much less frequently than others.

Some examples of DSPT

4. Entity-related Semantic Tree • For the example sentence “they ’re here”, which is excerpted from the ACE RDC 2004 corpus, there exists a relationship “Physical.Located” between the entities “they” [PER] and “here” [GPE.Population-Center]. The features are encoded as “TP”, “ST”, “MT” and “PVB”, which denote respectively the type, subtype and mention type of the two entities, and the base form of the predicate verb, if one exists (the one nearest to the 2nd entity along the path connecting the two entities).

Three EST setups (a) Bag of Features (BOF): all feature nodes uniformly hang under the root node, so the tree kernel simply counts the number of common features between two relation instances. (b) Feature-Paired Tree (FPT): the features of two entities are grouped into different types according to their feature names, e.g. “TP1” and “TP2” are grouped to “TP”. This tree setup is aimed to capture the additional similarity of the single feature combined from different entities, i.e., the first and the second entities. (c) Entity-Paired Tree (EPT): all the features relating to an entity are grouped to nodes “E1” or “E2”, thus this tree kernel can further explore the equivalence of combined entity features only relating to one of the entities between two relation instances.
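As a rough illustration of the three setups, the sketch below builds bracketed trees for the “they ’re here” example. The entity types come from the slide; the root label `ENT`, the mention-type values, the `null` subtype placeholder, the predicate base form `be`, and all function names are assumptions made for this example.

```python
FEATURES = ("TP", "ST", "MT")  # type, subtype, mention type

def leaf(name, idx, entity):
    return f"({name}{idx} {entity[name]})"

def wrap(root, parts, pvb):
    """Attach the optional predicate-verb node and close the tree."""
    if pvb is not None:
        parts = parts + [f"(PVB {pvb})"]
    return f"({root} " + " ".join(parts) + ")"

def bag_of_features(e1, e2, pvb=None, root="ENT"):
    # (a) BOF: every feature hangs directly under the root.
    parts = [leaf(n, i, e) for i, e in ((1, e1), (2, e2)) for n in FEATURES]
    return wrap(root, parts, pvb)

def feature_paired(e1, e2, pvb=None, root="ENT"):
    # (b) FPT: the same feature of both entities is grouped under one node.
    parts = [f"({n} {leaf(n, 1, e1)} {leaf(n, 2, e2)})" for n in FEATURES]
    return wrap(root, parts, pvb)

def entity_paired(e1, e2, pvb=None, root="ENT"):
    # (c) EPT: all features of one entity are grouped under E1 or E2.
    parts = [f"(E{i} " + " ".join(leaf(n, i, e) for n in FEATURES) + ")"
             for i, e in ((1, e1), (2, e2))]
    return wrap(root, parts, pvb)

they = {"TP": "PER", "ST": "null", "MT": "PRO"}               # ST and MT values assumed
here = {"TP": "GPE", "ST": "Population-Center", "MT": "PRO"}  # MT value assumed

print(bag_of_features(they, here, pvb="be"))
print(feature_paired(they, here, pvb="be"))
print(entity_paired(they, here, pvb="be"))
```

Because the tree kernel matches sub-tree fragments, grouping nodes as in FPT or EPT lets pairs of features match jointly, whereas BOF only matches them one at a time.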

Construction of UPST • Motivation • We incorporate the EST into the DSPT to produce a Unified Parse and Semantic Tree (UPST) to investigate the contribution of the EST to relation extraction. • How • Detailed evaluation (Qian et al., 2007) indicates that the kernel achieves the best performance when the feature nodes are attached under the top node. • Therefore, we also attach three kinds of entity-related semantic trees (i.e. BOF, FPT and EPT) under the top node of the DSPT right after its original children.

5. Experimental results • Corpus Statistics • The ACE RDC 2004 data contains 451 documents and 5702 relation instances. It defines 7 major entity types, 7 major relation types and 23 relation subtypes. • Evaluation is done on 347 (nwire/bnews) documents and 4307 relation instances using 5-fold cross-validation. • Corpus processing • The corpus is parsed using Charniak’s parser (Charniak, 2001). • Relation instances are generated by iterating over all pairs of entity mentions occurring in the same sentence.
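The instance-generation step can be sketched as below. The slide does not say whether pairs are ordered or how mentions are represented, so the use of unordered pairs in textual order and the list-of-strings input are assumptions.

```python
from itertools import combinations

def candidate_pairs(sentences):
    """Yield (sentence index, mention 1, mention 2) for every pair of
    entity mentions that occur in the same sentence."""
    for sent_id, mentions in enumerate(sentences):
        for m1, m2 in combinations(mentions, 2):
            yield sent_id, m1, m2

# Each inner list holds the entity mentions of one sentence, in textual order.
sentences = [["Microsoft Corp.", "Redmond", "WA"], ["they", "here"]]
pairs = list(candidate_pairs(sentences))
for pair in pairs:
    print(pair)
```

Each generated pair becomes one candidate relation instance, to be labelled either with a relation type or as having no relation.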

Classifier • Tools • SVMLight (Joachims 1998) • Tree Kernel Toolkits (Moschitti 2004) • The training parameters C (SVM) and λ (tree kernel) are set to 2.4 and 0.4 respectively. • One vs. others strategy • which builds K basic binary classifiers so as to separate one class from all the others.
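The one vs. others strategy can be illustrated with any kernel-based binary learner. The sketch below uses a small kernel perceptron purely as a stand-in for SVMLight (it is not the tool used in the experiments), and the toy 2-D points, the dot-product kernel, and the class labels are assumptions for demonstration only.

```python
def train_kernel_perceptron(xs, ys, kernel, epochs=10):
    """Binary kernel perceptron; ys are +1/-1. Returns a scoring function."""
    alpha = [0.0] * len(xs)

    def score(x):
        return sum(a * y * kernel(xj, x) for a, xj, y in zip(alpha, xs, ys))

    for _ in range(epochs):
        for i, (x, y) in enumerate(zip(xs, ys)):
            if y * score(x) <= 0:      # misclassified (or on the boundary)
                alpha[i] += 1.0
    return score

def train_one_vs_others(xs, labels, kernel):
    """Build one binary classifier per class: that class vs. all the others."""
    return {
        c: train_kernel_perceptron(xs, [1 if l == c else -1 for l in labels], kernel)
        for c in sorted(set(labels))
    }

def predict(scorers, x):
    """Pick the class whose binary classifier gives the highest score."""
    return max(scorers, key=lambda c: scorers[c](x))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

xs = [(1, 0), (0, 1), (-1, -1)]
labels = ["A", "B", "C"]
scorers = train_one_vs_others(xs, labels, dot)
print([predict(scorers, x) for x in xs])  # ['A', 'B', 'C']
```

In the setting of the slides, the kernel argument would be a tree kernel over tree spans rather than a dot product over vectors.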

Contributions of various dependencies • Two modes: • --[M1] Respective: every constituent dependency is individually applied on MCT. • --[M2] Accumulative: every constituent dependency is incrementally applied on the previously derived tree span, which begins with the MCT and eventually gives rise to a Dynamic Syntactic Parse Tree (DSPT).

Contributions of various dependencies • The table shows that the final DSPT achieves the best performance of 77.4%/65.4%/70.9 in precision/recall/F-measure respectively after applying all the dependencies, an increase in F-measure of 8.2 units over the baseline MCT. • This indicates that reshaping the tree by exploiting constituent dependencies may significantly improve extraction accuracy, largely due to the increase in recall. • Modification within base-NPs contributes most to the performance improvement, increasing the F-measure by 4.4 units. This indicates the local characteristic of semantic relations, which can be effectively captured by NPs around the two involved entities in the DSPT.
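As a quick consistency check on the reported figures, the F-measure can be recomputed from precision and recall, assuming the usual balanced F1 (harmonic mean); the function name is illustrative.

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

print(round(f_measure(77.4, 65.4), 1))  # 70.9
```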

Comparison of different UPST setups • Compared with DSPT, Unified Parse and Semantic Trees (UPSTs) significantly improve the F-measure by about 4 units on average, due to increases in both precision and recall. • Among the three UPSTs, UPST (FPT) achieves slightly better performance than the other two setups.

Improvements of different tree setups over SPT • It shows that Dynamic Syntactic Parse Tree (DSPT) outperforms both SPT and CS-SPT setups. • Unified Parse and Semantic Tree with Feature-Paired Tree performs best among all tree setups.

Comparison with best-reported systems • It shows that our composite kernel achieves the best performance reported so far. • Our UPST performs best among tree setups using a single kernel, and even better than the two previous composite kernels.

6. Conclusion • Dynamic Syntactic Parse Tree (DSPT), which is generated by exploiting constituent dependencies, can significantly improve the performance over currently used tree spans for relation extraction. • In addition to individual entity features, combined entity features (especially bi-gram) contribute much when they are integrated with a DSPT into a Unified Parse and Semantic Tree.

Future Work • We will focus on improving performance on complex structured parse trees, where the path connecting the two entities involved in a relationship is too long for current kernel methods to take effect. • Our preliminary experiments applying some discourse theory show some positive results.

References • Bunescu R. C. and Mooney R. J. 2005. A Shortest Path Dependency Kernel for Relation Extraction. EMNLP-2005. • Charniak E. 2001. Immediate-head Parsing for Language Models. ACL-2001. • Collins M. and Duffy N. 2001. Convolution Kernels for Natural Language. NIPS-2001. • Collins M. and Duffy N. 2002. New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron. ACL-2002. • Culotta A. and Sorensen J. 2004. Dependency Tree Kernels for Relation Extraction. ACL-2004. • Joachims T. 1998. Text Categorization with Support Vector Machines: Learning with Many Relevant Features. ECML-1998. • Moschitti A. 2004. A Study on Convolution Kernels for Shallow Semantic Parsing. ACL-2004. • Qian, Longhua, Guodong Zhou, Qiaoming Zhu and Peide Qian. 2007. Relation Extraction using Convolution Tree Kernel Expanded with Entity Features. PACLIC21. • Zelenko D., Aone C. and Richardella A. 2003. Kernel Methods for Relation Extraction. Journal of Machine Learning Research. 2003(2): 1083-1106. • Zhang M., Zhang J., Su J. and Zhou G.D. 2006. A Composite Kernel to Extract Relations between Entities with both Flat and Structured Features. COLING-ACL-2006. • Zhao S.B. and Grishman R. 2005. Extracting Relations with Integrated Information Using Kernel Methods. ACL-2005. • Zhou G.D., Su J., Zhang J. and Zhang M. 2005. Exploring Various Knowledge in Relation Extraction. ACL-2005. • Zhou, Guodong, Min Zhang, Donghong Ji and Qiaoming Zhu. 2007. Tree Kernel-based Relation Extraction with Context-Sensitive Structured Parse Tree Information. EMNLP/CoNLL-2007.

End Thank You!
