This study explores the influence of automatic named-entity recognition on relation extraction, presenting a novel approach for extracting relations between named entities from natural language documents. The effectiveness of kernel methods and the impact of noise are evaluated through experiments.
Relation extraction and the influence of automatic named-entity recognition
Presenter: Shao-Wei Cheng
Authors: Claudio Giuliano, Alberto Lavelli, and Lorenza Romano
TSLP 2007
Outline
• Motivation
• Objective
• Methodology
  • Named-entity recognition
  • Kernel Methods for Relation Extraction
• Experiments
• Conclusion
• Personal Comments
Motivation
• Information extraction aims to extract structured information from unstructured or semi-structured text documents.
• NER performance is far from perfect, and its influence on relation-extraction performance is still under investigation.
• Named Entity Recognition
• Relation Extraction
Objectives
• The authors present an approach for extracting relations between named entities in natural-language documents.
• They evaluate the effect of automatic named-entity recognition on this novel approach to relation extraction.
• Relation Extraction
• Named Entity Recognition
• An example is labeled 1 if the relation holds, and -1 otherwise.
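The 1 / -1 labeling can be sketched as follows. This is a minimal sketch, assuming that every ordered pair of distinct entities in a sentence is a candidate example; the slide does not say how candidates are generated. The function name `label_candidate_pairs` and the toy data are hypothetical.

```python
from itertools import permutations


def label_candidate_pairs(entities, gold_relations):
    """Return (arg1, arg2, label) for every ordered pair of distinct entities.

    Assumption: a candidate is any ordered entity pair; the label is 1 when
    the pair appears in the gold relations and -1 otherwise.
    """
    examples = []
    for arg1, arg2 in permutations(entities, 2):
        label = 1 if (arg1, arg2) in gold_relations else -1
        examples.append((arg1, arg2, label))
    return examples


# Toy data, made up for illustration only.
entities = ["e1", "e2", "e3"]
gold_relations = {("e1", "e2")}
examples = label_candidate_pairs(entities, gold_relations)
```

With three entities this yields six ordered candidates, of which only the gold pair is a positive example, which illustrates why negative examples usually dominate.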
Methodology
• Named Entity Recognition
  • Method: CRFs, as provided in MALLET.
  • Processing: each token is described by
    • (a) the word itself,
    • (b) the PoS tag of the token,
    • (c) orthographic predicates,
    • (d) gazetteers of locations, people names and organizations,
    • (e) character n-gram predicates for 2 ≤ n ≤ 3.
  • MO: Correct entities.
  • MC: Entity boundaries are known, but their classification is not.
  • MR&C: Neither entity boundaries nor classification is known.
“The [New Deal]LOC describes the program of US president Franklin [D. Roosevelt]PER”
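The token features (a) to (e) can be sketched in plain Python. This is a minimal sketch, not the MALLET pipeline: the slide does not list the orthographic predicates, so the three used here (initial capital, all capitals, contains a digit) are assumptions, and the names `char_ngrams` and `token_features` are hypothetical.

```python
def char_ngrams(word, n_min=2, n_max=3):
    """Character n-grams of `word` for n_min <= n <= n_max."""
    return [
        word[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(word) - n + 1)
    ]


def token_features(word, pos_tag, gazetteers):
    """Build a set of string features for one token.

    `gazetteers` maps a list name (e.g. "location") to a set of entries.
    """
    feats = {"word=" + word, "pos=" + pos_tag}  # (a) and (b)
    # (c) orthographic predicates (assumed examples)
    if word[:1].isupper():
        feats.add("orth=init_cap")
    if word.isupper():
        feats.add("orth=all_caps")
    if any(ch.isdigit() for ch in word):
        feats.add("orth=has_digit")
    # (d) gazetteer membership
    for name, entries in gazetteers.items():
        if word in entries:
            feats.add("gaz=" + name)
    # (e) character n-grams, 2 <= n <= 3
    feats.update("ngram=" + g for g in char_ngrams(word))
    return feats


features = token_features("Deal", "NNP", {"location": {"Paris"}})
```

A sequence labeler such as a CRF would receive one such feature set per token, in sentence order.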
Methodology
• Relation Extraction
  • Method: SVM.
  • Kernel methods:
    • KGC: Global Context Kernel
    • KLC: Local Context Kernel
    • KSL: Shallow Linguistic Kernel
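The slide names the three kernels but not their definitions. The following is a minimal sketch under stated assumptions: the global-context kernel compares the bag of words between the two entities, the local-context kernel compares a small window around each entity, and KSL is taken to be the sum of the two normalized kernels. These definitions and all function names are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from math import sqrt


def dot(a, b):
    """Dot product of two sparse count vectors."""
    return sum(v * b[k] for k, v in a.items() if k in b)


def normalized_kernel(a, b):
    """Cosine-normalized linear kernel; 0.0 if either vector is empty."""
    denom = sqrt(dot(a, a) * dot(b, b))
    return dot(a, b) / denom if denom else 0.0


def global_context(tokens, i, j):
    """Assumed global context: bag of words strictly between the entities."""
    return Counter(t.lower() for t in tokens[i + 1:j])


def local_context(tokens, i, j, window=1):
    """Assumed local context: words in a window around each entity,
    tagged with the entity role and the relative offset."""
    feats = Counter()
    for role, pos in (("e1", i), ("e2", j)):
        for off in range(-window, window + 1):
            if 0 <= pos + off < len(tokens):
                feats[(role, off, tokens[pos + off].lower())] += 1
    return feats


def k_sl(x, y):
    """Assumed combination: K_SL = K_GC + K_LC on normalized kernels.

    Each example is (tokens, index of entity 1, index of entity 2).
    """
    (tx, ix, jx), (ty, iy, jy) = x, y
    k_gc = normalized_kernel(global_context(tx, ix, jx),
                             global_context(ty, iy, jy))
    k_lc = normalized_kernel(local_context(tx, ix, jx),
                             local_context(ty, iy, jy))
    return k_gc + k_lc


x = ("Roosevelt was president of the US".split(), 0, 5)
y = ("Lincoln was president of the US".split(), 0, 5)
gram = [[k_sl(a, b) for b in (x, y)] for a in (x, y)]
```

A Gram matrix built this way can be handed to any SVM implementation that accepts precomputed kernels. The sum of two valid kernels is itself a valid kernel, which is what makes this kind of combination safe.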
Experiments
• Dataset
  • Taken from the work of Roth and Yih.
• Evaluation
  • Cross-validation: Precision, Recall and F-measure.
  • Statistical significance: approximate randomization.
  • Confidence interval: percentile bootstrap.
• The effectiveness of the kernel method.
• The influence of noise.
• Comparison of this approach with the method proposed by Roth and Yih.
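The three evaluation tools listed above can be sketched with the standard library alone. This is a minimal sketch, assuming labels in {1, -1}, balanced F-measure (F1), a two-sided approximate randomization test on the F1 difference, and a 95% percentile bootstrap. The trial counts, seeds, and function names are assumptions, not the settings used in the paper.

```python
import random


def prf(gold, pred):
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == -1 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == -1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def approximate_randomization(gold, pred_a, pred_b, trials=1000, seed=0):
    """Estimate a p-value for the F1 difference between two systems by
    randomly swapping their outputs on each example."""
    rng = random.Random(seed)
    observed = abs(prf(gold, pred_a)[2] - prf(gold, pred_b)[2])
    at_least = 0
    for _ in range(trials):
        shuf_a, shuf_b = [], []
        for a, b in zip(pred_a, pred_b):
            if rng.random() < 0.5:
                a, b = b, a
            shuf_a.append(a)
            shuf_b.append(b)
        diff = abs(prf(gold, shuf_a)[2] - prf(gold, shuf_b)[2])
        if diff >= observed:
            at_least += 1
    return (at_least + 1) / (trials + 1)


def percentile_bootstrap_ci(gold, pred, resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for F1."""
    rng = random.Random(seed)
    n = len(gold)
    scores = []
    for _ in range(resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(prf([gold[i] for i in idx], [pred[i] for i in idx])[2])
    scores.sort()
    low = scores[int((alpha / 2) * resamples)]
    high = scores[min(resamples - 1, int((1 - alpha / 2) * resamples))]
    return low, high


# Toy labels, made up for illustration only.
gold = [1, 1, -1, -1, 1]
pred = [1, -1, 1, -1, 1]
precision, recall, f1 = prf(gold, pred)
```

Both resampling procedures work directly on the per-example predictions, so they make no distributional assumption about the F-measure.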
Experiments
• The effectiveness of the kernel method.
  • Relation extraction trained and tested on the correct entities.
  • Testing on MC
    • Training on the correct entities.
    • * Training on MC.
  • Testing on MR&C
    • Training on the correct entities.
    • * Training on MR&C.
Experiments
• The influence of noise.
  • [Figure panels: training on MO; training on MR&C]
Experiments
• Comparison of this approach with the method proposed by Roth and Yih.
  • The entities are correctly identified.
  • The entity boundaries are known.
Conclusion
• The method has already demonstrated state-of-the-art performance in extracting protein-protein interactions from biomedical literature.
• The experiments show that, when applied to the newswire domain, the combined kernel is still consistently superior to its basic parts, mainly in terms of precision, and that it significantly outperforms previously proposed approaches even in the presence of noise introduced by an automatic entity tagger.
• Future work
  • Evaluate the contribution of syntactic information to relation extraction.
  • Extend the proposed methodology to a different and wider set of relations.
  • Investigate the possibility of reducing the size of the training set using unsupervised techniques.
Personal Comments
• Advantage
  • …
• Drawback
  • …
• Application
  • Relation extraction
  • Named-entity recognition