
  1. A Graph-based Approach to Named Entity Categorization in Wikipedia Using Conditional Random Fields. Yotaro Watanabe, Masayuki Asahara and Yuji Matsumoto, Nara Institute of Science and Technology. EMNLP-CoNLL 2007, 29th June, Prague, Czech Republic

  2. Background • Named Entity • Proper nouns (e.g. Shinzo Abe (Person), Prague (Location)), time/date expressions (e.g. June 29 (Date)) and numerical expressions (e.g. 10%) • In many NLP applications (e.g. IE, QA), Named Entities play an important role • Named Entity Recognition task (NER) • Treated as a sequential tagging problem • Machine learning methods have been proposed • Recall is usually low • A large-scale NE dictionary is useful for NER, so semi-automatic methods to compile NE dictionaries are in demand

  3. Resource for NE dictionary construction • Wikipedia • Multi-lingual encyclopedia on the Web • 382,613 gloss articles (as of June 20, 2007, Japanese) • Gloss indices are composed of nouns or proper nouns • HTML (semi-structured text) • Lists (<LI>) and tables (<TABLE>) can be used as clues for NE type categorization • Linked articles are referenced by anchor texts within articles • Each article has one or more categories • Wikipedia has useful information for NE categorization and can be considered a suitable resource

  4. Objective • Extract Named Entities by assigning proper NE labels to the gloss indices of Wikipedia (example NE labels shown on the slide: Person, Location, Natural Object, Organization, Product)

  5. Use of Wikipedia features • Features of Wikipedia articles • Anchors of an article refer to other related articles • Anchors in list elements have dependencies on each other • => Make 3 assumptions about dependencies between anchors • An example of a list structure: Burt Bacharach…composer / Dillard & Clark / Carpenters / (nested) Karen Carpenter • Assumption 1: the latter element in a list item tends to be in an attribute relation to the former element (Burt Bacharach = PERSON, composer = VOCATION) • Assumption 2: elements in the same itemization tend to be in the same NE category (Dillard & Clark and Carpenters are both ORGANIZATION) • Assumption 3: a nested element tends to be in a part-of relation to the upper element (Karen Carpenter = PERSON, nested under Carpenters)

  6. Overview of our approach • Focus on HTML list structure in Wikipedia • Make 3 assumptions about dependencies between anchors • Formalize the NE categorization problem as labeling NE classes to anchors in lists • Define 3 kinds of cliques (edges: Sibling, Cousin and Relative) between anchors based on the 3 assumptions • Construct graphs based on the 3 defined cliques • CRFs for NE categorization in Wikipedia • Define potential functions over the 3 edge types (and nodes) to provide a conditional distribution over the graphs • Estimate the MAP label assignment over the graphs using Conditional Random Fields

  7. Conditional Random Fields (CRFs) • Conditional Random Fields [Lafferty 2001] • Discriminative, undirected models • Define the conditional distribution p(y|x) (figure on the slide: label nodes y1, y2, y3, …, yn conditioned on the observation x) • Features • Arbitrary features can be used • Globally optimized over all possible label assignments • Can deal with label dependencies by defining potential functions for cliques (2 or more nodes)
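
For reference, the conditional distribution that a CRF defines over a graph with clique set C can be written out as below; this is the standard formulation from Lafferty et al. (2001), not notation taken from the slides.

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})} \prod_{c \in C} \Phi_c(\mathbf{y}_c, \mathbf{x}),
\qquad
Z(\mathbf{x}) = \sum_{\mathbf{y}'} \prod_{c \in C} \Phi_c(\mathbf{y}'_c, \mathbf{x}),
\qquad
\Phi_c(\mathbf{y}_c, \mathbf{x}) = \exp\Bigl( \textstyle\sum_k \lambda_k f_k(\mathbf{y}_c, \mathbf{x}) \Bigr)
```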

  8. Use of dependencies for categorization • The NE categorization problem as labeling classes to anchors • Each edge of the constructed graphs corresponds to a particular dependency • Estimate the MAP label assignment over the constructed graphs using Conditional Random Fields • Our formulation can also extract anchors without gloss articles (the slide marks, in the example list Dillard & Clark…country rock / Carpenters / Karen Carpenter, which anchors have an existing article and which do not)

  9. Clique definition based on HTML tree structure • The example list Dillard & Clark…country rock / Carpenters / Karen Carpenter (nested) corresponds to an HTML tree of <UL>, <LI> and <A> nodes • Sibling: the latter element tends to be an attribute or a concept of the former element • Cousin: the elements tend to have a common NE category (e.g. ORGANIZATION) • Relative: the latter element tends to be a constituent part of the former element • Use these 3 relations as cliques of CRFs
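
A minimal sketch of how these three edge types could be read off a parsed list. The nested-tuple representation and the function below are illustrative assumptions for this transcript, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): given a parsed <UL>/<LI> tree,
# collect Sibling, Cousin and Relative edges between anchor texts.
# Each list item is modelled as (anchors, children): `anchors` holds the <A>
# texts inside the <LI>, `children` holds the items of a nested <UL>.

def extract_edges(items, edges=None):
    """items: list of (anchors, children) tuples for one <UL> level."""
    if edges is None:
        edges = {"sibling": [], "cousin": [], "relative": []}

    first_anchors = []          # first anchor of every <LI> at this level
    for anchors, children in items:
        # Sibling: consecutive anchors inside the same <LI>
        for a, b in zip(anchors, anchors[1:]):
            edges["sibling"].append((a, b))
        if anchors:
            first_anchors.append(anchors[0])
            # Relative: the <LI> anchor and the first anchor of each nested <LI>
            for child_anchors, _ in children:
                if child_anchors:
                    edges["relative"].append((anchors[0], child_anchors[0]))
        extract_edges(children, edges)   # recurse into the nested <UL>

    # Cousin: first anchors of <LI> elements sharing the same <UL>
    for a, b in zip(first_anchors, first_anchors[1:]):
        edges["cousin"].append((a, b))
    return edges

# The example list from the slide:
tree = [
    (["Dillard & Clark", "country rock"], []),
    (["Carpenters"], [(["Karen Carpenter"], [])]),
]
print(extract_edges(tree))
```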

  10. A graph constructed from the 3 clique definitions • Example itemization from the slide: Burt Bacharach…”On my own”…1986 / Dillard & Clark / Gene Clark / Carpenters…”As Time Goes By”…2000 / Karen Carpenter • Sibling (S): the latter element tends to be an attribute or a concept of the former element • Cousin (C): the elements tend to have a common attribute (e.g. ORGANIZATION) • Relative (R): the latter element tends to be a constituent part of the former element • Estimate the MAP label assignment over the graph

  11. Model • A potential function for each Sibling, Cousin and Relative clique, and a potential function for each node • Constructed graphs include cycles: exact inference is computationally expensive • -> Introduce Tree-based Reparameterization (TRP) [Wainwright 2003] for approximate inference
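
Reading the model slide literally, the distribution factorizes into one potential per node and one potential per Sibling, Cousin and Relative edge. Written out below; the symbols φ, ψ_S, ψ_C, ψ_R and the edge sets E_S, E_C, E_R are my notation inferred from the slide, not taken from the paper.

```latex
p(\mathbf{y} \mid \mathbf{x})
  = \frac{1}{Z(\mathbf{x})}
    \prod_{i \in V} \phi(y_i, \mathbf{x})
    \prod_{(i,j) \in E_S} \psi_S(y_i, y_j, \mathbf{x})
    \prod_{(i,j) \in E_C} \psi_C(y_i, y_j, \mathbf{x})
    \prod_{(i,j) \in E_R} \psi_R(y_i, y_j, \mathbf{x})
```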

  12. Experiments • The aims of the experiments are: • Compare the graph-based approach (relational) to the node-wise approach (independent) to investigate how relational classification improves classification accuracy • Investigate the effect of the defined cliques • Compare the CRF models to baseline models based on SVMs • Show the effectiveness of using marginal probability for filtering NE candidates

  13. Dataset • Randomly sampled 2,300 articles (Japanese version as of October 2005) • Anchors in list elements (<LI>) are hand-annotated with NE class labels • We used the Extended Named Entity Hierarchy (Sekine et al. 2002) • We reduced the number of classes to 13 from the original 200+ in order to avoid data sparseness • Classification targets: 16,136 anchors (14,285 of these are NEs)

  14. Experiments (CRFs) • To investigate which clique type contributes to classification accuracy: • We construct models that consist of all possible combinations of the defined cliques • 8 models (SCR, SC, SR, CR, S, C, R, I) • Classification is performed on each connected subgraph
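
A small sketch of what "possible combinations of defined cliques" and "classification on each connected subgraph" could look like in code. The edge-list representation and helper functions are assumptions for illustration, not the paper's implementation.

```python
# Build each model variant (SCR, SC, SR, CR, S, C, R, I) by keeping only edges
# of the selected clique types, then split the anchors into connected
# subgraphs so inference can run on each component separately.
from itertools import combinations

def build_model(edge_lists, kept_types):
    """edge_lists: dict like {'S': [...], 'C': [...], 'R': [...]} of anchor pairs."""
    return [e for t in kept_types for e in edge_lists[t]]

def connected_components(nodes, edges):
    parent = {n: n for n in nodes}
    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]   # path halving
            n = parent[n]
        return n
    for a, b in edges:
        parent[find(a)] = find(b)
    comps = {}
    for n in nodes:
        comps.setdefault(find(n), []).append(n)
    return list(comps.values())

# The 8 models from the slide: every subset of {S, C, R}; the empty subset is I.
all_models = [''.join(s) or 'I'
              for r in range(3, -1, -1)
              for s in combinations('SCR', r)]

edge_lists = {"S": [("Dillard & Clark", "country rock")],
              "C": [("Dillard & Clark", "Carpenters")],
              "R": [("Carpenters", "Karen Carpenter")]}
nodes = ["Dillard & Clark", "country rock", "Carpenters", "Karen Carpenter"]
for model in all_models:
    kept = [] if model == "I" else list(model)
    comps = connected_components(nodes, build_model(edge_lists, kept))
    print(model, "->", len(comps), "connected subgraph(s)")
```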

  15. Experimental settings (baseline) and evaluation • Baseline: Support Vector Machines (SVMs) [Vapnik 1998] • We use two baseline models: • I model: each anchor text is classified independently • P model: anchor texts are ordered by their linear position in the HTML, and history-based classification is performed (the (j-1)-th classification result is used in the j-th classification) • For multi-class classification: one-versus-rest • Evaluation: 5-fold cross validation, measured by F1-value
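
A toy sketch of the I-model baseline, assuming scikit-learn; the features below are invented stand-ins, since the paper's actual features and implementation are not shown on the slide.

```python
# I model: each anchor classified independently with a one-versus-rest SVM,
# evaluated by 5-fold cross validation and (micro-averaged) F1.
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Invented anchor features and labels, alternating PERSON / ORGANIZATION.
feats = [{"suffix": "taro" if i % 2 == 0 else "kai", "position": i}
         for i in range(10)]
labels = ["PERSON" if i % 2 == 0 else "ORGANIZATION" for i in range(10)]

X = DictVectorizer().fit_transform(feats)
clf = OneVsRestClassifier(LinearSVC())
scores = cross_val_score(clf, X, labels, cv=5, scoring="f1_micro")
print("mean F1:", scores.mean())

# The P model (not sketched here) orders anchors by their position in the
# HTML and adds the previous anchor's predicted label as an extra feature.
```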

  16. Results (F1-value) • (figure on the slide: F1 values of the SVM baselines (I and P models) and the CRF models (SCR, SC, SR, CR, S, C, R, I)) • ALL: whole dataset, no article: anchors without articles

  17. Results (F1-value) • 1. Graph-based vs. node-wise • Performed a McNemar paired test on labeling disagreements => the difference was significant (p < 0.01) • ALL: whole dataset, no article: anchors without articles

  18. Results (F1-value) • 2. Which clique contributed most? => the Cousin clique • Cousin cliques provided the highest accuracy improvement compared to the Sibling and Relative cliques • ALL: whole dataset, no article: anchors without articles

  19. Results (F1-value) • 3. CRFs vs. SVMs • Significance test: McNemar paired test on labeling disagreements • ALL: whole dataset, no article: anchors without articles

  20. Filtering NE candidates using marginal probability • Construct dictionaries from the extracted NE candidates • Methods with lower cost are desirable • Extract only confident NE candidates -> use the marginal probability provided by CRFs • Marginal probability: the probability of a particular label assignment for a node yi • This can be regarded as the “confidence” of the classifier
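
Written out, the marginal probability of label l at node i sums the joint distribution over all complete assignments that agree on that node (my notation, following the slide's description):

```latex
p(y_i = l \mid \mathbf{x}) \;=\; \sum_{\mathbf{y} \,:\, y_i = l} p(\mathbf{y} \mid \mathbf{x})
```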

  21. Precision-Recall curve • Precision-Recall curve obtained by thresholding the marginal probability of the MAP estimation in the CR model of CRFs • At the operating point marked on the slide, the recall value is about 0.57 and the precision value is about 0.97 • With a proper threshold on the marginal probability, an NE dictionary can be constructed at lower cost
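
A toy sketch of how one precision-recall point can be computed for a given marginal-probability threshold; the numbers are invented and are not the paper's results.

```python
def precision_recall_at(threshold, candidates):
    """candidates: (marginal probability of the MAP label, whether that label is correct).
    precision = fraction of kept candidates that are correct;
    recall    = fraction of all correct candidates that are kept."""
    kept = [correct for prob, correct in candidates if prob >= threshold]
    total_correct = sum(correct for _, correct in candidates)
    if not kept or total_correct == 0:
        return 0.0, 0.0
    return sum(kept) / len(kept), sum(kept) / total_correct

# Invented toy candidates (marginal probability, correct?).
toy = [(0.99, True), (0.95, True), (0.80, False), (0.60, True), (0.40, False)]
for t in (0.5, 0.9):
    p, r = precision_recall_at(t, toy)
    print(f"threshold={t}: precision={p:.2f} recall={r:.2f}")
```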

  22. Summary and future work • Summary • Proposed a method for categorizing NEs in Wikipedia • Defined 3 kinds of cliques (Sibling, Cousin and Relative) over the HTML tree • The graph-based model achieved significant improvements compared to the node-wise model and the baseline methods (SVMs) • NEs can be extracted at lower cost by exploiting marginal probability

  23. Summary and future work • Future work • Use fine-grained NE classes • For many NLP applications (e.g. QA, IE), an NE dictionary with a fine-grained label set will be a useful resource • Classification with statistical methods becomes difficult when the label set is large, because of insufficient positive examples • Incorporate the hierarchical structure of the label set into our models (hierarchical classification) • Previous work suggests that exploiting the hierarchical structure of label sets improves classification accuracy

  24. Thank you.
