
A Study of Hybrid Similarity Measures for Semantic Relation Extraction

Presenter: Bei-YI Jiang. Authors: Université catholique de Louvain, Belgium. 2012, Association for Computing Machinery.


Presentation Transcript


  1. A Study of Hybrid Similarity Measures for Semantic Relation Extraction • Presenter: Bei-YI Jiang • Authors: Université catholique de Louvain, Belgium • 2012, Association for Computing Machinery

  2. Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments

  3. Motivation • The quality of the relations provided by existing extractors is still lower than the quality of manually constructed relations. • Most studies still do not take the whole range of existing measures into account, and combine different methods only sporadically.

  4. Objectives • To develop new relation extraction methods. • The method is a systematic analysis of 16 baseline measures and their combinations, using 8 fusion methods and 3 techniques for selecting the combination set.

  5. Methodology • norm function • similarity scores • knn function
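The three components named on this slide can be sketched as follows. This is only an illustration under stated assumptions: the slide does not define the norm function or the knn function, so min-max rescaling and a per-term top-k cut-off are assumed here, and the names `normalize` and `knn_relations` as well as the toy scores are hypothetical.

```python
def normalize(sim):
    """Min-max rescale a square similarity matrix (list of lists) to [0, 1].

    Assumption: the slide's "norm function" is not specified; min-max is one option.
    """
    values = [v for row in sim for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        return [[0.0 for _ in row] for row in sim]
    return [[(v - lo) / (hi - lo) for v in row] for row in sim]


def knn_relations(terms, sim, k):
    """For each term, keep its k most similar other terms as candidate relations."""
    relations = set()
    for i, term in enumerate(terms):
        neighbours = [(sim[i][j], other) for j, other in enumerate(terms) if j != i]
        neighbours.sort(key=lambda pair: pair[0], reverse=True)
        for _, other in neighbours[:k]:
            relations.add((term, other))
    return relations


# Toy similarity scores, invented for illustration only.
terms = ["car", "vehicle", "banana"]
scores = [[5.0, 4.0, 1.0],
          [4.0, 5.0, 2.0],
          [1.0, 2.0, 5.0]]
relations = knn_relations(terms, normalize(scores), k=1)
```

With k = 1, every term contributes exactly one relation, so the number of extracted pairs grows linearly with k and with the vocabulary size.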

  6. Methodology-Single Similarity Measures • Measures Based on a Semantic Network(5) • exploit the lengths of the shortest paths between terms in a network • probability of terms derived from a corpus • Wu and Palmer, Leacock and Chodorow, Resnik, Jiang and Conrath, and Lin
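As an illustration of a path-based network measure, the Wu and Palmer score can be sketched on a toy taxonomy. The taxonomy below is hypothetical (the slide does not show the network that was used), and depth is counted in nodes with the root at depth 1, which is an assumed convention.

```python
def path_to_root(node, parent):
    """Return the chain from a node up to the root (node first, root last)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def wu_palmer(a, b, parent):
    """Wu-Palmer similarity: 2 * depth(lcs) / (depth(a) + depth(b))."""
    path_a = path_to_root(a, parent)
    path_b = path_to_root(b, parent)
    ancestors_a = set(path_a)
    # Walking upward from b, the first node shared with a is the deepest common ancestor.
    lcs = next(n for n in path_b if n in ancestors_a)
    depth_lcs = len(path_to_root(lcs, parent))
    return 2 * depth_lcs / (len(path_a) + len(path_b))


# Hypothetical toy taxonomy: child -> parent.
toy_parent = {
    "car": "vehicle",
    "truck": "vehicle",
    "vehicle": "entity",
    "banana": "fruit",
    "fruit": "entity",
}
```

Here `wu_palmer("car", "truck", toy_parent)` is higher than `wu_palmer("car", "banana", toy_parent)` because the first pair shares a deeper common ancestor.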

  7. Methodology-Single Similarity Measures • Web-based Measures(3) • Web search engines • rely on the number of times the terms co-occur in the documents • Normalized Google Distance(NGD) • Measures of Semantic Relatedness(MSR) • YAHOO!, BING, GOOGLE over the domain wikipedia.org
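The Normalized Google Distance named above is computed from page counts. A minimal sketch, assuming the hit counts have already been obtained from a search engine (no search API is called here, and the argument names are illustrative):

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: number of pages containing each term
    fxy:    number of pages containing both terms
    n:      total number of indexed pages (assumed larger than fx and fy)
    """
    if fx == 0 or fy == 0 or fxy == 0:
        # Terms that never co-occur are treated as infinitely distant.
        return float("inf")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))
```

NGD is a distance, so smaller values mean more closely related terms; it has to be inverted or otherwise converted before it can be used as a similarity score.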

  8. Methodology-Single Similarity Measures • Corpus-based Measures(5) • Distributional Measures • Bag-of-words Distributional Analysis(BDA) • Syntactic Distributional Analysis(SDA) • Pattern-based Measure • PatternWiki • Other Corpus-based Measures • Latent Semantic Analysis(LSA) • Normalized Google Distance(NGD)

  9. Methodology-Single Similarity Measures • Definition-based Measures(3) • WktWiki • Gloss Vectors • Extended Lesk

  10. Methodology- Hybrid Similarity Measures • Combination Methods • Input: a set of similarity matrices {S1, . . . , SK} produced by K single measures • Output: a combined similarity matrix Scmb • 1. Mean • 2. Mean-Nnz • 3. Mean-Zscore • 4. Median • 5. Max • 6. Rank Fusion • 7. Relation Fusion • 8. Logit

  11. Methodology- Hybrid Similarity Measures • Combination Methods • Mean. A mean of K pairwise similarity scores. • Mean-Nnz. A mean of those pairwise similarity scores which have a non-zero value.
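Read from the verbal definitions on this slide, the two methods can be sketched as follows. The function names are hypothetical, matrices are plain lists of lists, and returning 0 when every measure scores a pair as zero is an assumption.

```python
def mean_fusion(matrices):
    """Mean: average the K scores of every term pair."""
    k = len(matrices)
    n = len(matrices[0])
    return [[sum(m[i][j] for m in matrices) / k for j in range(n)]
            for i in range(n)]


def mean_nnz_fusion(matrices):
    """Mean-Nnz: average only the non-zero scores of every term pair."""
    n = len(matrices[0])
    combined = []
    for i in range(n):
        row = []
        for j in range(n):
            nonzero = [m[i][j] for m in matrices if m[i][j] != 0]
            # Assumption: a pair with no non-zero score gets 0.
            row.append(sum(nonzero) / len(nonzero) if nonzero else 0.0)
        combined.append(row)
    return combined


# Toy matrices: the second measure has no score for the off-diagonal pair.
s1 = [[1.0, 0.5], [0.5, 1.0]]
s2 = [[1.0, 0.0], [0.0, 1.0]]
```

On these toy inputs Mean gives the off-diagonal pair 0.25, while Mean-Nnz gives 0.5, so a measure that lacks coverage for a pair does not drag the combined score down.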

  12. Methodology- Hybrid Similarity Measures • Combination Methods • Mean-Zscore. A mean of K similarity scores transformed into Z-scores. • Median. A median of K pairwise similarities.
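A minimal sketch of these two methods, based only on the verbal definitions above. It assumes that each measure's Z-scores are computed over all entries of its own matrix using the population standard deviation; the slide does not state this, and the function names are hypothetical.

```python
import statistics


def zscore(matrix):
    """Transform all entries of one similarity matrix into Z-scores."""
    values = [v for row in matrix for v in row]
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # Assumption: a constant matrix carries no information, so map it to 0.
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - mu) / sigma for v in row] for row in matrix]


def mean_zscore_fusion(matrices):
    """Mean-Zscore: average the Z-transformed scores of every term pair."""
    z = [zscore(m) for m in matrices]
    k, n = len(z), len(z[0])
    return [[sum(m[i][j] for m in z) / k for j in range(n)] for i in range(n)]


def median_fusion(matrices):
    """Median: take the median of the K scores of every term pair."""
    n = len(matrices[0])
    return [[statistics.median([m[i][j] for m in matrices]) for j in range(n)]
            for i in range(n)]
```

The Z-transform puts measures with different score ranges on a common scale before averaging, so a measure that scores on 0..10 does not dominate one that scores on 0..1.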

  13. Methodology- Hybrid Similarity Measures • Combination Methods • Max. A maximum of K pairwise similarities. • Rank Fusion.

  14. Methodology- Hybrid Similarity Measures • Combination Methods • Relation Fusion. • Logit.

  15. Methodology- Hybrid Similarity Measures • Combination Sets • Expert choice of measures • Forward stepwise procedure • Logistic regression
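A forward stepwise procedure is commonly implemented as a greedy search: start from an empty set and repeatedly add the measure that most improves an evaluation score, stopping when no addition helps. The sketch below follows that generic recipe; the stopping rule, the `score` callback, and the toy gain values are assumptions for illustration, not figures from the study.

```python
def forward_stepwise(measures, score):
    """Greedily grow a combination set while the evaluation score improves.

    measures: list of measure names
    score:    callable taking a list of names and returning a number to maximize
    """
    selected = []
    best = float("-inf")
    remaining = list(measures)
    while remaining:
        candidates = [(score(selected + [m]), m) for m in remaining]
        top_score, top_measure = max(candidates, key=lambda pair: pair[0])
        if top_score <= best:
            break  # no remaining measure improves the current set
        selected.append(top_measure)
        remaining.remove(top_measure)
        best = top_score
    return selected, best


def toy_score(subset):
    """Made-up additive gains, used only to exercise the procedure."""
    gains = {"BDA": 0.5, "PatternWiki": 0.25, "NGD": -0.125}
    return sum(gains[m] for m in subset)


chosen, chosen_score = forward_stepwise(["NGD", "PatternWiki", "BDA"], toy_score)
```

In practice `score` would fuse the selected measures with one of the combination methods and evaluate the result on a held-out dataset; the greedy search needs on the order of K² evaluations instead of the 2^K needed to try every subset.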

  16. Experiments • Evaluation • Human Judgements Datasets. • MC, RG, WordSim353 • Semantic Relations Datasets. • BLESS, SN

  17. Experiments

  18. Experiments

  19. Conclusions • The results have shown that the hybrid measures outperform the single measures on all datasets. • A combination of 15 baseline corpus-, web-, network-, and dictionary-based measures with Logistic Regression provided the best results.

  20. Comments • Advantages • higher performance • Applications
