A Study of Hybrid Similarity Measures for Semantic Relation Extraction. Presenter: Bei-Yi Jiang. Authors: Université catholique de Louvain, Belgium. 2012, Association for Computing Machinery.
Outline • Motivation • Objectives • Methodology • Experiments • Conclusions • Comments
Motivation • The quality of the relations provided by existing extractors is still lower than the quality of manually constructed relations. • Most studies still do not take the whole range of existing measures into account; they mostly combine a few different methods sporadically.
Objectives • To develop new relation extraction methods. • The method is based on a systematic analysis of 16 baseline measures and their combinations, using 8 fusion methods and 3 techniques for selecting the combination set.
Methodology • norm function • similarity scores • knn function
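The three bullets above can be read as a pipeline: normalize a similarity matrix, then keep each term's nearest neighbours as extracted relations. The following is a minimal sketch under stated assumptions: min-max normalization and a fixed k are illustrative choices that the slide does not specify, and the function names `norm` and `knn` only mirror the bullets.

```python
def norm(S):
    """Rescale all scores of a similarity matrix to [0, 1].

    ASSUMPTION: min-max scaling; the slide does not say which scheme is used.
    """
    flat = [x for row in S for x in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        return [[0.0 for _ in row] for row in S]
    return [[(x - lo) / (hi - lo) for x in row] for row in S]


def knn(S, terms, k):
    """Return (term_i, term_j) pairs where term_j is one of the k terms most similar to term_i."""
    relations = set()
    for i, row in enumerate(S):
        # Rank every other term by its similarity to term i, highest first.
        neighbours = sorted(
            (j for j in range(len(row)) if j != i),
            key=lambda j: row[j],
            reverse=True,
        )
        for j in neighbours[:k]:
            relations.add((terms[i], terms[j]))
    return relations


# Toy data, invented for illustration only.
terms = ["car", "vehicle", "banana"]
S = [
    [10.0, 8.0, 1.0],
    [8.0, 10.0, 2.0],
    [1.0, 2.0, 10.0],
]
relations = knn(norm(S), terms, k=1)
```

With k = 1 every term contributes exactly one relation, including a weak one for "banana"; a score threshold instead of a fixed k would be another reasonable design.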
Methodology-Single Similarity Measures • Measures Based on a Semantic Network (5) • exploit the lengths of the shortest paths between terms in a network • probability of terms derived from a corpus • Wu and Palmer, Leacock and Chodorow, Resnik, Jiang and Conrath, and Lin
Methodology-Single Similarity Measures • Web-based Measures(3) • Web search engines • rely on the number of times the terms co-occur in the documents • Normalized Google Distance(NGD) • Measures of Semantic Relatedness(MSR) • YAHOO!, BING, GOOGLE over the domain wikipedia.org
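As an illustration of how a co-occurrence count becomes a score, here is a small sketch of the standard Normalized Google Distance formula, NGD(x, y) = (max(log f(x), log f(y)) − log f(x, y)) / (log N − min(log f(x), log f(y))). The function name `ngd` and its parameter names are hypothetical, and the hit counts would have to come from a search engine; none are taken from the slides.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from document counts.

    fx, fy: number of documents containing each term
    fxy:    number of documents containing both terms
    n:      total number of indexed documents (assumed larger than fx and fy)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Terms that never (co-)occur are treated as maximally distant.
        return float("inf")
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))
```

NGD is a distance, so smaller values mean more closely related terms; it has to be inverted or negated before it can sit alongside similarity scores.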
Methodology-Single Similarity Measures • Corpus-based Measures(5) • Distributional Measures • Bag-of-words Distributional Analysis(BDA) • Syntactic Distributional Analysis(SDA) • Pattern-based Measure • PatternWiki • Other Corpus-based Measures • Latent Semantic Analysis(LSA) • Normalized Google Distance(NGD)
Methodology-Single Similarity Measures • Definition-based Measures(3) • WktWiki • Gloss Vectors • Extended Lesk
Methodology- Hybrid Similarity Measures • Combination Methods • Input: a set of similarity matrices {S_1, ..., S_K} produced by K single measures • Output: a combined similarity matrix S_cmb • 1. Mean • 2. Mean-Nnz • 3. Mean-Zscore • 4. Median • 5. Max • 6. Rank Fusion • 7. Relation Fusion • 8. Logit
Methodology- Hybrid Similarity Measures • Combination Methods • Mean. A mean of K pairwise similarity scores. • Mean-Nnz. A mean of those pairwise similarity scores which have a non-zero value.
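The two definitions above can be sketched directly on matrices stored as nested lists. This is a minimal illustration, not the authors' implementation; the function names and the toy matrices are assumptions, and all K matrices are assumed to have the same shape.

```python
def combine_mean(matrices):
    """Element-wise mean of K equally sized similarity matrices."""
    k = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(m[i][j] for m in matrices) / k for j in range(cols)]
        for i in range(rows)
    ]


def combine_mean_nnz(matrices):
    """Element-wise mean over only those measures whose score is non-zero."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            nonzero = [m[i][j] for m in matrices if m[i][j] != 0]
            # If every measure gives zero, the combined score stays zero.
            row.append(sum(nonzero) / len(nonzero) if nonzero else 0.0)
        out.append(row)
    return out


# Toy matrices, invented for illustration: the second measure has no score
# for the off-diagonal pair.
S1 = [[1.0, 0.6], [0.6, 1.0]]
S2 = [[1.0, 0.0], [0.0, 1.0]]
mean = combine_mean([S1, S2])
mean_nnz = combine_mean_nnz([S1, S2])
```

In this toy case the plain mean halves the off-diagonal score to 0.3, while Mean-Nnz keeps it at 0.6, which shows why the variant matters when some measures cover only part of the vocabulary.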
Methodology- Hybrid Similarity Measures • Combination Methods • Mean-Zscore. A mean of K similarity scores transformed into Z-scores. • Median. A median of K pairwise similarities.
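A sketch of these two methods, assuming that each measure is standardized with the mean and population standard deviation of all scores in its own matrix (the slide does not say which statistics are used). Function names and toy matrices are illustrative.

```python
import statistics


def zscore(S):
    """Standardize one similarity matrix: (score - mean) / standard deviation.

    ASSUMPTION: mean and population std are taken over all entries of S.
    """
    flat = [x for row in S for x in row]
    mu = statistics.mean(flat)
    sigma = statistics.pstdev(flat)
    if sigma == 0:
        return [[0.0 for _ in row] for row in S]
    return [[(x - mu) / sigma for x in row] for row in S]


def combine_mean_zscore(matrices):
    """Element-wise mean of the Z-score-transformed matrices."""
    z = [zscore(m) for m in matrices]
    k = len(z)
    rows, cols = len(z[0]), len(z[0][0])
    return [
        [sum(m[i][j] for m in z) / k for j in range(cols)]
        for i in range(rows)
    ]


def combine_median(matrices):
    """Element-wise median of K similarity matrices."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [statistics.median(m[i][j] for m in matrices) for j in range(cols)]
        for i in range(rows)
    ]


# Toy matrices on very different scales, invented for illustration.
S1 = [[1.0, 0.2], [0.2, 1.0]]
S2 = [[10.0, 2.0], [2.0, 10.0]]
combined = combine_mean_zscore([S1, S2])
```

The Z-score step puts measures with different ranges on a common scale before averaging, so `S2` does not dominate `S1` simply because its raw scores are ten times larger.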
Methodology- Hybrid Similarity Measures • Combination Methods • Max. A maximum of K pairwise similarities. • Rank Fusion.
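Max is fully defined above and is easy to sketch. Rank Fusion is only named on the slide, so the second function below is one plausible reading, labeled as an assumption: each measure's scores are converted to per-row ranks (1 = most similar) and the ranks are averaged. All function names are hypothetical.

```python
def combine_max(matrices):
    """Element-wise maximum of K similarity matrices."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [
        [max(m[i][j] for m in matrices) for j in range(cols)]
        for i in range(rows)
    ]


def row_ranks(row):
    """Rank positions within one row: 1 for the highest score.

    Ties are broken by column order because Python's sort is stable.
    """
    order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
    ranks = [0] * len(row)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks


def combine_mean_rank(matrices):
    """ASSUMPTION: rank fusion read as the mean per-row rank across measures.

    Lower values mean more similar, unlike the other combination methods.
    """
    ranked = [[row_ranks(row) for row in m] for m in matrices]
    k = len(ranked)
    rows, cols = len(ranked[0]), len(ranked[0][0])
    return [
        [sum(m[i][j] for m in ranked) / k for j in range(cols)]
        for i in range(rows)
    ]


# Single-row toy matrices, invented for illustration.
S1 = [[1.0, 0.9, 0.1]]
S2 = [[1.0, 0.2, 0.3]]
best = combine_max([S1, S2])
mean_rank = combine_mean_rank([S1, S2])
```

Working on ranks rather than raw scores makes the combination insensitive to the scale of each measure, at the cost of discarding how large the score differences are.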
Methodology- Hybrid Similarity Measures • Combination Methods • Relation Fusion. • Logit.
Methodology- Hybrid Similarity Measures • Combination Sets • Expert choice of measures • Forward stepwise procedure • Logistic regression
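A forward stepwise procedure is typically a greedy search: start with no measures and repeatedly add the one that most improves an evaluation score, stopping when nothing helps. The sketch below assumes this generic form; the `evaluate` callback, the stopping rule, and the toy scores are hypothetical and are not taken from the study.

```python
def forward_stepwise(candidates, evaluate):
    """Greedily select a subset of measures.

    candidates: list of measure names
    evaluate:   hypothetical callback mapping a list of names to a score
                (higher is better), e.g. performance of their combination
    """
    selected = []
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining:
        # Try adding each remaining measure and keep the best extension.
        scored = [(evaluate(selected + [c]), c) for c in remaining]
        score, choice = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            break  # no candidate improves the current set
        selected.append(choice)
        remaining.remove(choice)
        best_score = score
    return selected, best_score


# Toy scoring, invented for illustration: each measure adds a fixed amount.
toy_gain = {"m1": 0.5, "m2": 0.3, "m3": -0.1}


def toy_evaluate(names):
    return sum(toy_gain[n] for n in names)


selected, score = forward_stepwise(["m1", "m2", "m3"], toy_evaluate)
```

Here `m3` is left out because adding it lowers the score. A greedy search evaluates far fewer subsets than trying every combination of 16 measures, but it is not guaranteed to find the best subset.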
Experiments • Evaluation • Human Judgements Datasets. • MC, RG, WordSim353 • Semantic Relations Datasets. • BLESS, SN
Conclusions • The results have shown that the hybrid measures outperform the single measures on all datasets. • A combination of 15 baseline corpus-, web-, network-, and dictionary-based measures with Logistic Regression provided the best results.
Comments • Advantages • higher performance • Applications