Measuring Semantic Similarity between Words Using HowNet

ICCSIT 2008 • Liuling DAI, Yuning XIA, Bin LIU, ShiKun WU • School of Computer Science, Beijing Institute of Technology

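HowNet • W_C = 工夫 (a Chinese word meaning roughly "skill", "effort", or "time"; HowNet lists three concepts for it) • DEF = {Ability|能力:host={human|人}} • DEF = {Strength|力量:host={group|群體}{human|人}} • DEF = {time|時間} • Word: 工夫 • Concept: {Ability|能力:host={human|人}} • Sememe: Ability|能力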
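The sketch below makes the word → concept → sememe hierarchy concrete. It stores the three 工夫 DEFs from the slide in a toy dictionary and pulls out the sememe names with a regular expression. It assumes every sememe is written as English|Chinese. LEXICON, concepts_of and sememes_of are hypothetical names, not part of the HowNet API.

```python
import re

# Hypothetical toy lexicon: word -> list of concept definitions (DEFs), copied from the slide.
LEXICON = {
    "工夫": [
        "{Ability|能力:host={human|人}}",
        "{Strength|力量:host={group|群體}{human|人}}",
        "{time|時間}",
    ],
}

# Assumption: a sememe is written as English|Chinese; capture the English name.
SEMEME_PATTERN = re.compile(r"([A-Za-z]+)\|")


def concepts_of(word):
    """Return every concept (DEF string) listed for a word, or [] if the word is unknown."""
    return LEXICON.get(word, [])


def sememes_of(concept_def):
    """Return the sememe names in a DEF, in the order they appear."""
    return SEMEME_PATTERN.findall(concept_def)


for concept in concepts_of("工夫"):
    print(sememes_of(concept))
```

Role names such as `host` are skipped because they are followed by `=` rather than `|`.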

Algorithms • Similarity between sememes • Similarity between concepts • Similarity between words • Amendment with thesaurus

Similarity between sememes • Strategy 1 • Strategy 2 • d: distance between sememes S1 and S2 in the sememe hierarchy • h: depth of the first common parent node of the two sememes • α, β: parameters that adjust the influence of d and h
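The formulas for Strategy 1 and Strategy 2 did not survive in this transcript. The sketch below therefore only illustrates how d and h can be computed and combined.

It rests on three assumptions:
- a hypothetical toy child → parent tree (not HowNet's real sememe taxonomy);
- a root depth of 1;
- two generic forms, α/(d+α) and e^(−αd)·tanh(βh).

These forms are stand-ins chosen for illustration and are not necessarily the paper's Strategy 1 and 2. The α and β values in the usage lines are the ones listed on the parameter slide.

```python
import math

# Hypothetical toy sememe hierarchy (child -> parent); NOT the real HowNet taxonomy.
PARENT = {
    "thing": "entity",
    "physical": "thing",
    "animate": "physical",
    "inanimate": "physical",
    "human": "animate",
    "animal": "animate",
    "tool": "inanimate",
}


def path_to_root(sememe):
    """List the sememe followed by its ancestors up to the root."""
    path = [sememe]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def distance_and_depth(s1, s2):
    """Return (d, h): edges between s1 and s2, and depth of their first common parent (root = 1)."""
    up1, up2 = path_to_root(s1), path_to_root(s2)
    steps_from_s2 = {node: i for i, node in enumerate(up2)}
    for steps_from_s1, node in enumerate(up1):
        if node in steps_from_s2:
            d = steps_from_s1 + steps_from_s2[node]
            h = len(path_to_root(node))
            return d, h
    raise ValueError(f"{s1!r} and {s2!r} share no common parent")


def sim_distance_only(d, alpha):
    """Illustrative form alpha / (d + alpha): 1 when d = 0, decaying with distance."""
    return alpha / (d + alpha)


def sim_distance_depth(d, h, alpha, beta):
    """Illustrative form exp(-alpha*d) * tanh(beta*h): penalises distance, rewards deep parents."""
    return math.exp(-alpha * d) * math.tanh(beta * h)


d, h = distance_and_depth("human", "animal")  # (2, 4) in this toy tree
print(sim_distance_only(d, alpha=1.6))
print(sim_distance_depth(d, h, alpha=0.2, beta=0.16))
```

With both forms, identical or closely placed sememes score higher. The second form also favours pairs whose common parent lies deep in the tree, which is one way to let h counterbalance d.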

Similarity between concepts • Word "Doctor" • DEF={human|人:{own|有:possession={Status|身分:domain={education|教育},modifier={HighRank|高等:degree={most|最}}},possessor={~}}} • human → primary sememe • Status, own, … → modifying sememes • possession, domain, … → descriptors
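The Doctor example can be split mechanically into its three parts. The sketch below follows the slide's example in treating the first sememe as primary and every later sememe as modifying. It treats any role name directly followed by `=` as a descriptor. The function name `decompose` is hypothetical.

```python
import re

# Assumption: sememes are written English|Chinese; descriptors are role names followed by "=".
SEMEME = re.compile(r"([A-Za-z]+)\|")
DESCRIPTOR = re.compile(r"([A-Za-z]+)=")


def decompose(concept_def):
    """Split a DEF into (primary sememe, modifying sememes, descriptors)."""
    sememes = SEMEME.findall(concept_def)
    if not sememes:
        raise ValueError("DEF contains no sememe")
    primary, modifying = sememes[0], sememes[1:]
    return primary, modifying, DESCRIPTOR.findall(concept_def)


DOCTOR = ("{human|人:{own|有:possession={Status|身分:domain={education|教育},"
          "modifier={HighRank|高等:degree={most|最}}},possessor={~}}}")
primary, modifying, descriptors = decompose(DOCTOR)
print(primary)      # human
print(modifying)    # ['own', 'Status', 'education', 'HighRank', 'most']
print(descriptors)  # ['possession', 'domain', 'modifier', 'degree', 'possessor']
```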

Similarity between concepts • P, Q: two concepts; assume P has fewer modifying sememes than Q. • P_i, Q_j: the i-th modifying sememe of P and the j-th modifying sememe of Q • S, T: descriptor sets of P and Q • α, β, γ: weights of the three parts
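The equation that combines the three parts is not in the transcript. The sketch below assumes the similarity is a weighted sum with the slide's weights as defaults:

α · (primary-sememe similarity) + β · (average of each P_i's best match among the Q_j) + γ · (Jaccard overlap |S∩T| / |S∪T|)

Several pieces are hypothetical and exist only to make the sketch runnable:
- the `Concept` container;
- the `exact_match` placeholder for sememe similarity;
- the choice to score two empty parts as 1.0.

In practice `sememe_sim` would be one of the sememe-level strategies.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple

SememeSim = Callable[[str, str], float]


@dataclass(frozen=True)
class Concept:
    """A concept reduced to the three parts named on the slide (hypothetical container)."""
    primary: str
    modifying: Tuple[str, ...] = ()
    descriptors: FrozenSet[str] = frozenset()


def concept_similarity(p: Concept, q: Concept, sememe_sim: SememeSim,
                       alpha: float = 0.54, beta: float = 0.36, gamma: float = 0.1) -> float:
    """Assumed form: alpha*primary + beta*modifying + gamma*descriptor similarity."""
    # Let P be the concept with fewer modifying sememes, as the slide assumes.
    if len(p.modifying) > len(q.modifying):
        p, q = q, p
    primary_part = sememe_sim(p.primary, q.primary)
    # Match every P_i with its most similar Q_j and average the best scores.
    if p.modifying:
        modifying_part = sum(max(sememe_sim(pi, qj) for qj in q.modifying)
                             for pi in p.modifying) / len(p.modifying)
    else:
        modifying_part = 0.0 if q.modifying else 1.0  # assumption: two empty lists match fully
    # Descriptor part: Jaccard overlap of the descriptor sets S and T.
    union = p.descriptors | q.descriptors
    descriptor_part = len(p.descriptors & q.descriptors) / len(union) if union else 1.0
    return alpha * primary_part + beta * modifying_part + gamma * descriptor_part


def exact_match(a: str, b: str) -> float:
    """Placeholder sememe similarity: 1.0 for identical sememes, otherwise 0.0."""
    return 1.0 if a == b else 0.0


doctor = Concept("human", ("own", "Status", "education", "HighRank", "most"),
                 frozenset({"possession", "domain", "modifier", "degree", "possessor"}))
owner_of_status = Concept("human", ("own", "Status"), frozenset({"possession", "possessor"}))  # hypothetical
print(concept_similarity(doctor, owner_of_status, exact_match))  # about 0.94
```

Iterating over the shorter modifying list keeps the average bounded by 1 and gives the same score whichever order the two concepts are passed in.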

Similarity between words • A word may have several concepts. • The similarity of two words is taken from their most similar concept pair.
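The "most similar pair" rule is a maximum over the Cartesian product of the two words' concept lists. The sketch below takes the concept similarity as a parameter. The toy lists and the exact-match scorer are hypothetical stand-ins for real HowNet concepts and a real concept-level measure.

```python
from itertools import product
from typing import Callable, Sequence, TypeVar

C = TypeVar("C")


def word_similarity(concepts1: Sequence[C], concepts2: Sequence[C],
                    concept_sim: Callable[[C, C], float]) -> float:
    """Score every concept pair of the two words and keep the best one."""
    if not concepts1 or not concepts2:
        raise ValueError("both words need at least one concept")
    return max(concept_sim(c1, c2) for c1, c2 in product(concepts1, concepts2))


# Hypothetical example: each concept reduced to its primary sememe, compared by exact match.
gongfu = ["Ability", "Strength", "time"]  # primary sememes of the three 工夫 DEFs
other_word = ["time"]                     # hypothetical second word with a single concept
print(word_similarity(gongfu, other_word, lambda a, b: 1.0 if a == b else 0.0))  # 1.0
```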

Amendment with thesaurus • Some words are missing from HowNet, and some DEFs are too coarse. • Use the Chinese thesaurus TongyiciCilin (同義詞詞林), more precisely its extended edition (同義詞詞林擴展版) from the Information Retrieval Lab (IR-Lab) of Harbin Institute of Technology • d: distance between W1 and W2
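The transcript keeps only the definition of d, not Eq. 7. The sketch below assumes the extended TongyiciCilin code layout described here and assumes d is the path length between two codes in the five-level tree. Under that assumption, d follows from the number of leading levels the codes share.

Each entry carries an eight-character code such as "Aa01A01=". The first seven characters encode five levels:
- an uppercase letter;
- a lowercase letter;
- two digits;
- an uppercase letter;
- two digits.

A flag character follows. The codes in the usage lines are made up and are not real word entries.

```python
# Assumed layout of an extended-Cilin code such as "Aa01A01=": five levels plus a flag character.
LEVEL_SLICES = [slice(0, 1), slice(1, 2), slice(2, 4), slice(4, 5), slice(5, 7)]


def shared_levels(code1: str, code2: str) -> int:
    """Number of leading hierarchy levels two codes have in common (0 to 5)."""
    shared = 0
    for part in LEVEL_SLICES:
        if code1[part] != code2[part]:
            break
        shared += 1
    return shared


def cilin_distance(code1: str, code2: str) -> int:
    """Edges between two codes: up from each leaf group to their deepest shared level."""
    return 2 * (len(LEVEL_SLICES) - shared_levels(code1, code2))


# Hypothetical codes, used only to show the arithmetic.
print(cilin_distance("Aa01A01=", "Aa01A02="))  # 2: same fourth-level group
print(cilin_distance("Aa01A01=", "Aa01B01="))  # 4
print(cilin_distance("Aa01A01=", "Ba01A01="))  # 10: they differ at the top level
```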

Similarity between words • Sim1: Eq. 6 (similarity in HowNet) • Sim2: Eq. 7 (similarity in TongyiciCilin) • α, β, γ, η: parameters that scale the weights of the two parts
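The transcript keeps neither Eq. 6 nor Eq. 7. It also does not state when (α, β) rather than (γ, η) is applied. The sketch below therefore takes a single weight pair from the caller and assumes a linear combination.

Two parts are assumptions:
- the function name;
- the fall-back for a word missing from one resource, motivated by the note that some words are missing from HowNet.

The weight pairs in the usage lines are values listed on the parameter slide.

```python
from typing import Optional


def combined_similarity(sim1: Optional[float], sim2: Optional[float],
                        w1: float, w2: float) -> float:
    """Weighted sum of the HowNet score (sim1) and the TongyiciCilin score (sim2).

    Assumption: if one resource lacks the word (its score is None), use the other score alone.
    """
    if sim1 is None and sim2 is None:
        raise ValueError("word missing from both resources")
    if sim1 is None:
        return sim2
    if sim2 is None:
        return sim1
    return w1 * sim1 + w2 * sim2


print(combined_similarity(0.8, 0.6, w1=0.95, w2=0.05))
print(combined_similarity(0.8, 0.6, w1=0.45, w2=0.55))
print(combined_similarity(None, 0.6, w1=0.95, w2=0.05))  # 0.6
```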

Evaluation • Dataset • RG-65 • Rubenstein and Goodenough collected synonymy judgments for 65 pairs of nouns. They asked 51 human judges to give every pair a score between 0.0 and 4.0 indicating semantic similarity. • MC-30 • Miller and Charles followed the same approach but restricted themselves to 30 noun pairs selected from Rubenstein and Goodenough's list, divided equally among pairs of high, intermediate, and low similarity. • To measure similarity between Chinese words, RG-65 was manually translated into Chinese.

Evaluation • Parameters • Similarity between sememes • Strategy 1: α = 1.6, β = 0.16 • Strategy 2: α = 0.2, β = 0.16 • Similarity between concepts • α = 0.54, β = 0.36, γ = 0.1 • Similarity between words • On the Chinese dataset: α = 0.95, β = 0.05, γ = 0.95, η = 0.05 • On the English dataset: α = 0.95, β = 0.05, γ = 0.45, η = 0.55

Result • HAPI: the HowNet_Get_Concept_Similarity function of the HowNet API, used for comparison

Result • In addition, the results are compared with eight groups of WordNet-based measures. • Table 1. Correlation coefficients of the algorithms

RG-65

MC-30 & RG-30
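Table 1 reports correlation coefficients between each algorithm's scores and the human judgments. The transcript does not say which coefficient was used. The sketch below assumes Pearson's r. The score lists in the usage lines are made-up toy numbers, not data from the paper.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between human ratings and system similarity scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


human = [3.9, 3.1, 1.2, 0.4]       # toy ratings on the 0.0-4.0 scale
system = [0.95, 0.80, 0.30, 0.10]  # toy similarity scores in [0, 1]
print(round(pearson(human, system), 3))
```

Because r is invariant to linear rescaling, the 0.0–4.0 human scale and a [0, 1] similarity scale can be correlated directly.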
