
Measuring Semantic Similarity between Words Using HowNet

Measuring Semantic Similarity between Words Using HowNet. ICCSIT 2008. Liuling DAI, Yuning XIA, Bin LIU, ShiKun WU, School of Computer Science, Beijing Institute of Technology. HowNet. W_C=工夫 DEF={Ability|能力:host={human|人}} DEF={Strength|力量:host={group|群體}{human|人}}


Presentation Transcript


  1. Measuring Semantic Similarity between Words Using HowNet • ICCSIT 2008 • Liuling DAI, Yuning XIA, Bin LIU, ShiKun WU • School of Computer Science, Beijing Institute of Technology

  2. HowNet • W_C=工夫 • DEF={Ability|能力:host={human|人}} • DEF={Strength|力量:host={group|群體}{human|人}} • DEF={time|時間} • Word : 工夫 • Concept : {Ability|能力:host={human|人}} • Sememe : Ability|能力

  3. Algorithms • Similarity between sememes • Similarity between concepts • Similarity between words • Amendment with thesaurus

  4. Similarity between sememes • Strategy 1 • Strategy 2 • d : Distance between S1 and S2 • h : Depth of the first common parent node of the two sememes • α , β : Parameters to adjust d,h
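The two strategy formulas did not survive in this transcript, so only their inputs can be illustrated. The following is a minimal sketch of computing d and h, assuming a toy sememe tree: the `PARENT` map and its node names are hypothetical and are not HowNet's real hierarchy, and counting the root as depth 0 is also an assumption.

```python
# Hypothetical toy sememe tree (child -> parent); not HowNet's real taxonomy.
PARENT = {
    "entity": None,
    "thing": "entity",
    "attribute": "entity",
    "animate": "thing",
    "human": "animate",
    "animal": "animate",
    "ability": "attribute",
}


def path_to_root(node):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def distance_and_depth(s1, s2):
    """Return (d, h) for two sememes.

    d: number of edges on the path between s1 and s2.
    h: depth of their first common parent node (root depth assumed to be 0).
    """
    path1 = path_to_root(s1)
    steps_from_s2 = {node: i for i, node in enumerate(path_to_root(s2))}
    for steps_from_s1, node in enumerate(path1):
        if node in steps_from_s2:
            d = steps_from_s1 + steps_from_s2[node]
            h = len(path_to_root(node)) - 1
            return d, h
    raise ValueError("sememes are not in the same tree")


print(distance_and_depth("human", "animal"))   # (2, 2)
print(distance_and_depth("human", "ability"))  # (5, 0)
```

Either strategy would then map (d, h) to a score using α and β; that mapping is left out here because the slide's equations are not available.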

  5. Similarity between concepts • Word “Doctor” • DEF={human|人:{own|有:possession={Status|身分:domain={education|教育},modifier={HighRank|高等:degree={most|最}}},possessor={~}}} • human → Primary sememe • Status, own, … → Modifying sememes • possession, domain, … → Descriptors
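The three roles in the “Doctor” DEF can be pulled apart mechanically. The sketch below is an illustration only: the regular expressions and the `split_def` helper are assumptions made for this example, not part of HowNet or its API, and they rely on each sememe appearing as `{English|中文` and each descriptor as `name={`.

```python
import re

# DEF body for "Doctor", copied from the slide (without the leading "DEF=").
DOCTOR_DEF = (
    "{human|人:{own|有:possession={Status|身分:domain={education|教育},"
    "modifier={HighRank|高等:degree={most|最}}},possessor={~}}}"
)


def split_def(def_body):
    """Split a DEF body into primary sememe, modifying sememes, descriptors.

    Hypothetical helper: treats the first sememe as primary and every later
    sememe as modifying, which matches the slide's example.
    """
    sememes = re.findall(r"\{([A-Za-z]+)\|", def_body)
    descriptors = re.findall(r"([A-Za-z]+)=\{", def_body)
    if not sememes:
        raise ValueError("no sememe found in DEF body")
    return sememes[0], sememes[1:], descriptors


primary, modifying, descriptors = split_def(DOCTOR_DEF)
print(primary)      # human
print(modifying)    # ['own', 'Status', 'education', 'HighRank', 'most']
print(descriptors)  # ['possession', 'domain', 'modifier', 'degree', 'possessor']
```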

  6. Similarity between concepts • P, Q : Two concepts. Assume P has fewer modifying sememes than Q. • P_i, Q_j : The i-th and j-th modifying sememes of P and Q. • S, T : Descriptor sets of P and Q • α, β, γ : Weights of the 3 parts

  7. Similarity between words • One word may have many concepts. • Choose the most similar pair of concepts.
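The "most similar pair" rule amounts to a maximum over all concept pairs. Here is a minimal sketch, assuming concepts are represented by their primary sememe only and using a placeholder `toy_concept_sim`; both are simplifications for illustration, not the concept similarity described in the slides.

```python
from itertools import product


def word_similarity(concepts1, concepts2, concept_sim):
    """Return the highest similarity over all pairs of concepts.

    concepts1, concepts2: the concepts (senses) of the two words.
    concept_sim: any function scoring two concepts, e.g. in [0, 1].
    """
    return max(
        (concept_sim(p, q) for p, q in product(concepts1, concepts2)),
        default=0.0,  # a word with no known concept gets similarity 0
    )


def toy_concept_sim(p, q):
    """Placeholder scorer: 1.0 for identical concepts, 0.2 otherwise."""
    return 1.0 if p == q else 0.2


# 工夫 has three concepts on the HowNet slide; the second word is hypothetical.
gongfu = ["Ability", "Strength", "time"]
other_word = ["time", "space"]

print(word_similarity(gongfu, other_word, toy_concept_sim))  # 1.0
```

Taking the maximum means that a single shared sense is enough to make two ambiguous words similar, regardless of how different their other senses are.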

  8. Amendment with thesaurus • Some words are missing from HowNet, and some DEFs are too coarse. • Use the Chinese thesaurus TongyiciCilin (同義詞詞林); this should be the extended edition (同義詞詞林擴展版) from the Information Retrieval Lab (IR-Lab) at Harbin Institute of Technology. • d : Distance between W1 and W2

  9. Similarity between words • Sim1 : Eq. 6 (Similarity in HowNet) • Sim2 : Eq. 7 (Similarity in TongyiciCilin) • α,β,γ,η : Parameters to scale the weights of the two parts.

  10. Evaluation • Dataset • RG-65 • Rubenstein and Goodenough established synonymy judgments for 65 pairs of nouns. They invited 51 human judges to assign every pair a score between 0.0 and 4.0 to indicate semantic similarity. • MC-28 • Miller and Charles followed this idea and restricted themselves to 30 pairs of nouns selected from Rubenstein and Goodenough’s list, divided equally among words with high, intermediate, and low similarity. • To measure similarity between Chinese words, RG-65 was translated into Chinese manually.

  11. Evaluation • Parameters • Similarity between sememes • Strategy 1 : α = 1.6, β = 0.16 • Strategy 2 : α = 0.2, β = 0.16 • Similarity between concepts • α = 0.54, β = 0.36, γ = 0.1 • Similarity between words • On Chinese dataset : α = 0.95, β = 0.05, γ = 0.95, η = 0.05 • On English dataset : α = 0.95, β = 0.05, γ = 0.45, η = 0.55

  12. Result • HAPI : HowNet_Get_Concept_Similarity in HowNet API

  13. Result • In addition, the authors compare their results with eight groups of measures that rely on WordNet. • Table 1. Correlation coefficients of the algorithms
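The transcript does not say which correlation coefficient Table 1 reports. As a sketch of how such an evaluation is typically scored, the code below assumes Pearson's r between human judgments and algorithm scores; the four score pairs are made-up placeholders, not values from RG-65 or from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Placeholder data: human ratings on the 0.0-4.0 scale and system scores.
human_scores = [3.9, 3.1, 1.2, 0.4]
system_scores = [0.95, 0.70, 0.35, 0.10]

print(round(pearson(human_scores, system_scores), 3))
```

Because the correlation is invariant to linear rescaling, the 0.0 to 4.0 human scale and a 0 to 1 system scale can be compared directly.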

  14. RG-65

  15. MC-30 & RG-30
