Design and Evaluation of Semantic Similarity Measures for Concepts from the Same or Different Ontologies

Design and Evaluation of Semantic Similarity Measures for Concepts Stemming from the Same or Different Ontologies
Euripides G.M. Petrakis, Giannis Varelas, Angelos Hliaoutakis, Paraskevi Raftopoulou
WMS'06, Chania, Crete

Semantic Similarity
• Relates to computing the conceptual similarity between terms that are not necessarily lexically similar
  • “car”-“automobile”-“vehicle”
  • “drug”-“medicine”
• Tool for making knowledge commonly understandable in applications such as IR and information communication in general

Methodology
• Terms from different communicating sources are represented by ontologies
• Map two terms to an ontology and compute their relationship in that ontology
• Terms from different ontologies: discover linguistic relationships or affinities between terms in different ontologies

Contributions
• We investigate several semantic similarity methods and evaluate their performance
  • http://www.intelligence.tuc.gr/similarity
• We propose a novel semantic similarity measure for comparing concepts from different ontologies

Ontologies
• Tools of information representation on a subject
• Hierarchical categorization of terms from general to most specific terms
  • object → artifact → construction → stadium
• Domain ontologies represent knowledge of a domain
  • e.g., MeSH medical ontology
• General ontologies represent common sense knowledge about the world
  • e.g., WordNet

WordNet
• A vocabulary and a thesaurus offering a hierarchical categorization of natural language terms
• More than 100,000 terms
• Nouns, verbs, adjectives and adverbs are grouped into synonym sets (synsets)
• Synsets represent terms or concepts with similar meaning
  • stadium, bowl, arena, sports stadium – (a large structure for open-air sports or entertainments)

WordNet Hierarchies
• The synsets are also organized into senses
  • Senses: different meanings of the same term
• The synsets are related to other synsets higher or lower in the hierarchy by different types of relationships, e.g.
  • Hyponym/Hypernym (Is-A relationships)
  • Meronym/Holonym (Part-Of relationships)
• Nine noun and several verb Is-A hierarchies

[Figure: A Fragment of the WordNet Is-A Hierarchy]

MeSH
• MeSH: ontology for medical and biological terms by the U.S. National Library of Medicine (N.L.M.)
• Organized in Is-A hierarchies
• More than 15 taxonomies, more than 22,000 terms
• No Part-Of relationships
• The terms are organized into synsets called “entry terms”

[Figure: A Fragment of the MeSH Is-A Hierarchy]

Semantic Similarity Methods
• Map terms to an ontology and compute their relationship in that ontology
• Four main categories of methods:
  • Edge counting: path length between terms
  • Information content: as a function of their probability of occurrence in a corpus
  • Feature based: similarity between their properties (e.g., definitions) or based on their relationships to other similar terms
  • Hybrid: combine the above ideas
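As a rough illustration of the edge counting category, the sketch below measures the shortest path between two terms in a small Is-A tree. The tree is a toy built for this example (only the object → artifact → construction → stadium chain comes from the slides; the "instrumentality" node and its children are assumed), and the helper names `neighbours` and `path_length` are hypothetical.

```python
from collections import deque

# Toy Is-A edges (child -> parent). Assumed for illustration only;
# this is not a copy of the WordNet hierarchy.
IS_A = {
    "artifact": "object",
    "construction": "artifact",
    "stadium": "construction",
    "instrumentality": "artifact",
    "conveyance": "instrumentality",
    "ceramic": "instrumentality",
}


def neighbours(term):
    """Return the terms one Is-A edge away (parent and children)."""
    linked = {child for child, parent in IS_A.items() if parent == term}
    if term in IS_A:
        linked.add(IS_A[term])
    return linked


def path_length(a, b):
    """Breadth-first search for the number of edges between a and b."""
    queue, seen = deque([(a, 0)]), {a}
    while queue:
        term, dist = queue.popleft()
        if term == b:
            return dist
        for nxt in neighbours(term) - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None  # no path: the terms are not connected in this toy tree


print(path_length("conveyance", "ceramic"))
print(path_length("conveyance", "stadium"))
```

Plain path length is a distance, so a shorter path means more similar terms; published edge counting methods typically turn it into a similarity score and may also weight edges by depth.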

Example
• Edge counting distance between “conveyance” and “ceramic” is 2
• An information content method would associate the two terms with their common subsumer and with their probabilities of occurrence in a corpus
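The information content idea in this example can be sketched as follows, assuming a Resnik-style score (the negative log probability of the most specific common subsumer). The slides do not say which information content formula is meant, so that choice, the toy tree, and the corpus counts are all assumptions made for illustration.

```python
import math

# Toy Is-A edges (child -> parent); assumed, not taken from WordNet.
PARENT = {
    "artifact": "object",
    "construction": "artifact",
    "stadium": "construction",
    "instrumentality": "artifact",
    "conveyance": "instrumentality",
    "ceramic": "instrumentality",
}

# Hypothetical corpus counts for each term on its own.
COUNTS = {
    "object": 10, "artifact": 20, "construction": 15, "stadium": 5,
    "instrumentality": 25, "conveyance": 15, "ceramic": 10,
}


def ancestors(term):
    """Return the term followed by its subsumers up to the root."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def probability(concept):
    """Share of corpus occurrences on the concept or anything below it."""
    total = sum(COUNTS.values())
    covered = sum(n for t, n in COUNTS.items() if concept in ancestors(t))
    return covered / total


def common_subsumer(a, b):
    """Most specific concept that subsumes both terms."""
    above_b = set(ancestors(b))
    return next(c for c in ancestors(a) if c in above_b)


def ic_similarity(a, b):
    """Resnik-style score: information content of the common subsumer."""
    return -math.log(probability(common_subsumer(a, b)))


print(common_subsumer("conveyance", "ceramic"))
print(ic_similarity("conveyance", "ceramic"))
print(ic_similarity("conveyance", "stadium"))
```

A rarer (more specific) common subsumer carries more information, so the pair sharing "instrumentality" scores higher than the pair that only meets at "artifact".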

X-Similarity
• Relies on matching between synsets and term description sets
• A, B: synsets or term description sets
• Do the same with all Is-A and Part-Of relationships and take their maximum
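The slide's matching formula did not survive in this transcript, so the following is only a sketch of set-based matching under stated assumptions: the match between two sets A and B is taken to be |A ∩ B| / |A ∪ B|, the per-relationship matches are combined with a maximum as the slide says, and the final rule (1 if the synsets overlap, otherwise the larger of the neighbourhood and description matches) is assumed. The function names and the sample terms are hypothetical.

```python
def overlap(a, b):
    """Assumed set match: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighbourhood_overlap(rel_a, rel_b):
    """Maximum overlap over the relationship types both terms have."""
    shared = rel_a.keys() & rel_b.keys()
    return max((overlap(rel_a[r], rel_b[r]) for r in shared), default=0.0)


def x_similarity_sketch(term_a, term_b):
    """Assumed combination: 1 on any synset match, else the best other match."""
    if overlap(term_a["synset"], term_b["synset"]) > 0:
        return 1.0
    return max(
        neighbourhood_overlap(term_a["relations"], term_b["relations"]),
        overlap(term_a["description"], term_b["description"]),
    )


# Hypothetical term records; a real system would read these from the ontologies.
stadium = {
    "synset": {"stadium", "arena"},
    "description": {"large", "structure", "open-air", "sports"},
    "relations": {"is-a": {"construction", "structure"}, "part-of": {"field"}},
}
gymnasium = {
    "synset": {"gymnasium", "gym"},
    "description": {"building", "indoor", "sports"},
    "relations": {"is-a": {"construction", "building"}},
}

print(x_similarity_sketch(stadium, gymnasium))
```

Because the sketch only compares sets of words, it needs no shared hierarchy, which is what makes this style of matching usable across two different ontologies.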

Example
• S(Hypothyroidism, Hyperthyroidism) = 0.387

Evaluation
• The most popular methods are evaluated
• All methods are applied to a set of 38 term pairs
• Their similarity values are correlated with scores obtained from humans
• The higher the correlation, the better the method
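The evaluation procedure can be sketched as below, assuming Pearson correlation (the slides do not name the correlation coefficient). All scores are made-up placeholders rather than results from the study, and `pearson` is a hypothetical helper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Placeholder scores for four term pairs (not data from the study).
human = [3.9, 3.5, 0.5, 0.1]        # human similarity judgements
method_a = [0.95, 0.80, 0.20, 0.05]  # output of one hypothetical method
method_b = [0.40, 0.90, 0.70, 0.10]  # output of another hypothetical method

for name, scores in [("method_a", method_a), ("method_b", method_b)]:
    print(name, round(pearson(human, scores), 3))
```

The method whose scores track the human judgements more closely gets the higher coefficient and is ranked as the better method.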

Evaluation on WordNet

Evaluation on MeSH

Cross Ontology Measures
• We used 40 MeSH term pairs
• One of the terms is also a WordNet term
• We measured correlation with scores obtained from experts

Comments
• Edge counting and information content methods work by exploiting structure information
• Good methods take the position of the terms into account
  • Higher similarity for terms which are close together but lower in the hierarchy, e.g., [Li et al. 2003]
• X-Similarity performs at least as well as other feature-based methods
• Outperforms other cross-ontology methods

Conclusions
• Semantic similarity methods approximate the human notion of similarity, reaching correlation of up to 83%
• Cross ontology similarity is a difficult problem that requires further investigation
• Work towards integrating semantic similarity within the IntelliSearch Information Retrieval System for Web documents
  • http://www.intelligence.tuc.gr/intellisearch

Try our system on the Web
http://www.intelligence.tuc.gr/similarity
Implementation: Giannis Varelas, Spyros Argyropoulos

