Design and Evaluation of Semantic Similarity Measures for Concepts Stemming from the Same or Different Ontologies
Euripides G.M. Petrakis, Giannis Varelas, Angelos Hliaoutakis, Paraskevi Raftopoulou
WMS'06, Chania, Crete
Semantic Similarity
• Relates to computing the conceptual similarity between terms that are not necessarily lexically similar
  • “car” - “automobile” - “vehicle”
  • “drug” - “medicine”
• A tool for making knowledge commonly understandable in applications such as information retrieval (IR) and information communication in general
Methodology
• Terms from different communicating sources are represented by ontologies
• Map two terms to an ontology and compute their relationship in that ontology
• Terms from different ontologies: discover linguistic relationships or affinities between terms in different ontologies
Contributions
• We investigate several semantic similarity methods and evaluate their performance
  • http://www.intelligence.tuc.gr/similarity
• We propose a novel semantic similarity measure for comparing concepts from different ontologies
Ontologies
• Tools of information representation on a subject
• Hierarchical categorization of terms, from the most general to the most specific
  • object → artifact → construction → stadium
• Domain ontologies represent knowledge of a domain
  • e.g., the MeSH medical ontology
• General ontologies represent common-sense knowledge about the world
  • e.g., WordNet
WordNet
• A vocabulary and a thesaurus offering a hierarchical categorization of natural language terms
• More than 100,000 terms
• Nouns, verbs, adjectives and adverbs are grouped into synonym sets (synsets)
• Synsets represent terms or concepts with similar meaning
  • stadium, bowl, arena, sports stadium – (a large structure for open-air sports or entertainments)
WordNet Hierarchies
• The synsets are also organized into senses
  • Senses: different meanings of the same term
• The synsets are related to other synsets higher or lower in the hierarchy by different types of relationships, e.g.,
  • Hyponym/Hypernym (Is-A relationships)
  • Meronym/Holonym (Part-Of relationships)
• Nine noun and several verb Is-A hierarchies
A Fragment of the WordNet Is-A Hierarchy
MeSH
• MeSH (Medical Subject Headings): an ontology for medical and biological terms by the U.S. National Library of Medicine (NLM)
• Organized in Is-A hierarchies
• More than 15 taxonomies, more than 22,000 terms
• No Part-Of relationships
• The terms are organized into synsets called “entry terms”
A Fragment of the MeSH Is-A Hierarchy
Semantic Similarity Methods
• Map terms to an ontology and compute their relationship in that ontology
• Four main categories of methods:
  • Edge counting: path length between terms
  • Information content: a function of the terms' probability of occurrence in a corpus
  • Feature based: similarity between the terms' properties (e.g., definitions) or based on their relationships to other similar terms
  • Hybrid: combine the above ideas
Example
• The edge counting distance between “conveyance” and “ceramic” is 2
• An information content method would associate the two terms with their common subsumer and with their probabilities of occurrence in a corpus
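The two ideas in this example can be sketched on a toy taxonomy. This is a minimal illustration, not the evaluated implementation: the `PARENT` tree and the `PROB` corpus probabilities are hypothetical stand-ins for WordNet and a real corpus, and the information content function follows a Resnik-style definition (the information content of the common subsumer), chosen here only as one representative of that category.

```python
import math

# Hypothetical toy Is-A taxonomy (child -> parent); not the real WordNet graph.
PARENT = {
    "artifact": "object",
    "instrumentality": "artifact",
    "conveyance": "instrumentality",
    "ceramic": "instrumentality",
    "vehicle": "conveyance",
}

# Hypothetical probabilities of occurrence in a corpus (assumed values).
PROB = {
    "object": 1.0,
    "artifact": 0.5,
    "instrumentality": 0.2,
    "conveyance": 0.05,
    "ceramic": 0.01,
    "vehicle": 0.03,
}


def ancestors(term):
    """Return the chain [term, parent, ..., root]."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def common_subsumer(a, b):
    """Most specific concept that subsumes both terms."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    raise ValueError("terms share no ancestor")


def edge_distance(a, b):
    """Edge counting: number of Is-A edges on the path through the subsumer."""
    lcs = common_subsumer(a, b)
    return ancestors(a).index(lcs) + ancestors(b).index(lcs)


def information_content(term):
    """Rarer concepts carry more information: IC = -log p(term)."""
    return -math.log(PROB[term])


def resnik_style_similarity(a, b):
    """Information content of the common subsumer of the two terms."""
    return information_content(common_subsumer(a, b))


print(edge_distance("conveyance", "ceramic"))            # path length
print(common_subsumer("conveyance", "ceramic"))          # shared ancestor
print(resnik_style_similarity("conveyance", "ceramic"))  # IC of that ancestor
```

In this toy tree the two terms meet at “instrumentality”, one edge above each of them, so the edge counting distance comes out as 2, matching the example.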
X-Similarity
• Relies on matching between synsets and term description sets
• A, B: synsets or term description sets
• Do the same with all Is-A and Part-Of relationships and take their maximum
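The set-matching idea can be sketched as follows. The matching formula itself is not reproduced in this transcript, so the sketch assumes a Jaccard-style overlap |A ∩ B| / |A ∪ B|; that choice, the function names and the sample sets are all hypothetical and only illustrate "match two sets, repeat per relationship type, take the maximum".

```python
def overlap(a, b):
    """Assumed matching function: Jaccard overlap |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def neighborhood_similarity(neigh_a, neigh_b):
    """Match the neighbor sets per relationship type and keep the maximum.

    neigh_a and neigh_b map a relationship name (e.g. "is_a", "part_of")
    to the set of terms reached through that relationship.
    """
    shared_types = set(neigh_a) & set(neigh_b)
    return max(
        (overlap(neigh_a[rel], neigh_b[rel]) for rel in shared_types),
        default=0.0,
    )


# Hypothetical neighborhoods of two concepts from different ontologies.
concept_a = {
    "is_a": {"thyroid diseases", "endocrine system diseases"},
    "part_of": {"thyroid gland"},
}
concept_b = {
    "is_a": {"thyroid diseases"},
    "part_of": {"neck"},
}

print(overlap({"car", "automobile"}, {"automobile", "motorcar"}))
print(neighborhood_similarity(concept_a, concept_b))
```

The same `overlap` function would apply unchanged to synsets or to term description sets, since both are treated as plain sets of words.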
Example
• S(Hypothyroidism, Hyperthyroidism) = 0.387
Evaluation
• The most popular methods are evaluated
• All methods are applied to a set of 38 term pairs
• Their similarity values are correlated with scores obtained from humans
• The higher a method's correlation, the better the method
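The evaluation protocol can be sketched as a correlation between a method's scores and human ratings over the same term pairs. The slide does not name the correlation coefficient, so Pearson correlation is assumed here, and the numbers below are made-up placeholders rather than data from the experiments.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (norm_x * norm_y)


# Placeholder scores for four term pairs (hypothetical, for illustration only).
human_scores = [0.9, 0.7, 0.3, 0.1]
method_scores = [0.8, 0.6, 0.4, 0.2]

print(pearson(human_scores, method_scores))
```

A method whose scores rise and fall with the human ratings yields a coefficient close to 1, which is the sense in which a higher correlation indicates a better method.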
Evaluation on WordNet
Evaluation on MeSH
Cross Ontology Measures
• We used 40 MeSH term pairs
• One of the terms in each pair is also a WordNet term
• We measured correlation with scores obtained from experts
Comments
• Edge counting and information content methods work by exploiting structure information
• Good methods take the position of the terms into account
  • Higher similarity for terms which are close together but lower in the hierarchy, e.g., [Li et al. 2003]
• X-Similarity performs at least as well as other feature-based methods
  • Outperforms other cross-ontology methods
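The effect of position in the hierarchy can be sketched with a measure in the style of [Li et al. 2003], which combines the path length l between two terms with the depth h of their common subsumer as e^(-αl) · tanh(βh). The formula and the default values α = 0.2 and β = 0.6 are stated here as an assumption about that measure, not as something taken from these slides.

```python
import math


def li_style_similarity(path_length, subsumer_depth, alpha=0.2, beta=0.6):
    """Similarity that decays with path length and grows with subsumer depth.

    path_length: number of edges between the two terms.
    subsumer_depth: depth of their common subsumer below the root.
    alpha, beta: assumed weighting parameters.
    """
    return math.exp(-alpha * path_length) * math.tanh(beta * subsumer_depth)


# Same path length, different depth: the deeper pair scores higher.
shallow_pair = li_style_similarity(path_length=2, subsumer_depth=1)
deep_pair = li_style_similarity(path_length=2, subsumer_depth=5)
print(shallow_pair, deep_pair)

# Same depth, different path length: the closer pair scores higher.
close_pair = li_style_similarity(path_length=1, subsumer_depth=3)
far_pair = li_style_similarity(path_length=4, subsumer_depth=3)
print(close_pair, far_pair)
```

Two pairs at the same edge distance thus receive different scores depending on how specific their shared ancestor is, which plain edge counting cannot express.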
Conclusions
• Semantic similarity methods approximate the human notion of similarity, reaching correlations of up to 83%
• Cross-ontology similarity is a difficult problem that requires further investigation
• Work towards integrating semantic similarity within the IntelliSearch information retrieval system for Web documents
  • http://www.intelligence.tuc.gr/intellisearch
Try our system on the Web
http://www.intelligence.tuc.gr/similarity
Implementation: Giannis Varelas, Spyros Argyropoulos