Evaluating a Generalization of the Winkler Extension in the Context of Ontology Mapping Maurice Hermans
Outline • Ontologies • Ontology Mapping • Research Question • String Similarities • Winkler Extension • Proposed Extension • Evaluation • Results • Conclusion Bachelor Conference
Ontologies • Provide a vocabulary of terms that describe a domain of interest • There are several ways in which ontologies can differ: • Encoding • Lexical • Syntactic • Semantic • Semiotic
Ontology Mapping • Knowledge systems used in the same domain can be built according to different specifications and requirements • This makes it very hard to exchange data between knowledge systems that do not use the same ontology • Ontology mapping frameworks give knowledge systems the capacity to exchange information with other knowledge systems that use different ontologies
Research Question To what extent can string similarities, applied to concept names, be improved so that they are better suited for ontology mapping?
String Similarities • Levenshtein • Uses the number of edit operations required to convert one string into another • Jaro • Uses the number of matching characters between two strings and their relative positions • Jaccard • Compares the sets of tokens of two strings • SoftTFIDF • Includes tokens that are similar according to a secondary similarity function
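A minimal sketch of two of the measures listed above, to make the definitions concrete. The normalisation of the edit distance into a similarity and the whitespace tokenisation used for Jaccard are assumptions for illustration; the slides do not say which variants were used.

```python
def levenshtein(s: str, t: str) -> int:
    """Number of single-character insertions, deletions and substitutions
    needed to turn s into t (classic dynamic programme, row by row)."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            cost = 0 if a == b else 1
            curr.append(min(prev[j] + 1,           # delete a from s
                            curr[j - 1] + 1,       # insert b into s
                            prev[j - 1] + cost))   # substitute, or keep a match
        prev = curr
    return prev[-1]


def levenshtein_similarity(s: str, t: str) -> float:
    """Edit distance rescaled to [0, 1]; this normalisation is an assumption."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - levenshtein(s, t) / longest


def jaccard(s: str, t: str) -> float:
    """Overlap of the token sets of s and t (tokens split on whitespace,
    lower-cased; the tokenisation is an assumption)."""
    a, b = set(s.lower().split()), set(t.lower().split())
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

For concept names such as "conference paper" and "paper review", the token sets share one of three distinct tokens, so `jaccard` returns 1/3.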
Winkler Extension • Uses the length of the longest common prefix of s and t to assign a more favourable rating • Most commonly used with the Jaro similarity • Where: Sim is the base similarity and P’ the length of the common prefix, bounded at 4
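The formula itself is not preserved on this slide. In its commonly cited form the Winkler adjustment is Sim + P’ · p · (1 − Sim), with p usually set to 0.1. The sketch below assumes that form and that value of p (neither is stated here) and pairs it with a plain Jaro implementation.

```python
def jaro(s: str, t: str) -> float:
    """Jaro similarity: counts characters that match within a sliding window
    and penalises matched characters that appear in a different order."""
    if s == t:
        return 1.0
    if not s or not t:
        return 0.0
    window = max(max(len(s), len(t)) // 2 - 1, 0)
    s_hit = [False] * len(s)
    t_hit = [False] * len(t)
    matches = 0
    for i, ch in enumerate(s):
        for j in range(max(0, i - window), min(len(t), i + window + 1)):
            if not t_hit[j] and t[j] == ch:
                s_hit[i] = t_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    s_seq = [c for c, hit in zip(s, s_hit) if hit]
    t_seq = [c for c, hit in zip(t, t_hit) if hit]
    # Each out-of-order pair is counted twice, hence the halving.
    transpositions = sum(a != b for a, b in zip(s_seq, t_seq)) // 2
    return (matches / len(s) + matches / len(t)
            + (matches - transpositions) / matches) / 3


def winkler(sim: float, s: str, t: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Boost a base similarity by the common-prefix length, capped at max_prefix.
    p = 0.1 is the commonly used constant, assumed here."""
    prefix = 0
    for a, b in zip(s, t):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return sim + prefix * p * (1.0 - sim)
```

With the textbook pair "MARTHA" / "MARHTA", `jaro` gives about 0.944 and the three-character common prefix lifts it to about 0.961.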
Proposed Extension • Uses the length of the longest common substring (LCS) of s and t to assign more favourable ratings • Where: Sim is the base similarity, LCS the length of the longest common substring, and S the scaling for the bonus
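The exact formula of the proposed extension is not preserved on this slide, so the following is only an illustrative sketch. It computes the longest common substring length and then applies a bonus by analogy with the Winkler form. The function `lcs_bonus`, and in particular the normalisation by the longer string's length, are hypothetical choices and should not be read as the author's definition.

```python
def longest_common_substring(s: str, t: str) -> int:
    """Length of the longest run of characters that appears contiguously
    in both s and t (dynamic programme over common-suffix lengths)."""
    best = 0
    prev = [0] * (len(t) + 1)
    for a in s:
        curr = [0] * (len(t) + 1)
        for j, b in enumerate(t, start=1):
            if a == b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def lcs_bonus(sim: float, s: str, t: str, scale: float) -> float:
    """HYPOTHETICAL bonus: raise sim towards 1 in proportion to the LCS length.
    The shape Sim + S * (LCS / max(|s|, |t|)) * (1 - Sim) is an assumption."""
    longest = max(len(s), len(t))
    if longest == 0:
        return sim
    ratio = longest_common_substring(s, t) / longest
    return sim + scale * ratio * (1.0 - sim)
```

Unlike a prefix bonus, this rewards a shared run anywhere in the names: "conference_paper" and "paper" share the five-character run "paper" even though their prefixes differ.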
Example Two partial ontologies from the OAEI dataset
Evaluation • Two datasets are used: • 2010 Ontology Alignment Evaluation Initiative (OAEI) • Dataset created by Cohen et al. 2000 • Similarities are evaluated using precision and recall values
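As a reminder of how the two evaluation measures are computed, here is a minimal sketch. It assumes an alignment is represented as a set of (source concept, target concept) pairs and is compared against a reference alignment; this representation and the example pairs are illustrative, not taken from the datasets.

```python
def precision_recall(found, reference):
    """Precision: share of proposed correspondences that are correct.
    Recall: share of reference correspondences that were proposed."""
    found, reference = set(found), set(reference)
    correct = len(found & reference)
    precision = correct / len(found) if found else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


# Illustrative (made-up) correspondences between two small ontologies.
proposed = {("Author", "Writer"), ("Paper", "Article"), ("Chair", "Seat")}
gold = {("Author", "Writer"), ("Paper", "Article"),
        ("Review", "Evaluation"), ("Topic", "Subject")}
precision, recall = precision_recall(proposed, gold)
```

Here two of the three proposed pairs are correct and two of the four reference pairs are recovered, so precision is 2/3 and recall is 1/2.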
Results • [Plots: OAEI and Cohen datasets] • Optimal weight for both datasets is around 0.8
Results • [Plots: OAEI and Cohen datasets]
Conclusion