Ontology Alignment Semantic Web - Spring 2006 Computer Engineering Department Sharif University of Technology
The Problem [Figure: ontologies a, b, c, d joined by question marks, captioned "How should I use them?"] • Like the Web, the Semantic Web by design will be distributed and heterogeneous. • Ontologies are used in it to support interoperability and common understanding between different parties. • Ontologies themselves may be heterogeneous. • Ontology Alignment is needed to find semantic relationships among entities of ontologies.
Terminology • Mapping: a formal expression that states the semantic relation between two entities belonging to different ontologies. • Ontology Alignment: a set of correspondences between two or more (in case of multi-alignment) ontologies. These correspondences are expressed as mappings. • Ontology Coordination: the broadest term, which applies whenever knowledge from two or more ontologies must be used at the same time in a meaningful way (e.g. to achieve a single goal). • Ontology Transformation: a general term for any process which leads to a new ontology o’ from an ontology o by using a transformation function t.
An Example of Alignment Car : Ontology A ( ? ) Automobile : Ontology B
Terminology cont. • Ontology Translation: an ontology transformation function t for translating an ontology o written in some language L into another ontology o’ written in a distinct language L’. • Ontology Merging: the creation of a new ontology from two (possibly overlapping) source ontologies. This concept is closely related to that of integration in the database community. • Ontology Reconciliation: a process that harmonizes the content of two (or more) ontologies, typically requiring changes on one of the two sides or even on both sides.
An Example of Ontology Merging [Figure: two source class hierarchies to be merged; node labels: Object, Thing, Vehicle, Automobile, Bus, Car, Sport Car, Family Car, Luxury Car, Porsche, BMW]
An Example of Ontology Merging [Figure: merged hierarchy; node labels: Object/Thing, Vehicle, Bus, Car/Automobile, Sport Car, Luxury Car, Family Car, BMW, Porsche]
Forms of Heterogeneity in Ontologies • Syntactic: depends on the choice of the representation language • OWL, RDFS, DAML, N3, DATALOG, PROLOG, … • Terminological: all forms of mismatches that are related to the process of naming the entities (e.g. individuals, classes, properties, relations) that occur in an ontology. • Typical Examples: • different words are used to name the same entity (synonymy); • the same word is used to name different entities (polysemy); • words from different languages (English, French, etc.) are used to name entities; • syntactic variations of the same word (different acceptable spellings, abbreviations, use of optional prefixes or suffixes, etc.). • Mismatches at the terminological level are not as deep as those occurring at the conceptual level. However, most real cases have to do with the terminological level (e.g., with the way different people name the same entities), and therefore this level is at least as crucial as the other one.
Heterogeneity in Ontologies, cont. • Conceptual: mismatches which have to do with the content of an ontology. • Metaphysical differences: have to do with how the world is “broken into pieces”. • Coverage: the ontologies cover different (possibly overlapping) portions of the world. • Granularity: one ontology provides a more (or less) detailed description of the same entities. • Perspective: an ontology may provide a viewpoint which is different from the viewpoint adopted in another ontology.
Heterogeneity in Ontologies, cont. Metaphysical differences:
Overcoming Heterogeneity • One common approach to the problems of heterogeneity is the definition of relations across the heterogeneous representations. • These relations can be used for transforming expressions of one ontology into a form compatible with that of the other. • This may happen at any level: • syntactic: through semantics-preserving transducers; • terminological: through functions mapping lexical information; • conceptual: through general transformation of the representations (sometimes requiring a complete prover for some languages).
Structure of Mapping • Alignment: a process that starts from two representations o and o’ and produces a set of mappings between pairs of (simple or complex) entities <e, e’> belonging to o and o’ respectively. • Intuitively, we will assume that in general a mapping can be described as a quadruple: <e, e’, n, R> • e and e’ are the entities between which a relation is asserted by the mapping. • n is a degree of trust (confidence) in that mapping. • R is the relation that the mapping asserts between e and e’. It may be: • a simple set-theoretic relation • a fuzzy relation • a probability distribution over a complete set of relations • a similarity measure
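The quadruple <e, e’, n, R> can be held in a small record type. This is only a sketch: the class name `Mapping`, the helper `confident`, the relation labels, the `A:`/`B:` prefixes, and the assumption that n lies in [0, 1] are illustrative and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One correspondence <e, e', n, R> between two ontologies."""
    e: str        # entity from the first ontology
    e_prime: str  # entity from the second ontology
    n: float      # degree of trust, assumed here to lie in [0, 1]
    R: str        # relation label (illustrative strings, not a fixed vocabulary)


# An alignment is then just a collection of mappings.
alignment = [
    Mapping("A:Car", "B:Automobile", 0.9, "equivalent"),
    Mapping("A:SportCar", "B:Automobile", 0.7, "subsumed-by"),
]


def confident(mappings, minimum):
    """Keep only the mappings whose confidence n is at least `minimum`."""
    return [m for m in mappings if m.n >= minimum]
```

For instance, `confident(alignment, 0.8)` keeps only the Car/Automobile mapping.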
Similarity • There are many ways to assess the similarity between two entities. The most common way amounts to defining a measure of this similarity. • The characteristics which can be asked from these measures:
Overcoming Heterogeneity Using Similarity • Local Methods • Terminological Methods • String Based Methods • Token Based Methods • Language Based Methods • Structural Methods • Internal Structure • External Structure • Extensional (based on instances) Methods • When the classes share the same instances • When they do not
Terminological Methods • Terminological methods compare strings. • Can be applied to: • names • labels • comments concerning entities • URIs • Take advantage of the structure of the string (as a sequence of letters). • The main idea behind such measures is that similar entities usually have similar names and descriptions in different ontologies.
Terminological M., cont. (Normalization) • There are a number of normalization procedures that help improve the results of subsequent comparison: • Case normalization: converting each alphabetic character in the strings to its lowercase counterpart; • Diacritics suppression: replacing characters with diacritic signs with their most frequent replacement (replacing Montréal with Montreal); • Blank normalization: normalizing all blank characters (blank, tabulation, carriage return) into a single blank character; • Link stripping: normalizing some links between words (like replacing apostrophes and underscores with dashes); • Stopword elimination: eliminating words that can be found in a stopword list (usually words like “to”, “a”, ...).
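The five normalization steps above can be chained in one function. A minimal sketch, assuming a tiny illustrative stopword list and assuming that link stripping means turning apostrophes and underscores into dashes; the function name `normalize` is hypothetical.

```python
import re
import unicodedata

# Illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "to", "of"}


def normalize(label, stopwords=STOPWORDS):
    """Apply the normalization steps to a label and return the cleaned string."""
    # Case normalization
    text = label.lower()
    # Diacritics suppression: decompose characters, then drop combining marks
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Link stripping: apostrophes and underscores become dashes (assumption)
    text = re.sub(r"['_]", "-", text)
    # Blank normalization: any run of whitespace becomes a single blank
    text = re.sub(r"\s+", " ", text).strip()
    # Stopword elimination
    words = [w for w in text.split(" ") if w and w not in stopwords]
    return " ".join(words)
```

The order of the steps is a design choice: lowercasing first lets the stopword list stay lowercase only.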
Terminological M., cont. (String Based) • Substring Similarity • Hamming Distance • N-Gram Distance • Edit Distance • Jaro Similarity • Token Based Distances • Term Frequency Inverse Document Frequency (TF/IDF) • Path Distance: compares not only the labels of objects but also the sequence of labels of the entities to which they are related.
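Two of the listed measures are short enough to sketch. The slides do not give formulas, so the normalizations below are assumptions: the Hamming distance counts a length difference as extra mismatches and divides by the longer length, and the n-gram similarity divides the number of shared n-grams by the size of the smaller n-gram set.

```python
def hamming_distance(s, t):
    """Normalized Hamming distance in [0, 1]; 0 means identical strings."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 0.0
    mismatches = sum(a != b for a, b in zip(s, t))
    # Characters beyond the shorter string count as mismatches (assumption).
    return (mismatches + abs(len(s) - len(t))) / longest


def ngrams(s, n=3):
    """Set of all substrings of length n."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(s, t, n=3):
    """Shared n-grams divided by the size of the smaller n-gram set."""
    a, b = ngrams(s, n), ngrams(t, n)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))
```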
Terminological M., cont. (String Methods) • In string edit distance, the operations usually considered are insertion of a character, replacement of a character by another, and deletion of a character. • Levenshtein distance is an edit distance in which all costs are set to 1.
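Edit distance with unit costs can be computed with the usual dynamic-programming table. A minimal sketch that keeps only two rows of the table; the function name is illustrative.

```python
def levenshtein(s, t):
    """Minimum number of insertions, deletions and replacements turning s into t."""
    previous = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, a in enumerate(s, start=1):
        current = [i]  # distance from s[:i] to ""
        for j, b in enumerate(t, start=1):
            current.append(min(
                previous[j] + 1,             # delete a from s
                current[j - 1] + 1,          # insert b into s
                previous[j - 1] + (a != b),  # replace a by b (free if equal)
            ))
        previous = current
    return previous[-1]
```

Dividing the result by `max(len(s), len(t))` would give a normalized distance in [0, 1], which is easier to combine with other measures.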
Terminological M., cont. (Language Based) • Rely on NLP techniques to find associations between instances of concepts or classes. • Intrinsic methods: perform the terminological matching with the help of morphological and syntactic analysis for term normalization (e.g., stemming: going → go). • Extrinsic methods: make use of external resources such as dictionaries and lexicons (e.g., WordNet). • Resnik Semantic Similarity
Structural Methods • The structure of the entities found in an ontology can be compared, instead of comparing their names or identifiers. • Internal Structure: uses criteria such as the range of their properties (attributes and relations), their cardinality, and the transitivity and/or symmetry of their properties to calculate the similarity between them. • External Structure: the similarity comparison between two entities from two ontologies can be based on the position of the entities within their hierarchies.
Structural Methods (External) • If two entities from two ontologies are similar, their neighbors might also be somehow similar. • Criteria for deciding that the two entities are similar include: • Their direct super-entities are already similar. • Their sibling-entities are already similar. • Their direct sub-entities are already similar. • All (or most) of their descendant-entities (entities in the subtree rooted at the entity in question) are already similar. • All (or most) of their leaf-entities are already similar. • All (or most) of the entities in the paths from the root to the entities in question are already similar.
Structural Methods (External), cont. • Existing Approaches: • Structural topological dissimilarity on hierarchies • Upward Cotopic Distance
Extensional (based on instances) Methods • Compares the extension of classes, i.e., their set of instances rather than their interpretation. • Conditions in which such techniques can be used: • When the classes share the same instances • When they do not
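For the case where the classes share the same instances, one simple option is to compare the two instance sets directly. The slides do not name a measure, so the Jaccard coefficient below is an assumption chosen for illustration; the function name is hypothetical.

```python
def jaccard_similarity(instances_a, instances_b):
    """Share of instances common to both classes, in [0, 1]."""
    a, b = set(instances_a), set(instances_b)
    if not a and not b:
        return 0.0  # two empty extensions carry no evidence of similarity
    return len(a & b) / len(a | b)
```

When the classes do not share instances, the instances themselves must first be compared (for example by their property values), which this sketch does not cover.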
Global Methods • After the local similarities have been calculated, it remains to compute the alignment. This involves more global treatments, including: • aggregating the results of these base methods in order to compute the similarity between compound entities • developing a strategy for computing these similarities in spite of cycles and non-linearity in the constraints governing similarities • organizing the combination of various similarity / alignment algorithms • involving the user in the loop • finally extracting the alignments from the resulting (dis)similarity
Global similarity computation • The computation of compound similarity is still local because it only provides similarity considering the neighborhood of a node. • Similarity may involve the ontologies as a whole, and the final similarity values may ultimately depend on all the ontologies. • The distance defined by local methods can be defined in a circular way (for instance, if the distance between two classes depends on the distances between their instances, which themselves depend on the distance between their classes, or if there are cycles in the ontology). • Strategies must be defined in order to compute this global similarity: • Similarity Flooding • Similarity equation fixed point
Global similarity (Similarity Flooding) • The two ontologies are first translated into directed labeled graphs. • Another graph G is then created whose nodes are pairs of nodes of the initial graphs; there is an edge between (o1, o’1) and (o2, o’2) labeled by p whenever there are edges (o1, p, o2) in the first graph and (o’1, p, o’2) in the second one. • The algorithm computes initial similarity values between nodes (based on their labels, for instance) and then iterates, re-computing the similarity of each node as a function of the similarities of its adjacent nodes at the previous step. • It stops when no similarity changes by more than a particular threshold or after a predetermined number of steps. • It uses a weighted linear aggregation in which the weight of an edge is the inverse of the number of other edges with the same label reaching the same couple of entities.
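The steps above can be sketched on graphs given as lists of (source, label, target) triples. This is a simplified illustration, not the published algorithm in full: it assumes every pair starts at similarity 1.0 unless `initial` says otherwise, propagates in both directions along each edge, splits the weight evenly among same-label edges at a node, and rescales by the maximum after each step. All names are hypothetical.

```python
from collections import defaultdict


def similarity_flooding(g1, g2, initial=None, max_steps=50, threshold=1e-4):
    """Return a dict mapping node pairs (o, o') to a similarity in [0, 1]."""
    # Pair graph G: an edge exists when both graphs have an edge with label p.
    pcg = [((a, b), p, (c, d))
           for (a, p, c) in g1
           for (b, q, d) in g2
           if p == q]
    if not pcg:
        return {}

    # Count same-label edges leaving and reaching each pair.
    out_count, in_count = defaultdict(int), defaultdict(int)
    for src, p, dst in pcg:
        out_count[(src, p)] += 1
        in_count[(dst, p)] += 1

    # Each edge carries similarity both ways, weighted by the inverse count.
    flows = []
    for src, p, dst in pcg:
        flows.append((src, dst, 1.0 / out_count[(src, p)]))
        flows.append((dst, src, 1.0 / in_count[(dst, p)]))

    nodes = {n for src, _, dst in pcg for n in (src, dst)}
    initial = initial or {}
    sim = {n: initial.get(n, 1.0) for n in nodes}

    for _ in range(max_steps):
        new = dict(sim)
        for src, dst, weight in flows:
            new[dst] += weight * sim[src]
        top = max(new.values())
        if top > 0:
            new = {n: v / top for n, v in new.items()}  # keep values in [0, 1]
        change = max(abs(new[n] - sim[n]) for n in nodes)
        sim = new
        if change <= threshold:  # no similarity changed by more than threshold
            break
    return sim
```

With `g1 = [("Car", "subClassOf", "Vehicle"), ("Bus", "subClassOf", "Vehicle")]` and `g2 = [("Automobile", "subClassOf", "Transport")]` (made-up labels), the pair (Vehicle, Transport) ends with the highest score because it receives similarity from two neighbors.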
Learning Methods • As in many other fields, methods developed in machine learning prove useful in ontology alignment. • Two particular areas: • Supervised learning, in which the ontology alignment algorithm learns how to work through the presentation of many good alignments (positive examples) and bad alignments (negative examples). • It is difficult to know which techniques work well for which ontology features. • An ontology alignment algorithm trained on several ontology pairs might not necessarily work well for a new ontology pair. • Learning from data, in which a population of instances is communicated to the algorithm together with their relations and the classes they belong to.
User Feedback • Supporting effective interaction between the user and the system components is one concern of ontology alignment. • User input can take place in many areas of alignment: • Assessing initial similarity between some terms; • Invoking and composing alignment methods; • Accepting or refusing similarity or alignment provided by the various methods.
Alignment Extraction • The ultimate alignment goal is a satisfactory set of correspondences between ontologies. • Manual Extraction: display the entity pairs with their similarity scores and/or ranks, leaving the choice of the appropriate pairs to the user of the alignment tool. • Automatic Extraction: • Using Thresholds • Hard threshold: retains all the correspondences above a threshold n; • Delta method: uses as a threshold the highest similarity value minus a particular constant d; • Proportional method: uses as a threshold a percentage of the highest similarity value; • Percentage: retains the top n% of the correspondences.
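The four threshold methods differ only in how the cut-off is chosen. A minimal sketch over a dict from entity pairs to similarity values; the function names are hypothetical, and treating a value equal to the threshold as retained is an assumption.

```python
def hard_threshold(sims, n):
    """Keep correspondences whose similarity is at least n."""
    return {pair: s for pair, s in sims.items() if s >= n}


def delta_threshold(sims, d):
    """Threshold = highest similarity minus the constant d."""
    if not sims:
        return {}
    return hard_threshold(sims, max(sims.values()) - d)


def proportional_threshold(sims, p):
    """Threshold = fraction p (between 0 and 1) of the highest similarity."""
    if not sims:
        return {}
    return hard_threshold(sims, p * max(sims.values()))


def percentage(sims, n):
    """Keep the n% best correspondences, rounding the count down."""
    ranked = sorted(sims.items(), key=lambda item: item[1], reverse=True)
    keep = int(len(ranked) * n / 100)
    return dict(ranked[:keep])
```

None of these methods guarantees a one-to-one alignment: the same entity can appear in several retained pairs.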
Alignment Extraction, cont. • Automatic Extraction • Using Optimization of the result • If an injective mapping is required, then some choices need to be made in order to maximize the “quality” of the alignment. • This quality is typically measured as the total similarity of the aligned entity pairs. • A greedy alignment algorithm could construct the correspondences step-wise, at each step selecting the most similar pair and deleting its members from the table. The algorithm then stops when no pair remains whose similarity is above the threshold. (Not Optimal) • Optimal Solution: Stable Marriage
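The greedy strategy described above can be sketched as follows. The similarity table is assumed to be a dict from (e, e’) pairs to values, and the function name is illustrative.

```python
def greedy_extraction(sims, threshold):
    """Build an injective alignment by repeatedly taking the best remaining pair."""
    alignment = {}
    used_left, used_right = set(), set()
    ranked = sorted(sims.items(), key=lambda item: item[1], reverse=True)
    for (e, e_prime), s in ranked:
        if s <= threshold:
            break  # no remaining pair is above the threshold
        if e in used_left or e_prime in used_right:
            continue  # one member was already removed from the table
        alignment[(e, e_prime)] = s
        used_left.add(e)
        used_right.add(e_prime)
    return alignment
```

A small table shows why this is not optimal: with `{("a", "x"): 0.9, ("a", "y"): 0.8, ("b", "x"): 0.8, ("b", "y"): 0.1}` the greedy choice pairs a with x and b with y (total 1.0), whereas pairing a with y and b with x gives a total of 1.6.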
An Example: Anchor-PROMPT Method • Anchor-PROMPT (an extension of PROMPT) is an ontology merging and alignment tool that proposes possibly matching terms. • Implemented in Protégé http://protege.stanford.edu • Incremental algorithm • Takes as input two ontologies and a set of anchors (pairs of related terms). • Anchors are identified with the help of string-based techniques, or defined by a user. • It then refines them based on the ontology structures and user feedback.
The PROMPT Algorithm [Figure: cycle of steps] • Make initial suggestions • Select the next operation • Perform automatic updates • Find conflicts • Make suggestions
After a User Performs an Operation • For each operation • perform the operation • consider possible conflicts • identify conflicts • propose solutions • analyze local context • create new suggestions • reinforce or downgrade existing suggestions
Conflicts • Conflicts that PROMPT identifies • name conflicts • dangling references • redundancy in a class hierarchy • slot-value restrictions that violate class inheritance
Anchor-PROMPT: Using Non-Local Contexts • Input: • A set of anchor pairs • Output: • A set of related terms with similarity scores • Where do anchors come from? • Lexical matching • Interactive tools • User-specified [Figure: Ontology 1 and Ontology 2]