Ontology Alignment/Matching Prafulla Palwe
Agenda • Introduction • Being serious about the semantic web • Living with heterogeneity • Heterogeneity problem • I have a plan for you • Matching Problem • Matching Operation • Motivation • Schema Matching Vs Ontology Matching • Correspondence • Alignment • Matching Process • Sequential composition • Parallel composition • Application Domains • Traditional • Emergent • Classification • Matching Dimensions • Basic Techniques • Element Level • Structure Level • Summary and Challenges
Introduction • Being serious about the semantic web - • It is not one guy's ontology • It is not several guys' common ontology • It is many guys and girls' many ontologies • So it is a mess, but a meaningful mess
Introduction • Living with heterogeneity - • The semantic web will be: • Huge • Dynamic • Heterogeneous • These are not bugs, they are features. • We must learn to live with them.
Introduction • Heterogeneity problem – • Resources being expressed in different ways must be reconciled before being used. • Mismatch between formalized knowledge can occur when: • different languages are used; • different terminologies are used; • different modeling is used.
Introduction • I have a plan for you – Reconciliation
Matching Problem • Matching Operation • Definition – The matching operation takes as input ontologies, each consisting of a set of discrete entities (e.g., tables, XML elements, classes, properties), and determines as output the relationships (e.g., equivalence, subsumption) holding between these entities.
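The operation can be pictured as a function from two sets of entities to a set of relationships. This is a minimal sketch, assuming each ontology is reduced to a flat set of entity names; the function `match` is hypothetical and uses a deliberately naive case-insensitive name comparison as its only criterion.

```python
from typing import Set, Tuple

# Simplifying assumption: an ontology is reduced to a flat set of entity names.
Entity = str
Correspondence = Tuple[Entity, Entity, str, float]  # (e, e', relation, confidence)


def match(o1: Set[Entity], o2: Set[Entity]) -> Set[Correspondence]:
    """Toy matcher (hypothetical): report equivalence between entities
    whose names are identical once case is ignored."""
    result: Set[Correspondence] = set()
    for e1 in o1:
        for e2 in o2:
            if e1.lower() == e2.lower():
                result.add((e1, e2, "=", 1.0))
    return result


print(match({"Person", "Address"}, {"person", "Street"}))
# {('Person', 'person', '=', 1.0)}
```

Real matchers replace the name test with the element-level and structure-level techniques covered later.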
Matching Problem • Motivation – • 2 XML Schemas • 2 Ontologies
Matching Problem • Schema matching Vs ontology matching • Differences - • Schemas often do not provide explicit semantics for their data • Relational schemas provide no generalization • Ontologies are logical systems that constrain the meaning of terms • An ontology can be defined as a set of logical axioms • Commonalities - • Schemas and ontologies provide a vocabulary of terms that describes the domain of interest • Schemas and ontologies constrain the meaning of terms used in the vocabulary.
Matching Problem • Correspondence • Definition – • Given 2 ontologies O and O’, a correspondence M between O and O’ is a 5-tuple <id, e, e’, R, n> such that: • id is a unique identifier of the correspondence • e and e’ are entities of O and O’ respectively (e.g. XML elements, classes) • R is a relation (e.g. equivalence (=), disjointness (⊥)) • n is a confidence measure in some mathematical structure (typically the [0,1] range)
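The 5-tuple maps naturally onto an immutable record. This is a sketch only; the field names (`e_prime`, `relation`, `confidence`) and the choice of [0, 1] as the confidence structure are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correspondence:
    """One correspondence <id, e, e', R, n>; field names are illustrative."""
    id: str            # unique identifier of the correspondence
    e: str             # entity of O
    e_prime: str       # entity of O'
    relation: str      # e.g. "=" for equivalence, "⊥" for disjointness
    confidence: float  # assumed here to lie in the [0, 1] range

    def __post_init__(self) -> None:
        # Reject confidence values outside the chosen [0, 1] structure.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")


c = Correspondence("c1", "Book", "Volume", "=", 0.85)
print(c.relation, c.confidence)  # = 0.85
```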
Matching Problem • Alignment • Definition – • Given 2 ontologies O and O’, an alignment A between O and O’: • is a set of correspondences on O and O’ • with some cardinality: 1-1, 1-*, etc. • with some additional metadata (method, date, properties, etc.)
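An alignment can then be held as a list of correspondences plus a metadata dictionary. The sketch below, with a hypothetical `Alignment` class, infers the cardinality by checking whether any entity appears in more than one correspondence. It assumes the relational reading "O side - O’ side" (so 1-* means one entity of O corresponds to several entities of O’), and "<" is only an assumed notation for subsumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# (id, e, e', relation, confidence), mirroring the 5-tuple definition
Corr = Tuple[str, str, str, str, float]


@dataclass
class Alignment:
    correspondences: List[Corr] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)  # e.g. method, date

    def cardinality(self) -> str:
        """Return '1-1', '1-*', '*-1' or '*-*' (read as O side - O' side).

        '1-*' means some entity of O corresponds to several entities of O'."""
        left = [c[1] for c in self.correspondences]
        right = [c[2] for c in self.correspondences]
        one_to_many = len(left) != len(set(left))    # an e of O appears twice
        many_to_one = len(right) != len(set(right))  # an e' of O' appears twice
        return ("*" if many_to_one else "1") + "-" + ("*" if one_to_many else "1")


a = Alignment(
    [("c1", "Book", "Volume", "=", 0.9), ("c2", "Book", "Publication", "<", 0.6)],
    {"method": "string-based"},
)
print(a.cardinality())  # 1-*
```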
Matching Process • General Basic Matching Process
Matching Process • Sequential Composition
Matching Process • Parallel composition
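Only the slide titles survive for the matching process. A common reading is that in sequential composition each matcher receives the alignment produced by the previous one, while in parallel composition the matchers run independently on the same input and their results are aggregated. The sketch below assumes a matcher is any function `(O, O’, input alignment) -> alignment`; `exact`, `prefix` and `keep_strong` are hypothetical toy matchers.

```python
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]
Alignment = Dict[Pair, float]  # (e, e') -> confidence
Matcher = Callable[[Set[str], Set[str], Alignment], Alignment]


def sequential(matchers: List[Matcher]) -> Matcher:
    """Sequential composition: each matcher receives the previous one's alignment."""
    def run(o1: Set[str], o2: Set[str], a: Alignment) -> Alignment:
        for m in matchers:
            a = m(o1, o2, a)
        return a
    return run


def parallel(matchers: List[Matcher],
             aggregate: Callable[[List[float]], float]) -> Matcher:
    """Parallel composition: matchers run independently, scores are aggregated."""
    def run(o1: Set[str], o2: Set[str], a: Alignment) -> Alignment:
        results = [m(o1, o2, a) for m in matchers]
        pairs = set().union(*results)  # every pair proposed by any matcher
        return {p: aggregate([r.get(p, 0.0) for r in results]) for p in pairs}
    return run


# Hypothetical toy matchers (the first two ignore their input alignment).
def exact(o1: Set[str], o2: Set[str], a: Alignment) -> Alignment:
    return {(x, y): 1.0 for x in o1 for y in o2 if x.lower() == y.lower()}


def prefix(o1: Set[str], o2: Set[str], a: Alignment) -> Alignment:
    return {(x, y): 0.5 for x in o1 for y in o2 if y.lower().startswith(x.lower())}


def keep_strong(o1: Set[str], o2: Set[str], a: Alignment) -> Alignment:
    # A second-stage matcher that only refines (filters) its input alignment.
    return {p: s for p, s in a.items() if s >= 0.75}


o1, o2 = {"net", "Hotel"}, {"network", "hotel"}
par = parallel([exact, prefix], max)
pipeline = sequential([par, keep_strong])
```

Here `par(o1, o2, {})` scores ('Hotel', 'hotel') at 1.0 and ('net', 'network') at 0.5, and `pipeline(o1, o2, {})` keeps only the first pair. Using `max` as the aggregation is an arbitrary choice; any operation from the aggregation slide fits the same slot.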
Matching Process • Similarity filter, alignment extractor and alignment filter
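The figure for this slide is missing. One common way to realize a similarity filter is a threshold on candidate pairs, and one simple alignment extractor builds a 1-1 alignment greedily from the remaining scores. Both function names and the greedy strategy are assumptions chosen for brevity, not necessarily what the original figure showed.

```python
from typing import Dict, Tuple

Pair = Tuple[str, str]


def similarity_filter(sim: Dict[Pair, float], threshold: float) -> Dict[Pair, float]:
    """Discard candidate pairs whose similarity falls below the threshold."""
    return {p: s for p, s in sim.items() if s >= threshold}


def extract_one_to_one(sim: Dict[Pair, float]) -> Dict[Pair, float]:
    """Greedy 1-1 extraction: take pairs from highest to lowest similarity,
    skipping any pair whose entity on either side is already used.
    Simple, but not guaranteed optimal (an optimal assignment algorithm
    such as the Hungarian method would be)."""
    used1, used2 = set(), set()
    result: Dict[Pair, float] = {}
    for (e1, e2), s in sorted(sim.items(), key=lambda kv: kv[1], reverse=True):
        if e1 not in used1 and e2 not in used2:
            result[(e1, e2)] = s
            used1.add(e1)
            used2.add(e2)
    return result


sim = {("Book", "Volume"): 0.8, ("Book", "Publication"): 0.7,
       ("Author", "Writer"): 0.9, ("Author", "Volume"): 0.2}
print(extract_one_to_one(similarity_filter(sim, 0.5)))
# {('Author', 'Writer'): 0.9, ('Book', 'Volume'): 0.8}
```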
Matching Process • Aggregation Operations – • There are many different ways to aggregate matcher results, usually depending on confidence/similarity: • Triangular norms (min, weighted products), useful for selecting only the best results • Multidimensional distances (Euclidean distance, weighted sum), useful for taking all dimensions into account • Fuzzy aggregation (min, weighted average), useful for aggregating competing algorithms and averaging their results • Other specific measures (e.g., ordered weighted average)
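A few of these operations, applied to the scores that several matchers give the same pair of entities, can be sketched as below. The formulas are common textbook forms used here as assumptions; in particular, dividing the Euclidean norm by the square root of the number of scores is just one way to keep the result in [0, 1].

```python
import math
from typing import Sequence


def t_norm_min(scores: Sequence[float]) -> float:
    """Minimum t-norm: high only when every matcher gives a high score."""
    return min(scores)


def weighted_product(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Product of s_i ** w_i; a single low score pulls the result down."""
    return math.prod(s ** w for s, w in zip(scores, weights))


def euclidean(scores: Sequence[float]) -> float:
    """Euclidean norm of the score vector, divided by sqrt(k) to stay in [0, 1]."""
    return math.sqrt(sum(s * s for s in scores) / len(scores))


def weighted_average(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum normalized by the total weight."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


def owa(scores: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered weighted average: weights apply by rank, not by matcher."""
    ranked = sorted(scores, reverse=True)
    return sum(w * s for w, s in zip(weights, ranked))


scores = [0.9, 0.6, 0.3]  # made-up scores from three matchers for one pair
print(t_norm_min(scores))                                   # 0.3
print(round(euclidean(scores), 3))                          # 0.648
print(round(weighted_average(scores, [0.5, 0.3, 0.2]), 2))  # 0.69
print(round(owa(scores, [0.2, 0.3, 0.5]), 2))               # 0.51
```

The OWA weights here favor the lower-ranked scores, which makes it behave cautiously. Reversing them would reward a pair that any single matcher rates highly.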
Application Domains • Traditional - • Ontology evolution • Schema integration • Catalog integration • Data integration
Application Domains • Ontology Evolution
Application Domains • Catalog Integration
Application Domains • Emergent • P2P information sharing • Agent communication • Web service composition • Query answering on the web
Application Domains • P2P information sharing
Application Domains • Web Service Composition
Application Domains • Agent communication
Classifications • Matching Dimensions • Input Dimensions • Underlying models (e.g. XML, OWL) • Schema Level Vs Instance Level • Process Dimensions • Approximate Vs Exact • Interpretation of the input • Output Dimensions • Cardinality • Equivalence Vs Diverse relations • Graded Vs Absolute Confidence
Classifications • Three Layers • Upper Layer • Granularity of match • Interpretation of the input information • Middle Layer • Represents classes of elementary (basic) matching techniques • Lower Layer • Based on the kind of input which is used by elementary matching techniques
Classifications • Classification of schema based techniques
Basic Techniques • Element Level Techniques • String based – • Prefix - • Takes as input 2 strings and checks whether the first string starts with the second • e.g. net = network but also hot = hotel • Suffix – • Takes as input 2 strings and checks whether the first string ends with the second • e.g. ID = PID but also word = sword • Edit Distance – • Takes as input 2 strings and calculates the number of character edit operations (insertion, deletion, substitution) required to transform one string into the other, normalized by the length of the longer string. • editDistance(NKN, Nikon) = 0.4 (ignoring case: 2 insertions / 5 characters)
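The three string-based tests can be sketched in a few lines. The edit distance below is the standard Levenshtein dynamic program, normalized by the length of the longer string as on the slide. Lowercasing both inputs is an assumption, but it is the reading under which editDistance(NKN, Nikon) comes out as 0.4.

```python
def prefix_match(s: str, t: str) -> bool:
    """True if s starts with t (case-insensitive)."""
    return s.lower().startswith(t.lower())


def suffix_match(s: str, t: str) -> bool:
    """True if s ends with t (case-insensitive)."""
    return s.lower().endswith(t.lower())


def levenshtein(s: str, t: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (cs != ct)))  # substitution (free if equal)
        prev = cur
    return prev[-1]


def edit_distance(s: str, t: str) -> float:
    """Levenshtein distance normalized by the longer length, ignoring case."""
    s, t = s.lower(), t.lower()
    longest = max(len(s), len(t))
    return levenshtein(s, t) / longest if longest else 0.0


print(prefix_match("network", "net"))  # True
print(suffix_match("PID", "ID"))       # True
print(edit_distance("NKN", "Nikon"))   # 0.4
```

The same functions also accept the false friends from the slide (hotel/hot, sword/word), which is why string-based evidence is usually combined with other techniques.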
Basic Techniques • Language based – • Tokenization – • Parses names into tokens by recognizing punctuation and case changes • Hands-Free_Kits → <hands, free, kits> • Lemmatization – • Morphologically analyzes tokens in order to find all their possible basic forms • Kits → Kit • Elimination – • Discards empty tokens such as articles, prepositions and conjunctions • a, the, by, type of, their, from
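A rough pipeline for the three language-based steps is sketched below. The stop-word list is a hypothetical subset drawn from the slide's examples. `lemmatize` is a crude plural-stripping stand-in, not real morphological analysis; a production matcher would use a dictionary-based lemmatizer.

```python
import re
from typing import List

# Hypothetical, deliberately tiny stop-word list taken from the slide's examples.
STOP_WORDS = {"a", "the", "by", "of", "their", "from"}


def tokenize(name: str) -> List[str]:
    """Split a name on punctuation/underscores and on lower-to-upper case changes."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)  # "NumberOf" -> "Number Of"
    return [tok.lower() for tok in re.split(r"[^A-Za-z0-9]+", spaced) if tok]


def lemmatize(token: str) -> str:
    """Crude stand-in for morphological analysis: strip a plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def eliminate(tokens: List[str]) -> List[str]:
    """Drop articles, prepositions and similar stop words."""
    return [t for t in tokens if t not in STOP_WORDS]


print(tokenize("Hands-Free_Kits"))                                  # ['hands', 'free', 'kits']
print([lemmatize(t) for t in eliminate(tokenize("NumberOfKits"))])  # ['number', 'kit']
```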
Basic Techniques • Structure Level Techniques • Ontologies are viewed as graph-like structures containing terms and their inter-relationships. • Taxonomy based • Bounded path matching • These take 2 paths with links between classes defined by the hierarchical relations, compare terms and their positions along these paths, and identify similar terms. • Super(sub)-concept rules • If super concepts are the same, the actual concepts are similar to each other
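The super-concept rule can be sketched as propagation over an existing alignment. This assumes each taxonomy is given as a map from a class to its direct super-class, and that "the same" super concepts means already-aligned pairs. The function `super_concept_rule` and the example classes are hypothetical.

```python
from typing import Dict, Set, Tuple


def super_concept_rule(parent1: Dict[str, str], parent2: Dict[str, str],
                       aligned: Set[Tuple[str, str]]) -> Set[Tuple[str, str]]:
    """Propose (c1, c2) as similar when their direct super-concepts are aligned.

    parent1/parent2 map each class to its direct super-class in O and O'."""
    return {(c1, c2)
            for c1, p1 in parent1.items()
            for c2, p2 in parent2.items()
            if (p1, p2) in aligned}


parent1 = {"Car": "Vehicle", "Truck": "Vehicle", "Dog": "Animal"}
parent2 = {"Automobile": "Conveyance", "Hound": "Creature"}
print(super_concept_rule(parent1, parent2, {("Vehicle", "Conveyance")}))
```

Both Car and Truck become candidates for Automobile, so in practice such structural rules only raise or lower a similarity that element-level techniques have already produced.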
Basic Techniques • Tree based • Children • 2 non-leaf schema elements are structurally similar if their immediate children sets are highly similar • Leaves • 2 non-leaf schema elements are structurally similar if their leaf sets are highly similar, even if their immediate children are not.
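Both tree-based rules can be sketched with set similarity. This assumes children and leaves are compared by exact label, and Jaccard is used as the "highly similar" measure; real matchers would compare the sets using element-level similarities instead. The trees and function names are hypothetical.

```python
from typing import Dict, List, Set

Tree = Dict[str, List[str]]  # node -> list of child nodes; leaves have no entry


def children(tree: Tree, node: str) -> Set[str]:
    return set(tree.get(node, []))


def leaves(tree: Tree, node: str) -> Set[str]:
    """All leaf descendants of node (a leaf is its own leaf set)."""
    kids = tree.get(node, [])
    if not kids:
        return {node}
    result: Set[str] = set()
    for k in kids:
        result |= leaves(tree, k)
    return result


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def children_sim(t1: Tree, n1: str, t2: Tree, n2: str) -> float:
    return jaccard(children(t1, n1), children(t2, n2))


def leaves_sim(t1: Tree, n1: str, t2: Tree, n2: str) -> float:
    return jaccard(leaves(t1, n1), leaves(t2, n2))


t1 = {"Address": ["Street", "Location"], "Location": ["City", "Zip"]}
t2 = {"Addr": ["Street", "City", "Zip"]}
print(children_sim(t1, "Address", t2, "Addr"))  # 0.25
print(leaves_sim(t1, "Address", t2, "Addr"))    # 1.0
```

The example mirrors the slide: the immediate children of Address and Addr differ because of the intermediate Location node, yet their leaf sets coincide.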
Summary and Challenges • Summary • Ontology matching determines the correspondences (e.g., equivalence, subsumption) between the entities of 2 or more ontologies/schemas; the resulting set of correspondences is an alignment that can be used to reconcile them. • More complex and efficient matching and alignment generation algorithms can be built by combining the basic matching techniques. • Challenges • Developing generic and highly efficient matching and alignment generation algorithms.