A Classification of Schema-based Matching Approaches. Pavel Shvaiko. Meaning Coordination and Negotiation Workshop, ISWC, 8th November 2004, Hiroshima, Japan. Outline: Introduction; Classification of schema-based matching approaches; Matching systems; Conclusions; Future work.
A Classification of Schema-based Matching Approaches Pavel Shvaiko Meaning Coordination and Negotiation Workshop, ISWC 8th November 2004, Hiroshima, Japan
Outline
• Introduction
• Classification of schema-based matching approaches
• Matching systems
• Conclusions
• Future work
Semantic Web and the Match operator
• Information sources (e.g., database schemas, taxonomies or ontologies) can be viewed as graph-like structures containing terms and their inter-relationships
• Match is one of the key operators for enabling the Semantic Web: it takes two graph-like structures and produces a mapping between the nodes of the graphs that "correspond" semantically to each other
Example: Two XML schemas
[Figure: two XML schemas, labelled HT and FT]
Schema matching vs Ontology alignment
Differences:
• Database schemas often do not provide explicit semantics for their data
• Ontologies are logical systems that themselves incorporate semantics (intuitive or formal)
  E.g., ontology definitions as a set of logical axioms
• Ontology data models are richer than schema data models (they have more primitives, and the primitives are more complex)
  E.g., OWL allows defining new classes as unions or intersections of other classes
Commonalities:
• Ontologies can be viewed as schemas for knowledge bases
• Techniques developed for either problem benefit the other
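Matching
• A mapping element M is a 5-tuple <ID, e1, e2, n, R>
  • n = {x ∈ [0,1]}
  • R = {=, ⊑, ⊒, ⊥, ⊓}
[Figure: the Match operator takes the schemas S1 and S2, an input mapping {M}, parameters (e.g., weights, thresholds) and auxiliary information (e.g., lexicons, thesauri), and produces a mapping {M'}]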
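The 5-tuple above can be written down as a small data structure. The following is a minimal sketch, not taken from the slides: the class and field names, the relation names in `Relation`, the example entities and the `above_threshold` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """Relations a mapping element may carry (names are an assumption)."""
    EQUIVALENT = "equivalent"
    LESS_GENERAL = "less general"
    MORE_GENERAL = "more general"
    DISJOINT = "disjoint"
    OVERLAPPING = "overlapping"


@dataclass(frozen=True)
class MappingElement:
    """One mapping element <ID, e1, e2, n, R>."""
    id: str      # unique identifier of the mapping element
    e1: str      # entity (node) of the first schema S1
    e2: str      # entity (node) of the second schema S2
    n: float     # confidence, expected to lie in [0, 1]
    r: Relation  # relation holding between e1 and e2

    def __post_init__(self) -> None:
        # Reject confidences outside the [0, 1] range.
        if not 0.0 <= self.n <= 1.0:
            raise ValueError(f"confidence must lie in [0, 1], got {self.n}")


def above_threshold(mapping, threshold):
    """Keep only the elements whose confidence reaches the threshold parameter."""
    return [m for m in mapping if m.n >= threshold]


# Hypothetical mapping between two toy schemas S1 and S2.
mapping = [
    MappingElement("m1", "S1.Digital_Cameras", "S2.Photo_and_Cameras", 0.9, Relation.LESS_GENERAL),
    MappingElement("m2", "S1.FJFLM", "S2.FujiFilm", 0.4, Relation.EQUIVALENT),
]
strong = above_threshold(mapping, 0.5)
```

Here the threshold plays the role of one of the "parameters" fed to Match: with a threshold of 0.5 only the first element is kept.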
Schema matching approaches
Taxonomy from [E. Rahm, P. Bernstein, 2001]
• Individual matchers
  • Schema-based
    • Element-level
      • Linguistic: Names, Descriptions
      • Constraint-based: Types, Keys
    • Structure-level
      • Constraint-based: Graph matching
  • Instance-based
    • Element-level
      • Linguistic: IR (word frequencies, key terms)
      • Constraint-based: Value pattern and ranges
• Combined matchers
  • Hybrid
  • Composite: manual composition, automatic composition
Semantic view on matching
What is missing in the taxonomy of schema matching approaches we have just seen?
Two new criteria:
• Heuristic vs formal:
  • Heuristic techniques try to guess relations which may hold between similar labels or graph structures
  • Formal techniques have model-theoretic semantics, which is used to justify their results
• Implicit vs explicit:
  • Implicit techniques are syntax-driven techniques
    E.g., techniques which consider labels as strings, analyze data types, or compare the soundex of schema/ontology elements
  • Explicit techniques exploit the semantics of labels
    E.g., thesauri, ontologies
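To make the notion of an implicit, syntax-driven technique concrete, here is a minimal sketch that compares two labels purely as strings. It is an illustration only: the `normalise` step and the use of Python's `difflib.SequenceMatcher` are assumptions, not the similarity measure of any system named in these slides.

```python
from difflib import SequenceMatcher


def normalise(label: str) -> str:
    """Lower-case a label and turn separators into spaces."""
    return label.replace("_", " ").replace("-", " ").lower().strip()


def string_similarity(label1: str, label2: str) -> float:
    """Similarity in [0, 1] computed only from the characters of the two labels."""
    return SequenceMatcher(None, normalise(label1), normalise(label2)).ratio()


same = string_similarity("Digital_Cameras", "digital cameras")
related = string_similarity("FJFLM", "FujiFilm")
```

Such a measure only produces a score. It cannot say that one label is more general than another, which is where explicit techniques (thesauri, ontologies) come in.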
Schema Matching Approaches
New criteria applied to the schema-based branch: Heuristic vs Formal, Implicit vs Explicit
• Individual matchers
  • Schema-based
    • Element-level
      • Linguistic: Names, Descriptions
      • Constraint-based: Types, Keys
    • Structure-level
      • Constraint-based: Graph matching
Schema-based Matching Approaches
• Heuristic Techniques
  • Element-level
    • Implicit
      • String-based: Names, Descriptions
      • Constraint-based: Type similarity, Key properties
    • Explicit
      • Auxiliary Information: Precompiled dictionary, Lexicons
  • Structure-level
    • Implicit
      • Constraint-based: Graph matching (Children, Leaves)
    • Explicit
      • Constraint-based: Taxonomic structure
• Formal Techniques
  • Element-level
    • Explicit
      • Ontology-based: OWL properties
  • Structure-level
    • Explicit
      • Reasoner-based: Propositional SAT, Modal SAT
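The "Leaves" variant of graph matching listed in the classification can be sketched as follows: two inner nodes are considered similar when the sets of leaves below them overlap. This is only an illustration under stated assumptions: the tree encoding (`{parent: [children]}`), the toy schemas `s1` and `s2`, the exact-label comparison of leaves, and the Jaccard overlap are all hypothetical choices, not the procedure of a specific system.

```python
def leaves(tree: dict, node: str) -> set:
    """Collect the lower-cased labels of all leaves below `node`.

    `tree` maps a parent label to the list of its children; a node
    without an entry (or with no children) is itself a leaf.
    """
    children = tree.get(node, [])
    if not children:
        return {node.lower()}
    found = set()
    for child in children:
        found |= leaves(tree, child)
    return found


def leaf_similarity(tree1: dict, node1: str, tree2: dict, node2: str) -> float:
    """Jaccard overlap of the two leaf sets: size of intersection / size of union."""
    a, b = leaves(tree1, node1), leaves(tree2, node2)
    return len(a & b) / len(a | b)


# Hypothetical toy schemas.
s1 = {"Electronics": ["Cameras", "PC"], "Cameras": ["Nikon", "FujiFilm"]}
s2 = {
    "Photo_and_Cameras": ["Digital_Cameras", "Accessories"],
    "Digital_Cameras": ["Nikon", "FujiFilm", "Canon"],
}

score = leaf_similarity(s1, "Cameras", s2, "Digital_Cameras")
```

In this toy case the two nodes share two of three distinct leaves, so the score is 2/3. The technique is implicit: it uses only the structure and the leaf labels as strings, not their meaning.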
Heuristic Techniques: Example
• Element-level explicit techniques
  • Precompiled dictionary (Cupid, COMA)
    E.g., syn key - "NKN:Nikon = syn"
  • Lexicons (S-Match, CTXmatch)
    E.g., WordNet: Camera is a hypernym of Digital Camera; therefore, Digital_Cameras ⊑ Photo_and_Cameras
• Structure-level explicit techniques
  • Taxonomic structure (Anchor-Prompt, NOM)
    E.g., given that Digital_Cameras ⊑ Photo_and_Cameras, FJFLM and FujiFilm can be found to be an appropriate match
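The two element-level explicit techniques on this slide can be sketched together: a precompiled dictionary resolves abbreviations to synonyms, and a lexicon supplies hypernym links from which a "less general" or "more general" relation is read off. The tables below are hand-made stand-ins, not the actual Cupid/COMA dictionaries or WordNet, and the function names are hypothetical.

```python
# Hypothetical precompiled dictionary: abbreviation -> full form (synonym entries).
SYNONYMS = {"nkn": "nikon", "fjflm": "fujifilm"}

# Hypothetical stand-in for a lexicon such as WordNet: term -> direct hypernym.
HYPERNYMS = {"digital camera": "camera", "camera": "device"}


def expand(label: str) -> str:
    """Normalise a label and replace it by its dictionary synonym, if any."""
    key = label.replace("_", " ").lower()
    return SYNONYMS.get(key, key)


def is_hypernym(general: str, specific: str) -> bool:
    """True if `general` is reachable from `specific` by following hypernym links."""
    seen = set()
    current = HYPERNYMS.get(specific)
    while current is not None and current not in seen:
        if current == general:
            return True
        seen.add(current)
        current = HYPERNYMS.get(current)
    return False


def element_relation(label1: str, label2: str):
    """Return the relation of label1 to label2, or None if the tables say nothing."""
    a, b = expand(label1), expand(label2)
    if a == b:
        return "equivalent"
    if is_hypernym(b, a):
        return "less general"
    if is_hypernym(a, b):
        return "more general"
    return None


r1 = element_relation("NKN", "Nikon")
r2 = element_relation("Digital_Camera", "Camera")
```

With these toy tables, `r1` is "equivalent" and `r2` is "less general". Unlike a string comparison, the result is a relation justified by an external knowledge source rather than a similarity score.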
Formal Techniques: Example
• Element-level explicit techniques
  • OWL properties (NOM)
    E.g., the sameClassAs constructor explicitly states that one class is equivalent to another: Digital_Cameras = Camera ⊓ DigitalPhoto_Producer
• Structure-level explicit techniques
  • Propositional satisfiability (SAT) (S-Match, CTXmatch)
    The approach is to translate the matching problem, namely the two graphs (trees) and the mapping queries, into a propositional formula and then to check that formula for validity
  • Modal SAT (S-Match)
    The idea is to enhance propositional logic with modal logic (or ALC DL) operators. The matching problem is therefore translated into a modal logic formula, which is then checked for validity using sound and complete satisfiability search procedures
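The propositional approach can be illustrated with a toy validity check. The sketch below enumerates all truth assignments instead of calling a SAT solver, so it is a swapped-in brute-force procedure and not how S-Match or CTXmatch are implemented. The variable names, the background axiom, and the encoding of the label Photo_and_Cameras as a disjunction are assumptions made for the example.

```python
from itertools import product


def is_valid(formula, variables) -> bool:
    """A formula is valid iff no truth assignment falsifies it,
    i.e. iff its negation is unsatisfiable."""
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if not formula(assignment):
            return False
    return True


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


VARIABLES = ["digital_camera", "camera", "photo"]


def axioms(v) -> bool:
    # Hypothetical background knowledge: every digital camera is a camera.
    return implies(v["digital_camera"], v["camera"])


def concept1(v) -> bool:
    # Assumed meaning of the node Digital_Cameras.
    return v["digital_camera"]


def concept2(v) -> bool:
    # Assumed meaning of the node Photo_and_Cameras (photo OR camera).
    return v["photo"] or v["camera"]


def less_general(v) -> bool:
    # Mapping query: axioms -> (concept1 -> concept2)
    return implies(axioms(v), implies(concept1(v), concept2(v)))


def more_general(v) -> bool:
    # Mapping query: axioms -> (concept2 -> concept1)
    return implies(axioms(v), implies(concept2(v), concept1(v)))


holds_less_general = is_valid(less_general, VARIABLES)
holds_more_general = is_valid(more_general, VARIABLES)
```

Under these assumptions the first query is valid and the second is not, so the relation returned would be "Digital_Cameras is less general than Photo_and_Cameras". Because the check is a validity test, the answer is justified by the semantics of the formula rather than guessed.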
Characteristics of state-of-the-art matchers
Conclusions
Uses of Classification
• The proposed classification provides a common conceptual basis, and hence can be used to compare existing schema/ontology matching systems analytically
• It can help in designing a new matching system, or an elementary matcher, that takes advantage of state-of-the-art solutions
Future Work
• Provide a more detailed view of the general properties of matching algorithms
• Add language-based techniques to the classification, e.g., tokenization, lemmatization, elimination
• Extend the classification by taking into account DL-based matchmaking solutions
• Extend the classification with newly emerging matching techniques and the systems implementing them, e.g., OLA, QOM
• Compare matching systems experimentally as well, with the help of benchmarks
References
• Knowledge Web project: http://knowledgeweb.semanticweb.org/
• Project website at DIT - ACCORD: http://www.dit.unitn.it/~accord/
• P. Shvaiko: A classification of schema-based matching approaches. Technical Report, DIT-04-93, University of Trento, 2004.
• E. Rahm, P. Bernstein: A survey of approaches to automatic schema matching. In Very Large Databases Journal, 10(4):334-350, 2001.
• F. Giunchiglia, P. Shvaiko: Semantic matching. In The Knowledge Engineering Review Journal, 18(3):265-280, 2003.
• P. Bouquet, L. Serafini, S. Zanobini: Semantic coordination: a new approach and an application. In Proceedings of ISWC, 130-145, 2003.