Découverte de mappings entre schémas : les différentes approches
Schema Matching: Different Approaches
Khalid Saleem, LIRMM
Schema and Ontology
[Figure: XML, XML Schema, RDF, RDF Schema, OWL]
• Schema represents the database community
  • Schemas often do not provide explicit semantics for their data (ER, XML document schema).
• Ontology represents the AI community
  • Ontologies are logical systems that themselves obey some formal semantics, designed to be interpreted by computers for reasoning (OWL).
• Schemas and ontologies are similar in that
  • Both provide a vocabulary of terms that describes a domain
  • Both constrain the meaning of the terms used in the vocabulary (hierarchy/relations)
Schema vs Ontology: examples
XML:
    <class-def>
      <name>branch</name>
      <slot-constraint>
        <name>is-part-of</name>
        <has-value>tree</has-value>
      </slot-constraint>
    </class-def>
DAML+OIL:
    class-def animal
    % plants are a class that is disjoint from animals
    class-def plant
      subclass-of NOT animal
    % it is necessary but not sufficient for a tree to be a plant
    class-def tree
      subclass-of plant
    % branches are PART OF trees
    class-def branch
      slot-constraint is-part-of
        has-value tree
    % it is necessary and sufficient for a carnivore to be an animal
    class-def defined carnivore
      subclass-of animal
      slot-constraint eats
        value-type animal
    % herbivores eat only plants OR parts of plants
    class-def defined herbivore
      subclass-of animal
      slot-constraint eats
        value-type plant OR (slot-constraint is-part-of has-value plant)
Match
• Takes two schemas/ontologies as input and produces a mapping between the elements of the two schemas that correspond semantically to each other
[Figure: two book sources and the matches between them]
• Books Source A: price, book-title, author-name
• Books Source B: listed-price, title, a-fname, a-lname
• Sample rows shown on the slide: (16,50; Nous Les Dieux; Bernard Werber), (24; Pompei; Robert Harris), (26,60; Harry Potter; J. K. Rowling), (11,50; Marie Des Intrigues; Juliette Benzoni)
• 1-1 match, e.g. price : listed-price
• Complex match, e.g. author-name : (a-fname, a-lname)
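As a rough illustration of what a match result might look like for the two book sources, the sketch below stores each correspondence as a pair of element groups and derives its cardinality. The `Correspondence` class and the `cardinality` property are hypothetical names introduced here, not part of any tool mentioned in these slides.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Correspondence:
    """One mapping element: source elements matched to target elements (hypothetical type)."""
    source: Tuple[str, ...]
    target: Tuple[str, ...]

    @property
    def cardinality(self) -> str:
        # One element on a side gives "1", several give "n" (or "m" on the right of "n").
        left = "1" if len(self.source) == 1 else "n"
        if len(self.target) == 1:
            right = "1"
        else:
            right = "n" if left == "1" else "m"
        return f"{left}:{right}"


# Element names are taken from Books Source A and Books Source B on the slide.
mapping = [
    Correspondence(("price",), ("listed-price",)),
    Correspondence(("book-title",), ("title",)),
    Correspondence(("author-name",), ("a-fname", "a-lname")),  # complex match
]

for c in mapping:
    print(c.cardinality, c.source, "->", c.target)
```

The first two entries are 1-1 matches; the third is the complex match, since one source element corresponds to two target elements.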
Schema Matching vs Ontology Matching
• Schema matching is usually performed with techniques that try to guess the meaning encoded in the schemas.
• Ontology matching tries to exploit knowledge explicitly encoded in the ontologies.
In real-world applications, solutions from both domains are mutually beneficial.
Application Domains
• Traditional (static)
  • Schema integration
  • Data warehousing
  • E-commerce
  • Catalogue integration
• New frontiers (dynamic)
  • Semantic query processing
  • Agent communication
  • Web services integration
  • P2P databases
Basic Classification of Matchers [RB01]
• Schema vs data instance
• Element vs structure
• Language vs constraint
  • String based: prefix, suffix, e.g. auth : author
  • Tokenization, lemmatization, elimination [GSY04], e.g. Tool_Kit : (Tool, Kit), Kits : Kit, IsRelatedTo : Related
  • Data types, value domain, e.g. 1..12 : month
• Match cardinalities: 1:1, 1:n, n:m, e.g. (Tel Res, Other) : (Tel Day, Evening, Night)
• Auxiliary information
  • Global schema, dictionaries, thesauri, previous match decisions, user input
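The language-based techniques listed above can be sketched in a few lines. This is a minimal illustration only: the stop-word list, the plural-stripping rule used as "lemmatization", and the function names are assumptions made for the example, not the procedures used in [RB01] or [GSY04].

```python
import re

# Assumed stop-word list for the "elimination" step (illustrative only).
STOP_WORDS = {"is", "to", "of", "the", "a", "an"}


def tokenize(label):
    """Split a label on punctuation, digits and camelCase boundaries."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", label)
    return [p.lower() for p in parts]


def lemmatize(token):
    """Very crude stand-in for lemmatization: strip a plural 's'."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def normalize(label):
    """Tokenize, drop stop words (elimination), then lemmatize."""
    return [lemmatize(t) for t in tokenize(label) if t not in STOP_WORDS]


def affix_match(a, b, min_len=3):
    """String-based test: is the shorter label a prefix or suffix of the longer one?"""
    shorter, longer = sorted((a.lower(), b.lower()), key=len)
    return len(shorter) >= min_len and (
        longer.startswith(shorter) or longer.endswith(shorter)
    )


print(normalize("Tool_Kit"))          # ['tool', 'kit']
print(normalize("Kits"))              # ['kit']
print(normalize("IsRelatedTo"))       # ['related']
print(affix_match("auth", "author"))  # True
```

A real matcher would replace `lemmatize` with a proper linguistic resource; the point here is only the order of the steps.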
Basic Classification of Matchers [SE05]
• Structure-level techniques
  • Graph matching
    • Children
    • Leaves
    • Relations
  • Taxonomy-based techniques, e.g. if the super-concepts are the same, the sub-concepts are taken to be similar, and vice versa
  • Model based
    • ER, XML or XML Schema, OWL, OO, etc.
Combinational Matchers [RB01]
• Hybrid matcher
• Multiple/composite matcher
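To make the idea of a composite matcher concrete, the sketch below runs two independent matchers (a name matcher and a children-based structure matcher) and combines their scores with a weighted average. The node representation, the weights and all function names are assumptions chosen for illustration; the tools surveyed in [RB01] combine matchers in their own ways.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Element-level matcher: string similarity of two labels, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def children_similarity(children_a, children_b):
    """Structure-level matcher: Jaccard overlap of the child labels."""
    a, b = set(children_a), set(children_b)
    if not a or not b:
        return 0.0  # no structural evidence when a node has no children
    return len(a & b) / len(a | b)


def composite_similarity(node_a, node_b, weights=(0.5, 0.5)):
    """Combine the independent matchers with a weighted average (assumed weights)."""
    (label_a, kids_a), (label_b, kids_b) = node_a, node_b
    w_name, w_struct = weights
    return (w_name * name_similarity(label_a, label_b)
            + w_struct * children_similarity(kids_a, kids_b))


# Hypothetical nodes: (label, list of child labels).
author = ("author", ["name"])
writer = ("writer", ["name"])
publisher = ("publisher", ["address"])

print(composite_similarity(author, writer))
print(composite_similarity(author, publisher))
```

With these inputs, the shared child label raises the author/writer score above the author/publisher score even though the two names are spelled differently.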
Match Dimensions [SE05]
To design a match algorithm, we need to know how it will be used, i.e. its dimensions:
• Input of the algorithm
  • Data or schema, element level or structure level
• Characteristics of the matching process
  • Whether exact or approximate matching is required
  • Performance versus quality
• Output of the algorithm
  • A graded result, or one of a set of match algorithms whose results are combined into the final mapping
Existing Matching Tools
• Cupid [MBR01]
• COMA (COMA++) [ADMR05]
• Similarity Flooding
• SemInt
• Artemis
• DIKE
• TransScm
• AutoMed
• Charlie [TBBT04]
Ontology-specific
• NOM/QOM
• OLA
• Anchor-PROMPT
• S-Match [GSY04]
• HICAL
• SKAT
Matching Tools continued
Machine learning
• GLUE (LSD, CGLUE) [DMDH02]
• Automatch
These tools do not completely fulfil the requirements of large-scale schema matching because they
• Are not fully automated
• Place little emphasis on search-space optimisation
Our Approach
[Figure: example XML schema trees with abbreviated node labels. Legend: a: author, b: book, d: detail, f: information, g: general, h: birth, i: isbn, n: name, o: own-books, p: publisher, r: price, t: title, w: writer. Annotations: a = w, b = o, f = d; search sub-trees.]
• Motivation:
  • Large-scale scenario: peer-to-peer information systems over the XML Web
• Our schema matching and integration approach
  • Tree mining techniques
  • Name matcher
  • Element-level matching
  • Structure-level matching
Tree Mining Approach
[Figure: example schema tree with node scopes: book n0 [0,5]; author n1 [1,2]; name n2 [2,2]; publisher n3 [3,4]; name n4 [4,4]; title n5 [5,5]]
Inspired by tree mining algorithms and data structures based on node scope values, calculated by a top-down, depth-first pre-order traversal [Z02].
• Our work extends these data structures to the schema matching and integration process, in order to handle large sets of XML schema trees.
• Employs
  • An element-level name matcher (same node label or synonym)
  • Clustering of similar/synonym labels
  • The properties of node scope values, to extract semantics from the structure
    • E.g. the node labelled name, n2 [2,2], is a descendant of the node labelled author, n1 [1,2], and not of the node labelled publisher, n3 [3,4]; this is verified with the descendant test
Descendant node check: if the scope of node x is [X,Y] and the scope of node xd is [Xd,Yd], then xd is a descendant of x when Xd > X and Yd <= Y.
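The scope values and the descendant test can be reproduced with a short sketch. It assumes a tree is given as nested `(label, children)` tuples, and that the scope [X,Y] of a node is its pre-order number X together with the pre-order number Y of its last descendant, which is consistent with the values on the slide. The function names are hypothetical.

```python
def assign_scopes(tree):
    """Return [(label, (x, y))] in depth-first pre-order.

    x is the node's pre-order number, y the pre-order number of the
    right-most node in its subtree (y == x for a leaf).
    """
    scopes = []

    def visit(node):
        label, children = node
        x = len(scopes)
        scopes.append(None)          # reserve the slot for this node
        for child in children:
            visit(child)
        scopes[x] = (label, (x, len(scopes) - 1))

    visit(tree)
    return scopes


def is_descendant(scope_d, scope_x):
    """Descendant test from the slide: Xd > X and Yd <= Y."""
    (xd, yd), (x, y) = scope_d, scope_x
    return xd > x and yd <= y


# The example tree from the slide.
book = ("book", [
    ("author", [("name", [])]),
    ("publisher", [("name", [])]),
    ("title", []),
])

scopes = assign_scopes(book)
for label, scope in scopes:
    print(label, scope)

print(is_descendant((2, 2), (1, 2)))  # name n2 under author n1: True
print(is_descendant((2, 2), (3, 4)))  # name n2 under publisher n3: False
```

Because the test only compares two pairs of integers, ancestry can be checked without walking the tree, which is what makes the scope encoding attractive for large forests.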
Tree Mining Approach … continued
• Data structures used
  • Label list: sorted list of all node labels in the forest of XML schema trees
  • xGrid: matrix in which each row represents one participating XML tree and each column represents one node label. Each cell contains the scope values, the parent node number and the mapping information.
• Output
  • Creation of a mediated schema tree from the given forest of participating XML schema trees
  • Generation of mapping information between the participating schema trees and the mediated schema tree
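A possible in-memory layout for the label list and the xGrid is sketched below. Several details are assumptions made for the example rather than taken from the slides: trees are nested `(label, children)` tuples, each cell holds a list because a label can occur more than once in a tree, the root's parent number is -1, and the mapping field is left as `None` since the slides do not say how it is filled.

```python
def flatten(tree):
    """Return [(label, x, y, parent)] in pre-order; parent is -1 for the root."""
    nodes = []

    def visit(node, parent):
        label, children = node
        x = len(nodes)
        nodes.append(None)           # reserve the slot for this node
        for child in children:
            visit(child, x)
        nodes[x] = (label, x, len(nodes) - 1, parent)

    visit(tree, -1)
    return nodes


def build_xgrid(forest):
    """Build the sorted label list and a trees-by-labels grid of cells."""
    flat = [flatten(tree) for tree in forest]
    label_list = sorted({label for nodes in flat for (label, _, _, _) in nodes})
    column = {label: i for i, label in enumerate(label_list)}
    grid = [[[] for _ in label_list] for _ in forest]
    for row, nodes in enumerate(flat):
        for label, x, y, parent in nodes:
            grid[row][column[label]].append(
                {"scope": (x, y), "parent": parent, "mapping": None}
            )
    return label_list, grid


# Two small hypothetical schema trees using labels from the slides.
s1 = ("book", [("author", [("name", [])]),
               ("publisher", [("name", [])]),
               ("title", [])])
s2 = ("book", [("writer", [("name", [])]), ("title", [])])

label_list, grid = build_xgrid([s1, s2])
print(label_list)
print(grid[0][label_list.index("name")])
```

An empty cell means the label does not occur in that tree; for instance, the `author` column is empty in the row for `s2`.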
Tree Mining Approach … continued
[Figure: mediated schema Sm and source schemas S1, S2, S3, S4; the mapping information is the column number of the node.]
Conclusion
• Element-level name and linguistic matching, supported by a thesaurus, is an integral part of every match system.
• As systems move towards schema/ontology-based manipulation, and global schemas or previous matching results are lacking, structure-level matching is equally important for recovering the semantics.
• Peer-to-peer environments require new methods that deliver both performance and mapping quality, i.e. the integration of tree mining techniques for matching and search-space optimisation.
• Machine learning algorithms can be beneficial in the P2P environment at later stages, once training examples have been created from instance data, provided the target domain remains the same.
References
• [AH04] Antoniou G. and van Harmelen F. (2004) A Semantic Web Primer. The MIT Press.
• [ADMR05] Aumuller D., Do H. H., Massmann S. and Rahm E. (2005) Schema and Ontology Matching with COMA++. Proceedings of the International Conference on Management of Data (SIGMOD), 2005.
• [BR04] Bellahsène Z. and Roantree M. (2004) Querying Distributed Data in a Super-peer based Architecture. DEXA 2004.
• [BMP04] Bernstein P. A., Melnik S., Petropoulos M. and Quix C. (2004) Industrial-Strength Schema Mapping. SIGMOD Record, Vol. 33, No. 4, December 2004.
• [DMDH02] Doan A. H., Madhavan J., Domingos P. and Halevy A. (2002) Learning to Map Ontologies on the Semantic Web. WWW 2002.
• [MBR01] Madhavan J., Bernstein P. A. and Rahm E. (2001) Generic Schema Matching with Cupid. VLDB 2001.
• [RB01] Rahm E. and Bernstein P. A. (2001) A Survey of Approaches to Automatic Schema Matching. VLDB Journal 10(4):334-350.
• [SE05] Shvaiko P. and Euzenat J. (2005) A Survey of Schema-based Matching Approaches. Journal on Data Semantics, 2005.
• [TBBT04] Tranier J., Baraer R., Bellahsène Z. and Teisseire M. (2004) Where's Charlie: Family Based Heuristics for Peer-to-Peer Schema Integration. IDEAS 2004, 227-235.
• [Z02] Zaki M. J. (2002) Efficiently Mining Frequent Trees in a Forest. 8th ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining, July 2002.
• http://www.w3.org/TR/daml+oil-reference
• http://www.doc.ic.ac.uk/automed/