
Schema Matching

Schema Matching. A presentation on three papers: Matching Large XML Schemas (Erhard Rahm, Hong-Hai Do, Sabine Maßmann); Putting Context into Schema Matching (Philip Bohannon, Eiman Elnahrawy, Wenfei Fan, Michael Flaster); COMA - A System for Flexible Combination of Schema Matching Approaches (Hong-Hai Do, Erhard Rahm).


Presentation Transcript


  1. Schema Matching
     • Matching Large XML Schemas: Erhard Rahm, Hong-Hai Do, Sabine Maßmann
     • Putting Context into Schema Matching: Philip Bohannon, Eiman Elnahrawy, Wenfei Fan, Michael Flaster
     • COMA - A System for Flexible Combination of Schema Matching Approaches: Hong-Hai Do, Erhard Rahm
     Christiano Santiago

  2. Goals
     • Introductory concepts in schema matching
     • Context-sensitive versus context-insensitive matching
     • Complexity of XSD schemas

  3. Agenda
     • Terminology
     • Different Approaches
     • XML Schema Definition
     • Context-Insensitive
     • Context-Sensitive
     • Q&A

  4. Terminology
     • Schema matching: the process of identifying that two objects are semantically related (meaning).
     • Mapping: the transformations between the objects (conversion).

  5. Terminology
     • Student: Name, SSN, Level, Major, Marks
     • GradStudent: Name, ID, Major, Grades

  6. Schema Matching

  7. Context
     • Context-insensitive
     • Context-sensitive

  8. Different Approaches
     • Schema-level matchers
     • Instance-level matchers
     • Hybrid matchers
     • Reusing matching information

  9. Schema-Level Matchers
     • Only consider schema information
       • Name
       • Description
       • Data type
       • Relationship
       • Constraints
       • Number of nesting levels
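A minimal sketch of a schema-level matcher that looks only at element names, using the Student / GradStudent attributes from slide 5. The helper names (`name_similarity`, `similarity_matrix`) are hypothetical, and plain string similarity from Python's `difflib` stands in for whatever name matcher a real system would use.

```python
from difflib import SequenceMatcher

# Attribute names taken from the Student / GradStudent example on slide 5.
STUDENT = ["Name", "SSN", "Level", "Major", "Marks"]
GRAD_STUDENT = ["Name", "ID", "Major", "Grades"]


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def similarity_matrix(source, target):
    """Score every (source attribute, target attribute) pair by name only."""
    return {(s, t): name_similarity(s, t) for s in source for t in target}


matrix = similarity_matrix(STUDENT, GRAD_STUDENT)
```

Identical names such as Name and Major score 1.0, while SSN and ID share no characters and score 0.0. A name-only matcher therefore misses pairs that are related in meaning but not in spelling, which is one reason the other matcher types exist.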

  10. Instance-Level Matchers
      • Use instance-level data to gain insight into the content and meaning of schema elements
      • Linguistic
        • Dept
        • DeptName
        • EmpName
      • Constraints
        • 416-7362100
        • M3J1P3
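The constraint-based idea can be sketched as follows: infer what a column holds from the shape of its values, then treat columns with the same inferred label as match candidates. The pattern catalogue and the function name `classify_column` are assumptions; the two regular expressions are fitted only to the two sample values on the slide.

```python
import re

# Hypothetical pattern catalogue. The regexes are assumptions fitted to the
# two sample values on the slide, not a general phone or postal-code format.
PATTERNS = {
    "phone": re.compile(r"\d{3}-\d{7}"),
    "postal_code": re.compile(r"[A-Z]\d[A-Z]\d[A-Z]\d"),
}


def classify_column(values):
    """Return (label, share) for the pattern matched by the largest share of values."""
    best_label, best_share = None, 0.0
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in values if pattern.fullmatch(v))
        share = hits / len(values) if values else 0.0
        if share > best_share:
            best_label, best_share = label, share
    return best_label, best_share
```

For example, `classify_column(["416-7362100"])` yields the label `"phone"` and `classify_column(["M3J1P3"])` yields `"postal_code"`, regardless of what the columns are called in their schemas.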

  11. Hybrid-Level Matchers
      • Combine more than one approach

  12. Reusing Matching Information
      • Use previous matching information for future matching tasks
      • Structures or substructures often repeat
      • Caution
        • Salary & Income
          • Payroll
          • Tax Reporting
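One way to act on the caution above is to store confirmed matches per application domain, so that a match confirmed for payroll is not silently reused for tax reporting. This is a sketch under that assumption; the class `MatchRepository` and the domain labels are hypothetical and do not come from the papers.

```python
class MatchRepository:
    """Stores confirmed matches per application domain (hypothetical structure)."""

    def __init__(self):
        self._matches = {}

    def confirm(self, domain, a, b):
        """Record that elements a and b were confirmed as a match in this domain."""
        self._matches.setdefault(domain, set()).add(frozenset((a, b)))

    def is_known_match(self, domain, a, b):
        """True only if the pair was confirmed in the same domain."""
        return frozenset((a, b)) in self._matches.get(domain, set())


repo = MatchRepository()
repo.confirm("payroll", "Salary", "Income")
```

With this data, `repo.is_known_match("payroll", "Income", "Salary")` is true, while the same question for `"tax_reporting"` is false.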

  13. XML Schema Definition (XSD)
      • Data types
        • 19 built-in primitive data types
        • 25 built-in derived data types
        • User-defined complex types

  14. XML Schema Definition (XSD)
      • Complex type definition:

        <complexType name="myNewNameType">
          <complexContent>
            <restriction base="anyType">
              <sequence>
                <element name="name" type="string" />
                <element name="location" type="string" />
              </sequence>
              <attribute name="position" type="string" />
            </restriction>
          </complexContent>
        </complexType>

        <element name="employee" type="dc:myNewNameType" />

      • Instance, with child elements (name, location) and an attribute (position):

        <dc:employee position="trainer">
          <dc:name>Don Smith</dc:name>
          <dc:location>Dallas, TX</dc:location>
        </dc:employee>

  15. XML Schema Definition (XSD)
      • Shared schema components

  16. XML Schema Definition (XSD)
      • Match system approaches
        • COMA: path-based
        • Cupid: materialized
      • Scalability issue: the XCBL Order schema contains 1451 components, including 91 shared types. After resolving the shared components, 26000+ nodes/paths were identified.
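The growth from components to paths can be illustrated on a toy schema. In this sketch the schema graph, its element names, and the function `enumerate_paths` are all hypothetical; the point is only that a shared type is counted once as a component but once per context as a path.

```python
# Hypothetical schema graph: each component maps to its children.
# "Address" is a shared type referenced by two parents.
SCHEMA = {
    "Order": ["ShipTo", "BillTo"],
    "ShipTo": ["Address"],
    "BillTo": ["Address"],
    "Address": ["Street", "City"],
    "Street": [],
    "City": [],
}


def enumerate_paths(graph, root):
    """Yield every root-to-node path; shared components appear once per context."""
    stack = [(root,)]
    while stack:
        path = stack.pop()
        yield path
        for child in graph[path[-1]]:
            stack.append(path + (child,))


paths = list(enumerate_paths(SCHEMA, "Order"))
```

Here 6 components expand to 9 paths because Address and its two children are reached through both ShipTo and BillTo. With many shared types nested inside each other, the same effect compounds.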

  17. XML Schema Definition (XSD)
      • Distributed schemas
        • XSD allows a schema to be distributed over several schema documents (.xsd files) and namespaces

  18. XML Schema Definition (XSD)
      • Determining the similarity between complex types, and matching them, can be as difficult as matching two complete schemas.

  19. Standard Schema Matching (Context-Insensitive)
      • Matchers
        • Matching algorithms to compute similarity scores between a pair of attributes
      • Weights
        • Scores are weighted
        • Confidence scores are identified based on standard statistical techniques
      • Selection of best matches
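The three steps above (matcher scores, weighting, selection) can be sketched as follows. The matcher names, scores, and weights are made-up illustrative values, and a fixed threshold stands in for the statistical confidence computation mentioned on the slide, which is not reproduced here.

```python
def combine(scores_by_matcher, weights):
    """Weighted average of per-matcher scores for each attribute pair."""
    total = sum(weights.values())
    pairs = set().union(*(s.keys() for s in scores_by_matcher.values()))
    return {
        pair: sum(
            weights[m] * scores.get(pair, 0.0)
            for m, scores in scores_by_matcher.items()
        ) / total
        for pair in pairs
    }


def select_best(combined, threshold=0.5):
    """For each source attribute, keep the highest-scoring target at or above threshold."""
    best = {}
    for (src, tgt), score in combined.items():
        if score >= threshold and score > best.get(src, (None, 0.0))[1]:
            best[src] = (tgt, score)
    return best


# Hypothetical scores from two matchers for the source attribute "Marks".
scores = {
    "name": {("Marks", "Grades"): 0.2, ("Marks", "Major"): 0.6},
    "datatype": {("Marks", "Grades"): 1.0, ("Marks", "Major"): 0.0},
}
weights = {"name": 1.0, "datatype": 1.0}
selected = select_best(combine(scores, weights))
```

With these values the name matcher alone would prefer Major, but the combined score favours Grades, so `selected` maps Marks to Grades.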

  20. Fragment-Based Schema Matching (Context-Insensitive)
      • Fragment identification
      • Identifying fragment-pair candidates
      • Fragment matching
      • Result combination
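The four steps can be sketched on toy data. This is an illustration under simplifying assumptions, not the prototype's algorithm: fragments are given up front as a mapping from a fragment root to its element names, candidate pairs are chosen by root-name similarity alone, and the function names and thresholds are hypothetical.

```python
from difflib import SequenceMatcher


def sim(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_fragments(source_frags, target_frags, pair_threshold=0.6, elem_threshold=0.6):
    """Match elements only inside fragment pairs whose roots look similar."""
    # Step 2: candidate fragment pairs, chosen by root-name similarity.
    candidates = [
        (s, t)
        for s in source_frags
        for t in target_frags
        if sim(s, t) >= pair_threshold
    ]
    # Steps 3 and 4: match within each candidate pair, then merge the results.
    result = {}
    for s, t in candidates:
        for a in source_frags[s]:
            for b in target_frags[t]:
                score = sim(a, b)
                if score >= elem_threshold:
                    result[(f"{s}/{a}", f"{t}/{b}")] = score
    return result


# Step 1 (fragment identification) is assumed done; these fragments are made up.
source = {"ShipTo": ["Street", "City"], "Item": ["Price"]}
target = {"ShipToAddress": ["Street", "Town"], "Product": ["Price"]}
result = match_fragments(source, target)
```

Only ShipTo and ShipToAddress become a candidate pair, so far fewer element comparisons are made than in a full cross product. The cost is visible too: Item and Product are never paired by this crude candidate step, so their Price elements are not compared.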

  21. Prototype
      • Based on COMA: COmbining MAtch algorithm
      • Support for schemas spread over multiple files
      • Multiple matching strategies
      • Fragment-based approach
      • Result combination

  22. COMA
      • Schema representation
        • Schemas are represented by rooted DAGs (Directed Acyclic Graphs).

  23. COMA
      • Directed Acyclic Graphs
        • Directed graph
        • With no cycles
        • Part tree and part graph
        • Used in Critical Path Analysis, Expression Tree Evaluation and Game Evaluation

  24. COMA
      • Match processing reusability

  25. Continuity of this work
      • 2004: COMA prototype
      • 2005: COMA++, which extended the previous COMA prototype
        • High quality and fast execution times
        • Default combination of 4 matchers
      • 2007: MOMA: Mapping-based Object Matching

  26. Context Schema Matching (Context-Sensitive)
      • False Negatives
        • RS.price.prcode = "reg": RS.price.price → RT.music.price
        • RS.price.prcode = "sale": RS.price.price → RT.music.sale
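The two contextual matches above can be read as conditional mappings: the same source attribute feeds a different target attribute depending on the value of prcode. The sketch below applies them to rows of the source price table. The `id` key column, the sample rows, and the function `apply_matches` are assumptions added for illustration.

```python
# Contextual matches from the slide:
# (condition on a source row, source attribute, target attribute).
CONTEXTUAL_MATCHES = [
    (lambda row: row["prcode"] == "reg", "price", "price"),
    (lambda row: row["prcode"] == "sale", "price", "sale"),
]


def apply_matches(rows, key):
    """Pivot source rows into one target record per key value."""
    target = {}
    for row in rows:
        record = target.setdefault(row[key], {})
        for condition, src_attr, tgt_attr in CONTEXTUAL_MATCHES:
            if condition(row):
                record[tgt_attr] = row[src_attr]
    return target


# Hypothetical source rows; the "id" key column is an assumption.
rows = [
    {"id": 1, "prcode": "reg", "price": 15.0},
    {"id": 1, "prcode": "sale", "price": 12.0},
]
music = apply_matches(rows, key="id")
```

A context-insensitive matcher sees a single price column and can map it to at most one of the two target attributes, which is where the false negative comes from. With the conditions attached, both target attributes are filled.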

  27. Context Schema Matching (Context-Sensitive)
      • Two techniques for selecting contextual matches:
        • MultiTable: find the single match with the highest confidence for every target attribute
        • QualTable: find the best matches on a per-table basis

  28. Context Schema Matching (Context-Sensitive)
      • Experimental Results
        • "Because of its poor performance, MultiTable is not considered further"

  29. Conclusion
      • Current schema matching approaches still need to improve for large and complex schemas.
      • The large search space increases both the likelihood of false matches and execution times.
      • Further difficulties for schema matching are posed by the high expressive power and versatility of modern schema languages like XSD.

  30. Questions
