
A Semantic Approach to Discovering Schema Mapping

Yuan An, Alex Borgida, Renee J. Miller, and John Mylopoulos. Presented by: Kristine Monteith.

Presentation Transcript


  1. A Semantic Approach to Discovering Schema Mapping. Yuan An, Alex Borgida, Renee J. Miller, and John Mylopoulos. Presented by: Kristine Monteith

  2. Overview • Goal of the paper: match schemas using more than simple element correspondences (i.e., can we improve on a naïve mapping?)

  3. Overview • Approach: derive a conceptual model for the semantics of each table, then match the conceptual model of the source schema to the conceptual model of the target schema • E.g., can we figure out that a source schema like this: [source schema figure not included in transcript] can match a target schema like this: hasBookSoldAt(aname,sid)

  4. Example 1

  5. Baseline solution: Referential Integrity constraints
     • Find correspondences
       • v1: connect person.pname to hasBookSoldAt.aname
       • v2: connect bookstore.sid to hasBookSoldAt.sid
     • Create logical relations using referential constraints
       • S1: person(pname) ⋈ writes(pname, bid) ⋈ book(bid)
       • S2: book(bid) ⋈ soldAt(bid, sid) ⋈ bookstore(sid)
       • S3: person(pname)
       • S4: bookstore(sid)
     • Look at target
       • T1: hasBookSoldAt(aname, sid)
     • Look at each pair of source and target relations and check which correspondences are “covered”
       • <S1,T1,v1>
       • <S2,T1,v2>
       • <S3,T1,v1>
       • <S4,T1,v2>
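The coverage check on this slide can be sketched in a few lines of Python. This is only an illustration, not the paper's implementation: the column-set encoding of each logical relation and the function name `covered_pairs` are assumptions made here.

```python
# Hypothetical encoding of the slide's example. Each logical relation is
# represented by the set of table.column names it exposes.
source_relations = {
    "S1": {"person.pname", "writes.pname", "writes.bid", "book.bid"},
    "S2": {"book.bid", "soldAt.bid", "soldAt.sid", "bookstore.sid"},
    "S3": {"person.pname"},
    "S4": {"bookstore.sid"},
}
target_relations = {
    "T1": {"hasBookSoldAt.aname", "hasBookSoldAt.sid"},
}
correspondences = {
    "v1": ("person.pname", "hasBookSoldAt.aname"),
    "v2": ("bookstore.sid", "hasBookSoldAt.sid"),
}


def covered_pairs(sources, targets, corrs):
    """Return (source, target, correspondence) triples where the source
    relation contains the correspondence's source column and the target
    relation contains its target column."""
    triples = []
    for s_name, s_cols in sources.items():
        for t_name, t_cols in targets.items():
            for v_name, (src_col, tgt_col) in corrs.items():
                if src_col in s_cols and tgt_col in t_cols:
                    triples.append((s_name, t_name, v_name))
    return triples


pairs = covered_pairs(source_relations, target_relations, correspondences)

# Source relations that cover both v1 and v2 for T1 (none in this example).
covers_both = [
    s for s in source_relations
    if {(s, "T1", "v1"), (s, "T1", "v2")} <= set(pairs)
]
```

In this toy encoding no single source relation covers both v1 and v2, which is why the baseline cannot fill a complete hasBookSoldAt(aname, sid) tuple on its own.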

  6. Ask the user about the following: • Doesn’t present an entire tuple to match the target query: hasBookSoldAt(aname,sid)

  7. What this paper seeks to accomplish: • Generate the following: • compose “writes” and “soldAt” to produce a new semantic connection between “person” and “bookstore”
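As a rough illustration of what composing “writes” and “soldAt” means, the sketch below joins the two relations on bid and projects out (pname, sid). The sample rows and the function name `compose` are invented for this example; the paper derives the connection from the conceptual model rather than from data.

```python
# Toy instances of the source tables (hypothetical data).
writes = [("Ann", "b1"), ("Bob", "b2"), ("Ann", "b3")]   # (pname, bid)
sold_at = [("b1", "s1"), ("b2", "s1"), ("b2", "s2")]     # (bid, sid)


def compose(writes_rows, sold_at_rows):
    """Join writes and soldAt on bid and keep only (pname, sid)."""
    stores_by_book = {}
    for bid, sid in sold_at_rows:
        stores_by_book.setdefault(bid, set()).add(sid)

    result = set()
    for pname, bid in writes_rows:
        for sid in stores_by_book.get(bid, ()):
            result.add((pname, sid))
    return result


# Each pair is a candidate tuple for the target hasBookSoldAt(aname, sid).
has_book_sold_at = compose(writes, sold_at)
```

Book b3 is not sold anywhere in the toy data, so it contributes no tuple.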

  8. Approach: Representing Semantics of Schemas
     • Create a Conceptual Model (CM) graph
       • Create nodes for classes and attributes
       • Create directed edges for relationships and their inverses
         • C1 ---ISA--- C2: subclasses
         • C ---p--- D: relationships
         • C ---p->-- D: functional relationships
       • Duplicate concept nodes to represent recursive relationships
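One possible in-memory representation of such a CM graph is sketched below. The class name `CMGraph`, its methods, the `^-1` suffix for inverse edges, and the attribute names pid, did, and eid are all assumptions for illustration; the relationship names come from the example on slide 14.

```python
from dataclasses import dataclass, field


@dataclass
class CMGraph:
    attributes: dict = field(default_factory=dict)  # class name -> set of attribute names
    edges: list = field(default_factory=list)       # (src, label, dst, is_functional)
    isa: set = field(default_factory=set)           # (subclass, superclass) pairs

    def add_class(self, name, attrs=()):
        self.attributes.setdefault(name, set()).update(attrs)

    def add_relationship(self, src, label, dst,
                         functional=False, inverse_functional=False):
        # Store the relationship and its inverse as two directed edges.
        self.edges.append((src, label, dst, functional))
        self.edges.append((dst, label + "^-1", src, inverse_functional))

    def add_isa(self, sub, sup):
        self.isa.add((sub, sup))

    def functional_successors(self, node):
        """Classes reachable from node over one functional edge."""
        return [dst for src, _, dst, fn in self.edges if src == node and fn]


g = CMGraph()
for name, attrs in [("Project", {"pid"}), ("Department", {"did"}), ("Employee", {"eid"})]:
    g.add_class(name, attrs)
g.add_relationship("Project", "controlledBy", "Department", functional=True)
g.add_relationship("Department", "hasManager", "Employee", functional=True)
```

Recursive relationships would be handled by adding a duplicate class node (for example, a second copy of Employee), as the last bullet describes; that step is left out here.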

  9. Generating Mapping Candidates
     • Problem description
       • Inputs:
         • A source relational schema S and a target relational schema T
         • A conceptual model (GS and GT, respectively) associated with each relational schema via table semantic mappings
         • A set of correspondences L linking a set L(S) of columns in S to a set L(T) of columns in T
       • Goal:
         • A pair of expressions <E1,E2> that are “semantically similar” in terms of modeling the subject matter

  10. Marked Nodes • The set L(S) of columns gives rise to a set CS of marked class nodes in the graph GS • Likewise, the set L(T) gives rise to a set CT of marked class nodes in the graph GT
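A small sketch of how marked nodes could be computed, assuming the table semantics is available as a plain column-to-class dictionary. The dictionary, the class names Person, Book, and Bookstore, and the function name `marked_classes` are hypothetical.

```python
# Hypothetical table semantics: which class node each column's attribute belongs to.
column_to_class = {
    "person.pname": "Person",
    "writes.pname": "Person",
    "book.bid": "Book",
    "bookstore.sid": "Bookstore",
}


def marked_classes(linked_columns, semantics):
    """Class nodes that own an attribute tied to a linked column."""
    return {semantics[col] for col in linked_columns if col in semantics}


# L(S): the source columns that take part in correspondences v1 and v2.
linked_source_columns = {"person.pname", "bookstore.sid"}
marked_source_nodes = marked_classes(linked_source_columns, column_to_class)
```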

  11. Basic Algorithm
      • Create conceptual subgraphs
        • Find a subgraph D1 connecting concept nodes in CS and a subgraph D2 connecting concept nodes in CT such that D1 and D2 are “semantically similar”
      • Suggest possible mapping candidates
        • Translate D1 and D2 into algebraic expressions E1 and E2 and return the triple <E1,E2,LM> as a mapping candidate

  12. Creating conceptual subgraphs
      • Notice simple matches
        • A node v in CS corresponds to a node u in CT when v and u have attributes that are associated with corresponding columns via the table semantics
      • More complicated rules
        • The connections (v1,v2) and (u1,u2) should be “semantically similar” or at least “compatible” (cardinality constraints, relationships like “is-a” or “part of”)
      • Use edges from pre-selected trees
        • Represent “intuitively meaningful” concepts
        • Favor smaller trees (Occam’s razor)
      • Other considerations
        • Favor lossless joins
        • Reject contradictions

  13. Example • Looking for a functional tree with a root corresponding to the anchor Proj

  14. Example • Notice simple matches • Find a tree with minimal cost (edges in pre-selected trees don’t contribute to cost) • Find a tree containing the largest number of edges from the pre-selected trees • Result: Project ---controlledBy->-- Department --hasManager->-- Employee
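The cost rule on this slide (edges in pre-selected trees are free) can be illustrated with a shortest-path search from the anchor. This is a simplification and not the paper's algorithm: it uses Dijkstra's algorithm to find the cheapest functional path to each node rather than a true minimal tree, charges 1 for every edge that is not pre-selected, and adds a hypothetical direct edge `ledBy` only to show the preference at work.

```python
import heapq


def cheapest_functional_paths(edges, root, preselected=frozenset()):
    """Dijkstra from root over directed functional edges.

    edges: iterable of (src, label, dst) triples.
    Edges listed in preselected cost 0; every other edge costs 1.
    Returns {node: (cost, list of edges on the cheapest path from root)}.
    """
    adjacency = {}
    for src, label, dst in edges:
        adjacency.setdefault(src, []).append((label, dst))

    best = {root: (0, [])}
    heap = [(0, root, [])]
    while heap:
        cost, node, path = heapq.heappop(heap)
        if cost > best[node][0]:
            continue  # stale heap entry
        for label, dst in adjacency.get(node, []):
            step = 0 if (node, label, dst) in preselected else 1
            new_cost = cost + step
            if dst not in best or new_cost < best[dst][0]:
                new_path = path + [(node, label, dst)]
                best[dst] = (new_cost, new_path)
                heapq.heappush(heap, (new_cost, dst, new_path))
    return best


functional_edges = [
    ("Project", "controlledBy", "Department"),
    ("Department", "hasManager", "Employee"),
    ("Project", "ledBy", "Employee"),  # hypothetical edge, not on the slide
]
# Edges assumed to come from pre-selected trees (the table semantics).
preselected_edges = frozenset(functional_edges[:2])

best = cheapest_functional_paths(functional_edges, "Project", preselected_edges)
```

With these inputs the two-edge path through Department costs 0 and beats the direct hypothetical edge, which costs 1.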

  15. More complicated Example • Still looking for low-cost, minimal trees to connect Employee to Project • Same answer: Project ---controlledBy->-- Department --hasManager->-- Employee

  16. Dealing with n-ary Relations • StoreSells(Person, Product)

  17. Considerations for Reified Relationships • A path of length 2 passing through a reified relationship node should be considered to be length 1 • The semantic category of a target tree rooted at a reified relationship induces preferences for similarly rooted (minimal) functional trees in the source (cardinality restrictions, number of roles, subclass relationship to top level ontology concept)
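The first consideration can be expressed as a small adjustment to path length. The sketch below is an assumption-laden illustration: paths are given as lists of node names, and the reified node `Sells` is a hypothetical stand-in for a relationship such as the one on the previous slide.

```python
def path_length(path_nodes, reified):
    """Number of edges on a path, counting the two edges that pass
    through an interior reified relationship node as a single edge."""
    edge_count = len(path_nodes) - 1
    interior_reified = sum(1 for node in path_nodes[1:-1] if node in reified)
    return edge_count - interior_reified


reified_nodes = {"Sells"}  # hypothetical reified relationship node

through_reified = path_length(["Person", "Sells", "Product"], reified_nodes)
through_classes = path_length(["Project", "Department", "Employee"], reified_nodes)
```

A path that merely starts or ends at a reified node is not shortened, since only interior nodes are counted.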

  18. Obtaining Relational Expressions

  19. Experimental Results

  20. Average precision

  21. Average recall

  22. Conclusions • The semantic approach performs at least as well as the RIC-based approach on the datasets studied • It made significant improvements in some cases • Many of the datasets did not have complicated schemas; a semantic approach didn’t provide as much benefit in those cases

  23. Strengths/Weaknesses • Strengths • Lots of examples • Provides a useful solution to a common problem • Weaknesses • Formalism sometimes made things more complicated rather than more clear • Assumes a lot of background knowledge

  24. Future Work • Embed this functionality into pre-existing mapping tools (they suggest Clio, since a lot of their work is based on it) • Add negation to the semantic representation • Investigate more complex semantic mappings

  25. Questions???
