A Semantic Approach to Discovering Schema Mapping Yuan An, Alex Borgida, Renee J. Miller, and John Mylopoulos Presented by: Kristine Monteith
Overview
• Goal of the paper: match schemas using more than simple element correspondences (i.e., can we improve on a naïve mapping?)
Overview
• Approach: derive a conceptual model of the semantics behind each table, then match the conceptual model of the source schema to the conceptual model of the target schema
• e.g. Can we figure out that a source schema like this: person(pname), writes(pname,bid), book(bid), soldAt(bid,sid), bookstore(sid)
• can match a target schema like this: hasBookSoldAt(aname,sid)
Baseline solution: Referential Integrity Constraints (RIC)
• Find correspondences
  • v1: connect person.pname to hasBookSoldAt.aname
  • v2: connect bookstore.sid to hasBookSoldAt.sid
• Create logical relations using referential constraints
  • S1: person(pname) ⋈ writes(pname,bid) ⋈ book(bid)
  • S2: book(bid) ⋈ soldAt(bid,sid) ⋈ bookstore(sid)
  • S3: person(pname)
  • S4: bookstore(sid)
• Look at the target
  • T1: hasBookSoldAt(aname,sid)
• Look at each pair of source and target relations and check which correspondences are “covered”
  • <S1,T1,v1>
  • <S2,T1,v2>
  • <S3,T1,v1>
  • <S4,T1,v2>
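The coverage check on this slide can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each logical relation is reduced to the set of tables it joins, and a correspondence counts as covered when its source table and its target table both appear in the pair. The function names `table_of` and `covered_triples` are made up for this sketch.

```python
# Correspondences from the slide: name -> (source column, target column).
correspondences = {
    "v1": ("person.pname", "hasBookSoldAt.aname"),
    "v2": ("bookstore.sid", "hasBookSoldAt.sid"),
}

# Logical relations, simplified to the set of tables each one joins.
source_relations = {
    "S1": {"person", "writes", "book"},
    "S2": {"book", "soldAt", "bookstore"},
    "S3": {"person"},
    "S4": {"bookstore"},
}
target_relations = {"T1": {"hasBookSoldAt"}}


def table_of(column):
    """Return the table part of a 'table.column' name."""
    return column.split(".")[0]


def covered_triples(sources, targets, corrs):
    """List every <source, target, correspondence> triple that is covered."""
    triples = []
    for s_name, s_tables in sources.items():
        for t_name, t_tables in targets.items():
            for v_name, (src_col, tgt_col) in corrs.items():
                if table_of(src_col) in s_tables and table_of(tgt_col) in t_tables:
                    triples.append((s_name, t_name, v_name))
    return triples


triples = covered_triples(source_relations, target_relations, correspondences)

# Source relations that cover BOTH correspondences at once (none here).
covering_all = [
    s for s in source_relations
    if {v for (s2, _, v) in triples if s2 == s} == set(correspondences)
]
print(triples)
print(covering_all)
```

Under these assumptions the sketch reproduces the four triples listed on the slide, and `covering_all` comes out empty: no single logical relation supplies both aname and sid.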
The baseline asks the user about each covered candidate:
• Problem: no single candidate presents an entire tuple (both aname and sid) to match the target hasBookSoldAt(aname,sid)
What this paper seeks to accomplish:
• Generate a mapping that composes “writes” and “soldAt” to produce a new semantic connection between “person” and “bookstore”
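To make the idea of composing “writes” and “soldAt” concrete, here is a minimal sketch that treats both as sets of pairs and joins them on the book id. The sample tuples are invented for illustration, and `compose` is a hypothetical helper, not something taken from the paper.

```python
# Hypothetical sample data: writes(pname, bid) and soldAt(bid, sid).
writes = {("Ann", "b1"), ("Bob", "b2")}
sold_at = {("b1", "s10"), ("b1", "s11"), ("b2", "s10")}


def compose(r, s):
    """Join r(a, b) with s(b, c) on b and project the result onto (a, c)."""
    return {(a, c) for (a, b1) in r for (b2, c) in s if b1 == b2}


# Each result pair fills both columns of hasBookSoldAt(aname, sid).
has_book_sold_at = compose(writes, sold_at)
print(sorted(has_book_sold_at))
```

Unlike the baseline candidates, every tuple produced this way carries both an author name and a store id.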
Approach: Representing Semantics of Schemas
• Create a Conceptual Model (CM) graph
• Create nodes for classes and attributes
• Create directed edges for relationships and their inverses
  • C1 ---ISA--- C2: subclasses
  • C ---p--- D: relationships
  • C ---p->-- D: functional relationships
• Duplicate concept nodes to represent recursive relationships
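One possible in-memory representation of such a CM graph is sketched below. The class names `Edge` and `CMGraph`, the `^-1` suffix for inverse edges, the attribute names, and the decision to flag ISA edges as functional are all assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    source: str
    label: str
    target: str
    functional: bool = False  # True for edges drawn as ---p->--


@dataclass
class CMGraph:
    attributes: dict = field(default_factory=dict)  # class name -> attribute names
    edges: list = field(default_factory=list)

    def add_class(self, name, attrs=()):
        self.attributes[name] = list(attrs)

    def add_relationship(self, source, label, target,
                         functional=False, inverse_functional=False):
        # Store the relationship together with its inverse, as on the slide.
        self.edges.append(Edge(source, label, target, functional))
        self.edges.append(Edge(target, label + "^-1", source, inverse_functional))

    def add_isa(self, sub, sup):
        # Assumption of this sketch: an ISA edge is treated as functional.
        self.edges.append(Edge(sub, "ISA", sup, True))

    def functional_edges_from(self, node):
        return [e for e in self.edges if e.source == node and e.functional]


graph = CMGraph()
graph.add_class("Project", ["pid"])        # attribute names are hypothetical
graph.add_class("Department", ["did"])
graph.add_class("Employee", ["eid"])
graph.add_relationship("Project", "controlledBy", "Department", functional=True)
graph.add_relationship("Department", "hasManager", "Employee", functional=True)

print([e.label for e in graph.functional_edges_from("Project")])
```

Keeping the inverse as an explicit edge lets later steps walk the graph in either direction while still knowing which direction is functional.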
Generating Mapping Candidates
• Problem description
• Inputs:
  • A source relational schema S and a target relational schema T
  • A conceptual model (G_S and G_T, respectively) associated with each relational schema via table semantic mappings
  • A set of correspondences L linking a set L(S) of columns in S to a set L(T) of columns in T
• Goal:
  • A pair of expressions <E1,E2> which are “semantically similar” in terms of modeling the subject matter
Marked Nodes
• The set L(S) of columns gives rise to a set C_S of marked class nodes in the graph G_S
• Likewise, the set L(T) gives rise to a set C_T of marked class nodes in the graph G_T
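A small sketch of how the marked class nodes could be derived from the linked columns. The table semantics are assumed here to be a plain dictionary from column to (class node, attribute); the class names `Person`, `Book`, and `Bookstore` and the helper `marked_class_nodes` are hypothetical.

```python
# Hypothetical table semantics for the source: column -> (class node, attribute).
source_semantics = {
    "person.pname": ("Person", "pname"),
    "book.bid": ("Book", "bid"),
    "bookstore.sid": ("Bookstore", "sid"),
}

# L(S): the source columns that take part in some correspondence.
linked_source_columns = ["person.pname", "bookstore.sid"]


def marked_class_nodes(linked_columns, semantics):
    """Collect the class nodes whose attributes realise the linked columns."""
    return {semantics[col][0] for col in linked_columns if col in semantics}


marked_source = marked_class_nodes(linked_source_columns, source_semantics)
print(sorted(marked_source))
```

The same function applied to L(T) and the target table semantics would give the marked nodes on the target side.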
Basic Algorithm
• Create conceptual subgraphs
  • Find a subgraph D1 connecting concept nodes in C_S, and a subgraph D2 connecting concept nodes in C_T, such that D1 and D2 are “semantically similar”
• Suggest possible mapping candidates
  • Translate D1 and D2 into algebraic expressions E1 and E2 and return the triple <E1,E2,L_M> as a mapping candidate
Creating conceptual subgraphs
• Notice simple matches
  • A node v in C_S corresponds to a node u in C_T when v and u have attributes that are associated with corresponding columns via the table semantics
• More complicated rules
  • The connections (v1,v2) and (u1,u2) should be “semantically similar” or at least “compatible” (cardinality constraints, relationships like “is-a” or “part of”)
• Use edges from pre-selected trees
  • Represent “intuitively meaningful” concepts
  • Favor smaller trees (Occam’s razor)
• Other considerations
  • Favor lossless joins
  • Reject contradictions
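The “simple matches” rule can be sketched as follows: a source class node and a target class node correspond when some correspondence links a column of one to a column of the other. The dictionaries and the target class name `Author` are assumptions made for this example, as is the helper name `corresponding_nodes`.

```python
# Hypothetical table semantics: column -> (class node, attribute).
source_semantics = {
    "person.pname": ("Person", "pname"),
    "bookstore.sid": ("Bookstore", "sid"),
}
target_semantics = {
    "hasBookSoldAt.aname": ("Author", "aname"),
    "hasBookSoldAt.sid": ("Bookstore", "sid"),
}

# Column correspondences v1 and v2.
correspondences = [
    ("person.pname", "hasBookSoldAt.aname"),
    ("bookstore.sid", "hasBookSoldAt.sid"),
]


def corresponding_nodes(corrs, src_sem, tgt_sem):
    """Pair up class nodes whose attributes sit behind corresponding columns."""
    pairs = set()
    for src_col, tgt_col in corrs:
        if src_col in src_sem and tgt_col in tgt_sem:
            pairs.add((src_sem[src_col][0], tgt_sem[tgt_col][0]))
    return pairs


node_pairs = corresponding_nodes(correspondences, source_semantics, target_semantics)
print(sorted(node_pairs))
```

The more complicated rules on the slide (compatible cardinalities, is-a and part-of relationships) would act as further filters on how these paired nodes are connected; they are not modeled here.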
Example • Looking for a functional tree with a root corresponding to the anchor Proj
Example
• Notice simple matches
• Find a tree with minimal cost (edges in pre-selected trees don’t contribute to cost)
• Find a tree containing the largest number of edges from the pre-selected trees
• Result: Project ---controlledBy->-- Department ---hasManager->-- Employee
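The cost rule in this example can be illustrated with a shortest-path search in which edges from pre-selected trees cost 0 and all other edges cost 1. This is a simplified sketch for a single root and a single goal node, not the paper's tree-construction procedure; the direct `hasSupervisor` edge is invented purely to give the search a competing, more expensive option.

```python
import heapq

# Functional edges as (source, label, target). The third one is hypothetical.
edges = [
    ("Project", "controlledBy", "Department"),
    ("Department", "hasManager", "Employee"),
    ("Project", "hasSupervisor", "Employee"),
]

# Edges that already appear in pre-selected trees: these are free.
preselected = {edges[0], edges[1]}


def cheapest_path(edges, preselected, root, goal):
    """Dijkstra search where pre-selected edges cost 0 and others cost 1."""
    adjacency = {}
    for edge in edges:
        adjacency.setdefault(edge[0], []).append(edge)
    queue = [(0, root, [])]
    best = {}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in best and best[node] <= cost:
            continue  # already expanded this node at an equal or lower cost
        best[node] = cost
        for edge in adjacency.get(node, []):
            step = 0 if edge in preselected else 1
            heapq.heappush(queue, (cost + step, edge[2], path + [edge]))
    return None


cost, path = cheapest_path(edges, preselected, "Project", "Employee")
print(cost, [label for (_, label, _) in path])
```

With these inputs the two-edge route through Department wins at cost 0 even though a one-edge route exists, which mirrors the preference for pre-selected edges stated on the slide.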
More complicated example
• Still looking for low-cost, minimal trees to connect Employee to Project
• Same answer: Project ---controlledBy->-- Department ---hasManager->-- Employee
Dealing with n-ary Relations • StoreSells(Person, Product)
Considerations for Reified Relationships
• A path of length 2 passing through a reified relationship node should be considered to have length 1
• The semantic category of a target tree rooted at a reified relationship induces preferences for similarly rooted (minimal) functional trees in the source (cardinality restrictions, number of roles, subclass relationship to a top-level ontology concept)
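The length rule for reified relationships can be sketched as a small adjustment to ordinary path length. The node names reuse the StoreSells example, and the function `effective_length` is hypothetical. The sketch assumes that reified nodes only appear in the interior of a path and that no two of them are adjacent.

```python
def effective_length(path, reified):
    """Count edges on a path, treating a pass through a reified node as one edge.

    path    -- list of node names, in order
    reified -- set of node names that are reified relationships
    """
    edge_count = len(path) - 1
    interior = path[1:-1]
    # Each interior reified node merges its two incident edges into one.
    return edge_count - sum(1 for node in interior if node in reified)


reified_nodes = {"StoreSells"}
through_reified = effective_length(["Person", "StoreSells", "Product"], reified_nodes)
ordinary = effective_length(["Project", "Department", "Employee"], reified_nodes)
print(through_reified, ordinary)
```

This keeps a binary relationship that happens to be modeled as a node from being penalized relative to the same relationship modeled as a plain edge.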
Conclusions
• The semantic approach performs at least as well as the RIC-based approach on the datasets studied
• The semantic approach made significant improvements in some cases
• Many of the datasets did not have complicated schemas; the semantic approach didn’t provide as much benefit in those cases
Strengths/Weaknesses
• Strengths
  • Lots of examples
  • Provides a useful solution to a common problem
• Weaknesses
  • Formalism sometimes made things more complicated rather than clearer
  • Assumes a lot of background knowledge
Future Work
• Embed this functionality into pre-existing mapping tools (they suggest Clio, since a lot of their work builds on it)
• Add negation to the semantic representation
• Investigate more complex semantic mappings