330 likes | 448 Views
Generic Model Management A Database Infrastructure for Schema Manipulation. Philip A. Bernstein Microsoft Corporation September 6, 2001. The Problem. There is 30 years of DB Research on meta data But we don’t have great infrastructure to offer
E N D
Generic Model ManagementA Database Infrastructure for Schema Manipulation Philip A. Bernstein Microsoft Corporation September 6, 2001
The Problem • There is 30 years of DB Research on meta data • But we don’t have great infrastructure to offer • Most design tools and web services store meta data in files, not DBs • OODBMS’s are not a huge success • Most meta data driven tools use their own infrastructure • Goal: generic meta data manipulation infrastructure • Reduce the amount of programming required to build meta data driven applications. • Proposal: Model Management • Define an algebra to manipulate meta data in large chunks, called models and mappings.
Outline • Overview of Model Management • Solutions to classical meta data problems • Recent technical results
Models and Mappings • Model – a complex information structure • XML schema, SQL schema, OO interface, UML model, web site map, make script, …. • Mapping – a transformation from one model into another • Map between two XML schemas • Map a SQL schema to an XML schema • Map data sources to a data warehouse • Map an ER diagram to a SQL schema • Map a process defn to a workflow script
Relational Schema XSD map1 Emp Emp E# E# Dept# Dept# Name Name First A mapping is a model each of whose nodes connects nodes of two other models Last Representation A model is a directed graph with one root.
Model Management Algebra • Enumerate • ApplyFunction • Copy • Update operations • Match • Merge • Compose • Select • Diff
map1 = = map = Match(M1, M2, ) • Match(M1, M2, ) returns the best mapping between M1and M2, w.r.t. to M2 Emp M1 Emp E# E# Dept# Dept# Phone Name Name Addr First Last
Emp Emp Emp mapC Addr Name Phone Name Phone Addr Name = M3 = Merge(M1, M2, map) • Return the union of models M1 and M2 • Use map to guide the Merge • If elements x = y in map, then collapse them into one element
Emp Emp mapC mapC = mapA f• mapB Addr c1 StAddr Street c2 Town c3 City Left Composition ( f •) Emp Emp Emp mapB mapA Addr Name Name a1 b1 Street Street StAddr b2 a2 City Town b3 a3 City M2 M1 M3
Model Management Algebra • list = Enumerate(M) • ApplyFunction(M, f ) • M2= Copy(M1) • Update operations • map = Match (M1, M2, ) • M3 = Merge (M1, M2, map) • map3 = Compose(map1, map2) • M2= Select(M1, pred) • M2= Diff(M1, map) They’re generic = data model independent … well … implemented on an extended ER model with an extensibility story
xsd1 1. map2 2. map3 1. map2= Match(xsd1, xsd2) 3. map4 map1 rdb1 2. map3 = map1map2 rdb2 Example • Given • map1 from SQL schema rdb1 to xsd1, • xsd2, which is similar to xsd1 • Produce • a map between xsd2 and a relational schema. xsd2 3. <map4, rdb2 > = Copy(map3) 4. Use ApplyFunction(map4) to map each x in Diff(xsd2,map4) into rdb2
Theme • Classic meta data problems can be solved using Model Management operations • Schema integration • Schema evolution • Data migration • Reverse engineering • Data reintegration (3-way merging) • Published solutions to these problems help us produce generic implementations of model mgmt operations
Outline • Overview of Model Management • Solutions to classical meta data problems • Schema integration • Schema evolution • Reverse engineering • Data reintegration (3-way merging) • Data migration • Recent technical results
S S 1. map= Match(V1, V2) map 2. S = Merge(V1, V2 , map) 2. 3. ApplyFunction(S) // to resolve conflicts in S, producing S 1.map V2 V1 Schema Integration • Given • two view schemas, V1 and V2 • Produce • an integrated schema, S
map f = L Name FirstName = R LastName S Emp E# 1. map= Match(V1, V2) FirstName 2. S = Merge(V1, V2 , map) Dept# LastName 3. Use ApplyFunction(S) to re-solve conflicts, producing S Addr Phone V1 V2 Emp Emp E# E# Dept# Dept# Phone Addr FirstName Name LastName
Merging Knowledge Bases (Ontologies) • Same as schema integration, but applied to ontologies • The literature on merging ontologies focuses mostly on Match.
2. mapSV V 1. mapSS= Match(S, S) mapSV 2. mapS V = mapS SmapSV S S 1. mapSS Schema Evolution • Given • mapSV from schema S to view V • a modified version S of S • Produce • a mapping mapSV from S to V (i.e. a view defn for V over S). • Use ApplyFunction(V) to delete elements not derivable from S
Outline • Overview of Model Management • Solutions to classical meta data problems • Schema integration • Schema evolution • Reverse engineering • Data reintegration (3-way merging) • Data migration • Recent technical results
M M 1. mapGG= Match(G, G) 2. mapMG = mapMG mapGG 2. mapMG mapMG 3.mapMG 3. <M, mapG M > = Copy(mapMG) 1. mapGG G G Reverse Engineering • Given • Model M (e.g., an ER model) • Model G (e.g., SQL) generated via mapMG from M • A modified version G of G • Produce • A modified version M of M that generates G 4. Use ApplyFunction(mapMG), to reverse engineer each g in Diff(G,mapMG) into M
S0 • MapOA = Match(O, A) (based on OIDs) • MapOB = Match (O, B) (based on OIDs) • MapOA = ApplyFunction(MapOA) such that if eMapOA if domain(e) = range(e), then delete e (i.e. things changed in A) • MapOB = ApplyFunction(MapOB) such that if eMapOB if domain(e) = range(e), then delete e (i.e. things changed in B) • ChangedA = range(MapOA) • ChangedB = range(MapOB) • MapChAChB = Match(ChangedA, ChangedB) • MapChBChA = invert(MapChAChB) • A = Diff(ChangedA, ChangedB, MapChAChB) (changed in A but not changed in B) • B = Diff(ChangedB, ChangedA, MapChBChA) • MapAB = Match (A,B) (by OIDs) • G = Merge (A,B, MapAB) • MapGA =Match(G,A) • GA = Merge (G, A, MapGA) with preference for A • MapGAB =Match(GA,B) • GAB = Merge (GA’, B’, MapGA’B’) with preference for B • DeletedA = Diff(O,A,MapOA) • DeletedB = Diff(O, B, MapOB) • MapDeletedAChangedB = Match(DeletedA, ChangedB) • MapDeletedBChangedA = Match(DeletedB, ChangedA) • ShouldDeleteA = Diff(DeletedA, ChangedB, MapDeletedAChangedB) • ShouldDeleteB = Diff(DeletedB, ChangedA, MapDeletedBChangedA) • MapGABSDA = Match(GAB, ShouldDeleteA) • GABSDA = Diff(GAB, ShouldDeleteA, MapGABSDA) • MapGABSDASDB = Match(GABSDA,ShouldDeleteB) • Final result = Diff(GABSDA, ShouldDeleteB, MapGABSDASDB) S1 S2 S3 3-Way Merge (aka Reintegration) • Given • a source schema S0 • two derived schemas S1 and S2 • Produce • a schema S3 that merges the changes of S1 and S2
D Enum Generate Migration Script 1. mapSS= Match(S, S) Run 2. Use Enum(S) to generate a data migration script 1. mapSS D S S Data Migration • Given • a schema S and its database D • an evolved schema S • Produce • a procedure for mapping D into an S database D
Data Translation • Like data migration, except S and S are expressed in different data models.
Outline • Overview of Model Management • Solutions to classical meta data problems • Recent technical results
Status Report • Vision • [Bernstein, Halevy, & Pottinger, SIGMOD Record 12/00] • Data Warehouse Examples • [Bernstein & Rahm, ER ’00] • Match Operation • Survey: [Rahm & Bernstein, MSR Tech Report] • Prototype: [Madhavan, Bernstein, & Rahm, VLDB ’01] • Merge Operation • coming soon … • Theory • [Alagić & Bernstein, DBPL ’01]
Individual matchers Combined matchers Schema-based Content-based Hybrid Composite Per-Element Structural Per-Element Manual composition Automatic composition Constraint-based Constraint-based Constraint-based Linguistic Linguistic • Names • Descriptions • Types • Keys • Graph matching • IR (word frequencies, key terms) • Value pattern and ranges Schema Matching Approaches • About a dozen published algorithms. • Many good ideas, but none are robust.
PurchaseOrder PO DeliverTo InvoiceTo POShipTo POBillTo Address Address City City Street Street City City Street Street The CUPID Algorithm • Computes linguistic similarity of element pairs • Computes structural similarity of element pairs • Generates a mapping ssim++
X X Y Z a a a Y W Z M3 = Merge(M1, map, M2) • [Buneman, Davidson, Kosky, EDBT 92] • Meta-model has aggregation & generalization only • Do a union and collapse objects having the same name • Fix-up step for inconsistencies created by merging X X a a Y Z • Successive fixups lead to different results • Batch them at the end, to produce a unique minimal result • Now enrich the meta-model (containment, complex mappings) & merge semantics (conflicts, deletes)
A Formal Semantics for Model Mgt • Use category theory for a data-model-independent characterization of models and mappings • Models and their DBs are categories • Model and data transformations are morphisms • Mappings between models & data are functors • Utility • Define formal semantics for Match and Merge • Explain when Match & Merge preserve constraints. • Check that implementation satisfies the semantics
Match Categories f Db Schm Db(Sch1) Sch1 p g Functor q Sch12 Sch2 Db Db Merge Db(Sch12) Db(Sch2) Theory • Goal – a mathematical semantics of MM algebra
Generic Tools • Browser • Import/export • Scripting • Editors • Catalogs Update Marketing cust Model-Driven UI Generator emp Authorize Credit Order Entry dept dno Bill Customer dna Schedule Delivery select all Inventory Compose Match Copy Merge … Apply MM Meta-Model Inferencing Engine OR Mapper Implementation Vision Operation Speciali- zations Model Manager Object-Oriented Repository SQL DBMS
Related Work • There’s a lot of it. Apply it to model management! • Platforms – OODBs, datalog, deductive OODBs (Telos/ConceptBase, F-Logic) • Inferencing on mappings – AQUV, description logic • Transitive closure and recursive QP • Differencing – text, trees, graphs • Data translation – algebras, schema evolution • Data integration – schema match, view generation
Summary • Raise the level of abstraction of meta-data programming by using: • models and mappings as objects • an algebra that manipulates models and mappings on a generic meta-model • Classical meta data problems can be expressed using this algebra • Implementations of classic problems offer guidance on implementing the algebra
References • http://www.research.microsoft.com/~philbe • P. Bernstein & E. Rahm, “Data Warehouse Scenarios for Model Management”, ER 2000 Conference • P. Bernstein, A. Levy, R. Pottinger, “A Vision for Manage-ment of Complex Models”, SIGMOD Record, Dec. 2000 • E. Rahm, P. Bernstein, “On Matching Schemas Automatically,” MSR Tech Report • J. Madhavan, P. Bernstein, E. Rahm, “Generic Schema Matching with Cupid”, VLDB 2001 • S. Alagić, P. Bernstein, “A Model Theory for Generic Schema Management”, DBPL 2001