650 likes | 809 Views
Bx Tutorial, Database Flavor: Updatable or Invertible Mappings. James F. Terwilliger Microsoft Corporation. How Does the DB Field Use Mappings?. DB. DB. DB. DB. DB. DB. DB. DB. DB. DB. DB. DB’. DB. DB. DB. App Model Over Store. Data Warehouse, Schema Versioning.
E N D
Bx Tutorial, Database Flavor:Updatable or Invertible Mappings James F. Terwilliger Microsoft Corporation
How Does the DB Field Use Mappings? DB DB DB DB DB DB DB DB DB DB DB DB’ DB DB DB App Model Over Store Data Warehouse, Schema Versioning Federated System Exchanged Data Between Applications
Model (Metadata) Management Capabilities? M S T Model? Virtual? Model? Virtual? Language?
In General, Two Approaches M S T M S T
The View Update Problem M S T Concrete Database Application Model, External Schema • Early work abstracted away the exact language of M, focusing on what it means to be an updatable view • As work progressed, focus shifted somewhat to a choice of M – SQL – and deciding when an update policy can be computed
The View Update Problem Query Update Query M S T Relational (Concrete) Relational (Tables only) (Virtual) SQL Let’s use the declarative query tool we know and love – SQL – as a way to express views! (What could possibly go wrong!)
View Update References • Francois Bançilhonand Nicolas Spyratos. Update semantics of relational views. TODS, 6(4):557–575, December 1981. • UmeshwarDayal and Philip A. Bernstein. On the correct translation of update operations on relational views. TODS, 7(3):381–416, September 1982. • G. Gottlob, P. Paolini, and R. Zicari. Properties and update semantics of consistent views. TODS,13(4):486–524, 1988. • Jens Lechtenbörger. The impact of the constant complement approach towards view updating. PODS, 49–55, 2003.
View Updates: The Basics View definition D V f Update statement u u u(D) u(V) (Unique) Transformed update against the physical database f Update translations available for some syntactic restrictions on f
Constant Complement(Semantics of View Updates) D • Updates leave the view complement unchanged • Complement may not be unique (must be chosen to determine update semantics) V V’ D
Great! Where Can I Get It? • Most database vendors do not implement past the SQL92 standard • View must have: • No set operators • No distinct, no grouping • No joins or multiple FROM items • No smoking, talking, or chewing gum • Basically, only simple select/project queries
View Update Limitations (Among Many) • Large queries are hard to debug (and read!) • Given a large query, how to report to the user why a query is not updatable? • DB Tables, not DB DB • Syntactic restrictions are very strict • It is assumed that a query language can make a good view expression language
Outline • A brief look at updatable views • Transformation-based approaches • PRISM/PRIMA • Guava/Channels • Holistic approaches • Object-relational mappings • Data exchange and model management • Whole schemas on both ends • Interesting and notable mappings • Support for interesting update/bidirectional scenarios
Slides adapted with permission from Carlo Curino PRISM/PRIMA (PANTA RHEI)
PRISM M S T Application Model Version N Application Model Version N+1
PRISM Query Update M S T Relational (Virtual) Relational (Concrete) SMOs (Schema Modification Operators)
Schema Evolution: common practice • Starting point: • a Schema S1, • a database DB1 under S1, and • a set of queries Q1 formulated over S1
Schema Evolution: common practice • Evolution in the real world: • The DBA defines an SQL DDL script modifying S1 into S2 • The DBA defines an SQL DML script migrating data from DB1 to DB2 • Queries in Q1 might fail, the DBA adapts them manually:Q2 = Q1’ + Q2_new (new queries added on S2)
Schema Evolution: common practice • Evolution in the real world: • The DBA defines an SQL DDL script modifying S2 into S3 • The DBA defines an SQL DML script migrating data from DB2 to DB3 • Queries in Q2 might fail, the DBA adapts them manually as in Q3 = Q2’ + Q3_new (new queries added on S3)
Schema Evolution: common practice • DB Administrator (DBA) nightmares: • Data Migration: Data loss, redundancy, efficiency of the migration, efficiency of the new design • Impact on Queries and applications • What is the real impact of schema evolution? [iceis2008]
Schema Evolution: Ideal World • Evolution in an ideal world: • Evolution design is assisted and predictable • Data migration scripts are generated automatically • Legacy Queries (and updates, views, integrity constraints,…) are automatically adapted to fit the new schema
Schema Modification Operators • SMOs: • Atomic changes • SQL-inspired syntax • Operates on both schema and data • Practical completeness (coverage of available evolution scenarios)
Data Migration • SMOs to SQL translation: • Each SMO has a clear semantics that can be represented by a set of SQL statements (DML + DDL) • Optimization issues (single/multi SMO) DECOMPOSE TABLE R INTO S(x,y), T(x,z) SMO to SQL translation RENAME TABLE R TO S; CREATE TABLE T(x varchar,z varchar); INSERT INTO T(SELECT x,z FROM S); ALTER TABLE S DROP COLUMN z;
Logical Mapping • SMOs to logical mapping translation: • Language: Disjunctive Embedded Dependencies (DED) • DED is a subset of First Order Logic with expressions in the following form: • Used to capture the relationships between (instances of) consecutive schema versions JOIN TABLE R, S INTO T WHERE cond SMO to DED translation
Query Rewriting Semantics • Data Migration: • d2 is equivalent to d1 migrated by M and modified by U2 • d2 = M(d1) ± U2 • Query Answering: • Goal: answer q1 on d2 • q1(M-1(d2)) • Query Rewriting Semantics: • Executing the rewritten query q1’ is equivalent to migrating d2 back to schema S1, by applying the M-1, and executing q1 • q1’(d2) = q1(M-1(d2))
Inverting SMOs • The Invertibility problem (How do we get M-1?): • DEDs are not invertible! • We deal with the inversion at the SMO-level (easier but not free) • Inverting SMOs: • Not every SMO is perfectly invertible • in this case we use SMO quasi-inverse1(intuitively “the best you can do”-inverse) • Not every SMO has a unique inverse • in this case the DBA must disambiguate these (rare) cases • 1we apply the notion of quasi-inverse by Fagin, Kolaitis, Popa, Tan, in [fkpt2006, fkpt2007b]
SMO Invertibility: tricky cases • Quasi-Inverse example: • JOIN TABLE R, S INTO T WHERE cond; • a valid (and reasonable) quasi-inverse is: • DECOMPOSE TABLE T INTO R(x,y), S(x,z); • Multiple Inverses example: • COPY TABLE R INTO S; • the system provides the following candidates: • DROP TABLE S; • DROP TABLE R; RENAME TABLE S INTO R; • MERGE R,S INTO R;
Query Rewriting • The Rewriting Process: • From the Inverse SMOs we derive the inverse DEDs mapping M-1 • Rewrite queries by chase&backchase (using M-1) [popa2000, deutsch2006] • MARS [mars2003] optimized chase&backchase implementation Q1: SELECT * FROM R MARS Query Rewriting Q1’: SELECT * FROM S,T WHERE S.x = T.x
References PantaRhei Website • schemaevolution.org PRISM • Carlo Curino, Hyun Jin Moon, Carlo Zaniolo: Graceful database schema evolution: the PRISM workbench. PVLDB 1(1): 761-772 (2008) • Carlo Curino, Hyun Jin Moon, Alin Deutsch, Carlo Zaniolo: Update Rewriting and Integrity Constraint Maintenance in a Schema Evolution Support System: PRISM++. PVLDB 4(2): 117-128 (2010)
Slides used with permission by James Terwilliger… hey wait, that’s me! GUAVA, CHANNELS
Channels M S T Concrete Database Application Model
Channels Query Update Schema Δ M S T Relational (Concrete) Relational (Virtual) “Channel”
Example: Unpivot Employee Employee’ Schema: eliminate Name, Age, … columnsSchema: add Attribute, Value Columns Instance: for each row, produce one row for each non-null attribute value, with the attribute name placed in the Attribute column and the value in the Value column.
The unpivot CT – operationally input schema current input instance output schema with output instance input DB input DB CT insert into Employee delete from Employee update Employee add column to Employee add domain element to Exp domain query that uses Employee input DML/DDL/queries insert into Employee’ delete from Employee’ update Employee’ add column to Employee’ add domain element to Exp domain query that uses Employee’ output DML/DDL/queries unpivot the Employee tablewhere ID is the key for the table(unpivot all remaining attributes)
DDL/DML operations accepted by a channel VLDB 2010
CTs are composable into a channel(CTs simply pass statements on untouched that they don’t need to change) Horiz. Merge Vert. Part. Horiz. Merge Apply Unpivot Unpivot Natural Schema Native Schema Query, Insert, Update, Delete, Create, Alter, Dropstatements Query, Insert, Update, Delete, Create, Alter, Drop Error, Loop statements
List of CTs that we have defined/implemented VPartition – vertically partition into two tables with a 1:1 relationship VMerge – vertically merge two tables with a 1:1 relationship HPartition – horizontally partition based on the value in a column HMerge – horizontally merge, and introduce a provenance column Pivot/Unpivot – move data to schema (pivot) or schema to data (unpivot) (Move from/to generic/triple form to regular form.) Apply – apply an invertible function to data in each tuple Adorn – add columns, specify data to be held in the new columns, such as environment data Audit – add start/end timestamps, and ensure that data is deprecated rather than deleted
HMerge (operationally) working on a query:Replace P_Client to Person appropriately P_Client: Person: HMerge P_Staff: πFName, Lname, AgeσT=P_Client Person Query Q = P_Client (SELECT * FROM P_Client)
HMerge (operationally) working on DML - Insert statement P_Client: Person: HMerge P_Staff: I (Person, {FName, LName, T, Age, Cert}, {Gail, Brown, P_Staff, null, X}) I (P_Staff,{FName, LName, Cert}, {(Gail, Brown, X)})
HMerge (operationally) working on DDL:Create table (P_Admin) add column and add domain element statements P_Client: Person: HMerge P_Staff: Add “P_Admin” to the domain of the column Add “Pay” column P_Admin:
Formal definition of a CT • A channel transformation is a 4-tuple (S, I, Q, U) • A schema transformation S • An instance transformation I • A query transformation Q • An update transformation U • S and I define semantics • Never implemented, but define what the CT means • Q and U define operation • Defines the translation of statements against the natural schema to statements against the native schema
Query Correctness Instance transformation for the channel transform d d Query transformation for the channel transform I q Query against the virtual database q q = Q(s, q) Q Transformed query against physical database The query transformation is correct with respect to the instance transformation
Update Correctness for Schema Schema transformation for the channel transform Update transformation for the channel transform s s S Update statement (DML or DDL) u u u = U(s, u) u(s) u(s) Transformed update against the physical database S The update transformation is correct with respect to the schema transformation
Update Correctness for Instance s s S PutGet Update statement against table “t” u u u = U(s, u) u(s) u(s) Q qt qt= Q(u(s), qt) qt Query: “SELECT * FROM t” The update transformation is correct with respect to the instance transformation
References Overall Architecture (Guava): • James F. Terwilliger, Lois M. L. Delcambre, Judith Logan: Querying through a user interface. Data Knowl. Eng. 63(3): 774-794 (2007) Channels: • James F. Terwilliger, Lois M. L. Delcambre, David Maier, Jeremy Steinhauer, Scott Britell: Updatable and Evolvable Transforms for Virtual Databases. PVLDB 3(1): 309-319 (2010)
Object-Relational Mappings: Overview • Applications written in an object-oriented language have object-oriented data tiers • Persistence is a relational database • “Impedance mismatch” • Map object constructs to relational constructs
Object-Relational Mappings M S T Concrete Database Application Model (Arrow Direction Could Go Either Way)
Object-Relational Mappings Query Update (Schema Δ) M S T Relational (Concrete) Object-Oriented (Virtual) • Specification • Relational equivalences • Mapping strategies
An O-R Mapping Is… • … generally an operational specification rather than a declarative query or set of queries • … tailored more to the purpose of mapping inheritance and relationships to relations rather than a general-purpose mapping