220 likes | 327 Views
A Transactional Model for Data Warehouse Maintenance. Authored by: Jun Chen, Songting Chen, Elke A. Rundensteiner Published in ER’2002, Finland D atabase S ystems R esearch G roup Worcester Polytechnic Institute. DWMS. Wrapper. Wrapper. Wrapper. Base. Base. Base.
E N D
A Transactional Model for Data Warehouse Maintenance Authored by: Jun Chen, Songting Chen, Elke A. Rundensteiner Published in ER’2002, Finland Database Systems Research Group Worcester Polytechnic Institute
DWMS Wrapper Wrapper Wrapper . . . Base Base Base Data Warehousing • Data Integration from Remote Base Sources • Difficult and Labor-Intensive • Better Do it only ONCE and Materialize the Results • Share Materialized Data by Many Applications Data Warehouse
Data Warehouse Maintenance • Motivation: Keep Data Warehouse (DW) Update-to-Date • Base Changes over Time • Source Data Updates • insert, delete, update • Source Schema Changes • add, drop, rename • Basic Idea: Incremental instead of Re-computation • Re-computation may take weeks
General Maintenance Algorithms • View Maintenance (VM) • Incrementally incorporate source data updates • [BLT86], [GMS93], [ZGH+95], [SBC+00] • View Synchronization (VS) • Rewrite data warehouse view definition after the schema of the source changed • [NLR98], [LNR02] • View Adaptation (VA) • Adapt view extent after the view definition changed • [NR99], [GMR+01]
Insert ( ‘Steve’, ‘Boston’) DW Maintenance Example CREATE VIEW Asia_Traveller AS SELECT C.Name, C.Address, F.FlightNo FROM Customer C, FlightRes F WHERE C.Name = F.Name AND F.Dest = ‘Asia’; View: Asia_Traveller Name Address FlightNo Dave WPI AA8384 Select FlightNo from FlightRes where Name=‘Steve’ Customer FlightRes Name Address Name Age FlightNo Dest Dave WPI Dave 22 AA8384 Asia Ellen MA Steve 22 UA77788 Europe
View: Asia_Traveller Broken Query! Name Address FlightNo Dave WPI AA8384 3. Select FlightNo from FlightRes where Name=‘Steve’ Customer FlightRes Name Address Name Age FlightNo Dest Dave WPI Dave 22 AA8384 Asia Ellen MA Steve 22 UA77788 Europe 2. Rename (FlightRes, FlightReservation) 1. Insert ( ‘Steve’, ‘Boston’) Maintenance Anomaly Problem
Inside Broken Query • Two Transactions • Base Update Transaction • w(Bi)c(Bi) • DW Maintenance Transaction • r(B1)r(B2)…r(Bn)w(DW)c(DW) • Read-write conflicts between two transactions • Two Independent Transactions • w(Bi) / r(Bi) • Data Update w(Bi): Incorrect Query Results [ZGH+95] • Schema Change w(Bi): Broken Query
A Transactional Approach • A Global Transaction Model • DWMS_Transaction • Integrates both base update transaction and its corresponding DW maintenance transaction • w(Bi)c(Bi)r(B1)r(B2)…r(Bn)w(DW)c(DW) • Maintenance Anomaly • Rephrased to read-write conflicts of DWMS_Transactions • w(Bi)c(Bi)r(B1)r(B2)…r(Bj)…r(Bn)w(DW)c(DW) • w(Bj)c(Bj)r(B1)r(B2)…r(Bn)w(DW)c(DW)
Serializability of DWMS_Transaction • Theorem • A history of DWMS_Transactions S is serializable iff it is equivalent to some serial schedule S’ of the same DWMS_Transactions. • Basis for Solving Anomaly Problems • To solve the anomaly problem, we need all DWMS_Transactions serializable.
Traditional Serializability Algorithms • Lock-based • Reads / writes acquire locks for access to shared resources • Transactions block each other • Multiversion-based • Write on a version, read on another version • Transactions do not block each other
Traditional Serializability Algorithms • Lock-based • Read / write would need to lock data in sources? • Not desirable in DW environment • Data sources are autonomous • Not realistic to impose locking on them • Multiversion-based • Do not block each other • Desirable in DW environment • DW and data sources do not block each other • Need to maintain versions somewhere
CREATE VIEW Asia_Traveller AS SELECT C.Name, C.Address, F.FlightNo FROM Customer’ C,FlightRes’ F WHERE C.Name = F.Name AND F.Dest = ‘Asia’; View: Asia_Traveller Name Address FlightNo Customer’ Dave WPI AA8384 FlightRes’ Wrapper Meta Relation Wrapper Meta Relation Name Address Rel Attr N. A. F. D. Rel Attr Dave WPI Cust’ Name … … … … Fli’ Name Customer FlightRes Ellen MA Cust’ Address … … … … … … Name Address Name Age FlightNo Dest Dave WPI Dave 22 AA8384 Asia Ellen MA Steve 22 UA77788 Europe TxnWrap: A Multiversion Algorithm CREATE VIEW Asia_Traveller AS SELECT C.Name, C.Address, F.FlightNo FROM Customer C, FlightRes F WHERE C.Name = F.Name AND F.Dest = ‘Asia’;
Versioned Wrapper Wrapper for Customer Relation Customer’ Meta Relation • Semantics: life time of a tuple is • #born <= time < #dead
Name Address #born #dead Dave WPI 0 Relation Customer’ (state 2 ) Ellen MA 0 Name Address #born #dead Dave WPI 0 1 Ellen MA 0 Relation Customer’ (state 1 ) Stove Boston 1 Name Address #born #dead Meta Relation (state 2 ) Rel Attr Rel’ Attr’ #born #dead Dave WPI 0 1 C’ Name - - 0 Ellen MA 0 C’ Addr. - - 0 2 Steve Boston 1 Source Updates on Versioned Wrapper Relation Customer’ (Init) Transaction1: 1. DELETE FROM Customer C WHERE C.Name = ‘Dave’; 2. INSERT (‘Steve’, ‘Boston’); Transcation 2: Drop Customer.Address;
Relation Customer’ (State 1 ) Name Address #born #dead Dave WPI 0 1 Ellen MA 0 Stove Boston 1 DW Maintenance Query Rewritten for Versioned Wrapper The maintenance query issued in Transaction2: SELECT Name, Address FROM Customer WHERE condition; Rewritten versioned maintenance query: SELECT Name, Address FROM Customer’ WHERE condition and #born <= 2 and #dead > 2;
Performance Evaluation • Implementation • In Java • Platform: Oracle, JDBC on Windows NT • Embedded in DyDa [CCZ+01] System at WPI • Testbed • 6 data sources with one relation each • Each relation has 4 attributes and 100,000 tuples • One materialized joined view over these data sources • TxnWrap VS. compensation (SWEEP [AAS+97] & DyDa)
Related Work • View Maintenance • View Maintenance / Synchronization / Adaptation • Maintenance Anomaly • ECA [ZGH+95], SWEEP [AAS+97] handles only concurrent data updates • Compensation-based • Performance degrades at a high load • Multi-version Algorithms • 2-version, n-version, unlimited-version algorithms [MPL92]
Conclusions • Identify the Maintenance Anomaly Problem in mixed model environment • Design a global Transaction DWMS_Transaction model that integrates both source update transaction and maintenance transaction. • Rephrase the maintenance anomaly in terms of serializability of DWMS_Transactions • Propose multiversion algorithm to achieve serializability • Implemented the maintenance solution in Dyda • Achieve stable performance under various workloads
Other Activities and Future Work • Batching of updates into more complex maintenance plans • Parallelism of maintenance processes • Support more complex views, e.g., aggregation • Generalize to more change types • Provide alternate view synchronization algorithms • Discovery of changes by non-cooperating sources • Discovery of meta data in terms of source relationships of distributed sources • Move beyond relational middle-layer model