Data Warehouse View Maintenance

Data WarehouseView Maintenance Presented By: Katrina Salamon For CS561

What is a Data Warehouse? • Repository of integrated information • As information becomes available from a source it is added to the repository

What is a View? • A function from a set of base tables to a derived table • Can be recreated every time the view is accessed

What’s a Materialized View • A view where the tuples are stored in a database (or warehouse) • Can create indexes on them • Provides fast data access • Similar to a cache

What’s View Maintenance • View data becomes out of date when base tables are changed • Updating the view to reflect these changes is called view maintenance

Sounds Easy Right!

Sounds Easy Right! No!

Here’s Why. . . • Data sources are typically legacy systems and do not understand views • Sources can tell the warehouse there is new data, but they cannot determine if any additional data is needed

Examples • Ideal World – new record is added to base relation and view is notified and updated • The Real World • Maintenance Anomaly – trying to update a view while the underlying data is changing • Update Anomaly • Deletion Anomaly

The (Possible) Solutions • Recompute the view • Store all relations involved in the warehouse • Eager Compensating Algorithm (ECA)

Recompute the View • When? • Whenever an update occurs • At a periodic interval • Time and Resource intensive especially in a distributed environment (transferring of data from one source to the other)

Storing Base Relations • Keep up-to-date copies of all relations in the warehouse, queries can be evaluated locally and no anomalies occur • Takes up extra space in the warehouse, storing duplicate data • Copied relations still need to be updated

Eager Compensating Algorithm • Most promising solution • No duplicating base relations or recomputing overheads • All queries sent have compensating queries added to them to offset concurrent updates to the source data

ECA cont. . . • Strongly Consistent • Upon competition of activity, view is consistent with base relations • Every View state has a corresponding state in the base relations and they are completed in order • Not complete • Every source state may not be reflected in a view state (direct mapping)

How ECA Works - 4 basic events • Source executes an update (U) and notification is sent to the warehouse • Warehouse receives update (U) and creates query (Q) to be evaluated by the source • Source evaluates query (Q) against base relations and sends answer (A) to warehouse • Warehouse receives query result and updates view

Resolving Anomalies • Two Updates: Query1 is assumed to be computed before Update2 but is actually computed after Update2 • ECA knows that is happens and takes Update2 into account when Updating the view by using a compensating query for each query it creates

Resolving Issues • When using compensating queries, we should not apply the results until after all related queries have been received • If updates occurred after each query the view could temporally be in an invalid state • To avoid invalid states ECA collects the intermediate answers in a relation called Collect (initialized to empty set)

Example • Three insertions in to three base relations and its affect on the view that references them

ECA-Key • Used to streamline the algorithm when a key from the base relations are available in the View • The Collect relation is initialized to current View and becomes a working copy of the View

ECA-Key Algorithms • Delete received, no query sent, delete is directly applied to Collect • Insert received, query sent, no compensating queries created, answers are added to Collect and duplicate values are ignored because of the keys • Once completed the tuples in Collect replace the tuples for the View

ECA - Local • Combines the compensating queries of ECA and the local updates of ECA-Key to create a more streamlined query • Maintaining order of execution of local and non-local processes is complicated and will create a greater over head then other algorithms • Future work needs to be done to see if this is a worthwhile approach

Performance Comparison Total Bytes Transferredvs.Cardinality of Relation Total Bytes Transferredvs.# of Source Updates

Review of ECA • Incremental updating approach, it doesn’t start from scratch every time • No additional burden placed on sources (timestamps or locks) • Compensating queries are only used when more then update is occurring, keeping computation costs low

Data Warehouse View Maintenance

Data Warehouse View Maintenance

Presentation Transcript

Data Warehouse

Data Warehouse

Data Warehouse

A Transactional Model for Data Warehouse Maintenance

Data Warehouse

Data Warehouse

Data Warehouse

Data Warehouse

Data Warehouse

Data Warehouse

Data Warehouse