260 likes | 418 Views
Data Warehouse View Maintenance. Presented By: Katrina Salamon For CS561. What is a Data Warehouse?. Repository of integrated information As information becomes available from a source it is added to the repository. What is a View?. A function from a set of base tables to a derived table
E N D
Data WarehouseView Maintenance Presented By: Katrina Salamon For CS561
What is a Data Warehouse? • Repository of integrated information • As information becomes available from a source it is added to the repository
What is a View? • A function from a set of base tables to a derived table • Can be recreated every time the view is accessed
What’s a Materialized View • A view where the tuples are stored in a database (or warehouse) • Can create indexes on them • Provides fast data access • Similar to a cache
What’s View Maintenance • View data becomes out of date when base tables are changed • Updating the view to reflect these changes is called view maintenance
Here’s Why. . . • Data sources are typically legacy systems and do not understand views • Sources can tell the warehouse there is new data, but they cannot determine if any additional data is needed
Examples • Ideal World – new record is added to base relation and view is notified and updated • The Real World • Maintenance Anomaly – trying to update a view while the underlying data is changing • Update Anomaly • Deletion Anomaly
The (Possible) Solutions • Recompute the view • Store all relations involved in the warehouse • Eager Compensating Algorithm (ECA)
Recompute the View • When? • Whenever an update occurs • At a periodic interval • Time and Resource intensive especially in a distributed environment (transferring of data from one source to the other)
Storing Base Relations • Keep up-to-date copies of all relations in the warehouse, queries can be evaluated locally and no anomalies occur • Takes up extra space in the warehouse, storing duplicate data • Copied relations still need to be updated
Eager Compensating Algorithm • Most promising solution • No duplicating base relations or recomputing overheads • All queries sent have compensating queries added to them to offset concurrent updates to the source data
ECA cont. . . • Strongly Consistent • Upon competition of activity, view is consistent with base relations • Every View state has a corresponding state in the base relations and they are completed in order • Not complete • Every source state may not be reflected in a view state (direct mapping)
How ECA Works - 4 basic events • Source executes an update (U) and notification is sent to the warehouse • Warehouse receives update (U) and creates query (Q) to be evaluated by the source • Source evaluates query (Q) against base relations and sends answer (A) to warehouse • Warehouse receives query result and updates view
Resolving Anomalies • Two Updates: Query1 is assumed to be computed before Update2 but is actually computed after Update2 • ECA knows that is happens and takes Update2 into account when Updating the view by using a compensating query for each query it creates
Resolving Issues • When using compensating queries, we should not apply the results until after all related queries have been received • If updates occurred after each query the view could temporally be in an invalid state • To avoid invalid states ECA collects the intermediate answers in a relation called Collect (initialized to empty set)
Example • Three insertions in to three base relations and its affect on the view that references them
ECA-Key • Used to streamline the algorithm when a key from the base relations are available in the View • The Collect relation is initialized to current View and becomes a working copy of the View
ECA-Key Algorithms • Delete received, no query sent, delete is directly applied to Collect • Insert received, query sent, no compensating queries created, answers are added to Collect and duplicate values are ignored because of the keys • Once completed the tuples in Collect replace the tuples for the View
ECA - Local • Combines the compensating queries of ECA and the local updates of ECA-Key to create a more streamlined query • Maintaining order of execution of local and non-local processes is complicated and will create a greater over head then other algorithms • Future work needs to be done to see if this is a worthwhile approach
Performance Comparison Total Bytes Transferredvs.Cardinality of Relation Total Bytes Transferredvs.# of Source Updates
Review of ECA • Incremental updating approach, it doesn’t start from scratch every time • No additional burden placed on sources (timestamps or locks) • Compensating queries are only used when more then update is occurring, keeping computation costs low