1 / 23

Data Warehouse View Maintenance

Data Warehouse View Maintenance. Presented By: Katrina Salamon For CS561. What is a Data Warehouse?. Repository of integrated information As information becomes available from a source it is added to the repository. What is a View?. A function from a set of base tables to a derived table

sven
Download Presentation

Data Warehouse View Maintenance

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data WarehouseView Maintenance Presented By: Katrina Salamon For CS561

  2. What is a Data Warehouse? • Repository of integrated information • As information becomes available from a source it is added to the repository

  3. What is a View? • A function from a set of base tables to a derived table • Can be recreated every time the view is accessed

  4. What’s a Materialized View • A view where the tuples are stored in a database (or warehouse) • Can create indexes on them • Provides fast data access • Similar to a cache

  5. What’s View Maintenance • View data becomes out of date when base tables are changed • Updating the view to reflect these changes is called view maintenance

  6. Sounds Easy Right!

  7. Sounds Easy Right! No!

  8. Here’s Why. . . • Data sources are typically legacy systems and do not understand views • Sources can tell the warehouse there is new data, but they cannot determine if any additional data is needed

  9. Examples • Ideal World – new record is added to base relation and view is notified and updated • The Real World • Maintenance Anomaly – trying to update a view while the underlying data is changing • Update Anomaly • Deletion Anomaly

  10. The (Possible) Solutions • Recompute the view • Store all relations involved in the warehouse • Eager Compensating Algorithm (ECA)

  11. Recompute the View • When? • Whenever an update occurs • At a periodic interval • Time and Resource intensive especially in a distributed environment (transferring of data from one source to the other)

  12. Storing Base Relations • Keep up-to-date copies of all relations in the warehouse, queries can be evaluated locally and no anomalies occur • Takes up extra space in the warehouse, storing duplicate data • Copied relations still need to be updated

  13. Eager Compensating Algorithm • Most promising solution • No duplicating base relations or recomputing overheads • All queries sent have compensating queries added to them to offset concurrent updates to the source data

  14. ECA cont. . . • Strongly Consistent • Upon competition of activity, view is consistent with base relations • Every View state has a corresponding state in the base relations and they are completed in order • Not complete • Every source state may not be reflected in a view state (direct mapping)

  15. How ECA Works - 4 basic events • Source executes an update (U) and notification is sent to the warehouse • Warehouse receives update (U) and creates query (Q) to be evaluated by the source • Source evaluates query (Q) against base relations and sends answer (A) to warehouse • Warehouse receives query result and updates view

  16. Resolving Anomalies • Two Updates: Query1 is assumed to be computed before Update2 but is actually computed after Update2 • ECA knows that is happens and takes Update2 into account when Updating the view by using a compensating query for each query it creates

  17. Resolving Issues • When using compensating queries, we should not apply the results until after all related queries have been received • If updates occurred after each query the view could temporally be in an invalid state • To avoid invalid states ECA collects the intermediate answers in a relation called Collect (initialized to empty set)

  18. Example • Three insertions in to three base relations and its affect on the view that references them

  19. ECA-Key • Used to streamline the algorithm when a key from the base relations are available in the View • The Collect relation is initialized to current View and becomes a working copy of the View

  20. ECA-Key Algorithms • Delete received, no query sent, delete is directly applied to Collect • Insert received, query sent, no compensating queries created, answers are added to Collect and duplicate values are ignored because of the keys • Once completed the tuples in Collect replace the tuples for the View

  21. ECA - Local • Combines the compensating queries of ECA and the local updates of ECA-Key to create a more streamlined query • Maintaining order of execution of local and non-local processes is complicated and will create a greater over head then other algorithms • Future work needs to be done to see if this is a worthwhile approach

  22. Performance Comparison Total Bytes Transferredvs.Cardinality of Relation Total Bytes Transferredvs.# of Source Updates

  23. Review of ECA • Incremental updating approach, it doesn’t start from scratch every time • No additional burden placed on sources (timestamps or locks) • Compensating queries are only used when more then update is occurring, keeping computation costs low

More Related