240 likes | 336 Views
View Materialization & Maintenance Strategies. By Ashkan Bayati & Ali Reza Vazifehdoost. Motivation. Complex Queries -Decision support queries -OLAP -Statistical Analysis -Business Intelligence -Aggregation Large data sets collected from Heterogeneous remote sources.
E N D
View Materialization & Maintenance Strategies By Ashkan Bayati & Ali Reza Vazifehdoost
Motivation Complex Queries -Decision support queries -OLAP -Statistical Analysis -Business Intelligence -Aggregation Large data sets collected from Heterogeneous remote sources
View Materialization & Maintenance • View materialization is the process of pre-computing views (summarized information) in order to gain performance • Drawback is to keep the view consistent when the underlying data sources change • View Maintenance is the process of keeping the view consistent with the underlying source tables
Incremental View Maintenance • Relevant Updates only affect the view • The aim of incremental view maintenance is to re-compute the view considering only the net changes that have taken place instead of re-calculating the view from scratch.
Selection • V = Ө C(y) (r) any tuple that satisfies C(y) will be in the view • After inserts and deletes we get V’= V + Ө C(y) (i) - Ө C(y) (d) • The view can be incrementally maintained by: Inserting Ө C(y) (i) into V (Insert( V, Ө C(y) (i))) Deleting Ө C(y) (d) from V ( Delete( V, Ө C(y) (d))
Projection • Problem with Projections: • Imagine if you delete (1,10) from the base table • Solution is to keep the key in the view or use a counter
Joins • Inserts: Let V = r s and r’ = r i then: V’= r’ s = (r i) s = (r s) (i s) = V (i s) Deletes are similar
View Maintenance in Dynamic Environments • Dynamic environment is specified here as one that covers both data updates and schema changes • Interleaving data updates and schema changes can cause problems • The following steps need to be taken: - Optimize updates based on their source relations and update types. - For schema changes that effect the view definition perform a view evolution process. - Perform view adaptation to make the view consistent.
Optimize Updates • DU’ = п (attr (R)) ∩ (attr(R’)) <DU> • Its obvious to see п(attr (R)) ∩ (attr(R’)) contains all the attributes related to the view redefinition. This is essentially because neither dropped nor added attributes will appear in the view definition. • The relationship between SC and DU are: • If SCi’ contains drop relation Ri, then DUi ={} and SCi’ = drop relation Ri. • If SCi’ contains drop attribute operation both SCi’ and DUi’ might not be empty • If SCi’ contains no drop operation, then DUi’=DUi.
Example • Assume a view V(A,B,C,D) is defined as R1(A,B) R2(A,C) R3(C,D). Suppose R1 has the following sequence of updates { +(3,2),(1,4)} and relation R2 has the update sequence { + (3,4), add field E, +(4,5,6), drop field C, -(5,7)}. • Hence we get DU2= {+(3,4),+(4,5,6),-(5,7)} and R2=(A,C) and R2’=(A,E). From this information you can see that attr((R)) ∩ (attr(R’))={A} ;hence DU2’={+3,+4,-5}.
Evolving View Definition • Applying view synchronization:
Making the view consistent • Now that the schema is consistent we need the view to become synchronized with the underlying base table updates. Many mechanisms have been defined I will explain more on this issue later.
Efficient VM over distributed data sources • Materialized view integrate and store data from distributed data sources to ensure better access, higher performance and better availability. • Since the data sources are distributed the network cost involved in transferring the net changes can also be dramatic. • State of the art view maintenance requires 0(n^2) maintenance queries to remote data sources with n being the number of data sources in the view definition.
Goal • The aim is to restructure the view maintenance queries in order to reduce costs. • HOW?? • Assume the Materialized view R1► R 2 ► R 3 ► R 4. (►= join)
Restructuring Batch View Maintenance • State of ART: • Ri’=Ri + Ri • Hence O(n^2)
Adjacent Grouping • Adjacent Grouping (share common access to the maintenance Queries): For the previous example divide it up into two groups. • It becomes • (R1 ►R2 +R1’ ►R2) ►R3►R4 + (R3►R4+ R 3’ ►R4)►R1’ + R2’ hence 12 queries have been reduced to 8 hence O(n^1.5)
Conditional Grouping • A more aggressive method is called conditional grouping whose execution is 2*(n-1) maintenance queries. • Scroll up phase
Conditional Grouping Cont • Scroll Down phase
Self Maintenance of Multiple SPJ Views • The view V at level 0 can be described in terms of nodes at level as tmp1 tmp3. • Some tuples of tmp1 and tmp3 do not join into the view V; hence, we store these tuples in their respective AV’s for tmp1 and tmp3 at level 1.
Update takes place in Relation R • There are two possible paths that U (update) can take to find its way to the root node: • ∆V = (((U AV(S)) AV(T)) AV(tmp1))) • ∆V = V U
Sub-trees • With this approach, a change in any sub-tree can be propagated to the root node without re-computing any of the other sub-trees. • Since we only store tuples at level i if they do not join into the node at level i+1, the tuples are not duplicated in the tree.
Benefits of this approach • The benefits of this procedure can be summarized as follows: • Changes to the view of a sub-tree only effectively change the root of that sub-tree • The view updates can effectively be computed by joining only subsets of base relations rather than the entire base relation. As an example ∆V = (((U AV(S)) AV(T)) AV(tmp1))) rather than the traditional method ∆V = (((U S) T) AV(tmp1)))
Multiple View Maintenance • Essentially the same as single view maintenance however the AV of the shared node in the tree will be different.
Auxiliary View Structure • AV(temp3) stores tuples that do not join in V and do not join into V’ into two separate AV’s. The problem with this scheme is that it stores the set AV(temp3) (V) ∩ AV(temp3) (V’).The sub-tree represented by intermediate node temp3 will be recomputed twice and the views V and V’ will be updated separately. • AV(temp3) stores tuples that do not join in view V and tuples that do not join in view V’ in three AV’s: AV(temp3)(V) , AV(temp3)(V’) and AV(temp3)(V∩V’). This eliminates duplicates, this will cut down the computational cost but incurs additional overhead of placing tuples in the correct AV.