240 likes | 480 Views
VIEWS. Prof. Navneet Goyal Department of Computer Science & Information Systems BITS, Pilani. Topics. Query Modification View Materialization Which Views to Materialize? How to exploit Materialized Views to answer queries? View Maintenance. View Modification (Evaluate On Demand).
E N D
VIEWS Prof. Navneet Goyal Department of Computer Science & Information Systems BITS, Pilani
Topics • Query Modification • View Materialization • Which Views to Materialize? • How to exploit Materialized Views to answer queries? • View Maintenance
View Modification(Evaluate On Demand) CREATE VIEWRegionalSales(category,sales,state) ASSELECT P.category, S.sales, L.state FROM Products P, Sales S, Locations L WHERE P.pid=S.pid AND S.locid=L.locid View Query SELECT R.category, R.state, SUM(R.sales) FROMRegionalSales AS R GROUP BY R.category, R.state Modified Query SELECT R.category, R.state, SUM(R.sales) FROM (SELECT P.category, S.sales, L.state FROM Products P, Sales S, Locations L WHERE P.pid=S.pid AND S.locid=L.locid) AS R GROUP BY R.category, R.state
View Materialization (Precomputation) • Suppose we precompute RegionalSales and store it with a clustered B+ tree index on [category,state,sales]. • Then, previous query can be answered by an index-only scan. SELECT R.state, SUM(R.sales) FROM RegionalSales R WHERE R.category=“Laptop” GROUP BY R.state SELECT R.state, SUM(R.sales) FROM RegionalSales R WHERE R. state=“Wisconsin” GROUP BY R.category Index on precomputed view is great! Index is less useful (must scan entire leaf level).
Materialized Views • A view whose tuples are stored in the database is said to be materialized • Provides fast access, like a (very high-level) cache. • Need to maintain the view as the underlying tables change. • Ideally, we want incremental view maintenance algorithms. • Close relationship to Data Warehousing, OLAP,
Issues in View Materialization • What views should we materialize, and what indexes should we build on the precomputed results? • Given a query and a set of materialized views, can we use the materialized views to answer the query? • How frequently should we refresh materialized views to make them consistent with the underlying tables? (And how can we do this incrementally?)
View Materialization:Example SELECT P.Category, SUM(S.sales) FROM Product P, Sales S WHERE P.pid=S.pid GROUP BY P.Category Both queries require us to join the Sales table with another table & aggregate the result SELECT L.State, SUM(S.sales) FROM Location L, Sales S WHERE L.locid=S.locid GROUP BY L.State How can we use materialization to speed up these queries?
View Materialization:Example • Pre-compute the two joins involved ( product & sales & Location & sales) • Pre-compute each query in its entirety • OR let us define the following view: CREATE VIEW TOTALSALES (pid, lid, total) AS Select S.pid, S.locid, SUM(S.sales) FROM Sales S GROUP BY S.pid, S.locid
View Materialization:Example • The View TOTALSALES can be materialized & used instead os Sales in our two example queries SELECT P.Category, SUM(T.Total) FROM Product P, TOTALSALES T WHERE P.pid=T.pid GROUP BY P.Category SELECT L.State, SUM(T.Total) FROM Location L, TOTALSALES T WHERE L.locid=T.locid GROUP BY L.State
View Maintenance • A materialized view is said to be refreshed when it is made consistent with changes ot its underlying tables • Often referred to as VIEW MAINTENANCE • Two issues: • HOW do we refresh a view when an underlying table is refreshed? Can we do it incrementally? • WHEN should we refresh a view in response to a change in the underlying table?
View Maintenance • The task of keeping a materialized view up-to-date with the underlying data is known as materialized view maintenance • Materialized views can be maintained by recomputation on every update • A better option is to use incremental view maintenance • Changes to database relations are used to compute changes to materialized view, which is then updated • View maintenance can be done by • Manually defining triggers on insert, delete, and update of each relation in the view definition • Manually written code to update the view whenever database relations are updated • Supported directly by the database
View Maintenance • Two steps: • Propagate: Compute changes to view when data changes. • Refresh: Apply changes to the materialized view table. • Maintenance policy: Controls when we do refresh. • Immediate: As part of the transaction that modifies the underlying data tables. (+ Materialized view is always consistent; -updates are slowed) • Deferred: Some time later, in a separate transaction. (-View becomes inconsistent; + can scale to maintain many views without slowing updates)
Deferred Maintenance Three flavors: • Lazy: Delay refresh until next query on view; then refresh before answering the query. • Periodic (Snapshot): Refresh periodically. Queries possibly answered using outdated version of view tuples. Widely used, especially for asynchronous replication in distributed databases, and for warehouse applications. • Event-based: E.g., Refresh after a fixed number of updates to underlying data tables.
View Maintenance:Incremental Algorithms • Recomputing the view when an underlying table is modified – straightforward approach • Not feasible to do so for all changes made • Ideally algorithms for refreshing a view should be incremental • Cost of refresh is proportional to the extent of the change
View Maintenance:Incremental Algorithms • Note that a given row in the materialized view can appear many times (duplicates are not eliminated) • Main idea behind incremental algorithms is to efficiently compute changes to the rows of the view • New rows • Changes to count associated with a row • A row is deleted if its count becomes 0
View Maintenance • The changes (inserts and deletes) to a relation or expressions are referred to as its differential • Set of tuples inserted to and deleted from r are denoted ir and dr • To simplify our description, we only consider inserts and deletes • We replace updates to a tuple by deletion of the tuple followed by insertion of the update tuple • We describe how to compute the change to the result of each relational operation, given changes to its inputs • We then outline how to handle relational algebra expressions
Join Operation • Consider the materialized view v = r s and an update to r • Let rold and rnew denote the old and new states of relation r • Consider the case of an insert to r: • We can write rnew s as (rold ir) s • And rewrite the above to (rold s) (ir s) • But (rold s) is simply the old value of the materialized view, so the incremental change to the view is just ir s • Thus, for inserts vnew = vold (ir s) • Similarly for deletes vnew = vold – (dr s)
Selection & Projection Operations • Selection: Consider a view v = (r). • vnew = vold (ir) • vnew = vold - (dr) • Projection is a more difficult operation • R = (A,B), and r(R) = { (a,2), (a,3)} • A(r) has a single tuple (a). • If we delete the tuple (a,2) from r, we should not delete the tuple (a) from A(r), but if we then delete (a,3) as well, we should delete the tuple • For each tuple in a projection A(r) , we will keep a count of how many times it was derived • On insert of a tuple to r, if the resultant tuple is already in A(r) we increment its count, else we add a new tuple with count = 1 • On delete of a tuple from r, we decrement the count of the corresponding tuple in A(r) • if the count becomes 0, we delete the tuple from A(r)
Aggregate Operations • count : v = Agcount(B)(r) ( count of the attribute B, after grouping r by attribute A) • Whena set of tuples ir is inserted • For each tuple t in ir, if the group t.A is present in v, we increment its count, else we add a new tuple (t.A, 1) with count = 1 • When a set of tuples dr is deleted • for each tuple t in ir.we look for the group t.A in v, and subtract 1 from the count for the group. • If the count becomes 0, we delete from v the tuple for the group t.A
branch_name account_number balance Perryridge Perryridge Brighton Brighton Redwood A-102 A-201 A-217 A-215 A-222 400 900 750 750 700 branch_name sum(balance) Perryridge Brighton Redwood 1300 1500 700 Example • Relation account grouped by branch-name: branch_nameg sum(balance) (account)
Aggregate Operations • sum: v = Agsum (B)(r) • We maintain the sum in a manner similar to count, except we add/subtract the B value instead of adding/subtracting 1 for the count • Additionally we maintain the count in order to detect groups with no tuples. Such groups are deleted from v • Cannot simply test for sum = 0 (why?) • To handle the case of avg, we maintain the sum and count aggregate values separately, and divide at the end
Aggregate Operations • min, max: v = Agmin (B) (r). • Handling insertions on r is straightforward. • Maintaining the aggregate values min and max on deletions may be more expensive.We have to look at the other tuples of r that are in the same group to find the new minimum
TEST IDNO Marks T1 T1 T2 T2 T3 A-102 A-103 A-102 A-103 A-104 15 20 25 10 10 TEST Min(marks) T1 T2 15 10 Example Testg Min(marks) (student_record))