VIEWS

VIEWS Prof. Navneet Goyal Department of Computer Science & Information Systems BITS, Pilani

Topics • Query Modification • View Materialization • Which Views to Materialize? • How to exploit Materialized Views to answer queries? • View Maintenance

View Modification(Evaluate On Demand) CREATE VIEWRegionalSales(category,sales,state) ASSELECT P.category, S.sales, L.state FROM Products P, Sales S, Locations L WHERE P.pid=S.pid AND S.locid=L.locid View Query SELECT R.category, R.state, SUM(R.sales) FROMRegionalSales AS R GROUP BY R.category, R.state Modified Query SELECT R.category, R.state, SUM(R.sales) FROM (SELECT P.category, S.sales, L.state FROM Products P, Sales S, Locations L WHERE P.pid=S.pid AND S.locid=L.locid) AS R GROUP BY R.category, R.state

View Materialization (Precomputation) • Suppose we precompute RegionalSales and store it with a clustered B+ tree index on [category,state,sales]. • Then, previous query can be answered by an index-only scan. SELECT R.state, SUM(R.sales) FROM RegionalSales R WHERE R.category=“Laptop” GROUP BY R.state SELECT R.state, SUM(R.sales) FROM RegionalSales R WHERE R. state=“Wisconsin” GROUP BY R.category Index on precomputed view is great! Index is less useful (must scan entire leaf level).

Materialized Views • A view whose tuples are stored in the database is said to be materialized • Provides fast access, like a (very high-level) cache. • Need to maintain the view as the underlying tables change. • Ideally, we want incremental view maintenance algorithms. • Close relationship to Data Warehousing, OLAP,

Issues in View Materialization • What views should we materialize, and what indexes should we build on the precomputed results? • Given a query and a set of materialized views, can we use the materialized views to answer the query? • How frequently should we refresh materialized views to make them consistent with the underlying tables? (And how can we do this incrementally?)

View Materialization:Example SELECT P.Category, SUM(S.sales) FROM Product P, Sales S WHERE P.pid=S.pid GROUP BY P.Category Both queries require us to join the Sales table with another table & aggregate the result SELECT L.State, SUM(S.sales) FROM Location L, Sales S WHERE L.locid=S.locid GROUP BY L.State How can we use materialization to speed up these queries?

View Materialization:Example • Pre-compute the two joins involved ( product & sales & Location & sales) • Pre-compute each query in its entirety • OR let us define the following view: CREATE VIEW TOTALSALES (pid, lid, total) AS Select S.pid, S.locid, SUM(S.sales) FROM Sales S GROUP BY S.pid, S.locid

View Materialization:Example • The View TOTALSALES can be materialized & used instead os Sales in our two example queries SELECT P.Category, SUM(T.Total) FROM Product P, TOTALSALES T WHERE P.pid=T.pid GROUP BY P.Category SELECT L.State, SUM(T.Total) FROM Location L, TOTALSALES T WHERE L.locid=T.locid GROUP BY L.State

View Maintenance • A materialized view is said to be refreshed when it is made consistent with changes ot its underlying tables • Often referred to as VIEW MAINTENANCE • Two issues: • HOW do we refresh a view when an underlying table is refreshed? Can we do it incrementally? • WHEN should we refresh a view in response to a change in the underlying table?

View Maintenance • The task of keeping a materialized view up-to-date with the underlying data is known as materialized view maintenance • Materialized views can be maintained by recomputation on every update • A better option is to use incremental view maintenance • Changes to database relations are used to compute changes to materialized view, which is then updated • View maintenance can be done by • Manually defining triggers on insert, delete, and update of each relation in the view definition • Manually written code to update the view whenever database relations are updated • Supported directly by the database

View Maintenance • Two steps: • Propagate: Compute changes to view when data changes. • Refresh: Apply changes to the materialized view table. • Maintenance policy: Controls when we do refresh. • Immediate: As part of the transaction that modifies the underlying data tables. (+ Materialized view is always consistent; -updates are slowed) • Deferred: Some time later, in a separate transaction. (-View becomes inconsistent; + can scale to maintain many views without slowing updates)

Deferred Maintenance Three flavors: • Lazy: Delay refresh until next query on view; then refresh before answering the query. • Periodic (Snapshot): Refresh periodically. Queries possibly answered using outdated version of view tuples. Widely used, especially for asynchronous replication in distributed databases, and for warehouse applications. • Event-based: E.g., Refresh after a fixed number of updates to underlying data tables.

View Maintenance:Incremental Algorithms • Recomputing the view when an underlying table is modified – straightforward approach • Not feasible to do so for all changes made • Ideally algorithms for refreshing a view should be incremental • Cost of refresh is proportional to the extent of the change

View Maintenance:Incremental Algorithms • Note that a given row in the materialized view can appear many times (duplicates are not eliminated) • Main idea behind incremental algorithms is to efficiently compute changes to the rows of the view • New rows • Changes to count associated with a row • A row is deleted if its count becomes 0

View Maintenance • The changes (inserts and deletes) to a relation or expressions are referred to as its differential • Set of tuples inserted to and deleted from r are denoted ir and dr • To simplify our description, we only consider inserts and deletes • We replace updates to a tuple by deletion of the tuple followed by insertion of the update tuple • We describe how to compute the change to the result of each relational operation, given changes to its inputs • We then outline how to handle relational algebra expressions

Join Operation • Consider the materialized view v = r s and an update to r • Let rold and rnew denote the old and new states of relation r • Consider the case of an insert to r: • We can write rnew s as (rold ir) s • And rewrite the above to (rold s)  (ir s) • But (rold s) is simply the old value of the materialized view, so the incremental change to the view is just ir s • Thus, for inserts vnew = vold (ir s) • Similarly for deletes vnew = vold – (dr s)

Selection & Projection Operations • Selection: Consider a view v = (r). • vnew = vold (ir) • vnew = vold - (dr) • Projection is a more difficult operation • R = (A,B), and r(R) = { (a,2), (a,3)} • A(r) has a single tuple (a). • If we delete the tuple (a,2) from r, we should not delete the tuple (a) from A(r), but if we then delete (a,3) as well, we should delete the tuple • For each tuple in a projection A(r) , we will keep a count of how many times it was derived • On insert of a tuple to r, if the resultant tuple is already in A(r) we increment its count, else we add a new tuple with count = 1 • On delete of a tuple from r, we decrement the count of the corresponding tuple in A(r) • if the count becomes 0, we delete the tuple from A(r)

Aggregate Operations • count : v = Agcount(B)(r) ( count of the attribute B, after grouping r by attribute A) • Whena set of tuples ir is inserted • For each tuple t in ir, if the group t.A is present in v, we increment its count, else we add a new tuple (t.A, 1) with count = 1 • When a set of tuples dr is deleted • for each tuple t in ir.we look for the group t.A in v, and subtract 1 from the count for the group. • If the count becomes 0, we delete from v the tuple for the group t.A

branch_name account_number balance Perryridge Perryridge Brighton Brighton Redwood A-102 A-201 A-217 A-215 A-222 400 900 750 750 700 branch_name sum(balance) Perryridge Brighton Redwood 1300 1500 700 Example • Relation account grouped by branch-name: branch_nameg sum(balance) (account)

Aggregate Operations • sum: v = Agsum (B)(r) • We maintain the sum in a manner similar to count, except we add/subtract the B value instead of adding/subtracting 1 for the count • Additionally we maintain the count in order to detect groups with no tuples. Such groups are deleted from v • Cannot simply test for sum = 0 (why?) • To handle the case of avg, we maintain the sum and count aggregate values separately, and divide at the end

Aggregate Operations • min, max: v = Agmin (B) (r). • Handling insertions on r is straightforward. • Maintaining the aggregate values min and max on deletions may be more expensive.We have to look at the other tuples of r that are in the same group to find the new minimum

TEST IDNO Marks T1 T1 T2 T2 T3 A-102 A-103 A-102 A-103 A-104 15 20 25 10 10 TEST Min(marks) T1 T2 15 10 Example Testg Min(marks) (student_record))

VIEWS

VIEWS

Presentation Transcript

Materialized Views

Views

XML Views & Reasoning about Views

Views

Views

PowerPoint Views

Views

Basic Views

Different Views

Multiple Views

Views

Views

Views

VIEWS

Views

Views

Views

Views

Views

Views

VIEWS

VIEWS

Presentation Transcript

Materialized Views

Views

XML Views &amp; Reasoning about Views

Views

Views

PowerPoint Views

Views

Basic Views

Different Views

Multiple Views

Views

Views

Views

VIEWS

Views

Views

Views

Views

Views

Views

XML Views & Reasoning about Views