Query Processing for SQL Updates
Galindo-Legaria, Stefani and Waas (Microsoft Corp.)
Talk for CS 632 by S. Sudarshan
Issues in Efficient Processing of Updates
• Checking constraints
  • primary/unique key
  • check constraints
  • referential integrity, esp. with cascading actions
• Updating primary and secondary indices
• Bulk updates vs. individual updates
• Not in this paper:
  • write-optimized index structures
  • materialized view maintenance
Modeling Updates
• Delta stream
• StreamUpdate operator
  • issues updates to the storage engine
  • multiple StreamUpdate operators can be cascaded or run serially for a single update
General Form of an Update Plan
• The read-query provides:
  • for insert/update: the values to be inserted / the new values for the update
  • for delete/update: the locators of the tuples to be deleted/updated
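The delta-stream model above can be sketched in a few lines: a read-query yields delta rows of (action, locator, values), and a StreamUpdate-like operator applies them to the storage engine. This is an illustrative toy, not the paper's implementation; the "storage engine" is just a dict keyed by locator, and all names are invented.

```python
# Hypothetical sketch of a delta stream driving a StreamUpdate-like operator.

def read_query_deltas():
    # For inserts/updates the read-query supplies the (new) values;
    # for deletes/updates it supplies the locators of the affected tuples.
    yield ("INSERT", 3, {"name": "carol"})
    yield ("UPDATE", 1, {"name": "ALICE"})
    yield ("DELETE", 2, None)

def stream_update(table, deltas):
    # Issue each delta to the "storage engine" (a dict keyed by locator).
    for action, locator, values in deltas:
        if action == "INSERT":
            table[locator] = values
        elif action == "UPDATE":
            table[locator].update(values)
        elif action == "DELETE":
            del table[locator]
    return table

table = {1: {"name": "alice"}, 2: {"name": "bob"}}
stream_update(table, read_query_deltas())
print(table)  # {1: {'name': 'ALICE'}, 3: {'name': 'carol'}}
```

Cascading several such operators (one per index, or one per cascaded action) is then just a matter of feeding one operator's output delta into the next.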
Index Maintenance
• Primary access path
  • B-tree or heap
  • locator: uniquely identifies a row and allows efficient retrieval
    • for a B-tree: index attributes + uniquifier
    • for a heap: physical locator (RID)
• Secondary (non-clustered) indices
  • store the locator of the corresponding record in the primary access path
  • optionally: included columns at the leaf level (a.k.a. covering indices)
• Also: computed columns, unique declarations
Index Maintenance (Cont.)
• Primary access path is updated first
  • locator needed for secondary indices
  • old values needed for secondary index updates are obtained from the primary access path
    • copied out into a delta table (pre-copy)
• Per-row vs. per-index update plans
  • per-row: update all indices corresponding to a row before processing the next row
  • per-index: use all rows to update one index, then move to the next index
    • allows sorting of the delta table, to reduce access costs
    • but requires spooling of the delta table
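The two maintenance orders can be contrasted on toy "indices" (dicts from key to locator). This is a minimal sketch with invented names; the point is only the loop nesting: per-row iterates rows outermost, per-index iterates indices outermost, which is what lets the spooled delta be re-sorted on each index's key.

```python
# Per-row vs. per-index maintenance over toy secondary indices.

def per_row(indices, delta):
    # Touch every index for a row before moving to the next row.
    for locator, row in delta:
        for col, index in indices.items():
            index[row[col]] = locator

def per_index(indices, delta):
    # Finish one index before moving to the next; the spooled delta is
    # sorted on that index's key so accesses are (near-)sequential.
    for col, index in indices.items():
        for locator, row in sorted(delta, key=lambda d: d[1][col]):
            index[row[col]] = locator

delta = [(10, {"a": 3, "b": "z"}), (11, {"a": 1, "b": "y"})]
ix1 = {"a": {}, "b": {}}
ix2 = {"a": {}, "b": {}}
per_row(ix1, delta)
per_index(ix2, delta)
print(ix1 == ix2)  # True: same final state, different access order
```

Both reach the same state; the per-index plan pays for the sort and the spool but wins when the batch is large enough that sorted index access dominates.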
Per-Index Maintenance
• A single update affects two areas of an index
  • messes up access order even if the delta table is sorted
  • idea: create two records, one for the delete action, one for the insert action
    • called a split delta
  • sort the split delta stream by (locator, action)
    • delete sorts first, to avoid spurious uniqueness violations
Choosing an Update Plan
• A per-index plan can be more efficient for a large batch
  • sorted order; can piggyback other steps such as referential integrity checks
• But it may be less efficient in other cases, due to
  • the overhead of more operators in the per-index plan
  • spooling of the delta
• Wide vs. stacked plans
  • i.e., separate deltas for each secondary index vs. a shared delta pipelined through many operators
  • wide seems better, since it can prune unnecessary columns
• Cache effects: when there is a choice of index for a selection, it may be better to use one that is going to be updated
Checking Single-Table Constraints
• Only those constraints that may be affected by the update are checked
• Exception/rollback on failure
Referential Integrity Constraints
• Checks are needed for
  • updates/deletes on the referenced side
  • inserts/updates on the referencing side
• Use an outer join of the delta rows with the other table
  • ensure a match (or lack of match), depending on whether the updated side is the referencing or the referenced side
• Checks are performed after updating the table
  • but for updates to the referenced table, the old values must be used
• May require an index on the attributes of the referencing table
  • an index on the referenced table is normally already present
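The outer-join formulation of the check can be sketched directly in SQL (here via SQLite; the schema and names are invented for illustration). For inserts on the referencing side, every delta row must find a match in the referenced table, so any row where the outer join produced a NULL is a violation.

```python
# Referential-integrity check as an outer join of the delta with the
# referenced table. Toy schema; not the paper's implementation.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE r(pk INTEGER);        -- referenced table
    CREATE TABLE delta(fk INTEGER);    -- rows being inserted into the
                                       -- referencing table S
    INSERT INTO r VALUES (1), (2);
    INSERT INTO delta VALUES (2), (99);
""")
violations = con.execute("""
    SELECT d.fk
    FROM delta d LEFT JOIN r ON d.fk = r.pk
    WHERE r.pk IS NULL
""").fetchall()
print(violations)  # [(99,)]
```

For deletes/updates on the referenced side the join direction flips: the old key values from the delta are joined against the referencing table, and any match (rather than any non-match) is a violation.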
Referential Integrity (Cont.)
• When checking self-referential integrity constraints, first perform all updates done by the DML statement, and only then check
  • Why?
• If there is an index on the foreign-key attributes, the integrity check can be combined with index maintenance, since both need the same sort order
  • place the referential integrity check operator just above index maintenance
• Another optimization: eliminate duplicate values in the delta before checking
Cascading Updates
• E.g., S has a foreign key referencing R, with ON UPDATE CASCADE
  • an update to the primary key of a tuple in R cascades to S
• Similar to foreign-key checking, but instead of validating, update the referencing relation(s)
  • to do so, create a delta table, and use an update plan to process that delta table
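The cascade can be sketched as two chained update plans: the update to R captures (old_pk, new_pk) pairs into a delta, and a second plan joins that delta against S to rewrite matching foreign keys. A minimal toy version, with all names invented:

```python
# Cascading update driven by a delta table of (old_pk, new_pk) pairs.

R = {10: "r-row"}                          # keyed by primary key
S = [{"fk": 10}, {"fk": 10}, {"fk": 7}]    # referencing relation

delta = []

def update_r(old_pk, new_pk):
    # First update plan: change R's primary key and capture the delta.
    R[new_pk] = R.pop(old_pk)
    delta.append((old_pk, new_pk))

update_r(10, 20)

# Second update plan: "join" the delta against S and rewrite matching fks.
remap = dict(delta)
for row in S:
    row["fk"] = remap.get(row["fk"], row["fk"])

print(S)  # [{'fk': 20}, {'fk': 20}, {'fk': 7}]
```

Because the cascade itself is an update, further cascades (or checks) simply chain another delta table and update plan after this one.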
Issues Not Addressed by the Paper
• All optimizations are for a single transaction
  • what if there is a sequence of small transactions?
• An insert/update/delete to a large B-tree may require 1 I/O per update to a leaf page
  • even assuming internal pages are in memory
• Solution: write-optimized B-trees
  • collect deltas and apply them as a batch
  • but ensure queries are correctly answered meanwhile, even if more slowly
• Several schemes exist, including work at IITB in 1997, and a nice survey paper by Graefe with some neat implementation ideas
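The batched-delta idea behind write-optimized indexes can be illustrated with a buffer in front of the base structure: updates land in the in-memory buffer, lookups consult the buffer before the base (so queries stay correct while deltas are pending), and a flush applies the whole buffer in one sorted pass instead of one random I/O per update. A minimal sketch, with all names invented and a dict standing in for the on-disk B-tree:

```python
# Buffered-delta sketch of a write-optimized index.

class BufferedIndex:
    def __init__(self):
        self.base = {}      # stands in for the on-disk B-tree
        self.buffer = {}    # pending deltas; value None means "deleted"

    def put(self, key, value):
        self.buffer[key] = value

    def delete(self, key):
        self.buffer[key] = None

    def get(self, key):
        # Queries stay correct (if a bit slower) while deltas are pending:
        # the buffer overrides the base.
        if key in self.buffer:
            return self.buffer[key]
        return self.base.get(key)

    def flush(self):
        # One batched, key-ordered pass instead of per-update random I/O.
        for key, value in sorted(self.buffer.items()):
            if value is None:
                self.base.pop(key, None)
            else:
                self.base[key] = value
        self.buffer.clear()

ix = BufferedIndex()
ix.base = {1: "a", 2: "b"}
ix.put(3, "c")
ix.delete(1)
print(ix.get(3), ix.get(1))  # c None
ix.flush()
print(ix.base)  # {2: 'b', 3: 'c'}
```

Real schemes differ in where the buffers live (e.g., inside interior B-tree nodes) and when they spill, but the correctness argument is the same: every lookup merges pending deltas with the base.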