Query Processing for SQL Updates
Galindo-Legaria, Stefani and Waas (Microsoft Corp.)
Talk for CS 632 by S. Sudarshan
Issues in Efficient Processing of Updates
• Checking constraints
  • primary/unique key
  • check constraints
  • referential integrity, esp. with cascading actions
• Updating primary and secondary indices
• Bulk updates vs. individual updates
• Not in this paper:
  • write-optimized index structures
  • materialized view maintenance
Modeling Updates
• Delta stream
• StreamUpdate operator
  • issues updates to the storage engine
  • multiple StreamUpdate operators can be cascaded or run serially for a single update
General Form of an Update Plan
• The read-query provides:
  • for insert/update: the values to be inserted / the new values for the update
  • for delete/update: the locators of the tuples to be deleted/updated
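The delta-stream model above can be sketched in a few lines: a read-query yields delta rows of (action, locator, values), and a StreamUpdate-like operator applies them to the storage engine. This is an illustrative toy, not the paper's implementation; the "storage engine" is just a dict keyed by locator, and all names are invented.

```python
# Hypothetical sketch of a delta stream driving a StreamUpdate-like operator.

def read_query_deltas():
    # For inserts/updates the read-query supplies the (new) values;
    # for deletes/updates it supplies the locators of the affected tuples.
    yield ("INSERT", 3, {"name": "carol"})
    yield ("UPDATE", 1, {"name": "ALICE"})
    yield ("DELETE", 2, None)

def stream_update(table, deltas):
    # Issue each delta to the "storage engine" (a dict keyed by locator).
    for action, locator, values in deltas:
        if action == "INSERT":
            table[locator] = values
        elif action == "UPDATE":
            table[locator].update(values)
        elif action == "DELETE":
            del table[locator]
    return table

table = {1: {"name": "alice"}, 2: {"name": "bob"}}
stream_update(table, read_query_deltas())
print(table)  # {1: {'name': 'ALICE'}, 3: {'name': 'carol'}}
```

Cascading several such operators (one per index, or one per cascaded action) is then just a matter of feeding one operator's output delta into the next.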
Index Maintenance
• Primary access path
  • B-tree or heap
  • locator: uniquely identifies a row and allows efficient retrieval
    • for a B-tree: index attributes + uniquifier
    • for a heap: physical locator (RID)
• Secondary (non-clustered) indices
  • store the locator of the corresponding record in the primary access path
  • optionally: included columns at the leaf level (a.k.a. covering indices)
• Also: computed columns, unique declarations
Index Maintenance (Cont.)
• Primary access path is updated first
  • locator needed for secondary indices
  • old values needed for secondary index updates are obtained from the primary access path
    • copied out into a delta table (pre-copy)
• Per-row vs. per-index update plans
  • per-row: update all indices corresponding to a row before processing the next row
  • per-index: use all rows to update one index, then move to the next index
    • allows sorting of the delta table, to reduce access costs
    • but requires spooling of the delta table
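The two maintenance orders can be contrasted on toy "indices" (dicts from key to locator). This is a minimal sketch with invented names; the point is only the loop nesting: per-row iterates rows outermost, per-index iterates indices outermost, which is what lets the spooled delta be re-sorted on each index's key.

```python
# Per-row vs. per-index maintenance over toy secondary indices.

def per_row(indices, delta):
    # Touch every index for a row before moving to the next row.
    for locator, row in delta:
        for col, index in indices.items():
            index[row[col]] = locator

def per_index(indices, delta):
    # Finish one index before moving to the next; the spooled delta is
    # sorted on that index's key so accesses are (near-)sequential.
    for col, index in indices.items():
        for locator, row in sorted(delta, key=lambda d: d[1][col]):
            index[row[col]] = locator

delta = [(10, {"a": 3, "b": "z"}), (11, {"a": 1, "b": "y"})]
ix1 = {"a": {}, "b": {}}
ix2 = {"a": {}, "b": {}}
per_row(ix1, delta)
per_index(ix2, delta)
print(ix1 == ix2)  # True: same final state, different access order
```

Both reach the same state; the per-index plan pays for the sort and the spool but wins when the batch is large enough that sorted index access dominates.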
Per-Index Maintenance
• A single update affects two areas of an index
  • messes up access order even if the delta table is sorted
  • idea: create two records, one for the delete action, one for the insert action
    • called a split delta
  • sort the split delta stream by (locator, action)
    • delete sorts first, to avoid spurious uniqueness violations
Choosing an Update Plan
• A per-index plan can be more efficient for a large batch
  • sorted order; can piggyback other steps such as referential integrity checks
• But it may be less efficient in other cases, due to
  • the overhead of more operators in the per-index plan
  • spooling of the delta
• Wide vs. stacked plans
  • i.e., separate deltas for each secondary index vs. a shared delta pipelined through many operators
  • wide seems better, since it can prune unnecessary columns
• Cache effects: when there is a choice of index for a selection, it may be better to use one that is going to be updated
Checking Single-Table Constraints
• Only those constraints that may be affected by the update are checked
• Exception/rollback on failure
Referential Integrity Constraints
• Checks are needed for
  • updates/deletes on the referenced side
  • inserts/updates on the referencing side
• Use an outer join of the delta rows with the other table
  • ensure a match (or lack of match), depending on whether the updated side is the referencing or the referenced side
• Checks are performed after updating the table
  • but for updates to the referenced table, the old values must be used
• May require an index on the attributes of the referencing table
  • an index on the referenced table is normally already present
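The outer-join formulation of the check can be sketched directly in SQL (here via SQLite; the schema and names are invented for illustration). For inserts on the referencing side, every delta row must find a match in the referenced table, so any row where the outer join produced a NULL is a violation.

```python
# Referential-integrity check as an outer join of the delta with the
# referenced table. Toy schema; not the paper's implementation.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE r(pk INTEGER);        -- referenced table
    CREATE TABLE delta(fk INTEGER);    -- rows being inserted into the
                                       -- referencing table S
    INSERT INTO r VALUES (1), (2);
    INSERT INTO delta VALUES (2), (99);
""")
violations = con.execute("""
    SELECT d.fk
    FROM delta d LEFT JOIN r ON d.fk = r.pk
    WHERE r.pk IS NULL
""").fetchall()
print(violations)  # [(99,)]
```

For deletes/updates on the referenced side the join direction flips: the old key values from the delta are joined against the referencing table, and any match (rather than any non-match) is a violation.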
Referential Integrity (Cont.)
• When checking self-referential integrity constraints, first perform all updates done by the DML statement, and only then check
  • Why?
• If there is an index on the foreign-key attributes, the integrity check can be combined with index maintenance, since both need the same sort order
  • place the referential integrity check operator just above index maintenance
• Another optimization: eliminate duplicate values in the delta before checking
Cascading Updates
• E.g., S has a foreign key referencing R, with ON UPDATE CASCADE
  • an update to the primary key of a tuple in R cascades to S
• Similar to foreign-key checking, but instead of validating, update the referencing relation(s)
  • to do so, create a delta table, and use an update plan to process that delta table
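The cascade can be sketched as two chained update plans: the update to R captures (old_pk, new_pk) pairs into a delta, and a second plan joins that delta against S to rewrite matching foreign keys. A minimal toy version, with all names invented:

```python
# Cascading update driven by a delta table of (old_pk, new_pk) pairs.

R = {10: "r-row"}                          # keyed by primary key
S = [{"fk": 10}, {"fk": 10}, {"fk": 7}]    # referencing relation

delta = []

def update_r(old_pk, new_pk):
    # First update plan: change R's primary key and capture the delta.
    R[new_pk] = R.pop(old_pk)
    delta.append((old_pk, new_pk))

update_r(10, 20)

# Second update plan: "join" the delta against S and rewrite matching fks.
remap = dict(delta)
for row in S:
    row["fk"] = remap.get(row["fk"], row["fk"])

print(S)  # [{'fk': 20}, {'fk': 20}, {'fk': 7}]
```

Because the cascade itself is an update, further cascades (or checks) simply chain another delta table and update plan after this one.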
Issues Not Addressed by the Paper
• All optimizations are for a single transaction
  • what if there is a sequence of small transactions?
• An insert/update/delete to a large B-tree may require 1 I/O per update to a leaf page
  • even assuming internal pages are in memory
• Solution: write-optimized B-trees
  • collect deltas and apply them as a batch
  • but ensure queries are correctly answered meanwhile, even if more slowly
• Several schemes exist, including work at IITB in 1997, and a nice survey paper by Graefe with some neat implementation ideas
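The batched-delta idea behind write-optimized indexes can be illustrated with a buffer in front of the base structure: updates land in the in-memory buffer, lookups consult the buffer before the base (so queries stay correct while deltas are pending), and a flush applies the whole buffer in one sorted pass instead of one random I/O per update. A minimal sketch, with all names invented and a dict standing in for the on-disk B-tree:

```python
# Buffered-delta sketch of a write-optimized index.

class BufferedIndex:
    def __init__(self):
        self.base = {}      # stands in for the on-disk B-tree
        self.buffer = {}    # pending deltas; value None means "deleted"

    def put(self, key, value):
        self.buffer[key] = value

    def delete(self, key):
        self.buffer[key] = None

    def get(self, key):
        # Queries stay correct (if a bit slower) while deltas are pending:
        # the buffer overrides the base.
        if key in self.buffer:
            return self.buffer[key]
        return self.base.get(key)

    def flush(self):
        # One batched, key-ordered pass instead of per-update random I/O.
        for key, value in sorted(self.buffer.items()):
            if value is None:
                self.base.pop(key, None)
            else:
                self.base[key] = value
        self.buffer.clear()

ix = BufferedIndex()
ix.base = {1: "a", 2: "b"}
ix.put(3, "c")
ix.delete(1)
print(ix.get(3), ix.get(1))  # c None
ix.flush()
print(ix.base)  # {2: 'b', 3: 'c'}
```

Real schemes differ in where the buffers live (e.g., inside interior B-tree nodes) and when they spill, but the correctness argument is the same: every lookup merges pending deltas with the base.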