640 likes | 735 Views
“Good Enough” Database Caching. Hongfei Guo University of Wisconsin-Madison. Motivation — Scaling Google. …. Motivation — Scaling A DBMS By Caching. How to tell whether the cached data is “good enough” for an application? NO data quality requirements from the applications!
E N D
“Good Enough” Database Caching Hongfei Guo University of Wisconsin-Madison
Motivation — Scaling A DBMS By Caching How to tell whether the cached data is “good enough” for an application? • NO data quality requirements from the applications! • NO data quality guarantees from the caching DBMS! Application Server App specific code … Caching DBMS Updates Updates Asynchronous Backend DBMS
Application Server Caching DBMS Backend DBMS The Thesis • Apps: Specifiesdata quality requirements in queries • Cache: Enforces data quality constraint [SIGMOD 2004] [SIGMOD 2004 Demo] • Cache admin: Specify local data quality to be maintained by cache (Data quality-centric database caching model) [TR 2005] [submitted for publication] • Data quality-aware adaptive cache management [ongoing work]
Data Quality Metrics (informal) • Currency:The elapsed time since this copy becomes stale • Consistency: A query result is (snapshot) consistent iff it is as if evaluated from a snapshot of the master database • C&C: Currency & Consistency
Roadmap • Background • Specifying data quality constraints in SQL • Data quality-centric caching model • Enforcing data quality constraints • Other research • Future directions
Specifying Data Quality Constraints in SQL[Guo, Larson, Ramakrishnan and Goldstein, SIGMOD 2004] • Currency requirements • Consistency requirements • Extend SQL to specify relaxed C&C requirements • Formal semantics of C&C constraints
Currency Requirements Example 1: The caching database keeps BookCopy • Customer A is about to purchase –he wants the data to be exactly current (High data quality is preferred) • Customer B is browsing –it is ok if the data is no more than 3 days out of sync (Quick response time is preferred)
Currency Requirements Example 1: The caching database keeps BookCopy • Customer A is about to purchase –he wants the data to be exactly current (High data quality is preferred) • Customer B is browsing –it is ok if the data is no more than 3 days out of sync (Quick response time is preferred)
Currency Requirements Example 1: The caching database keeps BookCopy • Customer A is about to purchase –he wants the data to be exactly current (High data quality is preferred) • Customer B is browsing –it is ok if the data is no more than 3 days out of sync (Quick response time is preferred) Different apps may have different currency requirements for the same query
BookCopy bid title author 1 databases Raghu 2 databases Ullman ReviewCopy rid bid text 1 1 … 2 1 … 3 2 … Consistency Requirements Example 2: SELECT * FROM Books B, Reviews R WHERE B.bid = R.bid AND B.title = “Databases” Different apps may have different consistency requirements for the same query The whole query result be consistent Books be consistent & Reviews be consistent Each book be consistent with its reviews
BookCopy bid title author 1 databases Raghu 2 databases Ullman ReviewCopy rid bid text 1 1 … 2 1 … 3 2 … Proposed SQL Syntax SELECT * FROM Books B, Reviews R WHERE B.bid = R.bid AND B.title = “Databases“ Consistency class Currency bound Group by CURRENCY BOUND 10 min ON (B, R) BY B.bid CURRENCY BOUND 10 min ON (B, R) CURRENCY BOUND 10 min ON (B), 30 min ON (R)
Specifying Data quality Constraints in SQL: Contributions • Extend SQL to express C&C constraints • Single-block queries • Multi-block (i.e., nested) queries • Timeline constraint • Formal semantics of C&C constraints Provides correctness standard for using replicated or cached data
Roadmap • Background • Specifying data quality constraints in SQL • Data quality-centric caching model • Enforcing data quality constraints • Other Research • Future directions
Data Quality-Centric Caching Model[Guo, Larson and Ramakrishnan, submitted] • Cache data quality properties • Cache property specification • Maintenance and “safety”
Why Define Cache Properties? Query processing Cache Properties (= contract) Cache maintenance
Cache Properties (P+3C) • Presence — per object • Consistency — a set of objects • Completeness — per predicate • Currency — object staleness
View 1 View 2 View 3 Basic Concepts Tables Object H1 Snapshots Master Database Cache H2
View 1 View 2 View 3 Cache Property Examples Currency = now – stale point Consistent Complete Present H1 Master Database Stale point Cache H2
Specifying Cache Properties • Specified as integrity constraints • Presence constraint • Consistency constraint • Completeness constraint • Presence correlation constraint • Consistency correlation constraint
Presence Constraint AuthorCopy: 1 Alice Madison 2 Bob Madison 3 Cedric Seattle Backend DBMS AuthorList_PCT: 1 2 3 Caching DBMS
Presence Constraint Partially materialized view [Zhou et al 2005] AuthorCopy: CREATE VIEW AuthorCopy AS SELECT * FROM Authors CREATE TABLE AuthorList_PCT (authorId int) ALTER VIEW AuthorCopy ADD ON authorId IN (SELECT authorId FROM authorId_PCT 1 Alice Madison 2 Bob Madison 3 Cedric Seattle control-key control-table AuthorList_PCT: PRESENCE 1 2 3
Consistency Constraint AuthorCopy: Cache Region CREATE TABLE CityList_CsCT (city string) ALTER VIEW AuthorCopy ADD ON city IN (SELECT city FROM cityList_CsCT 1 Alice Madison 2 Bob Madison 3 Cedric Seattle Backend DBMS CityList_CsCT: AuthorList_PCT: AuthorList_PCT: Consistency Madison 1 2 3 1 2 3
Completeness Constraint AuthorCopy: CREATE TABLE CityList_CpCT (city string) ALTER VIEW AuthorCopy ADD ON city IN (SELECT city FROM cityList_CsCT 1 Alice Madison 2 Bob Madison 3 Cedric Seattle Backend DBMS AuthorList_PCT: CityList_CpCT: AuthorList_PCT: Completeness Madison 1 3 1 3
Presence Correlation Constraint AuthorList_PCT: AuthorCopy: 1 Alice Madison 2 Bob Madison 3 Cedric Seattle authorId 1 2 3 authorId BookCopy: Backend DBMS 111 1 aaa 222 1 bbb 333 2 ccc 444 3 ddd 555 3 eee ALTER VIEW BookCopy ADD PRESENCE ON authorId IN (SELECT authorId FROM AuthorCopy)
Presence Correlation Constraint AuthorList_PCT: AuthorCopy: 1 Alice Madison 2 Bob Madison 3 Cedric Seattle authorId 1 2 3 authorId BookCopy: AuthorList_PCT 111 1 aaa 222 1 bbb 333 2 ccc 444 3 ddd 555 3 eee authorId AuthorCopy authorId BookCopy
Consistency Correlation Constraint AuthorList_PCT: AuthorCopy: 1 Alice Madison 2 Bob Madison 3 Cedric Seattle authorId 1 2 3 authorId BookCopy: Backend DBMS 111 1 aaa 222 1 bbb 333 2 ccc 444 3 ddd 555 3 eee ALTER VIEW BookCopy ADD CONSISTENCY ROOT
Consistency Correlation Constraint AuthorList_PCT: AuthorCopy: 1 Alice Madison 2 Bob Madison 3 Cedric Seattle authorId 1 2 3 authorId BookCopy: AuthorList_PCT 111 1 aaa 222 1 bbb 333 2 ccc 444 3 ddd 555 3 eee authorId AuthorCopy authorId BookCopy
Cache Schema Example AuthorList_PCT ReviewerList_PCT authorId reviewerId AuthorCopy ReviewerCopy authorId reviewId ReviewCopy BookCopy isbn
Pull-Maintenance • Refresh a region by pulling query results • When refreshing a region, also refresh the affected closure • All overlapping regions • All correlated regions
Pull-Maintenance BookCopy: AuthorList_PCT: 111 1 aaa 222 1 bbb 333 1 ccc 444 3 aaa 555 4 eee 1 3 4 authorId TitleList_CsCT: aaa
Pull-Maintenance AuthorCopy: AuthorList_PCT 1 Alice Madison 3 Cedric Seattle authorId AuthorCopy authorId BookCopy: authorId BookCopy 111 1 aaa 222 1 bbb 333 1 ccc 444 3 aaa 555 3 eee
Inefficient Pulling AuthorCopy: Shared-row problem 1 Alice Madison 3 Cedric Seattle AuthorBookCopy: authorId BookCopy: 1 111 1 222 1 333 3 111 3 555 111 10 aaa 222 20 bbb 333 30 ccc 555 50 eee isbn
Issues • Inefficient pulling: • Calculation of the affected closure requires checking the rows • Efficient pulling: • The affected closure does NOT depend on the instance of a view • Only requires forward pull among correlated views
Definition: (Safe PMV) A partially materialized view V is safe if the following two conditions hold for every instance of the cache that satisfies all integrity constraints: For any pair of regions in V, either they don’t overlap or one is contained in the other. If V is gray, let X denote the set of regions in V defined by presence control-key values. X is a partitioning of V and no pair of regions in X is contained in any one region defined on V. Theoretical Results • Cache schema design rules: • Rule 1: A cache graph is a DAG. • Rule 2: Only red nodes can have independent completeness or consistency control-tables. • Rule 3: Every PMV with more than one parent must be a red circle. • Rule 4: If a PMV has the shared-row problem according to Lemma 5.2, then it cannot be gray. • Rule 5: A PMV cannot have non-compatible control-tables. Syntactically checkable conditions (polynomial) Property for every instance • Theorem: Given a cache schema <W, E>, if it satisfies the design rules, then every PMV in W is safe. Conversely, if the schema violates one of these rules, there is an instance of the cache satisfying all specified integrity constraints in which some PMV is unsafe.
Data Quality-Centric Caching Model: Contributions • Four cache properties • Specifying cache properties • Cache property unit: cache region • Safe views and efficient pulling Provides an abstraction layer (contract) between query processing and cache maintenance
Roadmap • Background • Specifying data quality constraints in SQL • Data quality-centric caching model • Enforcing data quality constraints • Other research • Future directions
Enforcing Data Quality Constraints • Overview • Simple case: View-level consistency [Guo, Larson, Ramakrishnan and Goldstein, SIGMOD 2004] [Guo, Larson, Ramakrishnan and Goldstein, SIGMOD 2004 Demo] Implemented in MS SQL Server code base • General case: Row-level consistency [Guo, Larson and Ramakrishnan, submitted]
Cache Region Metadata Shadow Databases Heartbeat Tables Backend DBMS Queries with Relaxed C&C Requirements Queries Query Optimizer Execution Engine Local Materialized Views Caching DBMS Results Results Extension to MTCache Framework MTCache Framework [Larson et al. 2004]
Simple Case Assumptions • Fully materialized views • Each view is consistent • Push-based maintenance • E.g., MS replication service
Cache Region Metadata Shadow Databases Heartbeat Tables Backend DBMS Queries with Relaxed C&C Requirements Query Optimizer Execution Engine Local Materialized Views Caching DBMS Results Results Extension to MTCache Framework
V 1 V 3 V2 V 4 V 5 C&C Tracking Mechanism • Consistency tracking cache region (CR) • The unit of update propagation • Data mutually consistent all the time • Properties, e.g., est. delay, est. interval • Currency tracking heartbeat table V 1 V 3 V2 Backend Cache V 4 V 5 CR1: 12: 10 12: 00 12: 30 12: 30 12: 20 12: 00 CR2:
Shadow Databases Backend DBMS Queries with Relaxed C&C Requirements Queries with Relaxed C&C Requirements Currency Region Metadata Query Optimizer Heartbeat Tables Execution Engine Local Materialized Views The best plan that: • Satisfies consistency requirements • Includes run-time currency checking Caching DBMS Results Results Extension to MTCache Framework
Extension to the Optimizer • Compile-time consistency checking • Run-time currency checking • Cost estimation
Consistency Checking • Enforced at optimization time • Immediately prune a sub-plan if it violates consistency constraints Merge join Q1: σ( Books Reviews) CURRENCY 5 ON (Books, Reviews) Local scanReviews Remote queryon Books
Run-time Currency Checking When view V matches expression E E V Currency Guard SwitchUnion Local plan using V Remote plan requesting E • Currency guard: • Check if local view V satisfies currency requirement
Cost Estimation Cost for the SwitchUnion operator: C = p* Clocal + (1- p) * Cremote + Ccg p : probability that the local branch will be used Clocal :cost of execution of the local branch Cremote : cost of execution of the remote branch Ccg : cost of currency checking
Estimating p Compute p from three parameters: f : estimated refresh interval d : estimated minimal delay B : currency bound 0 if B-d ≤ 0, (B-d)/f if 0 < B-d ≤ f, 1 if B-d > f p =
Changing The Assumptions Fully materialized views Consistent views Push-based maintenance Partially materialized views Row-level consistency Pull-based maintenance • More general algorithms • Run-time check for consistency constraints that can not be validated at compile-time
Run-time C&C Checking When view V matches expression E E Currency Guard SwitchUnion Local plan using V Remote plan requesting E • Currency guard: • Check if local view V satisfies currency requirement