Cache Tables: Paving the way for an Adaptive Database Cache. Mehmet Altınel, Christof Bornhövd, C. Mohan, Hamid Pirahesh, Berthold Reinwald (IBM Almaden Research Center), Sailesh Krishnamurthy (Computer Science Division, UC Berkeley). Presented by: Umar Farooq Minhas, October 04, 2006.
Motivation • Issues • Response time • Scalability • Widespread use of Transactional Web Applications (TWAs) in enterprise applications • Broad range of components, e.g. network load balancers, HTTP servers, application servers, …, databases • Solutions • Caching of static HTML pages • Multiple levels of caches
Motivation (contd.) • Drawbacks of static caching • TWAs tend to be more and more dynamic • High volumes of data • Highly personalized contents • Run business logic in remote application servers close to end users • Reduced response time • Reduced load on in-house systems • Benefits are limited by the frequency with which the remote server needs to access the backend DB • Proposed Solution: DBCache • Allows DB caching at mid-tier nodes, remote data centers and edge servers
DBCache: Overview • Built using a full-fledged DBMS, DB2 • Reduced development effort • Allows caching of related DB objects • Triggers, constraints, indices, stored procedures, … • Makes use of existing distributed query execution • Provides cache transparency • Supports both full-table and partial-table caching • On-demand caching • Adapts to dynamically changing loads • Exploits typical characteristics of TWA queries
DBCache: Contributions • Database cache model • Introduces a new DB object ‘Cache Table’ • Dynamic/static caching support • Novel query re-write scheme • Cache load and maintenance mechanisms
Outline • Motivation • DBCache: Overview • Cache Tables • Dynamic Cache Model • Query Compilation • Cache Table Population and Maintenance • Performance Evaluation • Conclusions & Future Work • Discussion
Cache Tables • A Cache Table is a database object by which an end user can specify that a table (cache table) in a database (cache database) is a cache of a table (backend table) in another database (backend database) • [Figure: a backend table in the backend DB and its cache table in the cache DB] • Two types of cache tables supported: • Declarative/Static Cache tables • Dynamic Cache tables
Declarative/Static Cache Tables • When table contents are static and known up front • Use declarative cache tables • Similar to materialized views • Entire table is cached in the absence of a predicate definition • Exploits existing materialized view support in DB2
Dynamic Cache Tables • Populated on-demand • Provides adaptability • Can choose to cache only “hot” items
DBCache Schema Setup • Cache schema exact mirror of backend DB schema • Each backend DB table represented by • Cache Table or • Nickname (caching disabled) • Requires no change in existing queries • Allows caching of other relevant logical and physical objects
Dynamic Cache Model • Key concepts • Cache Keys • Defined on cache table column • Can be non-unique • Must be ‘domain-complete’ • Unique/Primary key columns complete by definition • Guarantees correctness of equality predicates
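A minimal Python sketch of what "domain-complete" means for a cache key column, assuming tables are modelled as plain lists of dicts. The function name `is_domain_complete` and the sample rows are hypothetical illustrations, not part of DBCache.

```python
def is_domain_complete(cache_rows, backend_rows, column):
    """Return True if, for every value of `column` present in the cache,
    the cache holds ALL backend rows carrying that value."""
    cached_values = {row[column] for row in cache_rows}
    for row in backend_rows:
        if row[column] in cached_values and row not in cache_rows:
            return False
    return True


# Hypothetical backend table and two candidate cache states.
backend = [
    {"id": 1, "city": "Waterloo"},
    {"id": 2, "city": "Waterloo"},
    {"id": 3, "city": "Toronto"},
]
complete_cache = [backend[0], backend[1]]  # every 'Waterloo' row is cached
partial_cache = [backend[0]]               # one 'Waterloo' row is missing

print(is_domain_complete(complete_cache, backend, "city"))  # True
print(is_domain_complete(partial_cache, backend, "city"))   # False
print(is_domain_complete(partial_cache, backend, "id"))     # True
```

The last call illustrates why a unique column is complete by definition: each cached value matches exactly one backend row, which is the cached row itself. With a complete column, an equality predicate such as city = 'Waterloo' can be answered from the cache alone whenever that value is present.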
Dynamic Cache Model • Key concepts (contd.) • Referential Cache Constraints (RCCs) • Defined between any columns of two cache tables • Creates a cache-parent/cache-child relationship • Guarantees the correctness of equi-join predicates • Somewhat similar to referential integrity constraints
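A small Python sketch of how an RCC could be checked. It assumes one reading of the constraint: every value in the cache-parent column that is present in the cache must have all of its matching cache-child rows in the cache as well. The function `rcc_holds` and the customer/order data are hypothetical.

```python
def rcc_holds(parent_cache, parent_col, child_cache, child_backend, child_col):
    """Check an assumed RCC from parent_col (cache-parent) to child_col
    (cache-child): all backend child rows matching a cached parent value
    must also be present in the child cache table."""
    parent_values = {row[parent_col] for row in parent_cache}
    return all(
        row in child_cache
        for row in child_backend
        if row[child_col] in parent_values
    )


# Hypothetical data: customer 1 is cached; it has two orders in the backend.
customers_cache = [{"cid": 1, "name": "Ann"}]
orders_backend = [
    {"oid": 10, "cid": 1},
    {"oid": 11, "cid": 1},
    {"oid": 12, "cid": 2},
]

print(rcc_holds(customers_cache, "cid", orders_backend[:2], orders_backend, "cid"))  # True
print(rcc_holds(customers_cache, "cid", orders_backend[:1], orders_backend, "cid"))  # False
```

When the check holds, an equi-join between the cached customers and their orders on cid returns the same rows from the cache as it would from the backend, which is the correctness guarantee the slide refers to.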
Dynamic Cache Model • Key concepts (contd.) • Cache Groups • Set of related cache tables whose content is (directly or transitively) populated by the values of one or more cache keys of a single cache table, called the root table. • Tables reachable via RCCs from the root table are called member tables • Advantages • Application context recognized more easily • Helps avoid conflicting cache constraints
Dynamic Cache Model • Key concepts contd.. • Cache Groups contd.. • Represented by a directed graph called cache group graph, nodes denote cache tables and edges denote RCCs • Direction of an edge for RCC is from a cache-parent to a cache-child • Bi-directional edges possible • Two or more groups can be overlapping • Captured in connectivity graphs
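The cache group graph can be sketched in Python as a list of directed (cache-parent, cache-child) edges, with member tables found by a breadth-first search from the root table. The function `member_tables` and the table names are hypothetical; a bi-directional edge is simply written as two opposite edges.

```python
from collections import deque


def member_tables(rccs, root):
    """Return the tables reachable from `root` by following RCC edges.

    rccs: iterable of (cache_parent, cache_child) pairs.
    """
    children = {}
    for parent, child in rccs:
        children.setdefault(parent, []).append(child)

    seen = {root}
    queue = deque([root])
    while queue:
        table = queue.popleft()
        for child in children.get(table, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen - {root}


# Hypothetical cache group rooted at 'customer'; orders <-> lineitem is bi-directional.
rccs = [
    ("customer", "orders"),
    ("orders", "lineitem"),
    ("lineitem", "orders"),
    ("part", "supplier"),  # belongs to a different group
]
print(sorted(member_tables(rccs, "customer")))  # ['lineitem', 'orders']
```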
Dynamic Cache Model • Issues with Cache Constraints • Can cause unexpected cache loads, a phenomenon called the recursive cache load problem • A cache group is called safe if it avoids this problem • How to ensure group safety?
Dynamic Cache Model • Rules for cache group safety • Rule-1: A cache group graph must not include any heterogeneous cycles. • Rule-2: A cache table must not have more than one non-unique domain-complete column. • A new cache constraint is created only if it doesn’t violate Rule 1 and Rule 2.
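Rule-2 lends itself to a very small check. The Python sketch below assumes each column of a cache table is described by two flags (unique, domain-complete); the function `violates_rule_2` and the column descriptions are hypothetical. Rule-1 is not covered here, because the slide does not define what makes a cycle heterogeneous.

```python
def violates_rule_2(columns):
    """Return True if a cache table has more than one non-unique
    domain-complete column.

    columns: list of (name, is_unique, is_domain_complete) tuples.
    """
    non_unique_complete = [
        name for name, is_unique, is_complete in columns
        if is_complete and not is_unique
    ]
    return len(non_unique_complete) > 1


# Hypothetical cache table: 'id' is unique, 'city' is a non-unique cache key.
customer_columns = [
    ("id", True, True),
    ("city", False, True),
    ("name", False, False),
]
print(violates_rule_2(customer_columns))  # False

# Declaring a second non-unique domain-complete column would be rejected.
print(violates_rule_2(customer_columns + [("zip", False, True)]))  # True
```

A constraint-creation routine would run such a check on the table's columns as they would look after the new constraint, and refuse the constraint if the check fails.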
Query Compilation • Declarative Cache Tables • Existing materialized view matching mechanism in DB2 is exploited • Name switching • Dynamic Cache Tables • Generate two plans: a local plan and a remote plan • Choose at run-time through a switch operator, which uses a probe query to decide which leg to execute • Janus (two-headed) plan: name derived from Roman mythology • God of gates, doors, doorways, beginnings and endings. Month of January? http://en.wikipedia.org/wiki/Janus_%28mythology%29
Query Compilation • Constructing a Janus Plan: • Step 1 (initial query plan → remote query plan): replace cache table names with nicknames • Step 2: generate a probe query by checking all equality predicates that can potentially participate in the probe query condition; if none is found, ABORT (the remote query plan gets executed) • Step 3 (cloned input query graph → local query plan): replace nicknames with the eligible cache table names from step 2 • Step 4: insert a switch operator on top of the remote, local and probe query plans
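The run-time behaviour of a Janus plan can be sketched in Python as a switch that runs the probe and then executes exactly one leg. This is only an illustration of the control flow: `run_janus_plan`, the lambdas, and the dictionaries standing in for the cache table and the backend table are all hypothetical, and no DB2 API is involved.

```python
def run_janus_plan(probe, local_plan, remote_plan, key_value):
    """Switch operator: run the probe query, then execute exactly one leg."""
    if probe(key_value):
        return "local", local_plan(key_value)
    return "remote", remote_plan(key_value)


# Hypothetical data: backend orders per customer; only customer 1 is cached.
backend_orders = {1: ["order-10", "order-11"], 2: ["order-12"]}
cached_orders = {1: ["order-10", "order-11"]}

probe = lambda cust: cust in cached_orders               # is the key value cached?
local_plan = lambda cust: cached_orders[cust]            # reads the cache table
remote_plan = lambda cust: backend_orders.get(cust, [])  # goes through the nickname

print(run_janus_plan(probe, local_plan, remote_plan, 1))  # ('local', ['order-10', 'order-11'])
print(run_janus_plan(probe, local_plan, remote_plan, 2))  # ('remote', ['order-12'])
```

Both legs return the same rows for a cached key value, so the choice of leg affects only where the work is done, not the query result.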
Cache Table Population & Maintenance • Declarative Cache Tables • Relies on DPropR utility: IBM’s asynchronous data replication tool • Dynamic Cache Tables • On-demand loading • Cache key values failing probe query are used to extract data • Extracted data populated asynchronously by a cache daemon • Cache invalidation • Generate invalidation messages and send to cache daemon • Cache daemon generates and executes deletes against cacheDB • Updated rows get loaded with new requests
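The on-demand load and invalidation flow for dynamic cache tables can be sketched in Python as a message queue drained by a cache daemon. The class `CacheDaemon`, its method names, and the dictionaries standing in for the backend DB and the cache DB are hypothetical; the real daemon runs asynchronously, whereas this sketch drains the queue with an explicit call so that the behaviour is deterministic.

```python
from collections import deque


class CacheDaemon:
    """Toy stand-in for the cache daemon (all names are assumptions)."""

    def __init__(self, backend):
        self.backend = backend   # key value -> rows (stand-in for the backend DB)
        self.cache = {}          # key value -> rows (stand-in for a cache table)
        self.pending = deque()   # messages waiting to be processed

    def request_load(self, key):
        """Called when a cache key value fails the probe query."""
        self.pending.append(("load", key))

    def invalidate(self, key):
        """Called when the backend reports an update for this key value."""
        self.pending.append(("invalidate", key))

    def process_pending(self):
        """Apply queued messages: loads copy rows in, invalidations delete."""
        while self.pending:
            action, key = self.pending.popleft()
            if action == "load":
                self.cache[key] = list(self.backend.get(key, []))
            else:
                self.cache.pop(key, None)


backend = {"cust-1": ["order-10"]}
daemon = CacheDaemon(backend)

daemon.request_load("cust-1")      # cache miss triggers a load request
daemon.process_pending()
print(daemon.cache)                # {'cust-1': ['order-10']}

backend["cust-1"].append("order-11")
daemon.invalidate("cust-1")        # backend update triggers invalidation
daemon.process_pending()
print(daemon.cache)                # {}
```

After the invalidation, the updated rows come back into the cache only when a later request misses on the same key value and queues a new load.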
Performance Evaluation • Focus: Evaluate overhead of Janus plans for dynamic tables • Overhead of probe query and switch operator • Overhead of on-demand loading • Experimental settings
Performance Evaluation • Cache Hit Case • Janus plan vs. pure local queries • Difference gives the overhead for probe query and the switch operator • Cache table loaded with all the data from backend table
Performance Evaluation • Cache Miss Case • Janus plan vs. pure remote queries • Difference gives the overhead • Cache table initially empty
Conclusions & Future Work • Significant contributions • Provides a new framework for implementing DB caching for TWAs that aims to: • Integrate seamlessly with current applications • Support static/dynamic cache tables • Adapt to the changing workloads in TWAs • Re-use the functionality of a full-fledged DBMS, i.e. DB2 • What next? • Provide an efficient, scalable, zero-admin DBCache • Development of new tools to ease deployment • Improve adaptability and maintenance
Comparison • vs. amco05: • Relies on asynchronous data propagation utility • Not completely transparent • May not work for heterogeneous DBMSs • Allows stale data • vs. gula04: • Cache constraints against C&C constraints • Doesn’t provide any guarantees of freshness/consistency • Relatively more transparent • Maintenance-centric vs. query-centric • Both deployed as mid-tier level caches • Both use a full-fledged DBMS • Both use materialized views • Both use two-headed query plans
Discussion • Is it really that good? • Using a full-fledged DBMS at each middle-tier node: drawbacks? • How is data freshness specified/guaranteed? • Is it adaptable? Weakly? Strongly? • When can cache constraints become a bottleneck? • Size of dynamic cache tables? • Cache replacement policies/cleansing mechanisms? • Caching of other physical & logical DB objects? • Updates to those objects in the backend DB? • Message traffic between the cache daemon & backend DB? • Very frequent updates in the backend DB • Local updates? • Flaws in performance evaluation?