Semantic Data Caching and Replacement Based on the talk by Kunhao Zhou about the paper by: Shaul Dar, Michael J. Franklin, Bjorn T. Jonsson, Divesh Srivastava, Michael Tan. Proceedings of the 22nd VLDB Conference, Mumbai (Bombay), India, 1996
Outline • Motivation • Client Caching Architecture • Model of Semantic Caching • Simulations and Results • Conclusion and Future Work
Motivation • Distributed database. • Clients are high-end workstations (fat clients). • High computational power. • Large local storage.
Motivation (Contd.) • Effective use of a client is the key to achieving high performance. • Less network traffic. • Faster response time. • Higher server throughput. • Better scalability.
Client Caching Architecture • Data-Shipping. • The client processes the query. • Data is brought on demand from servers. • Navigational access. • Data is identified by object ID (tuple ID or page ID). • Can be categorized as tuple-based or page-based. • Cache Replacement Policies: • LRU. • MRU.
Client Caching Architecture (Contd.) • Data-Shipping. • Problem. • Applications require associative access to data, as provided by relational query languages.
Client Caching Architecture (Contd.) • Query-Shipping. • Associative access to data. • Problem. • Implementations do not support client caching.
Client Caching Architecture (Contd.) • Semantic Caching. • A model that integrates support for associative access into an architecture based on data-shipping. • Advantage. • Exploits semantic information to manage the client cache effectively.
Client Caching Architecture (Contd.) • Semantic Caching. • The client keeps a semantic description of the cached data rather than a list of record IDs or page IDs. • The description can be used to generate a remainder query to send to the server if the requested tuples are not available locally. • Information for replacement is maintained per semantic region. • Low overhead, insensitive to bad clustering. • Cache replacement uses a value function based on the semantic description, not just LRU or MRU.
Model of Semantic Caching • Remainder Query • Semantic Regions • Replacement Issues
Remainder Query • Relation Re, query Q, client cache V. • Probe query P(Q,V) = Q ∧ V can be answered locally. • Remainder query R(Q,V) = Q ∧ (¬V) must be sent to the server. • Example: • Select * from E where salary < 60,000 and salary > 30,000. • The client caches all tuples with salary < 50,000. • Q = (salary < 60,000) ∧ (salary > 30,000). • V = (salary < 50,000). • P = (salary < 50,000) ∧ (salary > 30,000). • R = (salary >= 50,000) ∧ (salary < 60,000). • Figure: diagram of relation Re showing the cached part V, the query Q, the probe P and the remainder R.
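The salary example can be worked through in a few lines of Python. This is only a sketch under simplifying assumptions that are not in the slides: salaries are integers, each predicate is a single half-open range lo <= salary < hi (so salary > 30,000 becomes lo = 30,001), and the function name `probe_and_remainder` is hypothetical.

```python
import math

def probe_and_remainder(q, v):
    """Split query range q against cached range v.

    Ranges are (lo, hi) pairs meaning lo <= salary < hi (an assumption).
    Returns (probe, remainders): probe is Q AND V, or None if empty;
    remainders is a list of ranges covering Q AND (NOT V).
    """
    q_lo, q_hi = q
    v_lo, v_hi = v
    lo, hi = max(q_lo, v_lo), min(q_hi, v_hi)
    if lo >= hi:
        # No overlap: nothing can be answered locally.
        return None, [q]
    remainders = []
    if q_lo < lo:
        remainders.append((q_lo, lo))   # part of Q below the cached range
    if hi < q_hi:
        remainders.append((hi, q_hi))   # part of Q above the cached range
    return (lo, hi), remainders

Q = (30001, 60000)            # salary > 30,000 and salary < 60,000
V = (-math.inf, 50000)        # cache holds salary < 50,000
probe, remainder = probe_and_remainder(Q, V)
print(probe)      # range answered from the client cache
print(remainder)  # ranges that must be fetched from the server
```

Here the probe covers 30,001 up to (but not including) 50,000 and the remainder covers 50,000 up to 60,000, matching P and R on the slide.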
Semantic Regions • Unit of cache management and replacement. • Tuples are grouped by semantic value. Each semantic region has a single replacement value. • Each region is described by a constraint formula. • Consideration: • Semantic region merge (always merge). • Figure: (a) Original regions (b) Regions after Q
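One way to picture the "always merge" choice is on a single attribute. The sketch below is an illustration, not the paper's implementation: regions are half-open ranges lo <= x < hi paired with a replacement value, and the helper `apply_query` is a hypothetical name. After a query Q, the parts of old regions that lie outside Q keep their old value, and everything inside Q becomes one new region.

```python
def apply_query(regions, q, now):
    """Update 1-D semantic regions after query q has been answered.

    regions: list of ((lo, hi), value), ranges meaning lo <= x < hi.
    q: (lo, hi) range of the query.
    now: replacement value given to the merged region covering q.
    """
    q_lo, q_hi = q
    updated = []
    for (lo, hi), value in regions:
        if lo < min(hi, q_lo):
            # Part of the old region to the left of q keeps its value.
            updated.append(((lo, min(hi, q_lo)), value))
        if max(lo, q_hi) < hi:
            # Part of the old region to the right of q keeps its value.
            updated.append(((max(lo, q_hi), hi), value))
    # Cached overlap plus fetched remainder are merged into one region.
    updated.append((q, now))
    return updated

regions = [((0, 40), 1), ((70, 90), 2)]
after = apply_query(regions, (30, 80), now=3)
print(after)
```

With these inputs the two old regions shrink to 0..30 and 80..90, and the query range 30..80 becomes a single region carrying the newest replacement value.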
Replacement Issues • Temporal locality • LRU, MRU
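The two temporal policies differ only in which end of the access history they evict from. A minimal sketch, assuming each cache entry records a logical time of last access (the dictionary layout and the name `pick_victim` are hypothetical):

```python
def pick_victim(last_used, policy="LRU"):
    """Return the id of the cache entry to evict.

    last_used maps an entry id to the logical time of its last access.
    LRU evicts the least recently used entry, MRU the most recently used.
    """
    if policy == "LRU":
        return min(last_used, key=last_used.get)
    if policy == "MRU":
        return max(last_used, key=last_used.get)
    raise ValueError(f"unknown policy: {policy}")

last_used = {"r1": 3, "r2": 9, "r3": 5}
lru_victim = pick_victim(last_used, "LRU")   # oldest access time
mru_victim = pick_victim(last_used, "MRU")   # newest access time
print(lru_victim, mru_victim)
```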
Replacement Issues (Contd.) • Semantic locality • Manhattan distance • Note: the Manhattan distance is the distance between two points measured along axes at right angles. In a plane with p1 at (x1, y1) and p2 at (x2, y2), it is |x1 - x2| + |y1 - y2|. • Figure: p1 and p2 joined through a corner point O, so that |p1 p2| = |p1 O| + |O p2|.
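A distance-based replacement policy can be sketched as follows. The assumptions here go beyond the slide: each semantic region is summarized by a centre point in attribute space, the most recent query is summarized the same way, and the region farthest from the query is evicted first. The names `manhattan` and `farthest_region` are hypothetical.

```python
def manhattan(p1, p2):
    """Sum of absolute coordinate differences between two points."""
    return sum(abs(a - b) for a, b in zip(p1, p2))

def farthest_region(centres, query_centre):
    """Return the id of the region whose centre is farthest from the query."""
    return max(centres, key=lambda rid: manhattan(centres[rid], query_centre))

# Hypothetical region centres in a two-attribute space.
centres = {"r1": (10, 10), "r2": (50, 80), "r3": (20, 5)}
victim = farthest_region(centres, query_centre=(12, 8))
print(victim)
```

Regions near the current query keep a high value and stay cached, while distant ("cold") regions are the first candidates for replacement.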
Simulation and Result • The relation has three candidate keys: • Unique2 is indexed and clustered. • Unique1 is indexed and unclustered. • Unique3 is unindexed and unclustered.
Simulation and Result (Contd.) • Unique2 (Clustered Index). • Performance: • Almost the same. • Page-based is slightly better. • Reason: • Page-based overhead is smaller.
Simulation and Result (Contd.) • Unique1 (Unclustered Index). • Performance: • Tuple-based and semantic-based caching are much better. • Reason: • Page-based caching is sensitive to clustering.
Simulation and Result (Contd.) • Unique3 (Unindexed and Unclustered). • Performance: • Semantic-based is better. • Reason: • The remainder query enables the client and the server to process the query in parallel.
Simulation and Result (Contd.) • Semantic locality / Manhattan distance on Unique1. • Performance: • Manhattan distance is better than LRU. • Reason: • “Cold regions” will be replaced faster.
Conclusion and Future Work • Conclusion. • For a simple model with selection queries, semantic caching provides better performance. • Future work. • Implementation issues for complex queries, updates, deletions, and insertions: • Concurrency control. • Consistency. • Completeness. • Related work: "A Predicate-based Caching Scheme for Client-Server Database Architectures" (Arthur M. Keller and Julie Basu).