Data consistency for Applications using FroNtier/Squid

Learn about the cache consistency issues with Frontier/Squid and how they affect application data. Understand various inconsistency scenarios and application restrictions. Discover an invalidation mechanism for maintaining data consistency.

Presentation Transcript


  1. Data consistency for Applications using FroNtier/Squid
     Luis Ramos, CERN
     3D Meeting, January 2006

  2. Agenda
     • Frontier Basics
     • Cache consistency issues with Frontier/Squid
     • Inconsistency Scenarios
     • Application Restrictions Summary
     • Conclusions
     • Appendix: Invalidation Mechanism

  3. Frontier Basics
     • The Frontier servlet generates query results as XML documents from database queries submitted by clients
     • Frontier Client is a C/C++ API for sending requests to the Frontier servlet
     • FrontierAccess (the Frontier POOL “plug-in”) uses Frontier Client to access Frontier servlets
     In this context, Frontier is a web-based approach to generic DB access.

  4. Example - Frontier servlet (HTTP)
     QUERY: http://pcitdb03.cern.ch:8080/Frontier/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOB&p1=...(SQL query encoded in base64)
     REPLY: [XML reply not reproduced in this transcript]
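A minimal Python sketch of how such a request URL could be assembled. The servlet address and fixed parameters are copied from the example query; the exact encoding of `p1` is an assumption (plain base64 of the SQL text, percent-escaped), and the real Frontier client may encode it differently. The helpers `build_frontier_url` and `sql_from_url` are hypothetical.

```python
import base64
import urllib.parse

# Server URL and fixed parameters copied from the example query above.
SERVLET_URL = "http://pcitdb03.cern.ch:8080/Frontier/Frontier"
FIXED_PARAMS = "type=frontier_request:1:DEFAULT&encoding=BLOB"


def build_frontier_url(sql: str) -> str:
    """Build a GET URL carrying `sql` in the p1 parameter.

    Assumption: p1 is plain standard base64 of the UTF-8 SQL text,
    percent-escaped so that '+', '/' and '=' survive in a query string.
    """
    encoded = base64.b64encode(sql.encode("utf-8")).decode("ascii")
    return f"{SERVLET_URL}?{FIXED_PARAMS}&p1={urllib.parse.quote(encoded, safe='')}"


def sql_from_url(url: str) -> str:
    """Inverse of build_frontier_url: recover the SQL text from p1."""
    query = urllib.parse.urlsplit(url).query
    p1 = urllib.parse.parse_qs(query)["p1"][0]  # also undoes the percent-escaping
    return base64.b64decode(p1).decode("utf-8")


if __name__ == "__main__":
    url = build_frontier_url("select * from tab1, tab2 where tab1.id = tab2.id")
    print(url)
    print(sql_from_url(url))
```

Because the SQL text travels inside the URL, byte-identical queries produce identical URLs, which is what lets a Squid in front of the servlet recognise and reuse a cached reply; a single extra space already yields a different cache key.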

  5. Squid
     • Squid cache servers are placed between clients and the Frontier servlet
     • Squid caches query results (XML documents) and serves them to clients that ask for exactly the same query

  6. Cache Consistency Problem
     • Squid caches database query results for a fixed time (HTTP time-to-live)
        • set by the Frontier server (7 days)
        • time-based cache invalidation
     • Backend database changes
        • Squid keeps serving stale data to clients until the cached entry expires
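The time-based behaviour can be modelled in a few lines. This toy `TTLCache` (a hypothetical class, not Squid code) stores each reply with an expiry time and only goes back to the origin once that time has passed, so a backend change stays invisible for up to the full TTL. The clock is passed in explicitly so the example can simulate days passing.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple

SEVEN_DAYS = 7 * 24 * 3600  # TTL set by the Frontier server, in seconds


@dataclass
class TTLCache:
    """Toy model of a cache with purely time-based invalidation."""
    fetch: Callable[[str], str]          # asks the origin (Frontier servlet)
    ttl: float = SEVEN_DAYS
    entries: Dict[str, Tuple[float, str]] = field(default_factory=dict)

    def get(self, url: str, now: float) -> str:
        hit = self.entries.get(url)
        if hit is not None and now < hit[0]:
            return hit[1]                # still "fresh" by TTL, even if the DB changed
        body = self.fetch(url)           # miss or expired: go to the origin
        self.entries[url] = (now + self.ttl, body)
        return body


if __name__ == "__main__":
    database = {"q1": "row A"}
    cache = TTLCache(fetch=lambda url: database[url])
    print(cache.get("q1", now=0))               # miss: prints "row A"
    database["q1"] = "row B"                     # backend database change
    print(cache.get("q1", now=3600))             # hit: still "row A" (stale)
    print(cache.get("q1", now=SEVEN_DAYS + 1))   # expired: prints "row B"
```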

  7. Cache Consistency Problem
     • If tables are created in the database, new queries will refer to them; their results cannot be in the cache because the tables are new, so there is no problem
     • If tables are dropped, the cached results will be wrong
     • BUT if inserts or updates are made in existing tables, cached data in Squids becomes stale!

  8. Scenario: CREATE TABLE - OK
     • Cached query:
        • Select * from tab1, tab2 where …
     • Database change:
        • Create table tab3 (…);
     • New query:
        • Select * from tab1, tab3 where …
     • Query not cached, OK

  9. Scenario: DROP TABLE - KO
     • Cached query:
        • Select * from tab1, tab2 where …
     • Database change:
        • Drop table tab1;
     • New query:
        • Select * from tab1, tab2 where …
     • If the query is cached, KO: wrong result (the cache still returns rows from a table that no longer exists)

  10. Scenario: INSERT - KO
     • Cached query:
        • Select * from tab1, tab2 where …
     • Database change:
        • insert into tab1 values (…);
     • New query:
        • Select * from tab1, tab2 where …
     • If the query is cached, KO: stale data

  11. Scenario: UPDATE - KO
     • Cached query:
        • Select * from tab1, tab2 where …
     • Database change:
        • Update tab1 set … where …;
     • New query:
        • Select * from tab1, tab2 where …
     • If the query is cached, KO: stale data
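The four table-level scenarios can be reproduced with an in-memory SQLite database standing in for the Oracle backend and a dictionary standing in for Squid. The cache is keyed on the exact SQL text and never invalidated, mirroring a static cache with a long TTL; the table contents and the `StaticCache` class are illustrative assumptions only.

```python
import sqlite3


class StaticCache:
    """Caches results by exact SQL text and never invalidates them."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        self.entries = {}

    def query(self, sql: str):
        if sql not in self.entries:              # miss: go to the database
            self.entries[sql] = self.conn.execute(sql).fetchall()
        return self.entries[sql]                 # hit: whatever was stored


def run_scenarios():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table tab1 (id integer, v text);
        create table tab2 (id integer, w text);
        insert into tab1 values (1, 'a');
        insert into tab2 values (1, 'x');
        insert into tab2 values (2, 'y');
    """)
    cache = StaticCache(conn)
    q = ("select tab1.v, tab2.w from tab1, tab2 "
         "where tab1.id = tab2.id order by tab1.id")
    results = {"initial": cache.query(q)}

    conn.execute("insert into tab1 values (2, 'b')")          # INSERT
    results["insert"] = cache.query(q)
    conn.execute("update tab1 set v = 'z' where id = 1")      # UPDATE
    results["update"] = cache.query(q)
    results["database"] = conn.execute(q).fetchall()          # what the DB says now

    conn.execute("create table tab3 (id integer, u text)")    # CREATE TABLE
    conn.execute("insert into tab3 values (1, 'new')")
    results["create"] = cache.query(
        "select tab1.v, tab3.u from tab1, tab3 where tab1.id = tab3.id")

    conn.execute("drop table tab1")                            # DROP TABLE
    results["drop"] = cache.query(q)
    return results


if __name__ == "__main__":
    for name, rows in run_scenarios().items():
        print(f"{name:8} {rows}")
```

The cached INSERT, UPDATE and DROP results all still equal the initial `[('a', 'x')]`, while the database itself now returns `[('z', 'x'), ('b', 'y')]`; only the query on the new table `tab3` is answered from the database.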

  12. Scenario: new object - OK
     • Cached query:
        • Select * from objs, attribs where objs.ID = attribs.OBJ_ID and objs.ID = X
     • Database change:
        • insert into objs values (Y, …);
        • insert into attribs values (.., Y, …);
     • New query:
        • select * from objs, attribs where objs.ID = attribs.OBJ_ID and objs.ID = Y
     • Query for object Y is not cached, OK
     For queries on IDs of static objects, a static cache is OK.

  13. Scenario: new attribute - KO
     • Cached query:
        • Select * from objs, attribs where objs.ID = attribs.OBJ_ID and objs.ID = X
     • Database change:
        • insert into attribs values (.., X, …);
     • New query:
        • select * from objs, attribs where objs.ID = attribs.OBJ_ID and objs.ID = X
     • Query for object X might be cached, KO
     For queries on IDs of non-static objects, a static cache is KO.
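The same technique applied to the `objs`/`attribs` scenarios. The schema (an `ID` column in `objs`, `OBJ_ID` and a hypothetical `NAME` column in `attribs`) and the IDs X = 1 and Y = 2 are assumptions for illustration. The ID is written into the SQL text so that each object gets its own cache key, as it would in a Frontier URL.

```python
import sqlite3

# One literal SQL string per object ID, so each object has its own cache key.
QUERY = ("select objs.ID, attribs.NAME from objs, attribs "
         "where objs.ID = attribs.OBJ_ID and objs.ID = {}")


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table objs (ID integer primary key);
        create table attribs (OBJ_ID integer, NAME text);
        insert into objs values (1);
        insert into attribs values (1, 'colour');
    """)
    cache = {}  # static cache: SQL text -> rows, never invalidated

    def cached(sql):
        if sql not in cache:
            cache[sql] = conn.execute(sql).fetchall()
        return cache[sql]

    x, y = 1, 2
    cached(QUERY.format(x))                                # object X is now cached

    # New object Y with its attribute: the query for Y was never cached -> fresh.
    conn.execute("insert into objs values (?)", (y,))
    conn.execute("insert into attribs values (?, 'size')", (y,))
    new_object = cached(QUERY.format(y))

    # New attribute for existing object X: the cached query for X is now stale.
    conn.execute("insert into attribs values (?, 'weight')", (x,))
    new_attribute = cached(QUERY.format(x))
    fresh = conn.execute(QUERY.format(x) + " order by attribs.NAME").fetchall()
    return new_object, new_attribute, fresh


if __name__ == "__main__":
    print(demo())
```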

  14. Restrictions with static cache
     • Table drops can lead to wrong query results
     • Data updates can lead to wrong query results
     • Inserts need special care
        • ID-based queries are OK
        • Otherwise, KO
           • when inserting into “attribs”, a forced refresh is needed at user application level for queries over “objs”
     Will user applications respect these restrictions?
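One way an application could implement the forced refresh is to remember which tables it has written and mark later reads of those tables as needing revalidation. The `RefreshPolicy` class below is hypothetical and only computes request headers. `Cache-Control: no-cache` is the standard HTTP/1.1 request directive asking caches to revalidate with the origin, but whether a given Squid honours it depends on its configuration, and the plug-in's existing client-side refresh is not modelled here.

```python
from typing import Dict, Iterable, Set


class RefreshPolicy:
    """Hypothetical application-side bookkeeping for forced refreshes.

    The application records which tables it has modified; any later query
    that reads one of them gets a header asking caches not to serve a
    stored copy without revalidating it with the origin server.
    """

    def __init__(self) -> None:
        self.dirty: Set[str] = set()

    def record_write(self, table: str) -> None:
        self.dirty.add(table.lower())

    def headers_for(self, tables_read: Iterable[str]) -> Dict[str, str]:
        if self.dirty & {t.lower() for t in tables_read}:
            return {"Cache-Control": "no-cache"}
        return {}
```

For example, after `record_write("attribs")`, a query reading `objs` and `attribs` gets `{"Cache-Control": "no-cache"}`, while a query on `tab1` alone is still served from cache. The sketch never clears the dirty set, which errs on the side of extra refreshes.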

  15. Present Status - problem
     • The POOL Frontier plug-in issues two types of queries:
        • DB dictionary (metadata) queries and user data queries
     • To avoid stale cached data, the plug-in does a client-side cache refresh for metadata queries
     Stale cached data may still appear in user data queries.

  16. Invalidation Mechanism
     • Build a cache content invalidation mechanism over Squid/Frontier/OracleDB
     • A way to invalidate cached query results when the tables they depend on are changed
     • Invalidation mechanism basic steps:
        • Detect database changes
        • Detect which cache content is stale
        • Send invalidation messages to Squids
        • Purge cached content in Squids

  17. Conclusions
     • Frontier alone does not guarantee data consistency
     • Applications must follow a set of rules to keep data consistent (see slide 14)
     • An invalidation mechanism could be developed
        • Some ideas follow in the appendix

  18. Appendix - Invalidation Steps
     1. Database changes detection
     2. Stale cached queries detection
     3. Invalidation propagation to Squids
     4. Purge cached content in Squids

  19. 1. Database changes detection
     • Options:
        • Database triggers
           • data manipulation (DML) triggers can only be set up at table level (not at database or schema level)
        • View ALL_TAB_MODIFICATIONS
           • this view is updated off-line, with up to 3 hours delay between a table update and its registration in ALL_TAB_MODIFICATIONS
        • Database auditing
           • AUDIT INSERT TABLE, UPDATE TABLE, DELETE TABLE BY ACCESS WHENEVER SUCCESSFUL;
        • Oracle LogMiner
           • more information available and less performance overhead than auditing
           • not as simple as DB auditing, and implies setup time overhead
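To illustrate the table-level restriction on DML triggers, the sketch below uses SQLite in place of Oracle: SQLite triggers are likewise attached to one table at a time, so change detection means generating one trigger per monitored table and per operation. The `change_log` table and trigger names are hypothetical; in Oracle the equivalent would be written in PL/SQL. SQLite triggers are row-level only, so one log row is written per affected row.

```python
import sqlite3
from typing import Iterable

OPERATIONS = ("INSERT", "UPDATE", "DELETE")


def install_change_triggers(conn: sqlite3.Connection, tables: Iterable[str]) -> None:
    """Create one AFTER trigger per table and per DML operation.

    Table names are interpolated into SQL, so they must come from a
    trusted list (hypothetical: the tables Frontier queries read).
    """
    conn.execute("create table if not exists change_log "
                 "(table_name text, operation text)")
    for table in tables:
        for op in OPERATIONS:
            conn.execute(
                f"create trigger trg_{table}_{op.lower()} after {op} on {table} "
                f"begin insert into change_log values ('{table}', '{op}'); end")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("create table tab1 (id integer); create table tab2 (id integer);")
    install_change_triggers(conn, ["tab1", "tab2"])
    conn.execute("insert into tab1 values (1)")
    conn.execute("update tab1 set id = 2")
    conn.execute("insert into tab2 values (5)")
    print(conn.execute("select * from change_log order by rowid").fetchall())
```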

  20. 1. Database changes detection
     • Database auditing
        • Simple to configure
        • Trigger on the table sys.aud$
        • The trigger fires a stored procedure that starts the invalidation procedure
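The audit-trail variant can be sketched the same way. Here a table `aud` stands in for `sys.aud$` (its two columns are a simplification), and a Python function registered with `sqlite3.Connection.create_function` plays the role of the stored procedure that the trigger fires. In a real deployment it should be checked that Oracle allows such a trigger at all, since Oracle restricts triggers on objects owned by SYS (error ORA-04089).

```python
import sqlite3
from typing import Callable


def install_audit_hook(conn: sqlite3.Connection,
                       on_change: Callable[[str], None]) -> None:
    """Call `on_change(table_name)` whenever an audit record is written."""
    # Simplified stand-in for the audit trail (hypothetical columns).
    conn.execute("create table aud (obj_name text, action_name text)")
    # The registered function plays the role of the stored procedure.
    conn.create_function("start_invalidation", 1, on_change)
    conn.execute("""
        create trigger aud_after_insert after insert on aud
        begin
            select start_invalidation(new.obj_name);
        end""")


if __name__ == "__main__":
    requests = []
    conn = sqlite3.connect(":memory:")
    install_audit_hook(conn, requests.append)
    # In Oracle, auditing would write this row; here it is inserted by hand.
    conn.execute("insert into aud values ('TAB1', 'INSERT')")
    print(requests)
```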

  21. 2. Stale cached queries detection
     • How to find pages to invalidate in Squids given the name of a modified table?
        • A mapping between tables and queries
        • Frontier servlet query strings could be modified to ease this mapping
        • Whenever there's a query to the servlet, it must store the query and its tables somewhere
        • When a table is modified, all queries involving that table are invalidated
     • Danger of invalidating objects that are still valid (over-invalidation)
     • The invalidation procedure can be tricky (invalidation rules)
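A minimal table-to-query index, assuming queries use the simple comma-separated `from t1, t2 where …` form seen in the scenarios (no explicit joins, subqueries or quoted names). It records every query URL under each table it reads and, on a change, returns all of them. `tables_in` and `QueryIndex` are hypothetical helpers.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Naive FROM-list extraction (assumption: "from t1, t2 [alias] where ..." form).
FROM_CLAUSE = re.compile(r"\bfrom\s+(.+?)(?:\bwhere\b|$)", re.IGNORECASE | re.DOTALL)


def tables_in(sql: str) -> Set[str]:
    match = FROM_CLAUSE.search(sql)
    if not match:
        return set()
    # First word of each comma-separated item is the table name.
    return {part.split()[0].lower() for part in match.group(1).split(",") if part.strip()}


class QueryIndex:
    """Remembers, for each table, which cached query URLs read it."""

    def __init__(self) -> None:
        self.by_table: Dict[str, Set[str]] = defaultdict(set)

    def record(self, url: str, sql: str) -> None:
        for table in tables_in(sql):
            self.by_table[table].add(url)

    def invalidate(self, table: str) -> Set[str]:
        """Every recorded query that mentions `table` (possibly too many)."""
        return set(self.by_table.get(table.lower(), set()))
```

With cached queries for objects 1 and 2 over `objs, attribs`, a change to `attribs` invalidates both, even if only object 1's attributes changed: this is the over-invalidation that mapping at table granularity implies.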

  22. 2. Stale cached queries detection
     • Logging queries, clients and tables affected
     • Two logging options:
        • a log module in the Frontier servlet (as a servlet wrapper)
        • OR some script running over the Apache logs
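For the log-based option, a script can recover the query text from the access log. This sketch assumes Apache's Common Log Format and that `p1` holds plain base64 of the SQL text (an assumption; the real encoding may differ); `parse_line` is a hypothetical helper. Since the servlet sits behind Squid, the host field records the requesting Squid rather than the end client, which is the information step 3 needs.

```python
import base64
import re
import urllib.parse
from typing import Optional, Tuple

# Common Log Format: host ident user [time] "request" status bytes
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "GET (\S+) [^"]*" (\d{3}) \S+')


def parse_line(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (requesting host, request path, decoded SQL), or None."""
    m = LOG_LINE.match(line)
    if not m or m.group(3) != "200":
        return None
    host, path = m.group(1), m.group(2)
    query = urllib.parse.urlsplit(path).query
    fields = dict(part.split("=", 1) for part in query.split("&") if "=" in part)
    if "p1" not in fields:
        return None
    # unquote, not unquote_plus: '+' is a legitimate base64 character here.
    sql = base64.b64decode(urllib.parse.unquote(fields["p1"])).decode("utf-8")
    return host, path, sql
```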

  23. 3. Invalidation propagation to Squids
     • After building a list of queries to invalidate, we need to know:
        • Which caches requested the query?
           • easy to register, except with hierarchical caches
        • Where are those caches?
           • caches must be registered in the server
           • the cache hierarchy (topology) must also be registered
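Given the registered topology, the set of caches to notify can be computed by walking the hierarchy downwards from every cache the server saw requesting the query, because a child cache may have obtained the object from its parent without the server ever seeing that request. The cache names and the `children` mapping are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def caches_to_notify(direct_requesters: Iterable[str],
                     children: Dict[str, List[str]]) -> Set[str]:
    """Caches that may hold a copy of an invalidated query.

    `direct_requesters`: caches the server saw asking for the query.
    `children`: registered hierarchy, cache -> caches using it as parent.
    """
    seen: Set[str] = set()
    queue = deque(direct_requesters)
    while queue:                          # breadth-first walk down the hierarchy
        cache = queue.popleft()
        if cache in seen:
            continue
        seen.add(cache)
        queue.extend(children.get(cache, []))
    return seen
```

Notifying descendants that never fetched the object is harmless, since purging an object a cache does not hold changes nothing.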

  24. 4. Purge cached content in Squids
     • Two options:
        • PURGE HTTP method
           • one object at a time
        • Squid purge tool
           • regular expressions for purging multiple objects with one command
     • Performance tests could be done
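For the HTTP option, Python's standard `urllib.request` can issue a `PURGE` request through a Squid acting as proxy. Because `PURGE` removes one object per request, selecting objects by regular expression over HTTP means filtering a list of known URLs (for instance from a query log) and sending one request each. Squid typically only accepts `PURGE` when `squid.conf` contains an ACL allowing it (e.g. `acl Purge method PURGE` with a matching `http_access allow` rule); the proxy address and helper names are placeholders.

```python
import re
import urllib.error
import urllib.request
from typing import Iterable, List


def purge_requests(cached_urls: Iterable[str], pattern: str) -> List[urllib.request.Request]:
    """One PURGE request per cached URL matching the regular expression."""
    regex = re.compile(pattern)
    return [urllib.request.Request(url, method="PURGE")
            for url in cached_urls if regex.search(url)]


def send_via_squid(req: urllib.request.Request, squid: str) -> int:
    """Send `req` through the Squid at `squid` (e.g. "http://host:3128").

    Returns the HTTP status; Squid usually answers 200 when the object
    was removed and 404 when it was not cached.
    """
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({"http": squid}))
    try:
        with opener.open(req, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:   # e.g. 404 (not cached) or 403 (ACL)
        return err.code
```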

  25. Questions?
