Data consistency for Applications using FroNtier/Squid

Luis Ramos, CERN
3D Meeting, January 2006

Agenda
• Frontier Basics
• Cache consistency issues with Frontier/Squid
• Inconsistency Scenarios
• Application Restrictions Summary
• Conclusions
• Appendix: Invalidation Mechanism

Frontier Basics
• The Frontier servlet generates query results as XML documents from database queries submitted by clients
• Frontier Client is a C/C++ API for sending requests to the Frontier servlet
• FrontierAccess (the Frontier POOL “plug-in”) uses Frontier Client to access Frontier servlets
In this context, Frontier is a web-based approach to generic DB access

Example - Frontier servlet (HTTP)
QUERY: http://pcitdb03.cern.ch:8080/Frontier/Frontier?type=frontier_request:1:DEFAULT&encoding=BLOB&p1=...(SQL query encoded in base64)
REPLY: an XML document containing the query result (not reproduced in this transcript)
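A minimal sketch of how such a request URL could be assembled, assuming the p1 parameter carries nothing more than the base64 text of the SQL statement. The host, path and the type and encoding parameters are copied from the slide; the helper names and the choice of URL-safe base64 are assumptions made for illustration, and the real Frontier client may encode the query differently.

```python
import base64
from urllib.parse import parse_qs, urlencode, urlsplit

# Host and servlet path copied from the slide.
FRONTIER_BASE = "http://pcitdb03.cern.ch:8080/Frontier/Frontier"


def build_frontier_url(sql: str, base: str = FRONTIER_BASE) -> str:
    """Return a request URL carrying the SQL text as base64 in the p1 parameter."""
    # Assumption: URL-safe base64, so the payload has no '+' or '/'.
    # The slide only says "encoded in base64".
    p1 = base64.urlsafe_b64encode(sql.encode("utf-8")).decode("ascii")
    params = {
        "type": "frontier_request:1:DEFAULT",
        "encoding": "BLOB",
        "p1": p1,
    }
    # Keep ':' literal so the type parameter looks like the one on the slide.
    return base + "?" + urlencode(params, safe=":")


def decode_p1(url: str) -> str:
    """Recover the SQL text from a URL built by build_frontier_url."""
    p1 = parse_qs(urlsplit(url).query)["p1"][0]
    return base64.urlsafe_b64decode(p1).decode("utf-8")
```

Because the whole SQL text ends up inside the URL, two queries that differ only in letter case or whitespace produce two different URLs, which matters once a URL-keyed cache sits in front of the servlet.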

Squid
• Squid cache servers are placed between the clients and the Frontier servlet
• Squid caches query results (XML documents) and serves them to clients that send exactly the same query

Cache Consistency Problem
• Squid caches database query results for a fixed time (HTTP time-to-live)
  • set by the Frontier server (7 days)
  • time-based cache invalidation
• When the backend database changes
  • Squid keeps serving stale data to clients until the time-to-live expires
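The effect of purely time-based expiry can be illustrated with a toy cache. This is a sketch under stated assumptions, not a model of Squid internals: TTLCache, its injected clock and the in-memory "backend" dictionary are all hypothetical; only the 7-day lifetime comes from the slide.

```python
from typing import Callable, Dict, List, Tuple

SEVEN_DAYS = 7 * 24 * 3600  # time-to-live from the slide, in seconds


class TTLCache:
    """Toy stand-in for a Squid: entries expire only by age."""

    def __init__(self, fetch: Callable[[str], str], ttl: float,
                 clock: Callable[[], float]) -> None:
        self._fetch = fetch      # called on a miss (the "servlet")
        self._ttl = ttl
        self._clock = clock      # injected so the demo can move time forward
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        hit = self._store.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]        # still "fresh" by age, whatever the database says
        body = self._fetch(url)
        self._store[url] = (now, body)
        return body


def demo() -> List[str]:
    backend = {"q1": "v1"}       # hypothetical query result held by the database
    now = [0.0]
    cache = TTLCache(lambda url: backend[url], SEVEN_DAYS, lambda: now[0])
    seen = [cache.get("q1")]     # miss: fetched from the backend
    backend["q1"] = "v2"         # the database changes
    now[0] = 3600.0              # one hour later
    seen.append(cache.get("q1"))  # hit: the old result is served
    now[0] = SEVEN_DAYS + 1.0    # after the time-to-live
    seen.append(cache.get("q1"))  # expired: fetched again
    return seen
```

In this sketch the changed value only becomes visible once the entry has aged out, which is the window in which clients see stale data.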

Cache Consistency Problem
• If tables are created in the database, new queries will refer to them and their results will not be in the cache because the tables are new: no problem
• If tables are dropped, the cached results will be wrong
• BUT, if inserts or updates are made to existing tables, the data cached in the Squids becomes stale!

Scenario: CREATE TABLE - OK
• Cached query:
  • Select * from tab1, tab2 where …
• Database change:
  • Create table tab3 (…);
• New query:
  • Select * from tab1, tab3 where …
• Query not cached, OK

Scenario: DROP TABLE - KO
• Cached query:
  • Select * from tab1, tab2 where …
• Database change:
  • Drop table tab1;
• New query:
  • Select * from tab1, tab2 where …
• If the query is cached, KO: wrong result

Scenario: INSERT - KO
• Cached query:
  • Select * from tab1, tab2 where …
• Database change:
  • insert into tab1 values (…);
• New query:
  • Select * from tab1, tab2 where …
• If the query is cached, KO: stale data

Scenario: UPDATE - KO
• Cached query:
  • Select * from tab1, tab2 where …
• Database change:
  • Update tab1 set … where …;
• New query:
  • Select * from tab1, tab2 where …
• If the query is cached, KO: stale data

Scenario: new object OK
• Cached query:
  • Select * from objs, attribs where objs.ID = attribs.OBJ_ID and objs.ID = X
• Database change:
  • insert into objs values (Y, …);
  • insert into attribs values (.., Y, …);
• New query:
  • select * from objs, attribs where objs.ID = attribs.OBJ_ID and objs.ID = Y
• Query for object Y is not cached, OK
For queries on IDs of static objects, a static cache is OK

Scenario: new attribute KO
• Cached query:
  • Select * from objs, attribs where objs.ID = attribs.OBJ_ID and objs.ID = X
• Database change:
  • insert into attribs values (.., X, …);
• New query:
  • select * from objs, attribs where objs.ID = attribs.OBJ_ID and objs.ID = X
• Query for object X might be cached, KO
For queries on IDs of non-static objects, a static cache is KO
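The two object scenarios can be replayed against an in-memory SQLite database fronted by a cache that is never invalidated. This is only an illustration: the table names objs and attribs and the columns ID and OBJ_ID follow the slides, while the VAL column, the concrete IDs (X = 1, Y = 2) and the dictionary used as a cache are assumptions.

```python
import sqlite3

QUERY = (
    "select objs.ID, attribs.VAL from objs, attribs "
    "where objs.ID = attribs.OBJ_ID and objs.ID = ? order by attribs.VAL"
)


def run_scenarios() -> dict:
    db = sqlite3.connect(":memory:")
    db.execute("create table objs (ID integer primary key)")
    db.execute("create table attribs (OBJ_ID integer, VAL text)")
    db.execute("insert into objs values (1)")            # object X
    db.execute("insert into attribs values (1, 'a')")

    cache = {}  # static cache: entries are never invalidated

    def cached_query(obj_id: int) -> list:
        key = (QUERY, obj_id)  # the same query text and ID give the same entry
        if key not in cache:
            cache[key] = db.execute(QUERY, (obj_id,)).fetchall()
        return cache[key]

    cached_query(1)  # the query for X is now cached

    # Scenario "new object": Y is inserted and then queried for the first time.
    db.execute("insert into objs values (2)")
    db.execute("insert into attribs values (2, 'b')")
    new_object = cached_query(2)  # cache miss, so the result is current

    # Scenario "new attribute": X gains a row, but its query is already cached.
    db.execute("insert into attribs values (1, 'c')")
    from_cache = cached_query(1)                         # stale answer
    from_database = db.execute(QUERY, (1,)).fetchall()   # what the database holds

    db.close()
    return {
        "new_object_Y": new_object,
        "cached_X_after_insert": from_cache,
        "database_X_after_insert": from_database,
    }
```

The cached answer for X lacks the newly inserted attribute row, while the first query for Y is correct simply because nothing was cached for it.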

Restrictions with static cache
• Table drops can lead to wrong query results
• Data updates can lead to wrong query results
• Inserts need special care
  • ID-based queries are OK
  • Otherwise, KO
  • when inserting into “attribs”, a forced refresh is needed at user application level for queries over “objs”
Will user applications respect these restrictions?
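One way an application could request such a forced refresh is through standard HTTP request headers. The sketch below only builds the request; build_request is a hypothetical helper, and whether a given Squid honours these headers depends on its configuration, so this is an assumption about deployment rather than a description of the Frontier client.

```python
import urllib.request


def build_request(url: str, force_refresh: bool = False) -> urllib.request.Request:
    """Build a GET request, optionally asking caches to bypass their stored copy."""
    headers = {}
    if force_refresh:
        # Standard HTTP/1.1 directive: do not answer from cache without
        # going back to the origin server.
        headers["Cache-Control"] = "no-cache"
        # HTTP/1.0 equivalent, for older intermediaries.
        headers["Pragma"] = "no-cache"
    return urllib.request.Request(url, headers=headers)
```

An application following the restrictions above would pass force_refresh=True for queries over “objs” after it has inserted into “attribs”, and leave it off for ID-based queries on static objects so that those still benefit from the cache.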

Present Status - problem
• The POOL Frontier plug-in has two types of queries:
  • DB dictionary data and user data
• To avoid stale cached data, the plug-in does a client-side cache refresh for metadata queries
Stale data in cache may appear in user data queries

Invalidation Mechanism
• Build a cache content invalidation mechanism over Squid/Frontier/OracleDB
  • A way to invalidate cached query results when the respective tables are changed
• The basic steps of the invalidation mechanism are:
  • Detect database changes
  • Detect which cache content is stale
  • Send invalidation messages to Squids
  • Purge cached content in Squids

Conclusions
• Frontier alone does not guarantee data consistency
• Applications must follow a set of rules to keep data consistent (see slide 14, “Restrictions with static cache”)
• An invalidation mechanism could be developed
  • Some ideas follow in the appendix

Appendix - Invalidation Steps
1. Database changes detection
2. Stale cached queries detection
3. Invalidation propagation to Squids
4. Purge cached content in Squids

1. Database changes detection
• Options:
  • Database triggers
    • data manipulation (DML) triggers can only be set up at table level (not at database or schema level)
  • View ALL_TAB_MODIFICATIONS
    • This view is updated off-line, with a delay of up to 3 hours between the table update and its registration in ALL_TAB_MODIFICATIONS
  • Database auditing
    • AUDIT INSERT TABLE, UPDATE TABLE, DELETE TABLE BY ACCESS WHENEVER SUCCESSFUL;
  • Oracle LogMiner
    • More information available and less performance overhead than auditing
    • Not as simple as DB auditing, and implies a setup time overhead

1. Database changes detection
• Database auditing
  • Simple to configure
  • Trigger on the table sys.aud$
  • The trigger fires a stored procedure that starts the invalidation procedure
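The hand-off from audit records to the invalidation procedure might look like the following sketch. It is written in Python purely for illustration, whereas the slide envisages a database trigger and stored procedure. AuditRecord is a hypothetical, simplified record shape and not the real sys.aud$ column layout, and start_invalidation stands in for whatever starts the invalidation procedure.

```python
from typing import Callable, Iterable, NamedTuple, Set

# Audited statement types from the AUDIT command on the earlier slide.
DML_ACTIONS = {"INSERT", "UPDATE", "DELETE"}


class AuditRecord(NamedTuple):
    # Hypothetical, simplified shape; not the real sys.aud$ layout.
    owner: str
    table: str
    action: str


def tables_to_invalidate(records: Iterable[AuditRecord]) -> Set[str]:
    """Return the qualified names of tables touched by audited DML."""
    return {
        f"{r.owner}.{r.table}".upper()
        for r in records
        if r.action.upper() in DML_ACTIONS
    }


def handle_audit_batch(records: Iterable[AuditRecord],
                       start_invalidation: Callable[[str], None]) -> int:
    """Start one invalidation per modified table; return how many were started."""
    tables = sorted(tables_to_invalidate(records))
    for name in tables:
        start_invalidation(name)
    return len(tables)
```

Collapsing a batch of records to a set of table names means that many row-level changes to one table trigger a single invalidation pass rather than one per statement.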

2. Stale cached queries detection
• How to find the pages to invalidate in the Squids, given the name of a modified table?
• A mapping between tables and queries
  • Frontier servlet query strings could be modified to ease this mapping
  • Whenever a query reaches the servlet, the servlet must store the query and the tables it uses somewhere
  • When a table is modified, all queries that use that table are invalidated
• Danger of invalidating objects that are still valid (over-invalidation)
• The invalidation procedure can be tricky (invalidation rules)
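A table-to-query mapping of this kind could be sketched as follows. QueryIndex and tables_in are hypothetical names, and the table extraction is deliberately naive: it only reads a comma-separated FROM list, as in the scenario queries, and ignores JOIN syntax and subqueries.

```python
import re
from collections import defaultdict
from typing import DefaultDict, Set

# Capture the text between FROM and the next clause (or the end of the query).
_FROM_RE = re.compile(
    r"\bfrom\s+(.+?)(?:\bwhere\b|\bgroup\s+by\b|\border\s+by\b|$)",
    re.IGNORECASE | re.DOTALL,
)


def tables_in(sql: str) -> Set[str]:
    """Naively extract lower-cased table names from a comma-separated FROM list."""
    match = _FROM_RE.search(sql)
    if match is None:
        return set()
    names = set()
    for item in match.group(1).split(","):
        parts = item.split()
        if parts:
            # The first token is the table name; a following alias is ignored.
            names.add(parts[0].strip(";").lower())
    return names


class QueryIndex:
    """Remember which cached URLs depend on which tables."""

    def __init__(self) -> None:
        self._by_table: DefaultDict[str, Set[str]] = defaultdict(set)

    def record(self, url: str, sql: str) -> None:
        for table in tables_in(sql):
            self._by_table[table].add(url)

    def stale_urls(self, table: str) -> Set[str]:
        """All recorded URLs whose query reads the modified table."""
        return set(self._by_table.get(table.lower(), set()))
```

The sketch also shows the over-invalidation risk named on the slide: a change to any row of a table marks every recorded query on that table as stale, including queries restricted to rows that did not change.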

2. Stale cached queries detection
• Logging queries, clients and tables affected
• Two logging options:
  • A log module in the Frontier servlet (as a servlet wrapper)
  • OR
  • A script running over the Apache logs
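The second option, a script over the access logs, might start from something like this sketch. It assumes the server writes Common Log Format lines and that p1 is plain base64 of the SQL text; both are assumptions, and parse_log_line is a hypothetical helper.

```python
import base64
import re
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlsplit

# Assumed Common Log Format:
# host ident user [time] "METHOD target PROTOCOL" status size
_LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(\S+) (\S+) [^"]*" (\d{3}) \S+')


def parse_log_line(line: str) -> Optional[Tuple[str, str]]:
    """Return (client, sql) for a successful GET carrying p1, else None."""
    match = _LINE_RE.match(line)
    if match is None:
        return None
    client, method, target, status = match.groups()
    if method != "GET" or status != "200":
        return None
    values = parse_qs(urlsplit(target).query).get("p1")
    if not values:
        return None
    try:
        sql = base64.b64decode(values[0], validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and undecodable bytes
        return None
    return client, sql
```

With Squids in front of the servlet, the client address seen in the log is that of the Squid that forwarded the request, which is the information needed later to decide where invalidations must be sent.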

3. Invalidation propagation to Squids
• Once we have the list of queries to invalidate, we need to know:
  • Which caches requested the query?
    • Easy to register, except with hierarchical caches
  • Where are those caches?
    • Caches must be registered in the server
    • The cache hierarchy (topology) must also be registered
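A registry of caches and their topology could be sketched as below. CacheRegistry and the cache names are hypothetical. The sketch assumes that the server only sees the top-most cache of a hierarchy, so it conservatively targets every registered descendant of a cache that requested the query, and lists parents before children so that a child refetching after its purge cannot pick up a stale copy from its parent.

```python
from typing import Dict, List, Optional, Set


class CacheRegistry:
    """Track registered caches, their parent links, and who requested what."""

    def __init__(self) -> None:
        self._parent: Dict[str, Optional[str]] = {}
        self._requested: Dict[str, Set[str]] = {}

    def register_cache(self, name: str, parent: Optional[str] = None) -> None:
        self._parent[name] = parent

    def record_request(self, url: str, cache: str) -> None:
        """Note that `cache` fetched `url` directly from the server."""
        self._parent.setdefault(cache, None)
        self._requested.setdefault(url, set()).add(cache)

    def _chain(self, name: str) -> List[str]:
        """The cache itself followed by its ancestors (guarded against cycles)."""
        chain: List[str] = []
        seen: Set[str] = set()
        node: Optional[str] = name
        while node is not None and node not in seen:
            seen.add(node)
            chain.append(node)
            node = self._parent.get(node)
        return chain

    def targets(self, url: str) -> List[str]:
        """Caches that may hold `url`, ordered parents first."""
        roots = self._requested.get(url, set())
        hits = []
        for name in self._parent:
            chain = self._chain(name)
            if roots.intersection(chain):
                hits.append((len(chain), name))  # chain length is the depth
        return [name for _, name in sorted(hits)]
```

This is where the difficulty noted on the slide shows up: without the registered topology, the caches below the one that contacted the server would be invisible and would keep their stale copies.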

4. Purge cached content in Squids
• Two options:
  • HTTP PURGE command
    • one object at a time
  • Squid purge tool
    • regular expressions for purging multiple objects with one command
• Performance tests could be done
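The first option, one PURGE request per object, could be sketched like this. It assumes the target Squid has been configured to accept the PURGE method from the sending host, and that it answers 200 when the object was removed and 404 when it was not in the cache. send_purge and purge_all are hypothetical helpers, and the host and port are supplied by the caller.

```python
import http.client
from typing import Callable, Dict, Iterable


def send_purge(squid_host: str, squid_port: int, url: str,
               timeout: float = 10.0) -> int:
    """Send one PURGE request for `url` to a Squid and return the HTTP status."""
    conn = http.client.HTTPConnection(squid_host, squid_port, timeout=timeout)
    try:
        # The full URL is the request target, as for any request sent via a proxy.
        conn.request("PURGE", url)
        return conn.getresponse().status
    finally:
        conn.close()


def purge_all(urls: Iterable[str], send: Callable[[str], int]) -> Dict[str, bool]:
    """Purge one object at a time; True means the cache reported 200."""
    return {url: send(url) == 200 for url in urls}
```

A caller would bind the Squid address once, for example purge_all(stale, lambda u: send_purge("squid.example.org", 3128, u)), where the host name and port are placeholders. Since each object costs one round trip, this is the case where the performance tests suggested on the slide would be informative.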

Questions?
