Form-Based Proxy Caching for Database-Backed Web Sites
Qiong Luo
Joint work with Jeffrey F. Naughton
University of Wisconsin-Madison
Web Caching for Databases
• Goal
  • Proxy caching for db-backed web sites
• Gap
  • RDBMS answers SQL queries
  • Web caching proxies cache web pages
• Our proposal: a new caching proxy
  • Query result caching, plus
  • Query processing
Qiong Luo @ VLDB 2001
Outline
• Introduction
• HTML Forms and Query Templates
• Form-Based Active Caching
• Experiments
• Conclusions
Database-Backed Web Sites
[Figure: Browser, HTTP, HTML Forms, Web/Application Server (Application), and DB Server (Database); steps (1) to (4).]
• Forms allow user input
• Queries go through multiple tiers
• DB Server is often the bottleneck
Caching Proxies
[Figure: Browser, Caching Proxy, Web/Application Server (HTML Forms, Application), and DB Server (Database), connected over HTTP; steps (1) to (5). Annotation on the caching proxy: "Today’s proxies do URL-matching; we want to add query processing here!"]
Why Proxy Caching?
• Flexibility and easy deployment
  • Don’t change servers or clients
• Server workload sharing
  • On a hit, save all steps on the server
• Response time improvement
  • By sharing server workload,
  • By bringing content closer to users, or
  • By both
Prior Research: Web Caching
(Slide annotation: Exact Match)
• IBM’s Olympics web site [CID99]
• Dynamic Content Caching Protocol [SAYZ99]
• Web view materialization [LR00]
• Caching search engine results [Mar00]
• CachePortal project at NEC [CLL+01]
• Active Cache Protocol [CZB98]
Our Focus
• Feasibility
  • How can proxies cache HTML form queries?
  • Different server collaboration levels
• Efficiency
  • Which caching schemes are more efficient?
  • Active caching vs. passive caching
• Flexibility
  • How can we still keep things simple and easy?
  • Declarative specification of form semantics
An HTML Form Query
• At a browser: Search Request Page with a "Search by:" field; search string: Java Programming
• At the web site db:
    SELECT TOP 50 i_title, i_id, a_fname, a_lname
    FROM item, author
    WHERE a_id = i_a_id AND (i_title LIKE '%Java Programming%')
    ORDER BY i_title
HTML form queries have database semantics.
Another HTML Form Query
• At a browser: Search Request Page with a "Search by:" field; search string: Network Programming
• At the web site db:
    SELECT TOP 50 i_title, i_id, a_fname, a_lname
    FROM item, author
    WHERE a_id = i_a_id AND (i_title LIKE '%Network Programming%')
    ORDER BY i_title
Queries from the same HTML form have a common structure!
Proxy Side Query Template
• Search Request Page: tpcwSearchForm.html, with a "Search by:" field
• Query Template @ proxy:
    SELECT TOP 50 i_title, i_id, a_fname, a_lname
    FROM item, author
    WHERE a_id = i_a_id AND (i_title LIKE $search_string)
    ORDER BY i_title
Form = Parameterized Queries (Language Independent)
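The template idea can be sketched in a few lines of Python. This is only an illustration, not the proxy's actual code: the names `QUERY_TEMPLATE`, `sql_literal`, and `instantiate` are hypothetical, and the form is assumed to submit a single field called `search_string`. Python's `string.Template` happens to use the same `$name` placeholder syntax as the slide.

```python
from string import Template

# Proxy-side query template, copied from the slide; $search_string is the form parameter.
QUERY_TEMPLATE = Template(
    "SELECT TOP 50 i_title, i_id, a_fname, a_lname "
    "FROM item, author "
    "WHERE a_id = i_a_id AND (i_title LIKE $search_string) "
    "ORDER BY i_title"
)


def sql_literal(text):
    """Wrap user text in LIKE wildcards and quote it as a SQL string literal."""
    return "'%" + text.replace("'", "''") + "%'"


def instantiate(form_fields):
    """Turn the submitted form fields (hypothetical dict) into a concrete SQL query."""
    return QUERY_TEMPLATE.substitute(
        search_string=sql_literal(form_fields["search_string"])
    )


if __name__ == "__main__":
    print(instantiate({"search_string": "Java Programming"}))
```

Two requests from the same form differ only in the substituted literal, which is what lets a proxy group them under one template. A real deployment would more likely bind the parameter through the database driver than build SQL text.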
Class of Queries Handled
    SELECT TOP n selection_list
    FROM target_relations
    WHERE search_predicate(search_field, $search_string) AND other_predicates
    ORDER BY orderby_fields
• SPJ queries with
  • a parameterized search predicate,
  • an order-by clause, and
  • a top-n operation.
• selection_list search_field orderby_fields
Challenges of Form Queries
• Unordered domain of search predicates
  • Very different from range predicates
• Top-n operation
  • Answering subsumed queries needs care.
• Remainder queries to the server
[Figure: overlapping queries Q1 and Q2; the remainder query is Q2-Q1, selected by the remainder predicate.]
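As a rough illustration of the remainder query Q2-Q1 for keyword predicates, the sketch below builds a predicate that matches the new query but none of the cached ones. The helper names `like` and `remainder_sql` are hypothetical, the predicate form `i_title LIKE '%...%'` is taken from the earlier slides, and `%` or `_` inside a search string are not escaped here.

```python
def like(search_string):
    """Render the form's search predicate for one search string."""
    return "i_title LIKE '%" + search_string.replace("'", "''") + "%'"


def remainder_sql(new_string, cached_strings):
    """Predicate for tuples of the new query that no cached query already covers."""
    clauses = [like(new_string)]
    clauses += ["NOT (" + like(c) + ")" for c in cached_strings]
    return " AND ".join(clauses)


if __name__ == "__main__":
    # Q1 = 'Java' is cached, Q2 = 'Programming' is new.
    print(remainder_sql("Programming", ["Java"]))
```

With 'Java' cached, a query for 'Programming' only needs the server to return titles that contain 'Programming' but not 'Java'; the rest can come from the cache. This assumes a server that accepts such rewritten queries.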
Form-Based Active Caching
• Keep queries from the same form together
• Cache the whole result set, not only the top-n
• Process a new query at the cache
  • Exact matches: trivial
  • Subsumed queries: selections at the cache
  • Otherwise:
    • For a collaborating server, send the remainder query
    • Otherwise, send the original query
• Eliminate duplicate tuples in the cache
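The lookup steps above can be sketched as a toy cache for one form. This is a minimal sketch under stated assumptions, not the authors' implementation: the class `FormCache`, the callback `fetch_all`, and the sample `BOOKS` data are hypothetical; matching is plain case-sensitive substring containment standing in for `LIKE '%s%'`; and the remainder-query path for collaborating servers is left out, so every miss forwards the original query.

```python
class FormCache:
    """Toy active cache for one form whose predicate is i_title LIKE '%s%'."""

    def __init__(self, fetch_all):
        self.fetch_all = fetch_all  # hypothetical callback: search string -> [(i_id, i_title)]
        self.tuples = {}            # i_id -> i_title; each tuple is stored once
        self.queries = {}           # search string -> ids of its whole result set

    def _answer(self, ids, n):
        rows = sorted((self.tuples[i], i) for i in ids)  # ORDER BY i_title
        return [(i, title) for title, i in rows[:n]]     # TOP n

    def lookup(self, s, n=50):
        if s in self.queries:                            # exact match
            return "exact", self._answer(self.queries[s], n)
        for cached, ids in self.queries.items():
            if cached in s:                              # s is subsumed by a cached query
                hit = {i for i in ids if s in self.tuples[i]}
                return "subsumed", self._answer(hit, n)
        rows = self.fetch_all(s)                         # miss: send the original query
        self.tuples.update(rows)                         # duplicates collapse on i_id
        self.queries[s] = {i for i, _ in rows}
        return "miss", self._answer(self.queries[s], n)


BOOKS = [
    (1, "Advanced Java Programming"),
    (2, "Java Network Programming"),
    (3, "Unix Network Programming"),
]
server_calls = []


def fetch_all(s):
    """Stand-in for the web site: returns the whole result set, not only the top n."""
    server_calls.append(s)
    return [(i, t) for i, t in BOOKS if s in t]


if __name__ == "__main__":
    cache = FormCache(fetch_all)
    for query in ["Java", "Java", "Java Network"]:
        print(query, cache.lookup(query))
```

Caching the whole result set is what makes the subsumed case safe: if only the top-n rows of 'Java' were kept, a later 'Java Network' query could miss qualifying tuples that fell outside that top n.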
Form-based Active Cache Organization
[Figure: Cache Organization Example with three parts: Cached Queries (predicates on i_title), Cached Tuples (only i_title shown), and Cached Lexicons (index on i_title). The sample entries are built from the words Advanced, Java, Network, Programming, and Unix, e.g. Advanced Java Programming, Java Network Programming, Unix Network Programming.]
Experiments Overview
• Setup
• TPC-W book title search workload
• Adding overlap in queries
• Adding overlap in datasets
• Real user trace over real web sites (omitted)
TPC-W Search Times*
• One signature word per tuple
• One signature word per query
• Five result tuples per query
• 10K-query trace
• 2K distinct queries
• 10K-tuple cache
Both caching schemes perform well on TPC-W.
*Milliseconds
Adding Overlap in Queries
[Figure: Response times of noun traces on 100K TPC-W database. Y-axis: time in milliseconds (0 to 450). Traces: Noun100, Noun80, Noun60, Noun40. Series: Direct, PQ, AQ0.]
Active outperforms Passive.
Adding Overlap in Datasets
Remainder Predicates can help…
Conclusions
• Form-based proxy caching framework
  • Enabling declaration of query templates
  • Answering HTML form-based queries
• Caching schemes
  • Passive caching is sufficient for the TPC-W trace.
  • Active caching is more promising for other traces.
  • Full semantic caching is probably not worthwhile.
• Each needs a different server collaboration level: none, some, remainder query handling
Full paper at http://www.cs.wisc.edu/niagara/