Materialized View Selection for XQuery Workloads
Asterios Katsifodimos¹, Ioana Manolescu¹ & Vasilis Vassalos²
¹Inria Saclay & Université Paris-Sud, ²Athens University of Economics and Business
View selection in XML databases: Problem definition
• Find a set of materialized views that minimizes the workload evaluation cost without exceeding a space budget.
Contributions
• View selection for multiple-view XQuery rewriting
  • Rich subset of XQuery
  • Tree patterns with multiple return nodes and value joins
• We provide
  • Candidate view pruning methods
  • View selection algorithms:
    • Utility-Driven Greedy (UDG)
    • Reduce-Optimize Algorithm (ROA)
  • Extensive experimental evaluation
  • Outperforming & extending state-of-the-art works
Outline
• The View Selection Problem
• View Language & Candidate Views
• View Selection Algorithms
• Related Work & Experimentation
Query and view language: Anatomy of a query
[Figure: example query with two tree patterns, book (with text_cont and author_val) and paper (with author and conference[=“SIGMOD”]), connected by a value join. Annotations: ID = ID of book; cont = subtree of the text element.]
• Return the ID of every book along with its text and author if the book author has a paper in the SIGMOD conference.
Candidate Views: Rewriting
• Candidate views: views that can participate in a rewriting of a query.
• Property: candidate views are exactly those embeddable in a query.
[Figure: example query (book with text_cont and author_val), candidate views v1 and v2, and a rewriting plan: PROJECT [text_cont, author_val] over JOIN [v1.authorID>v2.bookID] over SCAN(v1) and SCAN(v2).]
Candidate Views
• Number of candidate views
  • For a query with m value joins and k tree patterns: [formula not recoverable]
  • Early pruning is needed
• Rules of thumb for pruning
  • Drop all views that can be replaced by others
  • Views should not store anything extraneous
• Challenge: remove the maximum number of views
  • While preserving low-cost and/or small-size rewriting possibilities
Candidate Views: Pruning techniques
• Do not store unnecessary data
  • i.e. useless cont, val or //-axis
  • Avoid expensive rewritings
  • Save space
• Annotate all nodes with ID
  • Maximize rewriting opportunities
[Figure: example query (book with author_val and text_cont) and candidate views v1, v2, v3 next to their pruned counterparts v1', v2', v3'.]
Materializing a set of views: View set benefit
• Benefit of materializing a set of views:
  benefit(V, Q) = (cost of evaluating Q over D) − (cost of evaluating Q over V)
• Computing the benefit requires invoking the rewriting algorithm
  • Expensive!
• Space occupancy of a view set V
  • Total size (in bytes)
View Selection Algorithms: Knapsack-inspired view selection
• High similarity with the classic 0-1 knapsack problem
• Typical element of greedy algorithms for knapsack:
  utility(v, Q) = benefit({v} ∪ V, Q) / size(v)
View Selection Algorithms: Utility-Driven Greedy (UDG) Algorithm
1. Enumerate candidate views
2. Compute view utilities
3. Order views by utility
4. Select the view of largest utility fitting in the budget
5. Repeat 2-4 until the budget is exhausted
[Figure, three animation frames: candidate views and a space budget of S=12, with U = utility (= benefit/size) and S = space occupancy.
 Frame 1: U=10 S=7; U=60 S=5; U=50 S=4; U=8 S=2.
 Frame 2: U=12 S=7; U=40 S=5; U=64 S=4; U=9 S=2.
 Frame 3: U=64 S=4; U=13 S=7; U=10 S=5; U=4 S=2.]
• Greedy algorithms for knapsack are not a perfect fit for our problem
• Utility of a view
  • may change after every round
  • depends on other views already selected
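The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `udg_select`, `toy_benefit`, `SIZES` and `GAINS` are hypothetical, and the additive toy benefit merely stands in for the real benefit, which would require invoking the rewriting algorithm. The utility follows the slide's definition, benefit({v} ∪ V, Q) / size(v), and is recomputed every round.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set


def udg_select(
    candidates: Iterable[str],
    size: Dict[str, int],
    benefit: Callable[[FrozenSet[str]], float],
    budget: int,
) -> Set[str]:
    """Greedy selection: repeatedly pick the fitting view of largest utility."""
    selected: Set[str] = set()
    remaining = set(candidates)
    free = budget
    while True:
        # Only views that still fit in the remaining budget are eligible.
        fitting = sorted(v for v in remaining if size[v] <= free)
        if not fitting:
            break

        # Utilities are recomputed in every round because they depend on
        # the views already selected (sizes are assumed to be positive).
        def utility(v: str) -> float:
            return benefit(frozenset(selected | {v})) / size[v]

        best = max(fitting, key=utility)
        selected.add(best)
        remaining.remove(best)
        free -= size[best]
    return selected


# Toy data (hypothetical): per-view benefits simply add up.
SIZES = {"v1": 7, "v2": 5, "v3": 4, "v4": 2}
GAINS = {"v1": 70.0, "v2": 300.0, "v3": 200.0, "v4": 16.0}


def toy_benefit(views: FrozenSet[str]) -> float:
    return sum(GAINS[v] for v in views)


chosen = udg_select(SIZES, SIZES, toy_benefit, budget=12)
print(sorted(chosen))
```

Each round calls `benefit` once per fitting view, which is why the approach becomes costly when every call hides a run of the rewriting algorithm.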
View Selection Algorithms: State space search (state = candidate view set)
• Initial state: query workload
• Best state: largest benefit under the space budget
[Figure: graph of states linked by transformations; label on the slide: transform(S1)S8.]
View Selection Algorithms: State transformations (Break, Join, Generalize, Adapt)
• Break: break a view into smaller parts
  • Reveals common sub-expressions of views
  • Can reduce or increase space occupancy
  • Increases query evaluation costs
[Figure: the example query (book with text_cont and author_val; paper with author and conference[=“SIGMOD”]) broken into smaller views.]
View Selection Algorithms: State transformations (Break, Join, Generalize, Adapt)
• Join: the opposite of Break, joins two views into one
  • Reduces evaluation costs
  • Joined views can be smaller in size
[Figure: views over book and paper (annotated with ID and val) joined into one view.]
View Selection Algorithms: State transformations (Break, Join, Generalize, Adapt)
• Generalize: generalization/relaxation of a view
  • Reveals common sub-expressions of views
  • Can reduce or increase space occupancy
  • Increases query evaluation costs
[Figure: views over paper (conference[=“SIGMOD”], author) and book (text_cont, author_val) before and after generalization.]
View Selection Algorithms: State transformations (Break, Join, Generalize, Adapt)
• Adapt: specialization of views by
  1. Conversion of //-axis to /-axis
  2. Addition of existential nodes
  • Reduces evaluation costs
  • “Adapted” views can be smaller in size
• Break, Join, Generalize, Adapt
  • Make it possible to generate all states
  • Are guaranteed not to generate pruned views
[Figure: views over paper (conference[=“SIGMOD”], author) and book (text, author) before and after adaptation.]
View Selection Algorithms: The Reduce-Optimize algorithm (ROA)
• Huge number of states
• The rewriting algorithm is called after every state transition
• Need for heuristics
• Proposal: ROA, a heuristic three-phase algorithm
[Figure: the three phases: Reduce, Jump, Optimize.]
View Selection Algorithms: The Reduce-Optimize algorithm (ROA)
[Figure: two plots over time, space occupancy (with the space budget line) and benefit, following the phase sequence Reduce, Reduce, Reduce, ..., Optimize, Jump, Optimize. Legend: intermediary state, best state, solution, revisited state.]
View Selection Algorithms: Reducing ROA search time (heuristics)
• Some transitions may apply several transformations at once
• Stop the rewriting algorithm early
  • After k rewritings are found, or
  • At a timeout
• Consider only the lowest-cost rewritings
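The early-stopping heuristics above can be illustrated with a small Python helper. This is a sketch under assumptions, not the ROA code: it supposes the rewriting algorithm exposes its results as an iterator, and the names `best_rewritings`, `max_found`, `timeout_s` and `keep` are hypothetical.

```python
import time
from typing import Callable, Iterable, List, TypeVar

R = TypeVar("R")


def best_rewritings(
    rewritings: Iterable[R],
    cost: Callable[[R], float],
    max_found: int,
    timeout_s: float,
    keep: int = 1,
) -> List[R]:
    """Consume rewritings until max_found are seen or the timeout passes,
    then return the `keep` cheapest ones."""
    deadline = time.monotonic() + timeout_s
    found: List[R] = []
    for rewriting in rewritings:
        found.append(rewriting)
        # The timeout is only checked between two rewritings: a single
        # slow step of the underlying iterator is not interrupted.
        if len(found) >= max_found or time.monotonic() >= deadline:
            break
    # Keep only the lowest-cost rewritings among those found so far.
    return sorted(found, key=cost)[:keep]


# Usage with plain numbers standing in for rewritings (cost = the number).
print(best_rewritings(iter([5, 3, 9, 1]), cost=lambda r: r,
                      max_found=3, timeout_s=60.0))
```

In this usage example the search stops after three rewritings, so the cheapest one overall (cost 1) is never seen: that is the trade-off such a cutoff accepts in exchange for a shorter search.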
Related Work
Experimental Evaluation: Settings
• Queries
  • Workloads
    • Tree patterns: Q1 (14), Q2 (50), Q3 (100)
    • Tree patterns + joins: Q4 (50), 20% joins
  • Query selectivity
    • ⅓ low, ⅓ medium, ⅓ high
• Database
  • 1GB XMark (10x100MB documents)
• Space budget
  • S = size(Q)
  • Tested space budgets: S, S/2, S/4, S/6
• Algorithms
  • UDG and ROA
  • Competitors:
    • [Mandhani & Suciu VLDB05]
    • [Tang et al. DASFAA09]
• Implementation
  • ViP2P*, Java
* http://vip2p.saclay.inria.fr
Experimental Evaluation: Workload evaluation time of Q1 (14 queries)
[Figures: hit ratio; evaluation time versus docs.]
Experimental Evaluation: Evaluation time & hit ratio for Q3 (100 queries)
[Figures: hit ratio; evaluation time versus docs.]
Experimental Evaluation: ROA evaluation for Q4 (50 queries, 20% value-joined)
Conclusions
• Automatic selection of XQuery views for multiple-view rewritings
• Reduction of candidate views
  • By orders of magnitude
• ROA performs better than related work
  • Scales and finds good solutions relatively fast
  • 80% of the benefits attained in ~2 minutes
  • Maximum benefit attained within 25 minutes
• Algorithms of [Tang et al. DASFAA09] did not scale beyond 14 queries
• Utility-Driven Greedy (UDG) did not scale beyond 50 queries
Thank you
Questions?
BACKUP
Cost of algebraic plans: Estimating the evaluation cost of a rewriting
Data statistics
• DataGuide of every document
• Enriched with information:
  • # of instances of a path
  • Average path val size (bytes)
  • Average path cont size (bytes)
  • Distinct values of each path
• Used to estimate
  • Cardinality & size of a view
Algebraic plan cost
• Execution cost of an operator has
  • A CPU execution cost and
  • An I/O cost
  • Both depend on the input
• Evaluation cost of a plan:
  • Calculated bottom-up
Cost of algebraic plans: Cost estimation example
PROJECT [text_cont, author_val]: OUTPUT=25, IO=600 | CPU=320+25
  JOIN [v1.author=v2.author]: OUTPUT=25 (50*5*0.1), IO=500+100 | CPU=70+50*5
    SCAN(v1): OUTPUT=50, IO=500 | CPU=50
    SELECT [conference=“SIGMOD”]: OUTPUT=5, IO=100 | CPU=10+10
      SCAN(v2): OUTPUT=10, IO=100 | CPU=10
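The bottom-up calculation in this example can be reproduced with a short Python sketch. The per-operator formulas are assumptions inferred from the numbers on the slide (a scan costs one CPU unit per tuple read, a selection or projection one unit per input tuple, a join one unit per pair of input tuples), and the selection selectivity of 0.5 is likewise read off the 10-to-5 reduction; the deck does not state the actual cost model. All function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    output: float  # estimated number of output tuples
    io: float      # cumulative I/O cost of the subplan
    cpu: float     # cumulative CPU cost of the subplan


def scan(cardinality: float, io_cost: float) -> Estimate:
    # Assumed: one CPU unit per tuple read from the view.
    return Estimate(cardinality, io_cost, cardinality)


def select(child: Estimate, selectivity: float) -> Estimate:
    # Assumed: one CPU unit per input tuple; no extra I/O.
    return Estimate(child.output * selectivity, child.io,
                    child.cpu + child.output)


def join(left: Estimate, right: Estimate, selectivity: float) -> Estimate:
    # Assumed: one CPU unit per pair of input tuples; I/O costs add up.
    pairs = left.output * right.output
    return Estimate(pairs * selectivity, left.io + right.io,
                    left.cpu + right.cpu + pairs)


def project(child: Estimate) -> Estimate:
    # Assumed: one CPU unit per input tuple; no extra I/O.
    return Estimate(child.output, child.io, child.cpu + child.output)


# The plan from the slide, evaluated bottom-up.
v1 = scan(cardinality=50, io_cost=500)
v2 = scan(cardinality=10, io_cost=100)
sigmod = select(v2, selectivity=0.5)       # 10 tuples in, 5 out
joined = join(v1, sigmod, selectivity=0.1)  # 50 * 5 pairs, 25 out
plan = project(joined)
print(plan)
```

Under these assumptions the root estimate carries a cumulative I/O cost of 600 and a cumulative CPU cost of 345, matching the slide's IO=600 and CPU=320+25.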
Experimental Evaluation: ROA time to attain increasing benefits (minutes)
Experimental Evaluation: Candidate views pruning
Candidate Views: Size of the set of candidate views for a tree pattern
Example: q = /a/b_val/c
• Combinations of nodes of q: ({a}, {b}, {c}, {a,b}, {a,c}, {a,b,c})
• Edge combinations: how to connect nodes with (/, //), e.g. {/a/b, //a/b, /a//b, //a//b}
• There are 12 return node variations for each node in a pattern, e.g. (a_{ID,cont}, a_val, a_{ID,val}, …)
The cardinality of the set of candidate views of a tree pattern query q of |q| nodes is bounded by: [formula not recoverable]
Candidate Views: Size of the set of candidate views for a joined pattern
• Given a joined pattern q with:
  • k tree patterns and
  • m value joins
• The candidate view set size of q is bounded by: [formula not recoverable]
  • Annotated terms of the formula: the value join combinations, and the number of views resulting from all possible cartesian products of the k tree patterns
View Selection Algorithms: Benefit of materializing a set of views
• The benefit of materializing a view set V is
  • The difference in cost of evaluating the workload over V
  • vs. evaluating it from the documents
• Terms annotated in the slide's formula:
  • Frequency of query q
  • Cost of evaluating query q from the documents
  • Cost of evaluating query q given the set of materialized views V
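The formula itself is not preserved in the slide text. One plausible reading, consistent with the three annotated terms and with the earlier definition of benefit(V, Q) as the cost over D minus the cost over V, is a frequency-weighted sum over the workload. The symbols f_q and c(·,·) below are assumptions introduced for illustration, not the authors' notation.

```latex
% Hedged reconstruction: f_q and c(q, \cdot) are assumed symbols.
% f_q      : frequency of query q in the workload Q
% c(q, D)  : cost of evaluating q from the documents D
% c(q, V)  : cost of evaluating q given the materialized views V
\mathrm{benefit}(V, Q) \;=\; \sum_{q \in Q} f_q \cdot \bigl( c(q, D) - c(q, V) \bigr)
```

With a single query of frequency 1 this reduces to the cost difference given in the earlier view set benefit slide.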