Self-Tuning Database Systems
Wang Haocong, Jan 8, 2009
Tuning databases
• Logical database design
• Physical database design (indexes)
• Memory, disks
• Materialized views
• Partitioning
Self-Tuning Systems
• Databases are complicated!
  • Schema design is hard
  • Lots of “knobs” to tweak
  • Need appropriate information
• Does the DB approach give us more ability to “self-tune” than some other approach (e.g., Java)?
What Would We Like to Auto-Tune?
• Query optimization – statistics, bad decisions, …
• The schema itself?
• Indices
• Auxiliary materialized views
• Data partitioning
• Perhaps logging?
What Are the Challenges in Building Adaptive Systems?
• Really, a generalization of those in adaptive query processing:
  • Information gathering – how do we get it?
  • Extrapolating – how do we do this accurately and efficiently?
    • Sampling or piloting
  • Minimizing the impact of mistakes if they happen
  • Using app-specific knowledge
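Sampling is the usual answer to the extrapolation challenge above. A minimal sketch of sampling-based selectivity estimation, with all names hypothetical (real systems sample pages rather than rows and must correct for skew):

```python
import random

def estimate_selectivity(rows, predicate, sample_size=1000):
    """Estimate the fraction of rows satisfying `predicate` from a
    uniform random sample instead of a full scan."""
    sample = random.sample(rows, min(sample_size, len(rows)))
    return sum(1 for r in sample if predicate(r)) / len(sample)

# Example: estimate the fraction of orders over $100 from a sample.
orders = [{"amount": random.uniform(0, 500)} for _ in range(100_000)]
print(estimate_selectivity(orders, lambda r: r["amount"] > 100))  # ~0.8
```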
Who’s Interested in These Problems?
• Oracle:
  • Materialized view “wizard”
• Microsoft “AutoAdmin”:
  • Index selection, materialized view selection
  • Stats on materialized views
  • Database layout
• IBM SMART (Self-Managing And Resource Tuning):
  • Histogram tuning (“LEO” learning optimizer)
  • Partitioning in clusters
  • Index selection
  • Adaptive query processing
A Particular Instance: Microsoft’s Index Tuning Wizard
• Why not let the system choose the best index combination(s) for a workload?
• The basic idea:
  • Log a whole bunch of queries that are frequently run
  • See what set of indices is best
• Why is this hard? Why not index everything?
• Create these indices with little or no human input
Possible Approaches
• Obviously: only consider indices that could actually be useful
  • The optimizer can “tell” which indices it might use in executing a query
  • But that is still a lot of indices!
• We could exhaustively compare all possible indices
  • Note that indices can interact (especially for updates)
• How do we compare the costs and benefits of indices?
  • Execute for real
  • Use the optimizer cost model with whatever stats we have
  • Gather some stats (e.g., build histograms, sample) and use the cost model
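A hedged sketch of the middle option: compare candidate configurations by optimizer-estimated cost rather than by real execution. The `optimizer_cost(query, config)` oracle is a hypothetical stand-in for a what-if optimizer call (in SQL Server’s AutoAdmin work this corresponds roughly to costing queries against simulated “hypothetical indexes”):

```python
def workload_cost(workload, config, optimizer_cost):
    """Optimizer-estimated total cost of a workload under a candidate
    index configuration, weighted by how often each query runs."""
    return sum(freq * optimizer_cost(query, config)
               for query, freq in workload)

def pick_cheapest(workload, candidate_configs, optimizer_cost):
    """Compare candidate configurations by estimated, not real, cost."""
    return min(candidate_configs,
               key=lambda cfg: workload_cost(workload, cfg, optimizer_cost))
```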
Their Approach in More Detail
• For a workload of n queries:
  • Generate a separate workload containing each query alone
  • Evaluate the candidate indices for this query to find the best “configuration” – limited to 2 indices, 2 tables, single joins
  • The candidate index set for the workload is the union of these per-query configurations
• Too expensive to enumerate all configurations; use a greedy algorithm (sketched below):
  • Exhaustively enumerate (using the optimizer) the best m-index configuration
  • Pick a new index I to add, whichever seems to save the most cost relative to adding some other I’ or to the current cost
  • Repeat until we’ve added “enough” (k) indices
• “Despite interaction among indices, the largest cost reductions often result from indices that are good candidates by themselves”
• They iteratively expand to 2-column indices; an index on the leading column must be desirable on its own for the 2-column index to be considered
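A minimal sketch of that greedy loop, assuming a `cost(workload, indexes)` oracle backed by the optimizer (names hypothetical; the real tool also seeds the search with the exhaustively chosen m-index configuration):

```python
def greedy_index_selection(workload, candidates, cost, k):
    """Greedily add the index whose addition yields the largest
    estimated cost saving, until k indexes are chosen or none helps."""
    chosen = set()
    current = cost(workload, chosen)
    while len(chosen) < k:
        best_index, best_cost = None, current
        for idx in candidates - chosen:
            c = cost(workload, chosen | {idx})
            if c < best_cost:
                best_index, best_cost = idx, c
        if best_index is None:          # no remaining index reduces cost
            break
        chosen.add(best_index)
        current = best_cost
    return chosen

# Toy cost model: each useful index shaves a fixed amount off the cost.
savings = {"idx_a": 40, "idx_b": 25, "idx_c": 5}
toy_cost = lambda wl, idxs: 100 - sum(savings[i] for i in idxs)
print(greedy_index_selection(None, set(savings), toy_cost, k=2))
# -> {'idx_a', 'idx_b'}
```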
Further Enhancements
• Use the tool for “what-if” analysis
  • What if a table grows by a substantial amount?
• Supplement with extra info gathered from real query execution
  • Maybe we can “tweak” estimates for certain selectivities
  • An attempt to compensate for the “exponential error” problem
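A toy illustration of the what-if idea: re-cost the workload against scaled statistics instead of real data. This is purely illustrative; the actual wizard answers such questions through hypothetical statistics inside the optimizer:

```python
def what_if_growth(stats, table, factor):
    """Return a copy of the statistics with one table's row count
    scaled, so the workload can be re-costed without touching data."""
    grown = dict(stats)
    grown[table] = {**stats[table], "rows": stats[table]["rows"] * factor}
    return grown

stats = {"orders": {"rows": 1_000_000, "avg_row_bytes": 120}}
print(what_if_growth(stats, "orders", 10)["orders"]["rows"])  # 10000000
```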
Physical tuning tool
• Decide when to tune
• Decide what the “representative” workload is
• Run the tool and examine the recommended physical design changes
• Implement them if appropriate
Alternative Tuning Models
• Alerter
  • Decides when to tune
  • Lightweight tools
• Workload as a sequence
  • Read/update queries
  • Create/drop physical structures
Dynamic tuning
• Low overhead; must not interfere with the normal functioning of the DBMS
• Balance the cost of transitioning against the potential benefits
• Avoid unwanted oscillations
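One common guard against oscillation, sketched here with invented thresholds: act only when the estimated benefit clearly exceeds the transition cost, and rate-limit changes so the system cannot flip-flop:

```python
import time

class TuningController:
    """Apply a physical-design change only if its estimated benefit
    clearly outweighs the cost of transitioning, and not too often."""

    def __init__(self, margin=1.5, min_interval_s=3600):
        self.margin = margin              # benefit must be 1.5x the cost
        self.min_interval_s = min_interval_s
        self.last_change = 0.0

    def should_apply(self, est_benefit, transition_cost):
        if time.time() - self.last_change < self.min_interval_s:
            return False                  # rate-limit to damp oscillation
        return est_benefit > self.margin * transition_cost

    def record_applied(self):
        self.last_change = time.time()
```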
Impact
• Tuning large workloads
  • Partition
  • Sample
• Tuning production servers
  • Use a test server
Future Directions
• Ability to compare the quality of automated physical design solutions
• Lightweight approaches
• Machine learning techniques, control theory, and online algorithms
Memory tuning in DB2
• Innovative cost-benefit analysis
  • Simulation technique vs. modeling
• Tunes both memory distribution and total memory usage
• Simple greedy memory tuner (sketched below)
  • Control algorithms to avoid oscillations
• Performs very well in experiments
  • For both OLTP and DSS
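A minimal sketch of such a greedy tuner, assuming diminishing returns: shift memory from the consumer that values it least to the one that values it most, with a post-move check to damp oscillation around the balance point. The `marginal_benefit` callback stands in for DB2's simulation-based cost-benefit estimates; all names are hypothetical:

```python
def rebalance(allocations, marginal_benefit, step_mb=64):
    """Greedy memory tuner: repeatedly shift `step_mb` from the consumer
    that values its memory least to the one that values it most, but
    only while the move still pays off after it is made."""
    while True:
        gains = {c: marginal_benefit(c, allocations[c]) for c in allocations}
        giver = min(gains, key=gains.get)
        taker = max(gains, key=gains.get)
        if giver == taker or allocations[giver] <= step_mb:
            break                        # nothing sensible left to move
        # Only move if the taker still out-values the giver *after* the
        # move; this post-move check prevents back-and-forth thrashing.
        if marginal_benefit(taker, allocations[taker] + step_mb) \
                <= marginal_benefit(giver, allocations[giver] - step_mb):
            break
        allocations[giver] -= step_mb
        allocations[taker] += step_mb
    return allocations

# Toy marginal-benefit curves (ms of query time saved per extra MB).
curves = {"bufferpool": lambda mb: 5000 / mb, "sortheap": lambda mb: 1000 / mb}
print(rebalance({"bufferpool": 512, "sortheap": 512},
                lambda c, mb: curves[c](mb)))
# -> {'bufferpool': 832, 'sortheap': 192}
```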
SBPX Operation
[Diagram: Disk, Buffer Pool, and SBPX. Steps: 1. Victimize page (move to SBPX); 2. Load new page from disk; 3. Page request; 4. Check buffer pool; 5. Check SBPX; 6. Start timer; 7. Victimize BP page (send to SBPX); 8. Load page from disk; 9. Stop timer.]
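The point of the SBPX is that hits in it measure disk time a larger buffer pool would have saved. A simplified sketch of that simulation (hypothetical names; DB2's real extension is maintained per buffer pool and stores only page identifiers, not page data):

```python
from collections import OrderedDict
import time

class SimulatedBufferPoolExtension:
    """Hold only the identifiers of pages recently evicted from the
    real buffer pool. A disk read whose page id is found here is one a
    larger pool would have avoided; we bank that time as the estimated
    benefit of giving this buffer pool more memory."""

    def __init__(self, capacity):
        self.ids = OrderedDict()          # page id -> None, in LRU order
        self.capacity = capacity
        self.saved_seconds = 0.0          # estimated benefit of a bigger pool

    def on_eviction(self, page_id):
        """Called when the real buffer pool victimizes a page."""
        self.ids[page_id] = None
        self.ids.move_to_end(page_id)
        if len(self.ids) > self.capacity:
            self.ids.popitem(last=False)  # forget the oldest eviction

    def read_from_disk(self, page_id, read_page):
        """Time the disk read; credit it if the SBPX 'had' the page."""
        start = time.perf_counter()
        page = read_page(page_id)
        if page_id in self.ids:           # SBPX hit: an avoidable read
            del self.ids[page_id]
            self.saved_seconds += time.perf_counter() - start
        return page
```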
Experimental results – workload shift
[Chart: per-query time in seconds (0–7000) against order of execution (queries 1–34). After some indexes are dropped, the average falls from 6206 s to 2285 s, a 63% reduction; a later segment averages 959 s.]