Self-Managing Cost Models
Shivnath Babu, Stanford University

Cost Models in Database Systems
• Conventional query optimization:
  • Enumerate query plans
  • Estimate the physical cost of each plan (e.g., execution time, or total CPU and I/O resources required)
  • Choose the plan with minimum cost
• Estimation of physical cost is based on (operator) cost models
• Very important to have fairly accurate cost models
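
The enumerate / estimate / choose loop above can be sketched in a few lines. This is only an illustration: `choose_cheapest_plan`, the plan names, and the toy cost table are hypothetical and not taken from any real optimizer.

```python
from typing import Callable, Iterable, TypeVar

P = TypeVar("P")


def choose_cheapest_plan(plans: Iterable[P], estimate_cost: Callable[[P], float]) -> P:
    """Return the plan whose estimated physical cost is lowest."""
    candidates = list(plans)  # step 1: the enumerated plans
    if not candidates:
        raise ValueError("no candidate plans to choose from")
    # steps 2 and 3: estimate each plan's cost, keep the minimum
    return min(candidates, key=estimate_cost)


# Hypothetical plans and made-up cost estimates (arbitrary units).
toy_costs = {"nested_loop_join": 950.0, "hash_join": 120.0, "merge_join": 310.0}
best_plan = choose_cheapest_plan(toy_costs, toy_costs.__getitem__)
```

The optimizer's choice is only as good as `estimate_cost`, which is why the accuracy of the cost model matters.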

Current Approach to Deriving Cost Models
• Trial and error
• Classic: linear combination of CPU cost and the number of disk blocks accessed
• Sequential vs. random accesses
  • Data layout, data access pattern
• Buffer pool hit ratio
  • Buffer pool size, data access pattern, number of concurrent queries
• L1, L2, L3 cache hit ratio
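
A minimal sketch of the classic linear model, extended with separate sequential and random I/O terms and a buffer pool hit ratio. The class name, the weight values, and the way the hit ratio discounts I/O are assumptions made for illustration; real systems tune these by hand, which is the trial and error the slide refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearCostModel:
    # Hypothetical weights in arbitrary cost units; not taken from any real system.
    cpu_weight: float = 0.01      # cost per CPU operation
    seq_io_weight: float = 1.0    # cost per sequentially read block
    rand_io_weight: float = 4.0   # cost per randomly read block

    def cost(self, cpu_ops: float, seq_blocks: float, rand_blocks: float,
             buffer_hit_ratio: float = 0.0) -> float:
        """Linear combination of CPU work and block accesses.

        Assumption: a buffer pool hit costs no I/O, so only the miss
        fraction of block accesses is charged.
        """
        if not 0.0 <= buffer_hit_ratio <= 1.0:
            raise ValueError("buffer_hit_ratio must be between 0 and 1")
        miss_ratio = 1.0 - buffer_hit_ratio
        io_cost = self.seq_io_weight * seq_blocks + self.rand_io_weight * rand_blocks
        return self.cpu_weight * cpu_ops + miss_ratio * io_cost


model = LinearCostModel()
cold_cost = model.cost(cpu_ops=1000, seq_blocks=100, rand_blocks=10)
warm_cost = model.cost(cpu_ops=1000, seq_blocks=100, rand_blocks=10, buffer_hit_ratio=0.5)
```

Each extra factor on the slide (cache hit ratios, concurrency) would add another term or multiplier, and each one would need its own hand-tuned constant.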

Problems with the Current Approaches
• Growing importance of:
  • Autonomic computing
  • Diverse data management needs in many new applications
  • Non-monolithic uses of database software
  • Better user experience (e.g., SLAs, progress bars)
• The current manual approach to cost-model management is a hindrance in this new world:
  • Hard to port across system configurations (e.g., local disk vs. RAID vs. NAS vs. remote database) or workloads
  • Complex, many lines of code, hard to maintain
  • Built-in assumptions (e.g., ignoring interference across queries)
  • Severely restricts auto-configuration and plug & play

Solution #1: Get Rid of Cost Models
• Use Eddies: no plan, no optimizer, no cost models
• Jury is still out

Solution #2: Automated Cost-Model Management
• Bootstrapping: start with
  • An overall objective (e.g., minimize execution time)
  • A common-case model (e.g., CPU + sequential I/O + random I/O)
  • A list of other factors that could affect cost (e.g., cache misses, number of concurrent processes)
• Detect deviations from the model during execution
  • Ignore deviations resulting from statistics estimation errors
• Troubleshoot online (challenging)
  • Does the deviation matter?
  • What is the cause? Use extra “probe queries”
• Update the model and test: online what-if analysis
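
The detect / ignore / update loop above might look roughly like the sketch below. It is not the method proposed in the talk: the class and field names are hypothetical, a simple relative-error threshold stands in for "does the deviation matter?", a cardinality mismatch stands in for a statistics estimation error, and a normalized least-mean-squares step stands in for the model update. Probe queries and what-if testing are omitted.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Observation:
    features: Dict[str, float]  # measured resource counts, e.g. {"cpu": ..., "seq_io": ...}
    estimated_rows: float       # optimizer's cardinality estimate
    actual_rows: float          # cardinality seen at run time
    actual_time: float          # measured execution time


class SelfManagingCostModel:
    def __init__(self, weights: Dict[str, float], tolerance: float = 0.25,
                 learning_rate: float = 0.5) -> None:
        self.weights = dict(weights)   # common-case model to bootstrap from
        self.tolerance = tolerance     # relative error considered harmless
        self.learning_rate = learning_rate

    def predict(self, features: Dict[str, float]) -> float:
        # Factors the model does not know yet (e.g. cache misses) start at weight 0.
        return sum(self.weights.get(name, 0.0) * value for name, value in features.items())

    def observe(self, obs: Observation) -> str:
        # Ignore deviations explained by a bad cardinality estimate.
        row_error = abs(obs.actual_rows - obs.estimated_rows)
        if row_error > self.tolerance * max(obs.estimated_rows, 1.0):
            return "ignored"
        error = obs.actual_time - self.predict(obs.features)
        if abs(error) <= self.tolerance * max(obs.actual_time, 1e-9):
            return "ok"
        norm = sum(value * value for value in obs.features.values())
        if norm == 0.0:
            return "ok"  # nothing measurable to attribute the deviation to
        # Normalized LMS step: shift each weight in proportion to its feature value.
        for name, value in obs.features.items():
            self.weights[name] = self.weights.get(name, 0.0) + self.learning_rate * error * value / norm
        return "updated"
```

A brief usage example with made-up numbers:

```python
cost_model = SelfManagingCostModel({"cpu": 0.001, "seq_io": 1.0, "rand_io": 10.0})
slow_scan = Observation({"cpu": 1000.0, "seq_io": 100.0, "rand_io": 0.0},
                        estimated_rows=1000.0, actual_rows=1000.0, actual_time=202.0)
status = cost_model.observe(slow_scan)  # the model predicted 101.0, so the weights are adjusted
```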

Epilogue
• Related work:
  • In data integration (e.g., CORDS-MDBS, Garlic)
  • In main-memory databases (e.g., Monet)
  • Not comprehensive or fully automated
• Self-managing cost models:
  • A big step toward Autonomic Database Systems
  • Will improve re-usability of DB software
  • Should improve overall performance and user experience