410 likes | 593 Views
월간 마이크로소프트웨어 창간 20 주년 기념 세미나. 데이터베이스 ( 가제 ). 데이터베이스의 끝없는 도전 - Towards Self-Tuning Databases -. 이 상 원 wonlee@ece.skku.ac.kr http://vldb.skku.ac.kr 성균관대 & ㈜ 엑셈. Contents. A Quick Overview of Database History Why Self-Tuning Databases? Database Tuning in the 21 st century
E N D
월간 마이크로소프트웨어 창간 20 주년 기념 세미나 데이터베이스(가제) 데이터베이스의 끝없는 도전 - Towards Self-Tuning Databases - 이 상 원 wonlee@ece.skku.ac.kr http://vldb.skku.ac.kr 성균관대 & ㈜엑셈
Contents • A Quick Overview of Database History • Why Self-Tuning Databases? • Database Tuning in the 21st century • Towards Self-Tuning Databases
A Quick Overview of Database History • File systems(50s) • Hierarchy/Network databases(60s) • Relational databases(70s) • Object databases(80s) • OR databases(90s) • XOR databases(00s)
A Quick Overview of Database History(2) • The state-of-the-arts functionalities of RDBMSs • A giantic black-hole of application intelligence • Developer DB2, Oracle, SQL Server ROLAP (Cube) Active DB (Trigger) Deductive DB (Recursion) MOLAP EXCEL Spatial DB Data Mining Temporal DB • Procedure • C/C++, Java, .. XML/Stream/ Bioinformatics/ Appl. Server/ Web Service .. Multimedia File System e-mail Queue
A Quick Overview of Database History(3) • The state-of-the-arts functionalities of RDBMSs • Rapidly growing internal optimization/tuning knobs • DBA DB2, Oracle, SQL Server MVs Buffer Mgmt Partitioning Extensibility Bitmap Index Cost-Based Optimization Automatic Disk Mgmt ……….
A Quick Overview of Database History(4) • Achievements of RDBMSs • Relational model and the “Esperanto” SQL “productivity” • Query optimization “can beat all but the best programmers” • Transaction management “ACID” • E. F. Codd gone! but Relational forever!! • A vision for database community "My prediction is that eventually all storage systems will evolve to be database systems."(Jim Gray, 2002)
A Quick Overview of Database History(5) DB = IT Core Platform in the 21st Century??? • ERP/CRM/SEM/BSC • DW/OLAP/Data Mining • Web Log Analysis • GIS/XML/Mobile • Bio-informatics • .... • Ubiquitous Databases
A Quick Overview of Database History(6) • One big challenge for RDBMS • Simplicity is a big issue. (Jim Gray, 2003) • Commercial DBMSs are exponentially complicated • Even 10-year veteran Oracle DBAs do not know 50% of all the new features of Oracle’s latest release • Worsely, it would take more than 1 week for them to understand the performance behavior of a single new feature • The gap between human DB learning curve and new features becomes larger “Crossing this chasm” seems to be impossible!! • It’s time for self-tuning intelligence!
What is DB Tuning? • Car tuning • Goals? • What? • How?
Why DB Tuning in the 21st Century? • DB performance influences all aspects of your business • DB performance management is one of the most important success factors in almost every IT projects • “미래 웹경쟁력은 데이터처리능력에”(Scott McNealy, Sun CEO, 전자신문) • DB tuning is not a technical issue, but a business issue! • Improve your productivity • Increase your revenue/profit • Increase your ROI(Return on Investment)
Why DB Tuning in the 21st Century?(2) • More data - doubling every 9 month • More users - from CEO to sales representatives • More complex query - e.g. OLAP/data mining query • Faster responses • e.g. web response time impacts revenue! • e.g. real time personalization • Data growth vs. computer speedup * Moore’s Law -- # of transistors/chip doubles every 18 months (1965)
Why DB Tuning in the 21st Century?(4) • Disk sales doubling every 9 months • Greg Papadopoulos, Sun CTO • Also from Winter VLDB survey (http://www.wintercorp.com) • Time to process all your data doubles every 18 months! • Without DB tuning, you need double-up investment on DB server and software every 6 month! • Maximize your DB performance!! (Let’s DBMax) • Maximize your PRODUCTIVITY • Maximize your REVENUE/PROFIT • Maximize your ROI
Why Self-Tuning Databases? • War of TCO(Total Cost of Ownership) in 21st century • Major costs of IT infra.: H/W, S/W, Human Cost • Moore’s law: H/W, S/W cost ↓ • BUT! Human cost ↑ • cf. 1950s: telephone industry • manual switching by operators automatic switching system
Why Self-Tuning Databases?(2) Blackbox Database Tuning
Self-Tuning: Targets UserProcesses Big Picture of a Simplied DBMS SQL Server Processes Oracle Instance SQL Exec. Engine Optimizer • SGA Redo LogBuffer Shared Pool Database Buffer Cache SQL Exec. Memory Oracle Database Parameter File Control Files Datafiles Redo Log Files
Self-Tuning: Targets(2) • SQLs • Memory • Index/Materialized Views • Emerging fields
SQL Self-Tuning: Optimizer • SQL best execution plan • access method + join order + join method • Rule-based optimizer vs. Cost-based optimizer • The Selinger-style optmizer • P. Selinger et al, “Access path selection in a relational database management systems,” SIGMOD 79 • SYSTEM/R • A “MUST” reading for every SQL guys • cost estimation • dynamic programming
SQL Self-Tuning(2): Cost-Based Optimzation 4. (Estimated) Best Execution Plan for SQL: Query Optimizer 1. SQL 2. Plan Generator P1, P2, .. , Pn Pi estimated cost RDBMS Instance Runtime Memory Environment: Buffer Cache Size, Sort/Hash Memory etc. for each plan Estimated Cost Cost Model Estimated Cardinality Combined Selectivity Selectivity & Cardinality Estimation Formula Real Data Data Dictionary Single Selectivity 3. Cost Estimator
SQL Self-Tuning(3): Cost-Based Optimzation • Cost-based optimization technique itself is a self-tuning process, but it lacks 5% • 80% of SQL: Best plan • 15%: top 3 plan • 5%: not so good • Fallacies of CBO • Uniform distribution • Attribute Independence • Join Independence
SQL Self-Tuning(4): Cost-Based Optimzation • How to fix the fallacies of CBO? • Various approaches: who? when? based on what? • Manual tuning • LEO • SOS [13] • Dynamic reoptimization • Parameteric query optimization • ….
SQL Self-Tuning(5): IBM DB2 LEO • LEO( LEarning Optimizer ) • DB part of IBM SMART: Self-Managing And Resource Tuning (See IBM’s IBM Autonomic Computing )
SQL Self-Tuning(6): IBM DB2 LEO Negative feed back!!! • eeed SELECT * FROM X,Y,Z WHERE X.PRICE>=100 AND Z.CITY=‘Denver’ AND Y.MONTH=‘Dec’ AND X.ID=Y.ID AND Y.NR=Z.NR GROUP BY A
SQL Self-Tuning(7): Oracle’s v$sql_plan_statistics • Runtime statistics monitor(Since Oracle9iR2) • You can compare optimizer’s estimation with real statistics • Then, you can find out where is the source of errors? ** Someday, Oracle and MS will come with LEO-like features • SKKU and XM are developing SOS tool • “Save Our SQL” • How to tune SQL (semi-)automatically? • Find the negative feedback • Drill down to the causes • Analyze the effect • Hint the SQL
Self-Tuning: Targets • SQLs • Memory • Index/Materialized Views • Emerging fields
Memory Self-Tuning Response Time M1-pass = sqrt(I x S x 2) Moptimal = I . Sort . Memory 1-pass optimal M-pass • Complex queries use memory intensive operators • Sorting, hashing, bitmap index manipulation etc. • The amount of available memory greatly affects their performance • However, there is only a finited amount of memory, shared by all concurrent users
Memory Self-Tuning(2): Oracle SQL Memory Mgmt. Sort Area, Hash Area, Bitmap Index • Pre-Oracle9i • Fixed PGA size • Sort(Hash)_Area_Size • Oracle9i [8] • PGA_Aggregate_Target • Dynamically allocated according to the needs Oracle Memory Model
Memory Self-Tuning(3): Oracle SQL Memory Mgmt. • Manual vs. Automatic: Response time
Memory Self-Tuning(4): Oracle SQL Memory Mgmt. • Manual vs. Automatic: Total Memory Usage Number of Users
Memory Self-Tuning(5): Oracle SQL Memory Mgmt. Active work areas profiles WP4 Local Memory Manager Sn Sn Sn Sn … Memory Usage Statistics WPn SQL Statements WP1 pga_aggregate_target WP2 SQL Memory Target WP3 Memory Bound Computation Memory Target Computation Memory Bound SMM Daemon • DBA specifies PGA_AGGREGATE_TARGET • Oracle divides PGA memory for max. performance and optimal memory usage
Self-Tuning: Targets • SQLs • Memory • Index/Materialized View(MV)s • Emerging fields
Index/MV Self-Tuning • Selection of Index/MVs: “SQL Workloads + Available Disk Space Best Index/MV sets” • Workload: queries + updates • Constraints: upper bound on storage space for indexes • Search: pick a configuration that is of “lowest” cost for the given database and workload • Industry cases • MS Autoadmin • IBM Design Advisor • Oracle OEM Index/Summary Advisor
Index/MV Self-Tuning(2): MS SQL Server • AutoAdmin: Self-Tuning and -Administering Databases[12] “We achieve this by enabling databases to track the usage of their systems and to gracefully adapt to application requirements. Thus, instead of applications having to track and tune databases, databases actively auto-tunes itself to be responsive to application needs.” • Workload driven physical database design • To reduce TCO(Total Cost of Ownership)
Index/MV Self-Tuning(3): MS SQL Server Workload Syntactic structure selection Microsoft SQL Server Candidate Index Selection Candidate Materialized View Selection Configuration Simulation and Cost Estimation Module Configuration Enumeration Final Recommendation Architecture of Index/MV Selection Tools Issue - search space • Large search space for indexes • Many columns to choose from • Kinds of indexes • Explosive search space for MVs • Query optimizers use physical design in novel ways • Physical design choices interact
Self-Tuning: Targets • SQLs • Memory • Index/Materialized View(MV)s • Emerging areas for self-tuning
Emerging Areas for Self-Tuning UserProcesses Big Picture of a Simplied DBMS SQL Server Processes Oracle Instance SQL Exec. Engine Optimizer • SGA Redo LogBuffer Shared Pool Database Buffer Cache SQL Exec. Memory Oracle Database Parameter File Control Files Datafiles Redo Log Files
Emerging Areas for Self-Tuning(2) • Automatic configuration of init. parameter[5] • e.g. IBM DB2 V7.2 “Configuration Advisor” • For 36 parameters, 98.4% of DB2 performance experts • Automating physical database layout[9] • “How database objects such as tables and indexes are assigned to disk drives can impact the I/O performance of the system” (MS SQL Server) • Automatic partitioning of tables under parallel databases (DB2)
Emerging Areas for Self-Tuning(3) • Accurate storage device access cost estimation[11] • CPU speed, disk I/O time, network, RAID rebuild • Inaccurate estimation cause (seriously) suboptimal query plan • Why wrong estimation? • storage or network may undergo heavy load or failure • older/newer than the h/w the optimizer assume • “We can achieve noticeable performance improvements by providing query optimizer with accurate and timely information about the current status of their storage devices”
Emerging Areas for Self-Tuning(4) • System tuning[1] • Most urgent, but only ad-hoc or rule-of-thumb approaches with some monitoring tools: Bizmax/MaxGuage, Orange/Chakra, Spotlight, Patrol, OEM • Only the black box consulting • Highly dependent on consultant or DBA experience • Not scientic/engineering realm!! • We need to develop mathematical model for the interactions among various system tuning knobs; Or, we need to construct and share the rule/knowledge base of the system tuning experts Monitoring Drill-down to Bottleneck Root Cause Analysis Tuning
Conclusion • Next decade will be self-tuning intelligence era • Better than average DBAs • BUT, You may be still the MVP for IT ROI only if • “Crossing the chasm between you and your DBMS technology” • Know your data • Know your application workloads • Know your DBMS intelligence Let’sDBMax!
References • Gerhard Weikum et al., “Self-Tuning Database Technology and Information Services: from Wishful Thinking to Viable Engineering” VLDB 2002, Ten-Year Award • Gerhard Weikum et al, “The COMFORT Automatic Tuning Project”, Information Systems 19(5), 1994 • Gerhard Weikum et al, “Special Issues on Self-Tuning Database and Applications,” Data Engineering Bulletine, Jun., 1999 • IBM’s Autonomic Computing Perspectives: http://www.resarch.ibm.com/autonomic/manifesto/autonomic_computing.pdf • Guy Lohman et al, “Toward Automomic computing with DB2 Universal Database,” SIGMOD Record, Sep., 2002 • MS AutoAdmin Homepage: http://www.research.microsoft.com/ • Oracle Manageability: http://technet.oracle.com/products/manageability/content.html
References(2) • Benoit Dageville and Mohamed Zait, “SQL Memory Management in Oracle9i,” VLDB 2002 • Jun Rao, Chun Zhang, Guy Lohman et al., “Automating Physical Database Design in a Parallel Database,” SIGMOD 2002 • S. Agrawal and Surajit Chaudhuri et al., “Automating Layout of Relational Databases,” ICDE 2003 • F. R. Reiss and T. Kanungo, “A Characterization of the Sensitivity of Query Optimization to Storage Access Cost Parameters,” SIGMOD 2003 • S. Agrawal and Surajit et al., “Automated Selection of Materialized Views and Indexes for SQL Databases,” VLDB 2000 • Michael Stillger, Guy M. Lohman et al., “LEO - DB2's LEarning Optimizer,” VLDB 2001