Statistics Profile For Query Optimization

Statistics Profile For Query Optimization WENYI NI Spring 2004, CSE8330 Presentition

Introduction What is statistics profile? • Every object has its own status. • In order to know its status, we need statistics. • The relation between Statistics profile and statistics. Spring 2004, CSE8330 Presentition

Cost Model From M.Tamer Oszu When DBMS use statistics profile? Spring 2004, CSE8330 Presentition

What does statistics profile collect? • The central tendency of the data • The range of the data • The size of the data • The distribution of the data Spring 2004, CSE8330 Presentition

Common types of statistics profile • Table profile • Attribute profile • Index profile Spring 2004, CSE8330 Presentition

Typical profiles Spring 2004, CSE8330 Presentition

Three ways to collect statistics • Exhaustive accumulation • Sampling • Piggyback Spring 2004, CSE8330 Presentition

Exhaustive accumulation • Calculate every statistics describer through scanning the related object exhaustively • Advantage Most Accurate • Disadvantage Heavy system load Spring 2004, CSE8330 Presentition

Sampling • Scan part of the related object. Estimate statistics through sample data • Advantage Low system overhead • Disadvantage Still have overhead. Statistics is not 100% accurate. Spring 2004, CSE8330 Presentition

Piggyback • Collect statistics through data in memory. Slightly change SQL statement to make full use of these data. • Types of piggyback • Vertical piggyback • Horizontal piggyback • Mixed piggyback Spring 2004, CSE8330 Presentition

Vertical piggyback • Include extra columns during query processing Example: Select student.name from student; rewrite to: Select student.name,student.age from student; Spring 2004, CSE8330 Presentition

No extra I/O, but extra cpu load. Solution: set piggyback level • AC1 = { x| x is a column in Table Ri referenced by Query Q} • AC2 = { x| x is an index column in Table Ri } – AC1 • AC3 = { x| x is a column in Table Ri and x is a part of the primary key or foreign key or referenced by a foreign key}-AC2 • AC4 = { x| x is a column in Table Ri }-AC3 Advantage: Choose your piggyback level according to the CPU load Spring 2004, CSE8330 Presentition

Horizontal piggyback • Include extra rows during query process Example: Select student.name, student.score From student where score >60; Rewrite to: Select student.name, student.score From student where score >60 or student.pid In (Select student.pid for student Where score>60); • Advantage Spring 2004, CSE8330 Presentition

Mixed piggyback • Use both vertical and horizontal piggyback method • Advantage Spring 2004, CSE8330 Presentition

Value distribution • Why we need it? Example: Select * from Student Where score>60; Size?? Spring 2004, CSE8330 Presentition

Answer: Size = 500*0.81*30 = 121.5 Where 500 is the cardinality of the student table. 30 is the size of each record Spring 2004, CSE8330 Presentition

How to get distribution table? • Histogram • Equal width • Equal height Spring 2004, CSE8330 Presentition

Bucket number • 1+ logn [rule of sturge 1927] Example: student table ( 500 records) 1+log500 = 10 • For equal width, put each value into the proper buckets • For equal height, make an order to the value, if the sampling size is m, decide the height k = m/(bucket number), and put the value in bucket in order Spring 2004, CSE8330 Presentition

Sampling • How many sample do we need? A sample size of 1064 can give a less than 10% error rate with 99% probability (mannino1988) To gain same error rate for varies size of table, Sample rate drops when size of table grows. Drop rate: log(n)/n Example: 20 sample with 2%error rate on table with 100 records We need 1000*0.2*(1-log(1000)/1000) samples to reach 2% error rate on table with 1000 records Spring 2004, CSE8330 Presentition

Summery & Future work • Low overhead • Low error rate, still have room to improve • The way to estimate the size of project and join operations with statistics still need be improved. Spring 2004, CSE8330 Presentition

The end Spring 2004, CSE8330 Presentition

Statistics Profile For Query Optimization

Statistics Profile For Query Optimization

Presentation Transcript

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization

Query optimization

Query Optimization

Query Optimization

Query Optimization

Query Optimization