Query optimisation

Query optimisation

Example - hospital database Doctors 100 tuples Patients 2500 tuples

Query Get the name of the doctors who treat patients suffering from the prk11 disease SELECT Doctors.name FROM Doctors, Patients WHERE Disease = ‘prk11’ AND D_name = Doctors.Name

Evaluation #1 • restrict Patients to those who suffer from prk11 • read: 2500 tuples; result: estimated 50 tuples; no need to write intermediate result - sufficiently small • join above result with Doctors • read: 100 tuples (Doctors); result 50 tuples; no need to write to disk intermediate result • project result over Doctors.name • the desired result is in the memory • estimated cost (read and write) 2600

Evaluation #2 • suppose the internal memory allows only some 350 tuples • join Patients with Doctors • read Patients in batches of 250 tuples; therefore read Doctors 10 times; read: 2500 + 1000 = 3500; write intermediate result (too big) to disk: 2500; • restrict above result • read 2500; result: estimated 50 tuples; • project • cost: 8500 (read and write)

Intermediate conclusions • the evaluation strategy (procedural aspect) can lead to very big differences in computation time, for the same query • computation time: read from and write to disk (quintessential) • processor time • the actual evaluation procedures are far more complex than in the previous introductory example

Optimisation - what • deciding upon the best strategy of evaluating a query • it is performed automatically by the optimiser of the DBMS • not just for data retrieval operations, but for updating operations as well (e.g. UPDATE) • not guaranteed to give the best result

Optimisation - how • based on statistical information about the specific database (not necessarily, though) perform • expression transformation (cast query in some internal form and convert to respective canonical form • candidate low level procedures selection • query plans generation and selection • statistical information - could you think of examples? • cardinality of base relations, indexes, ...

Cast (transform) query in some internal form • internal format • more suitable for automatic processing • trees (syntax tree or query tree) • from a conceptual point of view is is easier to assume that the internal format is relational algebra

Convert to canonical form • the initial expression is transformed into an equivalent but more efficient form • “efficient form” = efficient when executed • these transformation are performed independently from actual data values and access paths

Expression transformation • examples (A WHERE condition#1) WHERE condition#2 (A WHERE condition#1 AND condition#2) (A [projection#1] ) [projection#2] A [projection#2] (A [projection]) WHERE condition (A WHERE condition) [restriction]

Expression transformation • distributivity • commutativity and associativity • idempotence • scalar expressions • conditional expressions • semantic transformation

Set level operations • the operators of relational algebra are set level • i.e. they manipulate sets (relations) and not individual tuples • however, these operators are implemented by internal (DBMS) procedures • these procedures, inherently, need tuple-access (in fact, they need access to scalar values)

Choose candidate low-level procedures • the optimiser decides how to execute the query (expressed in canonical form) • access paths are relevant at this stage • in the main, each basic operation (join , restriction, …) has a set of procedures that implement it • e.g. RESTRICTION - (1) on candidate key; (2) on indexed key; (3) on other attributes … • each procedure has associated a cost function (usually based on the required I/O disk operations); these functions are used in the next stage

Implementing JOIN - examples • R and P - two relations to be joined • J - the attribute on which the (natural) join is performed • R[i] and P[j] mean the i-th tuple of R and the j-th tuple of P, respectively • R[i].J means the value of the attribute J for the i-th tuple of the relation R • R has M and P has N tuples, respectively

Implementing JOIN - brute force for i:=1 to M for j := 1 to N do if R[i].J = P[j].J then add joined tuple R[i]*P[j] to result end end end

Index lookup index X on Patients.D_name

Implementing JOIN - index lookup /* index X on P.J */ for i:=1 to M for j := 1 to K[i] do add joined tuple (R[i] * PK[j]) to result /* PK[j] represents the tuple of P that K[j] points to */ end end

Choose the cheapest query plan • construct query plans (query evaluation plan) • combine candidate low level procedures • choose the cheapest • total cost = the sum of individual costs • individual costs depend on the actual data values; estimates are used instead, based on statistical data • usually not all possible evaluation procedures are generated; the search space is reduced by applying heuristics

Database statistics - in the data dictionary • for each base table • cardinality • space occupied • etc. • for each column of each base table • no of distinct values • maximum, minimum and average values • histogram of values • … • ...

An optimiser is never perfect • the following example is a real life example • suppose a Postgres definition for • base relation: Treatment(Patient, Drug, Disease, …) • the query • get all the drugs that are taken by patients that suffer from prk11 • (all the drugs, not only those for prk11) SELECT DISTINCT Drug FROM Treatment WHERE Patient IN (SELECT Patient FROM Treatment WHERE Disease = ‘prk11’) ; • the query is far slower that the equivalent one (next) ...

An optimiser is never perfect /* this query is faster than the previous one, even though it seems to be performing more computations - Patient is not unique! */ CREATE VIEW V_Treatment AS SELECT * FROM Treatment SELECT DISTINCT Treatment.Drug FROM Treatment, V_Treatment WHERE Treatment.Patient = V_Treatment.Patient AND Disease = ‘prk11’ ;

Query optimisation

Query optimisation

Presentation Transcript

Physical DB Issues, Indexes, Query Optimisation

PORTFOLIO OPTIMISATION

Medicines Optimisation

Consumer Optimisation

Optimisation

Ubiquitous Optimisation

Optimisation Strategies

Optimisation

Optimisation

Dynamic Optimisation

Calibration/Optimisation

Linear Optimisation

Physical DB Issues, Indexes, Query Optimisation

Consumer Optimisation

Website optimisation