450 likes | 756 Views
CSI 3130 - 2009 - Lab 5. PostgreSQL Query Plans. Daniel Antwi www.site.uottawa.ca/~dantw005/csi3130. Outline. Review: building PostgreSQL Query Plans. Building PostgreSQL. Download the source code postgresql-8.4.1.tar.gz http://www.postgresql.org/ftp/source/v8.4.1/ Decompress
E N D
CSI 3130 - 2009 - Lab 5 PostgreSQL Query Plans Daniel Antwi www.site.uottawa.ca/~dantw005/csi3130
Outline • Review: building PostgreSQL • Query Plans
Building PostgreSQL • Download the source code • postgresql-8.4.1.tar.gz • http://www.postgresql.org/ftp/source/v8.4.1/ • Decompress • Configure • Make install • Initialize a data directory • Start the sever and clients
Commands $ tar xvf postgresql-8.4.1.tar.gz $ pwd $ cd postgresql-8.4.1 $ ./configure --prefix=/home/rwchen/pginstall --enable-debug --enable-cassert --enable-depend $ make install $ cd ../pginstall/bin $ ./initdb -D../data $ ./postgres -D ../data -p5555(Server) $ ./psql postgres -p 5555(Client)
Query Plans • A query plan is a set of steps used to access or modify information in a DBMS. • Given a query, the query optimizer will evaluate different query plans and choose the best one (it considers). • Indexes • Join methods
Query Plan Choice • Scan • Sequential scan • Index scan (btree, hash ...) • Join • Merge join • Index jon • Nested loop join
Example: test.sql • We will try different queries and plans over the example • Two tables: • cust(cid, name, bno); • branch (bno, name, region); create table cust (cid int primary key, name char(40), bno int); create sequence seq_cust; create index cust_bno on cust(bno); create table branch (bno int primary key, name char(40), region int);
Intialize the Database • Insert tuples: insert into cust values(nextval('seq_cust'), 'aaaaaaaaaaaaaaaaa', (random()*50)::int); insert into cust select nextval('seq_cust'), name, (random()*50)::int from cust; • Analyse • Refresh statistics information about the tables • analyse cust;
Explain Query Plans - Scan • Explain will generate the best plan for a query, but not execute it postgres=# explain select * from cust where cid = 102; QUERY PLAN ----------------------------------------------------------------------- Index Scan using cust_pkey on cust (cost=0.00..8.29 rows=1 width=49) Index Cond: (cid = 102) (2 rows) postgres=# explain select * from cust where cid < 102; QUERY PLAN ------------------------------------------------------------------------ Index Scan using cust_pkey on cust (cost=0.00..8.73 rows=26 width=49) Index Cond: (cid < 102) (2 rows) postgres=# explain select * from cust where cid > 102; QUERY PLAN ------------------------------------------------------------- Seq Scan on cust (cost=0.00..5847.80 rows=262118 width=49) Filter: (cid > 102) (2 rows)
Planner Configurations • In <postgresql.conf> (under the data directory) #------------------------------------------------------------------------------ # QUERY TUNING #------------------------------------------------------------------------------ # - Planner Method Configuration - #enable_bitmapscan = on #enable_hashagg = on #enable_hashjoin = on #enable_indexscan = on #enable_mergejoin = on #enable_nestloop = on #enable_seqscan = on #enable_sort = on #enable_tidscan = on
Explain Query Plans - Scan • Try the following set enable_indexscan = off; explain select * from cust where cid = 102; explain select * from cust where cid < 102; explain select * from cust where cid > 102;
Explain Query Plans - Join • Join cust and branch on bno, and sort the results on branch.region postgres=# explain select * from cust, branch where cust.bno = branch.bno order by branch.region; QUERY PLAN ------------------------------------------------------------------------- Sort (cost=60170.76..60826.12 rows=262144 width=98) Sort Key: branch.region -> Hash Join (cost=4.88..8801.80 rows=262144 width=98) Hash Cond: (cust.bno = branch.bno) -> Seq Scan on cust (cost=0.00..5192.44 rows=262144 width=49) -> Hash (cost=3.28..3.28 rows=128 width=49) -> Seq Scan on branch (cost=0.00..3.28 rows=128 width=49) (7 rows) • A hash-join plan is generated by default.
postgres=# set enable_hashjoin = off; SET postgres=# explain select * from cust, branch where cust.bno = branch.bno order by branch.region; QUERY PLAN ---------------------------------------------------------------------- Sort (cost=71170.19..71825.55 rows=262144 width=98) Sort Key: branch.region -> Merge Join (cost=326.38..19801.23 rows=262144 width=98) Merge Cond: (branch.bno = cust.bno) -> Index Scan using branch_pkey on branch (cost=0.00..15.17 rows=128 width=49) -> Index Scan using cust_bno on cust (cost=0.00..16518.38 rows=262144 width=49) (6 rows) • Hash-join disabled • A Merge join plan based on index scan
postgres=# set enable_mergejoin = off; SET postgres=# explain select * from cust, branch where cust.bno = branch.bno order by branch.region; QUERY PLAN ------------------------------------------------------------------------------ Sort (cost=83740.80..84396.16 rows=262144 width=98) Sort Key: branch.region -> Nested Loop (cost=0.00..32371.84 rows=262144 width=98) -> Seq Scan on branch (cost=0.00..3.28 rows=128 width=49) -> Index Scan using cust_bno on cust (cost=0.00..188.63 rows=5140 width=49) Index Cond: (cust.bno = branch.bno) (6 rows) • Merge join and Hash-join disabled • A nested Loop join plan with • Index scan on cust • Sequential scan on branch
postgres=# set enable_indexscan = off; SET postgres=# explain select * from cust, branch where cust.bno = branch.bno order by branch.region; QUERY PLAN ------------------------------------------------------------------------------ Sort (cost=83644.48..84299.84 rows=262144 width=98) Sort Key: branch.region -> Nested Loop (cost=58.17..32275.52 rows=262144 width=98) -> Seq Scan on branch (cost=0.00..3.28 rows=128 width=49) -> Bitmap Heap Scan on cust (cost=58.17..187.88 rows=5140 width=49) Recheck Cond: (cust.bno = branch.bno) -> Bitmap Index Scan on cust_bno (cost=0.00..56.89 rows=5140 width=0) Index Cond: (cust.bno = branch.bno) (8 rows) • Merge join, hash-join, and index-scan disabled • A nested Loop join plan with • Bitmap index scan on cust • Bitmap sequential scan on branch
postgres=# set enable_bitmapscan = off; SET postgres=# set enable_mergejoin = on; SET postgres=# explain select * from cust, branch where cust.bno = branch.bno order by branch.region; QUERY PLAN ------------------------------------------------------------------------------------- Sort (cost=101118.53..101773.89 rows=262144 width=98) Sort Key: branch.region -> Merge Join (cost=45842.07..49749.57 rows=262144 width=98) Merge Cond: (branch.bno = cust.bno) -> Sort (cost=7.76..8.08 rows=128 width=49) Sort Key: branch.bno -> Seq Scan on branch (cost=0.00..3.28 rows=128 width=49) -> Materialize (cost=45809.40..49086.20 rows=262144 width=49) -> Sort (cost=45809.40..46464.76 rows=262144 width=49) Sort Key: cust.bno -> Seq Scan on cust (cost=0.00..5192.44 rows=262144 width=49) (11 rows) • Disable bitmapscan (also indexscan and hash-join) and enable mergejoin • A sort merge-join plan is generated
Compare Query Plans • We have 5 plans for the same query • select * from cust, branch where cust.bno = branch.bno order by branch.region; • The costs are estimated. But they are not actually costs. • You can try other combinations.
Real Execution • Use EXPLAIN ANALYSE • Or, setup timing from psql: • \timing
Self-Study • PostgreSQL Document Chap. 14 • http://www.postgresql.org/docs/8.4/interactive/using-explain.html
CSI 3130 – 2012 – Lab 6 Query Planner Daniel Antwi http://www.site.uottawa.ca/~dantw005/csi3130/
Outline • Query Planner: The Big Picture • Modify Source Codes
PostgreSQL multiuser database server • postmaster is the PostgreSQL multiuser database server. • In order for a client application to access a database it connects to a running postmaster. • The postmaster then starts a separate server process postgres to handle the connection. • The postmaster also manages the communication among server processes.
Planner/Optimizer • The task of the planner/optimizer is to create an optimal execution plan. • Scan methods • seqscan, index scan ... • Join methods • nested loop join, merge join, hash join • Join order • for more than two relations
Modify Planner Source Codes • We will modify PostgreSQL source codes • query planner • Under: • src/backend/optimizer • src/backend/optimizer/plan • src/backend/optimizer/path
Modify the Code: Example • Add the debug maro • Edit "src/backend/optimizer/plan/planner.c“ and “src/backend/optimizer/path/allpaths.c” • Add “#define OPTIMIZER_DEBUG” • Edit “/src/backend/optimizer/util/pathnode.c” • After line: 405, add one line • pathnode->total_cost=0; • Make install again • Start the server and try • explain select * from cust where cid=100;
Debug using GDB • Start GDB • gdb ./postgres • Set breakpoints • break pathnode.c:402 • Start the server in single-user mode • start --single -D../data postgres • continue • Execute a SQL • explain select * from cust where cid=100; • (break at the breakpoint in pathnode.c) • Track • backtrace (see the calling stack) • step 1 (track step by step) • continue
Related Files • src/backend/optimizer/plan/planmain.c • src/backend/optimizer/path/allpaths.c • set_plain_rel_pathlist • src/backend/optimizer/path/costsize.c • src/backend/optimizer/path/pathkeys.c • src/backend/optimizer/plan/createplan.c • src/backend/optimizer/README • Optimizer Data Structure • PathKeys
Related Source Codes • src/backend/optimizer/plan/planmain.c • Line 85: query_planner() • Generate a plan • src/backend/optimizer/plan/allpaths.c • Line 86: make_one_relation() • Find all paths to generate a result relation • Line 166: set_rel_pathlist() • Line 214: set_plain_rel_pathlist() • Build paths for a base relation • /src/backend/optimizer/util/pathnode.c • Line 397: create_seqscan_path()
Get Details of Plans • In PSQL use EXPLAIN VERBOSE • From server: • use “--log-error-verbosity=verbose -d5” • Enable OPTIMIZER_DEBUG • Add the following line at the beginning of "src/backend/optimizer/plan/planner.c“ and “src/backend/optimizer/path/allpaths.c” • #define OPTIMIZER_DEBUG • And then make install again • Try the queries from the last lab • example
List of Commands - Installation cd postgresql-9.0.2 ./configure --enable-debug --enable-cassert --enable-depend --prefix=/home/rwchen/pginstall • Make • Build PostgreSQL • $gmake • Wait for this message: • All of PostgreSQL successfully made. Ready to install. • Install • Install PostgreSQL to $HOME/pginsalldirectory • $make install • Wait for this message: • PostgreSQLinstallation complete. Change to your own directory (use pwd to get current path)
List of Commands - GDB cd ../pginstall/bin ./initdb –D../data ./postgres –D../data –p 5678 ./psql postgres –p 5678 ./postgres --single –D../data postgres (use Ctrl+D to exit) Change to a random port number to avoid conflicts gdb ./postgres break pathnode.c:402 start --single -D../data postgres continue backtrace step 1 Single-user mode
GDB reference card • http://users.ece.utexas.edu/~adnan/gdb-refcard.pdf • vi reference card • http://www.digilife.be/quickreferences/QRC/Vi%20Reference%20Card.pdf
Reference • Chap. 43.5. Planner/Optimizer • http://www.postgresql.org/docs/8.4/interactive/planner-optimizer.html • Chap 17.3. Starting the Database Server • http://www.postgresql.org/docs/8.4/interactive/server-start.html • PostgreSQL Internals Presentation • http://www.postgresql.org/files/developer/internalpics.pdf