250 likes | 358 Views
Short Queries and Indexes. Local Adaptation from: Henrietta Dombrovskaya, Boris Novikov “System Tuning” Saint Petersburg, Russia 2006. What will be covered:. Which queries are considered short. Short queries and indexes Indexes and mass data update Choosing selection criteria
E N D
Short Queries and Indexes Local Adaptation from: Henrietta Dombrovskaya, Boris Novikov “System Tuning” Saint Petersburg, Russia 2006
What will be covered: • Which queries are considered short. • Short queries and indexes • Indexes and mass data update • Choosing selection criteria • Excessive selection conditions • How to avoid using indexes • Joins order • Impact of indexes on nested loops • Other index types Henrietta Dombrovskaya – Enova Financial
Which Queries are Considered Short? • The query is considered short, when result can be obtained processing the small number of records, even if the tables are large. • For short queries sorting, grouping, and even joins are not time consuming. • Optimization goal for short queries: to avoid full scan of large tables (for small tables full scan may be still OK) • Typically for short queries tuning is necessary to improve throughput, not response time – the user does not care whether it takes 50 ms or 150 ms. Henrietta Dombrovskaya – Enova Financial
Short Queries and Indexes • If we want to avoid full scan, some sort of index for the table should exist • Any index is a part of the database schema; we can create new indexes only if schema changes are allowed. Sometimes they can’t be implemented right away, and sometimes delayed index creation may be almost impossible • Be aware of potential implication of index creation on other queries and data modifications Henrietta Dombrovskaya – Enova Financial
Indexes and Bulk Data Update drop index index_1; drop index index_2; … Bulk INSERT … create index index_1(….); create index index_2(….); …. Henrietta Dombrovskaya – Enova Financial
Choosing Selection Criteria • Index selectivity • Unique indexes • Selection criteria and indexes • Compound indexes • Using index for data retrieval Henrietta Dombrovskaya – Enova Financial
Index Selectivity • Do not use an index with small number of distinct values (exception - bitmap indexes, where applicable) • Index usage order: make sure that the index with the highest selectivity level will be used first Henrietta Dombrovskaya – Enova Financial
Index Selectivity - Example select customer_id from customer_sourcescs where incoming_brand_id =11 AND type_cd IN ('lead_reject_import', 'import', 'pass_active_customer') AND cs.received_time > current_date - interval '24 hours' AND cs.received_time < current_timestamp - interval ' 15 minutes' AND cs.source_type_cd not in ('yesloans1stgbi','yesloansgbi','noworries1stgbi','noworriesgbi','aspiregbi','aspire-cpfgbi') We have indexes for type_cd, source_type_cd and received_time, the optimizer may choose condition with =, while in this case received_timehas higher selectivity level. Henrietta Dombrovskaya – Enova Financial
Unique Indexes • If a column is described as a primary key, the unique index will be created automatically • You may need extra unique constraints for data integrity purposes (each UNIQUE constraint will generate unique index, too) • Unique indexes make nested loops efficient. Henrietta Dombrovskaya – Enova Financial
Nested Loops FOR row1 in table1 loop For row2 in table2 loop If match(row1,row2) insert output row end loop end loop • When to use: joins and products • Cost: proportional (Size_of_T1)X(Size_of_T2) Table 1 Table 2 5 3 8 1 6 2 3 4 1 3 1
Selection Criteria and Indexes – Columns Transformation • Any column transformations will prevent from using an index: where lower(last_name)=‘smith’ • If we need to search by transformed value, we need to create additional index: CREATE INDEX people_m13 ON private.people USING btree (lower(last_name::text)); Henrietta Dombrovskaya – Enova Financial
Selection Criteria and Indexes- Using like Operator • Using like operator: • WHERE (lower(people.first_name) like E'chaman%‘ will not use the index Possible rewriting, if for some reason we can’t create an index: • WHERE (lower(people.first_name) >=E'chaman' and lower(people.first_name) <E'chamam') will use the index Henrietta Dombrovskaya – Enova Financial
Pattern Indexes CREATE INDEX people__last_name_pattern ON private.people (lower(last_name::text) text_pattern_ops); Henrietta Dombrovskaya – Enova Financial
Compound Indexes • Compound index is build for several columns of one table. Note: if an index was built for columns (X,Y,Z), you can use it to search X, XY and XYZ, not Y and not YZ. Postgres : equality on leading columns, inequality on other columns, which will still be scanned, but may save additional trip to the table. • Why to create compound index? • Additional selectivity • Additional data storage • Index-organized tables – not currently available in Postgres Henrietta Dombrovskaya – Enova Financial
Using Indexes for Data Retrieval When all the columns from select statement are included into compound index, they may be retrieved without accessing the table. Example: CREATE INDEX loans_m2 ON cnu.loans (customer_id, funding_date, status_cd); SELECT funding_date, status_cdFROM loans WHERE customer_id=1111111 Henrietta Dombrovskaya – Enova Financial
Using Multiple Indexes in Postgres Postgres can use the search results from multiple indexes by creating a bitmap of matching rows in main memory and then OR-ing or AND-ing them In this case the records will be scanned in the physical order, so the index-based ordering will be lost. Usage of compound indexes vs. sets of single-column indexes should be justified on case-by-case basis. Henrietta Dombrovskaya – Enova Financial
Excessive Selection Criteria We can add redundant selection criteria to pre-select small records subset from the big table: • prompt to use specific indexes • reduce the sizes of join arguments. Henrietta Dombrovskaya – Enova Financial
Excessive Selection Criteria – Example SELECT <…> FROM reports r, expense_itemsi WHERE r.rep_id=i.rep_id AND (r.proc_date=’1-jul-2003’ AND i.charge<0 OR r.proc_date=’15-jun-2003 AND i.charge=0 ); In this case the complex selection criteria can’t be applied before the tables are joined. (expense_itemsmay contain over several million records) Henrietta Dombrovskaya – Enova Financial
Excessive Selection Criteria – Example (2) We will modify this select statement the following way: SELECT <…>FROM reports r, expense_items I WHERE r.rep_id=i.rep_id AND (r.proc_dateIN (‘1-jul-2003’, ’15-jun-2003’) ) AND (r.proc_date=’1-jul-2003’ AND i.charge<0 OR r.proc_date=’15-jun-2003’ AND i.charge=0 ); This will allow to restrict reports to the ones processed on Jul 1 and Jul 15, and then join only those reports with expense_items. Henrietta Dombrovskaya – Enova Financial
How to Avoid Using Indexes • Use system-specific mechanisms, like optimizer hints • When optimizer hints can’t be used - modify selection criteria. Examples: • attr1+0=p_value • COALESCE (t1.attr2, 0)=t2.attr2 Henrietta Dombrovskaya – Enova Financial
Order of Joins In short queries the size of join result may be small because of: • Restrictions on the joined tables (reduce the number of records in join arguments) • Semi-join (one argument significantly restricts the result size) If the optimizer generates an execution plan with large intermediate results size, the order of joins should be changed Henrietta Dombrovskaya – Enova Financial
Order of Joins - Example SELECT DISTINCT c.* FROM customers c LEFT OUTER JOIN loans l on l.customer_id = c.id INNER JOIN customer_sourcescs ON cs.id = (SELECT MAX(id) FROM customer_sources cs1 WHERE cs1.customer_id = c.id AND incoming_brand_id = c.brand_id AND type_cd IN ('lead_reject_import', 'import', 'pass_active_customer')) LEFT OUTER JOIN work_items w ON w.customer_id = c.id and w.created_by in ('generate_ast_workitems','generate_ast_workitems_instl') and w.created_on > current_date - interval '24 hours‘ WHERE c.brand_id= 11 AND c.status_cd = 'active' AND c.fraud_flg = FALSE AND cs.received_time > current_date - interval '24 AND cs.received_time < current_timestamp - interval ' 15 minutes' AND cs.source_type_cdnot in ('yesloans1stgbi','yesloansgbi','noworries1stgbi','noworriesgbi','aspiregbi','aspire-cpfgbi') Henrietta Dombrovskaya – Enova Financial
Impact of Indexes on Nested Loops • For small queries nested loops will be efficient, when the inner table is indexed by join attribute • No transformation should be applied to the indexed attribute. • Note: this may not work for long queries Henrietta Dombrovskaya – Enova Financial
Other Index Types • Function-based indexes - use when the queries WHERE clause often contains the same attribute(s) transformation(like end_date - start_date) • Bitmap indexes - use for combination on several low-cardinality attributes, mostly for Data Warehouse type applications. • Partial indexes: CREATE INDEX payment_transactions_m16c ON cnu.payment_transactions_committed (credit_account_cd, status_cd, eff_date, payment_method_cd) WHERE status_cd::text = 'created'::text AND payment_method_cd::text = 'bank_account_ach'::text AND (credit_account_cd::text = 'disbursement_account'::text OR credit_account_cd::text = 'cso_disbursement_account'::text); Henrietta Dombrovskaya – Enova Financial
More Complex Index Example CREATE INDEX customers__email_alt_pattern_idx ON cnu.customers (lower(email_alt::text) text_pattern_ops) WHERE email_alt IS NOT NULL; Selection criteria example: lower(email) like ‘johns%' OR (email_alt IS NOT NULL AND lower(email_alt) like ‘johns%') This condition will use bitmap OR’ing. Henrietta Dombrovskaya – Enova Financial