530 likes | 712 Views
Powerful and Efficient SQL Features (that are rarely used). Kenneth Naim Consultant Session #303. Speaker Qualifications. Kenneth Naim – Oracle Database / Data Warehouse Consultant.
E N D
Powerful and Efficient SQL Features(that are rarely used) Kenneth NaimConsultantSession #303
Speaker Qualifications Kenneth Naim – Oracle Database / Data Warehouse Consultant • Architected and coded the ETL for several data warehouses for insurance and points processing companies. • Wrote proprietary batch (sub-ledger) interfaces for financial applications. • Tuned dozens of batch & OLTP processes by writing better and more creative sql.
Creative problem solving • Creative solutions are created when tools, knowledge and imagination are combined.
SQL Features • Ordering of nulls • Regular Expressions • Random Number Generator (dbms_random) • Case Statements • Nullif • Coalesce • Analytical Functions (Ntile, Row_Number, Rank/Drank, Lead/Lag etc.) • Sub-queries
SQL Features Cont’d • Inline Views • Order By clause within an Inline View • Merge Statements • Updateable Views • Full Outer Joins • Partitioned Outer Joins • Set Operators (Union, Minus, Intersect) • Hierarchical Queries (Connect By)
SQL Features Cont’d • Global Temporary Tables • External Tables • Model Clause • Multi-table inserts • Hints
Standard SQL Clauses select to_char(sum(b.sal),'9,9999.00'), dname from dept a, emp b where a.deptno=b.deptno and loc!='BOSTON' groupby a.dname having sum(b.sal)>9000 orderby dname;
Ordering of Nulls (8i) • Null sort as the last/highest value when sorting in ascending order • Prior to Oracle 8i changing how Nulls sorted required use of nvl and/or decode functions. • Nulls can now be explicitly set to sort first or last by specifying “nulls first|last” select * from scott.emp order by comm nulls first;
Regular Expressions Four regular expression functions exist in SQL and work in a similar manor to their non regular expression counterpart. • regexp_like • regexp_instr • regexp_substr • regexp_replace
Regular Expressions – Regexp_like Regular expression based (regex) functions significantly simplified the task of data validation and cleansing. Identifying zip codes or phone numbers containing non-digits for example. Regexp_like differs from the like operator as the pattern is specified within function. 1. replace( translate(postal_code,'1234567890~', '~~~~~~~~~~x'), '~',null) is not null 2. regexp_like(postal_code,'[^[:digit:]]')
Regular Expressions – Cont’d update hr.locations set postal_code = regexp_replace(postal_code,'[^[:digit:]]',null) whereregexp_like(postal_code,'[^[:digit:]]')
Dbms_random – Random number generator Typically data sets are ordered explicitly or implicitly however there are times when an explicit random ordering can be beneficial. • Content or ad selection for website display. • Picking winners from a list of entries for a promotion or contest. • Control and experimental group selection for marketing materials.
Dbms_random – Content Rotation • To prevent the same content from being displayed for customers that returned to a website multiple times per day dbms_random was used to provide a “fresh” list. select content_id,... fromtables... where conditions... orderby prioriry asc,dbms_random.random;
Dbms_random – Contest winner Selection To pick a set of winner for various promotions we used a query that returned all eligible entries and sorted them randomly. We then picked the first n number of entries required as winners and alternates. insertinto entries (entrant_id, entry_number) select entrant_id, rownum from(select member_id entrant_id fromtables... where conditions.... orderbydbms_random.random);
Dbms_random – Control Group Selection To test the effectiveness of different marketing materials members needed to be randomly separated into four equally sized groups. • trunc(dbms_random.value(1,5)) • Produced the correct group numbers however the sets sizes were not exactly equal as the distribution was truly random. • ntile(4)over(orderbydbms_random.random) • Ntile, an analytical function breaks apart sets into smaller set differing in count by at most one.
Case Statements (8i) • Case Statements within sql statements allow the incorporation of If/Then/Else logic into SQL that normally would be written in a procedural language. selecta.*,casewhen comm isnullor comm=0 then'None' when comm between 300 and 499 then'Basic' when comm between 500 and 1199 and deptno=20 then'Mid-Level' when comm> 1200 then'High' else'Unknown'end comm from scott.emp a;
Case Statements Cont’d • Criteria within each when statement can be completely different. • Can be used to filter data in the Select clause allowing the combination of two similar queries with differing filters to be executed in one pass rather than two or more. selectsum(casewhen dname ='ACCOUNTING' then sal elsenullend) tot_acct_sal, sum(casewhen loc ='DALLAS' then sal elsenullend) tot_dallas_sal from scott.emp a, scott.dept b wherea.deptno=b.deptno;
Specialized Case Statements • Coalesce coalesce(primary_rate_code, secondary_rate_code, tertiary_rate_code) casewhen expr1 isnotnullthen expr1 when expr2 isnotnullthen expr2 when expr3 isnotnullthen expr3 elsenullend • Nullif nullif(expr1,expr2) casewhen expr1 = expr2 thennullelse expr1 end
Analytical Functions (8.1.6) • Allow each row to access data present in the entire set or specified subset. • Process the result set after the Where, Group by, and Having clauses have been applied but prior to the Order by clause. • Are permitted in the Select, Group by and Order by clauses. • Can nest and be in nested in regular functions but cannot be nested in/by another analytical function. Inline views must be used for this purpose as well as filtering on them.
Analytical Functions • Can be defined to process the entire set or a sub-set by using the Partition By clause • Are useful for • Aggregating data within a subset or a moving window (50-day moving average). • Ranking data within a sub-set (rank of members by points earned within each metro area). • Comparing data from the current row against a previous and following row (days between current & previous sale) eliminating the need for a self join. • Comparing data from the current row against an aggregate (each policies premium as a percent of all premium collected).
Analytical Functions Cont’d case when row_number ()over(partitionby lgl_enty_id orderby eff_dt)=1 then 1 when eff_dt-lag(end_dt, 1, end_dt)over(partitionby lgl_enty_id orderby eff_dt)>1 then row_number ()over(partitionby lgl_enty_id orderby eff_dt) --lapsed elsenull-- current policy end rn max(rn)over(partitionby lgl_enty_id orderby eff_dt) grp first_value(policy_id)over(partitionby lgl_enty_id, grp orderby plcy_prd_eff_dt) group_policy_id
Sub-queries Sub-queries are nested select statements that can be used anywhere a literal can be used except • Default values of a column • Returning clause • Function based index expressions • Check constraints • When conditions of triggers • Group By, Having and Connect By clauses
Sub-queries • Can be nested 255 levels deep • Must be scalar, return only one finite value when used with operators that can only accept one value (=, >, <, +, -, *, /, like etc.) • Are most commonly used an In clause within a dml statement and can return multiple values. select policy_id, premium_period_eff_dt, premium_period_end_dt from premium_periods where(policy_id, policy_period_eff_dt)in (select policy_id, policy_period_eff_dt from policy_periods where premium_type='a');
With Clause – Sub-query Factoring (9ir2) Sub-query factoring benefits • Materializes the sub-query so the result set can be reused without multiple executions of the sub-query • Eliminates sub-query repetition enhancing code maintainability. • Shortens complex code increasing readability. with sub_query_alias_1 as(firstselectstatement goes here), sub_query_alias_2 as(secondselectstatement goes here), sub_query_alias_3 as(third selectstatement goes here) select * from table_1 a, sub_query_alias_1 b, sub_query_alias_2 c wherea.col1=b.col1 and b.col2 =c.col2 and b.col3 in(select d.col3 from sub_query_alias_3)
With Clause Example with q as(select policy_period_key from policies a, calendar_premium b wherea.policy_key= b.policy_key and policy_period_start_date betweento_date('1-jan-2004') andto_date('1-jan-2007')-1/86400 and pl_team in('A','B') groupby policy_period_key havingsum(premium)>=50000) select claim_number,nvl(total_incurred,0),nvl(total_paid,0) from(select claim_number,sum(incurred_loss_amount) total_incurred from claims a, incurred_losses b wherea.claim_key=b.claim_key and policy_period_key in(select policy_period_key from q)) fullouterjoin (select claim_number,sum(paid_loss_amount) total_paid from claims a, paid_losses b wherea.claim_key=b.claim_key and policy_period_key in(select policy_period_key from q)) y using claim_number;
Inline Views Inline views are • Sub-queries in the From clause of a select statement or Using clause of the merge statement and are used in place of tables • Useful to aggregate and/or filter data prior to being joined. • Used to avoid joining issues such as fan and chasm traps when dealing with multiple fact style tables.
Order by in an inline view (8i) • Pagination queries were very difficult to write prior to 8i and we routinely done on client side. • Nesting an inline view with an order by reduces overhead on the sever and client side as only the required data needs to be processed and sent over the network. select ename, sal from(select ename, sal,rownum rn from(select ename, sal from scott.emp orderby sal)) where rn between 5 and 10;
Merge Statements – aka Upsert (9i) • Allow the manipulation of data residing in one table based upon whether data exists in another set (table/ view/ in-line view). • Data could only be updated if matching data existed and insert if the matching data did not exist.(9i) • Almost any combination of inserts, updates, and deletes are allowed. (10g) • The When Matched and When Not Matched clauses are optional.(10g) • A Where clause can be applied to both match clauses.(10g)
Merge Statements Cont’d mergeinto policy_lists x using(selecta.plcy_id,a.plcy_prd_eff_dt,a.prem_typ_cd from policy_period_premium_drv a wherea.prem_typ_cd='a' and a.rt_elem_typ_cd='eap') y on(x.plcy_id=y.plcy_id and x.plcy_prd_eff_dt=y.plcy_prd_eff_dt) whenmatchedthenupdatesetx.prem_typ_cd=y.prem_typ_cd whennotmatchedtheninsert(plcy_id, plcy_prd_eff_dt, prem_typ_cd) values(y.plcy_id, y.plcy_prd_eff_dt, y.prem_typ_cd) where (plcy_prd_eff_dt>’15-may-2003’ );
Updateable Views Defined and inline views can be updated as long as only data from one base table is updated and the view does not contain any of the following clauses. • Aggregate Functions (SUM, AVG, COUNT, MAX, MIN) • Set Operators • Distinct • Group by • Order by • Connect by
Updateable Views Cont’d update( select empno,a.deptno, dname, comm from scott.emp a, scott.dept b wherea.deptno=b.deptno) set comm=comm * 1.1 where dname='SALES';
Full Outer Joins (9i) Set A Set B
Full Outer Joins Cont’d Result Set
Full Outer Joins Cont’d with q as(select policy_id from policy_list where change_date betweentrunc(sysdate-1) andtrunc(sysdate)-1/86400) select policy_id,category, a.balance, b.transaction_amount, nvl(a.balance,0)-nvl(b.transaction_amount,0) new_transaction_amount from (select policy_id,category, balance from source_tables where policy_id in(select policy_id from q))a fullouterjoin (select policy_id,category,sum(transaction_amount) transaction_amount from warehouse_tables where policy_id in(select policy_id from q)) b using(policy_id,category) wherenvl(balance,0)-nvl(transaction_amount,0)<>0;
Partitioned Outer Joins Cont’d • Cartesion Product Version select product, sales_quarter,nvl(total_quantity_sold,0) total_quantity_sold from(select product, sales_quarter from(select year_quarter sales_quarter from days where end_of_quarter='Yes' andyear=2005)a, (select product from products where product in(13,14)) b)x, (select product, sales_quarter,sum(quantity_sold) total_quantity_sold from sales group by product, sales_quarter) y wherex.product=y.product(+) andx.sales_quarter=y.sales_quarter (+)
Partitioned Outer Joins Cont’d • Partitioned Outer Join Version selecta.product, b.sales_quarter, nvl(total_quantity_sold,0) total_quantity_sold from(select product, sales_quarter, sum(quantity_sold) total_quantity_sold from sales group by product, sales_quarter) b partiton by(product)rightouterjoin (select year_quarter sales_quarter from days where end_of_quarter='Yes' andyear=2005)aon b.sales_quarter=a.sales_quarter;
Hierarchical Queries (Connect By) • Connect By allows the creation of tree relationships within sql. • In 10g the connect by feature was enhanced to include: • Connect_by_root – returns the root of the hierarchy for the current row. • Connect_by_iscycle – flag indicating that the current record is part of an infinite loop ( A-B-C-A). • Connect_by_isleaf – flag identifying that the current row does not have children rows. • Nocycle – prevents the query from throwing an error if it identifies an infinite loop.
Hierarchical Queries Cont’d Connect By allows the creation of tree relationships within sql. select empno, ename,job,level, substr(sys_connect_by_path(ename,'>‘) ,2)path, connect_by_iscyclecycle from scott.emp a startwithjob='PRESIDENT' connectbynocycleprior empno = mgr;
Set Operations • Union • Union all • Minus • Intersect
Set Operation example select * frompolicies x, (select plcy_id, plcy_prd_eff_dt,'e' prem_typ_cd from policy_coverage_premium a where plcy_prd_eff_dt <= c_start_of_day_to_process andprem_typ_cd ='a'and empl_id_crea = 99 UNION select plcy_id, plcy_prd_eff_dt,'e' prem_typ_cd from policy_premium_prd_prem_drv a where prem_typ_cd ='e' and plcy_prd_eff_dt <= c_start_of_day_to_process andplcy_prem_prd_prem_drv_eff_dtm between c_start_of_day_to_process and c_end_of_day_to_process) UNION select agre_id, b.plcy_prd_eff_dt,'e' prem_typ_cd from la_agreement_rule_asgn_hist b where b.la_agre_rl_asgn_hist_crea_dtm between c_start_of_day_to_process and c_end_of_day_to_process and plcy_prd_eff_dt <= c_start_of_day_to_process)y where x.plcy_id=y.plcy_id and x.plcy_prd_eff_dt=y.plcy_prd_eff_dt and x.prem_typ_cd=y.prem_typ_cd;
Global Temporary Tables • Have a persistent data dictionary definition. • Has table level options for data to only persist for the duration of a session or a transaction. • Is stored in memory and uses temporary segments once it gets too large. • Act like normal tables and can be queried, built into views, stored procedures etc. • Can have indexes and constraints. • Useful for holding intermediate results that need to be accessed by multiple queries that do not need persistent storage.
Global Temporary Tables Cont’d • Data can only be been seen by the session that inserted it even after it has been committed. • Export/import/Datapump will only extract and import the table definition and not any data. • Data is cleaned up when the session ends gracefully or not. • Can speed up long running queries as they act like in-memory arrays. • Can prevent poor execution plans for complex SQL statements by breaking the process into smaller pieces but without the redo/undo overhead of normal table and are particularly useful for anti-joins.
External Tables • SQL based interface for SQL*Loader • Have a persistent data dictionary definition with much of the same information that is contained within a SQL*Loader control file and column definitions. • Can be used for manipulating, aggregating, cleansing and filtering data prior without the redo/undo overhead prior to being inserted into normal tables. • Cannot have a storage clause as the data is stored in a text file on operating system. • Cannot have indexes and constraints. • Data cannot be inserted, deleted or updated.
Items Learned in this Session • SQL has many powerful and under-utilized functions and features. • Tools, Knowledge and Imagination are required for creative problem solving. • Use of Case statements and Analytical functions can convert most painful procedural code into super fast efficient SQL statements.
Favorite Quotes “I have no particular talent. I am merely inquisitive.” “Imagination is more important than knowledge.” “It’s not that I am smart, it’s just that I stay with problems longer.” --Albert Einstein
Questions? • Questions? Riddles? Comments?
Thank You • Please complete your evaluation forms • Kenneth Naim • Powerful and Efficient SQL Features Session #303 • Email incase of additional questions kennaim@gmail.com