1.03k likes | 1.24k Views
Analyzing Your Data with Analytic Functions. Carl Dudley University of Wolverhampton, UK UKOUG Council Oracle ACE Director carl.dudley@wlv.ac.uk. Introduction. Working with Oracle since 1986 Oracle DBA - OCP Oracle7, 8, 9, 10 Oracle DBA of the Year – 2002 Oracle ACE Director
E N D
Analyzing Your Data with Analytic Functions Carl Dudley University of Wolverhampton, UK UKOUG Council Oracle ACE Director carl.dudley@wlv.ac.uk
Introduction Working with Oracle since 1986 Oracle DBA - OCP Oracle7, 8, 9, 10 Oracle DBA of the Year – 2002 Oracle ACE Director Regular Presenter at Oracle Conferences Consultant and Trainer Technical Editor for a number of Oracle texts UK Oracle User Group Council Member of IOUC Day job – University of Wolverhampton, UK
Overview of Analytic Functions Ranking Functions Partitioning Aggregate Functions Sliding Windows Row Comparison Functions Analytic Function Performance Analyzing Your Data with Analytic Functions
Analytic Functions New set of functions introduced in Oracle 8.1.6 Analytic functions or Window functions Intended for OLAP (OnLine Analytic Processing) or data warehouse purposes Provide functionality that would require complex conventional SQL programming or other tools Advantages Improved performance The optimizer “understands” the purpose of the query Reduced dependency on report generators and client tools Simpler coding
Analytic Function Categories The analytic functions fall into four categories Ranking functions Aggregate functions Row comparison functions Statistical functions The Oracle documentation describes all of the functions Processed as the last step before ORDER BY Work on the result set of the query Can operate on an intermediate ordering of the rows Actions can be based on : Partitions of the result set A sliding window of rows in the result set
Processing Sequence There may be several intermediate sort steps if required Analytic process HAVING evaluation WHERE evaluation Intermediate ordering GROUPING Rows Analytic function Final ORDER BY Output
The Analytic Clause Syntax : <function>(<arguments>) OVER(<analytic clause>) The enclosing parentheses are required even if there are no arguments RANK() OVER (ORDER BY sal DESC)
Sequence of Processing Being processed just before the final ORDER BY means : Analytic functions are not allowed in WHERE and HAVING conditions Allowed only in the final ORDER BY clause Ordering the final result set OVER clause specifies sort order of result set before analytic function is computed Can have multiple analytic functions with different OVER clauses, requiring multiple intermediate sorts Final ordering does not have to match ordering in OVER clause
Analytic Functions Overview of Analytic Functions Ranking Functions Partitioning Aggregate Functions Sliding Windows Row Comparison Functions Analytic Function Performance The emp and dept Tables DEPTNO DNAME LOC ------ -------------- -------- 10 ACCOUNTING NEW YORK 20 RESEARCH DALLAS 30 SALES CHICAGO 40 OPERATIONS BOSTON emp EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO ----- ------- --------- ----- ----------- ----- ----- ------ 7934 MILLER CLERK 7782 23-JAN-1982 1300 10 7782 CLARK MANAGER 7839 09-JUN-1981 2450 10 7839 KING PRESIDENT 17-NOV-1981 5000 10 7369 SMITH CLERK 7902 17-DEC-1980 800 20 7876 ADAMS CLERK 7788 12-JAN-1983 1100 20 7566 JONES MANAGER 7839 02-APR-1981 2975 20 7902 FORD ANALYST 7566 03-DEC-1981 3000 20 7788 SCOTT ANALYST 7566 09-DEC-1982 3000 20 7900 JAMES CLERK 7698 03-DEC-1981 950 30 7521 WARD SALESMAN 7698 22-FEB-1981 1250 500 30 7654 MARTIN SALESMAN 7698 28-SEP-1981 1250 1400 30 7844 TURNER SALESMAN 7698 08-SEP-1981 1500 0 30 7499 ALLEN SALESMAN 7698 20-FEB-1981 1600 300 30 7698 BLAKE MANAGER 7839 01-MAY-1981 2850 30 dept
Example of Ranking Ranking with ROW_NUMBER No handling of ties Rows retrieved by the query are intermediately sorted on descending salary for the analysis ROWNUMBER SAL ENAME --------- ---- ----- 1 5000 KING 2 3000 SCOTT 3 3000 FORD 4 2975 JONES 5 2850 BLAKE 6 2450 CLARK 7 1600 ALLEN 8 1500 TURNER 9 1300 MILLER 10 1250 WARD 11 1250 MARTIN 12 1100 ADAMS 13 950 JAMES 14 800 SMITH SELECT ROW_NUMBER() OVER( ORDER BY sal DESC)rownumber ,sal ,ename FROM emp ORDER BY sal DESC; • If the final ORDER BY specifies the same sort order as the OVER clause only one sort is required • ROW_NUMBER is different from ROWNUM
Different Sort Order in Final ORDER BY If the OVER clause sort is different from the final ORDER BY An extra sort step is required SELECT ROW_NUMBER() OVER( ORDER BY sal DESC) rownumber ,sal ,ename FROM emp ORDER BY ename; ROWNUMBER SAL ENAME --------- ---- ------ 12 1100 ADAMS 7 1600 ALLEN 5 2850 BLAKE 6 2450 CLARK 3 3000 FORD 13 950 JAMES 4 2975 JONES 1 5000 KING 11 1250 MARTIN 9 1300 MILLER 2 3000 SCOTT 14 800 SMITH 8 1500 TURNER 10 1250 WARD
Multiple Functions With Different Sort Order Multiple OVER clauses can be used SELECT ROW_NUMBER() OVER(ORDER BY sal DESC)sal_n ,sal ,ROW_NUMBER() OVER(ORDER BY comm DESC NULLS LAST) comm_n ,comm ,ename FROM emp ORDER BY ename;
RANK and DENSE_RANK ROW_NUMBER increases even if several rows have identical values Does not handle ties RANK and DENSE_RANK handle ties Rows with the same value are given the same rank After the tie value, RANK skips numbers, DENSE_RANK does not Ranking using analytic functions has better performance, because the table is not read repeatedly
RANK and DENSE_RANK (continued) SELECT ROW_NUMBER() OVER(ORDER BY sal DESC) rownumber ,RANK() OVER(ORDER BY sal DESC) rank ,DENSE_RANK() OVER(ORDER BY sal DESC) denserank ,sal ,ename FROM emp ORDER BY sal DESC,ename; ROWNUMBER RANK DENSERANK SAL ENAME --------- ---- ---------- ----- ------ 1 1 1 5000 KING 2 2 2 3000 FORD 3 2 2 3000 SCOTT 4 4 3 2975 JONES 5 5 4 2850 BLAKE 6 6 5 2450 CLARK 7 7 6 1600 ALLEN 8 8 7 1500 TURNER 9 9 8 1300 MILLER 10 10 9 1250 MARTIN 11 10 9 1250 WARD 12 12 10 1100 ADAMS 13 13 11 950 JAMES 14 14 12 800 SMITH Multiple OVER clauses may be used specifying different orderings
Analytic Function in ORDER BY Analytic functions are computed before the final ordering Can be referenced in the final ORDER BY clause An alias is used in this case SELECT RANK() OVER( ORDER BY sal DESC) sal_rank ,sal ,ename FROM emp ORDER BY sal_rank ,ename; SAL_RANK SAL ENAME -------- ---- ------ 1 5000 KING 2 3000 FORD 2 3000 SCOTT 4 2975 JONES 5 2850 BLAKE 6 2450 CLARK 7 1600 ALLEN 8 1500 TURNER 9 1300 MILLER 10 1250 MARTIN 10 1250 WARD 12 1100 ADAMS 13 950 JAMES 14 800 SMITH
WHERE Conditions Analytic (window) functions are computed after the WHERE condition and hence not available in the WHERE clause SELECT RANK() OVER(ORDER BY sal DESC) rank ,sal ,ename FROM emp WHERE RANK() OVER(ORDER BY sal DESC) <= 5 ORDER BY rank WHERE RANK() OVER(ORDER BY sal DESC) <= 5 * ERROR at line 5: ORA-30483: window functions are not allowed here
WHERE Conditions (continued) Use an inline view to force the early processing of the analytic SELECT * FROM (SELECT RANK() OVER(ORDER BY sal DESC) rank ,sal ,ename FROM emp) WHERE rank <= 5 ORDER BY rank ,ename; RANK SAL ENAME ---------- ---------- ---------- 1 5000 KING 2 3000 FORD 2 3000 SCOTT 4 2975 JONES 5 2850 BLAKE • Inline view is processed before the WHERE clause
Grouping, Aggregate Functions and Analytics Rank the departments by number of employees SELECT deptno ,COUNT(*) employees ,RANK() OVER(ORDER BY COUNT(*) DESC) rank FROM emp GROUP BY deptno ORDER BY employees ,deptno; DEPTNO EMPLOYEES RANK ------ ---------- --------- 10 3 3 20 5 2 30 6 1 • Analytic functions are illegal in the HAVING clause • The workaround is the same; use an inline view • Ordering subclause may not reference a column alias
Analytic Functions Overview of Analytic Functions Ranking Functions Partitioning Aggregate Functions Sliding Windows Row Comparison Functions Analytic Function Performance
Partitioning Analytic functions can be applied to logical groups within the result set rather than the full result set Partitions PARTITION BY specifies the grouping ORDER BY specifies the ordering within each group Not connected with database table partitioning If partitioning is not specified, the full result set behaves as one partition NULL values are grouped together in one partition, as in GROUP BY Can have multiple analytic functions with different partitioning subclauses ... OVER(PARTITION BY mgr ORDER BY sal DESC)
Partitioning Example Rank employees by salary within their manager SELECT ename ,mgr ,sal ,RANK() OVER(PARTITION BY mgr ORDER BY sal DESC) m_rank FROM emp ORDER BY mgr ,m_rank; ENAME MGR SAL M_RANK ---------- ---------- ---------- ---------- SCOTT 7566 3000 1 FORD 7566 3000 1 ALLEN 7698 1600 1 TURNER 7698 1500 2 WARD 7698 1250 3 MARTIN 7698 1250 3 JAMES 7698 950 5 MILLER 7782 1300 1 ADAMS 7788 1100 1 JONES 7839 2975 1 BLAKE 7839 2850 2 CLARK 7839 2450 3 SMITH 7902 800 1 KING 5000 1
Result Sets With Different Partitioning Rank the employees by salary within their manager, within the year they were hired, as well as overall SELECT ename ,sal ,manager ,RANK() OVER(PARTITION BY mgr ORDER BY sal DESC) m_rank ,TRUNC(TO_NUMBER(TO_CHAR(date_hired,'YYYY'))) year_hired ,RANK() OVER(PARTITION BY TRUNC(TO_NUMBER(TO_CHAR(date_hired,'YYYY')) ORDER BY sal DESC) d_rank ,RANK() OVER(ORDER BY sal DESC) rank FROM emp ORDER BY rank ,ename;
Result Sets With Different Partitioning (continued) ENAME SAL MGR M_RANK YEAR_HIRED D_RANK RANK ------- ---- ---- ---------- ---------- ---------- ---------- KING 5000 1 1981 1 1 FORD 3000 7566 1 1981 2 2 SCOTT 3000 7566 1 1987 1 2 JONES 2975 7839 1 1981 3 4 BLAKE 2850 7839 2 1981 4 5 CLARK 2450 7839 3 1981 5 6 ALLEN 1600 7698 1 1981 6 7 TURNER 1500 7698 2 1981 7 8 MILLER 1300 7782 1 1982 1 9 MARTIN 1250 7698 3 1981 8 10 WARD 1250 7698 3 1981 8 10 ADAMS 1100 7788 1 1987 2 12 JAMES 950 7698 5 1981 10 13 SMITH 800 7902 1 1980 1 14
Hypothetical Rank Rank a specified hypothetical value (2999) in a group ('what-if' query) SELECT RANK(2999) WITHIN GROUP (ORDER BY sal DESC) H_S_rank ,PERCENT_RANK(2999) WITHIN GROUP (ORDER BY sal DESC) PR ,CUME_DIST(2999) WITHIN GROUP (ORDER BY sal DESC) CD FROM emp; H_S_RANK PR CD -------- ---------- ---------- 4 .214285714.266666667 4/15 3/14 SELECT deptno ,RANK(20,'CLERK') WITHIN GROUP (ORDER BY deptno DESC,job ASC) H_D_J_rank FROM emp GROUP BY deptno; DEPTNO H_D_J_RANK ------ ---------- 10 1 20 3 30 7 A clerk in 20 would be higher than anyone in 10 A clerk would be third in ascending job order in department 20 (below analysts) A clerk in 20 would be lower than anyone in 30 (6 employees)
Frequent Itemsets (dbms_frequent_itemset) Typical question When a customer buys product x, how likely are they to also buy product y? Minimum fraction of different 'Documentation' customers having this combination mimimum items in set include items Number of Different customers maximum items in set exclude items 2 or 3 items per set Number of instances SELECT CAST(itemset AS fi_char) itemset ,support ,length ,total_tranx FROM TABLE(DBMS_FREQUENT_ITEMSET.FI_TRANSACTIONAL( CURSOR(SELECT TO_CHAR(sales.cust_id) ,TO_CHAR(sales.prod_id) FROM sh.sales ,sh.products WHERE products.prod_id = sales.prod_id AND products.prod_subcategory = 'Documentation'), 0.5, 2, 3, NULL, NULL)); ITEMSET SUPPORT LENGTH TOTAL_TRANX -------------------------------------- --------- ---------- ----------- FI_CHAR('40', '41') 3692 2 6077 FI_CHAR('40', '42') 3900 2 6077 FI_CHAR('40', '45') 3482 2 6077 FI_CHAR('41', '42') 3163 2 6077 FI_CHAR('40', '41', '42') 3141 3 6077
Frequent Itemsets (continued) Need to create type to accommodate the set Ranking functions can be applied to the itemset • The total transactions (TOTAL_TRANX) is the number of different customers involved with any product within the set of products under examination SELECT COUNT(DISTINCT cust_id) FROM sales WHERE prod_id BETWEEN 40 AND 45; COUNT(DISTINCTCUST_ID) ---------------------- 6077 prod_ids for 'Documentation' ,CURSOR(SELECT * FROM table(fi_char(40,45))) ,CURSOR(SELECT * FROM table(fi_char(42))) Include any sets involving 40 or 45 Exclude any sets involving 42 CREATE TYPE fi_char AS TABLE OF VARCHAR2(100); • Itemsets containing certain items can be included/excluded • Ranking functions can be applied to the itemset
Plan of Itemset Query Only one full table scan of sales -------------------------------------------------------------------------------- |Id | Operation | Name |Rows | -------------------------------------------------------------------------------- | 0| SELECT STATEMENT | | 8| | 1| FIC RECURSIVE ITERATION | | | | 2| FIC LOAD ITEMSETS | | | | 3| FREQUENT ITEMSET COUNTING | | 8| | 4| SORT GROUP BY NOSORT | | | | 5| BITMAP CONVERSION COUNT | | | | 6| FIC LOAD BITMAPS | | | | 7| SORT CREATE INDEX | | 500| | 8| BITMAP CONSTRUCTION | | | | 9| FIC ENUMERATE FEED | | | | 10| SORT ORDER BY | |43755| |*11| HASH JOIN | |43755| | 12| TABLE ACCESS BY INDEX ROWID| PRODUCTS | 3 | |*13| INDEX RANGE SCAN | PRODUCTS_PROD_SUBCAT_IX | 3 | | 14| PARTITION RANGE ALL | | 918K| | 15| TABLE ACCESS FULL | SALES | 918K| | 16| TABLE ACCESS FULL | SYS_TEMP_0FD9D6605_153B1EE| | --------------------------------------------------------------------------------
Applying Analytics to Frequent Itemsets SELECT itemset, support, length, total_tranx, rnk FROM (SELECT itemset, support, length, total_tranx ,RANK() OVER (PARTITION BY length ORDER BY support DESC) rnk FROM (SELECT CAST(ITEMSET AS fi_char) itemset ,support ,length ,total_tranx FROM TABLE(dbms_frequent_itemset.fi_transactional (CURSOR(SELECT TO_CHAR(sales.cust_id) ,TO_CHAR(sales.prod_id) FROM sh.sales ,sh.products WHERE products.prod_id = sales.prod_id AND products.prod_subcategory = 'Documentation') ,0.5 ,2 ,3 ,NULL ,NULL)))) WHERE rnk < 4; ITEMSET SUPPORT LENGTH TOTAL_TRANX RNK -------------------------------- ---------- ---------- ----------- ---------- FI_CHAR('40', '42') 3900 2 6077 1 FI_CHAR('40', '41') 3692 2 6077 2 FI_CHAR('40', '45') 3482 2 6077 3 FI_CHAR('40', '41', '42') 3141 3 6077 1
Analytic Functions Overview of Analytic Functions Ranking Functions Partitioning Aggregate Functions Sliding Windows Row Comparison Functions Analytic Function Performance
Expanding Windows Partition (first) or entire result set OVER (ORDER BY col_name) ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW Window Default value for window setting - produces an expanding window Partition (second)
Sliding Windows Partition (first) or entire result set OVER (ORDER BY col_name) ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING Window 3 ROWS 5 ROWS Produces a sliding window Partition (second)
Aggregate Functions Aggregate functions can be used as analytic functions Must be embedded in the OVER clause Analytic aggregate values can be easily included within row-level reports Analytic functions are applied after computation of result set Optimizer often produces a better execution plan Aggregate level is determined by the partitioning subclause Similar effect to GROUP BY clause If no partitioning subclause, aggregate is across the complete result set
Aggregate Functions – the OVER Clause Could easily include row-level data e.g. ename and sal SELECT deptno ,AVG(sal) FROM emp GROUP BY deptno; DEPTNO AVG(SAL) ---------- ---------- 30 1566.66667 20 2175 10 2916.66667 SELECT deptno ,AVG(sal) OVER (PARTITION BY deptno) avg_dept ,AVG(sal) OVER () avg_all FROM emp; DEPTNO AVG_DEPT AVG_ALL ---------- ---------- ---------- 10 2916.66667 2073.21429 10 2916.66667 2073.21429 10 2916.66667 2073.21429 20 2175 2073.21429 20 2175 2073.21429 20 2175 2073.21429 20 2175 2073.21429 20 2175 2073.21429 30 1566.66667 2073.21429 30 1566.66667 2073.21429 30 1566.66667 2073.21429 30 1566.66667 2073.21429 30 1566.66667 2073.21429 30 1566.66667 2073.21429 No subclause Analytic aggregates cause no reduction in rows
Analytic versus Conventional SQL Performance Average sal per department • The requirement • Data at different levels of grouping ENAME SAL DEPTNO AVG_DEPT AVG_ALL ------ ---- ------ ---------- ---------- CLARK 2450 10 2916.66667 2073.21429 KING 5000 10 2916.66667 2073.21429 MILLER 1300 10 2916.66667 2073.21429 JONES 2975 20 2175 2073.21429 FORD 3000 20 2175 2073.21429 ADAMS 1100 20 2175 2073.21429 SMITH 800 20 2175 2073.21429 SCOTT 3000 20 2175 2073.21429 WARD 1250 30 1566.66667 2073.21429 TURNER 1500 30 1566.66667 2073.21429 ALLEN 1600 30 1566.66667 2073.21429 JAMES 950 30 1566.66667 2073.21429 BLAKE 2850 30 1566.66667 2073.21429 MARTIN 1250 30 1566.66667 2073.21429 Overall average sal
Conventional SQL Performance SELECT r.ename,r.sal,g.deptno,g.ave_dept,a.ave_all FROM emp r ,(SELECT deptno,AVG(sal) ave_dept FROM emp GROUP BY deptno) g ,(SELECT AVG(sal) ave_all FROM emp) a WHERE g.deptno = r.deptno ORDER BY r.deptno; ----------------------------------------------- | Id | Operation | Name | Rows | ----------------------------------------------- | 0 | SELECT STATEMENT | | 15 | | 1 | MERGE JOIN | | 15 | | 2 | SORT JOIN | | 3 | | 3 | NESTED LOOPS | | 3 | | 4 | VIEW | | 1 | | 5 | SORT AGGREGATE | | 1 | | 6 | TABLE ACCESS FULL| EMP | 14 | | 7 | VIEW | | 3 | | 8 | SORT GROUP BY | | 3 | | 9 | TABLE ACCESS FULL| EMP | 14 | |* 10 | SORT JOIN | | 14 | | 11 | TABLE ACCESS FULL | EMP | 14 | ----------------------------------------------- 1M row emp table : 48.35 seconds 230790 consistent gets
Analytic Function Performance SELECT ename,sal,deptno ,AVG(sal) OVER (PARTITION BY deptno) ave_dept ,AVG(sal) OVER () ave_all FROM emp; ------------------------------------------- | Id | Operation | Name | Rows | ------------------------------------------- | 0 | SELECT STATEMENT | | 14 | | 1 | WINDOW SORT | | 14 | | 2 | TABLE ACCESS FULL| EMP | 14 | ------------------------------------------- 1M row emp table : 21.20 seconds 76930 consistent gets
Aggregating Over an Ordered Set of Rows – Running Totals The ORDER BY clause creates an expanding window (running total) of rows SELECT empno ,ename ,sal ,SUM(sal) OVER(ORDER BY empno) run_total FROM emp5 ORDER BY empno; EMPNO ENAME SAL RUN_TOTAL ----- ------ ---- --------- 7369 SMITH 800 800 7499 ALLEN 1600 2400 7521 WARD 1250 3650 7566 JONES 2975 6625 7654 MARTIN 1250 7875 7698 BLAKE 2850 10725 7782 CLARK 2450 13175 7788 SCOTT 3000 16175 7839 KING 5000 21175 7844 TURNER 1500 22675 7876 ADAMS 1100 23775 7900 JAMES 950 24725 7902 FORD 3000 27725 7934 MILLER 1300 29025 : : : : ------------------------------- |Id| Operation | Name| ------------------------------- | 0| SELECT STATEMENT | | | 1| WINDOW SORT | | | 2| TABLE ACCESS FULL| EMP5| ------------------------------- emp table of 5000 rows 0.07 seconds 33 consistent gets No index necessary
Running Total With Conventional SQL (1) • Self-join solution SELECT e1.empno ,e1.sal ,SUM(e2.sal) FROM emp5 e1, emp5 e2 WHERE e2.empno <= e1.empno GROUP BY e1.empno, e1.sal ORDER BY e1.empno; 13.37 seconds 66 consistent gets ------------------------------------------------- | Id | Operation | Name | ------------------------------------------------- | 0 | SELECT STATEMENT | | | 1 | SORT GROUP BY | | | 2 | MERGE JOIN | | | 3 | SORT JOIN | | | 4 | TABLE ACCESS BY INDEX ROWID| EMP5 | | 5 | INDEX FULL SCAN | PK_EMP5| |* 6 | SORT JOIN | | | 7 | TABLE ACCESS FULL | EMP5 | -------------------------------------------------
Running Total With Conventional SQL (2) • Subquery in SELECT list solution – column expression SELECT empno ,ename ,sal ,(SELECT SUM(sal) sumsal FROM emp5 WHERE empno <= b.empno) a FROM emp5 b ORDER BY empno; 4.62 seconds 97948 consistent gets ----------------------------------------------- | Id | Operation | Name | ----------------------------------------------- | 0 | SELECT STATEMENT | | | 1 | SORT AGGREGATE | | | 2 | TABLE ACCESS BY INDEX ROWID| EMP5 | |* 3 | INDEX RANGE SCAN | PK_EMP5| | 4 | TABLE ACCESS BY INDEX ROWID | EMP5 | | 5 | INDEX FULL SCAN | PK_EMP5| -----------------------------------------------
Aggregate Functions With Partitioning Find average salary of employees within each manager Use PARTITION BY to specify the grouping SELECT ename, mgr, sal ,ROUND(AVG(sal) OVER(PARTITION BY mgr)) avgsal ,sal - ROUND(AVG(sal) OVER(PARTITION BY mgr)) diff FROM emp; ENAME MGR SAL AVGSAL DIFF ---------- ------- ---------- ---------- ---------- SCOTT 7566 3000 3000 0 FORD 7566 3000 3000 0 ALLEN 7698 1600 1310 290 WARD 7698 1250 1310 -60 JAMES 7698 950 1310 -360 TURNER 7698 1500 1310 190 MARTIN 7698 1250 1310 -60 MILLER 7782 1300 1300 0 ADAMS 7788 1100 1100 0 JONES 7839 2975 2758 217 CLARK 7839 2450 2758 -308 BLAKE 7839 2850 2758 92 SMITH 7902 800 800 0 KING 5000 5000 0
Analytics on Aggregates Analytics are processed last SELECT deptno ,SUM(sal) ,SUM(SUM(sal)) OVER () Totsal ,SUM(SUM(sal)) OVER (ORDER BY deptno) Runtot_deptno ,SUM(SUM(sal)) OVER (ORDER BY SUM(sal)) Runtot_sumsal FROM emp GROUP BY deptno ORDER BY deptno; DEPTNO SUM(SAL) TOTSAL RUNTOT_DEPTNO RUNTOT_SUMSAL ------ -------- ------ ------------- ------------- 10 8750 29025 8750 8750 20 10875 29025 19625 29025 30 9400 29025 29025 18150 + sum(20) + sum(30) + sum(20) + sum(30)
Aggregate Functions and the WHERE clause Analytic functions are applied after production of the complete result set Rows excluded by the WHERE clause are not included in the aggregate value Include only employees whose name starts with a ‘S’ or ‘M’ The average is now only for those rows starting with 'S' or 'M' SELECT ename ,sal ,ROUND(AVG(sal) OVER()) avgsal ,sal - ROUND(AVG(sal) OVER()) diff FROM emp WHERE ename LIKE 'S%' OR ename LIKE 'M%'; ENAME SAL AGSAL DIFF ------ ---- ----- ----- SMITH 800 1588 -788 MARTIN 1250 1588 338 SCOTT 3000 1588 1412 MILLER 1300 1588 -288
RATIO_TO_REPORT Each row’s fraction of total salary can easily be found when the total salary value is available Example: sal/SUM(sal) OVER() The function RATIO_TO_REPORT performs this calculation SELECT ename ,sal ,SUM(sal) OVER() sumsal ,sal/SUM(sal) OVER() ratio ,RATIO_TO_REPORT(sal) OVER() ratio_rep FROM emp;
RATIO_TO_REPORT (continued) The query on the previous slide gives this result ENAME SAL SUMSAL RATIO RATIO_REP ---------- ------- ---------- ---------- ---------- SMITH 800 29025 .027562446 .027562446 ALLEN 1600 29025 .055124892 .055124892 WARD 1250 29025 .043066322 .043066322 JONES 2975 29025 .102497847 .102497847 MARTIN 1250 29025 .043066322 .043066322 BLAKE 2850 29025 .098191214 .098191214 CLARK 2450 29025 .084409991 .084409991 SCOTT 3000 29025 .103359173 .103359173 KING 5000 29025 .172265289 .172265289 TURNER 1500 29025 .051679587 .051679587 ADAMS 1100 29025 .037898363 .037898363 JAMES 950 29025 .032730405 .032730405 FORD 3000 29025 .103359173 .103359173 MILLER 1300 29025 .044788975 .044788975
Analytic Functions Overview of Analytic Functions Ranking Functions Partitioning Aggregate Functions Sliding Windows Row Comparison Functions Analytic Function Performance
Sliding Windows The OVER clause can have a sliding window subclause Not permitted without ORDER BY subclause Specifies size of window (set of rows) to be processed by the analytic function Window defined relative to current row Slides through result set as different rows become current Size of window is governed by ROWS or RANGE ROWS physical offset, a number of rows relative to the current row RANGE logical offset, a value interval relative to value in current row Syntax for sliding window : BETWEEN <starting point> AND <ending point>
Sliding Windows Example For each employee, show the sum of the salaries of the preceding, current, and following employee (row) Window includes current row as well as the preceding and following ones Must have order subclause for “preceding” and “following” to be meaningful First row has no preceding row and last row has no following row SELECT ename ,sal ,SUM(sal) OVER(ORDER BY sal DESC ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) sal_window FROM emp ORDER BY sal DESC ,ename;
Sliding Windows Example (continued) ENAME SAL SAL_WINDOW ---------- ---------- ---------- KING 5000 8000 FORD 3000 11000 SCOTT 3000 8975 JONES 2975 8825 BLAKE 2850 8275 CLARK 2450 6900 ALLEN 1600 5550 TURNER 1500 4400 MILLER 1300 4050 MARTIN 1250 3800 WARD 1250 3600 ADAMS 1100 3300 JAMES 950 2850 SMITH 800 1750 Calculation: =5000+3000 =5000+3000+3000 =3000+3000+2975 =3000+2975+2850 =2975+2850+2450 =2850+2450+1600 =2450+1600+1500 =1600+1500+1300 =1500+1300+1250 =1300+1250+1250 =1250+1250+1100 =1250+1100+950 =1100+950+800 =950+800
Partitioned Sliding Windows Partitioning can be used with sliding windows A sliding window does not span partitions SELECT ename ,job ,sal ,SUM(sal) OVER(PARTITION BY job ORDER BY sal DESC ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) sal_window FROM emp ORDER BY job ,sal DESC ,ename;
Partitioned Sliding Windows(continued) ENAME JOB SAL SAL_WINDOW ---------- --------- ---------- ---------- FORD ANALYST 3000 6000 SCOTT ANALYST 3000 6000 MILLER CLERK 1300 2400 ADAMS CLERK 1100 3350 JAMES CLERK 950 2850 SMITH CLERK 800 1750 JONES MANAGER 2975 5825 BLAKE MANAGER 2850 8275 CLARK MANAGER 2450 5300 KING PRESIDENT 5000 5000 ALLEN SALESMAN 1600 3100 TURNER SALESMAN 1500 4350 MARTIN SALESMAN 1250 4000 WARD SALESMAN 1250 2500 Calculation =3000+3000 =3000+3000 =1300+1100 =1300+1100+950 =1100+950+800 =950+800 =2975+2850 =2975+2850+2450 =2850+2450 =5000 =1600+1500 =1600+1500+1250 =1500+1250+1250 =1250+1250