210 likes | 332 Views
The Model Clause explained. Tony Hasler, UKOUG Birmingham 2012. Who is Tony Hasler?. Google Tony Hasler! My blog contains all the material from this presentation on the front page Additional blog entry (October) on the model clause. Health Warning about the model clause.
E N D
The Model Clause explained Tony Hasler, UKOUG Birmingham 2012 Tony Hasler, Anvil Computer Services Ltd.
Who is Tony Hasler? Google Tony Hasler! My blog contains all the material from this presentation on the front page Additional blog entry (October) on the model clause Tony Hasler http://tonyhasler.wordpress.com
Health Warning about the model clause • No support for Active Session History (ASH) whilst on the CPU • Limited support for SQL Performance Monitor • No session statistics related to modelling • No trace events for diagnosis (that I can find) • Performance is poor when data spills to disk • Introduced in 10g but no new features in 11g • The main example in chapter 22 (10g) or chapter 23 (11g) in the Data Warehousing Guide (calculating mortgage amortization) is incorrect and always has been (mortgage fact ‘Payment’ should be ‘PaymentAmt’) • Perhaps this is not a flagship component! • However, the model clause: • Enables you to do stuff that cannot otherwise be (easily) done in SQL • The parallelisation capabilities can be useful in some cases Tony Hasler http://tonyhasler.wordpress.com
What is in this talk and what is not • IN: • What the model clause is • What it is used for and what it isn’t • Syntax and semantics of the main features • Recommendations on how to use the model clause • A real life example • Not in due to time constraints: • An exhaustive list of all model clause features • This can be found in the Data Warehousing Guide Chapter 22 (10g) or Chapter 23 (11g) • No demos Tony Hasler http://tonyhasler.wordpress.com
What the model clause is and what is it used for Part of a query block meaning it can appear after any select keyword. Provides spreadsheet like capabilities inside SQL Can perform unimplemented SQL features, e.g. moving medians. => Just because something is inefficient to execute doesn’t mean you don’t need to do it! Iteration is possible, such as with ‘Feedback loops’ (see Example 4 Calculating Using Simultaneous Equations in the Data Warehousing guide) A series of calculations within the model can be sequenced manually or dependencies can be determined by the engine (only use when necessary, not because of laziness) Tony Hasler http://tonyhasler.wordpress.com
Model terms compared with Excel Rows/columns <-> Dimensions Values <-> Measures Formulas <-> Rules Worksheets <-> Partitions Tony Hasler http://tonyhasler.wordpress.com
Comparison of Model clause terms with Excel spreadsheet terms Tony Hasler http://tonyhasler.wordpress.com
Overview of a query block with no model clause Joins and outer join predicates Only scalar functions allowed Logically precedes Only scalar functions allowed WHERE clause: inner join predicates and selection predicates Analytic calculations Logically precedes GROUP BY clause Only scalar functions allowed Logically precedes HAVING clause Scalar and aggregate functions allowed Logically precedes SELECT list Scalar, aggregate, and analytic functions allowed Logically precedes DISTINCT Logically precedes ORDER BY clause Scalar, aggregate, and analytic clauses allowed Tony Hasler http://tonyhasler.wordpress.com
Overview of a query block with a model clause Analytic calculations Joins, join predicates, selection predicates and GROUP BY clause Only scalar functions allowed Logically precedes Scalar and aggregate functions allowed HAVING clause Logically precedes Model PARTITION, DIMENSION and MEASURES clauses Scalar, aggregate, and analytic functions allowed Logically precedes Model RULES clause Independent calculations Logically precedes Only scalar functions on model outputs allowed SELECT list Logically precedes DISTINCT Logically precedes Only scalar functions on model outputs allowed ORDER BY clause Tony Hasler http://tonyhasler.wordpress.com
Recommendations on use of the model clause Use factored subqueries to avoid mixing aggregate functions and analytic functions with the model clause Use only scalar values as inputs to the model clause Perform all calculations in the model clause so that… The final select list is just a simple list of identifiers Be careful using the model clause in a sub query as the CBO assumes that the cardinality of the output of a model clause is the same as the cardinality of the input. Tony Hasler http://tonyhasler.wordpress.com
A simplified version of the real world problem CREATETABLE stock_holdings ( customer_name VARCHAR2(100) ,stock_name VARCHAR2(100) ,business_date DATE ,VALUENUMBER ); Calculate the moving one year average (mean) valueand moving standard deviation, and zscore for each customer, stock, and business date. Tony Hasler http://tonyhasler.wordpress.com
Mathematical terms loosely explained The Average (mean) is a “typical value” The Standard Deviation is a “typical difference between a value and the mean” The Zscore is the number of standard deviations that a particular value deviates from the mean. In other words it is a measure of how unusual a value is. Tony Hasler http://tonyhasler.wordpress.com
Problem 1: Oracle doesn’t support moving intervals of months or years properly!!! WITH q1 AS(SELECTDATE'2012-01-01'+ROWNUM mydate FROM DUAL CONNECTBYLEVEL<=400) ,q2 AS(SELECT mydate ,COUNT(*) OVER( ORDERBY mydate RANGEBETWEENINTERVAL'1'YEARPRECEDINGANDCURRENTROW) cnt FROM q1) SELECT * FROM q2 WHERE mydate =DATE'2013-01-10'; Result: MYDATE CNT 10/01/2013 367 Tony Hasler http://tonyhasler.wordpress.com
Model clause solution (1) WITH q1 AS(SELECTDATE'2012-01-01'+ROWNUM mydate,0 cnt FROM DUAL CONNECTBYLEVEL<=400) SELECT mydate, cnt FROM q1 MODELRETURNUPDATEDROWS DIMENSIONBY(mydate) MEASURES(cnt) RULES (cnt [DATE'2013-01-10'] = COUNT(*) [mydate BETWEENADD_MONTHS(CV()+1,-12)ANDCV()]); Tony Hasler http://tonyhasler.wordpress.com
Model clause solution (2) CREATETABLE business_dates ( business_date PRIMARYKEYNOTNULL ,business_days_in_year NOTNULL ,first_day_in_year NOTNULL ,business_day_number NOTNULL ) ORGANIZATIONINDEX AS WITH q1 AS( SELECTDISTINCT business_date,0AS business_days_in_year, SYSDATEAS first_day_in_year,0AS business_day_number FROM stock_holdings ) SELECT business_date ,business_days_in_year ,first_day_in_year ,business_day_number FROM q1 MODEL DIMENSIONBY(business_date) MEASURES(business_days_in_year,first_day_in_year,business_day_number) RULES( first_day_in_year[ANY] =ADD_MONTHS(CV(business_date)+1,-12), business_days_in_year[ANY] = COUNT(*)[business_date BETWEEN first_day_in_year[CV()] ANDCV()], business_day_number[ANY] = ROW_NUMBER()OVER(ORDERBY BUSINESS_DATE)); Tony Hasler http://tonyhasler.wordpress.com
Execution plan for previous statement ------------------------------------------------------- | Id | Operation | Name | ------------------------------------------------------- | 0 | CREATE TABLE STATEMENT | | | 1 | LOAD AS SELECT | BUSINESS_DATES | | 2 | SQL MODEL ORDERED | | | 3 | VIEW | | | 4 | HASH UNIQUE | | | 5 | TABLE ACCESS FULL | STOCK_HOLDINGS | | 6 | WINDOW (IN SQL MODEL) SORT| | ------------------------------------------------------- Tony Hasler http://tonyhasler.wordpress.com
Problem 2: avoiding data densification Oracle recommends the use of partitioned outer joins for data densification to simplify analytic functions. The model clause can also be used for data densification In my case, however, this would have multiplied the number of rows enormously as customers tended to hold particular stocks for about 10% of the time. We can use the model clause to avoid this by performing multiple calculations in sequence Tony Hasler http://tonyhasler.wordpress.com
Model clause solution SELECT customer_name,stock_name,business_date,first_day_in_year,VALUE,mov_avg,mov_stdd,zscore FROM stock_holdings sh, business_dates bd WHERE sh.business_date=bd.business_date MODEL PARTITIONBY(customer_name,stock_name) DIMENSIONBY(sh.business_date) MEASURES(VALUE,business_days_in_year, first_day_in_year,0AS mov_avg, 0AS mov_stdd,0AS zscore,0AS mov_sum,0AS mov_cnt) RULES( mov_sum[ANY] =SUM(VALUE)[business_date BETWEEN first_day_in_year[CV()]ANDCV()], mov_cnt[ANY] =COUNT(*) [business_date BETWEEN first_day_in_year[CV()] ANDCV()], mov_avg[ANY] = mov_sum[CV()]/ business_days_in_year[CV()], mov_stdd[ANY] = SQRT(( (VAR_POP(VALUE)[business_date BETWEEN first_day_in_year[CV()] ANDCV()] *mov_cnt[CV()]) +(mov_sum[CV()]* (AVG(VALUE)[business_date BETWEEN first_day_in_year[CV()] ANDCV()] - mov_avg[CV()]))) /business_days_in_year[CV()] ), zscore[ANY] =DECODE(mov_stdd[CV()],0,0,(VALUE[CV()]-mov_avg[CV()])/mov_stdd[CV()]) ); Tony Hasler http://tonyhasler.wordpress.com
Parallel execution plan -------------------------------------------------------- | Id | Operation | Name | -------------------------------------------------------- | 0 | SELECT STATEMENT | | | 1 | PX COORDINATOR | | | 2 | PX SEND QC (RANDOM) | :TQ10001 | | 3 | BUFFER SORT | | | 4 | SQL MODEL ORDERED | | | 5 | PX RECEIVE | | | 6 | PX SEND HASH | :TQ10000 | | 7 | NESTED LOOPS | | | 8 | PX BLOCK ITERATOR | | | 9 | TABLE ACCESS FULL| STOCK_HOLDINGS | |* 10 | INDEX UNIQUE SCAN | SYS_IOT_TOP_78729 | -------------------------------------------------------- Tony Hasler http://tonyhasler.wordpress.com
Summary The model clause allows you to build your own analytical functions and/or your own analytical windows – amongst other things. The model clause also allows you to parallelise calculations and can be useful even if the calculations are supported without the model clause However, model clause aggregates will be slower than standard analytic functions Performance degrades rapidly when partitions spill to disk. Tony Hasler http://tonyhasler.wordpress.com
Questions Tony Hasler http://tonyhasler.wordpress.com