Chapter 11 Group Functions (up to p.402)

Chapter 11Group Functions(up to p.402) Jason C. H. Chen, Ph.D. Professor of MIS School of Business Gonzaga University Spokane, WA 99258 USA chen@jepson.gonzaga.edu

Objectives • Differentiate between single-row and multiple-row functions • Use the SUM and AVG functions for numeric calculations • Use the COUNT function to return the number of records containing non-NULL values • Use COUNT(*) to include records containing NULL values • Use the MIN and MAX functions with nonnumeric fields

Objectives (continued) • Determine when to use the GROUP BY clause to group data • Identify when the HAVING clause should be used • List the order of precedence for evaluating WHERE, GROUP BY, and HAVING clauses • State the maximum depth for nesting group functions • Nest a group function inside of a single-row function

Objectives (continued) • Calculate the standard deviation and variance of a set of data, using the STDDEV and VARIANCE functions • Explain the concept of multidimensional analysis • Perform enhanced aggregation grouping with the GROUPING SETS, CUBE, and ROLLUP • Use composite columns and concatenated groupings in grouping operations

Refresh the Database • 1. Download chapter 11 files from Bb to c:\oradata\chapter11\ • 2. Run the following script file • Start c:\oradata\chapter11\JLDB_Build_11.sql

Group Functions • Return one result per group of rows processed • Are also called multiple-row and aggregate functions • All group functions ignore NULL values except COUNT(*) • Use DISTINCT to suppress duplicate values

Added Clauses Figure 11-1 SELECT statement syntax

SUM Function • Calculates total amount stored in a numeric column for a group of rows Figure 11-2 Using the SUM function to calculate order profit

AVG Function • Calculates the average of numeric values in a specified column Figure 11-4 Using the AVG function to calculate average profit

Figure 11-4 Using the AVG function to calculate average profit -- chapter 11, Figure 11-5(B); p.389 SELECT TO_CHAR(AVG( retail - cost), '$999.99') "Average Profit" FROM books WHERE category = 'COMPUTER';

COUNT Function • Two purposes • Count non-NULL values • Count total records, including those with NULL values

COUNT Function – Non-NULL Values • Include column name in argument to count number of occurrences Figure 11-9 Using the COUNT function with the DISTINCT option

COUNT Function – NULL Values • Include asterisk in argument to count number of rows Figure 11-11 Using the COUNT(*) function to include NULL values

MAX Function • Returns largest value Figure 11-13 Using the MAX function on numeric data

MIN Function • Returns the smallest value Figure 11-16 Using the MIN function on date data

Datatypes • The COUNT, MIN, and MAX functions can be used on values with character, numeric, and date datatypes

Grouping Data • GROUP BY clause • Used to group data • Must be used for any individual column in the SELECT clause with a group function • Cannot reference column aliases

GROUP BY Example Figure 11-18 Adding the GROUP BY clause

Common Error • A common error is missing a GROUP BY clause for nonaggregated columns in the SELECT clause Figure 11-17 Flawed query: Including both aggregate and nonaggregate columns requires a GROUP BY clause

It is an individual record (just “Profit” not “Highest Profit”) Figure 11-19 Inappropriate use of GROUP BY

Figure 11-20 Calculate the total amount due by each customer and order

Restricting Aggregated Output • HAVING clause serves as the WHERE clause for grouped data. • It is used to eliminate certain groups from further consideration. 4 records are eliminated Figure 11-21 Using a HAVING clause to restrict which groups are displayed

Restricting Aggregated Output (continued) • When included in the same SELECT statement, the clauses are evaluated in the order of: • WHERE • GROUP BY • HAVING

Restricting Aggregated Output (continued) Figure 11-22 Using the WHERE, GROUP BY, and HAVING clauses

Figure 11-23 Using a HAVING clause to restrict grouped output

Figure 11-24 Filtering correctly with the WHERE and HAVING clauses

It is quite inefficient and considered poor SQL programming practice. The statement must process ALL rows in the BOOKS table with the aggregated calculation and then eliminate categories. Figure 11-25 Filtering incorrectly with the HAVING clauses Figure 11-24 Filtering correctly with the WHERE and HAVING clauses

Nesting Functions • Inner function is resolved first • Maximum nesting depth: 2 Figure 11-26 Nesting group functions

Exercises • Practice all the examples in the text. • A Script file is available on the Bb (file name: ch11Queries.sql) • After completing all examples, do the HW. • In-class Exercise • #7 (p.424)

Homework - Hands-On Assignments Email me with one attachment (Oracle_ch11_Spool_Lname_Fname.) to: chen@jepson.gonzaga.edu with subject title of Bmis441-01_Oracle_ch11 (or Bmis441-02_Oracle_ch11) Read and Practice all examples on Chapters 11 • 1. Run the script files (in the folder \oradata\chapter11\): JLDB_Build_11.sql • 2. Read Oracle assignment and create a script file Oracle_ch11_Lname_Fname.sql for questions (ALL EVEN problems; pp.424-425) on “Hands-on Assignments”. Use appropriate COLUMN or other SQL commands to produce readable outputs (or your grade will be discounted –see a sample output on the Bb) • 3. Execute and test one problem at a time and make sure they are all running successfully. • 4. When you done, spool the script files (see next slide for spooling instructions) and email the file (Oracle_ch11_Spool_Lname_Fname.txt) to me by the midnight before the next class.

How to Spool your Script and Output Files After you tested the script file of Oracle_ch11_Lname_Fname.sql successfully, follow the instructions below to spool both script and output files: Step 0. Run the following script file from SQL*Plus (since you have created JLDB tables) • Start c:\oradata\chapter11\JLDB_Build_11.sql • 1. type the following on SQL> • Spool c:\oradata\Oracle_ch11_Spool_Lname_Fname.txt (make sure your name is entered) • 2. open Oracle_ch11_Lname_Fname.sql that you already tested • 3. copy and paste all the SQL commands (including all comments) to the SQL*PLUS • 4. type Spool Off on the SQL> The output should contain your personal information, all SQL commands and their solution on the .txt file and saved in C: drive (oradata\ folder) Email me with the spooled file (.txt) with attachment to: chen@jepson.gonzaga.edu with subject title of Bmis441-01_Oracle_ch11 (or Bmis441-02_Oracle_ch11)

Summary • The AVG, SUM, STDDEV, and VARIANCE functions are used only with numeric fields • The COUNT, MAX, and MIN functions can be applied to any datatype • The AVG, SUM, MAX, MIN, STDDEV, and VARIANCE functions all ignore NULL values • By default, the AVG, SUM, MAX, MIN, COUNT, STDDEV, and VARIANCE functions include duplicate values

Summary (continued) • The GROUP BY clause is used to divide table data into groups • If a SELECT clause contains both an individual field name and a group function, the field name must also be included in a GROUP BY clause • The HAVING clause is used to restrict groups in a group function • Group functions can be nested to a depth of only two. The inner function is always performed first, using the specified grouping. The results of the inner function are used as input for the outer function.

Summary (continued) • The STDDEV and VARIANCE functions are used to perform statistical analyses on a set of data • GROUPING SETS operations can be used to perform multiple GROUP BY aggregations with a single query • The CUBE extension of the GROUP BY calculates aggregations for all possible combinations or groupings of columns included • The ROLLUP extension of the GROUP BY calculates increasing levels of accumulated subtotals for the column list provided • Composite columns and concatenated groupings can be used in GROUPING SETS, CUBE, and ROLLUP operations • The GROUP_ID function helps eliminate duplicate grouping results

STDDEV Function Figure 11-27 Using the STDDEV function

VARIANCE Function • Determines data dispersion within a group Figure 11-28 Using the VARIANCE function

Enhanced Aggregation for Reporting • Oracle provides extensions to the GROUP BY clause, which allow both aggregation across multiple dimensions or the generation of increasing levels of subtotals with a single SELECT statement • A dimensionis a term used to describe any category used in analyzing data, such as time, geography, and product line • Each dimension could contain various levels of aggregation; for example, the time dimension may include aggregation by month, quarter, and year; the product dimension may include product type, sales, store, region and month.

REGION CUSTOMER • Figure Slicing a data cube

Excel Pivot Table Example Figure 11-30 A pivot table with two dimensions on a row

Excel Pivot Table Example (continued) Figure 11-31 A pivot table with one row and one column dimension

GROUPING SETS • The grouping sets expression is the component on which the other GROUP BY extensions, ROLLUP and CUBE, are built. • With this extension, you can use a single query statement to perform multiple GROUP BY clauses. • The single query in Figure 11-32 produces the average retail price for books in four groupings: 1) publisher (the Name column) and category, 2) category, 3) publisher, and 4) overall average. -- chapter 11, Figure 11-32; p.408 SELECT name, category, COUNT(isbn), TO_CHAR(AVG(retail), '99.99') "Avg Retail" FROM publisher JOIN books USING (pubid) WHERE pubid IN (2,3,5) GROUP BY GROUPING SETS (name, category, (name, category), ()) Try without GROUING SETS

Without GROUPING SETS -- chapter 11, Figure 11-32(a); p.408 SELECT name, category, COUNT(isbn), TO_CHAR(AVG(retail), '99.99') "Avg Retail" FROM publisher JOIN books USING (pubid) WHERE pubid IN (2,3,5) GROUP BY name, category;

Grouping Sets Figure 11-32 Using a GROUPING SETS expression in a GROUP BY clause

The CUBE Extension • The CUBE extension of GROUP BY instructs Oracle to perform aggregations for all possible combinations of the specified columns (e.g., two columns: Name (publisher name) and Category). • The outputs matches Figure 11-32. • If you need only a subset of the four aggregate levels calculated, you must use the GROUPING SETS expression because the CUBE extension always performs all aggregation levels. • Adding a GROUPING function to the CUBE extension (Fig. 11-35) to identify subtotal rows in the results (helpful in labeling sorting, and restricting output).

CUBE Figure 11-34 Using the CUBE extension of GROUP BY

Figure 11-35 The GROUPING function returns a 1 to identify subtotal rows

The ROLLUP Extension • The ROLLUP extension of GROUP BY calculates cumulative subtotals for the specified columns. • If multiple columns are indicated, subtotals are performed for each column in the argument list, except the one on the far right. A grand total is also calculated.

ROLLUP Figure 11-37 Using the ROLLUP extension of GROUP BY

Category is the column outside the ROLLUP and is considered the aggregate value. A subtotal is calculated for the aggregate value as well as for each unique value of the ROLLUP column in the aggregate value – i.e., by Category and each Name in each Category . Figure 11-38 Using a partial ROLLUP

Chapter 11 Group Functions (up to p.402)

Chapter 11 Group Functions (up to p.402)

Presentation Transcript

Chapter 11

Chapter 11

Chapter 11

Update on Stormwater Regulation

Chapter 11– Muscular Tissue

Chapter 11

Chapter 11

Chapter 11 Polynomial Functions

Group 402

Chapter 6: Functions

Chapter 12 Times of Change

Constraints Group functions

Chapter 16: Evolution of Populations

Chapter 11– Muscular Tissue

Group 402

Chapter 7

Chapter 11

Chapter 5 Functions

BUS 402 Week 11 Discussion

Chapter 11

Chapter 11

Chapter 11