340 likes | 461 Views
Section 5 - Grouping Data. The GROUP BY clause allows the grouping of data Aggregate functions are most often used with the GROUP BY clause GROUP BY divides a table into sets, then Aggregate functions return summary values for those sets. GROUP BY Syntax.
E N D
Section 5 - Grouping Data • The GROUP BY clause allows the grouping of data • Aggregate functions are most often used with the GROUP BY clause • GROUP BY divides a table into sets, then Aggregate functions return summary values for those sets.
GROUP BY Syntax • SELECT select_listFROM table_list[WHERE conditions]GROUP BY group_by_list;
Example • SELECT pub_id, COUNT(title)FROM titlesGROUP BY pub_id; • All items in the Select list that are not in the Group By list must generate a single value for each group
Groups within Groups • You may nest Groups within other groups by separating the columns with commas • Example:SELECT pub_id, type, COUNT(type)FROM titlesGROUP BY pub_id, type;
Restrictions • Again: Each item in the SELECT list must produce a single value • Wrong:SELECT pub_id, type, COUNT(type)FROM titlesGROUP BY pub_id;
More Restrictions • You can NOT use expressions in the GROUP BY clauseWrong:SELECT pub_id, SUM(price)FROM titlesGROUP BY pub_id, SUM(price);
No Column Numbers • Unlike the ORDER BY clause, you cannot use the column select list position number in the GROUP BY clauseWrong:SELECT pub_id, SUM(price)FROM titlesGROUP BY 1;
Multiple Summaries • To see the summary values for a publisher and for the type of books within that publisher you will need two SELECT statements • SELECT pub_id, SUM(price)FROM titlesGROUP BY pub_id; • SELECT pub_id, type, SUM(price)FROM titlesGROUP BY pub_id, type;
Exercise • Display a list of the authors and the state they live in. Sort the list by the author’s last name within state
Discussion • SELECT au_lname, au_fname, stateFROM authorsORDER BY state, au_lname;We don't need a Group By for this statement because no summary information was asked for
Exercise • Display a list of states and the number of authors that are from each state. Also, show how many different cities are in each state. Sort in state order.
Discussion • SELECT state, count(*), count(distinct city)FROM authorsGROUP BY stateORDER BY state, au_lname;This gets us the number of authors per state and the number of distinct cities in each state.If we didn't use the DISTINCT keyword we would count all authors who lived in a city.
NULLs and GROUPS • NULLs never equal another NULL • BUT... GROUP BY will create a separate group for the NULLs • Think of it as a Group of Unknowns
Example • The Type column contains NULLsSELECT type, COUNT(*)FROM titlesGROUP BY type; • Returns count of 1, if we used a COUNT(type) instead of Count(*) we'd get back a zero instead.Why?
Discussion • Count(*) counts whole rows and there is 1 row of a NULL type group • Count(type) counts the non-NULL type columns in the NULL type group and there are zero non-NULL values in the NULL group.
More NULLs • More than one NULL in a column?SELECT advance, COUNT(*)FROM titlesGROUP BY advance; • Two books have a NULL advance and they are grouped. [Note: zero is different group]
GROUP BY with WHERE • The WHERE clause allows grouping of a subset of rows. The WHERE clause acts first to find the rows you want • Then the GROUP BY clause divides the rows into groupsSELECT type, AVG(price)FROM titlesWHERE advance > 5000GROUP BY type;
No WHERE • Same statement, no WHERESELECT type, AVG(price)FROM titlesGROUP BY type; • NULL group returned[In the previous example, the WHERE clause eliminated the NULLs]
ORDER the GROUPS • GROUP BY puts rows into sets, but doesn't put them in order.SELECT type, AVG(price)FROM titlesWHERE advance > 5000GROUP BY typeORDER BY 2;
Exercise • Show the average position that an author appears on a book if the author has a royalty share less than 100%. Also, show the number of books written by the author. List the author using his social security number and sort by social security number within number of books order. Show the authors with the most number of books first.
Discussion • SELECT au_id, AVG(au_ord), COUNT(title_id)FROM titleauthorsWHERE royaltyshare < 1.0GROUP BY au_idORDER BY 3 DESC, au_id;
HAVING Clause • HAVING is like a WHERE clause for a GROUP • WHERE limits rows • HAVING limits GROUPs
HAVING Syntax • SELECT select_listFROM table_list[WHERE conditions]GROUP BY group_list[HAVING conditions];
HAVING Aggregates • The WHERE conditions apply before Aggregates are calculated • Then the HAVING conditions apply after Aggregates are calculated
HAVING vs. WHERE • WHERE comes after the FROM • HAVING comes after the GROUP BY • WHERE conditions cannot include Aggregates • HAVING conditions almost always include Aggregates
Example • SELECT type, count(*)FROM titlesGROUP BY typeHAVING COUNT(*) > 1; • NOTE: Cannot use WHERE instead of HAVING since WHERE does not allow Aggregates
HAVING without Aggregates • Applies to grouping columnsSELECT typeFROM titlesGROUP BY typeHAVING type LIKE 'p%';You could of used the WHERE clause to find types that began with 'p', as well
Exercise • List the editor positions that have at least three editors
Answer • SELECT ed_pos, count(*)FROM editorsGROUP BY ed_posHAVING count(*) >= 3;
HAVING Conditions • You may use more than one condition on a HAVING clauseSELECT pub_id, SUM(advance), AVG(price)FROM titlesGROUP BY pub_idHAVING SUM(advance) > 15000AND AVG(price) < 20AND pub_id > '0800';
Exercise • List the publisher id and the average advance for each book that the publisher sells and the total number of books they sell, but only if the total cost of all the books they sell (that are priced more than $10.00) is more than eighty dollars and they sell more than one book. Sort by pub_id and book count.
Discussion • SELECT pub_id, AVG(advance), COUNT(*)FROM titlesWHERE price > 10GROUP BY pub_idHAVING SUM(price) > 80AND Count(*) > 1ORDER BY 1, 3;
Discussion • The WHERE clause first eliminates all books that do not cost more than $10 • Then the GROUP BY forms the pub_id groups • Then the HAVING clause eliminates any groups whose total cost ( sum(price) ) is not greater than $80 and any pub_id that has not sold more than one book.
Section 5 - Last Slide • Please complete Assignment 4