190 likes | 305 Views
Relational Databases. Grouping & aggregation of data. So far we have looked at retrieving data from multiple tables and the different ways to join the relations/tables that are required. Pulling back lists of data may not be useful if there are large quantities of data retrieve
E N D
Relational Databases Grouping & aggregation of data
So far we have looked at retrieving data from multiple tables and the different ways to join the relations/tables that are required. Pulling back lists of data may not be useful if there are large quantities of data retrieve Count(*) has been used to retrieve the number of occurrences Aggregation is often required to convert data into information introduction
It is very useful if a DBMS can do some calculations and manipulations of data as it is retrieved from the DB. • There are 2 possible kinds of calculations or manipulations : • Those that take individual data values and return individual data values (i.e. scalar calculations). • Those that take a group of data values and return an individual data value (i.e. an aggregate calculation) • The purpose ofGroupBy is to allow aggregate calculations to be carried out. (Also called summaries). These include things like the count(*) which is used in many db systems and already seen in the seminars Purpose of ‘GroupBy’
Example of ‘GroupBy’ If the data is grouped by student then it is possible to calculate the average mark for the student 9298889 and 6655899 both have 2 marks in the system. Clustering the data a allows for sums To be calculated
The operand is considered to consist of 3 elements 1. Grouping attribute(s) : attributes used in order to split the operand up into groups of rows 2. Aggregate attribute(s) : attributes whose values are aggregated or summarised. 3. Irrelevant attribute(s) : attributes not used by the GroupBy operator. (these are used to filter the data but are not displayed The workings of a ‘GroupBy’ Cluster the data in defined groups One new summaryresult attributecreated peraggregate cluster It/they do notappear(s) inthe result.
Illustration of ‘GroupBy’ Procedure Filter attributes Group data Pull joined Data and aggregate data
Grouping in SQL Retrieve the average mark for each major: The major is held in the student table, the marks are held in the marks table so joining both tables will allow access to all required data, using an outer join ensures that all majors and students are considered. SQL> select major, avg(mark) 2 from student natural left join marks; select major, avg(mark) * ERROR at line 1: ORA-00937: not a single-group group function Why won’t this work?
SQL continued SQL> select major, trunc(avg(mark)) 2 from student natural join marks 3 group by major; MAJOR TRUNC(AVG(MARK)) ---------------------------------------- MInfSci 65 Comp Sci 59 Games 54 Comp Eng 77 ITMB 54 SQL> select major, avg(mark) 2 from student natural join marks 3 group by major; MAJOR AVG(MARK) ---------- ---------- MInfSci 65 Comp Sci59.7857143 Games 54.7054264 Comp Eng 77.2142857 ITMB 54.4444444 Using trunc will format Data and remove dp’s
Forthe attribute values in a group : • Sum adds them together; • Minfinds their minimum; • Maxfinds their maximum; • Avggets the average. Count counts up the number of tuples/rows in a groupor counts up the number of attribute values in a group. • Some SQL DBMSs feature additional aggregate functions, such as Stdev (= standard deviation) and Variance. Standard SQL Aggregate Functions
Executing an SQL ‘Select’ Statement The phrases are executed in the following order :- Joins / Cartesian Products done here. From Restrictions done here. Where Grouping done here. Group By Order By Sequencing done here. Select Projections done here.
Data stored in the database can be manipulated using a series of functions that are permitted in SQL extensions such as SQL plus and PL/SQL. Use of these functions mean that the data can be formatted into a more readable manner for the requirements SQL functions
We have covered data fields in the being of the course, while looking at attribute types, we can now do more detail. • The default display & input format for a date in Oracle is: DD-MON-YY, e.g. 29-NOV-04 • Internally Oracle stores dates in a numeric format as century, year, month, day, hours, minutes, seconds, e.g. • CENTURY YEAR MONTH DAY HOUR MINUTE SECOND • 20 04 11 29 10 30 28 • Holding date in such detail means that various calculations can use it yet the user may only see a top-level view • Transactions in a bank may need to look at the second a transaction occurs ye t the customers statement only needs to have the date. Formatting dates
A useful function in SQL is SYSDATE, which returns the current date and time determined by the server date. To find the current date using SQL*Plus type: SQL> SELECT sysdate FROM dual; SYSDATE --------- 29-NOV-04 Note: Dual is a dummy table which contains a single meaningless dummy value. Sysdate
Basic arithmetic that can be done with dates: • Add a number to or subtract a number from a date SYSDATE + 10 SYSDATE - 10 • Subtract two dates: finds the number of days between dates SYSDATE - (SYSDATE - 20) • Add hours by dividing the number of hours by 24 SYSDATE + (10/24) Using dates in calucations
Converting data types Data Type Conversion Implicit Explicit FromToFunctions number varchar2 TO_CHAR date varchar2 TO_CHAR varchar2/char date TO_DATE varchar2/char number TO_NUMBER
Converting with dates Convert a date to a character string: TO_CHAR(<date>, ‘<format_model>’) Examples of formats: • YYYY Four digit year • YEAR Year spelled out • MM Two digit month • MONTH Month spelled out • MON Three letter abbreviation of month • DD Two digit day of the month • DAY Day spelled out • TO_CHAR(SYSDATE, ‘DAY, DD MONTH YYYY’) • Monday , 29 November 2004
You may want to merge data together so that it is displayed as one element || is the SQL command for concatenation SQL> select 'The average mark for ' || major || ' is ' || trunc(avg(mark)) 2 from student natural join marks 3 group by major; ------------------------------------------------------------ The average mark for MInfSci is 65 The average mark for Comp Sci is 59 The average mark for Games is 54 The average mark for Comp Eng is 77 The average mark for ITMB is 54 Merging data and strings You will need this for your seminar tasks!