Understanding Database Tables and Relations Through Set Theory

Relations/Tables This weeks lecture will look at how database tables/relations work. Last week we identified how we could identify the possible tables and contents and this week we are taking this in more detail, looking at how a table/relation works (set theory) and then looking at how this set approach relates to the way relational databases are structured. Objectives of the Lecture :

The relational model that underpins the relational database is based on mathematical sets. We looked at tables already containing collections of related data (patient details, student rows, programmes, etc) Definition : “A set is a collection of things”. • A set has two main characteristics : • It has no sequence or ordering. • It has no duplicates. Example : { 2, 7, 5, 4 }  { 7, 5, 4, 2 } They are the same set because sets have no order. Example: { 2, 2, 7, 5, 4, 8, 8 } is not a set because it has duplicates. A database table/relation should hold data in some order where the value of the data is NOT dependent on the order it is retrieved in. Set Theory

Relation/table = Set of Tuples/rows Table A and B are the same because even though the data is in a different order it is still the same data in each table. Table C is different because the data in the location column is different, this changes the data on a row by row comparison. A B C

Relation - No Duplicate Tuples • The data in the relation/table should NOT hold any duplicate, duplicated data can lead to errors in any calculation. • There are ways to ensure that each row of data is unique by having one column/attribute in the table that is unique to each row (an extra column/attribute can be added if required) • student row number • NHS number • NI number • etc. • Alternative is to have an attribute/column that indicate multiple rows for example a ‘quantity’ attribute

Consider the following .... The above data has duplicated rows, we need to remove duplicates but we can’t just delete the rows because we need to consider the nature of the data. If the data relates to the products a shop sells then we may be able to simply remove the duplicates because retaining a single row will still reflect that the store sells ripple bars but if the tables relates to stock then to remove the duplicates will change the data and make it invalid, the store will need to know they have 3 ripples in stock.

Solution .... The way we handle the duplicated data depends on the nature of the data and how we are using it. A database developer must understand the data they are working with and its use.

Converting data elements to relations/table • This section will be covered in more detail throughout the module as your understanding develops more complexity to this procedure (known as normalisation). At a basic level, converting the data gathered in analysis: • Data elements are collected and ‘grouped’ • Look at related data, cluster things • Identify the data types that are needed and consider what size they should be • Varchar, numbers, dates, 50 characters etc • Do you need extra data to give meaning? • Analysis should indicate if the data currently held is sufficient or if more are needed • Does the data need to be given more granularity? • Name = Surname, forename, middle initial etc

example A company currently takes orders over the phone or by sales staff, the orders are noted on paper order forms and these are used to raise picking notes, invoices etc. The company wants to put these orders into a database so that they can use the data to analyse sales and do some sales forecasting and target marketing (send relevant promotions to interested customers rather than a blanket mail shot). The paper system records the following information: Order number, order line, invoice address, delivery address, customer name, customer id, cost of order, delivery cost, vat, discount code, additional notes, picking note id, pallet number, warehouse id, checked by, product details, product id, unit cost, quantity, line cost, (these are repeated values for each item on order)

Example continued – group data Order number, order line, invoice address, delivery address, customer name, customer id, total cost, delivery cost, vat, discount code, additional notes, picking note id, pallet number, warehouse id, checked by, product details, product id, unit cost, quantity, line cost, (these are repeated values for each item on order) The elements above come from different documents (invoice & order form for simplification) the source documents can be used to indicate possible relations/tables Order customer name, customer id, delivery address, product details, product id, unit cost, quantity, line cost, VAT, discount code, total cost, invoice address Invoice customer name, customer id, product details, product id, unit cost, quantity, line cost, VAT, discount code, total cost, invoice address Note: that some data is duplicated and the data seems more cumbersome than necessary (do we need the customer name and number?)

Example continued – relate the data Order customer name, customer id, delivery address, product details, product id, unit cost, quantity, line cost, VAT, discount code, total cost, invoice address Invoice customer name, customer id, product details, product id, unit cost, quantity, line cost, VAT, discount code, total cost, invoice address Consider the data, instead of duplicating the data we could include a reference to where the data is, if the invoice data has the order data on it why not simply hold something that links to the order row? The order data has 2 elements, there is the information that relates to the order in general and there is data that is repeated, the order data therefore breaks down into the order header info and the order line detail The above changes mean that we have all the data held but the structure of the data is change from the paper systems to a more RELATIONAL format.

Example .... • To relate the data we need to add a reference element • Breaking the data down gives a greater granularity • Consider renaming elements so it is clear which table they belong to (the reason for this is apparent in later elements) • The outcome of the relations/table could result in: • OrderHeader(ord_id, ord_date, ord_custid, ord_value, ord_vat, ord_deladdr) • Orderline (OL_ordrid (f),OL_ordline_no, OL_prod, OL_quant, OL_unitprice) • Invoice (In_vid, inv_date, inv_orderid (f), inv_custid (f), inv_comments) • Customer ???? • These are not definitive tables but you should see how they evolve. • Each of these relations/tables SHOULD have a unique element that prevents duplication of data (PRIMARY KEY) if a relation references another relation there needs to be a referencing element within the relation (FORIGN KEY (f))

Example – looking at relationships An order has 1 or more order lines (you may order many different product) Each order will have one or more invoices associate with them (think about when you order something from Amazon, dispatched in a number of parcels) If you have added a customer relation then a customer can place many orders but each order is only associated with one cusotmer customer Order Invoice Orderline

overview Regardless of the data that is currently in use within an organisation’s paper system, data held in a database should not be duplicated. Data should be re-organised so that duplications are removed, links between relations are identified so that the chain of data is clear and accessible. Additional elements that are not in the paper system will need to be added to make a computer based system The translation from paper systems to computer/database systems require extensive understanding of the way the data is to be used Each row in a relation should have a unique identifier Related relations must have common attributes having the same data types and sizes but not necessarily same names.

Do for next week! • Students should read through the notes on BB and these slides • Write up the notes you have made so that you can understand them when you come to review and revise them • Ensure that you can follow the example gone through today and ask yourself the following questions: • Do you understand why the information that is held on the paper system CANNOT simply be transferred onto a database system • Why does each row have to be unique and how do you ensure it is unique? • Why do we break the data into smaller subsets (orderheader and orderline tables/relations) • Why do we ‘add’ extra attributes

Understanding Database Tables and Relations Through Set Theory