150 likes | 233 Views
Relational Databases. Relations and tables. Relations/Tables.
E N D
Relational Databases Relations and tables
Relations/Tables This weeks lecture will look at how database tables/relations work. Last week we identified how we could identify the possible tables and contents and this week we are taking this in more detail, looking at how a table/relation works (set theory) and then looking at how this set approach relates to the way data is clustered. Objectives of the Lecture :
The relational model that underpins the relational database is based on mathematical sets. We looked at tables already containing collections of related data (patient details, student records, programmesetc) Definition : “A set is a collection of things”. • A set has two main characteristics : • It has no sequence or structure. • It has no duplicates. Example : { 2, 7, 5, 4 } { 7, 5, 4, 2 } They are the same set because sets have no order. Example: { 2, 2, 7, 5, 4, 8, 8 } is not a set because it has duplicates. A database table/relation should hold data in an order where the value of the data is NOT dependent on the order it is retrieved in. Set Theory
Relation/table = Set of Tuples/records Table A and B are the same because even though the data is in a different order it is still the same data in each table. Table C is different because the data in the location column is different, this changes the data on a row by row comparison. A B C
Relation - No Duplicate Tuples • The data in the relation/table should NOT hold any duplicate, duplicated data can lead to errors in any calculation. • There are ways to ensure that each row of data is unique by having one or more columns/attributes in the table that is unique to each row (an extra column/attribute can be added if required) • student record number • NHS number • NI number • etc. • Alternative is to have an attribute/column that indicate multiple records for example a ‘quantity’ attribute
Consider the following .... The above data has duplicated records, we need to remove duplicates but we can’t just delete the records because we need to consider the nature of the data, if the data relates to the products a shop sells then we may be able to simply remove the duplicates because retaining a single record will still reflect that the store sells ripple bars but if the tables relates to stock then to remove the duplicates will change the data and make it invalid, the store will need to know they have 3 ripples in stock.
Solution .... The way we handle the duplicated data depends on the nature of the data and how we are using it. A database developer must understand the data they are working with and its use.
This section will be covered in more detail throughout the course as your understanding develops more complexity to this procedure (known as normalisation). At a basic level, converting the data gathered in analysis • Data elements are collected and ‘grouped’ • Look at related data, cluster things • Identify the data types that are needed and consider what size they should be • Varchar, numbers, dates, 50 characters etc • Do you need extra data to give meaning? • Analysis should indicate if the data currently held is sufficient or if more are needed • Does the data need to be given more granularity? • Name = Surname, forename, middle initial etc Converting data elements to relations/table
example A company currently takes orders over the phone or by sales staff, the orders are noted on paper order forms and these are used to raise picking notes, invoices etc. The company wants to put these orders into a database so that they can use the data to analyse sales and do some sales forecast and target marketing (send relevant promotions to interested customers rather than a blanket mail shot). The paper system records the following information: Order number order line invoice address delivery address customer name customer id cost of order delivery cost Vat discount code additional notes picking note id pallet number warehouse id checked by, product details product id unit cost Quantity line cost (these are repeated values for each item on order)
Example continued – group data Order number order line invoice address delivery address customer name customer id cost of order delivery cost Vat discount code additional notes picking note id pallet number warehouse id checked by product details product id unit cost Quantity line cost (these are repeated values for each item on order) Order customer name customer id delivery address product details product id unit cost Quantity line cost VAT discount code total cost invoice address Invoice customer name customer id product details product id unit cost Quantity line cost VAT discount code total cost invoice address The new elements come from different documents (invoice & order form for simplification) the source documents can be used to indicate possible relations/tables Note: that some data is duplicated cumbersome (do we need the customer name and number?)
Example continued – relate the data Order customer name, customer id, delivery address, product details, product id, unit cost, quantity, line cost, VAT, discount code, total cost, invoice address Invoice customer name, customer id, product details, product id, unit cost, quantity, line cost, VAT, discount code, total cost, invoice address Consider the data, instead of duplicating the data. We could include a reference to where the data is, if the invoice data has the order data on it then why not simply hold something that links to the order row? The order data has 2 elements, there is the information that relates to the order in general and there is data that is repeated, the order data therefore breaks down into the order header info and the order line detail. The above changes mean that we have all the data held but the structure of the data is changed from the paper systems to a more RELATIONAL format.
To relate the data we need to add a reference element • Breaking the data down gives a greater granularity • Consider renaming elements so it is clear which table they belong to (the reason for this is apparent in later elements) • The outcome of the relations/table could result in: • These are not definitive tables but you should see how they evolve. • Each of these relations/tables SHOULD have a unique element that prevents duplication of data (PRIMARY KEY) if a relation references another relation there needs to be a referencing element within the relation (FORIGN KEY (f)) Example .... OrderHeaderord_id ord_date ord_custidord_value ord_vat ord_deladdr OrderlineOL_ordrid (f) OL_ordline_no OL_prodOL_quantOL_unitprice Invoice In_vid inv_dateinv_orderid (f) inv_custid (f) inv_comments Customer ????
Example – looking at relationships An order has 1 or more order lines (you may order many different product) Each order will have one or more invoices associate with them (think about when you order something from Amazon, dispatched in a number of parcels) If you have added a customer relation then a customer can place many orders but each order is only associated with one customer customer Order Invoice Orderline
Regardless of the data that is currently in use within an organisation’s paper system, data held in a database should not be duplicated. Data should be re-organised so that duplications are removed, links between relations are identified so that the chain of data is clear and accessible. Additional elements that are not in the paper system will need to be added to make a computer based system The translation from paper systems to computer/database systems require extensive understanding of the way the data is to be used Each record in a relation should have a unique identifier Related relations must have a common attribute that is the same datatype and size but not necessarily name. overview
Students should read through the notes on BB and these slides • Write up the notes you have made so that you can understand them when you come to review and revise them • Ensure that you can follow the example gone through today and ask yourself the following questions: • Do you understand why the information that is held on the paper system CANNOT simply be transferred onto a database system • Why does each record have to be unique and how do you ensure it is unique? • Why do we break the data into smaller subsets (orderheader and orderline tables/relations) • Why do we ‘add’ extra attributes What to do next week?