360 likes | 495 Views
Structure. What is a database? Table of information Rows are referred to as records Columns are referred to as fields Record identifier is referred to as a record key Example using an inventory file. Types of databases. Relational Most common
E N D
Structure • What is a database? • Table of information • Rows are referred to as records • Columns are referred to as fields • Record identifier is referred to as a record key • Example using an inventory file A337 - Reed Smith
Types of databases • Relational • Most common • Separate files that are related by one or more common pieces of information • Access programs are what combine the information • Hierarchical • Some files belong to other files. • UPC codes • Network • Object-Oriented A337 - Reed Smith
Relational Databases • Relations • How two files can be combined together • When we normalize files, we put information in smaller matrices that are more efficient. They are combined with relations • Example A337 - Reed Smith
Code Type Description Cost Price Units Quantity ROP Vendor Value on Hand Vendor Vendor Address CityStZip Terms Amount Owed Relations Inventory Master File Vendor File A337 - Reed Smith
Code Type Description Cost Price Units Quantity ROP Vendor Value on Hand Txn_No. Date Code Qty_Sold Relations Inventory Master File Sales Txn File A337 - Reed Smith
Relations A337 - Reed Smith
SQL (Structured Query Language) • The basic underlying language behind a DBMS is called a Structured Query Language and is abbreviated SQL. • Typically, we do not actually USE the SQL in most database applications such as ACCESS. Rather we use a psuedo-code variation of it. ACCESS then translates the psuedo-code version to SQL. A337 - Reed Smith
Database structure • Now, we will look at the structure of a database (a group of tables with fields and relations) • Two approaches to the structure issue: • Empirical (you already know what data there will be - you just want to organize it into tables) • Conceptual (you start with the question of “what information should I have?”) • First approach - NORMALIZATION A337 - Reed Smith
Normalization • We have briefly discussed the importance of file structure. • Now, we will formalize this discussion • A good resource for our discussion is chapter 3 of the Perry/Schneider text. A337 - Reed Smith
Normalization • Why? • Data structures need to: • Have a fixed structure • Minimize redundancy • Avoid insertion and deletion anomalies • How? • Restructure information such that: • Only flat (rectangular) files exist (1st normal form) • All items in each record depend upon (are identified by) the primary record key (2nd normal form) • If one field depends upon another then the “other” must be the primary record key (3rd normal form) • Examples A337 - Reed Smith
Normalize the following table: A337 - Reed Smith
What is wrong with this file? • The dimensions of the file are not defined • How many columns does it have???? • There is a lot of data redundancy • This is a problem for a couple of reasons A337 - Reed Smith
What is normalization • We will take the data in a file and redistribute it to other files. • We then put the data back together with a series of relations. • The primary tools will be: • Column choices • Primary key choices • Relations A337 - Reed Smith
Two ways to approach normalization • Conceptual approach • By understanding the relationships and the nature of all of the data columns, you can design the data structure so that the files are in the best possible “form” • Empirical approach • Given a dataset, you can draw inferences about the data based upon a “little” common sense and redundancies in the data • We will (for this class) tend towards this approach A337 - Reed Smith
Consider the following table: A337 - Reed Smith
First normal form • First normal form means that the file has a rectangular structure. • Anytime a column is “repeated” within a row (for any row), we must move that column data into a new row to eliminate the repeat. • This means we will have to have a composite primary key (a primary key with more than one column) A337 - Reed Smith
1NF: A337 - Reed Smith
So, now what is wrong? • Every item name (as an example) refers to a unique item ID. • In other words, the sales order number is not relevant in the determination of the item name • Similarly, the customer code and customer name do not depend upon the Item ID, they only depend upon the sales order number. • HOW DO WE KNOW THIS??? A337 - Reed Smith
So, what do we do? • We break this into (up to) three files. • The NON-KEY columns of one file will depend upon BOTH columns of the primary key. • The NON-KEY columns of the other file(s) will depend upon only one column of the file with a composite key. A337 - Reed Smith
2NF: A337 - Reed Smith
Are we done, yet??? • No, but almost - one thing left • Notice that the third column of the Sales Orders file has the Customer name and that depends upon the customer number. • The customer number is not the primary key to the file • Pull the customer number and customer name out and put them in a separate file with customer number as the primary key • PUT EVERYTHING BACK TOGETHER WITH RELATIONS A337 - Reed Smith
3NF: A337 - Reed Smith
Normalization Example 1 A337 - Reed Smith
1st Normal Form A337 - Reed Smith
Normalization:2NF and 3NF For normalizing tables, the “best” approach is to look at the data definitions in the data dictionary. Short of that, you will have to infer data characteristics from looking at the data and assuming that it tells you everything that you need to know. This is the approach that we will follow in this class - knowing that it is incomplete. The important characteristic that will tell you about the data when approaching it this way is DUPLICATE data. Duplicates can tell you what data is related to what other data because if every duplicate in one column corresponds to a duplicate in another column, the inference can be drawn that the two columns are related. Be careful - recall our discussion of functions - when going from 1NF (1st Normal Form) to 2NF. For this, it is only important to look if duplicates in a Potential Primary Key Column (what I jokingly referred to as PPKC) correspond with duplicates in some other column. It is not necessary that duplicates in that other column necessarily correspond to a duplicate in the PPKC. An example on the next page is COMPID M579. It appears twice for two different TAGNUMS. That is OK. It is only important that there are not two different COMPIDs for the same TAGNUM (you would have to look back to 1NF to find out). A337 - Reed Smith
2nd Normal Form To move to 2NF, look at duplicate value in ONE primary key column in 1NF and see if each duplicate in the primary key corresponds to a duplicate in another column. IF SO, those two columns can be pulled out into another table. Note that there is no 2NF table with PACKID as the primary key. This is because none of the data depends ONLY on PACKID. In other words, knowing the PACKID does help you to identify the SOFTCOST or the INSTDATE HINT: 2NF tables will have primary keys that are part or all of the primary keys for 1NF. NO new primary key columns will be added. A337 - Reed Smith
3rd Normal Form For 3NF, look at duplicates in NON-KEY columns and see if there is another NON-KEY column that has corresponding duplicates in the same places. IF SO, then those two columns can be combined (with no repeating rows) into another table. The primary key of that table will need to be in the original table too. HINT: For moving from 2NF to 3NF, new tables will have primary keys that ARE NOT a part of the primary key for a 2NF table (they will be new A337 - Reed Smith
Normalization Example 2 Don’t be tricked!!! The first column of the data is not always the primary key of the 1NF table. A337 - Reed Smith
1st Normal Form If 1NF has only one column as the primary key (no composite key) then 1NF and 2NF are the SAME!!!!! A337 - Reed Smith
2nd Normal Form A337 - Reed Smith
3rd Normal Form As we discussed in class, knowing the nature of the information, we can see that one person can have two cars and probably LIC_PLATE_NO should be the primary key for the REGISTRATION table and the foreign key for the PARKING_TICKETS table and that another table with LIC_PLATE_NO and LIC_PLATE_ST as a primary key and SSN as a foreign key should exist. See the next slide A337 - Reed Smith
3rd Normal Form You could not havefigured this out byjust looking at the data!!! You would not haveto do this on an EXAM! A337 - Reed Smith
Normalization Exercise A337 - Reed Smith
1st Normal Form A337 - Reed Smith
2nd Normal Form A337 - Reed Smith
3rd Normal Form A337 - Reed Smith