1 / 46

Database Design Fundamentals and SQL Queries Lecture Summary

Learn about database design principles, SQL query concepts, normalization, keys, and constraints. Dive into multi-table select statements, wildcard usage, functional dependencies, and types of relationships in database design.

susanwalker
Download Presentation

Database Design Fundamentals and SQL Queries Lecture Summary

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. COP 2700 – Data Structures - SQL Lecture 4 – June 1, 2015 Database Design

  2. Announcements • Assignment 1 is tonight by 11:59 PM • Read chapter 5 for Wednesday’s class • Multi-Table Select Statements • The Mid Term Exam is next Monday (June 8)

  3. Tonight • Database Design • Normalization • Database Keys and Constraints

  4. Quick Review on a Couple Of Things • Count (Distinct Expression) • Used if you need to know the number of UNIQUE items in an aggregate. • Let’s do this using our Registration view. • How many grades has each student received or will receive? • select student_id, count(*) from transcript • group by student_id • How many grades has each student received (that is, grade is not null) • select student_id, count(Grade) from transcript • group by student_id • How many different grades has each student been given? • select student_id, count(Distinct Grade) from transcript • group by student_id

  5. Like – Wildcards for Strings • % - Matches any number of characters • Which Parts start with the letter “D” • SELECT Part_Num, Description FROM Part WHERE Description Like ‘D%’ • Which Parts have “Wash” as part of the description • SELECT Part_Num, Description FROM Part WHERE Description Like ‘%Wash%’ • How could I have done the above using String Functions? • SELECT Part_Num, Description FROM Part WHERE SubString(Description,1,1) = ‘D’ • SELECT Part_Num, Description FROM Part WHERE CharIndex(“Wash”,Description)>0 • _ - Matches one character (Underscore) • SELECT Part_Num, Description FROM Part WHERE Description LIKE '_o%' • Remember, if you need to search for a percent or under_score, you can use the Escape clause (but I have never ever had to do that!!) • SELECT Part_Num, Description FROM Part Where Description LIKE '%!%Wash%' Escape '!‘ Wil get all record with %wash somewhere in the descritpion.

  6. What is a database again?? • It is a model of a real world enterprise • So, when we are designing a database, there is normally a real world “manual” process that already exists. • Database design then is taking a real world “system” and simulating it using entities, relationships and attributes.

  7. Let’s “Design” the Henry Bookstore Database

  8. And now for some boring nomenclature • Functional Dependency • To help us to determine the correct set of tables to represent our real world, we need to know how the attributes in a table are related to each other. • For example, the Name of a book is dependant on the Code of the book…or a Book’s Code determines the Book’s Name. There is a one to one relation ship between the two attributes. • Based on the investigation of these attributes’ relationships, we can best determine our database design. • Functional dependency usually requires one to know the structures, the data and the real world policies • Be on the look out for “derived fields”. These are fields that can be calculated from existing data and really aren’t needed in the database.

  9. Candidate KeysPrimary Keys • Each table in our database needs to include some type of unique identifier for the columns in that table. • All other attributes are then functionally dependant on the identifier (which implies that that identifier determines the other attributes.) • The unique identifier may be a combination of existing attributes. • Or, if there is no separate combination of attributes that exists that can consistently determine the other attributes, a new column should be created to hold a unique key for each row. (Will look at Identity a little later tonight) • Any attribute (or combination of attributes) that can uniquely determine the other attributes in a row is called a “Candidate Key” • If there is only one attribute, then that is the Primary Key • If there are multiple candidate keys, then one of them is selected as the Primary Key. (The other may need to be set up as a Unique Index. More on that later as well!!) • Can you think of a database that may have multiple candidate keys?

  10. Types of Relationships • Relationships are those tables that tie together two or more entities • There are three different kind of Relationships we need to worry about in Database Design • one to many (one entity can relate to many other entities, but not the other way around…A book can have exactly one Publisher, but a Publisher may have many Books.) • usually this is denoted by including the primary key of the “one” in the table of the “many” • many to many (one entity can relate to many other entities and vs. versa…A book can have multiple sales locations AND each sales location can sell multiple books) • usually this is implemented by creating a new table with a primary key of the two primary keys from the base tables along with any additional attributes (like on-hand amount) • one to one (one entity can by related exactly with one other entity based on each primary key, and vs. versa.) • Let’s talk about this one. If this happens, what do you think we should do?

  11. Normalization • Once a first cut has been attempted to generate the tables (both entities and relationships) for a database, a process called Normalization is performed to try and make sure that issues will not happen in the future because of bad database design. • There are three “normal forms” that a database should adhere to to make sure that things called “anamolies” don’t happen.

  12. First Normal Form • The data must be able to be represented by complete rows and columns or data. (The books says no repeating groups. This will become a little less sketchy once we go through one.)

  13. Second Normal Form • The table must be in First Normal Form • AND • All attributes of the table must be dependant on the entire primary key of the table. There should not be any attribute dependant on a partial key. • This is usually identified as a redundancy of data

  14. Third Normal Form • The table must be in Second Normal Form • AND • No attribute should be functionally dependant an any other attribute besides the primary key. • This is again a matter of redundancy. • Funny thing, in some cases when you are doing initial database design, you may have a set of attributes that look like they belong, and that have no other attribute that they are dependant on, but the columns also are not dependant on the primary key!! • This is sometimes called the Boyce-Codd Normal Form after the name of the database scientists that developed the concept • There are higher normal forms, but we will not get into those in this class.

  15. And Here’s The Basic Flow • Document the heck out of the manual process • Look at reports, invoices, orders, transactions, anything you can get your hands on to determine • The data • The relationship of that data to other data • Identify the obvious entities and relationships from the manual process. • Identify the attributes associated with each entity/relationship • Determine the type of each attribute • Determine any specific requirements or constraints • List the functional dependencies of each attribute • Apply the three normalization rules to determine your initial database design.

  16. Let’s “Design” the Henry Bookstore Database

  17. Entities • At least two • Branch • Branch Name, Location, Employees • Book • Code, Title, Type, Price, Pub Code, Pub Name, Author • Let’s leave it this way for now

  18. Relationships • A Branch includes an On-Hand Quantity of a Book. • Is this one to many, many to many or one to one? • What attributes need to be included in this Relationship?

  19. Normal Forms – Branch Entity • What is the Candidate Primary Key? • Probably Branch Name, but we could run into trouble. • Suggest we add a column called Branch_Num to ensure uniqueness, but that imparts no real meaning.

  20. Auto Increment Columns • To handle the creation of Primary Keys for Tables that ensure uniqueness, each major database provider has developed a mechanism to auto-assign a numeric primary key. • In SQLServer, this ability is called Identity. • When the database is created, a column can be set up as an Identity Column, which means that on an Insert, the database will look at the highest value for that column throughout the database, and then assign the next highest value for the next inserted row. • We’ll play with this more in a couple of weeks when we get into database updates.

  21. Normal Forms – Branch Entity • So, our Branch Table now looks like this: • Branch Num (Numeric) • Branch Name (String) • Branch Location (String) • Number of Employees (Numeric) • Is this in first normal form? • Is this in second normal form? • Is this in third normal form?

  22. Normal Forms – Book Entity • Do we have a Candidate Key • Yes – Let’s Start with Book Num and see where we go!! • Are we in First Normal Form:

  23. Normal Forms – Book Entity NO!! – There is a repeating group!!

  24. Normal Forms – Book Entity

  25. Normal Forms – Book Entity • OK – We’re in First Normal Forms (no repeating groups), but that blows our candidate key since it is no longer unique. • So, our new candidate key will be • Book_Num + Author • Now, the candidate key is unique and can be considered a primary key

  26. Normal Forms – Book Entity • Are we in Second Normal Form? • No – We have attributes that are functionally dependant on only a part of our new Candidate key • Book Title and Type are only dependant on the Book Num, not the Author. • This can lead to Update Anomalies • Insert Anomaly • What happens if we have a new author that has yet to publish a book. How can we add the author to the database? • Delete Anomaly • If we delete all the books from an author, we are also deleting any record of that author. • Inconsistencies • We can mistakenly change the last name of an author on one book and forget to do it for another.

  27. Normal Forms – Book Entity • Solution – Remove Author from the Book Table and create it’s own Table. Include a Unique Author Num in the Table. • What is the relationship between Book and Author. • One to Many? Many to Many?

  28. Author EntityWrote Relationship • Author • Author_Num Primary Key (Numeric) • Last_Name (String) • First_Name (String) • Wrote • Book_Code • Author_Num • Primary key is the combination of these two columns

  29. Normal Forms – Book Entity Now in Second Normal Form – Candidate Key is Book Code. But, looking carefully at the data, we see that what we still have possible Update Anomalies.

  30. Update Anomalies • Delete Anomaly • If we delete the Black House record, we also remove any reference to the Random House publisher • Inconsistencies • With this set-up, we could update the Publisher City for one Harry Potter book, and not the other. Which means the Publisher could look like it is in two cities.

  31. Normal Forms – Book Entity • Third Normal Forms state that all attributes in a table are to be functionally dependant ONLY on the primary key. • Book Code does determine the publisher information • BUT Publisher Name and City are dependant on Publisher Code and not Book Code • Solution – Create a Publisher Entity Table • What is the relationship between Book and Publisher?

  32. Henry Book Store DatabaseThird Normal Form

  33. DDL For Creating a Table CREATE TABLE Table_Name (Column1 Data Type {Not Null}, Column2 Data Type {Not Null}, Column3 Data Type {Not Null}, Column4 Data Type {Not Null}, PRIMARY KEY (Column Name) );

  34. Data Types • Numeric • Precise – Numeric/Decimal (Length,Decimals), Bit, Int and others • Estimate – Float – Real – Use only for scientific applications • Character/String (values surrounded by single quotes) • Fixed Length – Char and nChar • Variable Length – Varchar and nVarchar • DateTime (stores dates and times) • DateTime, Date, Time • Binary. (Used to store pictures, programs, anything that can be represented as a binary file.) • For more information check out: • http://www.teratrax.com/sql-server-data-types-ranges/

  35. Example Create TableBook Table CREATE TABLE BOOK( BOOK_CODE varchar(4) NOT NULL, TITLE varchar(40) NULL, PUBLISHER_CODE varchar(3) NULL, TYPE varchar(3) NULL, PRICE decimal(4, 2) NULL, PAPERBACK varchar(1) NULL, PRIMARY KEY (BOOK_CODE) ); FYI. If the Primary Key is a single column, it can be just noted along with the column when it is created: BOOK_CODE varchar(4)NOT NULL PRIMARY KEY, A Primary Key cannot be NULL. Notice the semicolon at the end of the CREATE TABLE It is used to separate commands (and can be used in between Selects as well, although SQLServer does a good job without it.)

  36. Referential Integrity • Now that we have our database all set up, what would happen if we deleted one of the Publishers in our Publisher table? • To prevent this from happening, DBMS have created Foreign Keys. • A Foreign Key is called a Constraint since it prevents things from happening or constrains the actions you can take. • A Foreign Key is the primary key of another table that is included to represent a relationship in the base table. • What is the Foreign Key(s) in the above Book table?

  37. Foreign Key Definition CREATE TABLE BOOK( BOOK_CODE varchar(4) NOT NULL PRIMARY KEY, TITLE varchar(40) NULL, PUBLISHER_CODE varchar(3) NOT NULL FOREIGN KEY REFERENCES Publisher (Publisher_Code), TYPE varchar(3) NULL, PRICE decimal(4, 2) NULL, PAPERBACK varchar(1) NULL); If the Foreign Key is multiple columns (or you want to give it a specific name) then it must be applied after the table definition CONSTRAINT FK_Publisher FOREIGN KEY (PUBLISHER_CODE )REFERENCES Publisher (Publisher_Code)

  38. What Happens with a Foreign Key • Once a Foreign Key is established. • One cannot add a record to the base table that does not already have a record in the Foreign Key’s table. • One cannot delete a record in the Foreign Key’s table if it is used in the base table. • One cannot change the value of the Primary Key in the Foreign Key’s table if it is used in the base table.

  39. What Happens with a Foreign Key • With our Book and Publisher Tables • One cannot add a Publisher Code to a row in the Book table where that Publisher Code is not defined in the Publisher table. • One cannot delete a record in the Publisher table if the primary key of that table is used in the Book table. • One cannot change the value of the Primary Key of the Publisher Table if it is used in the Book Table. • Let’s Try • Let’s look at our Henry Database again. What other Foreign Keys should we have?

  40. USE master GO CREATE DATABASE Henry GO USE Henry2 GO CREATE TABLE AUTHOR( AUTHOR_NUM decimal(2, 0) NOT NULL PRIMARY KEY, AUTHOR_LAST varchar(12) NULL, AUTHOR_FIRST varchar(10) NULL); CREATE TABLE dbo.PUBLISHER( PUBLISHER_CODE varchar(3) NOT NULL PRIMARY KEY, PUBLISHER_NAME varchar(25) NULL, CITY varchar(20) NULL); CREATE TABLE dbo.BRANCH( BRANCH_NUM decimal(2, 0) NOT NULL PRIMARY KEY, BRANCH_NAME varchar(50) NULL, BRANCH_LOCATION varchar(50) NULL, NUM_EMPLOYEES decimal(2, 0) NULL); CREATE TABLE dbo.BOOK( BOOK_CODE varchar(4) NOT NULL PRIMARY KEY, TITLE varchar(40) NULL, PUBLISHER_CODE varchar(3) NULL FOREIGN KEY REFERENCES Publisher (Publisher_Code), TYPE varchar(3) NULL, PRICE decimal(4, 2) NULL, PAPERBACK varchar(1) NULL); CREATE TABLE dbo.WROTE( BOOK_CODE varchar(4) NOT NULL FOREIGN KEY REFERENCES Book (Book_Code), AUTHOR_NUM decimal(2, 0) NOT NULL FOREIGN KEY REFERENCES Author (Author_Num), SEQUENCE decimal(1, 0) NULL, PRIMARY KEY (BOOK_CODE, AUTHOR_NUM)); CREATE TABLE dbo.INVENTORY( BOOK_CODE varchar(4) NOT NULL FOREIGN KEY REFERENCES Book (Book_Code), BRANCH_NUM decimal(2, 0) NOT NULL FOREIGN KEY REFERENCES Branch (Branch_Num), ON_HAND decimal(2, 0) NULL, PRIMARY KEY (BOOK_CODE, BRANCH_NUM)); Henry Database Definition

  41. Auto-Increment ColumnsIdentity • If we have a Entity without a clear Primary Key, or one is just not available, we could just make one up. • But that becomes a hassle as new records are added. • SQLServer has the ability to auto-generate a unique key by: • Setting a column to a numeric type field and saying that it is NOT NULL • Adding the tag IDENTITY to the column at creation time. • When doing this, the way one inserts information into this table will have to change. More on this when we get into database updates. • But, let’s try with a quick fake table:

  42. Auto-Increment ColumnsIdentity CREATE TABLE Author_Too( AUTHOR_NUM decimal(2, 0) NOT NULL PRIMARY KEY IDENTITY, AUTHOR_LAST varchar(12) NULL, AUTHOR_FIRST varchar(10) NULL);

  43. Constraints • So far we have seen three different Constraints • NOT NULL • PRIMARY KEY – This column (or combination of columns) must be unique within the table. • FOREIGN KEY – This column can only contain the primary key of the table to which it points. • Database Management Systems also allow a column to be defined with other constraints, such as a range of values or allowing only specific values. We’ll play with these later in the class if time permits.

  44. Using SQL Server Management Studio to Build Tables • Let’s play a little bit and create a table like the Book table, but using the GUI instead of a script file. • Let’s also look at Primary Keys, Identity, Foreign Key and a couple of simple Constraints.

  45. Now, let’s apply what we just did to another real world system • The Transcript/Registration • Break up into groups of 4 or 5. • Identify the obvious entities, relationships and their attributes • Go through Normalization • Come up with a Database Design for the System. • Turn in one entity relationship model for the group.

  46. Assignment 2

More Related