530 likes | 652 Views
Normalization, Roberts’s Rules and Introduction to Data Modeling. CSCI 6442. Agenda. Roberts’s Rules Normalization Roberts’s Rules and Normalization. Why Are We Talking About This?. To design a database, we choose a set of entities that models a problem
E N D
Normalization, Roberts’s Rules and Introduction to Data Modeling CSCI 6442
Agenda • Roberts’s Rules • Normalization • Roberts’s Rules and Normalization
Why Are We Talking About This? • To design a database, we choose a set of entities that models a problem • We will store data in tables corresponding to our entity choices • The names of the entity types, and what’s in which table, becomes embedded in our programs • Changing later on is complex, so we want a stable model of the problem
Midterm Question The first question on the midterm will deal with normal forms. It will deal with the relationship between normal forms and Roberts’s Rules. This one question will count more than any other question on the exam. The homework assignment for next week looks a lot like Question 1 on the midterm.
Syntax and Semantics • Syntax deals with the structure and form of a statement or language • Semantics deals with the meaning that is conveyed by a statement or language
Question • Is normalization a syntactic or a semantic construct? • That is, does it deal with the form of information, or is it involved with meaning?
Intentional vs. Extensional Data • Extensional data—the data that is actually present • Intentional data—all the data that is allowed to be present Question: does normalization deal with intentional or extensional data?
Entity and Entity Type • An entity is something that we record information about in the database • An entity type is a set of similar things that we store information about • An entity instance is one example of some entity type. • Usually we don’t say entity instance and entity type when context makes the meaning clear; we just say entity.
Relations • We use a relation to model a single entity type • The relation is a set of tuples • Each tuple is an ordered collection of values of attributes of the entity type • Each tuple of the relation corresponds to a single instance of the entity type
Facts • A value of an attribute in a row conveys one fact about an entity instance • An attribute is a fact stating that “This entity instance has the value <value>” • Consider emp(empno,ename,job,deptno) • Each value of ename in a row states that “This person’s name is <value>”. • Each row of this table can be viewed as a collection of four facts
Rule 1 Each relation describes exactly one entity type. A relation models a distinct entity type, and each tuple of the relation models an instance of that entity. The relation models an entity by storing its attributes. The attributes that identify it are called candidate keys; the other attributes are non-key.
Do these follow Rule 1? DESK(SER#, HEIGHT, WIDTH, COST, CUSTODIANSALARY) EMP-CAR (EMP#, ENAME, DEPTNO, CARVIN#, CARMAKE, CARYEAR) EMP(EMP#, ENAME, JOB,DEPTNO, DEPTCITY)
Rule 2 Each fact is represented only once in the database. A tuple (aka row) is a collection of facts about an entity instance, one fact per column. Each fact can appear only once, in one row of one table.
Rule 3 Each tuple can reside in only one relation. A relation is a model of an entity type, not a station on a factory assembly line. Instead of moving a tuple from relation to relation, add an attribute that characterizes status.
Rule 3 Example • As a person is being interviewed and hired, they change status: • Resume received • Resume being evaluated • Selected for interview • Selected for hire • Hired • As status changes, we could more the person’s row from one table to another. Should we?
Rule 4 If the cardinality of an attribute is greater than one, then database design must be insensitive to cardinality. It’s easy—and very risky—to presume that the cardinality of various entity types and subtypes will remain the same.
Rule 4 Examples • Company car • College degree • Telephone number • Home address • Business address • Email address
Example of Roberts’s Rules EMP ( EMPNO, ENAME, DEPTNO, DNAME) DEPT (DEPTNO, DNAME, DLOC) This relation violates the following Roberts’s rules : • Rule 1. The EMP table describes employee as well as department • Rule 2. In the EMP table, if we have the same DEPTNO in multiple rows, DNAME will be represented multiple times.
Another Example EMP (ENAME, DEGREE1, DEGREE2, DEGREE3) This schema violates the following Roberts’s rule : Rule 4. The design assumes every employee has a maximum of 3 degrees. If an employee has 4 degrees, then the database needs to be restructured by adding DEGREE4 in the EMP table. Rule 4 deals with an aspect of data independence. It can be stated informally as: "Grow down, not across"
A Question Are Rule 1 and Rule 2 equivalent? They are equivalent if the set of relations that satisfy Rule 1 is the same as the set of relations that satisfies Rule 2. This is a homework problem.
Normalization • A set of formal rules that are intended to be a definition of a properly-structured database • A normal form generally deals with and removes certain anomalous behavior from the use of a relation that is normalized.
Examples of Anomalies • Insert anomalies • If we want to enter information about a new entity in the database we need to enter information about some other entity first • Delete anomalies • In order to delete information about an entity we must delete information about another entity • Update anomalies • In order to change the value of a single fact we may have to change many stored values in the database
Basic Concepts • Entity Type: a class of an object that we record information about. Aka relation, table • Attribute: a characteristic of an entity. Aka column. • Entity Instance: a single occurrence of an entity type. Aka tuple, row
Candidate Keys Candidate key: a set of attributes Ai, Aj,…Ak that is a candidate key has two (time-invariant) properties: • Uniqueness – no two tuples have the same value for the candidate key • 2. Minimality – if any Ai is discarded from the candidate key, then the uniqueness property is lost. It is the smallest set of attributes that identifies a row. How many candidate keys can a table have?
Primary Key One of the candidate keys is selected to be the primary identifier of rows. It is called the primary key. The selection is usually made based on the usefulness of the attribute that is the primary key.
Functional Dependence • R.X→R.Y or R.X FD R.Y • Given a relation R, attribute Y of R is functionally dependent on attribute X of R iff each X-value in R has associated with it precisely one Y-value in R (at any one time) • In other words, for each value of X in table R, there is one and only one value of Y. A given X value must always occur with the same Y value.
Functional Dependence Examples Does X→Y? Does Y→X?
Anomalies Update anomalies: If one copy of repeated data is updated, inconsistency is created unless all copies are similarly updated. Insert anomalies: It may not be possible to store some information unless some other information is stored as well. Delete anomalies: It may not be possible to delete some information without losing some other information as well.
Full Functional Dependence Y is fully functionally dependent on X iff X→Y and no subset of X determines Y. That is, X is the smallest collection of columns that determines Y.
“Aboutness” FD is about “aboutness” If A is FD on X, then A is “about” X Suppose X is employee ID, EID; then EID determines salary, SAL But SAL is “about” the employee identified by EID
Multi-Valued Dependency R.X is said to multi-value determine R.Y if there is a set of values for Y that must appear in any relation where R.X appears. For example, if a course has two textbooks, then there will be an MVD between the course number and the names of the books.
Fourth Normal Form A relation is said to be in fourth normal form iff it is in third normal form and it does not have more than one multi-valued dependency.
Example of 4NF Is this relation in 4NF? SPORT-INSTRUMENT Instrument MVD SID MVD Sport
Converting to 4NF SPORT INSTRUMENT Instrument SID MVD SID MVD Sport
What does 4NF not permit? 4NF does not permit multiple MVDs in a single relation
Boyce-Codd Normal Form A relation is said to be in Boyce-Codd normal form iff every determinant is a key. BCNF deals with problems that can be caused by overlapping candidate keys.
Example of BCNF How does this relation comply with Rule 1 and Rule 2? Are there any anomalies? S# SNAME QTY P#
Converting to BCNF S# SNAME S# QTY P#
What does BCNF not allow? BCNF brings the restrictions on “aboutness” to candidate and composite keys
Rule 1: One Entity Type Per Table • Each row must be about a single entity type • Can’t have information about two entity types • Think of FD. RR1 requires FD, does not allow transitive FD.
Rule 2: Each Fact Represented Once • What must happen for a single fact to be represented more than once? • Most likely, there is a transitive dependency • So RR2 seems to disallow transitive dependency