620 likes | 755 Views
CS157A Lecture 15. 3NF and Boyce-Codd Normal Form. Prof. Sin-Min Lee Department of Computer Science San Jose State University. What it’s all about. Given a relation, R, and a set of functional dependencies, F, on R. Assume that R is not in a desirable form for enforcing F.
E N D
CS157A Lecture 15 3NF and Boyce-Codd Normal Form Prof. Sin-Min Lee Department of Computer Science San Jose State University
What it’s all about • Given a relation, R, and a set of functional dependencies, F, on R. • Assume that R is not in a desirable form for enforcing F. • Decompose relation R into relations, R1,..., Rk, with associated functional dependencies, F1,..., Fk, such that R1,..., Rk are in a more desirable form, 3NF or BCNF. • While decomposing R, make sure to preserve the dependencies, and make sure not to lose information.
FLT-SCHEDULE flt# weekday airline dtime from atime to DL242 MO WE FR DELTA 10:40 ATL 12:30 BOS SK912 SA SU SAS 12:00 CPH 15:30 JFK AA242 MO FR AA 08:00 CHI 10:10 ATL FLT-SCHEDULE flt# weekday airline dtime from atime to DL242 MO DELTA 10:40 ATL 12:30 BOS DL242 WE DELTA 10:40 ATL 12:30 BOS DL242 FR DELTA 10:40 ATL 12:30 BOS SK912 SA SAS 12:00 CPH 15:30 JFK SK912 SU SAS 12:00 CPH 15:30 JFK AA242 MO AA 08:00 CHI 10:10 ATL AA242 FR AA 08:00 CHI 10:10 ATL Primitive Domains Attributes must be defined over domains with atomic values
FLIGHTS flt# date airline plane# DL242 10/23/00 Delta k-yo-33297 DL242 10/24/00 Delta t-up-73356 DL242 10/25/00 Delta o-ge-98722 AA121 10/24/00 American p-rw-84663 AA121 10/25/00 American q-yg-98237 AA411 10/22/00 American h-fe-65748 Bad Database Design- redundancy of fact • redundancy: airline name repeated for same flight • inconsistency: when airline name for a flight changes, it must be changed many places
FLIGHTS flt# date airline plane# DL242 10/23/00 Delta k-yo-33297 DL242 10/24/00 Delta t-up-73356 DL242 10/25/00 Delta o-ge-98722 AA121 10/24/00 American p-rw-84663 AA121 10/25/00 American q-yg-98237 AA411 10/22/00 American h-fe-65748 Bad Database Design- fact clutter • insertion anomalies: how do we represent that SK912 is flown by Scandinavian without there being a date and a plane assigned? • deletion anomalies: cancelling AA411 on 10/22/00 makes us lose that it is flown by American. • update anomalies: if DL242 is flown by Sabena, we must change it everywhere.
FLIGHTS flt# date airline plane# DL242 10/23/00 Delta k-yo-33297 DL242 10/24/00 Delta t-up-73356 DL242 10/25/00 Delta o-ge-98722 AA121 10/24/00 American p-rw-84663 AA121 10/25/00 American q-yg-98237 AA411 10/22/00 American h-fe-65748 FLIGHTS-AIRLINE DATE-AIRLINE-PLANE flt# airline date airline plane# DL242 Delta AA121 American AA411 American 10/23/00 Delta k-yo-33297 10/24/00 Delta t-up-73356 10/25/00 Delta o-ge-98722 10/24/00 American p-rw-84663 10/25/00 American q-yg-98237 10/22/00 American h-fe-65748 Bad Database Design- information loss
FLIGHTS-AIRLINE DATE-AIRLINE-PLANE flt# airline date airline plane# DL242 Delta AA121 American AA411 American 10/23/00 Delta k-yo-33297 10/24/00 Delta t-up-73356 10/25/00 Delta o-ge-98722 10/24/00 American p-rw-84663 10/25/00 American q-yg-98237 10/22/00 American h-fe-65748 FLIGHTS flt# date airline plane# DL242 10/23/00 Delta k-yo-33297 DL242 10/24/00 Delta t-up-73356 DL242 10/25/00 Delta o-ge-98722 AA121 10/24/00 American p-rw-84663 AA121 10/25/00 American q-yg-98237 AA211 10/22/00 American h-fe-65748 AA411 10/24/00 American p-rw-84663 AA411 10/25/00 American q-yg-98237 AA411 10/22/00 American h-fe-65748 Bad Database Design- information loss • information loss: we polluted the database with false facts; we can’t find the true facts.
FLIGHTS-AIRLINE DATE-AIRLINE-PLANE flt# airline date airline plane# DL242 Delta AA121 American AA411 American 10/23/00 Delta k-yo-33297 10/24/00 Delta t-up-73356 10/25/00 Delta o-ge-98722 10/24/00 American p-rw-84663 10/25/00 American q-yg-98237 10/22/00 American h-fe-65748 Bad Database Design- dependency loss • dependency loss: we lost the fact that (flt#, date) ® plane#
FLIGHTS-DATE-PLANE flt# date plane# FLIGHTS-AIRLINE DL242 10/23/00 k-yo-33297 DL242 10/24/00 t-up-73356 DL242 10/25/00 o-ge-98722 AA121 10/24/00 p-rw-84663 AA121 10/25/00 q-yg-98237 AA411 10/22/00 h-fe-65748 flt# airline DL242 Delta AA121 American AA411 American Good Database Design • no redundancy of FACT (!) • no inconsistency • no insertion, deletion or update anomalies • no information loss • no dependency loss
X Y Functional Dependencies and Keys Let X and Y be sets of attributes in R • Y is functionally dependent on X in R iff for each x Î R.X there is precisely one yÎ R.Y • Y is fully functional dependent on X in R if Y is functional dependent on X and Y is not functional dependent on any proper subset of X • We use keys to enforce functional dependencies in relations: X ® Y
FLIGHTS flt# date airline plane# FLIGHTS flt# date airline plane# FLIGHTS flt# date airline plane# Functional Dependencies and Keys the FLIGHT relation will not allow the FDs to be enforced by keys plane# is not determined by flt# alone airline is not determined by flt# and date
cust# name address cust# name address cust# name address cust# name address cust# name address Functional Dependencies and Keys real world database name address Consider the meaning separate combined
How to Compute Meaning- Armstrong’s inference rules Rules of the computation: • reflexivity: if YÍ X, then X®Y • Augmentation: if X®Y, then WX®WY • Transitivity: if X®Y and Y®Z, then X®Z Derived rules: • Union: if X®Y and X®Z, the X®YZ • Decomposition: if X®YZ, then X®Y and X®Z • Pseudotransitivity: if X®Y and WY®Z, then XW®Z Armstrong’s Axioms: • sound • complete
Overview of NFs NF2 1NF 2NF 3NF BCNF
Normal Forms- definitions • NF: non-first normal form • 1NF: R is in 1NF. iff all domain values are atomic2 • 2NF: R is in 2. NF. iff R is in 1NF and every nonkey attribute is fully dependent on the key • 3NF: R is in 3NF iff R is 2NF and every nonkey attribute is non-transitively dependent on the key • BCNF: R is in BCNF iff every determinant is a candidate key • Determinant: an attribute on which some other attribute is fully functionally dependent.
FLT-INSTANCE flt# date plane# airline from to miles airline flt# from plane# date to miles Example of Normalization
airline flt# from plane# date to miles airline flt# flt# from plane# date to miles airline flt# from to flt# plane# from date to miles Example of Normalization 1NF: 2NF: 3NF & BCNF:
R A B C A C B R1 R2 C B A C 3NF that is not BCNF Candidate keys: {A,B} and {A,C} Determinants: {A,B} and {C} A decomposition: Lossless, but not dependency preserving!
When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF. • 3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys • i.e. composite candidate keys with at least one attribute in common. • BCNF is based on the concept of a determinant. • A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent. • A relation is in BCNF is, and only if, every determinant is a candidate key.
The theory • Consider the following relation and determinants. Example 1. Given R(a,b,c,d) a,c -> b,d a,d -> b • To be in BCNF, all valid determinants must be a candidate key. In the relation R, a,c->b,d is the determinate used, so the first determinate is fine. • Example 2. If {a, b} is not a key, a,d->b suggests that a,d can be the primary key, which would determine b. However this would not determine c. This is not a candidate key, and thus R is not in BCNF.
Two possible keys • DB(Patno,PatName,appNo,time,doctor) • Determinants: • Patno -> PatName • Patno,appNo -> Time,doctor • Time -> appNo • Two options for 1NF primary key selection: • DB(Patno,PatName,appNo,time,doctor) (example 1a) • DB(Patno,PatName,appNo,time,doctor) (example 1b)
Example 1a • DB(Patno,PatName,appNo,time,doctor) • No repeating groups, so in 1NF • 2NF – eliminate partial key dependencies: • DB(Patno,appNo,time,doctor) • R1(Patno,PatName) • 3NF – no transient dependences so in 3NF • Now try BCNF.
BCNF Every determinant is a candidate key DB(Patno,appNo,time,doctor)R1(Patno,PatName) • Is determinant a candidate key? • Patno -> PatNamePatno is present in DB, but not PatName, so irrelevant.
Continued… DB(Patno,appNo,time,doctor)R1(Patno,PatName) • Patno,appNo -> Time,doctorAll LHS and RHS present so relevant. Is this a candidate key? Patno,appNo IS the key, so this is a candidate key. • Time -> appNoTime is present, and so is appNo, so relevant. Is this a candidate key? If it was then we could rewrite DB as: DB(Patno,appNo,time,doctor)This will not work, so not BCNF.
Rewrite to BCNF • DB(Patno,appNo,time,doctor)R1(Patno,PatName) • BCNF: rewrite to DB(Patno,time,doctor) R1(Patno,PatName) R2(time,appNo) • time is enough to work out the appointment number of a patient. Now BCNF is satisfied, and the final relations shown are in BCNF
Example 1b • DB(Patno,PatName,appNo,time,doctor) • No repeating groups, so in 1NF • 2NF – eliminate partial key dependencies: • DB(Patno,time,doctor) • R1(Patno,PatName) • R2(time,appNo) • 3NF – no transient dependences so in 3NF • Now try BCNF.
BCNF Every determinant is a candidate key DB(Patno,time,doctor) R1(Patno,PatName) R2(time,appNo) • Is determinant a candidate key? • Patno -> PatNamePatno is present in DB, but not PatName, irrelevant. • Patno,appNo -> Time,doctorNot all LHS present so not relevant • Time -> appNoTime is present, but not appNo, so not relevant. • Relations are in BCNF.
Summary - Example 1 This example has demonstrated three things: • BCNF is stronger than 3NF, relations that are in 3NF are not necessarily inBCNF • BCNF is needed in certain situations to obtain full understanding of the data model • there are several routes to take to arrive at the same set of relations in BCNF. • Unfortunately there are no rules as to which route will be the easiest one to take.
Example 2 Grade_report(StudNo,StudName,(Major,Adviser, (CourseNo,Ctitle,InstrucName,InstructLocn,Grade))) • Functional dependencies • StudNo -> StudName • CourseNo -> Ctitle,InstrucName • InstrucName -> InstrucLocn • StudNo,CourseNo,Major -> Grade • StudNo,Major -> Advisor • Advisor -> Major
Example 2 cont... • UnnormalisedGrade_report(StudNo,StudName,(Major,Advisor, (CourseNo,Ctitle,InstrucName,InstructLocn,Grade))) • 1NF Remove repeating groups • Student(StudNo,StudName) • StudMajor(StudNo,Major,Advisor) • StudCourse(StudNo,Major,CourseNo, Ctitle,InstrucName,InstructLocn,Grade)
Example 2 cont... • 1NFStudent(StudNo,StudName)StudMajor(StudNo,Major,Advisor)StudCourse(StudNo,Major,CourseNo, Ctitle,InstrucName,InstructLocn,Grade) • 2NF Remove partial key dependenciesStudent(StudNo,StudName)StudMajor(StudNo,Major,Advisor)StudCourse(StudNo,Major,CourseNo,Grade)Course(CourseNo,Ctitle,InstrucName,InstructLocn)
Example 2 cont... • 2NFStudent(StudNo,StudName)StudMajor(StudNo,Major,Advisor)StudCourse(StudNo,Major,CourseNo,Grade)Course(CourseNo,Ctitle,InstrucName,InstructLocn) • 3NF Remove transitive dependenciesStudent(StudNo,StudName)StudMajor(StudNo,Major,Advisor)StudCourse(StudNo,Major,CourseNo,Grade)Course(CourseNo,Ctitle,InstrucName)Instructor(InstructName,InstructLocn)
Example 2 cont... • BCNF Every determinant is a candidate key • Student : only determinant is StudNo • StudCourse: only determinant is StudNo,Major • Course: only determinant is CourseNo • Instructor: only determinant is InstrucName • StudMajor: the determinants are • StudNo,Major, or • Advisor Only StudNo,Major is a candidate key.
Example 2: BCNF • BCNFStudent(StudNo,StudName)StudCourse(StudNo,Major,CourseNo,Grade)Course(CourseNo,Ctitle,InstrucName)Instructor(InstructName,InstructLocn)StudMajor(StudNo,Advisor)Adviser(Adviser,Major)
Problems BCNF overcomes • If the record for student 456 is deleted we lose not only information on student 456 but also the fact that DARWIN advises in BIOLOGY • we cannot record the fact that WATSON can advise on COMPUTING until we have a student majoring in COMPUTING to whom we can assign WATSON as an advisor.
Split into two tables In BCNF we have two tables
Returning to the ER Model • Now that we have reached the end of the normalisation process, you must go back and compare the resulting relations with the original ER model • You may need to alter it to take account of the changes that have occurred during the normalisation process Your ER diagram should always be a prefect reflection of the model you are going to implement in the database, so keep it up to date! • The changes required depends on how good the ER model was at first!
Video Library Example • A video library allows customers to borrow videos. • Assume that there is only 1 of each video. • We are told that: video(title,director,serial)customer(name,addr,memberno)hire(memberno,serial,date) title->director,serial serial->title serial->director name,addr -> memberno memberno -> name,addr serial,date -> memberno
What NF is this? • No repeating groups therefore at least 1NF • 2NF – A Composite key exists:hire(memberno,serial,date) • Can memberno be found with just serial or date? • NO, therefore the relations are already in 2NF. • 3NF?
Test for 3NF • video(title,director,serial) • title->director,serial • serial->director • Director can be derived using serial, and serial and director are both non keys, so therefore this is a transitive or non-key dependency. • Rewrite video…
Rewrite for 3NF • video(title,director,serial) • title->director,serial • serial->director • Becomes: • video(title,serial) • serial(serial,director)
Check BCNF • Is every determinant a candidate key? • video(title,serial) - Determinants are: • title->director,serial Candidate key • serial->title Candidate key • video in BCNF • serial(serial,director) Determinants are: • serial->director Candidate key • serial in BCNF
customer(name,addr,memberno) Determinants are: • name,addr -> memberno Candidate key • memberno -> name,addr Candidate key • customer in BCNF • hire(memberno,serial,date) Determinants are: • serial,date -> memberno Candidate key • hire in BCNF • Therefore the relations are also now in BCNF.