820 likes | 954 Views
Normalization: Kroenke Chapters 3 and 4. A relation is categorized by one of several normal forms . An aid to design helps characterize relations that experience anomalies in update operations Higher normal forms TEND to be better design, but not guaranteed .
E N D
A relation is categorized by one of several normal forms. • An aid to design • helps characterize relations that experience anomalies in update operations • Higher normal forms TEND to be better design, but not guaranteed.
Remember the one fact-one place theme! • Deletion anomaly: • Deleting 1 fact inadvertently deletes another. • Insertion anomaly: • inserting 1 fact not possible without inserting another seemingly unrelated fact.
First Normal Form – 1NF • A relation is 1NF if each attribute is atomic • That is, attributes are simple types (int, float, string, char, etc)
Second Normal Form – 2NF • How about this as a base table? Primary key
Definition: • R is a relation; • X and Y are attributes of R. • Y is functionally dependent on X iff each X-value in R has precisely one Y-value in R associated with it. • A common notation is X Y.
Status S# Qty P# City S# Name Example: • In the supplier’s S table, Status, City, and Name are functionally dependent on S#. • In the SP table, Qty is functionally dependent on the combined attributes of S# and P#
City and status are not functionally dependent on each other. • There may be several entries containing ‘London’ but different status values. • There may be several entries containing a status of 50 but have different cities.
QTY is not functionally dependent on either P# or S#. • S1 might have multiple QTY values for different parts • Similar for P1
Def: Y is Fully Functionally Dependent on X if X Y but Y is not functionally dependent on any proper subset of X. • In S, (S#, Status) City -- but not fully because S# City • in SP: (S#, P#) Qty -- fully because neither S# nor P# by itself determines Qty. • The functional dependence requires BOTH S# and P#.
Semantic notation. Must understand meaning of data, NOT a consequence of table data. • For example, suppose that each city in S has the same status. • Is it coincidence or by design?
Why is this important? • What if we combined the relations S and SP into a single relation, Firstas in a few slides previous? First(S#, P#, Status, City, QTY) • Underlined attributes represent the primary key.
Insertion Anomalies: • Cannot enter fact that a supplier is located in a city unless that supplier already supplies some part. • Why?
Deletion Anomalies • Suppose S3 no longer supplies P2. • Delete (S3, P2, 10, Paris, 200) • if that is the ONLY part S# supplied, you lose fact that S3 is in Paris.
Update Anomalies • S1 moves from London to Amsterdam. • May have to update many entries. • Violates the “one fact, one place” guideline • These problems are caused by dependences on a proper subset of the primary key.
See also Kroenke’s example on page 95 and the text on page 96.
Second Normal Form (2NF) • A relation is 2NFiff it is 1NF and every non-key attribute is fully functionally dependent on the primary key. • There are no attributes dependent on a proper subset of the primary key.
Table First is NOT 2NF. Some nonkey attributes are not fully dependent on the primary key (S#, P#). • Some are dependent on S# only • The S and SP tables ARE 2NF. • They are a better design in this case. • Similar example in Fig 3-10 on page 106 of text.
How about this table? • Does it contain redundancy? • Are there update anomalies?
Status S# City Transitive dependencies • Suppose a supplier status is determined by the supplier’s city. • That is, City Status. • Since also S# City then S# status is a result of these dependencies. • A Transitive dependency exists as shown below.
Fee SID dorm • Similarly, a Housing table that links a student with a dorm and a residence fee would also likely have a transitive dependence.
Insertion anomalies: • Cannot state fact that a supplier in Rome must have a status of 50 unless there is already a supplier there. • Cannot state fact a dorm has a specific cost unless there is already a student there.
Deletion anomalies • Delete (S5, 30, Athens) • If it’s the ONLY Athens, lose fact that status for Athens must be 30. • Delete (100, Randolph, $3200) from the Housing table. • If that’s the only “Randolph” then you lose the connection between dorm and cost.
Update anomalies • “Change status of London supplier” may mean multiple updates. • Violates the “one fact” – “one place” rule. i.e. that each fact should be stored in one place.
Third Normal Form (3NF) • A relation is 3NFiff it is 2NF and every non-key attribute is nontransitively dependent on the primary key. i.e. non-key attributes are mutually independent. • Again, it’s a consequence of the meaning of the data, not the data itself.
Suppose all London suppliers had a status of 50. • Is that coincidence? • Is it by design?
Question: • Is 3NF better than 2NF? • Maybe. • In the cases presented here, probably so. • An employee table where EmpIDAddressZipCode is not 3NF. • We may not care about AddressZip_Code unless it’s a UPS or Post Office application.
Table Decomposition • Dividing a table into 2 or more tables to achieve a higher normal form. • Previously we divided First into tables S and SP to achieve 2NF.
Now we find that S is not 3NF, so we should decompose S into two tables. • We have options: • SS(S#, Status) and CS(City, Status) • SC(S#, City) and SS(S#, Status) or • SC(S#, City) and CS(City, Status) • Which is best?
Need to ask: • Does the decomposition result in a loss of information? • For example, can we still relate the attributes that have been separated into two tables? • Are the two relations independent of each other?
Option 1: SS(S#, Status) and CS(City, Status) • Cannot get the city of a supplier. • Can you see why?
Option 2: SC(S#, City) and SS(S#, Status) • Relations not independent. • If two suppliers are in the same city, must make sure they have the same status. • Requires monitoring of changes, possibly the use of triggers. Extra work.
CAN get the status of a city but ONLY if there’s a supplier there. Otherwise there’s a loss of information. • Can’t store the status of a city unless there’s a supplier there.
Option 3: SC(S#, City) and CS(City, Status) • Two relations are independent. • No loss of information • Best option
Fee SID dorm Decompose the Housing table into one of • SD(SID, Dorm) and DF(Dorm, Fee) • SD(SID, Dorm) and SF(SID, Fee) • SF(SID, Fee) and DF(Dorm, Fee) Which is better? Construct a similar argument
Best decomposition frequently follows the FD arrows. • This is a guideline, not an absolute rule.
Determinants • Consider SMA(SID, MID, AID) where a student has one advisor for a major and an advisor advises for one major. • This table is 3NF since there is only one non-key attribute. • S2 drops Physics and you may lose the fact that A3 advises for Physics. SID AID MID
Def: If Y is fully functionally dependent on X then X is a determinant. • Def: A tuple is an entry from a relation. The name is rooted in the historical development by E.J. Codd who used mathematical models to describe relations. • Def: An attribute is a candidate key if that attribute uniquely identifies a tuple. A primary key is chosen from a list of candidates keys. • Every candidate key is a determinant.
Boyce-Codd Normal Form (BCNF) • A relation is BCNFif every determinant is a candidate key. • SMA is NOT BCNF since AID is a determinant but not a candidate key.
Possible decompositions: • SA(SID, AID) and AM(AID, MID) • No Loss but relations are not independent. • How do you “Find the major of S1”. • It requires a search of two tables which seems somewhat counterintuitive.
SM(SID, MID) and AM(AID, MID). • Cannot get advisor of a student.
SA(SID, AID) and SM(SID, MID). • Cannot get who advises what. • None of the three possible decompositions seems satisfactory.
Student (S) redundant Advisor (A) Major (M) • Solution: Look at bigger picture (E-R diagram) • Relations: • S, M, A (With a foreign key matching the primary key in M), SM, and SA to implement the many-many relationships
NOTE: With BOTH SM and SA, it is possible for inconsistency to occur. • Could have (S1, M5) in SM; • (S1, A3) in SA; • and have M8 as a foreign key for advisor A3 in the Advisor table. • Would need software or triggers to assure consistency which adds to overhead.
On the other hand, relationship between S and M is derived from relationships between S and A and between A and M. • This provides an argument that the relationship between S and M should not be shown as a separate relationship
Of course, then the fact that “ a student is majoring is something” is NOT explicitly stored. • The design is based on business rules which we assume to be correct. • May not always be the case.
Maybe the business rule that states “a student is majoring in something” is flawed. • Allows a student to choose a major without having an advisor first.
Perhaps a better rule is “a student has an advisor, which determines the major”. • It would be a model that forces student to choose an advisor, which may be a better rule since many students do NOT seek out advisors in timely fashion.
Multivalued Dependencies (MVDs) • Consider SMA(Student, Major, Activity) • A student can have multiple majors and participate in multiple activities.
This relation is BCNF vacuously (There are no determinants) • Can’t store the major of a student unless that student has an activity.