360 likes | 734 Views
Normalisation. 5. Outline. Boyce-Codd Normal Form (BCNF) normalisation non-loss decomposition Heath’s theorem normalisation process semantic assumptions and FDs CKs decomposition normalisation vs dependency preservation a decomposition may yield to a better solution than another one
E N D
Outline • Boyce-Codd Normal Form (BCNF) • normalisation • non-loss decomposition • Heath’s theorem • normalisation process • semantic assumptions and FDs • CKs • decomposition • normalisation vs dependency preservation • a decomposition may yield to a better solution than another one • either-or situations: normalise or preserve FDs
2NF and 3NF • optional • 2NF • a relation is in 2NF if and only if it is in 1NF and all non-key attributes are irreducibly dependent on the candidate keys • 3NF (Zaniolo) • R is a relation; X is any set of attributes of R; A is any single attribute of R; consider the following conditions: • X contains A • X contains a candidate key of R • A is contained in a candidate key of R • if either of the three is true for every FD X A then R is in 3NF
BCNF • a relation is in Boyce/Codd normal form (BCNF) if and only if every non-trivial irreducible FD has a candidate key as its determinant • informally • the determinant of each relevant FD is a CK
Example • devise examples in class • relations in BCNF • relations not in BCNF
BCNF • any relation can be non-loss decomposed into an equivalent set of BCNF relations • BCNF 3NF 2NF 1NF • BCNF is still not guaranteed to be free of any update anomalies
Normalisation • the process of transforming a relation with redundancies into an “equivalent” set of relations that have less redundancies • “transformation” projection • input :: one relation, say R • output :: many relations, say R1, …, Rn • “equivalent” non-loss decomposition • R1 join R2 … join Rn = R • R1, …, Rn should have normal forms higher than or equal to that of R
Non-loss decomposition • (Patient, Symptom, Doctor, Office, Diagnosis) • semantic assumptions • exercise
Lossy Decomposition • (Patient, Symptom, Doctor, Office, Diagnosis) • semantic assumptions • exercise
Heath’s theorem • can be used as the basis for normalisation • theorem • suppose • R = (A, B, C), where A, B and C are disjoint sets of attributes • AB • then • R = (A, B) join (A, C) • state “in English”
Normalisation – rules of thumb • take as basis for normalisation/Heath’s theorem a “problem” FD • maximise B when applying Heath’s theorem, on the basis of AB • try to maintain a one-to-one correspondence with real life entities
Normalisation • steps • semantic assumptions • FDs • CKs • decomposition
Simple example • (M_id, M_name, Type, Value) • M_id M_name • M_id Type • M_id Value • Type Value • not BCNF • Heath’s theorem for Type Value results • (Type, Value) • (M_id, M_name, Type) • both relations are now in BCNF
Example (R) (project, task, max-budget, duration, payment-rate, contractor, contr-time) FDs: (project, task) max_budget, duration (task, max_budget, duration) payment_rate (project, task, contractor) contr_time (project, task, max-budget, duration, payment-rate, contractor, contr-time)
Example – decomposition for R • Heath’s theorem for R (the initial relation) based on • task, max_budget, duration payment_rate • leads to: • R1 (task, max_budget, duration, payment_rate) • R2 (project, task, max_budget, duration, contractor, contr_time) • R1 is in BCNF • R2 is not in BCNF, due to • project, task max_budget, duration
Example – decomposition for R2 • Heath’s theorem for R2, based on • project, task max_budget, duration • leads to • R21 (project, task, max_budget, duration) • R22 (project, task, contractor, contracted_time) • R21 is in BCNF • R22 is in BCNF
Example – solution • (task, max_budget, duration, payment_rate) • (project, task, max_budget, duration) • (project, task, contractor, contracted_time)
Decomposition – 2 or more solutions • in the normalisation process, it may be possible that a certain (non-loss) decomposition yields to a better solution than another one
Decomposition – 2 solutions – example • Modules(M_id, M_name, Type, Value) • solution #1 • Modules_Descr(M_id, M_name, Type) • Type_Val(Type, Val) • solution #2 • Modules_Descr(M_id, M_name, Type) • Module_Val(M_id, Val) • are they both non-loss? (apply Heath’s theorem) • is there one better than the other?
Solution #1 vs Solution #2 • updates • u1: insert the fact that a 3 semester module is worth 1.5cu • u2: modify 1 semester modules; they are not worth 0.5cu any longer, they are 0.75cu • u3: change the type of a module but forget to change its value • solution #2 • u1 and u2 are impossible or very difficult to perform • u3 is allowed • solution #1 • u1 and u2 are straightforward • u3 is not allowed
Solution #1 vs Solution #2 • solution #1 • more expressive • certain facts cannot be expressed in solution #2; e.g. the value of a new type • updates can be independently performed on the two component relations (i.e. all constraints are properly expressed) • in solution #2: Type Value is lost, so this constraint must be enforced by the user by procedural code • independent projections • updates can be performed independently on each projection, without the danger of ending with inconsistent data
M-id Type M_name M-id Type M_name Value M_id Value Type Independent projections M_name M-id Type Value Solution #1 Solution #2 one transitive : intra one direct : lost all direct : intra all transitive : inter
Independent projections - Risanen • R1 and R2 are two projections of R; R1 and R2 are independent if and only if • every FD in R is a logical consequence of the FDs in R1 and R2 • the common attributes of R1 and R2 for a candidate key for at least one of R1 or R2 • atomic relation • cannot be decomposed into independent projections
Dependency preservation • R was decomposed (normalisation) into R1, …, Rn • S - the set of FDs for R • S1, …, Sn - the set of FDs for R1, …, Rn (each Si refers to only the attributes of Ri) • S’ = S1 … Sn (usually, S’ S) • the decomposition is dependency preserving if S’+ = S+
Normalisation vs dependency preservation • there are cases when there is an either-or situation regarding the normalisation and the preserving of functional dependencies: • either the relation is normalised and some FDs are lost • or, some FDs are not lost (they are expressed in the original relation), but the relation is not in its higher normal form possible • in this case, no solution is better than the other • other criteria will have to be considered to judge better
Patient Doctor Disease Normalisation vs dependency preservation: Example • a patient is treated by a single doctor for a certain disease • each doctor only treats one kind of disease • a doctor can treat more than one patient • is this relation BCNF? • can you identify update anomalies? • consider also (Patient, Disease, Doctor, Treatment) • with Patient, Disease Treatment
Possible decompositions non-loss? (choose PKs) non-loss? (choose PKs) Heath’s theorem (choose PKs)
BCNF vs dependency preservation and do not enforce a FD existing in the original specification, namely: e.g. a patient can be given two doctors that treat the same disease (the system will not disallow this); the constraint would have to be maintained by procedural code
BCNF vs dependency preservation • not every FD is expressible through normalisation • when the relation was in its original form (3NF) • (Patient, Disease) Doctor was expressed • a doctor could not be assigned to more than one patient-disease • Doctor Disease was not expressed • generated update anomalies • in BCNF (decomposed) • Doctor Disease was expressed • (Patient, Disease) Doctor was not expressed • generated update anomalies (refer to previous slide) • this latter FD would not have been expressed even if the decomposition in all three 2-attribute relations had been considered
Conclusions • normal forms : formalisation of common sense • art engineering • possibility for automation; difficult, because of non-determinism (more than one choices at one step) • BCNF • always achievable • notalways free of update anomalies, because it cannot always express all the FDs existing in the problem • there are higher normal forms (4NF, 5NF) • defined on the basis of other concepts (not FDs)