450 likes | 841 Views
I am a pig. Then, you must have 4 legs!. Decomposition and Functional Dependency. Outline. Redundancy Decomposition at first glance Functional dependency Dependency properties. Why?. We have learnt that, an ER diagram can be directly converted to relational tables.
E N D
I am a pig. Then, you must have 4 legs! Decomposition andFunctional Dependency
Outline • Redundancy • Decomposition at first glance • Functional dependency • Dependency properties
Why? • We have learnt that, an ER diagram can be directly converted to relational tables. • However, these tables may contain redundancy, namely, repetition of the same information. • To eliminate redundancy (as much as possible), we need to refine the relational tables (known as normalization in db terminology).
An example of redundancy • Negative impacts of redundancy • Higher space consumption. • Higher update overhead • Imagine the operations we need to do, if we raise the hourly-wage of B1 to 120. • Insertion/update anomaly • The DBA must prevent insertion of tuples resulting in inconsistent hourly-wages, e.g., (7, B1, 200). • Solution?
An example of redundancy (cont.) rating id hourly-wages • In fact, the two tables could have been obtained directly, if we had designed a “perfect ER diagram”. • It is not realistic to assume that we can always discover the perfect ER diagram. • We need a tool to refine our design, even if we started from an imperfect diagram. have salary employee
Outline • Redundancy • Decomposition at first glance • Functional dependency • Dependency properties
Basic questions to ask • Do we need to decompose a relation? • What problems (if any) does a given decomposition cause? • Lossless-join • Dependency-preservation
Decomposition • Just now, we decomposed EMPLOYEE into two tables, to avoid redundancy. • Note that, the new tables can reproduce the original EMPLOYEE. • This is a rule we must obey:the decomposed tables must be able to reproduce the original table. Lossless Join!
Illegal decomposition • This decomposition violates the rule mentioned earlier. • How do we judge whether a decomposition is legal?
Another illegal decomposition Lossless Join? NO! • The judgment is not that easy!
legal decomposition • Checking the “legitimacy” of decomposition: • The new tables must have common attribute(s). • The common attribute(s) must be the candidate key of at least one new table.
Some confusing notions Better Definitions Later • Key? An abbreviation of candidate key. • Candidate key? A minimal set of attributes that uniquely identifies every tuple. • Primary key? A candidate key selected by a database designer • Superkey Any superset of a candidate key.
Decomposition may not be obvious • How to decompose the above table to minimize redundancy? • Before we can answer the question, we need to gain more understanding about redundancy.
Outline • Redundancy • Decomposition at first glance • Functional dependency • Dependency properties
Why does redundancy exist? • Reason: • ratingdetermineshourly-wages. • Once the tuple’s rating is known, its hourly-wages is also decided. • A concise representation: ratinghourly-wages.
Functional dependency • ratinghourly-wages • is called a functional dependence (FD). • Do we have “ratingid”? • If tuple’s rating is known, are we sure about its id? • No.
FD (cont.) • Do we have hourly-wagesrating? • Yes, different ratings havedifferent hourly-wages. • Namely, if we know a tuple’s hourly-wages, then its rating has only one possibility. • Do we have idrating? • Yes, because each employee has only a single rating.
FD (cont.) • Do we have idid? • Of course, known as a trivial FD. • Do we have id (rating, hourly-wages)? • Yes, because each employee has only a single (rating, hourly-wages)-combination. My id = 1 You are at scale B1 and earn $100 per hour.
FD (cont.) • Do we have (id, rating) hourly-wages? • In English, if a tuple’s (id, rating)-combination is decided, how many possibilities for hourly-wages? • In fact, once (id, rating) is decided, we know exactly which employee is concerned. • Therefore, hourly-wages has only one possibility. • So “(id, rating) hourly-wages” is true.
Functional dependency definition • Let L and R be two sets of attributes. • LR means that • if we know a tuple’s L, then there is only a singlepossibility for the tuple’s R! • I.e., if we know L, we know R. • ratinghourly-wages • hourly-wagesrating • idrating • idid • id (rating, hourly-wages) • (id, rating) hourly-wages L R!
Secret of redundancy • In general, a table has redundancy, if there is a FD, whose left handside is not a candidate key. • For example, the only candidate key of EMPLOYEE is id. • EMPLOYEE has redundancy, because we have ratinghourly-wages.
Where are FDs from? • Two channels. • First, common senses. • HK-idname. • countrycapital. • (father, mother) eldest-child. • … • Second, special constraints of the underlying application. • If every employee has her/his own office • emp-idoffice-number. • If every customer can have a single account • cust-idacc-id.
Outline • Redundancy • Decomposition at first glance • Functional dependency • Dependency properties
A candidate key determines all • For example, a candidatekey of EMPLOYEE is id. • Thus, id determines any combination of the attributes. • idid • idrating • idhourly-wages • idrating, hourly-wages • idid,rating, hourly-wages • If we know the tuple’s id, then its any attribute has only 1 possibility.
Super key determines all • A candidate key is id. • Then, (id, rating) determines any combination of the attributes. • (id, rating) id • (id, rating) rating • (id, rating) hourly-wages • (id, rating) rating, hourly-wages • (id, rating) id,rating, hourly-wages • We only need id to claim that the tuple’s any attribute has only 1 possibility. • So, of course, given its (id, rating), we can make the same claim.
Trivial functional dependences • “id id”is trivially true. • Put it in English, and you willfind out. • “If we know a tuple’s id = 1, then we know its id.” I have 4 legs. Guess how many legs I have. Don’t waste my time.
Trivial functional dependences (cont.) • L R is trivial, if L contains R. • Examples: • (id, rating)id • (id, rating)rating • (id, rating)(id, rating) • (id, hourly-wages)id • (id, rating, hourly-wages)(id, rating) • …
Inference rules for FDs • Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold • One example: id rating rating hourly-wages What can we derive from these two FDs? id hourly-wages • More rules to come in next page
Union • Given • cust-id cust-name (1) • cust-id cust-city (2) • we can derive • cust-id(cust-name, cust-city)(3) • Reasoning: • By (1), if we know cust-id, then we know cust-name. • By (2), if we know cust-id, then we know cust-city.. • Hence, if we know cust-id, then we know the (cust-name, cust-city)-combination.
Transitivity • Given • creditcard-nocust-id (1) • cust-idcust-name (2) • we can derive • creditcard-nocust-name (3) • Reasoning: • By (1), if we know creditcard-no, then we know cust-id. • By (2), if we know cust-id, then we know cust-name. • Hence, if we know creditcard-no, then we know cust-name.
Augmentation • Given • creditcard-no cust-id(1) • we can derive • (creditcard-no, branch-id) (cust-id, branch-id) (2) • Reasoning: • By (1), if we know creditcard-no40101342, we know cust-id1. • Hence, if (40101342, B1) = (creditcard-no, branch-id) of a tuple, we know that (1, B1) = (stu-id , branch-id) of the tuple
FD derivation • Given • creditcard-nocust-id (1) • (cust-id, branch-id)acc-id (2) • We can derive (creditcard-no, branch-id) acc-id as follows. • From(1), we have • (creditcard-no, branch-id) (cust-id,branch-id)(3) • Augementation • From (3) and (2), by transitivity, we have • (creditcard-no, branch-id) acc-id
Summary of Inference Rules • Let R be a relation schema, W, X, Y,Z be subsets of R. • Reflexivity • If Y ⊆ X, then X Y (trivial FD’s) • Augmentation • If X Y, then XZ YZ, for every Z • Transitivity • If X Y and Y Z, then X Z • Union (Combining) Rule • If X Y and X Z, then X YZ • Decomposition (Splitting) Rule • If X YZ, then X Y and X Z • Pseudo-transitivity Rule • If X Y and WY Z, then XW Z
Prove FDs • Consider R(A, B, C, D, E) with FDs F ={A→B, B →D, DE→C} • Prove or disprove F |= AE → C • {A→B, B →D, DE→C} • |= {A → D, DE→C} (Transitivity Rule: • If X Y and Y Z, then X Z) • |= {AE → C} (Pseudo-transitivity Rule • If X Y and WY Z, then XW Z)
Disprove FDs • Consider R(A, B, C, D, E) with FDs F ={A→B, B →D, DE→C} • Prove or disprove F |= A → C Find a counter example. ABCDE --------------------------- 24313 24614
Closure Test • A standard way to test if FDs hold is to compute the closure of Y, denoted Y+ • Note that Y + is a set of attributes, not FDs • Basis step:Y+ = Y. • Induction: • Look for an FD’s left side X that is a subset of the current Y+ • If the FD is X -> A, add A to Y+.
Prove FDs: Arevisit • Consider R(A, B, C, D, E) with FDs F ={A→B, B →D, DE→C} • Prove or disprove F |= AE → C AE+=AEBDC SinceC AE+, AE →C is implied by F
Disprove FDs: A revisit • Consider R(A, B, C, D, E) with FDs F ={A→B, B →D, DE→C} • Prove or disprove F |= A → C A+=ABD SinceC A+,A → CisnotimpliedbyF
Find candidate keys • Consider a relation with schema R(A, B, C, D) and FD = {AB → C, C → D, D → A}.What are all the candidate keys of R?What are all the superkeys of R that are not candidate keys? • Listallpossiblecombinationsofallattributesandcheck! • A+=A;B+=B;C+=CDA;D+=DA • AB+=ABCD;AC+=ACD;AD+=AD;BC+=BCDA;BD+=BDAC CD+=CDA • ABC+=ABCD;ABD+=ABCD;ACD+=ACD;BCD+=ABCD • ABCD+=ABCD
Can we do better? • Tricks for finding the key: • If an attribute neverappears on the RHS of any FD, it must be part of the key • If an attribute never appears on the LHS of any FD, but appears on the RHS of any FD, it must not be part of any key
Anexample Consider R = {A, B, C, D, E, F, G, H} with a set of FDs F = {CD→A, EC→H, GHB→AB, C→D, EG→A, H→B, BE→CD, EC→B} Find all the candidate keys of R
Anexample(cont.) F = {CD→A, EC→H, GHB→AB, C→D, EG→A, H→B, BE→CD, EC→B} • First, we notice that: • EFGnever appear on RHS of any FD. So, EFGmust be part of ANY key of R • Anever appears on LHS of any FD, but appears on RHS of some FD. So, Ais not part of ANY key of R • We now see if EFG is itself a key… • EFG+ = EFGA ≠ R; So, EFGalone is not key
Anexample(cont.) • Checking by adding single attribute with EFG(except A): • BEFG+ = ABCDEFGH = R; it’s a key [BE→CD, EG→A, EC→H] • CEFG+ = ABCDEFGH = R; it’s a key [EG→A, EC→H, H→B, BE→CD] • DEFG+ = ADEFG ≠ R; it’s not a key [EG→A] • EFGH+ = ABCDEFGH = R; it’s a key [EG→A, H→B, BE→CD] • If we add any further attribute(s), they will form the superkey. Therefore, we can stop here searching for candidate key(s). • Therefore, candidate keys are: {BEFG, CEFG, EFGH}