Decomposition and Functional Dependency

I am a pig. Then, you must have 4 legs! Decomposition andFunctional Dependency

Outline • Redundancy • Decomposition at first glance • Functional dependency • Dependency properties

Why? • We have learnt that, an ER diagram can be directly converted to relational tables. • However, these tables may contain redundancy, namely, repetition of the same information. • To eliminate redundancy (as much as possible), we need to refine the relational tables (known as normalization in db terminology).

An example of redundancy

An example of redundancy • Negative impacts of redundancy • Higher space consumption. • Higher update overhead • Imagine the operations we need to do, if we raise the hourly-wage of B1 to 120. • Insertion/update anomaly • The DBA must prevent insertion of tuples resulting in inconsistent hourly-wages, e.g., (7, B1, 200). • Solution?

An example of redundancy (cont.)

An example of redundancy (cont.) rating id hourly-wages • In fact, the two tables could have been obtained directly, if we had designed a “perfect ER diagram”. • It is not realistic to assume that we can always discover the perfect ER diagram. • We need a tool to refine our design, even if we started from an imperfect diagram. have salary employee

Basic questions to ask • Do we need to decompose a relation? • What problems (if any) does a given decomposition cause? • Lossless-join • Dependency-preservation

Decomposition • Just now, we decomposed EMPLOYEE into two tables, to avoid redundancy. • Note that, the new tables can reproduce the original EMPLOYEE. • This is a rule we must obey:the decomposed tables must be able to reproduce the original table. Lossless Join!

Illegal decomposition • This decomposition violates the rule mentioned earlier. • How do we judge whether a decomposition is legal?

Another illegal decomposition Lossless Join? NO! • The judgment is not that easy!

legal decomposition • Checking the “legitimacy” of decomposition: • The new tables must have common attribute(s). • The common attribute(s) must be the candidate key of at least one new table.

Some confusing notions Better Definitions Later • Key? An abbreviation of candidate key. • Candidate key? A minimal set of attributes that uniquely identifies every tuple. • Primary key? A candidate key selected by a database designer • Superkey Any superset of a candidate key.

Decomposition may not be obvious • How to decompose the above table to minimize redundancy? • Before we can answer the question, we need to gain more understanding about redundancy.

Why does redundancy exist? • Reason: • ratingdetermineshourly-wages. • Once the tuple’s rating is known, its hourly-wages is also decided. • A concise representation: ratinghourly-wages.

Functional dependency • ratinghourly-wages • is called a functional dependence (FD). • Do we have “ratingid”? • If tuple’s rating is known, are we sure about its id? • No.

FD (cont.) • Do we have hourly-wagesrating? • Yes, different ratings havedifferent hourly-wages. • Namely, if we know a tuple’s hourly-wages, then its rating has only one possibility. • Do we have idrating? • Yes, because each employee has only a single rating.

FD (cont.) • Do we have idid? • Of course, known as a trivial FD. • Do we have id (rating, hourly-wages)? • Yes, because each employee has only a single (rating, hourly-wages)-combination. My id = 1 You are at scale B1 and earn $100 per hour.

FD (cont.) • Do we have (id, rating) hourly-wages? • In English, if a tuple’s (id, rating)-combination is decided, how many possibilities for hourly-wages? • In fact, once (id, rating) is decided, we know exactly which employee is concerned. • Therefore, hourly-wages has only one possibility. • So “(id, rating) hourly-wages” is true.

Functional dependency definition • Let L and R be two sets of attributes. • LR means that • if we know a tuple’s L, then there is only a singlepossibility for the tuple’s R! • I.e., if we know L, we know R. • ratinghourly-wages • hourly-wagesrating • idrating • idid • id (rating, hourly-wages) • (id, rating) hourly-wages L R!

Secret of redundancy • In general, a table has redundancy, if there is a FD, whose left handside is not a candidate key. • For example, the only candidate key of EMPLOYEE is id. • EMPLOYEE has redundancy, because we have ratinghourly-wages.

Where are FDs from? • Two channels. • First, common senses. • HK-idname. • countrycapital. • (father, mother) eldest-child. • … • Second, special constraints of the underlying application. • If every employee has her/his own office • emp-idoffice-number. • If every customer can have a single account • cust-idacc-id.

A candidate key determines all • For example, a candidatekey of EMPLOYEE is id. • Thus, id determines any combination of the attributes. • idid • idrating • idhourly-wages • idrating, hourly-wages • idid,rating, hourly-wages • If we know the tuple’s id, then its any attribute has only 1 possibility.

Super key determines all • A candidate key is id. • Then, (id, rating) determines any combination of the attributes. • (id, rating) id • (id, rating) rating • (id, rating) hourly-wages • (id, rating) rating, hourly-wages • (id, rating) id,rating, hourly-wages • We only need id to claim that the tuple’s any attribute has only 1 possibility. • So, of course, given its (id, rating), we can make the same claim.

Trivial functional dependences • “id id”is trivially true. • Put it in English, and you willfind out. • “If we know a tuple’s id = 1, then we know its id.” I have 4 legs. Guess how many legs I have. Don’t waste my time.

Trivial functional dependences (cont.) • L R is trivial, if L contains R. • Examples: • (id, rating)id • (id, rating)rating • (id, rating)(id, rating) • (id, hourly-wages)id • (id, rating, hourly-wages)(id, rating) • …

Inference rules for FDs • Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold • One example: id rating rating hourly-wages What can we derive from these two FDs? id hourly-wages • More rules to come in next page

Union • Given • cust-id cust-name (1) • cust-id cust-city (2) • we can derive • cust-id(cust-name, cust-city)(3) • Reasoning: • By (1), if we know cust-id, then we know cust-name. • By (2), if we know cust-id, then we know cust-city.. • Hence, if we know cust-id, then we know the (cust-name, cust-city)-combination.

Transitivity • Given • creditcard-nocust-id (1) • cust-idcust-name (2) • we can derive • creditcard-nocust-name (3) • Reasoning: • By (1), if we know creditcard-no, then we know cust-id. • By (2), if we know cust-id, then we know cust-name. • Hence, if we know creditcard-no, then we know cust-name.

Augmentation • Given • creditcard-no cust-id(1) • we can derive • (creditcard-no, branch-id)  (cust-id, branch-id) (2) • Reasoning: • By (1), if we know creditcard-no40101342, we know cust-id1. • Hence, if (40101342, B1) = (creditcard-no, branch-id) of a tuple, we know that (1, B1) = (stu-id , branch-id) of the tuple

FD derivation • Given • creditcard-nocust-id (1) • (cust-id, branch-id)acc-id (2) • We can derive (creditcard-no, branch-id) acc-id as follows. • From(1), we have • (creditcard-no, branch-id)  (cust-id,branch-id)(3) • Augementation • From (3) and (2), by transitivity, we have • (creditcard-no, branch-id)  acc-id

Summary of Inference Rules • Let R be a relation schema, W, X, Y,Z be subsets of R. • Reflexivity • If Y ⊆ X, then X Y (trivial FD’s) • Augmentation • If X Y, then XZ YZ, for every Z • Transitivity • If X Y and Y Z, then X Z • Union (Combining) Rule • If X Y and X Z, then X YZ • Decomposition (Splitting) Rule • If X YZ, then X Y and X  Z • Pseudo-transitivity Rule • If X Y and WY Z, then XW Z

Prove FDs • Consider R(A, B, C, D, E) with FDs F ={A→B, B →D, DE→C} • Prove or disprove F |= AE → C • {A→B, B →D, DE→C} • |= {A → D, DE→C} (Transitivity Rule: • If X Y and Y Z, then X Z) • |= {AE → C} (Pseudo-transitivity Rule • If X Y and WY Z, then XW Z)

Disprove FDs • Consider R(A, B, C, D, E) with FDs F ={A→B, B →D, DE→C} • Prove or disprove F |= A → C Find a counter example. ABCDE --------------------------- 24313 24614

Closure Test • A standard way to test if FDs hold is to compute the closure of Y, denoted Y+ • Note that Y + is a set of attributes, not FDs • Basis step:Y+ = Y. • Induction: • Look for an FD’s left side X that is a subset of the current Y+ • If the FD is X -> A, add A to Y+.

Prove FDs： Arevisit • Consider R(A, B, C, D, E) with FDs F ={A→B, B →D, DE→C} • Prove or disprove F |= AE → C AE+=AEBDC SinceC  AE+, AE →C is implied by F

Disprove FDs： A revisit • Consider R(A, B, C, D, E) with FDs F ={A→B, B →D, DE→C} • Prove or disprove F |= A → C A+=ABD SinceC A+,A → CisnotimpliedbyF

Find candidate keys • Consider a relation with schema R(A, B, C, D) and FD = {AB → C, C → D, D → A}.What are all the candidate keys of R?What are all the superkeys of R that are not candidate keys? • Listallpossiblecombinationsofallattributesandcheck! • A+=A;B+=B;C+=CDA;D+=DA • AB+=ABCD;AC+=ACD;AD+=AD;BC+=BCDA;BD+=BDAC CD+=CDA • ABC+=ABCD;ABD+=ABCD;ACD+=ACD;BCD+=ABCD • ABCD+=ABCD

Can we do better? • Tricks for finding the key: • If an attribute neverappears on the RHS of any FD, it must be part of the key • If an attribute never appears on the LHS of any FD, but appears on the RHS of any FD, it must not be part of any key

Anexample Consider R = {A, B, C, D, E, F, G, H} with a set of FDs F = {CD→A, EC→H, GHB→AB, C→D, EG→A, H→B, BE→CD, EC→B} Find all the candidate keys of R

Anexample(cont.) F = {CD→A, EC→H, GHB→AB, C→D, EG→A, H→B, BE→CD, EC→B} • First, we notice that: • EFGnever appear on RHS of any FD. So, EFGmust be part of ANY key of R • Anever appears on LHS of any FD, but appears on RHS of some FD. So, Ais not part of ANY key of R • We now see if EFG is itself a key… • EFG+ = EFGA ≠ R; So, EFGalone is not key

Anexample(cont.) • Checking by adding single attribute with EFG(except A): • BEFG+ = ABCDEFGH = R; it’s a key [BE→CD, EG→A, EC→H] • CEFG+ = ABCDEFGH = R; it’s a key [EG→A, EC→H, H→B, BE→CD] • DEFG+ = ADEFG ≠ R; it’s not a key [EG→A] • EFGH+ = ABCDEFGH = R; it’s a key [EG→A, H→B, BE→CD] • If we add any further attribute(s), they will form the superkey. Therefore, we can stop here searching for candidate key(s). • Therefore, candidate keys are: {BEFG, CEFG, EFGH}

Decomposition and Functional Dependency

Decomposition and Functional Dependency

Presentation Transcript

Tutorial 4: Functional Dependency

Functional Analysis and Physical Decomposition

Dependency Preserving Decomposition

Keys and Functional Dependency

Functional Dependency

Functional Dependency

Functional Decomposition

Functional Dependency

Functional Decomposition

Functional Dependency Grammars

Chapter 4 Functional Dependency and Normalization

Tutorial 7 : Functional Dependency

Introduction to Functional Decomposition

Analysis SW– Functional Decomposition

Functional Dependency

Functional Dependency and Normalization

Functional Decomposition

Functional Dependency and Normalization

More Functional Decomposition

More Functional Decomposition

Introduction to Functional Decomposition