Functional Dependencies

Functional Dependencies Why FD's Meaning of FD’s Keys and Superkeys Inferring FD’s Source: slides by Jeffrey Ullman

Improving Relation Schemas • Set of relation schemas obtained by translating from an E/R diagram might need improvement • modeling with E/R diagrams is an art, not a science; relies on experience and intuition • multiple alternative designs are possible, how to choose? • Problems caused by redundant storage of info • wasted space • anomalies when updating, inserting or deleting tuples • Basic idea: replace a relation schema with a collection of "smaller" schemas. Called decomposition

Improving Relation Schemas • There is a theory for systematically guiding the improvement of relational designs, called normalization • Normalization uses the notion of "constraints" on the information • functional dependencies • multi-valued dependencies • referential integrity constraints

Functional Dependencies • X -> A is an assertion about a relation R that whenever two tuples of R agree on all the attributes of X, then they must also agree on the attribute A. • Say “X -> A holds in R.” • Convention: …, X, Y, Z represent sets of attributes; A, B, C,… represent single attributes. • Convention: no set formers in sets of attributes, just ABC, rather than {A,B,C }.

Example Consumers(name, addr, candiesLiked, manf, favCandy) • Reasonable FD’s to assert: • name -> addr • name -> favCandy • candiesLiked -> manf

Because name -> favCandy Because name -> addr Because candiesLiked -> manf Example Data name addr candiesLiked manf favCandy Janeway Voyager Twizzlers Hershey Smarties Janeway Voyager Smarties Nestle Smarties Spock Enterprise Twizzlers Hershey Twizzlers

FD’s With Multiple Attributes • No need for FD’s with > 1 attribute on right. • But sometimes convenient to combine FD’s as a shorthand. • Example: name -> addr and name -> favCandy become name -> addr favCandy • > 1 attribute on left may be essential. • Example: store candy -> price

Keys of Relations • K is a superkey for relation R if K functionally determines all of R. • K is a key for R if K is a superkey, but no proper subset of K is a superkey.

Example Consumers(name, addr, candiesLiked, manf, favCandy) • {name, candiesLiked} is a superkey because together these attributes determine all the other attributes. • name -> addr favCandy • candiesLiked -> manf

Example, Cont. • {name, candiesLiked} is a key because neither {name} nor {candiesLiked} is a superkey. • name doesn’t -> manf; candiesLiked doesn’t -> addr. • There are no other keys, but lots of superkeys. • Any superset of {name, candiesLiked}.

E/R and Relational Keys • Keys in E/R concern entities. • Keys in relations concern tuples. • Usually, one tuple corresponds to one entity, so the ideas are the same. • But --- in poor relational designs, one entity can become several tuples, so E/R keys and Relational keys are different.

Example Data name addr candiesLiked manf favCandy Janeway Voyager Twizzlers Hershey Smarties Janeway Voyager Smarties Nestle Smarties Spock Enterprise Twizzlers Hershey Twizzlers Relational key = {name candiesLiked} But in E/R, name is a key for Consumers, and candiesLiked is a key for Candies. Note: 2 tuples for Janeway entity and 2 tuples for Twizzlers entity.

Discovering Keys • Suppose schema was obtained from an E/R diagram. • If relation R came from an entity set E, then key for R is the keys of E • If R came from a binary relationship from E1 to E2: • many-many: key for R is the keys of E1 and E2 • many-one: key for R is the keys for E1 (but not the keys for E2) • one-one: key for R is the keys for E1; another key for R is the keys for E2

Key Example name name addr manf Con- sumers Likes Candies husband 2 1 Likes(consumer, candy) Favorite Buddies key: consumer candy Favorite(consumer, candy) wife Married key: consumer Married(husband, wife) Buddies(name1, name2) keys: husband or wife key: name1 name2

More FD’s From Application • Example: “no two courses can meet in the same room at the same time” tells us: hour room -> course. • Ultimately, FD's and keys come from the semantics of the application. • FD's and keys apply to the schema, not to specific instantiations of the relations

Manipulating FD's • Need to be able to reason about FD's in order to support the normalization process (improving relational schemas) • Some preliminaries: • can split FD X -> A B into X -> A, X -> B • can combine FD's X -> A, X -> B into FD X -> A B • Since A -> A is trivially true, no point in having any attribute on the RHS that is also on the LHS

Closure of a Set of FD's • If we have FD's A -> B and B -> C, then it is also true that A -> C. • Ex: If name -> address and address -> phone, then name -> phone. • What about a chain of such deductions? • Called closure

FD Closure Algorithm Input: a set of attributes {A1,…,An} and a set of FD's S • Z := {A1,…,An} • find an FD in S of the form X -> C such that all the attributes in X are in Z but C is not in Z. Add C to Z • repeat step 2 until there is nothing more that can be put in Z • return Z as the closure of {A1,…,An}

Example of Closure Algorithm • Given relation with attributes A, B, C, D, E, F and FD's • AB -> C • BC -> A, BC -> D • D -> E • CF -> B • Compute closure of {A,B}. • Answer: • Z := {A,B} • add C • add A and D • add E • final answer is Z = {A,B,C,D,E}

Use of Closure Algorithm • Now we can check if a particular FD A1 … An -> B follows from a set of FD's S: • compute {A1,…,An}+ using S • if B is in the closure, then the FD follows • otherwise it does not

Example • Given same set of attributes and FD's as in previous example: • Does AB -> D follow? • compute {A,B}+ = {A,B,C,D,E}. Since D is in {A,B}+, YES. • Does D -> A follow? • compute {D}+ = {D,E}. Since A is not in {D}+, NO.

Finding All Implied FD’s • Motivation: “normalization,” the process where we break a relation schema into two or more schemas. • Example: ABCD with FD’s AB ->C, C ->D, and D ->A. • Decompose into ABC, AD. What FD’s hold in ABC ? • Not only AB ->C, but also C ->A !

d1=d2 because C -> D a1b1cd1 a2b2cd2 comes from a1=a2 because D -> A Why? ABCD ABC a1b1c a2b2c Thus, tuples in the projection with equal C’s have equal A’s; C -> A.

Basic Idea • Start with given FD’s and find all nontrivial FD’s that follow from the given FD’s. • Nontrivial = left and right sides disjoint. • Restrict to those FD’s that involve only attributes of the projected schema.

Simple, Exponential Algorithm • For each set of attributes X, compute X+. • Add X ->A for all A in X + - X. • However, drop XY ->A whenever we discover X ->A. • Because XY ->A follows from X ->Ain any projection. • Finally, use only FD’s involving projected attributes.

A Few Tricks • No need to compute the closure of the empty set or of the set of all attributes. • If we find X + = all attributes, so is the closure of any superset of X.

Example • ABC with FD’s A ->B and B ->C. Project onto AC. • A +=ABC ; yields A ->B, A ->C. • We do not need to compute AB + or AC +. • B +=BC ; yields B ->C. • C +=C ; yields nothing. • BC +=BC ; yields nothing.

Example --- Continued • Resulting FD’s: A ->B, A ->C, and B ->C. • Projection onto AC : A ->C. • Only FD that involves a subset of {A,C }.

A Geometric View of FD’s • Imagine the set of all instances of a particular relation. • That is, all finite sets of tuples that have the proper number of components. • Each instance is a point in this space.

Example: R(A,B) {(1,2), (3,4)} {(5,1)} {} {(1,2), (3,4), (1,3)}

An FD is a Subset of Instances • For each FD X ->A there is a subset of all instances that satisfy the FD. • We can represent an FD by a region in the space. • Trivial FD = an FD that is represented by the entire space. • Example: A -> A.

Example: A -> B for R(A,B) A -> B {(1,2), (3,4)} {(5,1)} {} {(1,2), (3,4), (1,3)}

Representing Sets of FD’s • If each FD is a set of relation instances, then a collection of FD’s corresponds to the intersection of those sets. • Intersection = all instances that satisfy all of the FD’s.

Instances satisfying A->B, B->C, and CD->A Example A->B B->C CD->A

Implication of FD’s • If an FD Y -> B follows from FD’s X1 -> A1,…,Xn -> An, then the region in the space of instances for Y -> B must include the intersection of the regions for the FD’s Xi-> Ai. • That is, every instance satisfying all the FD’s Xi-> Aisurely satisfies Y -> B. • But an instance could satisfy Y -> B, yet not be in this intersection.

Example B->C A->C A->B

Functional Dependencies