260 likes | 474 Views
Logical Database Design (1 of 3). John Ortiz. Introduction. The logical design is a process of refining DB schema based on functional dependencies. Why is a logical design needed?
E N D
Logical Database Design (1 of 3) John Ortiz
Introduction • The logical design is a process of refining DB schema based on functional dependencies. • Why is a logical design needed? • Relations translated from ER may have introduced data redundancy which leads to inconsistencies and update anomalies. • What is involved in logical design? • Check for normal forms • If necessary, decompose relations Logical Database Design (1)
Students A Database with Redundancy Assume all students of same major are assigned to the same academic advisor. • (FD): major Advisor, Office • What problems does it have? Logical Database Design (1)
Students Update Anomalies • Insertion anomaly: How to add a new major? • Modification anomaly: What would happen if we change office of Smith in the first tuple? • Deletion anomaly: What would happen if Scott is deleted? Logical Database Design (1)
Major_Advisor Students A Better Design • Decompose Students into two relations • Decomposition can remove redundancy • It may also cause problems if not done carefully. Logical Database Design (1)
Cardinality Ratio • The number of relationship instances that an entity can participate in. • There are three: • One-to-one (1:1) • One-to-Many (1:M) • Many-to-many (M:N) • Note: “1” may mean either zero or one and “many” may mean zero to the DBMS maximum Logical Database Design (1)
1:1 Relationships • A one-to-one relationship simply means that for any single instance of one entity, it may participate in zero or one instance in the related entity • E.g. Employee manages Department • An employee may or may not be a department manager, however, each department needs one manager • Employee has partial participation, generally a department would have total participation Logical Database Design (1)
1:1 Relationships • The foreign key should be placed in the entity with total participation • If both have total participation, the location of the foreign key is arbitrary • Additionally, if both participations are total, the two entities may be merged • Why would you not want to do this? • Access Restrictions • Special Attributes • Referenced by another entity • Performance in Distributed Databases Logical Database Design (1)
Tool Bag ID Color Emp_ID Job StartDate Address Tool Bag ID Emp_ID Salary Med Recs Emp_ID Flying Hours Storage Bin Tool Bag ID 1:1 Relationships Maint only! • Access Restrictions • Special Attributes • Referenced by another entity AIRLINE EMPLOYEES 1 Pilots only! 1 1 Confidential! Logical Database Design (1)
1:M Relationships • A one-to-many relationship means that for any single instance of one entity, it may participate in zero or more instances in the related entity, but the related entity may only participate in zero or one instance. • Subordinate to supervisor. Each subordinate generally will have one supervisor, but each supervisor usually may have zero or more employees. • Not absolute! Sometimes a subordinate may have >1 supervisors – consider this in design! Logical Database Design (1)
1:M Relationships • The foreign keyalways goes in the “many” part of the relationship! • This is counter-intuitive, be careful! • The FK must always reference a PK in the related entity or be null. • referential integrity • What results from placing the FK in the “one” part of the relationship? • Can part of a PK also be a FK? • That coupled with the “existence dependancy” indicates which type of entity? Logical Database Design (1)
1:M Relationship Design Thought • Consider the attribute “class” of a STUDENT entity. Given that all possible values are “Freshman”, “Sophomore”, “Junior”, “Senior”, or “Grad Student.” Is there any reason to place it in its own separate entity? Logical Database Design (1)
M:N Relationships • A many-to-many relationship means that for any single instance of one entity, it may participate in zero or more instances in the related entity, and the related entity may participate in zero or more instances as well. • Subordinate to supervisor. Consider allowing each subordinate to have zero or more supervisors. • Now where does the FK go? Logical Database Design (1)
M:N Relationships • Actually, there are two FKs, one for each entity. • These two keys are placed in their own entity. • Together they make up the PK of that entity. • In this case, all FKs must reference a valid PK in the related entity – why? • This entity may be called a link table, join table, junction table, or simply many-to-many relationship. Logical Database Design (1)
M:N Relationships • The link table may contain it’s own attributes. • For example, an evaluation by a supervisor. • That evaluation may NOT go in the subordinate entity, because there may be others from different supervisors. • It can not go in the supervisor entity because there may be more than one subordinate. • Only the subordinate/supervisor entities together can uniquely determine that evaluation. • Consider course grade vs. G.P.A. Logical Database Design (1)
Functional Dependencies (FDs) • Let R(A1, ..., An) be a relation schema. Let X and Y be two subsets of {A1, ..., An}, i.e., X, Y R. R satisfies a functional dependency, denoted by X Y, if in every legal instance r(R), for any pair of tuples t1 and t2 in r(R), ift1[X] = t2[X], then t1[Y] = t2[Y]. • If X Y, we say that “X (functionally) determines Y”. • X may also be called a determinant • FDs are about ALL states of a relation. Logical Database Design (1)
An Example of FDs • Consider a relation Student(SID, Name, Address, Phone, Major) Some FDs may be: S NM, NA P, … • Notations • Use single letter for attributes, i.e., S for SID, N for Name, A for Address, … • Use the relation name for the set of attributes of relation, i.e., Students for {SNAPM} Logical Database Design (1)
Properties of FDs • X Y says redundant X-values always cause the redundancy of Y-values. • FDs are given by DBAs or DB designers. • FDs are enforced/guaranteed by DBMS. • Given an instance r of a relation R, we can only determine that some FD is not satisfied by R, but can not determine if an FD is satisfied by R. Logical Database Design (1)
R Which FD is satisfied? • Which FD does R(A, B, C, D) satisfy, if the following instance is the only instance of R? A B, A C, C A, A D, B D, AB D Logical Database Design (1)
Closure of FDs • Given some FDs, new FDs can often be inferred. E.g., from S NM and M D, we can infer S D. Let F be a set of FDs satisfied by R, and X, Y, K be subsets of attributes of R. • F (logically) implies X Y if whenever R satisfies (all FDs in) F, R also satisfies X Y. • The closure of F, denoted by F+, is the set of FDs implied by F, that is, F+ = {X Y | F implies X Y}. Logical Database Design (1)
Candidate Key & Trivial FDs • Let F be a set of FDs satisfied by R, and K R. K is a candidate key (CK) of R if • K R is in F+ (i.e., F implies K R); and • there is no X K, such that X R is also in F+ (minimality). (no part of K R ) • An FD X Y is trivial if Y X. • Some trivial FDs are: AB B, A , . • Find the candidate keys and trivial FDs in the next example. What is ABC? Logical Database Design (1)
F+: An Example * Example: Let F = {AB C, C B} be a set of FDs satisfied by R(A, B, C). F+ = {A , A A, AB , AB A, AB B, AB C, AB AB, AB AC, AB BC, AB ABC, AC , AC A, AC B, AC C, AC AB, AC AC, AC BC, AC ABC, ABC , ABC A, ABC B, ABC C, ABC AB, ABC AC, ABC BC, ABC ABC, B , B B, BC , BC B, BC C, BC BC, C , C B, C C, C BC, } Logical Database Design (1)
Inference Rules for FDs Let F be a set of FDs satisfied by R, and X, Y, Z R. • Armstrong’s Axioms (1974) for deriving new FDs (IR1) Reflexivity: If X Y, then X Y is satisfied by R. (IR2) Augmentation: If X Y is satisfied by R, then XZ YZ is also satisfied by R. (IR3) Transitivity: If X Y and Y Z are satisfied by R, then so is X Z. Logical Database Design (1)
Compute FD Closure F+ • Let F be a set of FDs. To compute F+, start with FDs in F, repeatedly apply IR1-IR3, until no new FD can be derived. Theorem: Armstrong's Axioms are sound and complete. • Sound: no incorrect FD will be added to F+. • Complete: all FDs in F+ will be generated. Logical Database Design (1)
Additional Axioms • Additional rules derivable from Armstrong's Axioms. (IR4) Decomposition: { X YZ } { X Y, X Z } (IR5) Union: { X Y, X Z } X YZ (IR6) Pseudotransitivity: { X Y, WY Z } WX Z Logical Database Design (1)
Closure of Attributes How to determine if F implies X Y? • Method 1: Check if X Y is in F+. • Problem: F+ is too expensive to compute! • Method 2: Compute closure of X under F. X+ = { A |= X A F+ } • X+ is the set of attributes that are functionally determined by X under F. Theorem: X Y F+ if and only if Y X+. Proof: Use IR4 & IR5. Logical Database Design (1)