1.36k likes | 1.65k Views
PART 2. RELATIONAL DATABASES. Chapter 7 Relational-Database Design normalization of relational schemas. Main Contents in This Chapter. VII-1 Why normalization needed? pitfalls in relational-database design (§7.1 ) principles of relation normalization
E N D
PART 2 RELATIONAL DATABASES
Chapter 7 Relational-Database Designnormalization of relational schemas
Main Contents in This Chapter • VII-1 Why normalization needed? • pitfalls in relational-database design (§7.1 ) • principles of relation normalization • VII- 2 Functional dependency (§7.4) • VII- 3 Definitions of normal forms • the first normal form(§7.2 ), second normal form, third normal form(§7.3.4 ), BCNF(§7.3.2 ) • VII- 4 Decomposition properties (§7.4.4, §7.4.5) • VII- 5 3NF decomposition (§7.5.2) • VII- 6 BCNF decomposition (§7.5.1) Database System Concepts - Chapter 7 Relational-Database Design -
VII-1 Why Normalization Needed ? VII-1-1 Pitfalls in Relational-Database design (§7.1) • As shown in Fig. 7.0.1, logical DBS design consists of • initial relational schema generating (§2.9) • relational schema normalizing • A bad DB design, i.e. schema not being normalized well, may result in • repetition of information • inability to represent certain information Database System Concepts - Chapter 7 Relational-Database Design -
Application area/problem in real world requirements analysis Specification of functional requirements DBMS independent conceptual DBS design Conceptual DBS schema , i.e. E-R schema Initial relational schema generating(§2.9) DBMS dependent logical DBS design Relational schema normalizing (chapter 7) logical DBS schema , i.e. relational data schema physical DBS design physical DBS schema about physical storage structure and access method Fig.7.0.1 DBS design
VII-1-1 Pitfalls in Relational-Database design (cont.) Fig.7.0.2 Case study used in chapter 7 Database System Concepts - Chapter 7 Relational-Database Design - return
VII-1-1 Pitfalls in Relational-Database design (cont.) • For all schema concerning loans in Fig.7.0.2 • Branch-schema, Borrower-schema, Loan-schema • combine these three relations into one single relation • Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount) • relation lending is shown in Fig.7.1 Database System Concepts - Chapter 7 Relational-Database Design -
t1 Perryridge, Horseneck, 1700000, Adams, L-31, 1500 New-York, Long-island, 200000, null, null, null t2 Fig.7.1 Sample lending(Lending-schema) relation
VII-1-1 Pitfalls in Relational-Database design (cont.) • Inserting problem and information redundancy • adding a new loan to the DB • the loan is made by the Perryridge branch to Adams in the amount of $1500, and loan-number is L-31 • tuple t1 = (Perryridge, Horseneck, 1700000, Adams, L-31, 1500) is inserted into DB • data for branch-name, branch-city, assets are repeated for each loan in the 3rd row, the10th row, and the last row that a branch“Perryridge”makes, space is wasted Database System Concepts - Chapter 7 Relational-Database Design -
VII-1-1 Pitfalls in Relational-Database design (cont.) • Deleting problem • e.g. the branch Perryridge is canceled, all loans in this branch should then be removed • in Fig 7.1, every tuples containing Perryridge branch should be deleted • the 3rd row and the10th row Database System Concepts - Chapter 7 Relational-Database Design -
VII-1-1 Pitfalls in Relational-Database design (cont.) • Updating problem and information redundancy • information redundancy in DB complicates updating, introducing possibility of inconsistency of assets value • e.g. changeassets of Perryridge branch from 1700000 to 1900000, • in Fig7.1, every tuples belonging to Perryridgebranch should be updated Database System Concepts - Chapter 7 Relational-Database Design -
VII-1-1 Pitfalls in Relational-Database design (cont.) • Information Representation Problem • to represent directly information about a new-openedbranch (branch-name, branch-city, assets) in which there no exists loan, a tuple containing null vaules, such as t2 = (New-York, Long-island, 200000, null, null, null) , is inserted into the relation lending • null values in DB complicate data handling in DBS • Why ? • lending(branch-name, branch-city, assets, customer-name, loan-number, amount) includes two types of information about only bank and information about loan Database System Concepts - Chapter 7 Relational-Database Design -
VII-1-1Pitfalls in Relational-Database design (cont.) • in E-R diagram of the banking enterprise, the relationship from bank to loan is one-to-many, i.e. a bank branch may make several loans • An improved design: • decompose the relation schema Lending-schema into: Branch-schema = (branch-name, branch-city,assets) Loan-info-schema = (customer-name, loan-number, branch-name, amount) Database System Concepts - Chapter 7 Relational-Database Design -
VII-1-2 Principles of Relation Normalization • 在逻辑DBS设计过程中,将概念 DB设计结果E—R图进行转换,得到面向特定应用领域的初始关系模式集 • 这些初始关系模式集中可能存在多种(作为完整性约束的)关系模式属性间的数据依赖 (Data Dependencies) 关系 • 函数依赖 (functional dependencies, FD, §7.4) • 多值依赖 (Multivalued Dependencies, MVD, §7.6, Appendix C/C.1 ) • 连接依赖 (Join Dependencies, JD, Appendix C/C.2) Database System Concepts - Chapter 7 Relational-Database Design -
VII-1-2 Principles of Relation Normalization (cont.) • 如果直接根据初始关系模式构造DBS,由于初始关系模式中数据依赖关系的存在,可能会违反DB的完整性约束,导致DBS使用过程中出现如下问题,影响DBS的正确性、性能、效率 • 数据冗余问题、插入问题、更新问题、删除问题(pitfalls,§7.1) Database System Concepts - Chapter 7 Relational-Database Design -
VII-1-2 Principles of Relation Normalization (cont.) • 因此,对初始关系模式集,需要根据关系规范化理论,在保证关系模式的 • 函数无损连接性(lossless join),和/或 • 函数依赖保持性 (dependency preservation) 约束前提下,对关系模式集进行规范化处理——等价变换/模式分解 • 关系模式规范化主要步骤为 • 根据函数依赖的Armstrong’s 公理系统 ( §7.4.1 )和多值依赖 的公理系统 (Appendix C/C.1),从初始关系模式集中已知的函数依赖和多值依赖出发,推导出初始关系模式集中所有的函数依赖( §7.4.1/7.4.2/7.4.3)和多值依赖 Database System Concepts - Chapter 7 Relational-Database Design -
VII-1-2 Principles of Relation Normalization (cont.) • 对具有函数依赖和多值依赖的初始关系模式集,采用 • 模式分解算法 , , 对其进行(等价)分解和变换,将其转换为各种范式形式,包括: • 1NF(§7.2) 、 2NF (Exercise 7.26) 、 BCNF(§7.3.2 ) 、 3NF (§7.3.4) 、 4NF(§7.6.2) , 以消除模式集中的函数依赖和多值依赖带来的负面影响, 保证数据库系统的完整性 • 关系模式规范化处理的基本要求为: • 静态关系具有第一范式形式 • (理论上)动态关系最好具有第三范式形式 Database System Concepts - Chapter 7 Relational-Database Design -
VII-1-2 Principles of Relation Normalization (cont.) • 3种数据依赖间的关系 • 函数依赖是特殊的多值依赖 • 多值依赖又是连接依赖的特例 • 范式1NF、2NF、3NF、BCNF可以看作由符合范式要求的各种关系模式组成的关系模式的集合 e.g. 1NF = { R | R满足第一范式的定义} • 各种范式间的关系,参见Fig.7.A.1 1NF 2NF 3NF BCNF 4NF 5NF Database System Concepts - Chapter 7 Relational-Database Design -
1NF 消除非主属性对键的部分函数依赖 2NF 函数依赖保持性 消除非主属性对键的传递函数依赖 3NF 消除主属性对键的部分和传递函数依赖 BCNF 消除非平凡且非函数依赖的多值依赖 4NF 无损连接性 消除非平凡连接依赖 5NF Fig.7.0.3 关系范式间相互关系
VII-1-2 Principles of Relation Normalization (cont.) • 给定一个关系模式,可以采用规范化算法将其转换为1NF、2NF、3NF、BCNF • 对连接依赖和第五范式,无相应的模式规范化算法 Database System Concepts - Chapter 7 Relational-Database Design -
VII-2 Functional Dependencies (§7.3/7.4 ) • Function: f: X →Y, x∈ X, y∈ Y, y = f (x) e.g. y = 2x • for x1, x2∈X, if x1= x2, then f (x1) = f (x2) • Contents in section VII-2 • concepts about FD, §7.3.1 • Armstrong Axioms to derive all implied FD, i.e. closure of FD, §7.4.1 • an efficient algorithm to compute the closure§7.4.1 • “minimal” closure of FD, §7.4.3 Database System Concepts - Chapter 7 Relational-Database Design -
VII-2-1 Basic Concepts • Definition1. Functional dependency(FD) holds on R /*函数依赖FD在关系模式R上成立/保持 */ Let R be a relation schema, and R , R , thefunctional dependency holds onschemaR if and only if for anylegalrelationsr(R), whenever any two tuplestiand tj in ragree on the attributes , they also agree on the attributes , that is, ti[] = tj [] ti[ ] = tj [ ] Database System Concepts - Chapter 7 Relational-Database Design -
VII-2-1 Basic Concepts (cont.) • Keys in relational schema can be defined in terms of FD • K is a superkey for relation schema R, if and only if K R • K is a candidate key for R, if and only if • K R, and • for no K, R Database System Concepts - Chapter 7 Relational-Database Design -
VII-2-1 Basic Concepts (cont.) • Definition2. (A particular) r(R) satisfy FD, or FD is satisfied by r(R) /*关系r(R)满足函数依赖集FD, FD 被r满足*/ Given FD ={ } holding on R, and a relation r(R) on R, • if r is legal under functional dependency set FD, r is said to satisfy FD • note: r is legal under FD={ } means • for ti, tj ∈r(R), if ti[]= tj [], then ti[]= tj [] • FD requires that the values for a certain set of attributes determines uniquely the value for another set of attributes Database System Concepts - Chapter 7 Relational-Database Design -
A B C 1 4 2 3 5 6 3 4 6 7 3 8 9 1 0 Example One • Given relation r(R) shown below, which FD is satisfied by r • A. A →B B. AC → B C. BC →A D. B → C Fig. 7.0.4 t1 t2 t3 t4 t5 Database System Concepts - Chapter 7 Relational-Database Design -
Example One (cont.) • t2[A]=t3[A]=3, t2[B]=5 ≠ t3[B]=4, A →B is not satisfied • t2[AC]=t3[AC]=36, t2[B]=5 ≠ t3[B]=4, AC →B is not satisfied • BC →A is satisfied • t1[B]=t3[B]=4, t1[C]=2 ≠ t3[C]=6, B →C is not satisfied Database System Concepts - Chapter 7 Relational-Database Design -
Example Two • Consider the schema R=(employee_ID, date, turnover per-day, department_name, manager) that describes the information about the turnover per-day (日营业额) for each employee everyday, the department that each employee works at, and the manager of the department the employee works at. it is assumed that • at every day, each employee has only one turnover per-day • each employee works at only one department • each department is managed by only one manager • According to the descriptions mentioned above, list all the functional dependencies that hold on R Database System Concepts - Chapter 7 Relational-Database Design -
Example Two (cont.) • Answer • F = { employee_ID, date → turnover per-day, employee_ID →department_name, department_name →manager } Database System Concepts - Chapter 7 Relational-Database Design -
FD holds on R vs FD is satisfied by r(R) • For a schema R, there may be more than one relation instance r(R), i.e. r1(R) , r2(R) , r3(R) ,…, rm(R) , defined on R • e.g. consider R= (A, B, C, D) , and with respect to the instances r1(R) and r2(R) in Fig.7.5 • Relation r(R) satisfies vs. holds on schema R • if holds on R, then every legalr(R) satisfies this R • but for schema R and, if only some ri(R) satisfies R, may not hold on R. • e.g. in Fig.7.5, AC and ABD are satisfied by r1(R) , but AC is not satisfied by r2(R) , so AC does not holds on R Database System Concepts - Chapter 7 Relational-Database Design -
FD1= {AC, ABD} , satisfied by r1 FD2= {ABD} , satisfied by r2 r1 r2 t2 t2 b3 b3 t6 AC does not hold on R Fig.7.5 instance r1 and r2 defined on schema R
VII-2-1 Basic Concepts (cont.) • E.g. True or false ? • for a relation r(R) defined on schema R, if r satisfies functional dependency FD ={ }, then FD holds on schema R • answer: false • With respect to Fig. 2.4 • in general, customer-street customer-city does not hold on customer schema, because two different cities may have the same street • but customer-street customer-city is satisfied by relation instancecustomer as shown in Fig.2.4 Database System Concepts - Chapter 7 Relational-Database Design -
VII-2-1 Basic Concepts (cont.) • Functional dependencies allow us to express constraints that cannot be expressed using superkeys • E.g. consider the schema in Fig.7.2: • bor_loan= (customer-id, loan-number, amount). • with respect to the primary key, we have • customer-id, loan-number amount • but the following FD also holds on bor_loan loan-numbercustomer-name Database System Concepts - Chapter 7 Relational-Database Design -
VII-2-1 Basic Concepts (cont.) • Def. Trivial FD (平凡依赖) A functional dependency is trivial , if for any schema R, where R, R, is satisfied by all relations/instances r on R or: on R is trivial if it holds on any schema R, where R, R • E.g. A A is trivial, customer-name customer-name • In general, is trivial if • e.g. customer-name, loan-number customer-name ift1[customer-name, loan-number ]= t2[customer-name, loan-number ] then t1[customer-name]= t2[customer-name] Database System Concepts - Chapter 7 Relational-Database Design -
a functional dependency is transitive if: ,( ), , ; and is called transitive dependent on VII-2-1 Basic Concepts (cont.) • Def. Transitive dependency (传递函数依赖) • E.g. Student(sno, sname, address, depart) • for transitive dependency sno→address , there are snosname, sname address Database System Concepts - Chapter 7 Relational-Database Design -
VII-2-1 Basic Concepts (cont.) • Def.Partial dependency (部分函数依赖) • a functional dependency is partial if there is a proper subset of , i.e. , such that ; and is partially dependent on • /*非最小化/冗余 • E.g. Student(sno, sname, address, depart) • for partial dependency (sno, sname)→address , there are sname→address, sno→address Database System Concepts - Chapter 7 Relational-Database Design -
VII-2-2 Closure of a Set of Functional Dependencies (cont.) (§7.4.1) • Given a set F set of functional dependencies, there may be some other functional dependencies that are logically implied by F • for example, R(A, B, C) if F ={A B, B C}, then it is inferred that A C • e.g. Fig. 7.0.5 in next slide • For a given FD set F on R, how to derive all the other functional dependencies on R ? • closures of functional dependencies Database System Concepts - Chapter 7 Relational-Database Design -
A B C 1 4 2 3 5 6 4 4 2 7 3 8 9 1 0 F ={A B , B C }; A C is logically implied by F t1 t2 t3 t4 t5 Fig. 7.0.5
VII-2-2 Closure of a Set of Functional Dependencies (cont.) • Def. Given a schema R, a functional dependency f on R is logically implied by a set of FD F on R , if every instance r(R) that satisfies F also satisfies f • e.g. Fig. 7.0.5 • Def. Given a set F of functional dependencies, the closure of F, denoted asF+ • F+ = { f | f is logically implied by F } • e.g. in Fig. 7.0.5, {A B , B C }+ = {A B , B C, A C } Database System Concepts - Chapter 7 Relational-Database Design -
VII-2-2 Closure of a Set of Functional Dependencies (cont.) • F+ can be derived on the basis of Armstrong’s axioms • Three basic Armstrong’s Axioms e.g. • if , then • reflexivity, 自反律 • if , then • augmentation, 增广律 • if , and , then • transitivity,传递律 • For the proof of reflexivity and augmentation, refer to Appendix A Database System Concepts - Chapter 7 Relational-Database Design -
VII-2-2 Closure of a Set of Functional Dependencies (cont.) • Other three additive Armstrong’s axioms • union: if holds and holds, then holds • decomposition: if holds, then holds and holds • pseudotransitivity: if holds and δ holds, then δ holds • These three additive rules can be derived from the three basic Armstrong’s Axioms mentioned above Database System Concepts - Chapter 7 Relational-Database Design -
An Example of Functional Dependency • Question Which rule about functional dependencies shown below is right • A. if then • B if AC , BCD then ABD • C. if ABC then BC • D if , then • Answer: B Database System Concepts - Chapter 7 Relational-Database Design -
An Example - Closure of Functional Dependency • R = (A, B, C, G, H, I) F = { A BA CCG HCG IB H } • Some members of F+ • A H • by transitivity from A B andB H Database System Concepts - Chapter 7 Relational-Database Design -
An Example - Closure of Functional Dependency (cont.) • AG I • by augmentingA C with G, to get AG CG • then by transitivity with AG CG andCG I • CG HI • by union with CG H and CG I Database System Concepts - Chapter 7 Relational-Database Design -
F+ = Frepeatfor each functional dependency f in F+ apply reflexivity and augmentation rules on fadd the resulting functional dependencies to F+for each pair of functional dependencies f1and f2 in F+iff1 and f2 can be combined using transitivitythen add the resulting functional dependency to F+until F+ does not change any further VII-2-2 Closure of a Set of Functional Dependencies (cont.) Fig.7.8 Algorithm to compute F+ Database System Concepts - Chapter 7 Relational-Database Design -
VII-2-3 Closure of Attributes (§7.4.2) • Def. An attribute B is functionally determined by if B • Def. Given a set of attributes , the closure of under F , denoted by +, is { | is functionally determined by under F} • is functionally determined byunder F , if and only if ∈F+ • +is in F+ • E.g. R(A, B, C, D) and F={A B, B C} • F+= {A B, B C, A C, …} • (A) += (ABC), A ABC ∈F+ Database System Concepts - Chapter 7 Relational-Database Design -
VII-2-3 Closure of Attributes (cont.) Input: , F Output: + result := ;while (changes to result) do for each inF do begin if result /* result =(, …) then result := result end Fig.7.9 An efficient algorithm to compute + underF Database System Concepts - Chapter 7 Relational-Database Design -
An Example • R = (A, B, C, G, H, I) • F = {A B, A C, CG H, CG IB H } • Computing (AG)+ 1. result = AG /* or denoted as {A, G } 2. result = ABCG /* A C , A B 3. result = ABCGH /* CG H 4. result = ABCGHI /* CG I /* or { A, B, C, G, H, I } Database System Concepts - Chapter 7 Relational-Database Design -
An Example(cont.) • (AG)+ = R, AG is a superkey of R • Is AG a candidate key? • step1. is AG a super key? • does AG R? == Is (AG)+ = R • yes • step2. is any subset of AG a superkey? • does AR? == is (A)+ = R ?, no • does GR? == is (G)+ = R ?, no • so, AG is a candidate key Database System Concepts - Chapter 7 Relational-Database Design -
VII-2-3 Closure of Attributes (cont.) • Usage-I. Testing for superkey To test whether is a superkey of Runder F, i.e. whether R, we check if R = + • Usage-II. Testing functional dependencies To determine whether or not holds on R under F, we check if + • Usage-III. Computing closure F+ for each γ R, computeγ+={S} under F; for each S γ+, output γS as a functional dependency in F+ Database System Concepts - Chapter 7 Relational-Database Design -