380 likes | 554 Views
Relational database design. Lecture 3 Implementation of rules Analytical database design. Maria Bergholtz Su/KTH. From conceptual schema to relational database schema. BIL. PERSON. Regno: String 1..1 UNIK. Namn: String 1..1 UNIK. äger. 0..*. 0..*. Type: String 1..1. Age: Integer.
E N D
Relational database design Lecture 3 Implementation of rules Analytical database design Maria Bergholtz Su/KTH
From conceptual schema to relational database schema BIL PERSON Regno: String 1..1 UNIK Namn: String 1..1 UNIK äger 0..* 0..* Type: String 1..1 Age: Integer Ett sätt: Ett annat sätt: PERSON Namn Ålder PERSON((Namn), Ålder) ÄGANDE ÄGER((Namn, Regnr)) Namn Regnr BIL((Regnr), Märke) BIL Regnr Märke Where have all the rules gone?
SALE Sum1: Int 1..1 Sum2: Int 1..1 PRODUCT Name: String 1..1 Colours: String 1..* DOG CAT ANIMAL Name: String 1..1 Weight: Int 1..1 Rules in UML class diagrams, database, application code? I As classes: II With graphical symbols: multiplicitet or symbols for mutual exclusiveness! III In code, (in database or application code)
SQL - DDL UNIQUE UNIQUE This becomes two tables: and SQL – DDL example CREATE TABLE EMPLOYEE (EID VARCHAR(11) NOT NULL UNIQUE, BID CHAR(7) NOT NULL, PRIMARY KEY(EID), FOREIGN KEY(BID) REFERENCES BUSINESS ON DELETE CASCADE ON UPDATE CASCADE)
¤ actionthat makes it execute (insert, update, delete, retrieve) ¤ object (attribute/column or table) affected ¤ conditionthat will execute the effect-part of the trigger ¤ effects that will influence the database
EMPLOYEE Name Adress Salary Kalle Byvägen 3 10000 Olle Solstigen 4 20000 Stina Ekgränd 2 18000 Kurt Byvägen 5 17000 Triggers example CREATE TRIGGER Signal_large_increase: AFTER UPDATE OF SALARY ON EMPLOYEE REFERENCING NEW AS n OLD AS o FOR EACH ROW MODE DB2SQL IF n.salary > 1.1 * o.salary SIGNAL SQLSTATE ‘75000’ (‘Salary increase > 10 %’) EMPLOYEE BUSINESS Eid Bid Bid Anna SKRUV AB SKRUV AB Lisa SKRUV AB BULT AB Oscar BULT AB CREATE TRIGGER on_update_cascade AFTER UPDATE ON Business REFERENCIN NEW AS n OLD as o FOR EACH ROW MODE DB2SQL BEGIN UPDATE Employee SET Employee.Bid = n.Bid WHERE Employee.Bid = o.Bid; END@ EMPLOYEE BUSINESS Eid Bid Bid Anna SKRUV AB SKRUV AB Lisa SKRUV AB BULTIA AB Oscar BULTIA AB
Stored procedures • A stored procedure – a database object, i.e. stored in the DBMS • Procedural; parameter passing, control constructs, but also SQL-statements, and much more • In client-server-environments: less data traffic • In all environments: code reuse, code sharing, pre-parsing, DBMS-rules applies (i.e. entity and referential integrity)
Stored procedures CREATE PROCEDURE on_update_cascade (IN old_business, IN new_business) LANGUAGE SQL BEGIN UPDATE Employee SET Employee.Bid = new_business WHERE Employee.Bid = old_business; END@ CREATE TRIGGER ouc AFTER UPDATE ON Business REFERENCING NEW AS n OLD AS o FOR EACH STATEMENT MODE DB2SQL BEGIN ATOMIC call on_update_cascade(o.Bid, n.Bid); END@ EMPLOYEE BUSINESS Eid Bid Bid Anna SKRUV AB SKRUV AB Lisa SKRUV AB BULT AB Oscar BULT AB EMPLOYEE BUSINESS Eid Bid Bid Anna SKRUV AB SKRUV AB Lisa SKRUV AB BULTIA AB Oscar BULTIA AB
Tabell 2 Tabell 1 Name Address Phone Lisa Byvägen 3 11111 Olle Solgränd 2 22222 Pelle Solgränd 4 23456 Analytical database design • ”Debug” databases i in terms of redundancy • Repetition of 1-3NF • BCNF, 4NF, 5NF
Redundancy. The same type of fact is repeated several times in the table. Update anomalies. The same fact must be updated on several places (as opposed to be updated in one place only) – time consuming and error prone. Redundancy and update anomalies PERSON&CAR
Functional dependency: Pnr Name Function ´For each value of Pnr, e.g. 2222 Maria 610321 there must exist at most one value of tt Name. Two or several Pnr might Eva 11111 have the same value on Name though. Sture There also are Names that are not 22222 related to Pnr. Olle As well as Pnr:s not related to Name 33333 Not function Maria 610321 Here the functional dependency is Eva not met since there are two 11111 different values on Name: Eva and Sture 33333 Maria. 22222 Olle
Functional dependencies – exemple PERSON&CAR Yes Does Personnr Name hold?
Functional dependencies – exemple PERSON&CAR Njet. Does Personnr Regnr hold?
PERSON Name: String 1..1 Weight: Integer 1..1 Address: String 1..1 Postalcode: Integer 1..1 DOG Name: String 1..1 Weight: Integer 1..1 Favorite food: String 1..1 1..1 ägs_av 0..* There is no notation for functional dependencies WITHIN a class – i.e. functional dependencies between attributes!
First normal form A table is in first normal form if all columns of the table contain atomic values. Not 1NF 1NF As the relational model is defined, all relations are in first normal form (1 NF)
Partial functional dependency A B is called a partial functional dependency if there exists a C that constitutes a proper subset of A and where C B. B is fully functionally dependent of A if B is functionally dependent of A and not partially dependent of A. Personnr, Regnr Name Personnr, Regnr Type Personnr Name Regnr Type Name is partially dependent of (Personnr, Regnr) Type is partially dependent of (Personr, Regnr)
PRODUCT PRODUCT Pid Colour Pid Colour Price Pen Blue 3 Pen White 3 Pen Red 3 Nail Metal 5 Hammer Grey 33 Chair White 6 Chair Red 6 Chair Blue 6 Table White 9 Table Blue 9 Table Red 9 Vas White 44 Stool Yellow 10 A table is in second normal form (2NF) if each column that is not part of the key is fully functionally dependent of the key. Each column shall be dependent on the entire key, not only part of the key. 2NF Pen Blue Pen White Pen Red Nail Metal Hammer Metal Chair White Chair Red Chair Blue Table White Table Blue Table Red Vas White Stool Yellow Pid --> Price PRODUCTPRICE Pid Price Pen 3 Nail 5 Hammer 33 Chair 6 Table 9 Vas 44 Stool 10
Transitive dependencies C is transitively dependent of A through B if there exist functional dependencies A B and B C, such that A is neither functionally dependent of B nor functionally dependent of C. Course Teacher, Room, No_of_places i.e.: Course Room Room No_of_places No_of_places is transitively dependent on Course through Room.
Third normal form (3NF) A table is in third normal form if and only if it is in second normal form, and if each column that is not part of the primary key is functionally but not transitively dependent of the primary key. Each column shall be dependent on the key and nothing but the key.
3NF If a column is transitively dependent of the key through another column, turn these columns into a table of its own. COURSE Course Room Room Number_of_places COURSE ROOM A foreign key is created when the original table is split into two: COURSE.Room constitute a foreign key towards ROOM.Room
Problems with not normalized tables: Not normalising leads to so called update anomalies: IINSERT: It is not possible to add information about the number of places in a room if the room is not used by any course. DELETE: When the single course used by one particular room is removed, so is all info about that room. UPDATE: If we re-build a room in order to accommodate more people, we must change the number of places in that room in each and every row that that room is present – risk of inconsistency! Each column shall be dependent on the key (”1NF”), the whole key (”2NF”) and nothing but the key (”3NF”).
Space vs Time vs Consistency Title Author Libretto Weight Pages Publisher Hamlet Shakespeare ”Once upon….The End.” 35 1000 Bonniers Hamlet Shakespeare ”Once upon….The End.” 46 1500 Pierson Hamlet Shakespeare ”Once upon….The End.” 60 2000 Nordstedts etc. Hamlet Shakespeare ”Once upon….The End.” 79 2500 Prentice-Hall 1984 Orwell ” ” 20 100 Bonniers etc. Title Author Libretto Hamlet Shakespeare ”Once upon…The End.” 1984 Orwell ” ” Title Weight Pages Publisher Hamlet 35 1000 Bonniers Hamlet 46 1500 Pierson Hamlet 60 2000 Nordstedts etc. 1984 20 100 Bonniers etc.
BCNF Boyce-Codds Normal Form A relation is in Boyce-Codds normal form iff every determinant is a candidate key All attributes in a relation shall be fully dependent of the whole primary key, regardless of what key is chosen as primary key.
BCNF cont. A relation R has a number of attributes, U. X is said to be a determinant of R if X is both a subset of U (reads X = a number of attributes, could be one only), and there exists a functional dependency X Y for som Y that is not a subset of X. If now all determinants are kandidate keys in R, then that relation is in BCNF. Or, in other words, if there is one single functional dependenncy where the determinant (”left hand side”) is NOT a candidate key for R, then R is NOT in BCNF.
BCNF cont. Two functional dependencies:: Course à Teatcher Teacher, Time à Course R TEACHER TIME COURSE Maria oktober *62 Maria november *62 Petia november *58 Maria december DSVL1:5 R1 COURSE TEACHER *62 Maria *58 Petia DSVL1:5 Maria R2 COURSE TIME *62 oktober *62 november *58 november DSVL1:5 december
Nonloss decomposition "Nonloss" decomposition is a rather odd term. Suppose that relation R is decomposed into a number of relations, R’, R’’ etc. We try to re-create R by joining, R’ with R’’ and so on. The resulting relation, lets call it R-loss, may then contain a number of so called spurious tuples that where not present in the original R. A nonloss decomposition guarantees that a join over the decomposed Tables will result in the original relation.
Control of non-loss decompositions For each functional dependency X Y: For all rows that have the same symbol in the column (s) that correspond to X: If any of the rows have an ”a” in the column that correspond to Y: Write “a” in all Y-columns (s) of that row. If an entire row contains “a:n” only the decomposition is non loss and the control need not go further. If all functional dependecies have been checked an no row still contains ”a”:s only, then the decomposition is not non-loss.
BCNF-decomposition, non-loss or not? Two functional dependencies: Course à Teacher Teacher, Time à Course R TEACHER TIME COURSE Maria oktober *62 Maria november *62 Petia november *58 Maria december DSVL1:5 R1 COURSE TEACHER *62 Maria *58 Petia DSVL1:5 Maria R2 COURSE TIME *62 oktober *62 november *58 november DSVL1:5 december
COURSE TEACHER TIME A A B R1 R2 A B A Controll of "non-loss” via the algorithm R(TEACHER, TIME, COURSE) Functional dependencies: COURSE --> TEACHER, TEACHER, TI´ME --> COURSE Dekomposition: R1(COURSE, TEACHER) R2(COURSE, TIME) In the matrix, all attributes present in the relation is encoded as an ”a” and those missing with a ”b”. A
R1 COURSE TEACHER *62 Maria *58 Petia DSVL1:5 Maria R’ TEACHER TIME COURSE Maria oktober *62 Maria november *62 Petia november *58 Maria december DSVL1:5 Yes, the same tuples! Test of non-loss via join R TEACHER TIME COURSE Maria oktober *62 Maria november *62 Petia november *58 Maria december DSVL1:5 R2 COURSE TIME *62 oktober *62 november *58 november DSVL1:5 december
COURSE TEACHER TIME A A B R1 R2 B A A Control of "non-loss” on NEW decomposition via the algorithm R(TEACHER, TIME, COURSE) Functional dependencies: COURSE --> TEACHER, TEACHER, TIME --> COURSE Dekomposition: R1(COURSE, TEACHER) R2(TEACHER, TIME) In the matrix, all attributes present in the relation is encoded as an ”a” and those not present with a ”b”.
R2 TEACHER TIME Maria oktober Maria november Petia november Maria december R1 COURSE TEACHER *62 Maria *58 Petia DSVL1:5 Maria Test of non-loss on NEW decomp. via join R TEACHER TIME COURSE Maria oktober *62 Maria november *62 Petia november *58 Maria december DSVL1:5 R’ TEACHER TIME COURSE Maria oktober *62 Maria oktober DSVL1:5 Maria november *62 Maria november DSVL1:5 Petia november *58 Maria december *62 Maria december DSVL1:5 Several new, not correct, tuples!
4NF A relation is in fourth normal form iff it is in BCNF and there are no multivalued dependencies
4NF Formal definition of a multivalued dependency: R = Relation, X subset of R, where X is one/several attributes Y subset of R, where Y is one/several attributes If X Y (i.e. Y is multivalued determined by X) the following shall hold: If two tuples t1 and t2 exist such that t1(X) = t2(X) then there must also exist two tuples t3 och t4 with the following properties: t3(X) = t4(X) = t1(X) = t2(X). t3(Y) = t1(Y) OCH t2(Y) = t4(Y) t3(Z) = t2(Z) OCH t4(Z) = t1(Z) where Z are the attributes left in relation R (besides X and Y) In addition, X Z also holds!
4NF ENAME PNAME DNAME Smith A John t1 t2 Smith B Anna t3 Smith A Anna t4 Smith B John All terms of former slide holds! We have a multivalued dependency both between X Y andbetween X Z Where X stands for ENAME, Y is PNAME and Z DNAME!
4NF cont. R (ENAME, PNAME, DNAME) ENAME PNAME ENAME DNAME dekomponeras till R1 (ENAME, PNAME) och R2 (ENAME, DNAME) R R1 ENAME PNAME ENAME PNAME DNAME Smith X John Smith X Smith Y Anna Smith Y Smith X Anna Brown X Smith Y John Brown Y Brown W Jim Brown Z Brown X Jim Brown W Brown Y Jim R2 Brown Z Jim Brown W Joan ENAME DNAME Brown X Joan Brown Y Joan Smith John Brown Z Joan Smith Anna Brown W Bob Brown Jim Brown X Bob Brown Joan Brown Y Bob Brown Bob Brown Z Bob
”Over-normalisation” Relation anst ( anst-nr , namn, avd, lön ) is already in highest normalform. No further normalisation is necessary! Avoid compositions like anst-namn ( anst-nr , namn ) anst-avd ( anst-nr , avd ) anst-lön ( anst-nr , lön ) • No reason from a logical point of view • Join is expensive (the currency is time) • Updates of a view based on a join is as a rule not allowed most DBMS