330 likes | 485 Views
Normalization. Purpose: process to eliminate redundancy in relations due to functional or multi-valued dependencies. Decompose relation schema into Normal forms: Boyce-Codd Normal Form (BCNF) Third Normal Form (3NF) Fourth Normal Form (4NF)
E N D
Normalization • Purpose: process to eliminate redundancy in relations due to functional or multi-valued dependencies. • Decompose relation schema into Normal forms: • Boyce-Codd Normal Form (BCNF) • Third Normal Form (3NF) • Fourth Normal Form (4NF) • To obtain the new relations, project the schemas onto the original relation schema (e.g. Movie) • To recover information (I.e. Movie) from the new relations: natural join the new relations.
BCNF Decomposition Example 3.24 pp 104 • Relation: Movie(title, year, length, filmType, studioName, starName) • Key: {title, year, starName} • FD’s: title year length filmType studioName is a BCNF violation, so Movie not in BCNF • Decomposition: Schema 1: {title, year,length, filmType, studioName} Schema 2: {title, year, starName} • To obtain the new relations, project the schemas onto Movie • To recover information (I.e. Movie) from the new relations: natural join the new relations. Does not lose information.
Functional Dependencies (FD) • Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X Y means X functionally determines Y e.g. A1A2…An B1B2…Bm • A1A2…An B1B2…Bm is an assertion about R that two attributes or sets of attributes in R are dependent of one another.
Mutivalued Dependencies (MVD) • Given: relation schema R, and A1A2…An and B1B2…Bm be subsets of attributes of R. MVD : A1A2…An B1B2…Bm holds in R if : For each pair of tuples t and u of relation R that agree on all the A’s, we can find in R some tuple v that agrees: • With both t and u on the A’s, • With t on the B’s, and • With u on all attributes of R that are not among the A’s or B’s • A1A2…An B1B2…Bm is an assertion about R that two attributes or sets of attributes in R are independent of one another. • Cause redundancy not related to FD’s in a BCNF schema. • Most common source: putting 2 or more many-many relationships in a single relation.
MVD Rules • Trivial dependencies rule If A1A2…An B1B2…Bm holds for R, then A1A2…An C1C2…Ck holds where the C’s are the B’s + one or more of the A’s. The converse also hold. • Transitive rule If A1A2…An B1B2…Bm and B1B2…Bm C1C2…Ck then A1A2…An C1C2…Ck • Splitting rule does not hold E.g. name street city, but not name street So, always start with set of attributes on the R.S. because splitting rule does not hold.
A’s B’s t u More MVD Rules • Every FD is an MVD Because If FD A1A2…An B1B2…Bm, then swapping B’s between tuples that agree on A’s doesn’t create new tuples. • Complementation rule If X Y, then X Z, where Z is all attributes not in X or Y e.g. Star_Star_In {name, street, city, title, year} name street city name title year
Nontrivial MVD A1A2…An B1B2…Bm for a relation R is nontrivial if: • B1B2…Bm is not a subset of A1A2…An • A1A2…An B1B2…Bm is not all attributes of R
Fourth Normal Form (4NF) • Decompose relations that has MVD’s into 4NF to eliminate MVD’s. • Definition: R is in 4NF if A1A2…An B1B2…Bm is a nontrivial MVD, {A1A2…An} is a superkey. • Since every FD is an MVD, so 4NF is more stringent than BCNF • Only nontrivial MVD’s has the potential to violate 4NF
R Y X 4NF Decomposition Given: relation R, and nontrivial MVD X Y that violate 4NF • Decompose X Y into XY and X (R-Y) • Produce the relations by projecting R onto XY and X (R-Y) • Reconstruct R from the new relations using natural join e.g. Star_Star_In {name, street, city, title, year} and name street city Decompose Star_Star_In using name street city into {name, street, city} and {name, title, year}
Relationships among normal forms • 4NF is the most stringent • 4NF BCNF 3NF
Lossless-join decomposition Given: Relation R, decomposed into schemes R1, R2, … Rk, and D is a set of dependencies. Definition: R1, R2, … Rk is a lossless-join (w.r.t. D) if for every relation r for R satisfying D: r = R1(r) R2(r) …Rk(r) i.e. Every relation r for R is the natural join of its projections onto the Ri’s. The lossless-join property is necessary if the decomposed relation is to be recoverable from its decomposition. However, joins are expensive. So, don’t over decompose!
Structured Query Language (SQL) • A DDL and DML for relational DBMSs • History: ANSI SQL, , SQL-92 (SQL2), SQL-99 (SQL3) • SQL-99 extends SQL2 with object-relational features and other new features • Most DBMS vendors implements the core, and then add bells and whistles and variations • Query capability is close to relational algebra, with lots of extensions. • Case insensitive except characters inside quoted strings ' ' e.g. 'Smith' 'SMITH' • ; as statement delimiter
Example database schema Movie(title, year, length, inColor, studioName, producerC#) StartIn(movieTitle, movieYear, starName) MovieStar(name, address, gender, birthdate) MovieExec(name, address, cert#, netWorth) Studio(name, address, presC#)
SQL Quries – basic form SELECT attribute/s FROM relations / views /subqury WHERE conditional expression;
SQL query examples • Example 1: SELECT * FROM Movie; -- * => all attributes of Movie • Example 2: SELECT * FROM Movie WHERE studioName = 'Disney' AND year = 1990; • Example 3: SELECT title, length FROM Movie WHERE studioName = 'Disney' AND year = 1990;
Duplicates • SQL generally operates using bags instead of sets • Exception: UNION, INTERSECT, EXCEPT operation • To eliminate duplicates, add keyword DISTINCT to the SELECT clause e.g. SELECT DISTINCT starName FROM StarsIn; • Duplicate elimination is costly. Use judiciously.
SQL Correspondence to Relational Algebra SELECT L-- R.A. project FROM R-- R.A. operands WHERE C ; -- R.A. select R.A. expression: L(C(R)) When reading and writing queries: • FROM -- what relations are involved • WHERE -- what's the tuples selection criteria • SELECT -- what columns to output
Union, Intersection, Difference of Queries • UNION : R1 UNION R2 or (Q1) UNION (Q2) e.g. (SELECT title, year FROM Movie) UNION (SELECT movieTitle AS title, movieYear AS year FROM StarsIn); • INTERSECT : R1 INTERSECT R2 or (Q1) INTERSECT (Q2) • EXCEPT: R1 EXCEPT R2 -- difference (Q1) EXCEPT (Q2)
Union, Intersection, Difference of Queries (continued) • Q1 and Q2 are queries that produce relations • R1 and R2, or results of Q1 and Q2 should have the same list of attributes and attribute types. Rename if necessary. • Duplicates are eliminated automatically • Add the keyword ALL after UNION, INTERSECT, or EXCEPT to prevent duplicates elimination
SQL and Relational Algebra • The six independent operations are implemented by SQL • SQL is relational complete
Some data values in SQL • Strings • Dates and Times • Null values • Truth value of Unknown
1. Strings • Comparison operators (according to lexicographical order) <, >, <=, >= = • LIKE -- pattern matching • % -- matches any sequence of 0 or more characters • _ -- matches any one character • E.g.: title LIKE 'Star _ _ _ _' • E.g.: title LIKE '%''s%' • Can specify escape character • E.g. title LIKE 'x%%x%' ESCAPE 'x'
2. Dates and Times • Date constant: DATE '2002-10-01' • Time constant: TIME '15:00:02.5' • Timestamp (combines dates and times): TIMESTAMP '2002-10-01 15:00:02.5‘ (beware of implementation differences!) • Comparison operators apply
3. Null Values • NULL to represent: • Value unknown • Value inapplicable • Value withheld • Operations involving NULL • Arithmetic operation: result is NULL • Comparison: result is UNKNOWN • NULL is not a constant, therefore NULL cannot be used explicitly as an operand. • IS NULL and IS NOT NULL checks • Read "Pitfalls Regarding Nulls" pp. 250
4. UNKNOWN • Consider TRUE = 1, FALSE = 0, UNKNOWN = 0.5 • AND of 2 truth-value = min. of the 2 values • OR of 2 truth-value = max. of the 2 values • Negation of v = 1-v • Refer Fig. 6.2 pp. 250 for truth table for 3-valued logic
The Six Clauses in SQL Queries • SELECT -- required • FROM -- required • WHERE • GROUP BY • HAVING -- if used, must follows a group by clause • ORDER BY • Subqueries may appear in the FROM clause and the WHERE clause • Comments begins with ‘--’
Table level SQL (ref. 6.6, pp. 292) • Create table – to define the schema of a base table (Ref. 6.6.1 for data types syntax) E.g. create table EMP ( empno int not null, lastName varchar(30) not null, firstName varchar(30) not null, num_of_children int, constraint pk_EMP primary key(empno) ); • Drop table – to destroy a base table e.g. drop table EMP;
Tuple Modification Statements (ref. 6.5, pp. 286) • Insert – to add a row Syntax: insert into R(A1..An) values (v1…vn) • E.g. insert into emp(empno, lastName, firstName, num_of_children)values (12345, ‘Doe’, ‘John’, 1) • Or insert into emp values (12345, ‘Doe’, ‘John’, 1) • Delete – to remove a row Syntax: delete from R where <condition> • E.g. delete from emp where empno = 12345 • Update – to modify the contents of a row Syntax: update R set Ai = value where Aj = targetValue • E.g. update emp set num_of_children = 2 where empno = 12345
Some JOINS in SQL. (ref. pp. 270) • CROSS JOIN -- R.A. cartesian product e.g. Movie CROSS JOIN StarsIn; • JOIN … ON -- R.A. theta-join e.g. Movie JOIN StarsIn ON title = movieTitle AND year = movieYear; • [NATURAL] JOIN -- R.A. natural join e.g. MovieStar NATURAL JOIN MovieExec; or MovieStar JOIN MovieExec; • OUTERJOINS -- joins that include dangling tuples
OUTERJOINS • An operator to augment the result of a join by the dangling tuples, padded with null values. • Full outerjoin of R1 and R2 is a join that includes all rows from R1 and R2 matched or not. Unmatched rows are padded with NULLs. • LEFT outerjoin of R1 and R2 is a join that includes all rows from R1, matched or not, plus the matching values from R2. Unmatched rows are padded with NULLs. • RIGHT outerjoin of R1 and R2 is a join that includes all rows from R2, matched or not, plus the matching values from R1. Unmatched rows are padded with NULLs. • The joining may be NATURAL or theta join
Outerjoins Syntax • R1 NATURAL {FULL | LEFT | RIGHT} OUTER JOIN R2; E.g. 1. MovieStar NATURAL FULL OUTER JOIN MovieExec; E.g. 2. MovieStar NATURAL LEFT OUTER JOIN MovieExec; E.g. 3. MovieStar NATURAL RIGHT OUTER JOIN MovieExec;
Outerjoins Syntax (continued) • R1 {FULL | LEFT | RIGHT} OUTER JOIN R2 ON conditional expression; E.g. 1. Movie FULL OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear; E.g. 2. MovieStar LEFT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear; E.g. 3. MovieStar RIGHT OUTER JOIN StarsIn ON title = movieTitle AND year = movieYear;
Use result of joins as subqueries in queries • E.g. SELECT title, year, length, inColor, studioName, producerC#, starName FROM Movie JOIN StarsIn ON title = movieTitle AND year = movieYear;