CS411 Database Systems

CS411Database Systems Kazuhiro Minami 08: Midterm Review

Boyce-Codd Normal Form A relation R is in BCNFif whenever there is a nontrivial FD A1 ... AnB for R, {A1 ... An} is a superkey for R. An FD is trivial if all the attributes on its right-hand side are also on its left-hand side.

BCNF Summary • BCNF violation occurs when we do not store information on multiple functions efficiently • This does not always mean that there is redundancy in a relation; it means that the relation could have redundant data • If the input of a function f is a key or a super key of a table, we capture information on f without redundancy • If the input of a function f is NOT a key, there are two situations we could have two redundant data

Two Possible Causes of Redundancy • The input of function g is an output or part of output of another function f • The input of function g is a subset of attributes of another function f g f f: X -> Y g: Y-> Z f: X,Y-> W g: Y -> Z

When BCNF Decomposition Breaks FDs? • We say that a decomposition is NOT dependency-preserving if we cannot check each FD with decomposed relations. • Suppose that R(A, B, C) with FDs: A, B -> C, C->B. • By using C -> B, we get R1(B, C) and R2(A,B). • Therefore, we cannot check A,B -> C with R1 and R2.

3rd Normal Form R is in 3NF if for every nontrivial FD A1, ..., An B, either {A1, ..., An} is a superkey, or B is part of a key. Weakens BCNF.

Primary Goal of 3NF • Preserve FDs of the initial relation R with docomposed relations R1,…,Rn • Try to minimize redundancy while achieving the first goal

Basic Approach for 3NF Decomposition • Create a relation for each FD • E.g., If A->B, then create R(A, B) • But, we want to minimize the number of such relations • Thus, we find a minimum set of FDs, from which we can derive all FDs. • E.g., If A->B, B->C, we don’t need to worry about A -> C. Thus, A->C is not part of the minimum set. • Every FD in the minimum set has nice properties: • A1,…,An -> B where 1) no FDs among A1,…,An, 2) a single attribute on the right and 3) no FDs of the form Ai,…,Aj -> B where {Ai,…,Aj} {A1,…,An}

Synthesis Algorithm for 3NF Schemas Result will be lossless and will preserve dependencies. Result will be in 3NF, but might not be in BCNF. 1. Find a minimal basis G of the set of FDs for relation R 2. For each FD XA in G, add a relation with attributes XA 3. If none of the relation schemas from Step 2 is a superkey for R, add a relation whose schema is a key for R

Minimal Basis A set of FD’s F is a minimal basis of a set of dependencies E if • E = F+ • Every dependency in F has a single attribute for its right-hand side • Cannot remove any dependency from F or remove attributes from the left side of any FD in F (minimality) Example: E = {AB, AC, BA, BC, CA, CB} F = {AB, BC, CA}

Example 3NF Decomposition R(A, B, C, D, E), F = {AB -> C, C -> B, C -> D} • We assume that F is the minimum cover • Obtain R1(A, B, C) , R2(B, C), and R3(C, D) from each FD in set F • Since either R1 or R3 does not contain a key {ABE} or {ACE}, add one more relation R4(A, B, E) • We finally get R1(ABC), R3(C, D), and R4(A, B, E) omitting R2(B, C).

Is a decomposed relation for 3NF also in BCNF? • If A1,…,An -> B in the minimum set, then we create R(A1,…, An, B). • Remember that there is no FDs among A1,…,An. • What type of FD in R will violate BCNF? • B -> Ai,…,Aj where {Ai,…,Aj} {A1,…,An} • If we apply BCNF decomposition with the above FD, we break, A1,…,An -> B • Recall the primary goal of 3NF. We need to keep R any way • Then, let’s stop here and find out what property FD: B -> Ai,…,Aj has Ai,…,Aj is part of a key!

Relational Schema Design • What is a keys and a super key? • What is a functional dependency? • How to compute the closure of a set of attributes given FDs? • How to determine keys given functional dependencies? • How to use Armstrong's Axioms and compute a closure of FDs? • What are update and deletion anomalies? Why are they bad? • What is BCNF? • How to decompose relation into BCNF? • Is BCNF decomposition lossless? • What is 3NF? How is it different from BCNF? • Is 3NF decomposition lossless? • What are tradeoffs between BCNF and 3NF? • What’s a multi-valued dependency? Is it related to a FD in some way? • What is the relationship between BCNF and 4NF?

Relational Algebra • What are basic five operators in RA? • What are derived operators? • Do you know the symbols of the operators and what they do? • How to define each derived operator with the basic ones? • What are theta-join and natural join? • What is turned by the following expression on relation R(a)?

Which products are available only at a single store? R(ProductName, Store, ID)

1. Make a copy of T3 T31 := T3 Which products are available only at a single store? T3: Inventory Relation T31: Inventory Relation

Which products are available only at a single store? T3: Inventory Relation T31: Inventory Relation 2. Compute products available at multiple stores A = π[T3.ProductName](T3 ⋈ [T3.ProductName = T31.ProductName AND T3.StoreID != T31.StoreID] T31)

Which products are available only at a single store? T3: Inventory Relation A: Products available at multile stores 3. Subtract stores in A from stores in T3 B = π[T3.ProductName] - A B: Products available at a single store

Find all the store names whose products in their inventories are a subset of the inventory of some other store? T(ProductName, StoreName) Your answer should be “Schnucks”

1. Compute all the possible pair of ProductName and StoreName A =π[ProductName]T3  π[StoreName]T3 T3: Inventory Relation A:

Compute a list of ProductName-StoreName pair (p, s) where product p is NOT available at store s. Negative Inventory Relation B = A - T3 T3: Inventory Relation B:

3. Take the natural join of T3 and Band project onto a pair of stores T3: Inventory Relation B: Negative Inventory Relation C = π[T3.StoreName, B.StoreName] (T3 ⋈ B)

4. Compute all the possible pair of two different stores T31 = T3, D(store1, store2) = π[StoreName] T3⋈[T3.StoreName != T31.StoreName]π[StoreName] T31 T3: Inventory Relation D:

Notice that store1  store2 andstore1  store2are mutually exclusive and collectively exhaustive D(store1, store2) - C(store1, store2) C(store1, store2) where store1  store2 wherestore1  store2 All the possible pair of two stores D(store1, store2)

5. Subtract D from C D: C: E = D - C

SQL • What are the three clauses in a SQL query statement? • How to express a single relation query in SQL with RA? • Do you know how conditions involving NULL are evaluated? • Do you know how to disambiguate attribute names in the WHERE clause when the FROM clause contain multiple relations? • Do you understand the semantics (meaning) of a multi-relation query in SQL in two different ways? • What if a query needs two copies of the same relation? • In which clauses can we use subqueries? • Can you use IN, ALL, ANY, and EXISTS operators on the result relation of a subquery? • How to express set operations such as union, intersect, and set difference in SQL? • Which operations in SQL support bag semantics? • Which operations in SQL support set semantics? • How to remove duplicate tuples from the result of a SQL query? • Do you known when a correlated subquery is evaluated?

SQL Aggregation/Grouping • What are five aggregation function in SQL? • How aggregations in SQL handle NULL values? Is there any difference among the functions? • How to partition the result relation in a SQL query into multiple groups? • Where can you define conditions on each group? • Which attributes can you refer to in the HAVING clause? • Which attributes can you include in the SELECT clause? • In which clauses can you use aggregate functions? • How to write insertion, deletion, and update statements in SQL? • How to create a new table in SQL? • What is a view? • Why some views are not updatable?

Constraints and Triggers • What’s the major difference between constraints and triggers? • What are example constraints in SQL? • How primary keys and a set of attributes declared as UNIQUE handle NULL values differently? • In which situations a foreign-key constraint could be violated? • What are three strategies to prevent dangling tuples? • When an attribute-based or tuple-based CHECK evaluated? • In what situations a DBMS cannot enforce the conditions in CHECKs? • When are the condition in an ASSERTION checked? • What are events in a TRIGGER statement? • What is the difference between statement-level triggers and row-level ones?

CS411 Database Systems