300 likes | 325 Views
Fundamentals/ICY: Databases 2010/11 WEEK 9. John Barnden Professor of Artificial Intelligence School of Computer Science University of Birmingham, UK. Subsets and Supersets.
E N D
Fundamentals/ICY: Databases2010/11WEEK 9 John Barnden Professor of Artificial Intelligence School of Computer Science University of Birmingham, UK
Subsets and Supersets • A Bmeans that A is a “subset” of B (and that B is a “superset” of A). I.e., every member of A is also a member of B. • Carefully distinguish between subset-of and member-of!!! • The symbol means the same as • does NOTmean that there cannot be equality. • Examples: • {4,5} • {5} {4,5,6}, {6,4} {4,5,6,7}, {6,4,7,5} {4,5,6,7} • {n | n is an even whole number} {n | n is a whole number}
Subsets and Supersets • Afor any set A. • A Afor any set A. (Reflexivity) • If A BandBA then A = B. (Antisymmetry) • If A BandBC then AC. (Transitivity)
Some Operations on Sets • Union of sets A and B: • AB = the set of things that are in A or B (or both). • NB: no repetitions created. • Intersection of sets A and B: • A B = the set of things that are in both A and B. • Difference of sets A and B: • A B = the set of things that are in A but not B. • Note: also notated by a backslash instead of a minus sign.
Some Properties of those Operations • Union and intersection are commutative • (“can switch”): • AB = BA • A B = B A • Union and intersection are associative • (“can group differently”): • A (B C) = (AB) C • A (B C) = (A B) C • Because of associativity, we can omit parentheses: • AB C D A B C D
Two Other Properties • Union distributes over intersection: • A(B C) = (AB) (A C) • Intersection distributes over union: • A(B C) = (AB) (A C)
Same Difference? • Exercises for bath-time: • Is the difference operation commutative or associative? • And does it take part in any distributivity with the other operations?
“Tuples” • A “tuple” is an ordered sequence of items of any sort. We will only deal with finite tuples. Items CAN be duplicated. • Can also be called a “vector” in other CS terminology. • Notation: 6, JAB, 5, “JAB”, 5, , 9> • Singleton and empty tuples: <6>, <> • The concatenation ( ⃘) of two tuples is just the result of putting them end to end to get one tuple. • <6, JAB, 5> ⃘ <5,6> = <6, JAB, 5, 5, 6> • <6, JAB, 5> ⃘ <> = <6, JAB, 5>
Table Rows are “Tuples” • In a table, each attribute has a “domain” – the set of values that the attribute can have. E.g., the set of integers, the set of all character strings of any length, or the set of character strings of a specific format and length. • If the attribute allows NULL values, we include NULL in the value domain as well. • The values in a row form a tuple of values from the respective value domains. Just a list of the values, one for each attribute.
Tuples in a Table People • The tuples are • ‘9568876A’, ‘Chopples’, 37 > • ‘2544799Z’, ‘Blurp’, NULL > • ‘1698674F’, ‘Rumpel’, 88 >
“Cartesian Products” and “Relations” • The set of all possible tuples formed from some sets is called the Cartesian product of the sets. • Notation, e.g.: D E F G H • if D, E, F, G, H are the sets—not necessarily different. • Any subsetat all of that Cartesian product is called a relation on the sets in question (D, E, …) • even the whole of the product (even if infinite) • and even the empty set. • I.e., a relation on D, E, …, H is just some set of tuples that are each of form <d,e, …, h> where d D, e E, …, h H.
Examples • Let A = {3, 8, 2} and B = {‘jjj’, ‘bb’}. • Then A B = • { <3, ‘jjj’>, <3, ‘bb’>, <8, ‘jjj’>, <8, ‘bb’>, <2, ‘jjj’>, <2, ‘bb’> }. • B B = { <‘jjj’, ‘jjj’>, <‘jjj’, ‘bb’>, <‘bb’, ‘jjj’>, <‘bb’, ‘bb’>}. • A = = A • A {TRUE} = { <3, TRUE>, <8, TRUE>, <2, TRUE> } • Some relations on A andB: • {<3, ‘jjj’>, <3, ‘bb’>, <2, ‘jjj’>} • { <2, ‘bb’> } • A B •
Rows as forming a Relation • So, for a given table, the set of all possible rows, considered as tuples, forms the Cartesian product of the value domains of the table. • And, provided the table does not have repeated rows: • AT ANY MOMENTthe actual set of rows, considered as tuples, is a relation on the table’s value domains. • NB: crucial here that no row is exactly repeated, because a mathematical set cannot have repeated elements.
Relation from a Table People • The relationat the moment is • ‘9568876A’, ‘Chopples’, 37 > • ‘2544799Z’, ‘Blurp’, NULL > • ‘1698674F’, ‘Rumpel’, 88 >
A Table as a Relation? • People loosely talk about tables being relations. • This is mathematically inaccurate for several reasons: • The table properly speakingincludes not just the rows but also the attribute names themselves, their domains, specification of primary and foreign keys, etc. • It’s only the rows at any given momentthat form a relation. When a value in the table changes or a row is added or deleted, the mathematical relation is replaced by a different one. • Relations do not cater for tables with repeated rows. • ((But there is a more advanced notion of relation, based on “bags” rather than sets, that does cater for repeated rows.)) • But OK if you know what you (and those people) mean.
((Aside: “Bags” in Maths)) • A variant of sets called “bags” (or “multisets”) is used in maths (and CS) and allows repeated members. There are union, etc. operations that respect the repetitions. • So bags and their operations are a better fit to DB tables and notably their repetition-respecting operations (e.g. UNION ALL) than sets and their operations are. • But bags are non-standard and they’re not normally covered at an introductory level. • See Garcia-Molina et al 2009 for bags and their use in the DB area.
1NF can have Undesirable Dependencies • 1NF tables can contain “partial,” “transitive” and other generally undesirable functional dependencies of an attribute X on a determinant D. • By “undesirable” I will mean mainly that the determinant D is not a superkey, so that at least one attribute Y in the table is not determined by D, • so Y can have different values in the table for equal D values, • so redundancy (repetition of the association betweenD and X values) can arise.
1NF can have Partial Dependencies • Partial dependency: where the determinant is part but not all of the primary key (and NB: is therefore not a superkey) • The determined attribute X is necessarily outside the whole PK—exercise: why?
Second Normal Form • A table is in second normal form (2NF) if: • It is in 1NF and • It includes nopartialdependencies
Conversion to 2NF • For each determinantD involved in a partial dependency in the original table T, • use D as the PK for a new tableNT(D) • and move out the attributes X determined by Dinto NT(D). • D itself stays in T as well as being copied into NT(D).
Second Normal Form (2NF) Conversion results on example on previous slide
But 2NF can still haveUndesirable Dependencies • A prime attribute is one that is within some candidate key • (not necessarily the primary key). • A transitive dependency is where thedeterminant Dis at least partially outside the PK and is not a superkey, • and the determined attribute X is non-prime (and therefore in particular is not inside the PK; the reason for this restriction is on a later slide). • E.g.: previous Figure for simple case of a simple (= one-attribute) determinant. • Above definition is partly based on Garcia-Molina, Ullman & Widom 2009 – see later ref. More general than the account in R&C and R,C&C.
Third Normal Form • A table is in third normal form (3NF)if: • It is in 2NF and • It contains no transitive dependencies
Conversion to 3NF • For each determinantD involved in a transitive dependency in the original table T, • use D as the PK for a new tableNT(D) • and move out the attributes X transitively determined by Dinto NT(D). • NB: the determinants themselves stay in T as well.
Third Normal Form (3NF) Conversion Results on previous example