770 likes | 909 Views
CS157A Lecture 23. Revision for Midterm 3 Part 3. Prof. Sin-Min Lee Department of Computer Science. Set Operators. Relation is a set of tuples, so set operations should apply: , , (set difference)
E N D
CS157A Lecture 23 Revision for Midterm 3 Part 3 Prof. Sin-Min Lee Department of Computer Science
Set Operators • Relation is a set of tuples, so set operations should apply: , , (set difference) • Result of combining two relations with a set operator is a relation; hence all its elements must be tuples having the same structure • Hence, scope of set operations limited to union compatible relations
Union Compatible Relations • Two relations are union compatible if • Both have same number of columns • Names of attributes are the same in both • Attributes with the same name in both relations have the same domain • Union compatible relations can be combined using union, intersection, and setdifference
Example Tables: Person(SSN, Name, Address, Hobby) Professor(Id, Name, Office, Phone) are not union compatible. But Name(Person)and Name(Professor) are union compatible so Name(Person) -Name(Professor) makes sense.
Cartesian Product • If Rand Sare two relations, RS is the set of all concatenated tuples <x,y>, where x is a tuple in R and y is a tuple in S (but see naming problem next) • RS is expensive to compute: • Factor of two in the size of each row • Quadratic in the number of rows A B C D A B C D x1 x2 y1 y2 x1 x2 y1 y2 x3 x4 y3 y4 x1 x2 y3 y4 x3 x4 y1 y2 RS x3 x4 y3 y4 RS
Renaming in Cartesian Product Result of expression evaluation is a relation. Attributes of relation must have distinct names. So what do we do if they don’t? E.g., suppose R(A,B) and S(A,C) and we wish to compute RS . One solution is to rename the attributes of the answer: RS( R.A, R.B, S.A, S.C) Although only A needs to be renamed, it is“cleaner” to rename them all.
Renaming Operator • Previous solution is used whenever possible but it won’t work when R is the same as S. • Renaming operator resolves this. It allows to assign any desired names, say A1, A2,… An , to the attributes of the n column relation produced by expression expr with the syntax expr [A1, A2, … An]
Example Transcript (StudId, CrsCode, Semester, Grade) Teaching (ProfId, CrsCode, Semester) StudId, CrsCode(Transcript)[StudId, CrsCode1] ProfId, CrsCode(Teaching) [ProfId, CrsCode2] This is a relation with 4 attributes: StudId, CrsCode1, ProfId, CrsCode2
Derived Operation: Join A (general or theta) join of R and S is the expression Rjoin-conditionS where join-condition is a conjunction of terms: Ai oper Bi in which Ai is an attribute of R;Bi is an attribute of S; and oper is one of =, <, >, , . The meaning is: join-condition´(R S) where join-condition and join-condition´ are the same, except for possible renamings of attributes caused by the Cartesian product.
Theta Join – Example Employee(Name,Id,MngrId,Salary) Manager(Name,Id,Salary) Output the names of all employees that earn more than their managers. Employee.Name(EmployeeMngrId=Id AND Salary>SalaryManager) The join yields a table with attributes: Employee.Name, Employee.Id, Employee.Salary, Employee.MngrId Manager.Name, Manager.Id, Manager.Salary
Relational Algebra • Relational algebra operations operate on relations and produce relations (“closure”) f: Relation -> Relation f: Relation x Relation -> Relation • Six basic operations: • Projection A (R) • Selection (R) • Union R1[ R2 • Difference R1– R2 • Product R1£ R2 • (Rename) A->B (R)
Equijoin Join - Example Equijoin: Join condition is a conjunction of equalities. Name,CrsCode(Student Id=StudId Grade=‘A’ (Transcript)) Student Transcript Id Name Addr Status 111 John ….. ….. 222 Mary ….. ….. 333 Bill ….. ….. 444 Joe ….. ….. StudId CrsCode Sem Grade 111 CSE305 S00 B 222 CSE306 S99 A 333 CSE304 F99 A The equijoin is used very frequently since it combines related data in different relations. Mary CSE306 Bill CSE304
Natural Join • Special case of equijoin + a special projection • join condition equates all and only those attributes with the same name (condition doesn’t have to be explicitly stated) • duplicate columns eliminated (projected out) from the result Transcript (StudId, CrsCode, Sem, Grade) Teaching (ProfId, CrsCode, Sem) Teaching = Transcript StudId, Transcript.CrsCode, Transcript.Sem, Grade, ProfId (Transcript CrsCode=CrsCodeANDSem=Sem Teaching ) [StudId, CrsCode, Sem, Grade, ProfId]
Natural Join (cont’d) • More generally: R S = attr-list(join-cond(R × S) ) where attr-list = attributes (R) attributes (S) (duplicates are eliminated) and join-cond has the form: A1 = A1AND … ANDAn = An where {A1 … An} = attributes(R) attributes(S)
Natural Join Example • List all Ids of students who took at least two different courses: StudId( CrsCode CrsCode2 ( Transcript Transcript[StudId, CrsCode2, Sem2, Grade2] )) We don’t want to join on CrsCode, Sem, and Grade attributes, hence renaming!
Example Data Instance STUDENT COURSE Takes PROFESSOR Teaches
Natural Join and Intersection Natural join: special case of join where is implicit – attributes with same name must be equal: STUDENT ⋈ Takes ´ STUDENT ⋈STUDENT.sid = Takes.sid Takes Intersection: as with set operations, derivable from difference A B B-A A-B A B
Division • A somewhat messy operation that can be expressed in terms of the operations we have already defined • Used to express queries such as “The fid's of faculty who have taught all subjects” • Paraphrased: “The fid’s of professors for which there does not exist a subject that they haven’t taught”
Division Using Our Existing Operators • All possible teaching assignments: Allpairs: • NotTaught, all (fid,subj) pairs for which professor fidhas not taught subj: • Answer is all faculty not in NotTaught: fid,subj (PROFESSOR £subj(COURSE)) Allpairs - fid,subj(Teaches ⋈ COURSE) fid(PROFESSOR) - fid(NotTaught) ´ fid(PROFESSOR) - fid( fid,subj (PROFESSOR £subj(COURSE)) - fid,subj(Teaches ⋈ COURSE))
Division: R1¸ R2 • Requirement: schema(R1) ¾ schema(R2) • Result schema: schema(R1) – schema(R2) • “Professors who have taught all courses”: • What about “Courses that have been taught by all faculty”? fid (fid,subj(Teaches ⋈ COURSE) ¸subj(COURSE))
Division • Goal: Produce the tuples in one relation, r, that match all tuples in another relation, s • r (A1, …An, B1, …Bm) • s (B1 …Bm) • r/s, with attributes A1, …An, is the set of all tuples <a> such that for every tuple <b> ins,<a,b> is in r • Can be expressed in terms of projection, set difference, and cross-product
Division - Example • List the Ids of students who have passed all courses that were taught in spring 2000 • Numerator: • StudId and CrsCode for every course passed by every student: StudId, CrsCode (Grade ‘F’ (Transcript) ) • Denominator: • CrsCode of all courses taught in spring 2000 CrsCode(Semester=‘S2000’ (Teaching) ) • Result is numerator/denominator
Relational Calculus • Important features: • Declarative formal query languages for relational model • Based on the branch mathematical logic known as predicate calculus • Two types of RC: • 1) tuple relational calculus • 2) domain relational calculus • A single statement can be used to perform a query
Tuple Relational Calculus • based on specifying a number of tuple variables • a tuple variable refers to any tuple
Generic Form • {t | COND (t)} • where • t is a tuple variable and • COND(t) is Boolean expression involving t
Simple example 1 • To find all employees whose salary is greater than $50,000 • {t| EMPLOYEE(t) and t.Salary>5000} • where • EMPLOYEE(t) specifies the range of tuple variable t • The above operation selects all the attributes
Simple example 2 • To find only the names of employees whose salary is greater than $50,000 • {t.FNAME, t.NAME| EMPLOYEE(t) and t.Salary>5000} • The above is equivalent to • SELECT T.FNAME, T.LNAME • FROM EMPLOYEE T • WHERE T.SALARY > 5000
Elements of a tuple calculus • In general, we need to specify the following in a tuple calculus expression: • Range Relation (I.e, R(t)) = FROM • Selected combination= WHERE • Requested attributes= SELECT
More Example:Q0 • Retrieve the birthrate and address of the employee(s) whose name is ‘John B. Smith’ • {t.BDATE, t.ADDRESS| EMPLOYEE(t) AND t.FNAME=‘John’ AND t.MINIT=‘B” AND t.LNAME=‘Smith}
Formal Specification of tuple Relational Calculus • A general format: • {t1.A1, t2.A2,…,tn.An |COND ( t1 ,t2 ,…, tn, tn+1, tn+2,…,tn+m)} • where • t1,…,tn+m are tuple var • Ai : attributeR(ti) • COND (formula) • Where COND corresponds to statement about the world, which can be True or False
Elements of formula • A formula is made of Predicate Calculus atoms: • an atom of the from R(ti) • ti.A op tj.B op{=, <,>,..} • F1 And F2 where F1 and F2 are formulas • F1 OR F2 • Not (F1) • F’=(t) (F) or F’= (t) (F) • Y friends (Y, John) • X likes(X, ICE_CREAM)
Example Queries Using the Existential Quantifier • Retrieve the name and address of all employees who work for the ‘ Research ’ department • {t.FNAME, t.LNAME, t.ADDRESS| EMPLOYEE(t) AND ( d) (DEPARTMENT (d) AND d.DNAME=‘Research’ AND d.DNUMBER=t.DNO)}
More Example • For every project located in ‘Stafford’, retrieve the project number, the controlling department number, and the last name, birthrate, and address of the manger of that department.
Cont. • {p.PNUMBER,p.DNUM,m.LNAME,m.BDATE, m.ADDRESS|PROJECT(p) and EMPLOYEE(M) and P.PLOCATION=‘Stafford’ and ( d) (DEPARTMENT(D) AND P.DNUM=d.DNUMBER and d.MGRSSN=m.SSN))}
Logical Equivalences • There are two logical equivalences that will be heavily used: • pq p q (Whenever p is true, q must also be true.) • x. p(x) x. p(x) (p is true for all x) • The second can be a lot easier to check!
Normalization • Review on Keys • superkey: a set of attributes which will uniquely identify each tuple in a relation • candidate key: a minimal superkey • primary key: a chosen candidate key • secondary key: all the rest of candiate keys • prime attribute: an attribute that is a part of a candidate key (key column) • nonprime attribute: a nonkey column
Normalization • Functional Dependency Type by Keys • ‘whole (candidate) key nonprime attribute’: full FD (no violation) • ‘partial key nonprime attribute’: partial FD (violation of 2NF) • ‘nonprime attribute nonprime attribute’: transitive FD (violation of 3NF) • ‘not a whole key prime attribute’: violation of BCNF
Functional Dependencies • Let R be a relation schema R and R • The functional dependency holds onR iff for any legal relations r(R), whenever two tuples t1and t2 of r have same values for , they have same values for . t1[] = t2 [] t1[ ] = t2 [ ] • On this instance, AB does NOT hold, but BA does hold. A B • 4 • 1 5 • 3 7
1. Closure • Given a set of functional dependencies, F, its closure, F+ , is all FDs that are implied by FDs in F. • e.g. If A B, and B C, • then clearly A C
Armstrong’s Axioms • We can find F+ by applying Armstrong’s Axioms: • if , then (reflexivity) • if , then (augmentation) • if , and , then (transitivity) • These rules are • sound (generate only functional dependencies that actually hold) and • complete (generate all functional dependencies that hold).
Additional rules • If and , then (union) • If , then and (decomposition) • If and , then (pseudotransitivity) The above rules can be inferred from Armstrong’s axioms.
Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • Some members of F+ • A H • by transitivity from A B and B H • AG I • by augmenting A C with G, to get AG CG and then transitivity with CG I • CG HI • by augmenting CG I to infer CG CGI, and augmenting of CG H to inferCGI HI, and then transitivity
2. Closure of an attribute set • Given a set of attributes A and a set of FDs F, closure of A under F is the set of all attributes implied by A • In other words, the largest B such that: • A B • Redefining super keys: • The closure of a super key is the entire relation schema • Redefining candidate keys: • 1. It is a super key • 2. No subset of it is a super key
Computing the closure for A • Simple algorithm • 1. Start with B = A. • 2. Go over all functional dependencies, , in F+ • 3. If B, then • Add to B • 4. Repeat till B changes
Example • R = (A, B, C, G, H, I)F = { A BA CCG HCG IB H} • (AG) + ? • 1. result = AG 2. result = ABCG (A C and A B) 3. result = ABCGH (CG H and CG AGBC) 4. result = ABCGHI (CG I and CG AGBCH Is (AG) a candidate key ? 1. It is a super key. 2. (A+) = BC, (G+) = G. YES.
Uses of attribute set closures • Determining superkeys and candidate keys • Determining if A B is a valid FD • Check if A+ contains B • Can be used to compute F+