6.830/6.814 Lecture 3 Sam Madden Relational Algebra and Normalization Sept 10, 2014
Relational Algebra
Notation: ||R|| = number of attributes in R, |R| = number of rows in R
Projection: π(R, c1, …, cn) = π_{c1…cn} R
  select a subset c1 … cn of the columns of R
Selection: σ(R, pred) = σ_pred R
  select the subset of rows of R that satisfy pred
Cross product (aka Cartesian product): R1 X R2
  combine R1 and R2, producing a new relation with ||R1|| + ||R2|| attributes and |R1| * |R2| rows
Join: ⨝(R1, R2, pred) = R1 ⨝_pred R2 = σ_pred(R1 X R2)
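The four operators can be sketched in a few lines of Python. This is only an illustration, assuming a relation is a list of dicts; the sample rows and the column name kname are made up for the example and are not from the lecture.

```python
from itertools import product

def project(rel, *cols):
    """Projection: keep only the named columns, dropping duplicate rows."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[c] for c in cols)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(cols, key)))
    return out

def select(rel, pred):
    """Selection: keep only the rows for which pred(row) is true."""
    return [row for row in rel if pred(row)]

def cross(r1, r2):
    """Cross product: pair every row of r1 with every row of r2.
    Assumes the two relations have disjoint column names."""
    return [{**a, **b} for a, b in product(r1, r2)]

def join(r1, r2, pred):
    """Join, defined as a selection over the cross product."""
    return select(cross(r1, r2), pred)

# Hypothetical sample data.
animals = [
    {"name": "tiger", "cageno": 1, "keptby": 10},
    {"name": "zebra", "cageno": 2, "keptby": 20},
    {"name": "lemur", "cageno": 3, "keptby": 10},
]
keepers = [{"kid": 10, "kname": "joe"}, {"kid": 20, "kname": "ann"}]

pairs = cross(animals, keepers)
joined = join(animals, keepers, lambda r: r["keptby"] == r["kid"])
joe_cages = project(select(joined, lambda r: r["kname"] == "joe"), "cageno")

print(len(pairs), len(pairs[0]))  # |R1| * |R2| rows, ||R1|| + ||R2|| attributes
print(joe_cages)
```

Computing a join as a filter over the full cross product mirrors the definition, but real systems avoid materializing the cross product.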
Relational Algebra ↔ SQL
• SELECT list → projection
• FROM list → all tables referenced
• WHERE → selection and join predicates
Any one SQL query has many equivalent relational algebra expressions (due to relational identities):
• Join reordering
• Select reordering
• Select pushdown
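Select pushdown can be illustrated with two plans that return the same rows but touch different numbers of candidate pairs. This is a rough sketch over lists of dicts; the data and column names are invented for the example.

```python
from itertools import product

def select(rel, pred):
    """Keep rows satisfying pred."""
    return [r for r in rel if pred(r)]

def join(r1, r2, pred):
    """Nested-loop join: test pred on every pair of rows."""
    return [{**a, **b} for a, b in product(r1, r2) if pred({**a, **b})]

# Hypothetical data: six animals spread over three keepers.
animals = [{"aname": f"a{i}", "keptby": i % 3} for i in range(6)]
keepers = [
    {"kid": 0, "kname": "joe"},
    {"kid": 1, "kname": "ann"},
    {"kid": 2, "kname": "bob"},
]

is_joe = lambda r: r["kname"] == "joe"
on_keeper = lambda r: r["keptby"] == r["kid"]

# Plan A: join first, then filter on the keeper's name.
plan_a = select(join(animals, keepers, on_keeper), is_joe)

# Plan B (select pushdown): filter keepers first, then join the smaller input.
joe_only = select(keepers, is_joe)
plan_b = join(animals, joe_only, on_keeper)

pairs_a = len(animals) * len(keepers)    # pairs examined by plan A's join
pairs_b = len(animals) * len(joe_only)   # pairs examined by plan B's join
print(plan_a == plan_b, pairs_a, pairs_b)
```

Both plans are valid translations of the same SQL query; choosing between them is the optimizer's job.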
Example
animals(name, age, species, cageno, keptby, feedtime)
keepers(kid, name)
Cages kept by Joe:
π_cageno(σ_keepers.name='joe'(animals ⨝_keptby=kid keepers))
SELECT cageno FROM keepers, animals WHERE keptby = kid AND keepers.name = 'joe'
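To try the query, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The column types and the sample rows are assumptions; only the table and column names come from the slide. An ORDER BY is added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keepers(kid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE animals(name TEXT, age INTEGER, species TEXT,
                         cageno INTEGER, keptby INTEGER, feedtime TEXT);
    -- Hypothetical sample data.
    INSERT INTO keepers VALUES (1, 'joe'), (2, 'ann');
    INSERT INTO animals VALUES
        ('tony',  5, 'tiger', 10, 1, '09:00'),
        ('zed',   3, 'zebra', 11, 2, '10:00'),
        ('lenny', 2, 'lemur', 12, 1, '11:00');
""")

# Both tables have a column called name, so it must be qualified.
rows = conn.execute("""
    SELECT cageno
    FROM keepers, animals
    WHERE keptby = kid AND keepers.name = 'joe'
    ORDER BY cageno
""").fetchall()

cages = [r[0] for r in rows]
print(cages)
```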
Multiple Feedtimes
animals: (name STRING, cageno INT, keptby INT, age INT, feedtime TIME)
CREATE TABLE feedtimes(aname STRING, feedtime TIME);
ALTER TABLE animals RENAME TO animals2;
ALTER TABLE animals2 DROP COLUMN feedtime;
CREATE VIEW animals AS
  SELECT name, cageno, keptby, age,
    (SELECT feedtime FROM feedtimes WHERE aname = name LIMIT 1) AS feedtime
  FROM animals2;
Views enable logical data independence by emulating the old schema in the new schema.
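The effect of the view can be checked with a small sqlite3 sketch. To stay independent of which ALTER TABLE forms a given SQLite version supports, it creates the new schema directly instead of renaming and dropping a column. The sample rows are made up, and an ORDER BY is added inside the subquery (not on the slide) so that LIMIT 1 picks a deterministic row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- New schema: feed times live in their own table.
    CREATE TABLE animals2(name TEXT, cageno INTEGER, keptby INTEGER, age INTEGER);
    CREATE TABLE feedtimes(aname TEXT, feedtime TEXT);

    -- View that presents the old single-feedtime schema.
    CREATE VIEW animals AS
        SELECT name, cageno, keptby, age,
               (SELECT feedtime FROM feedtimes
                WHERE aname = name
                ORDER BY feedtime LIMIT 1) AS feedtime
        FROM animals2;

    -- Hypothetical sample data: tony now has two feed times.
    INSERT INTO animals2 VALUES ('tony', 10, 1, 5), ('zed', 11, 2, 3);
    INSERT INTO feedtimes VALUES
        ('tony', '09:00'), ('tony', '17:00'), ('zed', '10:00');
""")

# An old-style query against "animals" still works unchanged.
old_style = conn.execute(
    "SELECT name, feedtime FROM animals ORDER BY name"
).fetchall()
print(old_style)
```

Old applications keep reading one feed time per animal, while new ones can query feedtimes directly for all of them.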
Study Break #1
Schema:
classes: (cid, c_name, c_rid, …)
rooms: (rid, bldg, …)
students: (sid, s_name, …)
takes: (t_sid, t_cid)
Questions
1) What SQL query is this expression equivalent to?
   π_bldg(rooms ⨝_rid=c_rid (σ_c_name='6.830' classes))
2) Write a relational algebra expression equivalent to:
   SELECT s_name FROM students, takes, classes WHERE t_sid = sid AND t_cid = cid AND c_name = '6.830'
   a) Are there other possible expressions?
   b) Do you think one would be more "efficient" to execute? Why?
Hobby Schema
Table key is (Hobby, SSN)
"Wide" schema: has redundancy and anomalies in the presence of updates, inserts, and deletes
Entity Relationship Diagram
[Figure: ER diagram with entities Person and Hobby joined by an n:n relationship; attribute labels SSN, Name, Address, Cost, Name]
Boyce-Codd Normal Form (BCNF)
A set of relations is in BCNF if:
For every functional dependency X → Y in a set of functional dependencies F over a relation R, X is a superkey of R (where superkey means that X contains a key of R)
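The superkey test can be sketched with an attribute-closure computation. The relation below uses single letters for the hobby schema (S = SSN, H = Hobby, N = Name, A = Addr, C = Cost), and the two FDs (S → N, A and H → C) are assumed for illustration. As a simplification, only the FDs given explicitly are checked, and trivial FDs are skipped.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """List the nontrivial FDs X -> Y on relation where X is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if lhs <= relation and rhs <= relation     # FD lies within this relation
        and not rhs <= lhs                         # skip trivial FDs
        and not relation <= closure(lhs, fds)      # X does not determine all of R
    ]

hobbies = frozenset("SHNAC")
# Assumed FDs: SSN -> Name, Addr and Hobby -> Cost.
fds = [
    (frozenset("S"), frozenset("NA")),
    (frozenset("H"), frozenset("C")),
]

key_closure = closure({"S", "H"}, fds)   # (SSN, Hobby) determines everything
violations = bcnf_violations(hobbies, fds)
print(sorted(key_closure), len(violations))
```

Under these assumed FDs, neither S nor H alone is a superkey of the wide table, so both FDs violate BCNF.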
BCNFify
Start with one "universal relation"
While some relation R is not in BCNF:
  Find an FD X → Y that violates BCNF on R
  Split R into R1 = (X ∪ Y), R2 = R - Y
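The loop above can be sketched in Python as follows. The attribute letters (S = SSN, H = Hobby, N = Name, A = Addr, C = Cost) and the FDs S → N, A and H → C are assumptions for illustration. The sketch only checks the FDs given explicitly, restricted to each sub-relation, which is a simplification of a full decomposition procedure.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnfify(universal, fds):
    """Split the universal relation until no given FD violates BCNF."""
    done, todo = [], [frozenset(universal)]
    while todo:
        r = todo.pop()
        violation = None
        for x, y in fds:
            y_in_r = (y & r) - x          # nontrivial part of Y inside R
            if x <= r and y_in_r and not r <= closure(x, fds):
                violation = (x, y_in_r)   # X is not a superkey of R
                break
        if violation is None:
            done.append(r)
        else:
            x, y = violation
            todo.append(x | y)            # R1 = X ∪ Y
            todo.append(r - y)            # R2 = R - Y (X stays in R2)
    return done

# Assumed FDs: SSN -> Name, Addr and Hobby -> Cost.
fds = [
    (frozenset("S"), frozenset("NA")),
    (frozenset("H"), frozenset("C")),
]
result = bcnfify("SHNAC", fds)
print(sorted("".join(sorted(r)) for r in result))
```

With these assumptions the wide table splits into a person relation (S, N, A), a hobby relation (H, C), and a relation (S, H) recording who has which hobby.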
BCNFify Example for Hobbies
S = SSN, H = Hobby, N = Name, A = Addr, C = Cost
[Figure: decomposition shown over three iterations (Iter 1, Iter 2, Iter 3), with annotations marking the key and the FDs that violate BCNF]
Study Break #2
• Patient database
• Want to represent patients at hospitals with doctors
• Patients have names, birthdates
• Doctors have names, specialties
• Hospitals have names, addresses
• One doctor can treat multiple patients, each patient has one doctor
• Each patient is in one hospital, hospitals have many patients
• Doctors work for one hospital, hospitals have many doctors
1) Draw an ER diagram
2) What are the functional dependencies?
3) What is the normalized schema? Is it redundancy free?