680 likes | 827 Views
The Relational Model. CS145 Lecture 2 September 24, 2014. Announcements. Piazza works! There are links on the page Please please sign up. Almost all course announcements on Piazza.
E N D
The Relational Model CS145 Lecture 2 September 24, 2014
Announcements • Piazza works! There are links on the page • Please please sign up. Almost all course announcements on Piazza. • Students with documented disabilities should send in their accommodation letter from O.A.E. (Office of Accessible Education) by the end of this week to Firas at fabuzaid@stanford.edu. • All quizzes for EdX are online • They may cover slightly different material… Take them! • Quizzes are due either before midterm or final day. • Problem sets will be posted on the main web page. • Direct link to this class video is up • Firas is doing a great job.
Outline • Motivation and History • The Relational (data) model • Relational Algebra
Motivation Relational model is precise, implementable, and we can operate on it (query/update, etc.) It also happens to be the dominant paradigm for managing data in the world.
A little history Relational model due to Edgar “Ted” Codd, a mathematician at IBM in 1970 A Relational Model of Data for Large Shared Data Banks". Communications of the ACM 13 (6): 377–387 Won Turing award 1981 IBM didn’t want to use relational model (take money from IMS)
Outline • Motivation and History • The Relational (data) model • Relational Algebra
The Relational Model: Schemata • Part (I) Relational Schema: • Students(sid: string, name: string, gpa: float) Attributes Relation name string, float, int are the domains of the attributes
The Relational Model II: Data • Students(sid: string, name: string, gpa: float) Attributes # of attributes is the arity NB: In practice, DB systems relax the set requirement. Why? Tuples / Records # of tuples is the cardinality A relational instance is a set of tuples all conforming to the same schema
To reiterate • A relational schema describes the data that is contained in a relational instance A relational instance is a set of tuples of the same type. Let R(f1:D1,…,fm:Dm) be a relational schemathen, an instance of R is a subset of Dom1 x Dom2 x … x Domn A tuple viewed as a total function from attribute names to types (names important)
One more time • A relational schema describes the data that is contained in a relational instance • A relation R is of arity t is a function: • R : D1 x … x Dt {0,1} The schema is simply the signature of the function. In this model : order matters, but attribute name doesn’t. In other model: names matter, but order doesn’t
A relational database • A relational database schema is a set of relational schemata, one for each relation • A relational database instance is a set of relational instances, one for each relation • Two conventions: • We call instances as simply databases • We assume all instances are valid, i.e., satisfy the domain constraints
Remember the CMS Domain Constraints: “Apple” is not a valid GPA; no instance contains it. • Relation DB Schema • Students(sid: string, name: string, gpa: float) • Courses(cid: string, cname: string, credits: int) • Enrolled(sid: string, cid: string, grade: string) Relation Instances Students Courses Enrolled
Ok, but you said this was operational… • SQL Data Definition Language (DDL) is one method of defining relations • In SQL, Relation → table • Students(sid: string, name: string, gpa: float) CREATE TABLE Students ( sid CHAR(20), Name CHAR(50), Gpa REAL); This awesome prehistoric syntax says same thing!
2nd Part of the Model: Querying SELECT S.name FROM Students S WHERE S.GPA > 3.5 “Find names of all students with GPA > 3.5” We will spend more time on this! We don’t tell the system how or where to get the data. The query is declarative Actually, I showed how to do this translation for a much richer language!
Virtues of the model • Physical independence (logical too), Declarative • Simple, elegant clean: Everything is a relation • Why did it take multiple years? • Doubted it could be done efficiently.
RDBMS Architecture How does a SQL engine work ? • SQL query relational algebra plan • Relational algebra plan Optimized plan • Execute each operator of the plan This week we go top-down to learn how this works We cover Ch. 4 and start Ch.14
Relational Algebra • Formalism for creating new relations from existing ones • Its place in the big picture: Declartivequerylanguage Algebra Implementation Relational algebraRelational bag algebra SQL,relational calculus
Relational Algebra • Five operators: • Union: • Difference: - • Selection:s • Projection: P • Cartesian Product: • Derived or auxiliary operators: • Intersection, complement • Joins (natural,equi-join, theta join, semi-join) • Renaming:r • Division
1. Union and 2. Difference • R1 R2 • Example: • ActiveEmployees RetiredEmployees • R1 – R2 • Example: • AllEmployees -- RetiredEmployees
What about Intersection ? • It is a derived operator • R1 R2 = R1 – (R1 – R2) • Also expressed as a join (will see later) • Example • UnionizedEmployees RetiredEmployees
3. Selection • Returns all tuples which satisfy a condition • Notation: sc(R) • Examples • sSalary > 40000(Employee) • sname = “Smith”(Employee) • The condition c can be =, <, , >,, <>
4. Projection • Eliminates columns, then removes duplicates • Notation: P A1,…,An(R) • Example: project social-security number and names: • PSSN, Name (Employee) • Output schema: Answer(SSN, Name)
5. Cartesian Product • Each tuple in R1 with each tuple in R2 • Notation: R1 R2 • Example: • Employee Dependents • Very rare in practice; mainly used to express joins
Relational Algebra • Five operators: • Union: • Difference: - • Selection: s • Projection: P • Cartesian Product: • Derived or auxiliary operators: • Intersection, complement • Joins (natural,equi-join, theta join, semi-join) • Renaming: r
Renaming • Changes the schema, not the instance • Notation: rB1,…,Bn (R) • Example: • rLastName, SocSocNo (Employee) • Output schema: Answer(LastName, SocSocNo)
Renaming Example Employee Name SSN John 999999999 Tony 777777777 • LastName, SocSocNo (Employee) LastName SocSocNo John 999999999 Tony 777777777
Natural Join • Notation: R1 || R2 • Meaning: R1 || R2 = PA(sC(R1 R2)) • Where: • The selection sCchecks equality of all common attributes • The projection eliminates the duplicate common attributes
Employee Dependents = PName, SSN, Dname(sSSN=SSN2(Employee xrSSN2, Dname(Dependents)) Natural Join Example Employee Name SSN John 999999999 Tony 777777777 Dependents SSN Dname 999999999 Emily 777777777 Joe Name SSN Dname John 999999999 Emily Tony 777777777 Joe
Natural Join • R= S= • R || S=
Natural Join • Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R || S ? • Given R(A, B, C), S(D, E), what is R || S ? • Given R(A, B), S(A, B), what is R || S ?
Theta Join • A join that involves a predicate • R1 ||q R2 = sq (R1 R2) • Here q can be any condition
Eq-join • A theta join where q is an equality • R1 ||A=B R2 = s A=B (R1 R2) • Example: • Employee ||SSN=SSN Dependents • Most useful join in practice
Semijoin • R | S = PA1,…,An (R || S) • Where A1, …, An are the attributes in R • Example: • Employee | Dependents
Semijoins in Distributed Databases • Semijoins are used in distributed databases Dependents Employee network Employee ||ssn=ssn (s age>71 (Dependents)) T = PSSNs age>71 (Dependents) R = Employee | T Answer = R || Dependents
seller-ssn=ssn pid=pid buyer-ssn=ssn Complex RA Expressions Pname Person Purchase Person Product Pssn Ppid sname=fred sname=gizmo
Division • Not a primitive operator: “find sailors who have reserved all boats” • Let A have 2 fields x,y and B have one field then y then • A/B = { x | (x,y) in A . Forally in B } • A/B = C implies BC <= A (and the largest)
Expressing A/B using other ops • Idea: For A/B, compute all x values that are disqualified by some y value in B πx((πx(A) times B)) – A Then remove these from the set.
Quiz (Not really)Express these in RA • “Find names of Sailors who’ve reserved boat #103” • “Find names of sailors who’ve reserved a red boat” • “Find sailors who reserved a red or green boat” Sailors(sid,sname,rating,age) Reserves(sid,bid,date) Boats(bid,bname,color)
Operations on Bags A bag = a set with repeated elements All operations need to be defined carefully on bags • {a,b,b,c}{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f} • {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b} • sC(R): preserve the number of occurrences • PA(R): no duplicate elimination • Cartesian product, join: no duplicate elimination Important ! Relational Engines work on bags, not sets !
Finally: RA has Limitations ! • Cannot compute “transitive closure” • Find all direct and indirect relatives of Fred • Cannot express in RA !!! • Need to write C program, use a graph engine, or modern SQL…
Relational Algebra • Five operators: • Union: • Difference: - • Selection:s • Projection: P • Cartesian Product: • Derived or auxiliary operators: • Intersection, complement • Joins (natural, equi-join, theta join, semi-join) • Renaming:r • Division Maps sets of tuples to sets of tuples Why are joins typically included?
Operations on Bags A bag = a set with repeated elements All operations need to be defined carefully on bags • {a,b,b,c}{a,b,b,b,e,f,f}={a,a,b,b,b,b,b,c,e,f,f} • {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b} • sC(R): preserve the number of occurrences • PA(R): no duplicate elimination • Cartesian product, join: no duplicate elimination Important ! Relational Engines work on bags, not sets !
Logical Equivalence of Plans R(A,B) S(B,C) s[A=5] (p[A] R) = p[A] (s[A=5] R) Here, projection and selection commute p[B] (s[A=5] R)? Can we play the same game here?
A simple plan p[B] R(A,B) S(B,C) What SQL query does this correspond to?
Pushing down projection p[B] p[B] p[B] R(A,B) S(B,C) R(A,B) S(B,C) Why might we prefer this plan?
Takeaways • This process is called logical optimization • Many equivalent plans used to search for “good plans” (more later) • Relational Algebra is an important abstraction.