Lecture 4: Relational algebra

Lecture 4:Relational algebra www.cl.cam.ac.uk/Teaching/current/Databases/

Today’s lecture • What’s the (core) relational algebra? • How can we write queries using the relational algebra? • How powerful is the relational algebra?

Relational query languages • Query languages allow the manipulation and retrieval of data from a database • The relational model supports simple, powerful query languages • Strong formal foundation • Allows for much (provably correct) optimisation • NOTE: Query languages are not (necessarily) programming languages

Formal relational query languages • Two formal query languages • Relational Algebra • Simple ‘operational’ model, useful for expressing execution plans • RelationalCalculus • Logical model (‘declarative’), useful for theoretical results • Both languages were introduced by Codd in a series of papers • They have equivalent expressive power They are the key to understanding SQL query processing!

Preliminaries • A query is applied to relation instances, and the result of a query is also a relation instance • Schema of relations are fixed (cf. types) • The query will then execute over any valid instance • The schema of the result can also be determined

Example relation instances • A database of boats, sailors, and reservations S2 R1 S1 B1

Core relational algebra • Five basic operator classes: • Selection • Selects a subset of rows • Projection • Picking certain columns • Renaming • Renaming attributes • Set theoretic operations • The familiar operations: union, intersection, difference, … • Products and joins • Combining relations in useful ways

Selection • Selects rows that satisfy a condition, written R1 = c(R2) • where c is a condition involving the attributes of R2, e.g. rating>8(S2) returns the relation instance

Selection cont. • Note: • The schema of the result is exactly the same as the schema of the input relation instance • There are no duplicates in the resulting relation instance (why?) • The resulting relation instance can be used as the input for another relational algebra operator, e.g. sname=“Julia”(rating>8(S2))

Projection Deletes fields that are not in the projection list R1=A(R2) where A is a list of attributes from the schema of R2, e.g. sname,rating(S2) returns the relation instance

Projection cont. • Note: • Projection operator has to eliminate duplicates (why?) • Aside: Real systems don’t normally perform duplicate elimination unless the user explicitly asks for it (why not?)

Renaming R1= A:=B(R2) • Returns a relation instance identical to R2 except that field A is renamed B • For example, sname:=nom(S1)

Familiar set operations • We have the familiar set-theoretic operators, e.g. , , - • There is a restriction on their input relation instances: they must be union compatible • Same number of fields • Same field names and domains • E.g. S1S2 is valid, but S1R1 is not

Cartesian products AB • Concatenate every row of A with every row of B • What do we do if A and B have some field names in common? • Several choices, but we’ll simply assume that the resulting duplicate field names will have the suffix 1 and 2

Note! Example S1R1

Theta join • Theoretically, it is a derived operator R1Vc R2@c(R1R2) • E.g., S1 Vsid.1<=sid.2R1

Theta join cont. • The result schema is the same as for a cross-product • Sometimes this operator is called a conditional join • Most commonly the condition is an equality on field names, e.g. S1 Vsid.1=sid.2R1

Equi- and natural join • Equi-join is a special case of theta join where the condition is equality of field names, e.g. S1 Vsid R1 • Natural join is an equi-join on all common fields where the duplicate fields are removed. It is written simply A V B

Natural join cont. • Note that the common fields appear only once in the resulting relation instance • This operator appears very frequently in real-life queries • It is always implemented directly by the query engine (why?)

Division • Not a primitive operator, but useful to express queries such as Find sailors who have reserved all the boats • Consider the simple case, where relation A has fields x and y, and relation B has field y • A/B is the set of xs (sailors) such that for every y (boat) in B, there is a row (x,y) in A

Division cont. • Can you code this up in the relational algebra?

Division cont. • Can you code this up in the relational algebra? x’s that are disqualified: x((x(A)  B) – A) Thus: x(A)-x((x(A)  B) – A)

Example 1 Find names of sailors who’ve reserved boat 103 Solution 1:sname(bid=103(Reserves) V Sailors) Solution 2: sname(bid=103(Reserves V Sailors)) Which is more efficient? Query optimisation

Example 2 Find names of sailors who’ve reserved a red boat

Example 2 Find names of sailors who’ve reserved a red boat sname(colour=“red”(Boats) V Reserves V Sailors) Better: sname(sid(bid(colour=“red”(Boats)) V Reserves) V Sailors)

Example 3 Find sailors who’ve reserved a red or a green boat

Example 3 Find sailors who’ve reserved a red or a green boat let T = colour=“red”colour=“green”(Boats) in sname(T V Reserves V Sailors)

Example 4 Find sailors who’ve reserved a red and a green boat

Example 4 Find sailors who’ve reserved a red and a green boat NOTE: Can’t just trivially modify last solution! let T1 = sid (colour=“red”(Boats) V Reserves) T2 = sid (colour=“green”(Boats) V Reserves) in sname((T1  T2) V Sailors)

Example 5 Find the names of sailors who’ve reserved at least two boats let T = sid.1:=sid (sid.1,sname,bid (Sailors V Reserves)) in sname.1 (sid.1=sid.2bid.1bid.2(T  T))

Example 6 Find the names of sailors who’ve reserved all boats let T = sid,bid (Reserves) / bid (Boats) in sname(T V Sailors)

Computational limitations • Suppose we have a relation SequelOf of movies and their immediate sequels • We want to compute the relation ‘isFollowedBy’ …

Computational limitations • We could compute fst,thd(movie:=fst,sequel:=snd(SequelOf) V movie:=snd,sequel:=thd(SequelOf)) • This provides us with sequels-of-sequels • We could write three joins to get sequels-of- sequels-of-sequels and union the results • What about Friday the 13th (9 sequels)?  • In general we need to be able to write an arbitrarily large union… • The relational algebra needs to be extended to handle these sorts of queries

Summary You should now understand: • The core relational algebra • Operations and semantics • Union compatibility • Computational limitations of the relational algebra Next lecture: Relational calculus

Lecture 4: Relational algebra