The Relational Model: Relational Calculus

The Relational Model: Relational Calculus CSE330/CIS550 Handout 2

Relational Calculus • First-order logic (FOL) can also be thought of as a query language, and can be used in two ways: • Tuple relational calculus • Domain relational calculus • The difference is the level at which variables are used: for attributes (domains) or for tuples. • The calculus is non-procedural (declarative) as compared to the algebra. CSE330/CIS550 Handout 2

Domain relational calculus • Queries have form: {<x1,x2, …, xn>|p} where x1,x2, …, xn are domain variables and p is a predicate which may mention the variables x1,x2, …, xn • Example: simple projection {<RN,H>|RI,G,R. <RI,RN,G,R,H>Routes} • Example: selection and projection: {<RN,H>|RI,G,R. <RI,RN,G,R,H>Routes  G >15} CSE330/CIS550 Handout 2

DRC examples, cont • Join: {<CI,R>|RI,RN,G,H,RI’,Da,Du. <RI,RN,G,R,H>Routes <CI,RI’,Da,Du>Climbs  RI=RI’} We could also have written the above as: {<CI,R>|RI,RN,G,H,Da,Du. <RI,RN,G,R,H>Routes <CI,RI,Da,Du>Climbs} CSE330/CIS550 Handout 2

Predicate Logic - a quick review • The syntax of predicate logic starts with variables, constants and predicates that can be built using a collection of boolean-valued operators (boolean expressions) • Examples:1=2, x  y, prime(x), contains(t,”Joe”). Precisely what operations are available depends on the domain and on the query language. • For now we will assume the following boolean expressions: • <X,Y,…> Rel, X op Y, X op constant, or constant op X, where op is , , , , ,  and X,Y,… are domain variables CSE330/CIS550 Handout 2

Predicate Logic, cont. • Starting with these basic predicates (also called atomic) , we can build up new predicates by the following rules: • Logical connectives: If p and q are predicates, then so are pq, pq, p, and pq • (x>2) (x<4) • (x>2)  (x>0) • Existential quantification: If p is a predicate, then so isx.p • x. (x>2) (x<4) • Universal quantification: If p is a predicate, then so isx.p • x.x>2 • x. y.y>x CSE330/CIS550 Handout 2

Logical Equivalences • There are two logical equivalences that will be heavily used: • pq p  q (Whenever p is true, q must also be true.) • x. p(x)  x. p(x) (p is true for all x) • The second will be especially important when we study SQL. CSE330/CIS550 Handout 2

Free and bound variables • A variable v is bound in a predicate p when p is of the form v… orv… • A variable occurs free in p if it occurs in a position where it is not bound by an enclosing or  • Examples: • x is free in x>2 • x is bound inx.x>y • x is free in (x>17)  (x.x>2) • Note that there are two occurrences of x in the last example. CSE330/CIS550 Handout 2

Renaming variables • When a variable is boundone can replace it with some other variable without altering the meaning of the expression, providing there are no name clashes • Example: x.x>2 is equivalent toy.y>2 CSE330/CIS550 Handout 2

Some queries… • Try the following examples: • The names and ages of climbers • The names and ages of climbers who have climbed route 214 • The names of climbers who have climbed “Last Tango” • The names of climbers who have climbed all routes with rating greater than 15 • The names of climbers who have climbed the same route twice CSE330/CIS550 Handout 2

Safety • There is a problem with what we have done so far. How should we treat a query like: {<CI,CN,S,A>|<CI,CN,S,A>  Climbers>} • This presumably means the set of all four-tuples that are not climbers, which is presumably an infinite set (and unsafe query). • A query is safe if no matter how we instantiate the relations, it always produces a finite answer. • In particular, the query should be domain independent, meaning that the answer is the same regardless of the domain in which it is evaluated. • Unfortunately, both this definition of safety and domain independence are semantic conditions, and are undecidable. CSE330/CIS550 Handout 2

Syntactic Safety • There are syntactic conditions that are used to define “safe” formulas. In particular: • Every “safe” formula is domain independent. • It is implementable. • The formulas that are expressible in real query languages based on relational calculus are all “safe” • The definition is complicated, and is not in the text book. It can be found in Ullman’s book on databases (Principles of Database and Knowledge-Base Systems). CSE330/CIS550 Handout 2

Translating from RA to DRC • Recall that the relational algebra consists of , , , x, -. We need to work our way through the structure of an RA expression, translating each possible form. • Let TR[e] be the translation of RA expression e into DRC. • Relation names: For the RA expression R, the DRC expression is {<x1,x2, …, xn>| <x1,x2, …, xn> R} CSE330/CIS550 Handout 2

Selection • Suppose the RA expression is c(e’), where e’ is another RA expression with TR[e’]= {<x1,x2, …, xn>| p} Then the translation ofc(e’) is {<x1,x2, …, xn>| pC’}, where C’ is the condition obtained from C by replacing each attribute with the corresponding variable. • Example: TR[#1=#2 #4>2.5R] (where R has arity 4) is {<x1,x2, x3, x4>|< x1,x2, x3, x4>  R  x1=x2  x4>2.5} CSE330/CIS550 Handout 2

Projection • If TR[e]= {<x1,x2, …, xn>| p} then TR[i1,i2,…,im(e)]= {<x i1,x i2, …, x im>|  xj1,xj2, …, xjk.p}, where xj1,xj2, …, xjk are variablesin x1,x2, …, xn that are not inx i1,x i2, …, x im • Example: With R as before, #1,#3 (R)={<x1,x3>| x2,x4. <x1,x2, x3,x4> R} CSE330/CIS550 Handout 2

Union • We know that R and S in RS must be union compatible, so they must have the same arity. Therefore we can assume that for e1e2, where e1, e2are algebra expressions, TR[e1]={<x1,…,xn>|p} and TR[e2]={<y1,…yn>|q}. Relabel the variables in the second so that TR[e2]={< x1,…,xn>|q’}. This may involve relabeling bound variables in q to avoid clashes. Then TR[e1e2]={<x1,…,xn>|pq’}. • Example:TR[RS]= {< x1,x2, x3,x4>| <x1,x2, x3,x4>R  <x1,x2, x3,x4>S CSE330/CIS550 Handout 2

Other binary operators • Difference: The same conditions hold as for union. So TR[e1]={<x1,…,xn>|p} and TR[e2]={< x1,…,xn>|q}. Then TR[e1-e2]= {<x1,…,xn>|pq} • Product: If TR[e1]={<x1,…,xn>|p} and TR[e2]={< y1,…,ym>|q}, then TR[e1 e2]= {<x1,…,xn, y1,…,ym >| pq} • Example: TR[RS]= {<x1,…,xn, y1,…,ym >| <x1,…,xn> R  <y1,…,ym > S } CSE330/CIS550 Handout 2

Summary • We’ve seen how to translate relational algebra into (domain) relational calculus. • There are various syntactic restrictions for guaranteeing the safety of a DRC query. From any of these we can translate back into relational algebra • It was this correspondence between an (implementable and optimizable) algebra and first-order logic that was responsible for the initial development of relational databases – a prime example of some theory leading to highly successful practical developments! CSE330/CIS550 Handout 2

What we cannot compute with relational algebra • Aggregate operations, e.g. “The number of climbers who have climbed ‘Last Tango’” or “The average age of climbers.” These are possible in SQL. • Recursive queries. Given a relation Parent(Parent, Child) compute the ancestor relation. This appears to call for an arbitrary number of joins. It is known that it cannot be expressed in first-order logic, hence it cannot be expressed in relational algebra. CSE330/CIS550 Handout 2

What we cannot compute with relational algebra, cont • Computing with complex structures that are not (1NF) relations, e.g. lists, arrays, multisets. • Of course, we can always compute such things if we can “talk to” a database from a full-blown (Turing complete) programming language, and we’ll see how to do this later. However, communicating with a database in this way may well be inefficient, and adding computational power to a query language remains an important research topic. CSE330/CIS550 Handout 2

The Relational Model: Relational Calculus