430 likes | 441 Views
Learn the theory behind relational algebra and understand how it empowers you to query databases effectively, exploring operators, set operations, selections, projections, and joins.
E N D
CS4433Database Systems Relational Algebra
Why Do We Learn This? • Until now, we learn how to create a database for a domain • Create ER diagram • Convert ER to Relational Model • Write SQL to create tables for relations in our RM • Now, Querying the database: specify what we want from our database • Find all the people who earn more than $1,000,000 and pay taxes in Oklahoma
Why Do We Learn This? • In SQL we write WHAT we want to get from the data • The database system needs to figure out HOW to get the data we want • The passage from WHAT to HOW goes through the Relational Algebra • Querying the database: specify what we want from our database • Find all the people who earn more than $1,000,000 and pay taxes in Oklahoma • Could write in C++/Java, but a bad idea • Instead use high-level query languages: • Theoretical: Relational Algebra • Practical: SQL • Relational algebra: a basic set of operations on relations that provide the basic principles
What is an “Algebra”? • Mathematical system consisting of: • Operands--- variables or values from which new values can be constructed • Operators--- symbols denoting procedures that construct new values from given values • Examples • Arithmetic(Elementary) algebra, linear algebra, Boolean algebra …… • What are operands? • Arithmetic Algebra: variables like x and constants like 15 • What are operators? • Arithmetic Algebra: addition, subtraction…
What is Relational Algebra? • An algebra • Whose operands are relations or variables that represent relations • Whose operators are designed to do common things that we need to do with relations in a database • relations as input, new relation as output • Can be used as a query languagefor relations!
Relational Operators at a Glance • Five basic RA operations: • Basic Set Operations • union, difference (no intersection, no complement) • Selection: (s) eliminates some rows of relations • Projection: (p) eliminates some columns of relations • Cartesian Product: (X) combine the tuples of two relations • When our relations have attribute names: • Renaming: (r) change the relation schema • Derived operations: • Intersection, complement • Joins (natural join, equi-join, theta join, semi-join, ……)
Set Operations • Union: all tuples in R1 or R2, denoted as R1 U R2 • R1, R2 must have the same schema • R1 U R2 has the same schema as R1, R2 • Example: • Active-Employees U Retired-Employees • If any, is duplicate elimination required? • yes • Intersection: all tuples in both R1 and R2 as R1 ∩ R2 • R1, R2 must have the same schema • R1 U R2 has the same schema as R1, R2 • Example: • Active-Employees U Retired-Employees • Difference: all tuples in R1 but not in R2, denoted as R1 – R2 • R1, R2 must have the same schema • R1 - R2 has the same schema as R1, R2 • Example • All-Employees - Retired-Employees
Selection • Applied to a relation R, produce a new relation with a subset of R’s tuples • Returns all tuples (rows) which satisfy a condition that involves the attributes of R, denoted assc(R) • c is a condition: =, <, >,≤, ≥, ≠, AND, OR, NOT • Output schema: same as input schema • Find all employees with salary more than $40,000: • sSalary > 40000(Employee)
Projection • Applied to a relation R, produce a new relation with a subset of R’s columns • Unary operation: returns certain columns, denoted as P A1,…,An(R) • Eliminates duplicate tuples ! • since relations are sets • Input schema R(B1, …, Bm) • Condition: {A1, …, An} {B1, …, Bm} • Output schema S(A1, …, An) • Example: project social-security number and names: • PSSN, Name (Employee)
Selection vs. Projection • Think of relation as a table • How are they similar? • Selection of subsets from table • How are they different? • Horizontal vs. vertical? • selection-projection • Duplicate elimination for both? • No duplicate in the result of selection
Cartesian Product • Each tuple in R1 with each tuple in R2, denoted as R1 x R2 • Input schemas R1(A1,…,An), R2(B1,…,Bm) • Output schema is S(A1, …, An, B1, …, Bm) • Two relations are combined! • Very rare in practice; but joins are very common • Example: Employee x Dependent
Example Employee Dependent Employee x Dependent
Renaming • Does not change the relational instance, denoted as Notation: rS(B1,…,Bn) (R) • Changes the relational schema only • Input schema: R(A1, …, An) • Output schema: S(B1, …, Bn) • Example: • Soc-sec-num, firstname(Employee)
Set Operations: Intersection • Intersection: all tuples both in R1 and in R2, denoted as R1 R2 • R1, R2 must have the same schema • R1 R2 has the same schema as R1, R2 • Example: UnionizedEmployeesRetiredEmployees • Intersection is derived: • R1 R2 = R1 – (R1 – R2) why ? R1 R2 R1 ∩ R2
Theta Join • A join that forces us to pair tuples using one specific condition • involves a predicate q, denoted as R1 q R2 • Input schemas: R1(A1,…,An), R2(B1,…,Bm) • Output schema: S(A1,…,An,B1,…,Bm) • Derived operator: R1 q R2 = sq (R1 x R2) • Take the (Cartisian) product R1 x R2 • Then apply SELECTC to the result • As for SELECT, C can be any Boolean-valued condition
Equi-join • Special case of theta join: condition c contains only conjunctions of equalities • Result schema is the same as that of Cartesian product • May have fewer tuples than Cartesian product • Most frequently used in practice: R1 A=BR2
Example Employee Dependent Employee SSN=Employee-SSN Dependent
Theta Join: Example Sells Bar BarInfo := Sells Sells.Bar=Bar.Name Bar
Natural Join Notation: R1 R2==PL(sq (R1 x R2)) • L is the list of attributes in R and in S that are also not in R • Input Schema: R1(A1, …, An), R2(B1, …, Bm) • Output Schema: S(C1,…,Cp) • Where{C1, …, Cp} = {A1, …, An} U{B1, …, Bm} • Meaning: combine all pairs of tuples in R1 and R2 that agree on the join attributes: • {A1,…,An} {B1,…, Bm}(common attributes in both the schema of R1 and R2 called the join attributes) • Natural join is a particular case of equi-join • A lot of research on how to do it efficiently
Natural Join: Examples Employee Dependent • Employee Dependent = • PSSN, Name, Dependent-Name(sEmployee.SSN=Dependent.SSN(Employee x Dependent)
Natural Join: Examples R S R S
So Which Join Is It ? • When we write R ⨝ S we usually mean an eq-join, but we often omit the equality predicate when it is clear from the context • The join operation in all its variants (eqjoin, natural join, semi-join, outer-join) is at the heart of relational database systems
Building Complex Expressions • Algebras allow us to express sequences of operations in a natural way • Example • In arithmetic algebra: (x + 4)*(y - 3) • Relational algebra allows the same • Three notations, just as in arithmetic: • Sequences of assignment statements • Expressions with several operators • Expression trees
Sequences of Assignments • Create temporary relation names • Renaming can be implied by giving relations a list of attributes • Example: R3 := R1 C R2 can be written: R4 := R1 x R2 (R4: temporary relation) R3 := sC(R4)
Expressions with Several Operators • Example: the theta-join R3 := R1 JOINC R2 can be written: R3 := sC (R1 x R2) • Precedence of relational operators: • Unary operators --- select, project, rename --- have highest precedence, bind first • Then cartesian products and joins • Then intersection • Finally, union and set difference bind last • But you can always insert parentheses to force the order you desire
Expression Trees • Leaves are operands • either variables standing for relations or particular constant relations • Interior nodes are operators, applied to their child or children
UNION RENAMER(name) PROJECTname PROJECTbar SELECTaddr = “Tennessee St.” SELECT price<3 AND beer=“Bud” Expression Tree: Examples Given Bars(name, addr), Sells(bar, beer, price), find the names of all the bars that are either on Tennessee St. or sell Bud for less than $3 Bars Sells
Relational Algebra as a Constraint Language • Two ways to express constraints. • If R is an expression of relational algebra, then R=∅ is a constraint that says ”The value of R must be empty,” or equivalently ”There are no tuples in the result of R”. • If R and S are expressions of relational algebra, then R ⊆ S is a constraint that says ”Every tuple in the result of R must also in the result of S.” Of course the result of S may contain additional tuples not produced by R.
Referential Integrity Constraints • Problem: Assert that a value appearing in one context also appears in another, related context. If any value v as component in attribute A of some tuple in one relation R must appear in a particular component (say for attribute B) of some tuple of another relation S. • Expression: (R) ⊆ (S), or equivalently, (R) − (S) = ∅ • Example: • Movie(title, producerC#) • MovieExec(name, cert#) • Constraint: Every producerC# in Movie table must appear in the cert# of MovieExec. • Solution: (Movie) ⊆ (MovieExec)
Key Constraints • Problem: Express algebraically the constraint that a certain attribute or set of attributes is a key for a relation. No two tuples of relation R agree on attribute A but not agree on attribute B and C. • Expression: =R2.A and (R1.B≠R2.B or R1.C≠R2.C (R1xR2)=∅ • Analysis: R1 and R2 are relations renamed from R. Find all the violate tuples using =R2.A and (R1.B≠R2.B or R1.C≠R2.C (R1xR2) And then assert the constraint by equating the result to ∅.
Additional Constraints • Example: We want the values that may appear in an attribute to be restricted to a small enumerated set of values • Suppose we wish to specify that the only legal values for the gender attribute of MovieStar are ‘F’ and ‘M’. We can express this constrains with algebraically by;
Division Operator (÷) • Attributes of B is proper subset of Attributes of A. • The relation returned by division operator will have attributes = (All attributes of A – All Attributes of B) • The relation returned by division operator will return those tuples from relation A which are associated to every B’s tuple.
Examples • EMP(ssn,name,gender,salary) • Emp_proj(essn,pnumber, hours)
EMP(ssn,name,gender,salary) • Emp_proj(essn,pnumber, hours)
Extended Algebra Operators • Sets v.s. Bags • Sets: {a,b,c}, {a,d,e,f}, { }, . . . • Bags: {a, a, b, c}, {b, b, b, b, b}, . • Multisets allow the sample tuple to appear more than once in a relation • Extensions for bags: • Duplicate elimination δ • Grouping and aggregation γ • Sorting τ
Operators on Bags • Duplicate elimination δ • δ(R) = select distinct * from R • Grouping γ • γA,sum(B) = select A,sum(B) from R group by A • Sorting τ
Relational Algebra • RA = Dataflow Program • Several operations, plus strictly specified order • In RDBMS the dataflow graph is always a tree