1 / 43

CS4433 Database Systems

Learn the theory behind relational algebra and understand how it empowers you to query databases effectively, exploring operators, set operations, selections, projections, and joins.

Download Presentation

CS4433 Database Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CS4433Database Systems Relational Algebra

  2. Why Do We Learn This? • Until now, we learn how to create a database for a domain • Create ER diagram • Convert ER to Relational Model • Write SQL to create tables for relations in our RM • Now, Querying the database: specify what we want from our database • Find all the people who earn more than $1,000,000 and pay taxes in Oklahoma

  3. Why Do We Learn This? • In SQL we write WHAT we want to get from the data • The database system needs to figure out HOW to get the data we want • The passage from WHAT to HOW goes through the Relational Algebra • Querying the database: specify what we want from our database • Find all the people who earn more than $1,000,000 and pay taxes in Oklahoma • Could write in C++/Java, but a bad idea • Instead use high-level query languages: • Theoretical: Relational Algebra • Practical: SQL • Relational algebra: a basic set of operations on relations that provide the basic principles

  4. What is an “Algebra”? • Mathematical system consisting of: • Operands--- variables or values from which new values can be constructed • Operators--- symbols denoting procedures that construct new values from given values • Examples • Arithmetic(Elementary) algebra, linear algebra, Boolean algebra …… • What are operands? • Arithmetic Algebra: variables like x and constants like 15 • What are operators? • Arithmetic Algebra: addition, subtraction…

  5. What is Relational Algebra? • An algebra • Whose operands are relations or variables that represent relations • Whose operators are designed to do common things that we need to do with relations in a database • relations as input, new relation as output • Can be used as a query languagefor relations!

  6. Relational Operators at a Glance • Five basic RA operations: • Basic Set Operations • union, difference (no intersection, no complement) • Selection: (s) eliminates some rows of relations • Projection: (p) eliminates some columns of relations • Cartesian Product: (X) combine the tuples of two relations • When our relations have attribute names: • Renaming: (r) change the relation schema • Derived operations: • Intersection, complement • Joins (natural join, equi-join, theta join, semi-join, ……)

  7. Set Operations • Union: all tuples in R1 or R2, denoted as R1 U R2 • R1, R2 must have the same schema • R1 U R2 has the same schema as R1, R2 • Example: • Active-Employees U Retired-Employees • If any, is duplicate elimination required? • yes • Intersection: all tuples in both R1 and R2 as R1 ∩ R2 • R1, R2 must have the same schema • R1 U R2 has the same schema as R1, R2 • Example: • Active-Employees U Retired-Employees • Difference: all tuples in R1 but not in R2, denoted as R1 – R2 • R1, R2 must have the same schema • R1 - R2 has the same schema as R1, R2 • Example • All-Employees - Retired-Employees

  8. Selection • Applied to a relation R, produce a new relation with a subset of R’s tuples • Returns all tuples (rows) which satisfy a condition that involves the attributes of R, denoted assc(R) • c is a condition: =, <, >,≤, ≥, ≠, AND, OR, NOT • Output schema: same as input schema • Find all employees with salary more than $40,000: • sSalary > 40000(Employee)

  9. Projection • Applied to a relation R, produce a new relation with a subset of R’s columns • Unary operation: returns certain columns, denoted as P A1,…,An(R) • Eliminates duplicate tuples ! • since relations are sets • Input schema R(B1, …, Bm) • Condition: {A1, …, An} {B1, …, Bm} • Output schema S(A1, …, An) • Example: project social-security number and names: • PSSN, Name (Employee)

  10. Selection vs. Projection • Think of relation as a table • How are they similar? • Selection of subsets from table • How are they different? • Horizontal vs. vertical? • selection-projection • Duplicate elimination for both? • No duplicate in the result of selection

  11. Cartesian Product • Each tuple in R1 with each tuple in R2, denoted as R1 x R2 • Input schemas R1(A1,…,An), R2(B1,…,Bm) • Output schema is S(A1, …, An, B1, …, Bm) • Two relations are combined! • Very rare in practice; but joins are very common • Example: Employee x Dependent

  12. Example Employee Dependent Employee x Dependent

  13. Renaming • Does not change the relational instance, denoted as Notation: rS(B1,…,Bn) (R) • Changes the relational schema only • Input schema: R(A1, …, An) • Output schema: S(B1, …, Bn) • Example: • Soc-sec-num, firstname(Employee)

  14. Set Operations: Intersection • Intersection: all tuples both in R1 and in R2, denoted as R1 R2 • R1, R2 must have the same schema • R1 R2 has the same schema as R1, R2 • Example: UnionizedEmployeesRetiredEmployees • Intersection is derived: • R1 R2 = R1 – (R1 – R2) why ? R1 R2 R1 ∩ R2

  15. Theta Join • A join that forces us to pair tuples using one specific condition • involves a predicate q, denoted as R1 q R2 • Input schemas: R1(A1,…,An), R2(B1,…,Bm) • Output schema: S(A1,…,An,B1,…,Bm) • Derived operator: R1 q R2 = sq (R1 x R2) • Take the (Cartisian) product R1 x R2 • Then apply SELECTC to the result • As for SELECT, C can be any Boolean-valued condition

  16. Equi-join • Special case of theta join: condition c contains only conjunctions of equalities • Result schema is the same as that of Cartesian product • May have fewer tuples than Cartesian product • Most frequently used in practice: R1 A=BR2

  17. Example Employee Dependent Employee SSN=Employee-SSN Dependent

  18. Theta Join: Example Sells Bar BarInfo := Sells Sells.Bar=Bar.Name Bar

  19. Natural Join Notation: R1 R2==PL(sq (R1 x R2)) • L is the list of attributes in R and in S that are also not in R • Input Schema: R1(A1, …, An), R2(B1, …, Bm) • Output Schema: S(C1,…,Cp) • Where{C1, …, Cp} = {A1, …, An} U{B1, …, Bm} • Meaning: combine all pairs of tuples in R1 and R2 that agree on the join attributes: • {A1,…,An} {B1,…, Bm}(common attributes in both the schema of R1 and R2 called the join attributes) • Natural join is a particular case of equi-join • A lot of research on how to do it efficiently

  20. Natural Join: Examples Employee Dependent • Employee Dependent = • PSSN, Name, Dependent-Name(sEmployee.SSN=Dependent.SSN(Employee x Dependent)

  21. Natural Join: Examples R S R S

  22. So Which Join Is It ? • When we write R ⨝ S we usually mean an eq-join, but we often omit the equality predicate when it is clear from the context • The join operation in all its variants (eqjoin, natural join, semi-join, outer-join) is at the heart of relational database systems

  23. Building Complex Expressions • Algebras allow us to express sequences of operations in a natural way • Example • In arithmetic algebra: (x + 4)*(y - 3) • Relational algebra allows the same • Three notations, just as in arithmetic: • Sequences of assignment statements • Expressions with several operators • Expression trees

  24. Sequences of Assignments • Create temporary relation names • Renaming can be implied by giving relations a list of attributes • Example: R3 := R1 C R2 can be written: R4 := R1 x R2 (R4: temporary relation) R3 := sC(R4)

  25. Expressions with Several Operators • Example: the theta-join R3 := R1 JOINC R2 can be written: R3 := sC (R1 x R2) • Precedence of relational operators: • Unary operators --- select, project, rename --- have highest precedence, bind first • Then cartesian products and joins • Then intersection • Finally, union and set difference bind last • But you can always insert parentheses to force the order you desire

  26. Expression Trees • Leaves are operands • either variables standing for relations or particular constant relations • Interior nodes are operators, applied to their child or children

  27. UNION RENAMER(name) PROJECTname PROJECTbar SELECTaddr = “Tennessee St.” SELECT price<3 AND beer=“Bud” Expression Tree: Examples Given Bars(name, addr), Sells(bar, beer, price), find the names of all the bars that are either on Tennessee St. or sell Bud for less than $3 Bars Sells

  28. Relational Algebra as a Constraint Language • Two ways to express constraints. • If R is an expression of relational algebra, then R=∅ is a constraint that says ”The value of R must be empty,” or equivalently ”There are no tuples in the result of R”. • If R and S are expressions of relational algebra, then R ⊆ S is a constraint that says ”Every tuple in the result of R must also in the result of S.” Of course the result of S may contain additional tuples not produced by R.

  29. Referential Integrity Constraints • Problem: Assert that a value appearing in one context also appears in another, related context. If any value v as component in attribute A of some tuple in one relation R must appear in a particular component (say for attribute B) of some tuple of another relation S. • Expression: (R) ⊆ (S), or equivalently, (R) − (S) = ∅ • Example: • Movie(title, producerC#) • MovieExec(name, cert#) • Constraint: Every producerC# in Movie table must appear in the cert# of MovieExec. • Solution: (Movie) ⊆ (MovieExec)

  30. Key Constraints • Problem: Express algebraically the constraint that a certain attribute or set of attributes is a key for a relation. No two tuples of relation R agree on attribute A but not agree on attribute B and C. • Expression: =R2.A and (R1.B≠R2.B or R1.C≠R2.C (R1xR2)=∅ • Analysis: R1 and R2 are relations renamed from R. Find all the violate tuples using =R2.A and (R1.B≠R2.B or R1.C≠R2.C (R1xR2) And then assert the constraint by equating the result to ∅.

  31. Additional Constraints • Example: We want the values that may appear in an attribute to be restricted to a small enumerated set of values • Suppose we wish to specify that the only legal values for the gender attribute of MovieStar are ‘F’ and ‘M’. We can express this constrains with algebraically by;

  32. Division Operator (÷) • Attributes of B is proper subset of Attributes of A. • The relation returned by division operator will have attributes = (All attributes of A – All Attributes of B) • The relation returned by division operator will return those tuples from relation A which are associated to every B’s tuple.

  33. Examples • EMP(ssn,name,gender,salary) • Emp_proj(essn,pnumber, hours)

  34. EMP(ssn,name,gender,salary) • Emp_proj(essn,pnumber, hours)

  35. EMP(ssn,name,gender,salary)Emp_proj(essn,pnumber, hours)

  36. EMP(ssn,name,gender,salary)Emp_proj(essn,pnumber, hours

  37. Extended Algebra Operators • Sets v.s. Bags • Sets: {a,b,c}, {a,d,e,f}, { }, . . . • Bags: {a, a, b, c}, {b, b, b, b, b}, . • Multisets allow the sample tuple to appear more than once in a relation • Extensions for bags: • Duplicate elimination δ • Grouping and aggregation γ • Sorting τ

  38. Operators on Bags • Duplicate elimination δ • δ(R) = select distinct * from R • Grouping γ • γA,sum(B) = select A,sum(B) from R group by A • Sorting τ

  39. Complex RA Expressions

  40. Relational Algebra • RA = Dataflow Program • Several operations, plus strictly specified order • In RDBMS the dataflow graph is always a tree

More Related