Distributed Database Management Systems

Distributed DatabaseManagement Systems Lecture - 2

Father of Relational Model Edgar F. Codd (1923-2003) • PhD from U. of Michigan, Ann Arbor • Received Turing Award in 1981.

The Role of Relational Algebra in a DBMS

What is Relational Algebra? • It is a language in which we can ask questions (query) of a database. • Basic premise is that tables are sets (mathematical) and so our query language should manipulate sets with ease. • Traditional Set Operations: • union, intersection, Cartesian product, set difference • Extended Set Operations: • selection, projection, join

Supplier-Part Example

SELECTION • Selection returns a subset of the rows of a single table. • Syntax: select <table_name> where <condition> /* the <condition> must involve only columns from the indicated table */ alternatively σ<condition> (table_name) Find all suppliers from Boston. Select Supplier where Location = ‘Bos’σLocation = ‘Bos’ (Supplier)

SELECTION Exercise • Find the Cardholders from Modena. • Observations: • There is only one input table. • Both Cardholder and the answer table have the same schema (list of columns) • Every row in the answer has the value ‘Modena’ in the b_addr column. select Cardholder where b_addr = ‘Modena’ alternatively σb_addr = ‘Modena’ (Cardholder)

SELECTION same schema Answer All rows in the answer havethe value ‘Modena’ in theb_addr column

Produce table containing subset of rows of argument table satisfying condition condition (relation) Example: Person Hobby=‘stamps’(Person) Select Operator Id Name Address Hobby Id Name Address Hobby 1123 John 123 Main stamps 1123 John 123 Main coins 5556 Mary 7 Lake Dr hiking 9876 Bart 5 Pine St stamps 1123 John 123 Main stamps 9876 Bart 5 Pine St stamps

Operators: <, , , >, =,  Simple selection condition: <attribute> operator <constant> <attribute> operator <attribute> <condition> AND <condition> <condition> OR <condition> NOT <condition> Selection Condition

 Id>3000OR Hobby=‘hiking’ (Person)  Id>3000 AND Id <3999(Person)  NOT(Hobby=‘hiking’)(Person)  Hobby‘hiking’(Person) Selection Condition - Examples

PROJECTION • Projection returns a subset of the columns of a single table. • Syntax: project <table_name> over <list of columns> /* the columns in <list of columns> must come from the indicated table */ alternatively π<list of columns> (table_name) Find all supplier names Project Supplier over SnameπSname (Supplier)

PROJECTION Exercise • Find the addresses of all Cardholders. • Observations • There is only one input table. • The schema of the answer table is the list of columns • If there are many Cardholders living at the same address these are not duplicated in the answer table. project Cardholder over b_addr alternatively πb_addr (Cardholder)

PROJECTION schema of answer tableis the same as the list ofcolumns in the query Answer Duplicate ‘New Paltz’ valuesin the Cardholder table aredropped from the Answer table

CARTESIAN PRODUCT • The Cartesian product of two sets is a set of pairs of elements (tuples), one from each set. • If the original sets are already sets of tuples then the tuples in the Cartesian product are all that bigger. • Syntax: • As we have seen, Cartesian products are usually unrelated to a real-world thing. They normally contain some noise tuples. • However they may be useful as a first step. <table_name> x <table_name>

CARTESIAN PRODUCT 4 rows 5 rows 20 rows noise:13 rowsin total info:7 rowsin total

If R and S are two relations, RS is the set of all concatenated tuples <x,y>, where x is a tuple in R and y is a tuple in S R and S need not be union compatible. But R and S must have distinct attribute names. Why? RS is expensive to compute. But why? A B C D A B C D x1 x2 y1 y2 x1 x2 y1 y2 x3 x4 y3 y4 x1 x2 y3 y4 x3 x4 y1 y2 RS x3 x4 y3 y4 RS Cartesian Product

UNION: • Treat two tables as sets and perform a set union • Syntax: • Observations: • This operation is impossible unless both tables involved have the same schemas. Why? • Because rows from both tables must fit into a single answer table; hence they must “look alike”. • Because some rows might already belong to both tables Table1 UNION Table2alternativelyTable1 U Table2

UNION Example: Part1Suppliers = project (select Supplies where Pno = ‘p1’) over Sno Part2Suppliers = project (select Supplies where Pno = ‘p2’) over Sno Part1Suppliers UNION Part2Suppliers alternatively Part1Suppliers = πSno(σPno = ‘p1’ (Supplies) ) Part2Suppliers = πSno(σPno = ‘p2’ (Supplies) ) Answer = Part1SuppliersPart2Suppliers ∩

UNION Exercise: • Find the borrower numbers of all borrowers who have either borrowed or reserved a book (any book). Reservers = project Reserves over borrowerid Borrowers = project Borrows over borrowerid Answer = Borrowers union Reservers alternatively Reservers = πborrowerid (Reserves) Borrowers = πborrowerid(Borrows)Answer = Borrowers Reservers ∩ not duplicated

INTERSECTION: • Treat two tables as sets and perform a set intersection • Syntax: • Observations: • This operation is impossible unless both tables involved have the same schemas. Why? • Because rows from both tables must fit into a single answer table; hence they must “look alike”. Table1 INTERSECTION Table2alternativelyTable1 Table2 ∩

INTERSECTION Example: Part1Suppliers = project (select Supplies where Pno = ‘p1’) over Sno Part2Suppliers = project (select Supplies where Pno = ‘p2’) over Sno Part1Suppliers INTERSECT Part2Suppliers alternatively Part1Suppliers = πSno(σPno = ‘p1’(part2suppliers) ) (part1suppliers) Part2Suppliers = πSno(σPno = ‘p2’ (Supplies) ) Answer = Part1SuppliersPart2Suppliers ∩

INTERSECTION Exercise: • Find the borrower numbers of all borrowers who have borrowed and reserved a book. Reservers = project Reserves over borrowerid Borrowers = project Borrows over borrowerid Answer = Borrowers intersect Reservers alternatively Reservers = πborrowerid (Reserves) Borrowers = πborrowerid(Borrows)Answer = Borrowers Reservers ∩

SET DIFFERENCE: • Treat two tables as sets and perform a set intersection • Syntax: • Observations: • This operation is impossible unless both tables involved have the same schemas. Why? • Because it only makes sense to calculate the set difference if the two sets have elements in common. Table1 MINUS Table2alternativelyTable1 \ Table2

SET DIFFERENCE Example: Part1Suppliers = project (select Supplies where Pno = ‘p1’) over Sno Part2Suppliers = project (select Supplies where Pno = ‘p2’) over Sno Part1Suppliers MINUS Part2Suppliers alternatively Part1Suppliers = πSno(σPno = ‘p1’ (Supplies) ) Part2Suppliers = πSno(σPno = ‘p2’ (Supplies) ) Answer = Part1Suppliers\Part2Suppliers

SET DIFFERENCE Exercise: • Find the borrower numbers of all borrowers who have borrowed something and reserved nothing. Reservers = project Reserves over borrowerid Borrowers = project Borrows over borrowerid Answer = Borrowers minus Reservers alternatively Reservers = πborrowerid (Reserves) Borrowers = πborrowerid(Borrows)Answer = Borrowers \ Reservers

A derivative of RxS (Join is a derivative of Cartesian product) S= F (R x S) R  Theta Join • R and S are denoted as relations while the F or ΘPredicate here involves attributes from both tables

In the equivalence above, we should note that if F involves attributes of the two relations that are common to both of them, a projection is necessary to make sure that those attributes do not appear twice in the result

Theta Join • Let us consider that the EMP relation in above figure and add two more tuples as depicted in Figure a. Then Figure b shows the Θ-join of relations EMP and ASG over the join predicate EMP.ENO=ASG.ENO • The same result could have been obtained as • Notice that the result does not have tuples E9 and E10 since these employees have not yet been assigned to a project

Is also an equi-join over two relations S R A Natural Join • Attribute name need not to be same

Subset of tuples of R that participate in join of R with S A(RA S) B Semi join S = R F

The advantage of semijoin is that it decreases the number of tuples that need to be handled to form the join. • In centralized database systems, this is important because it usually results in a decreased number of secondary storage accesses by making better use of the memory. • It is even more important in distributed databases since it usually reduces the amount of data that needs to be transmitted between sites in order to evaluate a query

Relational Calculus

Relational calculus • Relational calculus languages fall into two groups: tuple relational calculus and domain relational calculus • The difference between the two is in terms of the primitive variable used in specifying the queries • Semantics are given to formulas by interpreting them as assertions on the database • A relational database can be viewed as a collection of tuples or a collection of domains

Tuple relational calculus interprets a variable in a formula as a tuple of a relation, whereas domain relational calculus interprets a variable as the value of a domain

Tuple relational calculus • The primitive variable used in tuple relational calculus is a tuple variable which specifies a tuple of a relation • We can also define, it ranges over the tuples of a relation • Tuple calculus is the original relational calculus developed by Codd [1970] • In tuple relational calculus queries are specified as {t|F(t)}, where t is a tuple variable and F is a well-formed formula.

The atomic formulas are of two forms 1. Tuple-variable membership expressions. If t is a tuple variable ranging over the tuples of relation R (predicate symbol), the expression “tuple t belongs to relation R” is an atomic formula, which is usually specified as R.tor R(t). Atomic Formulas

Atomic Formulas 2. Conditions 1. s[A]θt [B], where s and t are tuple variables and A and B are components of s and t, respectively. θis one of the arithmetic comparison operators <, >, =, ≠, ≤ and ≥. This condition specifies that component A of s stands in relation θto the B component of t: • e.g, s[SAL] > t[SAL] 2. s[A] θ c, where s, A, and θare as defined above and c is a constant. • e.g, s[ENAME] = “Smith”.

Language based on Tuple-oriented Calculs……? SQL

Select EMP.eName, DEP.dName from EMP, DEP where EMP.dNo = DEP.dNo Example

The domain relational calculus was first proposed by Lacroix and Pirotte [1977]. Domain variable ranges over values in a domain and specifies a tuple A query in DRC x1,x2,….xn|F(x1,….xn) where F is a wff(well formed formula) and X’s are free variables Implementation: QBE Domain Relational Calculus

Domain Relational Calculus • The success of domain relational calculus languages is due mainly to QBE [Zloof, 1977] • QBE is a visual application of domain calculus • QBE, designed only for interactive use from a visual terminal, is user friendly

Thanks

Distributed Database Management Systems